Iteration in R

R
intermediate
Published

July 5, 2024

Previous attendees have said…

  • 9 previous attendees have left feedback
  • 89% would recommend this session to a colleague
  • 89% said that this session was pitched correctly

Three random comments from previous attendees
  • Great session on For Loops and how to use them, also discussed why they potentially are not used as much as in e.g. Python
  • A great intro to iteration!
  • good introduction to loops - could maybe have had some more complex examples

Welcome

  • this is an 🌶🌶 intermediate-level practical session designed for those with prior R experience

Session outline

  • what’s a loop?
  • basic examples

General loop syntax

A for loop lets us write a bit of code once and execute it repeatedly. Usually, we’ll do this by creating a vector, like c(1,2,3,4), and looping over it, so that the code runs once for each element of the vector. The general pattern looks like this:

for (variable in vector) {
  # some bit of code
}
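As a minimal sketch of that pattern, using the example vector from the paragraph above (the names values, value and total are purely illustrative), the loop below visits each element in turn and keeps a running total:

```r
# Illustrative names; any valid variable names would do
values <- c(1, 2, 3, 4)
total <- 0

for (value in values) {
  # on each pass, `value` holds the next element of `values`
  total <- total + value
}

total
```

After the loop finishes, total holds 10, the sum of the four elements.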

Simple examples

for (i in 1:4) {
  print("hello world!")
}
[1] "hello world!"
[1] "hello world!"
[1] "hello world!"
[1] "hello world!"

for (i in LETTERS[5:10]) {
  print(i)
}
[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"

for (i in 4:1) {
  print(i)
}
[1] 4
[1] 3
[1] 2
[1] 1

Collecting output

Loops work best when you collect their output by indexing. So we start off by making an empty output vector of the right size, then assigning into that vector within our loop using indexing:

len <- 1:4

output <- vector("character", length(len))

for (i in len) {
  output[i] <- paste("the number is", i)
}

output
[1] "the number is 1" "the number is 2" "the number is 3" "the number is 4"

It’s also possible to collect output into a data.frame or similar:

df_output <- data.frame(num = vector("numeric", length(len)))

for (i in len) {
  df_output$num[i] <- i
}

df_output |>
  knitr::kable()

| num|
|---:|
|   1|
|   2|
|   3|
|   4|
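The "or similar" can include a list, which is handy when each iteration returns something other than a single value. A minimal sketch, assuming we want to keep a vector of a different length on each pass (the name list_output is illustrative):

```r
len <- 1:4

# pre-allocate a list with one empty slot per iteration
list_output <- vector("list", length(len))

for (i in len) {
  # use [[ ]] to assign into a single list element
  list_output[[i]] <- seq_len(i)
}

# number of values stored in each slot
lengths(list_output)
```

The same idea applies as with the character vector: create the container at its final size first, then fill it by index.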

Benchmarking and appending output

You should avoid growing your output by appending to it. It’s inefficient because every call to c() builds a new, longer vector and copies the existing values into it, so each iteration gets more expensive as the output grows. We’ll set up a couple of loops in a function, and use microbenchmark to compare append (which uses c() to add each value to the end of its output) with pre_allocate, which assigns by index into a vector of the proper length.

library(ggplot2)
library(microbenchmark)

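# Compare growing a vector with c() against filling a pre-allocated vector
show_diff <- function(len) {
  bench <- microbenchmark(
    append = {
      # start from an empty vector on every run, then grow it with c()
      output <- vector("numeric", 0)
      for (i in seq_len(len)) {
        output <- c(output, i)
      }
    },
    pre_allocate = {
      # create the full-length vector up front, then assign by index
      output2 <- vector("numeric", len)
      for (i in seq_len(len)) {
        output2[i] <- i
      }
    },
    times = 20
  )

  # plot the distribution of timings for each approach
  autoplot(bench) +
    ggtitle(paste0("Loop from 1:", len))
}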

The difference is small for short loops:

show_diff(10)

But the gap widens very quickly as the loop length increases:

show_diff(100)

show_diff(500)
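
If microbenchmark isn’t installed, roughly the same comparison can be sketched with base R’s system.time(). This is an illustrative alternative rather than the session’s own benchmark: the helper names grow_by_append and fill_preallocated, and the loop length of 10000, are made up for the example, and the timings will vary from machine to machine.

```r
# Hypothetical helpers, for illustration only
grow_by_append <- function(n) {
  output <- vector("numeric", 0)
  for (i in seq_len(n)) {
    output <- c(output, i)  # builds a new, longer vector on every pass
  }
  output
}

fill_preallocated <- function(n) {
  output <- vector("numeric", n)
  for (i in seq_len(n)) {
    output[i] <- i  # assigns into a vector that already has its final length
  }
  output
}

n <- 10000
system.time(appended <- grow_by_append(n))
system.time(preallocated <- fill_preallocated(n))

# both approaches produce the same values; only the cost differs
identical(appended, preallocated)
```

system.time() runs each expression only once, so it is a much rougher measure than microbenchmark, but it is usually enough to see which approach is slower.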

seq_along

The last tool we’ll look at is seq_along. Given an R object, it returns one integer per element (1, 2, 3, and so on), which makes it a handy way to build a vector to iterate over. Say you had a tibble/df and wanted to average each column. You could work out the number of columns manually, then iterate over that:

output <- vector("numeric", length(mtcars))

for (i in 1:length(output)) {
  # [[i]] extracts column i as a plain vector from a data.frame or a tibble
  output[i] <- mean(mtcars[[i]], na.rm = TRUE)
}

output
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

Alternatively, you could use seq_along to generate a neat integer vector to iterate over, straight from the data:

output <- vector("numeric", length(mtcars))

for (i in seq_along(mtcars)) {
  # [[i]] extracts column i as a plain vector from a data.frame or a tibble
  output[i] <- mean(mtcars[[i]], na.rm = TRUE)
}

output
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500
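
One reason to prefer seq_along over 1:length(x) is the zero-length case. A small sketch, where the zero-column data frame empty and the counter iterations are made-up names for the example:

```r
# a data frame with no columns (hypothetical edge case)
empty <- mtcars[, 0]

1:length(empty)    # counts down from 1 to 0, so a loop would run twice
seq_along(empty)   # zero-length sequence, so a loop would not run at all

iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}

iterations
```

With seq_along the loop body is skipped entirely and iterations stays at 0, whereas looping over 1:length(empty) would try to use columns 1 and 0, neither of which exists.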