Iteration in R

intermediate

Published

July 5, 2024

Previous attendees have said…

9 previous attendees have left feedback
89% would recommend this session to a colleague
89% said that this session was pitched correctly

Three random comments from previous attendees

Great to reinforce knowledge and always pick up a few tips and tricks along the way.
more than enough to spark interest in trying this but some of it clearly melted my brain
Excellent introduction to iteration - easy to follow along during the session

Welcome

this is an 🌶🌶 intermediate-level practical session designed for those with prior R experience

Session outline

what’s a loop?
basic examples

General loop syntax

A loop lets us write a bit of code once and execute it repeatedly. Usually, we’ll do this by creating a vector, like c(1, 2, 3, 4), and looping over it, so that the code runs once for each element of the vector. The general syntax is:

for (variable in vector) {
  # some bit of code
}
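As a minimal sketch of that pattern, the loop below runs over c(1, 2, 3, 4) and adds each element to a running total. The names values, value, and total are illustrative only and are not part of the session materials.

```r
# an example vector to loop over
values <- c(1, 2, 3, 4)

# a running total, updated once per element
total <- 0

for (value in values) {
  # on each pass, value holds the next element of values
  total <- total + value
}

total
```

After the loop finishes, total holds the sum of the four elements, which is 10.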

Simple examples

for (i in 1:4) {
  print("hello world!")
}

[1] "hello world!"
[1] "hello world!"
[1] "hello world!"
[1] "hello world!"

for (i in LETTERS[5:10]) {
  print(i)
}

[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"

for (i in 4:1) {
  print(i)
}

[1] 4
[1] 3
[1] 2
[1] 1

Collecting output

Loops work best when you collect their output by indexing. Start by making an empty output vector of the right type and size, then assign into that vector by index inside the loop:

len <- 1:4

output <- vector("character", length(len))

for (i in len) {
  output[i] <- paste("the number is", i)
}

output

[1] "the number is 1" "the number is 2" "the number is 3" "the number is 4"
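To see what an "empty" output vector looks like before the loop fills it, this short sketch creates a few with vector(). The object names are illustrative. Each vector starts out filled with a default value for its type, which the loop then overwrites.

```r
# pre-allocated vectors are filled with a default value for their type
empty_chr <- vector("character", 3)  # three empty strings
empty_num <- vector("numeric", 3)    # three zeros
empty_lgl <- vector("logical", 3)    # three FALSE values

empty_chr
empty_num
empty_lgl
```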

It’s also possible to collect output into a data.frame or similar:

df_output <- data.frame(num = vector("numeric", length(len)))

for (i in len) {
  df_output$num[i] <- i
}

df_output |>
  knitr::kable()

| num|
|---:|
|   1|
|   2|
|   3|
|   4|
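The same pre-allocate-then-index approach can be sketched for a list, which is handy when each iteration produces something other than a single value. This is an illustrative example rather than part of the original session, and list_output is a made-up name.

```r
# pre-allocate a list with one slot per iteration
list_output <- vector("list", 3)

for (i in 1:3) {
  # use [[ ]] to assign a whole object into slot i
  list_output[[i]] <- seq_len(i)
}

# each slot holds a vector of a different length
lengths(list_output)
```

Note the double square brackets: list_output[[i]] assigns into the slot itself, which is what you usually want when each result is a whole vector or data frame.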

Benchmarking and appending output

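You should avoid growing your output by appending inside a loop. It’s inefficient: in keeping with R’s copy-on-modify behaviour, each call to c() builds a new, longer vector and copies all the existing elements into it, so the amount of copying grows with every iteration. We’ll set up a couple of loops in a function, and use microbenchmark to compare append (which uses c() to append its output) with pre_allocate (which indexes into a vector of the proper length).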

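library(ggplot2)
library(microbenchmark)

show_diff <- function(len) {
  bench <- microbenchmark(
    append = {
      # start from an empty vector on every run, then grow it with c()
      output <- vector("numeric", 0)
      for (i in seq_len(len)) {
        output <- c(output, i)
      }
    },
    pre_allocate = {
      # create a vector of the final length, then fill it by index
      output2 <- vector("numeric", len)
      for (i in seq_len(len)) {
        output2[i] <- i
      }
    },
    times = 20
  )

  # plot the timing distributions for the two approaches
  autoplot(bench) +
    ggtitle(paste0("Loop from 1:", len))
}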

The difference isn’t that much when dealing with small loops:

show_diff(10)

But the gap widens very quickly as the loop length increases:

show_diff(100)

show_diff(500)
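If you don't have microbenchmark installed, a rough version of the same comparison can be sketched in base R with system.time(). The helper names append_loop and prealloc_loop are made up for this example, and the timings will vary from machine to machine, so no expected timings are shown.

```r
# grow the output one element at a time with c()
append_loop <- function(n) {
  out <- vector("numeric", 0)
  for (i in seq_len(n)) {
    out <- c(out, i)
  }
  out
}

# fill a vector that already has its final length
prealloc_loop <- function(n) {
  out <- vector("numeric", n)
  for (i in seq_len(n)) {
    out[i] <- i
  }
  out
}

n <- 1e4

# both approaches produce exactly the same vector...
identical(append_loop(n), prealloc_loop(n))

# ...but they take different amounts of time to do it
system.time(append_loop(n))
system.time(prealloc_loop(n))
```

system.time() runs each expression only once, so treat it as a quick check rather than a proper benchmark.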

`seq_along`

The last tool we’ll look at is seq_along. Given an R object, it returns the integer vector running from 1 to that object's length, which makes it a handy way to build something to iterate over. Say you had a tibble/df and wanted to average each column. You could work out the number of columns manually, then iterate from 1 to that number:

output <- vector("numeric", length(mtcars))

for (i in 1:length(output)) {
  output[i] <- mean(mtcars[, i], na.rm = TRUE)
}

output

 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

Alternatively, you could use seq_along to generate a neat integer vector to iterate over:

output <- vector("numeric", length(mtcars))

for (i in seq_along(mtcars)) {
  output[i] <- mean(mtcars[, i], na.rm = TRUE)
}

output

 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500
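Both versions give the same answer here, but they behave differently when the object being looped over is empty. The sketch below, which uses made-up object names, counts how many times each loop body runs for a zero-length vector: 1:length(x) counts down from 1 to 0, while seq_along(x) returns a zero-length vector, so the loop is skipped.

```r
empty <- vector("numeric", 0)

1:length(empty)    # counts down from 1 to 0, giving two values
seq_along(empty)   # a zero-length integer vector

# count how many times each loop body runs
runs_colon <- 0
for (i in 1:length(empty)) {
  runs_colon <- runs_colon + 1
}

runs_seq_along <- 0
for (i in seq_along(empty)) {
  runs_seq_along <- runs_seq_along + 1
}

runs_colon      # 2: the loop ran for i = 1 and i = 0
runs_seq_along  # 0: the loop was skipped entirely
```

This is one reason seq_along is often the safer default when the length of the input isn't known in advance.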