for (variable in vector) {
  # some bit of code
}
Iteration in R
Previous attendees have said…
- 9 previous attendees have left feedback
- 89% would recommend this session to a colleague
- 89% said that this session was pitched correctly
- Great to reinforce knowledge and always pick up a few tips and tricks along the way.
- more than enough to spark interest in trying this but some of it clearly melted my brain
- Excellent introduction to iteration - easy to follow along during the session
Welcome
- this is an 🌶🌶 intermediate-level practical session designed for those with prior R experience
Session outline
- what’s a loop?
- basic examples
General loop syntax
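We can write a bit of code and execute it repeatedly. Usually, we’ll do this by creating a vector, like c(1, 2, 3, 4), and looping over that vector. The general syntax is for (variable in vector) { ... }: the code between the braces runs once for each element of the vector, with variable taking each value in turn.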
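As a minimal sketch of that pattern, the loop below walks over c(1, 2, 3, 4) and adds each value to a running total. The names numbers, n, and total are illustrative only and do not come from the session materials.

```r
# Illustrative names: numbers is the vector, n is the loop variable
numbers <- c(1, 2, 3, 4)
total <- 0

for (n in numbers) {
  # n takes the values 1, 2, 3, 4 in turn
  total <- total + n
}

total
```

After the loop finishes, total holds the sum of the vector, which is 10 here.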
Simple examples
for (i in 1:4) {
  print("hello world!")
}
[1] "hello world!"
[1] "hello world!"
[1] "hello world!"
[1] "hello world!"
for (i in LETTERS[5:10]) {
  print(i)
}
[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"
for (i in 4:1) {
  print(i)
}
[1] 4
[1] 3
[1] 2
[1] 1
Collecting output
Loops work best when you collect their output by indexing. So we start off by making an empty output vector of the right size, then assigning into that vector within our loop using indexing:
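len <- 1:4

# Pre-allocate an empty character vector of the right size
output <- vector("character", length(len))

for (i in len) {
  # Assign into position i rather than growing the vector
  output[i] <- paste("the number is", i)
}
output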
[1] "the number is 1" "the number is 2" "the number is 3" "the number is 4"
It’s also possible to collect output into a data.frame or similar:
df_output <- data.frame(num = vector("numeric", length(len)))

for (i in 1:4) {
  # Assign into row i of the num column
  df_output$num[i] <- i
}

df_output |>
  knitr::kable()
num |
---|
1 |
2 |
3 |
4 |
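The same pre-allocation pattern extends to lists, which can help when each iteration returns more than a single value. The following is a minimal sketch under an assumed task (collecting the minimum and maximum of a few mtcars columns); the names cols and list_output are hypothetical and not part of the session materials.

```r
# Hypothetical task: collect the range (min and max) of a few mtcars columns
cols <- c("mpg", "cyl", "disp")

# Pre-allocate a list with one empty slot per column
list_output <- vector("list", length(cols))

for (i in 1:length(cols)) {
  # [[i]] assigns into the i-th slot of the list
  list_output[[i]] <- range(mtcars[[cols[i]]])
}

list_output
```

Each slot of list_output then holds a length-two numeric vector, in the same order as cols.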
Benchmarking and appending output
You should avoid appending output. It’s inefficient owing to the way R copies objects: each call to c() builds a new, longer vector and copies every existing element into it. We’ll set up a couple of loops in a function, and use microbenchmark to compare what happens with append (which uses c() to append its output) and pre_allocate, which indexes into a vector of the proper length.
library(ggplot2)
library(microbenchmark)

show_diff <- function(len) {
  output <- vector("numeric", 0)     # grown with c() in the append loop
  output2 <- vector("numeric", len)  # pre-allocated to the final length

  bench <- microbenchmark(
    append = {
      for (i in 1:len) {
        output <- c(output, i)
      }
    },
    pre_allocate = {
      for (i in 1:len) {
        output2[i] <- i
      }
    },
    times = 20
  )

  # Plot the timing distributions for the two approaches
  autoplot(bench) +
    ggtitle(paste0("Loop from 1:", len))
}
The difference isn’t that much when dealing with small loops:
show_diff(10)
But huge differences result very quickly as the loop length increases:
show_diff(100)
show_diff(500)
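If microbenchmark is not installed, a rough version of the same comparison can be sketched with base R's system.time. This is an illustration rather than the session's own benchmark: the loop length n and the object names are assumptions, and single-run timings vary between machines, so no timing figures are shown.

```r
# Assumed loop length, chosen only for illustration
n <- 10000

# Strategy 1: grow the vector with c() on every iteration
appended <- vector("numeric", 0)
append_time <- system.time(
  for (i in 1:n) {
    appended <- c(appended, i)
  }
)

# Strategy 2: pre-allocate, then assign by index
preallocated <- vector("numeric", n)
prealloc_time <- system.time(
  for (i in 1:n) {
    preallocated[i] <- i
  }
)

# Both strategies build the same vector; only the cost differs
identical(appended, preallocated)

# Elapsed seconds for each strategy
append_time["elapsed"]
prealloc_time["elapsed"]
```

The identical() check is the useful part of the design: it confirms that switching to pre-allocation changes the cost of the loop but not its result.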
seq_along
The last tool we’ll look at is seq_along. This function makes an iterable vector of indices from another R object. Say you had a tibble/df and wanted to average each column. You could work out the number of columns manually, then iterate from 1 to that number:
output <- vector("numeric", length(mtcars))

for (i in 1:length(output)) {
  # Column i of mtcars, averaged with missing values dropped
  output[i] <- mean(mtcars[, i], na.rm = TRUE)
}
output
[1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
[7] 17.848750 0.437500 0.406250 3.687500 2.812500
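But you could instead use seq_along to generate a neat integer vector to iterate over. This is also safer when the object might be empty, because seq_along then returns a zero-length vector and the loop body never runs, whereas 1:length(x) would count down from 1 to 0. Here it is with mtcars: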
output <- vector("numeric", length(mtcars))

for (i in seq_along(mtcars)) {
  # seq_along(mtcars) gives one index per column
  output[i] <- mean(mtcars[, i], na.rm = TRUE)
}
output
[1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
[7] 17.848750 0.437500 0.406250 3.687500 2.812500
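One further reason to prefer seq_along over 1:length(x) shows up when the object is empty. The sketch below counts how many times each loop body runs for a zero-length vector; the names empty, runs_colon, and runs_seq_along are illustrative only.

```r
# An empty vector, as might result from a filter that matches nothing
empty <- character(0)

# 1:length(empty) is 1:0, which is c(1, 0), so the body runs twice
runs_colon <- 0
for (i in 1:length(empty)) {
  runs_colon <- runs_colon + 1
}

# seq_along(empty) is a zero-length vector, so the body never runs
runs_seq_along <- 0
for (i in seq_along(empty)) {
  runs_seq_along <- runs_seq_along + 1
}

runs_colon
runs_seq_along
```

Here runs_colon ends up as 2 and runs_seq_along as 0, so only the seq_along version correctly skips the loop for empty input.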