Testing R code

R
intermediate
Published

August 20, 2025

Previous attendees have said…

  • 4 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 100% said that this session was pitched correctly

Note: three random comments from previous attendees
  • I think it was difficult for me to understand how to apply this to my code.
  • very well pitched and comprehensive session
  • While intermediate, this was a great introduction to testing R code and providing plenty of things to think about and go try out

Session outline

  • introduction: why test?
  • informal testing
  • unit testing
    • introduction - why automate your tests?
    • testthat walkthrough

Packages
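  • the code in this session assumes something like the following setup (inferred from the functions used below):
library(tidyverse) # readr, tibble, dplyr, tidyr and stringr are used in the parsing example
library(testthat)  # the unit testing framework used throughout
library(here)      # project-relative file paths

# the penguins data used later ships with R (>= 4.5) as datasets::penguins;
# on older versions, library(palmerpenguins) provides a near-identical dataset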

Introduction: why test?

  • code goes wrong
    • functions change
    • data sources change
    • usage changes
  • testing guards against the commonest problems
  • that makes for more reliable code
    • more reliable code opens the way to nicer development patterns

A note

  • most discussions of testing in R come up in the context of package development, but you don’t need to be writing a package to benefit from testing your code
  • we also won’t talk about debugging here (although do look out for a future training session on that)

Informal testing

  • a real-world example: Teams transcripts
  • Teams transcripts can be very useful data sources
  • but they’re absolutely horrible to work with

Example Teams input

read_lines(here::here('r_training/data/teams_input.txt'))
 [1] "WEBVTT"                                                          
 [2] "Q1::>"                                                           
 [3] "00:00:00.000 --> 00:00:14.080"                                   
 [4] "<v Brendan Clarke> this was the first question in the transcript"
 [5] ""                                                                
 [6] "00:00:14.080 --> 00:00:32.180"                                   
 [7] "<v Someone Else> then someone replied with this answer"          
 [8] ""                                                                
 [9] "Q2::>"                                                           
[10] "00:00:32.180 --> 00:00:48.010"                                   
[11] "<v Brendan Clarke> then there was another question"              
[12] ""                                                                
[13] "00:00:48.010 --> 00:00:58.010"                                   
[14] "<v Someone Else> and another tedious response"                   

Informal testing

  • imagine that you’ve written a (horrible) Teams transcript parser
  • how would you test this code to make sure it behaves itself?

An attempt at parsing

file <- here::here('r_training/data/teams_input.txt')

output <- readLines(file) |>
    as_tibble() |>
    filter(!value == "") |>                                 # drop blank lines
    filter(!value == "WEBVTT") |>                           # drop the format header
    mutate(question = str_extract(value, "^(Q.*?)::>$")) |> # pick out the question marker lines
    fill(question, .direction = 'down') |>                  # carry each question down over its replies
    filter(!str_detect(value,  "^(Q.*?)::>$")) |>           # then drop the marker lines themselves
    mutate(ind = rep(c(1, 2), length.out = n())) |>         # remaining lines alternate timestamp / speaker
    group_by(ind) |>
    mutate(id = row_number()) |>
    spread(ind, value) |>                                   # widen to one row per timestamp-speaker pair
    select(-id) |>
    separate("1", c("start_time", "end_time"), " --> ") |>  # split the timestamp column
    separate("2", c("name", "comment"), ">") |>             # split "<v Name> comment"
    mutate(source = str_remove_all(file, "\\.txt"),
           name = str_remove_all(name, "\\<v "), 
           comment = str_trim(comment), 
           question = str_remove_all(question, "::>")) 

Output

output |>
    knitr::kable()
|question |start_time   |end_time     |name           |comment                                       |source                                                                     |
|:--------|:------------|:------------|:--------------|:---------------------------------------------|:---------------------------------------------------------------------------|
|Q1       |00:00:00.000 |00:00:14.080 |Brendan Clarke |this was the first question in the transcript |/home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
|Q1       |00:00:14.080 |00:00:32.180 |Someone Else   |then someone replied with this answer         |/home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
|Q2       |00:00:32.180 |00:00:48.010 |Brendan Clarke |then there was another question               |/home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
|Q2       |00:00:48.010 |00:00:58.010 |Someone Else   |and another tedious response                  |/home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |

So, does the horrible code work??

  • as ever, it depends how low your standards are…

Informal testing

  • we could change the inputs, and look at the outputs
    • so twiddle our input file, and manually check the output (see the sketch after this list)
  • maybe we could also change the background conditions
    • change the R environment, or package versions, or whatever
  • but that gets tedious and erratic very quickly
  • and you need to document it somehow
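  • a sketch of such checks, reusing the output tibble built earlier (the expected values are just read off the output table above):
# run these by hand after each tweak to the input file, then eyeball the results
nrow(output)                              # expecting 4 rows, one per comment
names(output)                             # expecting the six columns shown in the output table
unique(output$question)                   # expecting "Q1" and "Q2"
all(output$start_time < output$end_time)  # each comment's timestamps should run forwards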

Unit testing

  • = small-scale, automated, standardised, self-documenting testing
  • test how the different sections of your code work separately - think functions or subroutines or modules
  • how? testthat

First steps with testthat

Hello world

test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})
Test passed 😸
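  • for contrast, a deliberately broken expectation (a sketch - testthat reports the mismatch between actual and expected instead of "Test passed"):
test_that("multiplication still works", {
  expect_equal(2 * 2, 5) # wrong on purpose, to see what a failure looks like
})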

Functions and testthat

  • testthat works best when you’re testing functions
  • functions in R are easy:
# the general pattern: named arguments (optionally with defaults) and a body in braces
function_name <- function(arg1 = default1, arg2 = default2){
     arg1 * arg2 # using our argument names
}

# or, for short functions, the same thing on one line
function_name <- function(arg1 = default1, arg2 = default2) arg1 * arg2

Transform your code into functions

multo <- function(n1, n2){
  n1 * n2
}

Test your function

  • multo(2,2) should equal 4, so we use:
# then run as a test

test_that("multo works with 2 and 2", {
    expect_equal(multo(2, 2), 4)
})
Test passed 🌈

Raise your expectations

  • we can add more expectations…
test_that("multo works in general", {
    expect_equal(multo(2, 2), 4)
    expect_identical(multo(2,0.01), 0.02)
    expect_type(multo(0,2), "double")
    expect_length(multo(9,2), 1)
    expect_gte(multo(4,4), 15)
})
Test passed 🥳

A gotcha: Equal and identical

3 - 2.9
[1] 0.1
3 - 2.9 == 0.1
[1] FALSE
  • happily, there are a couple of different ways of checking equality: the standard expect_equal compares numbers within a small tolerance, while the exacting expect_identical demands an exact match:
test_that("pedants corner", {
  expect_equal(multo(2, 0.01), 0.020000001) # more relaxed
  expect_identical(multo(2, 0.01), 0.02) # more precise
})
Test passed 🥇
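  • tying that back to 3 - 2.9: expect_equal tolerates the floating-point fuzz, expect_identical does not (a sketch - expect_failure wraps the strict check so the demonstration itself still passes):
test_that("floating point, revisited", {
  expect_equal(3 - 2.9, 0.1)                     # passes: within the default tolerance
  expect_failure(expect_identical(3 - 2.9, 0.1)) # the exact comparison fails
})
  • expect_equal also takes a tolerance argument if you want to tighten or loosen the comparison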

Beyond single values

  • if you want to work with vectors, there are a number of tools for checking their contents:
x <- rownames(as.matrix(eurodist, labels=TRUE)) # an odd built-in dataset of distances between European cities

test_that("check my vec", {
    expect_equal(x[1:2], c("Athens", "Barcelona"))
}) 
Test passed 😸

Beyond single values

  • you can get much fancier with a bit of set theory (not really set theory):
y <- x

test_that("check my vec sets", {
    expect_success(expect_setequal(x, y)) # all x in y
    expect_failure(expect_mapequal(x, y)) # same names, y is proper subset x) # all x in y)
    show_failure(expect_contains(x[1:19], y)) # y proper subset x)
    expect_success(expect_in(x, y)) # x proper subset y
})    
Failed expectation:
x[1:19] (`actual`) doesn't fully contain all the values in `y` (`expected`).
* Missing from `actual`: "Stockholm", "Vienna"
* Present in `actual`:   "Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen", "Geneva", "Gibraltar", ...

Test passed 🥇

Beyond single values again

y <- sample(x, length(x)-2)

test_that("check my vec sets", {
    expect_failure(expect_setequal(x, y)) # all x in y
    expect_failure(expect_mapequal(x, y)) # same names, y is proper subset x) # all x in y)
    expect_success(expect_contains(x, y)) # y proper subset x)
    expect_failure(expect_in(x, y)) # x is a proper subset y
})    
Test passed 🥇

Testing tibbles

  • because testthat’s comparisons are powered by waldo, you shouldn’t have to do anything fancy to test tibbles:
my_pengs <- penguins

test_that("penguin experiments", {
    expect_equal(my_pengs, penguins)
})
Test passed 😀
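  • and a quick sketch of what that buys you: any change to the copy (here, arbitrarily dropping some rows) makes the comparison fail, with waldo describing the difference:
test_that("waldo spots a change", {
  my_pengs_trimmed <- head(my_pengs, 100)                   # an arbitrary change, for illustration
  expect_failure(expect_equal(my_pengs_trimmed, penguins))  # no longer equal to the original
})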

Types and classes etc

typeof(penguins)
[1] "list"
class(penguins) 
[1] "tbl_df"     "tbl"        "data.frame"
is.object(names(penguins)) # vectors are base types
[1] FALSE
attr(names(penguins), "class") # base types have no class attribute
NULL
is.object(penguins) # this is some kind of object
[1] TRUE
attr(penguins, "class") # so it definitely does have a class
[1] "tbl_df"     "tbl"        "data.frame"

Testing tibbles

test_that("penguin types", {
    expect_type(penguins, "list")
    expect_s3_class(penguins, "tbl_df")
    expect_s3_class(penguins, "tbl")
    expect_s3_class(penguins, "data.frame")
    expect_type(penguins$island, "integer")
    expect_s3_class(penguins$island, "factor")
})
Test passed 🎊
  • there’s also an expect_s4_class for those with that high clear mercury sound ringing in their ears
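  • a minimal sketch, using a throwaway S4 class invented for illustration:
setClass("Counter", slots = c(n = "numeric")) # a made-up S4 class

test_that("S4 classes get their own expectation", {
  expect_s4_class(new("Counter", n = 0), "Counter")
})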

Last tip

  • you can put bare expectations in pipes if you’re looking for something specific
penguins |>
  expect_type("list") |>
  pull(island) |>
  expect_length(344)
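  • this works because each expectation invisibly returns the value it was given, so checks can sit in the middle of a pipeline without disturbing it - another sketch (the filter condition is arbitrary):
penguins |>
  expect_s3_class("tbl_df") |>  # fails loudly here if the data isn't a tibble
  filter(island == "Biscoe") |>
  nrow() |>
  expect_gt(0)                  # and here if the filter returns no rows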