Testing R code
R
intermediate
Previous attendees have said…
- 4 previous attendees have left feedback
- 100% would recommend this session to a colleague
- 100% said that this session was pitched correctly
Note: three random comments from previous attendees
- I think it was difficult for me to understand how to apply this to my code.
- very well pitched and comprehensive session
- While intermediate, this was a great introduction to testing R code and providing plenty of things to think about and go try out
Session outline
- introduction: why test?
- informal testing
- unit testing
- introduction: why automate your tests?
- testthat walkthrough
- packages
Introduction: why test?
- code goes wrong
- functions change
- data sources change
- usage changes
- testing guards against the commonest problems
- that makes for more reliable code
- more reliable code opens the way to nicer development patterns
A note
- most discussions about testing come about as part of package development
- we’ll avoid that area here, but please see the three excellent chapters in the R packages book for guidance
- we’ll also steer clear of Shiny/Rmarkdown/Quarto, as things can be a bit more tricky to test there
- we also won’t talk about debugging here (although do look out for the future training session on that)
Informal testing
- a real-world example: Teams transcripts
- Teams transcripts can be very useful data-sources
- but they’re absolutely horrible to work with
Example Teams input
readr::read_lines(here::here('r_training/data/teams_input.txt'))
[1] "WEBVTT"
[2] "Q1::>"
[3] "00:00:00.000 --> 00:00:14.080"
[4] "<v Brendan Clarke> this was the first question in the transcript"
[5] ""
[6] "00:00:14.080 --> 00:00:32.180"
[7] "<v Someone Else> then someone replied with this answer"
[8] ""
[9] "Q2::>"
[10] "00:00:32.180 --> 00:00:48.010"
[11] "<v Brendan Clarke> then there was another question"
[12] ""
[13] "00:00:48.010 --> 00:00:58.010"
[14] "<v Someone Else> and another tedious response"
Informal testing
- imagine that you’ve written a (horrible) Teams transcript parser
- how would you test this code to make sure it behaves itself?
An attempt at parsing
library(tidyverse) # dplyr, tidyr, and stringr functions are used below

file <- here::here('r_training/data/teams_input.txt')
output <- readLines(file) |>
as_tibble() |>
filter(!value == "") |>
filter(!value == "WEBVTT") |>
mutate(question = str_extract(value, "^(Q.*?)::>$")) |>
fill(question, .direction = 'down') |>
filter(!str_detect(value, "^(Q.*?)::>$")) |>
mutate(ind = rep(c(1, 2),length.out = n())) |>
group_by(ind) |>
mutate(id = row_number()) |>
spread(ind, value) |>
select(-id) |>
separate("1", c("start_time", "end_time"), " --> ") |>
separate("2", c("name", "comment"), ">") |>
mutate(source = str_remove_all(file, "\\.txt"),
name = str_remove_all(name, "\\<v "),
comment = str_trim(comment),
question = str_remove_all(question, "::>"))
Output
output |>
knitr::kable()
question | start_time | end_time | name | comment | source |
---|---|---|---|---|---|
Q1 | 00:00:00.000 | 00:00:14.080 | Brendan Clarke | this was the first question in the transcript | /home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
Q1 | 00:00:14.080 | 00:00:32.180 | Someone Else | then someone replied with this answer | /home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
Q2 | 00:00:32.180 | 00:00:48.010 | Brendan Clarke | then there was another question | /home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
Q2 | 00:00:48.010 | 00:00:58.010 | Someone Else | and another tedious response | /home/runner/work/KIND-training/KIND-training/r_training/data/teams_input |
So, does the horrible code work??
- as ever, it depends how low your standards are…
Informal testing
- we could change the inputs, and look at the outputs
- so twiddle our input file, and manually check the output
- maybe we could also change the background conditions
- change the R environment, or package versions, or whatever
- but that gets tedious and erratic very quickly
- and you need to document it somehow
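One small upgrade before going fully automated: script your manual checks rather than eyeballing the output, so at least they're documented and repeatable. A minimal sketch with toy transcript lines (not the real parser):

```r
# Scripted informal checks: stopifnot() dies loudly if a check fails,
# which beats silently re-eyeballing the output after every change
# (toy transcript lines for illustration)
lines <- c("Q1::>",
           "00:00:00.000 --> 00:00:14.080",
           "<v Brendan Clarke> this was the first question in the transcript")

question <- sub("::>$", "", lines[1])               # strip the "::>" marker
speaker <- sub("^<v ([^>]+)>.*$", "\\1", lines[3])  # pull out the speaker name

stopifnot(question == "Q1")
stopifnot(speaker == "Brendan Clarke")
```

This is still informal testing - the checks run only when you remember to run the script - but it is the conceptual stepping stone to unit testing below.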
Unit testing
- = small-scale, automated, standardised, self-documenting testing
- test how the different sections of your code work separately - think functions or subroutines or modules
- how?
testthat
First steps with testthat
- built for R package developers
- but readily usable for non-package people
- helpful man pages
- nice vignette
- a more ambitious guide to ad hoc testing with testthat
Hello world
test_that("multiplication works", {
expect_equal(2 * 2, 4)
})
Test passed 😸
Functions and testthat
- testthat works best when you're testing functions - and functions in R are easy:
# multi-line form (defaults here are just placeholder values)
function_name <- function(arg1 = 1, arg2 = 2){
  arg1 * arg2 # using our argument names
}

# equivalent one-line form
function_name <- function(arg1 = 1, arg2 = 2) arg1 * arg2
Transform your code into functions
multo <- function(n1, n2){
n1 * n2
}
Test your function
- multo(2, 2) should equal 4, so we use:
  - test_that() to set up our test environment
  - expect_equal() inside the test environment to check for equality

# then run as a test
test_that("multo works with 2 and 2", {
expect_equal(multo(2, 2), 4)
})
Test passed 🌈
Raise your expectations
- we can add more expectations…
test_that("multo works in general", {
expect_equal(multo(2, 2), 4)
expect_identical(multo(2,0.01), 0.02)
expect_type(multo(0,2), "double")
expect_length(multo(9,2), 1)
expect_gte(multo(4,4), 15)
})
Test passed 🥳
A gotcha: Equal and identical
- beware the floating point error
3 - 2.9
[1] 0.1
3 - 2.9 == 0.1
[1] FALSE
- happily, there are a couple of different methods of checking equality: the standard expect_equal is sloppier than the exacting expect_identical:
test_that("pedants corner", {
expect_equal(multo(2, 0.01), 0.020000001) # more relaxed
expect_identical(multo(2, 0.01), 0.02) # more precise
})
Test passed 🥇
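The relaxed comparison is tolerance-based: base R's all.equal() behaves similarly, and expect_equal() also accepts an explicit tolerance argument. A quick base-R sketch of the three levels of strictness, runnable without testthat:

```r
# Exact vs tolerant comparison of floating point numbers
3 - 2.9 == 0.1                  # FALSE: the binary representations differ
isTRUE(all.equal(3 - 2.9, 0.1)) # TRUE: equal within numerical tolerance
identical(3 - 2.9, 0.1)         # FALSE: strict, like expect_identical
```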
Beyond single values
- if you want to work with vectors, there are a number of tools for checking their contents:
x <- rownames(as.matrix(eurodist, labels=TRUE)) # odd built in dataset
test_that("check my vec", {
expect_equal(x[1:2], c("Athens", "Barcelona"))
})
Test passed 😸
Beyond single values
- you can get much more fancy with a bit of set theory (not really set theory):
y <- x
test_that("check my vec sets", {
expect_success(expect_setequal(x, y)) # same elements, ignoring order and duplicates
expect_failure(expect_mapequal(x, y)) # same name-value pairs (our vectors are unnamed)
show_failure(expect_contains(x[1:19], y)) # does x[1:19] contain every element of y?
expect_success(expect_in(x, y)) # is every element of x in y?
})
Failed expectation:
x[1:19] (`actual`) doesn't fully contain all the values in `y` (`expected`).
* Missing from `actual`: "Stockholm", "Vienna"
* Present in `actual`: "Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen", "Geneva", "Gibraltar", ...
Test passed 🥇
Beyond single values again
y <- sample(x, length(x) - 2)
test_that("check my vec sets", {
expect_failure(expect_setequal(x, y)) # y is missing two elements of x
expect_failure(expect_mapequal(x, y)) # same name-value pairs (our vectors are unnamed)
expect_success(expect_contains(x, y)) # x contains every element of y
expect_failure(expect_in(x, y)) # not every element of x is in y
})
Test passed 🥇
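If the set-style expectation names are hard to keep straight, they map onto familiar base-R idioms - a sketch with toy vectors:

```r
# Base-R equivalents of the set-style expectations (toy data)
x <- c("Athens", "Barcelona", "Brussels")
y <- c("Barcelona", "Athens")

setequal(x, y) # FALSE - so expect_setequal(x, y) would fail here
all(y %in% x)  # TRUE  - expect_contains(x, y): x contains every element of y
all(x %in% y)  # FALSE - expect_in(x, y): not every element of x is in y
```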
Testing tibbles
- because most of the tests are powered by waldo, you shouldn't have to do anything fancy to test on tibbles:
my_pengs <- penguins # penguins ships with R >= 4.5 (or via the palmerpenguins package)
test_that("penguin experiments", {
expect_equal(my_pengs, penguins)
})
Test passed 😀
Types and classes etc
- one massive corollary to that: if you don’t do a lot of base-R, expect a fiercely stringent test of your understanding of types and classes.
Testing tibbles
test_that("penguin types", {
expect_type(penguins, "list")
expect_s3_class(penguins, "tbl_df")
expect_s3_class(penguins, "tbl")
expect_s3_class(penguins, "data.frame")
expect_type(penguins$island, "integer")
expect_s3_class(penguins$island, "factor")
})
Test passed 🎊
- there’s also an expect_s4_class for those with that high clear mercury sound ringing in their ears
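The type/class distinction those tests probe is plain base R: typeof() reports the underlying storage, class() the S3 class. A sketch with an ordinary data.frame, so it runs without any packages:

```r
# typeof() = low-level storage type; class() = S3 class(es) layered on top
df <- data.frame(x = 1:3, f = factor(c("a", "b", "a")))

typeof(df)   # "list": data frames are lists of columns underneath
class(df)    # "data.frame"
typeof(df$f) # "integer": factors store integer codes...
class(df$f)  # "factor": ...with a factor class on top
```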
Last tip
- you can put bare expectations in pipes if you’re looking for something specific
penguins |>
expect_type("list") |>
pull(island) |>
expect_length(344)