Testing R code

R
intermediate
Published

August 20, 2025

Previous attendees have said…

  • 5 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 100% said that this session was pitched correctly

Three random comments from previous attendees
  • useful for ideas for implementing better testing in my code. As always, I usually pick up other things unrelated to the session that end up being more useful than the intended content. e.g. package installation/loading - I seem to be doing this all wrong!
  • I think it was difficult for me to understand how to apply this to my code.
  • I thought that testing was something that I should be doing and was relieved to find out that I actually do test and I further learnt some really good approaches with testthat package that will really improve my code testing. Thanks

Session outline

  • introduction: why test?
  • informal testing
  • unit testing
    • introduction - why automate your tests?
    • testthat walkthrough

Packages
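The session's own package list isn't reproduced here. As an assumption, the examples below need at least testthat for the tests, the tidyverse (tibble, dplyr, tidyr and stringr) for the transcript parser, and here for building file paths:

```r
# Assumed package set for the examples (the session's own list isn't shown)
library(testthat)   # unit testing: test_that() and the expect_*() functions
library(tidyverse)  # tibble, dplyr, tidyr and stringr, used by the transcript parser
library(here)       # here::here() builds project-relative file paths
```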

Introduction: why test?

  • code goes wrong
    • functions change
    • data sources change
    • usage changes
  • testing guards against the commonest problems
  • that makes for more reliable code
    • more reliable code opens the way to nicer development patterns

A note

  • most discussions about testing come about as part of package development, which we won't cover here
  • we also won’t talk about debugging here (although do look out for the future training session on that)

Informal testing

  • a real-world example: Teams transcripts
  • Teams transcripts can be very useful data sources
  • but they’re absolutely horrible to work with

Example Teams input

 [1] "WEBVTT"                                                          
 [2] "Q1::>"                                                           
 [3] "00:00:00.000 --> 00:00:14.080"                                   
 [4] "<v Brendan Clarke> this was the first question in the transcript"
 [5] ""                                                                
 [6] "00:00:14.080 --> 00:00:32.180"                                   
 [7] "<v Someone Else> then someone replied with this answer"          
 [8] ""                                                                
 [9] "Q2::>"                                                           
[10] "00:00:32.180 --> 00:00:48.010"                                   
[11] "<v Brendan Clarke> then there was another question"              
[12] ""                                                                
[13] "00:00:48.010 --> 00:00:58.010"                                   
[14] "<v Someone Else> and another tedious response"                   

Informal testing

  • imagine that you’ve written a (horrible) Teams transcript parser
  • how would you test this code to make sure it behaves itself?

An attempt at parsing

library(tidyverse)  # tibble, dplyr, tidyr and stringr

file <- here::here('r_training/data/teams_input.txt')

output <- tibble(value = readLines(file)) |>             # one row per line of the file
    filter(value != "") |>                               # drop blank lines
    filter(value != "WEBVTT") |>                         # drop the header line
    mutate(question = str_extract(value, "^(Q.*?)::>$")) |>
    fill(question, .direction = 'down') |>               # label each row with its question
    filter(!str_detect(value,  "^(Q.*?)::>$")) |>        # then drop the question markers
    mutate(ind = rep(c(1, 2), length.out = n())) |>      # 1 = timestamp line, 2 = speech line
    group_by(ind) |>
    mutate(id = row_number()) |>                         # number timestamps and speech separately
    spread(ind, value) |>                                # so each timestamp pairs with its speech
    select(-id) |>
    separate("1", c("start_time", "end_time"), " --> ") |>
    separate("2", c("name", "comment"), ">") |>
    mutate(source = str_remove_all(file, "\\.txt"),
           name = str_remove_all(name, "\\<v "),
           comment = str_trim(comment),
           question = str_remove_all(question, "::>"))

Output

question  start_time    end_time      name            comment                                        source
Q1        00:00:00.000  00:00:14.080  Brendan Clarke  this was the first question in the transcript  C:/non-od/KIND-training/r_training/data/teams_input
Q1        00:00:14.080  00:00:32.180  Someone Else    then someone replied with this answer          C:/non-od/KIND-training/r_training/data/teams_input
Q2        00:00:32.180  00:00:48.010  Brendan Clarke  then there was another question                C:/non-od/KIND-training/r_training/data/teams_input
Q2        00:00:48.010  00:00:58.010  Someone Else    and another tedious response                   C:/non-od/KIND-training/r_training/data/teams_input

So, does the horrible code work??

  • as ever, it depends how low your standards are…

Informal testing

  • we could change the inputs, and look at the outputs
    • so twiddle our input file, and manually check the output
  • maybe we could also change the background conditions
    • change the R environment, or package versions, or whatever
  • but that gets tedious and erratic very quickly
  • and you need to document it somehow

Unit testing

  • = small-scale, automated, standardised, self-documenting testing
  • test how the different sections of your code work separately - think functions or subroutines or modules
  • how? testthat

First steps with testthat

Hello world
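A minimal test of this kind, sketched here rather than copied from the session: test_that() takes a short description plus a block of expectations, and prints a success message like the one below when they all pass (the emoji changes from run to run).

```r
library(testthat)

# A first test: a description, then a block containing one expectation
test_that("hello world", {
  greeting <- paste("hello", "world")
  expect_equal(greeting, "hello world")
})
```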

Test passed with 1 success 🎊.

Functions and testthat

  • testthat works best when you’re testing functions
  • functions in R are easy:
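As a sketch of the syntax: a function is created with function(), given named arguments, and assigned to a name like any other object. The multo() tested later in the session is named but its body isn't shown, so this definition assumes it simply multiplies its two arguments:

```r
# name <- function(arguments) { body }
# Assumed definition: multo() multiplies its two arguments
multo <- function(x, y) {
  x * y
}

multo(3, 4)
```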

Transform your code into functions
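One way to start is to pull a single step out of a larger script. As a hypothetical example (the name split_speaker_line() is made up for illustration), here is the speaker-line step of the Teams parser rewritten in base R as a small function:

```r
# Hypothetical helper: one step of the Teams parser pulled out into a function.
# It splits a speaker line such as "<v Name> words" into name and comment.
split_speaker_line <- function(line) {
  # keep only what sits between "<v " and the first ">"
  name <- sub("^<v ([^>]*)>.*$", "\\1", line)
  # drop the "<v Name>" tag, then trim the surrounding whitespace
  comment <- trimws(sub("^<v [^>]*>", "", line))
  c(name = name, comment = comment)
}

split_speaker_line("<v Brendan Clarke> this was the first question in the transcript")
```

Once a step lives in its own function, awkward inputs are easy to try in isolation, for example a comment that itself contains a ">" character, which the separate(..., ">") step in the parser would truncate.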

Test your function

  • multo(2,2) should equal 4, so we use:
    • test_that() to set up our test environment
    • expect_equal() inside the test environment to check for equality
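Put together, a test along those lines might look like this sketch. multo() is defined inline, under the same assumption that it multiplies its arguments, so the block runs on its own:

```r
library(testthat)

multo <- function(x, y) x * y  # assumed definition: multiply the two arguments

test_that("multo() multiplies two numbers", {
  expect_equal(multo(2, 2), 4)
})
```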
Test passed with 1 success 😸.

Raise your expectations

  • we can add more expectations…
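For instance, one test_that() block can hold several expectations covering different kinds of input. An illustrative sketch with five expectations, again assuming multo() multiplies its arguments:

```r
library(testthat)

multo <- function(x, y) x * y  # assumed definition: multiply the two arguments

test_that("multo() handles a range of inputs", {
  expect_equal(multo(2, 2), 4)
  expect_equal(multo(3, 4), 12)
  expect_equal(multo(0, 5), 0)        # zero
  expect_equal(multo(-2, 3), -6)      # negative numbers
  expect_type(multo(2, 2), "double")  # result is a double
})
```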
Test passed with 5 successes 🥳.

A gotcha: Equal and identical
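The code behind the output below isn't shown. One expression that gives the same pair of results is a subtraction whose answer can't be stored exactly in binary floating point:

```r
# Floating-point arithmetic can't represent these decimals exactly
x <- 0.3 - 0.2
x                      # prints 0.1 (R shows 7 significant digits by default)
x == 0.1               # FALSE: x is a tiny bit less than 0.1
print(x, digits = 17)  # shows the digits that the default print hides
```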

[1] 0.1
[1] FALSE
  • happily, there are a couple of different methods of checking equality. The standard expect_equal() allows a small numerical tolerance, whereas the exacting expect_identical() demands an exact match:
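A sketch of the difference using the same floating-point example. expect_failure() is testthat's helper for asserting that an expectation fails, used here so the block passes as a whole:

```r
library(testthat)

x <- 0.3 - 0.2

test_that("expect_equal() tolerates rounding error; expect_identical() doesn't", {
  expect_equal(x, 0.1)                      # passes: the difference is within tolerance
  expect_failure(expect_identical(x, 0.1))  # expect_identical() fails on the tiny difference
})
```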
Test passed with 2 successes 🌈.

Beyond single values

  • if you want to work with vectors, there are a number of tools for checking their contents:
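A small sketch with a hypothetical vector, using two of those tools, expect_length() and expect_type():

```r
library(testthat)

# hypothetical example vector
weekdays_short <- c("Mon", "Tue", "Wed", "Thu", "Fri")

test_that("weekday vector has the right size and type", {
  expect_length(weekdays_short, 5)
  expect_type(weekdays_short, "character")
})
```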
Test passed with 1 success 🥇.

Beyond single values

  • you can get much more fancy with a bit of set theory (not really set theory):
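Recent versions of testthat provide expect_contains() and expect_in() for this kind of check. A sketch with made-up vectors (the session's own city data isn't reproduced, so the failure shown next comes from different inputs):

```r
library(testthat)

# hypothetical vectors for illustration
all_days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
weekend  <- c("Sat", "Sun")

test_that("set-style checks on vectors", {
  expect_contains(all_days, weekend)  # all_days contains every value in weekend
  expect_in(weekend, all_days)        # every value in weekend appears in all_days
})
```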
Failed expectation:
Expected `x[1:19]` to contain all values in `y`.
Actual: "Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen", "Geneva", "Gibraltar", ...
Expected: "Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen", "Geneva", "Gibraltar", ...
Missing: "Stockholm", "Vienna"
Test passed with 2 successes 🥇.

Beyond single values again
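Two more order-insensitive tools are expect_setequal() for plain vectors and expect_mapequal() for named lists or vectors. An illustrative sketch with three expectations:

```r
library(testthat)

test_that("comparisons that ignore order", {
  # same values, in any order
  expect_setequal(c("b", "a", "c"), c("a", "b", "c"))
  # same names and values, in any order
  expect_mapequal(list(a = 1, b = 2), list(b = 2, a = 1))
  expect_length(unique(c("a", "a", "b")), 2)
})
```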

Test passed with 3 successes 🎉.

Testing tibbles

  • because expect_equal() and expect_identical() are powered by the waldo package (in testthat's third edition), you shouldn't have to do anything fancy to test tibbles:
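A sketch with hypothetical speaker counts. local_edition(3) is included because the waldo-based comparison belongs to testthat's third edition; packages usually switch that on in their DESCRIPTION file instead:

```r
library(testthat)
library(tibble)

# hypothetical data: how many lines each speaker contributed
speakers <- c("Brendan Clarke", "Someone Else", "Brendan Clarke", "Someone Else")
actual   <- tibble(name = sort(unique(speakers)), n = as.integer(table(speakers)))
expected <- tibble(name = c("Brendan Clarke", "Someone Else"), n = c(2L, 2L))

test_that("speaker counts match", {
  local_edition(3)  # third edition: expect_equal() compares using waldo
  expect_equal(actual, expected)
})
```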
Test passed with 1 success 🎊.

Types and classes etc

[1] "list"
[1] "tbl_df"     "tbl"        "data.frame"
[1] FALSE
NULL
[1] TRUE
[1] "tbl_df"     "tbl"        "data.frame"
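The calls behind that output aren't shown. A sketch of the same kind of inspection on a small example tibble, using base R's typeof(), class(), isS4() and inherits():

```r
library(tibble)

df <- tibble(x = 1:3, y = c("a", "b", "c"))

typeof(df)                  # "list": underneath, a tibble is a list of columns
class(df)                   # "tbl_df" "tbl" "data.frame"
isS4(df)                    # FALSE: tibbles use R's S3 class system
inherits(df, "data.frame")  # TRUE, so data-frame code still works on it
```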

Testing tibbles
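Those type and class facts translate directly into expectations. A sketch with six of them, using expect_type(), expect_s3_class() (which by default accepts any class in the object's class vector) and expect_named():

```r
library(testthat)
library(tibble)

df <- tibble(x = 1:3, y = c("a", "b", "c"))

test_that("df has the expected shape and classes", {
  expect_type(df, "list")
  expect_s3_class(df, "tbl_df")
  expect_s3_class(df, "data.frame")
  expect_false(isS4(df))
  expect_named(df, c("x", "y"))
  expect_equal(nrow(df), 3)
})
```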

Test passed with 6 successes 🥇.
  • there’s also an expect_s4_class for those with that high clear mercury sound ringing in their ears

Last tip

  • because expectations return their input invisibly, you can put bare expectations in pipes if you're looking for something specific
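A sketch using the built-in mtcars data. Outside test_that(), a bare expectation is silent when it passes and throws an error when it fails, so it can check an intermediate result partway along a pipeline:

```r
library(testthat)

n_four_cyl <- mtcars |>
  subset(cyl == 4) |>
  expect_s3_class("data.frame") |>  # still a data frame after filtering?
  nrow()

expect_equal(n_four_cyl, 11)  # bare expectation on the final result
```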