Testing R code

R
intermediate
Published

August 7, 2024

Previous attendees have said…

  • 4 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 100% said that this session was pitched correctly

Three random comments from previous attendees
  • useful for ideas for implementing better testing in my code. As always, I usually pick up other things unrelated to the session that end up being more useful than the intended content. e.g. package installation/loading - I seem to be doing this all wrong!
  • very well pitched and comprehensive session
  • While intermediate, this was a great introduction to testing R code and providing plenty of things to think about and go try out

Welcome

  • this session is for 🌶🌶 intermediate users
  • you’ll need R + RStudio / Posit Workbench / posit.cloud to follow along

Session outline

  • introduction: why test?
  • informal testing
  • unit testing
    • introduction - why automate your tests?
    • testthat walkthrough

Introduction: why test?

  • code goes wrong
    • functions change
    • data sources change
    • usage changes
  • testing guards against the commonest problems
  • that makes for more reliable code
    • more reliable code opens the way to nicer development patterns

A note

  • most discussions about testing come about as part of package development, which we won’t cover here
  • we also won’t talk about debugging here (although do look out for the future training session on that)

Informal testing

  • a real-world example: Teams transcripts
  • Teams transcripts can be very useful data sources
  • but they’re absolutely horrible to work with:
WEBVTT
Q1::>
00:00:00.000 --> 00:00:14.080
<v Brendan Clarke> this was the first question in the transcript

00:00:14.080 --> 00:00:32.180
<v Someone Else> then someone replied with this answer

Q2::>
00:00:32.180 --> 00:00:48.010
<v Brendan Clarke> then there was another question

00:00:48.010 --> 00:00:58.010
<v Someone Else> and another tedious response
  • imagine that you’ve written a (horrible) Teams transcript parser
  • how would you test this code to make sure it behaves itself?
file <- "data/input.txt"

readLines(file) |>
    tibble::as_tibble() |>
    # drop blank lines and the WEBVTT header
    dplyr::filter(value != "") |>
    dplyr::filter(value != "WEBVTT") |>
    # find question markers (e.g. "Q1::>") and carry them down to the rows below
    dplyr::mutate(question = stringr::str_extract(value, "^(Q.*?)::>$")) |>
    tidyr::fill(question, .direction = 'down') |>
    dplyr::filter(!stringr::str_detect(value, "^(Q.*?)::>$")) |>
    # the remaining lines alternate: 1 = timestamps, 2 = speaker and comment
    dplyr::mutate(ind = rep(c(1, 2), length.out = dplyr::n())) |>
    dplyr::group_by(ind) |>
    dplyr::mutate(id = dplyr::row_number()) |>
    # put each timestamp/comment pair on a single row
    tidyr::spread(ind, value) |>
    dplyr::select(-id) |>
    tidyr::separate("1", c("start_time", "end_time"), " --> ") |>
    tidyr::separate("2", c("name", "comment"), ">") |>
    # tidy up the text columns
    dplyr::mutate(source = stringr::str_remove_all(file, "\\.txt"),
                  name = stringr::str_remove_all(name, "\\<v "),
                  comment = stringr::str_trim(comment),
                  question = stringr::str_remove_all(question, "::>")) |>
    knitr::kable()
|question |start_time   |end_time     |name           |comment                                       |source     |
|:--------|:------------|:------------|:--------------|:---------------------------------------------|:----------|
|Q1       |00:00:00.000 |00:00:14.080 |Brendan Clarke |this was the first question in the transcript |data/input |
|Q1       |00:00:14.080 |00:00:32.180 |Someone Else   |then someone replied with this answer         |data/input |
|Q2       |00:00:32.180 |00:00:48.010 |Brendan Clarke |then there was another question               |data/input |
|Q2       |00:00:48.010 |00:00:58.010 |Someone Else   |and another tedious response                  |data/input |
  • we could change the inputs, and look at the outputs
    • so twiddle our input file, and manually check the output
  • maybe we could also change the background conditions
    • change the R environment, or package versions, or whatever
  • but that gets tedious and erratic very quickly
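
As a rough sketch of what informal testing looks like in practice, the snippet below feeds a few hand-written lines to the question-marker pattern from the parser and prints the result for checking by eye. The sample lines and variable names are made up for illustration, and base R's grepl() stands in for stringr::str_detect() so that nothing extra needs installing.

```r
# hypothetical sample lines, loosely based on the transcript above
sample_lines <- c(
  "WEBVTT",
  "Q1::>",
  "00:00:00.000 --> 00:00:14.080",
  "<v Brendan Clarke> this was the first question in the transcript",
  "Q2::>"
)

# the same pattern the parser uses to spot question markers
question_pattern <- "^(Q.*?)::>$"

# base R stand-in for stringr::str_detect()
is_question <- grepl(question_pattern, sample_lines, perl = TRUE)

# eyeball the result: which lines were treated as question markers?
data.frame(line = sample_lines, is_question = is_question)
```

Every time the input or the pattern changes, the output has to be checked by hand again, which is the tedium that automated tests are meant to remove.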

testthat

  • unit testing = automated, standardised testing
  • the best place to start is with testthat:
library(testthat)

First steps with testthat

  • built for R package developers
  • but readily usable for non-package people
test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})
Test passed 🥇
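
For work outside a package, one possible pattern is to keep tests in their own script and run that script with test_file(). The sketch below writes a one-test script to a temporary file purely for illustration; the file location, the variable names, and the choice of the "silent" reporter are all assumptions, and in a real project the script would probably sit alongside your code.

```r
library(testthat)

# write a tiny test script to a temporary file (illustrative location only)
test_script <- tempfile(pattern = "test-", fileext = ".R")
writeLines(
  c(
    'test_that("multiplication works", {',
    '  expect_equal(2 * 2, 4)',
    '})'
  ),
  test_script
)

# run every test in the script; the "silent" reporter suppresses console output
results <- test_file(test_script, reporter = "silent")

# summarise the results: one row per test, with a count of failed expectations
results_df <- as.data.frame(results)
results_df[, c("test", "nb", "failed")]
```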

Functions and testthat

  • testthat works best when you’re testing functions
  • functions in R are easy:
function_name <- function(arg1 = default1, arg2 = default2){
     arg1 * arg2 # using our argument names
}

or include the body inline for simple functions:

function_name <- function(arg1 = default1, arg2 = default2) arg1 * arg2
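
To make the template concrete, here is a minimal sketch of a function with one required argument and one default. The name scale_by() and its arguments are hypothetical, chosen only for illustration.

```r
# a hypothetical function: x is required, factor defaults to 2
scale_by <- function(x, factor = 2) {
  x * factor
}

scale_by(3)               # uses the default factor
scale_by(3, factor = 10)  # overrides the default
```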

Transform your code into functions

multo <- function(n1, n2){
  n1 * n2
}

Test your function

  • then test. We think that multo(2,2) should equal 4, so we use:
    • test_that() to set up our test environment
    • expect_equal() inside the test environment to check for equality
# then run as a test

test_that("multo works with 2 and 2", {
    expect_equal(multo(2, 2), 4)
})
Test passed 🌈

Raise your expectations

  • we can add more expectations
test_that("multo works in general", {
    expect_equal(multo(2, 2), 4)
    expect_identical(multo(2, 0.01), 0.02)
    expect_type(multo(0, 2), "double")
    expect_length(multo(9, 2), 1)
    expect_gte(multo(4, 4), 15)
})
Test passed 😀
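
Expectations can also check that code fails when it should. As a small sketch, the test below uses expect_error() and expect_no_error(), which are not covered in the walkthrough above, to confirm that multo() refuses non-numeric input. The function is redefined here only so that the example runs on its own.

```r
library(testthat)

# same definition as earlier in the session
multo <- function(n1, n2) {
  n1 * n2
}

test_that("multo complains about non-numeric input", {
  # multiplying a character string raises an error in base R
  expect_error(multo("a", 2))
  # numeric input should run without raising an error
  expect_no_error(multo(2, 2))
})
```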

Equal and identical

3 - 2.9
[1] 0.1
3 - 2.9 == 0.1
[1] FALSE
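  • happily, expect_equal() is sufficiently sloppy: it ignores tiny floating-point differences, whereas expect_identical() demands an exact match:
test_that("pedants corner", {
  # close enough: the difference is well inside the default tolerance
  expect_equal(multo(2, 0.01), 0.0200000001)
  # exact match required
  expect_identical(multo(2, 0.01), 0.02)
})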
Test passed 🌈
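
Applying the same idea to the 3 - 2.9 example above gives the sketch below. It wraps expect_identical() in expect_failure() so the test still passes overall, and shows the base R functions all.equal() and identical(), which behave in a comparable way.

```r
library(testthat)

test_that("floating point comparisons", {
  # passes: the difference is far smaller than the default tolerance
  expect_equal(3 - 2.9, 0.1)
  # expect_identical() fails here, so we wrap it in expect_failure()
  expect_failure(expect_identical(3 - 2.9, 0.1))
})

# base R offers a similar tolerant / exact pair
isTRUE(all.equal(3 - 2.9, 0.1))  # tolerant comparison
identical(3 - 2.9, 0.1)          # exact comparison
```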

Testing several values

  • if you want to work with vectors, there are a number of tools for checking their contents:
x <- rownames(as.matrix(eurodist, labels=TRUE)) # odd built in dataset

test_that("check my vec", {
    expect_equal(x[1:2], c("Athens", "Barcelona"))
})    
Test passed 😀
  • you can get much more fancy with a bit of set theory (not really set theory):
y <- x

test_that("check my vec sets", {
    expect_success(expect_setequal(x, y)) # x and y hold the same values
    expect_failure(expect_mapequal(x, y)) # expect_mapequal() compares by name, and x and y are unnamed
    show_failure(expect_contains(x[1:19], y)) # x[1:19] is missing two of the values in y
    expect_success(expect_in(x, y)) # every value of x is in y
})
Failed expectation:
x[1:19] (`actual`) doesn't fully contain all the values in `y` (`expected`).
* Missing from `actual`: "Stockholm", "Vienna"
* Present in `actual`:   "Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen", "Geneva", "Gibraltar", ...

Test passed 🥳
y <- sample(x, length(x)-2)

test_that("check my vec sets", {
    expect_failure(expect_setequal(x, y)) # y is missing two of the values in x
    expect_failure(expect_mapequal(x, y)) # expect_mapequal() compares by name, and x and y are unnamed
    expect_success(expect_contains(x, y)) # y is a proper subset of x
    expect_failure(expect_in(x, y)) # not every value of x is in y
})
Test passed 😀
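
If the direction of expect_contains() and expect_in() is hard to remember, it may help to see roughly equivalent checks in base R. This is a sketch using a small made-up vector rather than the eurodist names; the variable names are hypothetical.

```r
# a small hypothetical vector and a proper subset of it
cities <- c("Athens", "Barcelona", "Brussels", "Calais")
some_cities <- cities[1:2]

setequal(cities, rev(cities))  # same values, order ignored
all(some_cities %in% cities)   # roughly what expect_contains(cities, some_cities) checks
all(cities %in% some_cities)   # roughly what expect_in(cities, some_cities) checks
setdiff(cities, some_cities)   # the values that make the previous check FALSE
```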

Testing tibbles

  • because most of the tests are powered by waldo, you shouldn’t have to do anything fancy to test on tibbles:
library(palmerpenguins)

my_pengs <- penguins

test_that("penguin experiments", {
    expect_equal(my_pengs, penguins)
})
Test passed 😸
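
The same comparison also catches small changes in tabular data. The sketch below uses the built-in mtcars data frame instead of penguins, so that it runs without extra packages, and deliberately corrupts one value; the variable names are hypothetical.

```r
library(testthat)

# mtcars is built in, so this sketch needs no extra data packages
my_cars <- mtcars
changed_cars <- mtcars
changed_cars$mpg[1] <- 0  # deliberately corrupt one value

test_that("data frame comparisons", {
  # an untouched copy is equal to the original
  expect_equal(my_cars, mtcars)
  # one altered cell is enough to make the comparison fail
  expect_failure(expect_equal(changed_cars, mtcars))
})
```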

Types and classes etc

typeof(penguins)
[1] "list"
class(penguins) 
[1] "tbl_df"     "tbl"        "data.frame"
is.object(names(penguins)) # vectors are base types
[1] FALSE
attr(names(penguins), "class") # base types have no class
NULL
is.object(penguins) # this is some kind of object
[1] TRUE
attr(penguins, "class") # so it definitely does have a class
[1] "tbl_df"     "tbl"        "data.frame"
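
Factors show the difference between type and class particularly clearly, because the storage type is not what it looks like. A minimal sketch with a small hypothetical factor:

```r
# a small hypothetical factor
island <- factor(c("Biscoe", "Dream", "Biscoe"))

typeof(island)     # "integer": the underlying storage
class(island)      # "factor": the S3 class layered on top
is.object(island)  # TRUE, because a class attribute is set
unclass(island)    # the integer codes, plus the levels attribute
```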

Tibble tests

test_that("penguin types", {
    expect_type(penguins, "list")
    expect_s3_class(penguins, "tbl_df")
    expect_s3_class(penguins, "tbl")
    expect_s3_class(penguins, "data.frame")
    expect_type(penguins$island, "integer")
    expect_s3_class(penguins$island, "factor")
})
Test passed 🥇
  • there’s also an expect_s4_class for those with that high clear mercury sound ringing in their ears

Last tip

  • you can put bare expectations in pipes if you’re looking for something specific
penguins |>
  expect_type("list") |>
  dplyr::pull(island) |>
  expect_length(344)
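
This works because a passing expectation returns its input invisibly, so the value carries on down the pipe. The sketch below shows the same pattern on the built-in mtcars data frame, keeping the final value; n_cyl is a hypothetical name.

```r
library(testthat)

# each passing expectation hands its input on to the next step
n_cyl <- mtcars |>
  expect_s3_class("data.frame") |>
  getElement("cyl") |>
  expect_type("double") |>
  length()

n_cyl  # the number of values in the cyl column
```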