Debugging R

R
beginner
troubleshooting
Published

September 5, 2024

Session materials

There’s also a practical exercise associated with this training session. It’s a compendium of different, common, R errors for you to troubleshoot. It’s not at all true-to-life, but is instead a ghastly scope-of-the-possible with something over forty distinct problems to fix.

No feedback found for this session

Debugging R

My sequence of R errors while writing this training session. These pages are published via a GitHub action which can be a bit complex to troubleshoot. I’m definitely still a beginner at that
  • R code often misbehaves
    • that’s true of code in general - it’s not specific to R
  • learning to manage errors and problems effectively is a long-term learning issue
    • most programmers, no matter their level of expertise, spent much of their time fixing problems
  • this session is designed to give you a basic toolkit of strategies for dealing with broken code

Session outline

  • Avoiding errors
  • Types of broken code
  • Pin-pointing problems
  • Isolating errors
  • Finding help
  • Fancier stuff
Note
  • most of the troubleshooting advice is fairly general, and should apply to R no matter how you’re running it
  • however, there are a few Rstudio / Posit-specific tips in the section on non-running code

Avoiding errors

The best kind of troubleshooting is no troubleshooting. You can avoid errors by:

  • planning carefully before writing proper code - I write the script in comments first, then translate into code. This has the useful side-effect of annotating the code for no extra effort
  • working in a logical sequence from start to finish
  • keeping code simple and legible. That means short scripts with source(), and using functions and functional programming to avoid repetitious code
  • avoiding creating loads of variables (do a lot of things directly if you don’t need the data for later), and careful variable naming
  • refactoring messy code at the time to keep it clean
  • keeping code formatted nicely

I see a lot of broken code, and the commonest cause by far is excessive script length and complexity. Learning to write functions is the single greatest thing you can do to make your code simpler and thus avoid bugs.

How is your code broken?

My code won’t run at all

This is the commonest, hard-to-solve error that most of us encounter. It usually means that R is having some unusual difficulty in recognizing the R code you’re written as being R code. It’s hard to solve because you don’t have a nice informative error message (or anything else) to look at. Things to check at this point are:

A list of generic R troubleshooting strategies
  1. make sure the last line of the console has a > rather than a +. If you see +, R is waiting for you to do something. Press Esc a couple of times, then try your code again
  2. try running some new code in the console (2 + 2 if more than enough)
  3. restart R (Ctrl + Shift + F10)
  4. terminate R (Session menu)
  5. close and reopen Rstudio (or hard-refresh Ctrl+F5 the page for Posit)
  6. make sure the code you’re trying to run is saved as an .R file
  7. if you’re on the desktop, make sure you have R installed on your machine
  8. try opening a different project and seeing if that code will run

I’m getting an error message

Most R error messages tell you what’s wrong by providing an error message in the console. In fact, many R functions will also tell you what’s gone right there too. The term of art here is that R code is generally surly - it complains a lot, even when things are going right. Here, we’re going to ignore those informative messages, and concentrate on proper errors that prevent your code from running properly.

My output is wonky

The opposite problem: your code seems to run perfectly, but you don’t get the output you’re expecting. Given that you don’t have an error message to work from, you’ll need to try and work backwards from the output you’re expecting. But a few examples of output that’s gone wrong:

  • numeric values are bigger/smaller than expected
  • text is incorrect, poorly-formatted, squishedtogether, or IN THE WRONG CASE
  • graphs miss values or mis-represent values
  • the possibilities are endless

In general, finding and fixing output problems is a manual business. You should expect to have output errors in new code, and it’s important to allow yourself time and space to both check for errors, and to resolve them.

Pin-pointing problems

To solve an error you’ll almost always need to refine your understanding of a) what’s gone wrong, and b) where it’s gone wrong.

Read the error message

Obvious but takes practice. Most R error messages give you clues as to how to find and fix the problem. The key here is that a) the message will contain the word Error and b) your code will stop running. Many errors from more recent packages have nice explanations and fixes. In general, the older the function, the more difficult the error. That wasn’t always the case, and some older R functions have horrid error messages. Here’s a short dictionary for some classics:

  • could not find function ‘thing’ = you’ve mis-spelled a function name or failed to load a package
  • there is no package called ‘glok’ = you’ve mis-spelled a package name - or you haven’t installed it
  • object 'mpg' not found = you haven’t created an object, or you’ve missed a pipe somewhere
  • non-numeric argument to binary operator = you’ve tried to do maths with words somewhere
  • longer object length is not a multiple of shorter object length or length of 'dimnames' [2] not equal to array extent = you’ve fallen into vector recycling rules
  • lots of weird errors about filter - make sure you’re using dplyr::filter specifically

Finding errors

Unlike many programming languages, R doesn’t give you a line number for errors. It is possible, but a bit messy, to add line numbers, but might be worth it if you’re coming from Python or similar. You can usually see the malfunctioning line of code immediately above the error message in the console.

If your error message isn’t that helpful, you can try running the traceback() command in the console, which will show you the series of steps that led to your last error. The trick here is to read bottom-to-top: the top of the traceback() output was the last function that ran before R fell over.

You can also, if you’re getting desperate, try some old-fashioned print debugging, by putting some print("some label") statements into your code. They’ll appear in the console, so you might be able to detect the location of your error if all else fails. But really, if you’re at this point you should probably move onto a more involved strategy…

Check the basic stuff

If that hasn’t worked, we’re into debugging proper. So, at this point, make sure that you’ve restart R to clean up your environment, run your code again, and then looked carefully at your output/error message to decide what’s wrong with it. At this point, I’d also try to write the problem down, which is an effective but irrational strategy that often seems to yield results.

Assuming that fails, we can run through a set of quick checks that are always worth trying:

A list of basic code errors
  1. check your brackets - Code > Rainbow parentheses (or rainbow brackets for those of us in British-inflected English) are fantastic for this
  2. look for easy transpositions like < / >, + / -
  3. check that your function names are correct, and you’re not accidentally using false-friends (class() and typeof() are really nice examples)
    1. a quick check of each function’s help at this point, just to check it really does what you think it does, is highly recommended
  4. ensure all your packages are installed and loaded. If in doubt, use namespacing package::function
  5. make sure that you’re actually looking at this code’s output, and haven’t accidentally calculated something without showing it…
result <- 1+5+8+9
print(results) # slightly different variable name

Isolating errors

If you can’t find your problem fairly easily, then you need to cut your code down to size to make your search easier.

Make a minimal example

The best advice for tricky problems is to make your problem as specific as problem. That means removing any irrelevant code. We’ll call this the minimal example. Assuming that you’ve already tried the sequence of basic troubleshooting steps above, you’d go about it as follows:

  1. switch off any unnecessary parts of your code to pin down where the error is happening
    1. you can do that by adding # or Ctrl+Shift+c
    2. or, probably cognitively better, copy the misbehaving code to a new script
  2. try to run that extracted piece of code (again, probably after restarting R)
    1. you’ll almost certainly need to load packages, and add other pieces of code to get it to run
    2. whatever you do, do the minimum. No loading all the usual packages, no copy-pasting great gobbets of code around the place, just try and keep the script as simple as possible
  3. when you’ve got the code to run, you’ll either:
    1. have the same error again, in which case you’re in a much better position to do the same thing again to localise the error within that new, simpler, script you’ve just built
    2. have fixed the error, in which case you now know what you need to look elsewhere in your original code for the source of the problem
    3. have given yourself a brand-new error to focus on. Lucky you! Definitely try to fix that new error before proceeding

If that doesn’t work, it’s time to look up the chain to the inputs of this piece of code. I’d work as follows:

  1. check for types - so if you’re expecting your code to add numbers together, ensure that your inputs are both numbers
  2. check that the variable names line up
  3. check that there’s no other code that could potentially alter any of your inputs
  4. finally, repeat the isolation exercise above for each input

Reprex

= reproducible example. See the documentation for the reprex package, especially the helpful set of do’s and don’t.

If the minimal example method above doesn’t find your problem, you’re probably going to need to ask for help. Luckily, the preparatory work you’ll need for that is itself directly helpful in solving your problem.

Generally, when people want to help you with your R code, the first thing they’ll want to do is to run it. A reprex is a way of packaging up everything your code needs so that anyone can copy and paste it. This is important, because a bare snippet of code doesn’t tell the full story. For example, say this bit of code isn’t working for me:

my_data |>
  filter(flavours = "chocolate") |>
  transmogrify() 

A potential ally and helper won’t be able to assist me with this as it stands. They won’t know about my data, they won’t know which packages I’m using, and (last line) they won’t even recognize the function name that I just made up. To save a lot of annoying back and forth, you should turn this into a minimal example (see above), and then use reprex to help your helpers. I need at least the following bits to make this code reproducible:

  • the data
  • an idea of which package the filter() comes from
  • a definition of transmogrify()

That breaks down as follows:

  • provide a short snippet of data
  • include any relevant package loading
  • include the function definition for any non-package functions
my_data <- dplyr::tibble(flavours = c("chocolate", "onion", "chocolate"),
                  scoop_size = c(100,120,100),
                  count = c(200,500,400)) # give a preview of the data in a copy/pastable format

transmogrify <- function(df){
  df |>
    dplyr::summarise(total_vol_l = sum(scoop_size * count)/1000) # add the function definition
}

my_data |> # I can now run this and find the error!
  dplyr::filter(flavours = "chocolate") |> # dplyr's filter
  transmogrify() 

This usually means that you’ve used = instead of ==.

Getting data together for a reprex is the most painful bit, particularly if you’re dealing with something real-world and lengthy. I like the datapasta package for this, as it has the useful function tribble_format that will take some data and turn it into a copy/pastable format that you can use in a reprex.

datapasta::tribble_format(head(mtcars, 4))

Finding help

You should definitely read the incredibly short (+ free) chapter in R for Data Science as a start. R is a widely-used language, which means that most errors will have cropped-up in some form before. In general, I’d suggest a thorough search for your term before asking for specific help. Some search tips:

  • look in the help before anything online
  • be specific - mention package/function names and error messages. Often including the whole error message in quotes can sharpen things up a bit
  • include “R” in your search string to ensure you’re getting relevant results
  • you might also use the “site: stackoverflow.com” to restrict searches to one of the most useful and relevant sources there is

Otherwise, assuming this fails, you should ask for help. I’d probably think about one of three venues for that:

  • if you’re reading this, you already know about the KIND network. We have a large R community - come and chat!
  • if you’re NHS, the NHS-R community is great and full of friendly experts. They have an active Slack channel
  • otherwise, the big player is stackoverflow. They can be fierce, but definitely some serious experts out there

Wherever you ask for help, please help yourself:

  • be sure you know about any local rules governing asking for help
  • search carefully to make sure you’re not the twentieth person this week asking the same question
  • a specific, but concise, question is the thing. Include the error message, give basic details about your setup (R 4.4.1, Rstudio, Windows, e.g) and include your reprex so that potential helpers don’t have to ask you for it

Fancier stuff

As you write more complex code, I would want to add three more sets of tools. The first of these is browser()/breakpoints. These are mainly used to understand what’s going on inside functions. I’d recommend the chapter in Advanced R as an entry point to these techniques.

I’d also want to start thinking more formally about methods and classes as this is a common cause of errors. The chapter on S3 in Advanced R is a good place to start, as is fooling around with the sloop package.

The final strategy is to formalize and automate your testing. Again, well beyond the scope of this beginner’s session, but if you’re interested in this powerful approach for more advanced applications you can review our unit testing materials for an accessible introduction. - unit testing