Scope of the possible with R

R
overview
Published

December 9, 2025

Welcome

  • this session is a non-technical overview designed for service leads

Session outline

  • Introducing R, and a bit of chat about the aims of this session
  • Practical demo - take some data, load, tidy, analyse, produce outputs
  • Strengths and weaknesses
    • obvious
    • less obvious
  • Alternatives
  • Skill development

Introducing R

  • free and open-source statistical programming language
  • multi-platform
  • large user base
  • prominent in health, industry, biosciences

Why this session?

  • R can be confusing
    • it’s code-based, and most of us don’t have much code experience
    • it’s used for some inherently complicated tasks
    • it’s a big product with lots of add-ons and oddities
  • But R is probably the best general-purpose toolbox we have for data work at present
    • big user base in health and social care
    • focus on health and care-like applications
    • not that hard to learn
    • extensible and flexible
    • capable of enterprise-y, fancy uses
  • yet there’s significant resistance to using R in parts of Scotland’s health and care sector

R demo

  • this is about showing what’s possible, and giving you a flavour of how R works
  • we won’t explain code in detail during this session
  • using live open data https://www.opendata.nhs.scot/dataset/weekly-accident-and-emergency-activity-and-waiting-times

Hello world!

There are lots of ways to run R. In this session, we’ll demonstrate using the RStudio Desktop IDE on Windows.

"Hello world!"

[1] "Hello world!"

Packages

R is highly extensible using packages: collections of useful code

library(readr) # plus, nice to see R's folksy daft names for things in action
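
There are two common ways to use a package once it is installed, and both appear later in this session. The sketch below uses `tools`, a package that ships with R, purely as a stand-in so that it runs without installing anything; the variable names are illustrative.

```r
# Installing is a one-off step per machine, so it is commented out here:
# install.packages("readr")

# Option 1: attach the package with library(), then call its functions by name
library(tools)
ext_attached <- file_ext("weekly_ae_activity_20251123.csv")

# Option 2: skip library() and name the package at the point of use
ext_namespaced <- tools::file_ext("weekly_ae_activity_20251123.csv")

# Both routes call the same function
identical(ext_attached, ext_namespaced)
```

The `package::function()` form is handy when you only need one or two functions from a package.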

Create variables

url <- "https://www.opendata.nhs.scot/dataset/0d57311a-db66-4eaa-bd6d-cc622b6cbdfa/resource/a5f7ca94-c810-41b5-a7c9-25c18d43e5a4/download/weekly_ae_activity_20251123.csv"

Load some open data

That’s a link to data about weekly A&E activity. It’s large-ish (approximately 40,000 rows).

ae_activity <- read_csv(url)

One small bit of cheating: renaming

names(ae_activity) <- c("date", "country", "hb", "loc", "type", "attend", "n", "n_in_4", "n_4", "perc_4", "n_8", "perc_8", "n_12", "perc_12")

Preview

Preview of data
date      country    hb         loc    type    attend     n     n_in_4  n_4  perc_4  n_8  perc_8  n_12  perc_12
20210815  S92000003  S08000032  L106H  Type 1  All        1270  891     379  70.2    47   3.7     5     0.4
20191201  S92000003  S08000031  G405H  Type 1  Unplanned  1939  1368    571  70.6    131  6.8     33    1.7
20171015  S92000003  S08000030  T202H  Type 1  All        440   436     4    99.1    0    0.0     0     0.0
20241013  S92000003  S08000020  N101H  Type 1  All        1066  474     592  44.5    251  23.5    87    8.2
20200412  S92000003  S08000017  Y146H  Type 1  All        311   300     11   96.5    1    0.3     0     0.0
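
Renaming by position relies on the columns arriving in the order the script expects. A minimal sketch of one way to guard against that, using a made-up three-column data frame (`toy`, its original column names, and its values are illustrative assumptions, not the real A&E file):

```r
# Toy data standing in for a freshly loaded file (invented names and values)
toy <- data.frame(
  Week = c(20240107, 20240114),
  Board = c("S08000020", "S08000031"),
  Attendances = c(1066, 579)
)

new_names <- c("date", "hb", "n")

# Stop early if the file has gained or lost columns since the script was written
stopifnot(length(new_names) == ncol(toy))

names(toy) <- new_names
names(toy)
```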

Removing data

library(dplyr)

ae_activity <- ae_activity |>
    select(!c(country, contains("perc_")))

Preview of data
date      hb         loc    type    attend     n     n_in_4  n_4  n_8  n_12
20240609  S08000031  C313H  Type 1  All        579   453     126  33   7
20190224  S08000032  L308H  Type 1  Unplanned  1424  1219    205  30   6
20240825  S08000020  N411H  Type 1  Unplanned  565   405     160  33   7
20150809  S08000031  G513H  Type 1  Unplanned  855   850     5    0    0
20180909  S08000020  N411H  Type 1  All        528   508     20   1    0
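
To see what that `select()` call does in isolation, here is a small sketch on a made-up tibble (the values are invented for illustration). The `!` negates the selection, so every column is kept except `country` and any column whose name contains `perc_`.

```r
library(dplyr)

# Invented data with a few of the same column names as the A&E data
toy <- tibble(
  date = c(20240107, 20240114),
  country = "S92000003",
  n = c(100, 120),
  n_4 = c(10, 30),
  perc_4 = c(90, 75)
)

toy_small <- toy |>
  select(!c(country, contains("perc_")))

names(toy_small)
```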

Tidying data

library(lubridate)

ae_activity <- ae_activity |>
    mutate(date = ymd(date))

Preview of data
date        hb         loc    type    attend       n     n_in_4  n_4  n_8  n_12
2016-09-18  S08000032  L308H  Type 1  All          1368  1270    98   8    0
2020-04-05  S08000026  Z102H  Type 1  Unplanned    61    57      4    0    0
2016-05-15  S08000016  B120H  Type 1  Unplanned    615   595     20   0    0
2021-12-19  S08000024  S308H  Type 1  All          1019  746     273  56   15
2025-05-04  S08000031  G405H  Type 1  New planned  31    22      9    4    1
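
The point of converting those numbers into proper dates is that R can then do date arithmetic with them. A short sketch using two of the values from the earlier preview (the variable names are illustrative):

```r
library(lubridate)

raw_dates <- c(20210815, 20191201)  # stored as plain numbers in the file
parsed <- ymd(raw_dates)            # read as year, month, day

class(parsed)

# Once parsed, dates can be rounded, sorted, and subtracted
floor_date(parsed, "month")
sort(parsed)
parsed[1] - parsed[2]
```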

Subset data

We’ll take a selection of 5 health boards to keep things tidy:

boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")

Joining data

Those board codes (like S08000020) aren’t very easy to read. Luckily, we can add the proper “NHS Thing & Thing” board names from another data source.

boards <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")

NHS boards
HB         HBName                         HBDateEnacted  HBDateArchived  Country
S08000026  NHS Shetland                   20140401       NA              S92000003
S08000015  NHS Ayrshire and Arran         20140401       NA              S92000003
S08000031  NHS Greater Glasgow and Clyde  20190401       NA              S92000003
S08000032  NHS Lanarkshire                20190401       NA              S92000003
S08000018  NHS Fife                       20140401       20180201        S92000003

We can do something very similar with the A&E locations:

locs <- read_csv("https://www.opendata.nhs.scot/dataset/a877470a-06a9-492f-b9e8-992f758894d0/resource/1a4e3f48-3d9b-4769-80e9-3ef6d27852fe/download/ae_hospital_site_list_09_09_2025.csv") |>
  select(2:4)

names(locs) <- c("loc_name", "loc", "postcode") # a bit of renaming to make the names easier

And we can use the postcodes to look up each site’s longitude and latitude:

locs <- locs |>
  rowwise() |>
  mutate(long = PostcodesioR::postcode_lookup(postcode)$longitude,
         lat = PostcodesioR::postcode_lookup(postcode)$latitude) # adding in some location information

We can then join our three datasets together to give us data with the NHS Board names, A&E names, and locations:

ae_activity_locs <- ae_activity |>
  filter(attend == "All") |>
  left_join(boards, by = join_by(hb == HB)) |>
  filter(HBName %in% boards_sample) |>
  select(date, HBName, loc, type, n, contains("n_")) |>
  mutate(date = ymd(date)) |>
  left_join(locs, by = join_by(loc)) # name the join column explicitly

Data with NHS board, location, and location names
date        HBName           loc    type    n     n_in_4  n_4  n_8  n_12  loc_name                            postcode  long       lat
2020-07-26  NHS Highland     H202H  Type 1  667   622     45   1    0     Raigmore Hospital                   IV2 3UJ   -4.192470  57.47381
2016-05-08  NHS Borders      B120H  Type 1  586   564     22   0    0     Borders General Hospital            TD6 9BS   -2.741945  55.59548
2021-01-10  NHS Grampian     N121H  Type 1  138   137     1    0    0     Royal Aberdeen Children’s Hospital  AB252ZG   -2.135214  57.15358
2019-12-15  NHS Lanarkshire  L302H  Type 1  1224  790     434  138  56    University Hospital Hairmyres       G75 8RG   -4.221963  55.75982
2023-05-28  NHS Grampian     N411H  Type 1  607   495     112  15   1     Dr Gray’s Hospital                  IV301SN   -3.329946  57.64538
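
One thing worth knowing about `left_join()` is that it keeps every row from the left-hand data, even when the lookup has no match. A minimal sketch with invented data (the code `S08000099` is made up to show an unmatched row):

```r
library(dplyr)

activity <- tibble(
  hb = c("S08000020", "S08000032", "S08000099"),  # last code is made up
  n = c(1066, 1270, 50)
)

lookup <- tibble(
  HB = c("S08000020", "S08000032"),
  HBName = c("NHS Grampian", "NHS Lanarkshire")
)

joined <- activity |>
  left_join(lookup, by = join_by(hb == HB))

joined

# Rows with no match in the lookup are kept, with NA for the board name
unmatched <- filter(joined, is.na(HBName))
unmatched
```

Checking for `NA` names after a join is a quick way to spot codes missing from the lookup.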

Basic plots

library(ggplot2)

ae_activity_locs |>
  filter(HBName == "NHS Highland") |>
  ggplot() +
  geom_line(aes(x = date, y = n, colour = loc_name))

Looking across different measures

library(tidyr)

ae_activity_locs |>
  filter(loc_name == "Raigmore Hospital") |>
  select(date, n_in_4:n_12) |>
  pivot_longer(-date) |>
  group_by(date = floor_date(date, "month"), name) |>
  summarise(value = mean(value)) |>
  ggplot(aes(x = date, y = value, colour = name)) +
  geom_line() +
  geom_smooth(se = FALSE)
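
The reshaping and monthly averaging in that pipeline can be hard to picture, so here is a sketch of just those steps on three invented weeks of data (the tibble and its values are illustrative assumptions):

```r
library(dplyr)
library(tidyr)
library(lubridate)

toy <- tibble(
  date = ymd(c("2024-01-07", "2024-01-14", "2024-02-04")),
  n_4 = c(10, 30, 20),
  n_8 = c(2, 4, 6)
)

monthly <- toy |>
  pivot_longer(-date) |>                               # one row per date and measure
  group_by(date = floor_date(date, "month"), name) |>  # round each date down to its month
  summarise(value = mean(value), .groups = "drop")     # average within month and measure

monthly
```

The two January weeks collapse into a single January row per measure, which is what makes the plotted lines smoother than the weekly data.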

Making that re-usable

graphmo <- function(hbname = "NHS Highland"){
  
  ae_activity_locs |>
    filter(HBName %in% hbname) |>
    ggplot() +
    geom_line(aes(x = date, y = n, colour = loc_name)) +
    theme(legend.position = "bottom") +
    xlab("Date") +
    labs(colour = "") # hide the label

}

graphmo(c("NHS Grampian", "NHS Fife"))
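
`graphmo()` reads `ae_activity_locs` from the workspace, so it only works after the earlier steps have been run. One possible variation, sketched here under the assumption that you would rather pass the data in explicitly, is a function that takes the data as its first argument. `graph_board()` and the toy data are hypothetical names and values, not part of the session code.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of graphmo() that takes its data as an argument
graph_board <- function(data, hbname = "NHS Highland") {
  data |>
    filter(HBName %in% hbname) |>
    ggplot() +
    geom_line(aes(x = date, y = n, colour = loc_name)) +
    theme(legend.position = "bottom") +
    labs(x = "Date", y = "Attendances", colour = NULL)
}

# Made-up data with the same column names as ae_activity_locs
toy <- tibble(
  date = as.Date(c("2024-01-07", "2024-01-14", "2024-01-07", "2024-01-14")),
  HBName = c("NHS Fife", "NHS Fife", "NHS Borders", "NHS Borders"),
  loc_name = c("Site A", "Site A", "Site B", "Site B"),
  n = c(100, 120, 60, 55)
)

p <- graph_board(toy, "NHS Fife")
p
```

Because the data is an argument, the same function can be pointed at a filtered or updated dataset without editing the function itself.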

A rubbish map

We’ve got latitude and longitude information for our A&E sites, which means we can plot them on a map:

ae_activity_locs |>
  ggplot(aes(x = long, y = lat, label = loc_name, colour = HBName)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none")

That’s not the most useful map I’ve ever seen. Luckily, there’s a package to help us:

Add to a map

ae_activity_locs |>
    leaflet::leaflet() |>
    leaflet::addTiles() |>
    leaflet::addMarkers(~long, ~lat, label = ~loc_name)

Simple map

Then make that map more useful

ae_activity_locs |>
    group_by(loc_name) |>
    summarise(n = round(mean(n)), long = min(long), lat = min(lat)) |> # mean, to match the "averages" label
    mutate(rate = paste(loc_name, "averages", n, "attendances per week")) |>
    leaflet::leaflet() |>
    leaflet::addTiles() |>
    leaflet::addMarkers(~long, ~lat, label = ~rate)

More useful map

Strengths

R offers enormous scope and flexibility, largely because of two features. First, R is built around packages: you add specialist functions to your R installation in a repeatable and standard way. There’s a package for almost everything, with more than 20,000 available at present. Second, R encourages reproducible analytics: you write your script once, then run it many times as your data changes, producing standardised outputs by design.

Together, that design makes R a force-multiplier for fancier data work: use packages to replicate your existing work in a reproducible way, then use the time saved in your routine reporting to improve and extend the work. There are other features of code-based analytics which make collaborating and developing more complex projects typically much smoother than they would be in non-code tools like Excel.
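
As a rough illustration of "write once, run many times", the sketch below defines one reporting step and runs it unchanged on two invented weeks of data. `weekly_report()`, the board labels, and the numbers are all made up for the example.

```r
library(dplyr)

# Hypothetical reporting step: the same code runs on whatever data it is given
weekly_report <- function(data) {
  data |>
    group_by(hb) |>
    summarise(
      total = sum(n),
      perc_in_4 = round(100 * sum(n_in_4) / sum(n), 1)  # % seen within 4 hours
    )
}

last_week <- tibble(hb = c("A", "A", "B"), n = c(100, 50, 80), n_in_4 = c(90, 30, 40))
this_week <- tibble(hb = c("A", "B"), n = c(120, 70), n_in_4 = c(60, 63))

report_last <- weekly_report(last_week)
report_this <- weekly_report(this_week)

report_last
report_this
```

Both reports have the same columns and layout, because they come from the same code rather than from manual steps.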

Weaknesses

  • it’s code, and it takes some time (months to years) to achieve real fluency
  • potentially harder to learn than some competitor languages and tools (Power BI, Python)
  • very patchy expertise across health and social care (H+SC) in Scotland
  • complex information governance (IG) landscape
  • messy skills development journey