Scope of the possible with R

R · overview

Published December 9, 2025


Welcome

  • this session is a non-technical overview designed for service leads

Session outline

  • Introducing R, and a bit of chat about the aims of this session
  • Practical demo - take some data, load, tidy, analyse, produce outputs
  • Strengths and weaknesses
    • obvious
    • less obvious
  • Alternatives
  • Skill development

Introducing R

  • free and open-source statistical programming language
  • multi-platform
  • large user base
  • prominent in health, industry, biosciences

Why this session?

  • R can be confusing
    • it’s code-based, and most of us don’t have much code experience
    • it’s used for some inherently complicated tasks
    • it’s a big product with lots of add-ons and oddities
  • But R is probably the best general-purpose toolbox we have for data work at present
    • big user base in health and social care
    • focus on health and care-like applications
    • not that hard to learn
    • extensible and flexible
    • capable of enterprise-y, fancy uses
  • yet there’s significant resistance to using R in parts of Scotland’s health and care sector

R demo

  • this is about showing what’s possible, and giving you a flavour of how R works
  • we won’t explain code in detail during this session
  • using live open data https://www.opendata.nhs.scot/dataset/weekly-accident-and-emergency-activity-and-waiting-times

Hello world!

There are lots of ways to run R. In this session, we’ll demonstrate using the RStudio Desktop IDE on Windows.

"Hello world!"

[1] "Hello world!"

Packages

R is highly extensible using packages: collections of useful code

library(readr) # plus, nice to see R's folksy daft names for things in action
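Loading with library() assumes the package is already installed. Installation is a separate, one-off step using install.packages(). A minimal sketch of the two-step pattern (the install line is commented out, and we load the base package tools so this runs on any R installation):

```r
# install.packages("readr")   # one-off download and install from CRAN

library(tools)  # base packages are loaded the same way, and need no install

# loadedNamespaces() lists every package currently loaded in this session
"tools" %in% loadedNamespaces()
```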

Create variables

url <- "https://www.opendata.nhs.scot/dataset/0d57311a-db66-4eaa-bd6d-cc622b6cbdfa/resource/a5f7ca94-c810-41b5-a7c9-25c18d43e5a4/download/weekly_ae_activity_20251123.csv"

Load some open data

That’s a link to data about weekly A&E activity. It’s large-ish (approximately 40,000 rows)

ae_activity <- read_csv(url)

One small bit of cheating: renaming

names(ae_activity) <- c("date", "country", "hb", "loc", "type", "attend", "n", "n_in_4", "n_4", "perc_4", "n_8", "perc_8", "n_12", "perc_12")

Preview

Preview of data
date country hb loc type attend n n_in_4 n_4 perc_4 n_8 perc_8 n_12 perc_12
20170430 S92000003 S08000032 L106H Type 1 Unplanned 1383 1266 117 91.5 1 0.1 0 0.0
20250914 S92000003 S08000031 C313H Type 1 New planned 1 1 0 100.0 0 0.0 0 0.0
20160313 S92000003 S08000025 R103H Type 1 Unplanned 102 102 0 100.0 0 0.0 0 0.0
20230903 S92000003 S08000028 W107H Type 1 All 137 134 3 97.8 0 0.0 0 0.0
20220828 S92000003 S08000017 Y144H Type 1 All 312 282 30 90.4 4 1.3 1 0.3

Removing data

library(dplyr)

ae_activity <- ae_activity |>
    select(!c(country, contains("perc_")))
Preview of data
date hb loc type attend n n_in_4 n_4 n_8 n_12
20160221 S08000019 V217H Type 1 All 1184 1112 72 1 0
20191117 S08000020 N101H Type 1 All 1295 1094 201 11 1
20190127 S08000024 S314H Type 1 Unplanned 2333 2037 296 21 4
20200621 S08000022 H202H Type 1 All 512 495 17 1 0
20220724 S08000020 N101H Type 1 All 1018 538 480 130 23
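The |> symbol in the code above is R’s native pipe: it feeds the result of the left-hand expression into the first argument of the right-hand call, so a pipeline reads top to bottom rather than inside-out. A minimal base-R illustration, using no packages:

```r
values <- c(3, 1, 2)

piped  <- values |> sort() |> rev()  # sort ascending, then reverse
nested <- rev(sort(values))          # the same computation written inside-out

identical(piped, nested)  # TRUE
```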

Tidying data

library(lubridate)

ae_activity <- ae_activity |>
    mutate(date = ymd(date))
Preview of data
date hb loc type attend n n_in_4 n_4 n_8 n_12
2018-04-08 S08000031 C313H Type 1 All 593 558 35 3 0
2015-09-06 S08000032 L302H Type 1 All 1226 1167 59 3 0
2018-06-24 S08000020 N121H Type 1 All 369 366 3 0 0
2024-06-23 S08000022 H103H Type 1 Unplanned 203 160 43 19 15
2019-12-08 S08000015 A210H Type 1 Unplanned 666 433 233 87 48
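ymd() from lubridate is doing the work here: it turns the raw yyyymmdd numbers (like 20180408) into proper Date objects, which is what later lets us plot and aggregate by time. For comparison, the equivalent conversion in base R:

```r
# Base-R equivalent of lubridate::ymd() for a numeric yyyymmdd value
raw_date <- 20180408
parsed   <- as.Date(as.character(raw_date), format = "%Y%m%d")

parsed        # 2018-04-08
class(parsed) # "Date"
```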

Subset data

We’ll take a selection of 5 health boards to keep things tidy:

boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")

Joining data

Those board codes (like S08000020) aren’t very easy to read. Luckily, we can add the proper “NHS Thing & Thing” board names from another data source.

boards <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")
NHS boards
HB HBName HBDateEnacted HBDateArchived Country
S08000029 NHS Fife 20180202 NA S92000003
S08000026 NHS Shetland 20140401 NA S92000003
S08000024 NHS Lothian 20140401 NA S92000003
S08000018 NHS Fife 20140401 20180201 S92000003
S08000022 NHS Highland 20140401 NA S92000003

We can do something very similar with the A&E locations:

locs <- read_csv("https://www.opendata.nhs.scot/dataset/a877470a-06a9-492f-b9e8-992f758894d0/resource/1a4e3f48-3d9b-4769-80e9-3ef6d27852fe/download/ae_hospital_site_list_09_09_2025.csv") |>
  select(2:4)

names(locs) <- c("loc_name", "loc", "postcode") # a bit of renaming to make the names easier

And we can add the postcodes:

locs <- locs |>
  rowwise() |>
  mutate(long = PostcodesioR::postcode_lookup(postcode)$longitude,
         lat = PostcodesioR::postcode_lookup(postcode)$latitude) # adding in some location information

We can then join our three datasets together to give us data with the NHS Board names, A&E names, and locations:

ae_activity_locs <- ae_activity |>
  filter(attend == "All") |>
  left_join(boards, by = join_by(hb == HB)) |>
  filter(HBName %in% boards_sample) |>
  select(date, HBName, loc, type, n, contains("n_")) |>
  left_join(locs) # date is already a Date from the tidying step, so no second ymd() is needed
Data with NHS board, location, and location names
date HBName loc type n n_in_4 n_4 n_8 n_12 loc_name postcode long lat
2020-02-16 NHS Highland C121H Type 1 102 99 3 0 0 Lorn & Islands Hospital PA344HH -5.474916 56.40040
2022-08-28 NHS Highland C121H Type 1 204 187 17 1 0 Lorn & Islands Hospital PA344HH -5.474916 56.40040
2021-02-28 NHS Lanarkshire L308H Type 1 1137 906 231 46 16 University Hospital Wishaw ML2 0DP -3.941738 55.77369
2021-03-07 NHS Lanarkshire L308H Type 1 1178 963 215 32 6 University Hospital Wishaw ML2 0DP -3.941738 55.77369
2020-03-22 NHS Grampian N411H Type 1 332 312 20 3 1 Dr Gray’s Hospital IV301SN -3.329946 57.64538
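left_join() keeps every row of the left-hand table and attaches matching columns from the right-hand one, matching on the stated key. Base R’s merge() with all.x = TRUE does the same job; here it is on a toy pair of tables (the third board code is made up to show what happens when a key has no match):

```r
# Toy versions of the activity and board-name tables
attendances <- data.frame(hb = c("S08000020", "S08000022", "S08000099"),
                          n  = c(1295, 512, 10))
board_names <- data.frame(hb   = c("S08000020", "S08000022"),
                          name = c("NHS Grampian", "NHS Highland"))

# A left join: every attendance row survives; unmatched keys get NA for name
merge(attendances, board_names, by = "hb", all.x = TRUE)
```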

Basic plots

library(ggplot2)

ae_activity_locs |>
  filter(HBName == "NHS Highland") |>
  ggplot() +
  geom_line(aes(x = date, y = n, colour = loc_name))

Looking across different measures

library(tidyr)

ae_activity_locs |>
  filter(loc_name == "Raigmore Hospital") |>
  select(date, n_in_4:n_12) |>
  pivot_longer(-date) |>
  group_by(date = floor_date(date, "month"), name) |>
  summarise(value = mean(value)) |>
  ggplot(aes(x = date, y = value, colour = name)) +
  geom_line() +
  geom_smooth(se = FALSE)
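pivot_longer() reshapes the four wide measure columns into long name/value pairs, which is what lets ggplot draw each measure as its own coloured line. Base R’s stack() does a simple version of the same reshape, shown here on two weeks of toy data:

```r
# Two weeks of toy data with two wide measure columns
wide <- data.frame(n_in_4 = c(1112, 1094),
                   n_4    = c(72, 201))

# stack() gathers the columns into value/name pairs ("values" and "ind")
long <- stack(wide)
long
nrow(long)  # 4 rows: 2 weeks x 2 measures
```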

Making that re-usable

graphmo <- function(hbname = "NHS Highland"){
  
  ae_activity_locs |>
    filter(HBName %in% hbname) |>
    ggplot() +
    geom_line(aes(x = date, y = n, colour = loc_name)) +
    theme(legend.position = "bottom") +
    xlab("Date") +
    labs(colour = "") # hide the label

}

graphmo(c("NHS Grampian", "NHS Fife"))
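Note the default argument in graphmo(): hbname = "NHS Highland" means that calling graphmo() with no arguments falls back to that value, while passing a vector of board names overrides it. The same pattern in miniature, with a made-up function:

```r
# A hypothetical function with a default argument, mirroring graphmo()
greet <- function(name = "world") {
  paste("Hello", name)
}

greet()           # uses the default: "Hello world"
greet("Scotland") # overrides it:     "Hello Scotland"
```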

A rubbish map

We’ve got latitude and longitude information for our A&E sites, which means we can plot them on a map:

ae_activity_locs |>
  ggplot(aes(x = long, y = lat, label = loc_name, colour = HBName)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none")

That’s not the most useful map I’ve ever seen. Luckily, there’s a package to help us:

Add to a map

ae_activity_locs |>
    leaflet::leaflet() |>
    leaflet::addTiles() |>
    leaflet::addMarkers(~long, ~lat, label = ~loc_name)

Simple map

Then make that map more useful

ae_activity_locs |>
    group_by(loc_name) |>
    summarise(n = sum(n), long = min(long), lat = min(lat)) |>
    mutate(label = paste(loc_name, "totals", n, "attendances")) |>
    leaflet::leaflet() |>
    leaflet::addTiles() |>
    leaflet::addMarkers(~long, ~lat, label = ~label)

More useful map

Strengths

R offers enormous scope and flexibility, largely because of two features. First, R is built around packages: you’re encouraged to outsource specialist functionality to your R installation in a repeatable, standard way, and there is a package for almost everything (over 20,000 on CRAN at present). Second, R encourages reproducible analytics: you write your script once, then run it many times as your data changes, producing standardised outputs by design.

Together, these features make R a force-multiplier for more ambitious data work: use packages to replicate your existing work reproducibly, then spend the time saved on routine reporting improving and extending that work. Other features of code-based analytics typically make collaboration and more complex projects much smoother than they would be in non-code tools like Excel.

Weaknesses

  • it’s code, and it takes some time (months to years) to achieve real fluency
  • potentially harder to learn than some competitor tools and languages (Power BI, Python)
  • very patchy expertise across health and social care in Scotland
  • complex information governance (IG) landscape
  • messy skills-development journey