Scope of the possible with R
Welcome
- this session is a non-technical overview designed for service leads
Session outline
- Introducing R, and a bit of chat about the aims of this session
- Practical demo - take some data, load, tidy, analyse, produce outputs
- Strengths and weaknesses
  - obvious
  - less obvious
- Alternatives
- Skill development
Introducing R
- free and open-source statistical programming language
- multi-platform
- large user base
- prominent in health, industry, biosciences
Why this session?
- R can be confusing
  - it’s code-based, and most of us don’t have much code experience
  - it’s used for some inherently complicated tasks
  - it’s a big product with lots of add-ons and oddities
- But R is probably the best general-purpose toolbox we have for data work at present
  - big user base in health and social care
  - focus on health and care-like applications
  - not that hard to learn
  - extensible and flexible
  - capable of enterprise-y, fancy uses
- yet there’s significant resistance to using R in parts of Scotland’s health and care sector
R demo
- this is about showing what’s possible, and giving you a flavour of how R works
- we won’t explain code in detail during this session
- using live open data
Hello world!
There are lots of ways to run R. In this session, we’ll demonstrate using the RStudio Desktop IDE on Windows.
"Hello world!"
[1] "Hello world!"
Packages
R is highly extensible using packages: collections of useful code
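As a rough sketch, these are the packages the rest of this demo appears to rely on, judging by the functions used later (`read_csv()`, `filter()`, `pivot_longer()`, `floor_date()`, `ggplot()`). The exact list loaded in the live session is an assumption.

```r
# One-off install from CRAN (commented out so the script is safe to re-run)
# install.packages(c("readr", "dplyr", "tidyr", "lubridate", "ggplot2", "leaflet"))

# Load packages at the top of each script
library(readr)     # read_csv()
library(dplyr)     # filter(), select(), joins, summarise()
library(tidyr)     # pivot_longer()
library(lubridate) # floor_date() and other date helpers
library(ggplot2)   # plots

# A package can also be used without loading it, via package::function(),
# which is how the leaflet map later in this session is written.
```

Installing is done once per machine; loading with `library()` is done once per script or session.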
Create variables
url <- "https://www.opendata.nhs.scot/dataset/0d57311a-db66-4eaa-bd6d-cc622b6cbdfa/resource/a5f7ca94-c810-41b5-a7c9-25c18d43e5a4/download/weekly_ae_activity_20251123.csv"
Load some open data
That’s a link to data about weekly A&E activity. It’s large-ish (approximately 40000 rows).
ae_activity <- read_csv(url)
One small bit of cheating: renaming
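The renaming step could look something like the sketch below. The short names (`date`, `hb`, `loc`, `n`) match the preview table, but the long original column names used here are assumptions about the raw file, and only four columns are shown to keep it brief.

```r
library(dplyr)

# Stand-in for the downloaded data: one row, with ASSUMED original column names
raw <- data.frame(
  WeekEndingDate = 20171210,
  HBT = "S08000022",
  TreatmentLocation = "H202H",
  NumberOfAttendances = 600
)

# rename() takes new_name = old_name pairs
ae_activity <- raw |>
  rename(
    date = WeekEndingDate,
    hb = HBT,
    loc = TreatmentLocation,
    n = NumberOfAttendances
  )

names(ae_activity)
```

Short, consistent names make the later steps much easier to read and type.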
Preview
| date | country | hb | loc | type | attend | n | n_in_4 | n_4 | perc_4 | n_8 | perc_8 | n_12 | perc_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20171210 | S92000003 | S08000022 | H202H | Type 1 | All | 600 | 570 | 30 | 95.0 | 9 | 1.5 | 0 | 0.0 |
| 20150802 | S92000003 | S08000022 | H202H | Type 1 | All | 646 | 616 | 30 | 95.4 | 1 | 0.2 | 0 | 0.0 |
| 20210207 | S92000003 | S08000015 | A210H | Type 1 | All | 430 | 367 | 63 | 85.3 | 19 | 4.4 | 6 | 1.4 |
| 20210207 | S92000003 | S08000030 | T101H | Type 1 | New planned | 26 | 26 | 0 | 100.0 | 0 | 0.0 | 0 | 0.0 |
| 20220626 | S92000003 | S08000031 | G405H | Type 1 | Unplanned | 1809 | 756 | 1053 | 41.8 | 302 | 16.7 | 71 | 3.9 |
Removing data
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 20180916 | S08000020 | N411H | Type 1 | Unplanned | 479 | 463 | 16 | 0 | 0 |
| 20250316 | S08000032 | L302H | Type 1 | New planned | 1 | 1 | 0 | 0 | 0 |
| 20230219 | S08000024 | S314H | Type 1 | All | 2246 | 914 | 1332 | 699 | 411 |
| 20250302 | S08000020 | N121H | Type 1 | All | 394 | 352 | 42 | 0 | 0 |
| 20170423 | S08000032 | L106H | Type 1 | All | 1351 | 1288 | 63 | 2 | 0 |
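Compared with the preview, this table has lost the `country` column and the three percentage columns. The session's own code isn't shown, so the following is only a minimal sketch of one way to drop them with dplyr, using a single row from the preview as stand-in data.

```r
library(dplyr)

# One row copied from the preview table above
ae_activity <- data.frame(
  date = 20171210, country = "S92000003", hb = "S08000022", loc = "H202H",
  type = "Type 1", attend = "All", n = 600, n_in_4 = 570, n_4 = 30,
  perc_4 = 95.0, n_8 = 9, perc_8 = 1.5, n_12 = 0, perc_12 = 0.0
)

# Drop the country code and every column whose name starts with "perc"
ae_activity <- ae_activity |>
  select(-country, -starts_with("perc"))

names(ae_activity)
```

The percentages can always be recalculated from the counts, so nothing is lost by removing them.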
Tidying data
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 2018-04-08 | S08000029 | F704H | Type 1 | All | 1226 | 1185 | 41 | 0 | 0 |
| 2023-01-08 | S08000022 | H212H | Type 1 | New planned | 3 | 3 | 0 | 0 | 0 |
| 2022-12-11 | S08000030 | T101H | Type 1 | Unplanned | 1219 | 926 | 293 | 32 | 3 |
| 2019-05-12 | S08000026 | Z102H | Type 1 | All | 140 | 136 | 4 | 0 | 0 |
| 2017-08-06 | S08000024 | S314H | Type 1 | All | 2240 | 2114 | 126 | 28 | 3 |
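The visible change in this table is that `date` is now a proper date (2018-04-08) rather than a number (20180408). One way to do that, assuming lubridate's `ymd()` is used, is sketched below with two values taken from the preview table.

```r
library(dplyr)
library(lubridate)

# Dates arrive as plain numbers in year-month-day order
ae_activity <- data.frame(
  date = c(20171210, 20150802),
  n = c(600, 646)
)

# ymd() parses them into real Date values that R can sort, filter and plot
ae_activity <- ae_activity |>
  mutate(date = ymd(date))

ae_activity
```

Once the column is a real date, functions such as `floor_date()` can group it by month or year.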
Subset data
We’ll take a selection of 5 health boards to keep things tidy:
boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")
Joining data
Those board codes (like S08000020) aren’t very easy to read. Luckily, we can add the proper “NHS Thing & Thing” board names from another data source.
boards <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")
| HB | HBName | HBDateEnacted | HBDateArchived | Country |
|---|---|---|---|---|
| S08000015 | NHS Ayrshire and Arran | 20140401 | NA | S92000003 |
| S08000027 | NHS Tayside | 20140401 | 20180201 | S92000003 |
| S08000030 | NHS Tayside | 20180202 | NA | S92000003 |
| S08000017 | NHS Dumfries and Galloway | 20140401 | NA | S92000003 |
| S08000028 | NHS Western Isles | 20140401 | NA | S92000003 |
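A minimal sketch of the join and subset, using a few rows copied from the tables in this session rather than the full downloads. It assumes a left join on the board code (`hb` in the activity data, `HB` in the lookup), followed by a filter on `boards_sample`; the result name `ae_named` is hypothetical.

```r
library(dplyr)

# Stand-in activity data (codes and counts taken from the tables above)
ae_activity <- data.frame(
  hb  = c("S08000022", "S08000032", "S08000015"),
  loc = c("H202H", "L106H", "A210H"),
  n   = c(600, 1351, 430)
)

# Stand-in board lookup
boards <- data.frame(
  HB     = c("S08000022", "S08000032", "S08000015"),
  HBName = c("NHS Highland", "NHS Lanarkshire", "NHS Ayrshire and Arran")
)

boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")

ae_named <- ae_activity |>
  left_join(boards, by = c("hb" = "HB")) |> # add HBName by matching codes
  filter(HBName %in% boards_sample) |>      # keep only the sampled boards
  select(HBName, loc, n)

ae_named
```

A left join keeps every row of the activity data and adds the matching board name, so unmatched codes show up as `NA` rather than silently disappearing.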
We can do something very similar with the A&E locations:
And we can add the postcodes:
We can then join our three datasets together to give us data with the NHS Board names, A&E names, and locations:
| date | HBName | loc | type | n | n_in_4 | n_4 | n_8 | n_12 | loc_name | postcode | long | lat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-09-17 | NHS Lanarkshire | L106H | Type 1 | 1374 | 916 | 458 | 90 | 18 | University Hospital Monklands | ML6 0JS | -3.999588 | 55.86588 |
| 2025-05-11 | NHS Borders | B120H | Type 1 | 663 | 472 | 191 | 40 | 15 | Borders General Hospital | TD6 9BS | -2.741945 | 55.59548 |
| 2018-05-20 | NHS Highland | H202H | Type 1 | 692 | 663 | 29 | 1 | 0 | Raigmore Hospital | IV2 3UJ | -4.192470 | 57.47381 |
| 2015-05-31 | NHS Highland | H212H | Type 1 | 198 | 192 | 6 | 1 | 0 | Belford Hospital | PH336BS | -5.104701 | 56.81941 |
| 2017-06-25 | NHS Fife | F704H | Type 1 | 1308 | 1245 | 63 | 2 | 0 | Victoria Hospital (NHS Fife) | KY2 5AH | -3.160138 | 56.12511 |
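The three-way join can be sketched as two left joins chained together. The lookup tables `locations` and `postcodes`, and their column names, are hypothetical stand-ins here; the values are copied from the table above.

```r
library(dplyr)

# Stand-in activity data
ae_activity <- data.frame(
  loc = c("H202H", "B120H"),
  n   = c(692, 663)
)

# Hypothetical lookup: A&E site code -> site name and postcode
locations <- data.frame(
  loc      = c("H202H", "B120H"),
  loc_name = c("Raigmore Hospital", "Borders General Hospital"),
  postcode = c("IV2 3UJ", "TD6 9BS")
)

# Hypothetical lookup: postcode -> coordinates
postcodes <- data.frame(
  postcode = c("IV2 3UJ", "TD6 9BS"),
  long     = c(-4.192470, -2.741945),
  lat      = c(57.47381, 55.59548)
)

ae_activity_locs <- ae_activity |>
  left_join(locations, by = "loc") |>      # add names and postcodes
  left_join(postcodes, by = "postcode")    # add longitude and latitude

ae_activity_locs
```

Each join adds columns from one lookup, so the chain reads top to bottom in the same order as the explanation.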
Basic plots

Looking across different measures
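library(dplyr)     # filter(), select(), group_by(), summarise()
library(tidyr)     # pivot_longer()
library(lubridate) # floor_date()
library(ggplot2)   # ggplot(), geom_line(), geom_smooth()
ae_activity_locs |>
  filter(loc_name == "Raigmore Hospital") |>
  select(date, n_in_4:n_12) |>
  pivot_longer(-date) |> # one row per date and measure
  group_by(date = floor_date(date, "month"), name) |>
  summarise(value = mean(value)) |> # monthly mean of each measure
  ggplot(aes(x = date, y = value, colour = name)) +
  geom_line() +
  geom_smooth(se = FALSE)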
Making that re-usable
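One common way to make the plot above re-usable is to wrap it in a function that takes the site name as an argument. This is a sketch only: the function names `site_monthly()` and `plot_site()` are hypothetical, the smoothing line is left out because the stand-in data has only three rows, and those rows are the Raigmore Hospital rows from the tables earlier in this session.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical helper: monthly mean of each measure for one site
site_monthly <- function(data, site) {
  data |>
    filter(loc_name == site) |>
    select(date, n_in_4:n_12) |>
    pivot_longer(-date) |>
    group_by(date = floor_date(date, "month"), name) |>
    summarise(value = mean(value), .groups = "drop")
}

# Hypothetical helper: the same plot as before, for any site
plot_site <- function(data, site) {
  site_monthly(data, site) |>
    ggplot(aes(x = date, y = value, colour = name)) +
    geom_line() +
    labs(title = site)
}

# Stand-in data: three Raigmore Hospital rows from earlier tables
ae_activity_locs <- data.frame(
  date     = as.Date(c("2017-12-10", "2015-08-02", "2018-05-20")),
  loc_name = "Raigmore Hospital",
  n_in_4   = c(570, 616, 663),
  n_4      = c(30, 30, 29),
  n_8      = c(9, 1, 1),
  n_12     = c(0, 0, 0)
)

monthly <- site_monthly(ae_activity_locs, "Raigmore Hospital")
p <- plot_site(ae_activity_locs, "Raigmore Hospital")
```

With the real data, the same call works for any site: change the name in quotes and the whole chart is rebuilt.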

A rubbish map
We’ve got latitude and longitude information for our A&E sites, which means we can plot them on a map:
ae_activity_locs |>
  ggplot(aes(x = long, y = lat, label = loc_name, colour = HBName)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none")
That’s not the most useful map I’ve ever seen. Luckily, there’s a package to help us:
Add to a map
ae_activity_locs |>
  leaflet::leaflet() |>
  leaflet::addTiles() |>
  leaflet::addMarkers(~long, ~lat, label = ~loc_name)
Then make that map more useful
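What the session did at this step isn't recorded here. One possible improvement, sketched below under that assumption, is to reduce the data to one row per site before mapping (the joined data has one row per site per week, so otherwise every marker is drawn many times), and to add the board name as a popup. The stand-in rows use values from the joined table above.

```r
library(dplyr)
library(leaflet)

# Stand-in data: Raigmore appears twice, as it would for two different weeks
ae_activity_locs <- data.frame(
  loc_name = c("Raigmore Hospital", "Raigmore Hospital", "Borders General Hospital"),
  HBName   = c("NHS Highland", "NHS Highland", "NHS Borders"),
  long     = c(-4.192470, -4.192470, -2.741945),
  lat      = c(57.47381, 57.47381, 55.59548)
)

# One row per site, so each marker is drawn once
sites <- ae_activity_locs |>
  distinct(loc_name, HBName, long, lat)

ae_map <- sites |>
  leaflet() |>
  addTiles() |>
  addMarkers(~long, ~lat, label = ~loc_name, popup = ~HBName)
```

Printing `ae_map` in RStudio opens the interactive map in the Viewer pane.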

Strengths
R offers enormous scope and flexibility, largely because of two features. First, R is based on the idea of packages, where you’re encouraged to add specialist functions to your R installation in a repeatable and standard way. There’s basically a package for everything - over 20000 at present. Second, R encourages reproducible analytics: you write your script once, and then run it many times as your data changes, producing standardised outputs by design.
Together, these features make R a force-multiplier for fancier data work: use packages to replicate your existing work in a reproducible way, then use the time saved on routine reporting to improve and extend the work. Other features of code-based analytics typically make collaborating on and developing more complex projects much smoother than in non-code tools like Excel.
Weaknesses
- it’s code, and it takes some time (months to years) to achieve real fluency
- potentially harder to learn than some competitor languages and tools (Power BI, Python)
- very patchy expertise across health and social care in Scotland
- complex information governance (IG) landscape
- messy skills development journey