Scope of the possible with R
Welcome
- this session is a non-technical overview designed for service leads
Session outline
- Introducing R, and a bit of chat about the aims of this session
- Practical demo - take some data, load, tidy, analyse, produce outputs
- Strengths and weaknesses
- obvious
- less obvious
- Alternatives
- Skill development
Introducing R
- free and open-source statistical programming language
- multi-platform
- large user base
- prominent in health, industry, biosciences
Why this session?
- R can be confusing
- it’s code-based, and most of us don’t have much code experience
- it’s used for some inherently complicated tasks
- it’s a big product with lots of add-ons and oddities
- But R is probably the best general-purpose toolbox we have for data work at present
- big user base in health and social care
- focus on health and care-like applications
- not that hard to learn
- extensible and flexible
- capable of enterprise-y, fancy uses
- yet there’s significant resistance to using R in parts of Scotland’s health and care sector
R demo
- this is about showing what’s possible, and giving you a flavour of how R works
- we won’t explain code in detail during this session
- using live open data
Hello world!
There are lots of ways to run R. In this session, we’ll demonstrate using the RStudio Desktop IDE on Windows.
```r
"Hello world!"
#> [1] "Hello world!"
```
Packages
R is highly extensible using packages: collections of useful code
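The slides don't show which packages the demo loads. A minimal sketch, assuming the demo relies on the packages that provide the functions called later (`read_csv()` from readr, `filter()` and friends from dplyr, `pivot_longer()` from tidyr, `floor_date()` from lubridate, and ggplot2 for plotting). The leaflet map calls its functions as `leaflet::...`, so it doesn't need attaching.

```r
# ASSUMED package list, inferred from the functions used later in this demo
pkgs <- c("readr", "dplyr", "tidyr", "lubridate", "ggplot2")

# Install any that aren't already available (needs internet access and
# permission to install packages on your machine)
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!available)) install.packages(pkgs[!available])

# Attach each package so its functions can be called by name
for (pkg in pkgs) library(pkg, character.only = TRUE)
```

Installing is a one-off step per machine; the `library()` calls need running at the start of each session.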
Create variables
```r
url <- "https://www.opendata.nhs.scot/dataset/0d57311a-db66-4eaa-bd6d-cc622b6cbdfa/resource/a5f7ca94-c810-41b5-a7c9-25c18d43e5a4/download/weekly_ae_activity_20251123.csv"
```
Load some open data
That’s a link to data about weekly A&E activity. It’s large-ish (approximately 40,000 rows).
```r
ae_activity <- read_csv(url)
```
One small bit of cheating: renaming
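The renaming code isn't shown. One minimal way to do it, assuming the downloaded file has 14 columns in the same order as the preview: replace the names by position. `rename_ae()` is a hypothetical helper that refuses to run if the column count doesn't match, which guards against the source file's layout changing.

```r
# Short names used throughout the demo, in the order shown in the preview.
# ASSUMPTION: the CSV's columns arrive in this order; check names(ae_activity)
short_names <- c("date", "country", "hb", "loc", "type", "attend",
                 "n", "n_in_4", "n_4", "perc_4", "n_8", "perc_8",
                 "n_12", "perc_12")

# Hypothetical helper: rename by position, stopping if the shape has changed
rename_ae <- function(df, new_names = short_names) {
  if (ncol(df) != length(new_names)) {
    stop("Expected ", length(new_names), " columns but found ", ncol(df))
  }
  names(df) <- new_names
  df
}

# Quick check on a placeholder data frame with 14 generically named columns
placeholder <- as.data.frame(matrix(0, nrow = 1, ncol = 14))
renamed <- rename_ae(placeholder)
```

In the demo itself this would be `ae_activity <- rename_ae(ae_activity)`. Renaming by position is brittle: if the original column names are known, `dplyr::rename(new = old)` is safer, because it fails loudly when a named column is missing.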
Preview
| date | country | hb | loc | type | attend | n | n_in_4 | n_4 | perc_4 | n_8 | perc_8 | n_12 | perc_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20170430 | S92000003 | S08000032 | L106H | Type 1 | Unplanned | 1383 | 1266 | 117 | 91.5 | 1 | 0.1 | 0 | 0.0 |
| 20250914 | S92000003 | S08000031 | C313H | Type 1 | New planned | 1 | 1 | 0 | 100.0 | 0 | 0.0 | 0 | 0.0 |
| 20160313 | S92000003 | S08000025 | R103H | Type 1 | Unplanned | 102 | 102 | 0 | 100.0 | 0 | 0.0 | 0 | 0.0 |
| 20230903 | S92000003 | S08000028 | W107H | Type 1 | All | 137 | 134 | 3 | 97.8 | 0 | 0.0 | 0 | 0.0 |
| 20220828 | S92000003 | S08000017 | Y144H | Type 1 | All | 312 | 282 | 30 | 90.4 | 4 | 1.3 | 1 | 0.3 |
Removing data
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 20160221 | S08000019 | V217H | Type 1 | All | 1184 | 1112 | 72 | 1 | 0 |
| 20191117 | S08000020 | N101H | Type 1 | All | 1295 | 1094 | 201 | 11 | 1 |
| 20190127 | S08000024 | S314H | Type 1 | Unplanned | 2333 | 2037 | 296 | 21 | 4 |
| 20200621 | S08000022 | H202H | Type 1 | All | 512 | 495 | 17 | 1 | 0 |
| 20220724 | S08000020 | N101H | Type 1 | All | 1018 | 538 | 480 | 130 | 23 |
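The code that produces this table isn't shown. A minimal dplyr sketch that gives the same columns: drop `country` and every column whose name starts with `perc_`. The one-row `ae_preview` tibble is copied from the first row of the preview table and stands in for the full download.

```r
library(dplyr)

# One row copied from the preview table above, standing in for the full download
ae_preview <- tibble(
  date = 20170430, country = "S92000003", hb = "S08000032", loc = "L106H",
  type = "Type 1", attend = "Unplanned", n = 1383, n_in_4 = 1266, n_4 = 117,
  perc_4 = 91.5, n_8 = 1, perc_8 = 0.1, n_12 = 0, perc_12 = 0
)

# Drop the country code (S92000003, Scotland, in every row shown) and the
# percentage columns, keeping the raw counts
ae_trimmed <- ae_preview |>
  select(-country, -starts_with("perc_"))
```

The percentages appear to be derivable from the counts (here 1266 / 1383 × 100 rounds to 91.5), so they can be recomputed whenever they're needed.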
Tidying data
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 2018-04-08 | S08000031 | C313H | Type 1 | All | 593 | 558 | 35 | 3 | 0 |
| 2015-09-06 | S08000032 | L302H | Type 1 | All | 1226 | 1167 | 59 | 3 | 0 |
| 2018-06-24 | S08000020 | N121H | Type 1 | All | 369 | 366 | 3 | 0 | 0 |
| 2024-06-23 | S08000022 | H103H | Type 1 | Unplanned | 203 | 160 | 43 | 19 | 15 |
| 2019-12-08 | S08000015 | A210H | Type 1 | Unplanned | 666 | 433 | 233 | 87 | 48 |
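The tidying step isn't shown, but the visible change is that `date` goes from a number like 20160221 to a real date. A minimal sketch using lubridate's `ymd()`, on two rows copied from the "Removing data" table (only some columns kept for brevity):

```r
library(dplyr)
library(lubridate)

# Two rows from the "Removing data" table above, with dates stored as numbers
ae_trimmed <- tibble(
  date = c(20160221, 20191117),
  hb   = c("S08000019", "S08000020"),
  loc  = c("V217H", "N101H"),
  n    = c(1184, 1295)
)

# ymd() parses year-month-day values into Date objects, so R can sort,
# filter, and plot them as dates rather than as large numbers
ae_tidy <- ae_trimmed |>
  mutate(date = ymd(date))
```

A base R equivalent would be `as.Date(as.character(date), format = "%Y%m%d")`.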
Subset data
We’ll take a selection of 5 health boards to keep things tidy:
```r
boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")
```
Joining data
Those board codes (like S08000020) aren’t very easy to read. Luckily, we can add the proper “NHS Thing & Thing” board names from another data source.
```r
boards <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")
```
| HB | HBName | HBDateEnacted | HBDateArchived | Country |
|---|---|---|---|---|
| S08000029 | NHS Fife | 20180202 | NA | S92000003 |
| S08000026 | NHS Shetland | 20140401 | NA | S92000003 |
| S08000024 | NHS Lothian | 20140401 | NA | S92000003 |
| S08000018 | NHS Fife | 20140401 | 20180201 | S92000003 |
| S08000022 | NHS Highland | 20140401 | NA | S92000003 |
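NHS Fife appears twice here: the lookup keeps both the archived code (S08000018, archived 20180201) and the current one (S08000029). A minimal sketch of the join and the five-board subset, using rows copied from the table above as the lookup and a hypothetical activity table that holds only board codes. Because both Fife codes are in the lookup, activity recorded under either one picks up the same name.

```r
library(dplyr)

# Lookup rows copied from the table above (code and name only)
boards <- tibble(
  HB     = c("S08000029", "S08000026", "S08000024", "S08000018", "S08000022"),
  HBName = c("NHS Fife", "NHS Shetland", "NHS Lothian", "NHS Fife", "NHS Highland")
)

# Hypothetical activity rows: just the board codes, to show the matching
ae_codes <- tibble(hb = c("S08000018", "S08000029", "S08000022", "S08000026"))

boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian",
                   "NHS Highland", "NHS Lanarkshire")

# left_join() keeps every activity row and adds HBName where the code matches;
# filter() then keeps only the five sample boards (Shetland is dropped)
ae_named <- ae_codes |>
  left_join(boards, by = c("hb" = "HB")) |>
  filter(HBName %in% boards_sample)
```

If a code in the activity data were missing from the lookup, `left_join()` would give it an `NA` name and the filter would silently drop it, so `anti_join(ae_codes, boards, by = c("hb" = "HB"))` is a useful check beforehand.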
We can do something very similar with the A&E locations:
And we can add the postcodes:
We can then join our three datasets together to give us data with the NHS Board names, A&E names, and locations:
| date | HBName | loc | type | n | n_in_4 | n_4 | n_8 | n_12 | loc_name | postcode | long | lat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-02-16 | NHS Highland | C121H | Type 1 | 102 | 99 | 3 | 0 | 0 | Lorn & Islands Hospital | PA344HH | -5.474916 | 56.40040 |
| 2022-08-28 | NHS Highland | C121H | Type 1 | 204 | 187 | 17 | 1 | 0 | Lorn & Islands Hospital | PA344HH | -5.474916 | 56.40040 |
| 2021-02-28 | NHS Lanarkshire | L308H | Type 1 | 1137 | 906 | 231 | 46 | 16 | University Hospital Wishaw | ML2 0DP | -3.941738 | 55.77369 |
| 2021-03-07 | NHS Lanarkshire | L308H | Type 1 | 1178 | 963 | 215 | 32 | 6 | University Hospital Wishaw | ML2 0DP | -3.941738 | 55.77369 |
| 2020-03-22 | NHS Grampian | N411H | Type 1 | 332 | 312 | 20 | 3 | 1 | Dr Gray’s Hospital | IV301SN | -3.329946 | 57.64538 |
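The location and postcode lookups aren't shown, so the table names and columns in this sketch are hypothetical (`locations` with `loc`, `loc_name`, `postcode`; `postcodes` with `postcode`, `long`, `lat`); the values are copied from the table above. One detail the table hints at: postcodes appear both with and without a space ("PA344HH", "ML2 0DP"). This sketch therefore strips spaces on both sides before joining, assuming for illustration that the postcode lookup stores the spaced form.

```r
library(dplyr)

# Activity rows after the board join (values from the table above)
ae_named <- tibble(
  date   = as.Date(c("2020-02-16", "2021-02-28")),
  HBName = c("NHS Highland", "NHS Lanarkshire"),
  loc    = c("C121H", "L308H"),
  n      = c(102, 1137)
)

# Hypothetical lookups; the postcode lookup is ASSUMED to store the spaced form
locations <- tibble(
  loc      = c("C121H", "L308H"),
  loc_name = c("Lorn & Islands Hospital", "University Hospital Wishaw"),
  postcode = c("PA344HH", "ML2 0DP")
)
postcodes <- tibble(
  postcode = c("PA34 4HH", "ML2 0DP"),
  long     = c(-5.474916, -3.941738),
  lat      = c(56.40040, 55.77369)
)

# Normalise postcodes to upper case with no spaces so both formats match
squash <- function(x) toupper(gsub(" ", "", x, fixed = TRUE))

ae_activity_locs <- ae_named |>
  left_join(locations, by = "loc") |>
  mutate(pc_key = squash(postcode)) |>
  left_join(
    postcodes |> mutate(pc_key = squash(postcode)) |> select(-postcode),
    by = "pc_key"
  ) |>
  select(-pc_key)
```

Without the `squash()` step, "PA344HH" would not match "PA34 4HH" and that site would get `NA` coordinates, dropping it from any map.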
Basic plots
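The basic plots from the demo aren't reproduced here. As a hedged example of the kind of plot meant, this sketch draws weekly attendances over time with one line per site, using four rows copied from the joined table in place of the full `ae_activity_locs`.

```r
library(dplyr)
library(ggplot2)

# Four rows from the joined table above, standing in for ae_activity_locs
ae_sample <- tibble(
  date = as.Date(c("2020-02-16", "2022-08-28", "2021-02-28", "2021-03-07")),
  loc_name = c("Lorn & Islands Hospital", "Lorn & Islands Hospital",
               "University Hospital Wishaw", "University Hospital Wishaw"),
  n = c(102, 204, 1137, 1178)
)

# Weekly attendances over time, one coloured line per A&E site
p <- ae_sample |>
  ggplot(aes(x = date, y = n, colour = loc_name)) +
  geom_line() +
  labs(x = NULL, y = "Weekly attendances", colour = "Site")
```

Typing `p` at the console draws the plot.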

Looking across different measures
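```r
library(dplyr)      # filter(), select(), group_by(), summarise()
library(tidyr)      # pivot_longer()
library(lubridate)  # floor_date()
library(ggplot2)    # ggplot(), geom_line(), geom_smooth()

# Monthly mean of each weekly waiting-time count (n_in_4 to n_12) at one site
ae_activity_locs |>
  filter(loc_name == "Raigmore Hospital") |>
  select(date, n_in_4:n_12) |>
  pivot_longer(-date) |>
  group_by(date = floor_date(date, "month"), name) |>
  summarise(value = mean(value), .groups = "drop") |>
  ggplot(aes(x = date, y = value, colour = name)) +
  geom_line() +
  geom_smooth(se = FALSE)
```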
Making that re-usable
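The re-usable version isn't shown. One way to do it, sketched as a hypothetical `plot_measures()` function: wrap the pipeline from the previous slide so the site name becomes an argument. The test data are two rows for University Hospital Wishaw copied from the joined table; in the demo the function would be called on `ae_activity_locs`.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical helper: the previous pipeline with the site name as an argument,
# so the same plot can be produced for any A&E site
plot_measures <- function(data, site) {
  data |>
    filter(loc_name == .env$site) |>   # .env$ refers to the function argument
    select(date, n_in_4:n_12) |>
    pivot_longer(-date) |>
    group_by(date = floor_date(date, "month"), name) |>
    summarise(value = mean(value), .groups = "drop") |>
    ggplot(aes(x = date, y = value, colour = name)) +
    geom_line() +
    geom_smooth(se = FALSE) +
    labs(title = site, x = NULL, y = "Monthly mean of weekly counts")
}

# Two rows from the joined table above, standing in for ae_activity_locs
wishaw <- tibble(
  date = as.Date(c("2021-02-28", "2021-03-07")),
  loc_name = "University Hospital Wishaw",
  n_in_4 = c(906, 963), n_4 = c(231, 215), n_8 = c(46, 32), n_12 = c(16, 6)
)

p <- plot_measures(wishaw, "University Hospital Wishaw")
```

Once the plot is a function, producing one per site is a single call, for example `lapply(c("Raigmore Hospital", "University Hospital Wishaw"), plot_measures, data = ae_activity_locs)`.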

A rubbish map
We’ve got latitude and longitude information for our A&E sites, which means we can plot them on a map:
```r
ae_activity_locs |>
  ggplot(aes(x = long, y = lat, label = loc_name, colour = HBName)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none")
```
That’s not the most useful map I’ve ever seen. Luckily, there’s a package to help us:
Add to a map
```r
ae_activity_locs |>
  dplyr::distinct(loc_name, long, lat) |>   # one marker per site, not one per week
  leaflet::leaflet() |>
  leaflet::addTiles() |>
  leaflet::addMarkers(~long, ~lat, label = ~loc_name)
```
Then make that map more useful
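The improved map isn't shown. A hedged sketch of one possible improvement: one circle per site, sized by mean weekly attendances, with a popup giving the figure. The four rows are copied from the joined table; the sizing rule (radius proportional to the square root, so circle area tracks attendances) is an illustrative choice, not necessarily the demo's.

```r
library(dplyr)
library(leaflet)

# Four rows from the joined table above, standing in for ae_activity_locs
ae_sample <- tibble(
  loc_name = c("Lorn & Islands Hospital", "Lorn & Islands Hospital",
               "University Hospital Wishaw", "University Hospital Wishaw"),
  long = c(-5.474916, -5.474916, -3.941738, -3.941738),
  lat  = c(56.40040, 56.40040, 55.77369, 55.77369),
  n    = c(102, 204, 1137, 1178)
)

# One row per site with its mean weekly attendances
site_summary <- ae_sample |>
  group_by(loc_name, long, lat) |>
  summarise(mean_n = mean(n), .groups = "drop")

# Circle radius ~ sqrt(mean_n), so circle area is proportional to attendances
m <- site_summary |>
  leaflet() |>
  addTiles() |>
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = ~sqrt(mean_n) / 2,
    label = ~loc_name,
    popup = ~paste0(loc_name, ": ", round(mean_n), " attendances per week (mean)")
  )
```

Printing `m` in RStudio shows the interactive map in the Viewer pane.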

Strengths
R offers enormous scope and flexibility, largely because of two features. First, R is built around packages, which let you add specialist functions to your R installation in a repeatable, standard way. There’s a package for almost everything: more than 20,000 on CRAN at present. Second, R encourages reproducible analytics: you write your script once, then run it many times as your data changes, producing standardised outputs by design.
Together, these features make R a force multiplier for more ambitious data work: use packages to replicate your existing work reproducibly, then spend the time saved on routine reporting improving and extending that work. Other features of code-based analytics, such as version control, typically make collaboration and larger projects much smoother than they would be in non-code tools like Excel.
Weaknesses
- it’s code, and it takes some time (months to years) to achieve real fluency
- potentially harder to learn than some competitor languages and tools (Power BI, Python)
- very patchy expertise across H+SC Scotland
- complex IG landscape
- messy skills development journey