Scope of the possible with R
Welcome
- this session is a non-technical overview designed for service leads
Session outline
- Introducing R, and a bit of chat about the aims of this session
- Practical demo - take some data, load, tidy, analyse, produce outputs
- Strengths and weaknesses
- obvious
- less obvious
- Alternatives
- Skill development
Introducing R
- free and open-source statistical programming language
- multi-platform
- large user base
- prominent in health, industry, biosciences
Why this session?
- R can be confusing
- it’s code-based, and most of us don’t have much code experience
- it’s used for some inherently complicated tasks
- it’s a big product with lots of add-ons and oddities
- But R is probably the best general-purpose toolbox we have for data work at present
- big user base in health and social care
- focus on health and care-like applications
- not that hard to learn
- extensible and flexible
- capable of enterprise-y, fancy uses
- yet there’s significant resistance to using R in parts of Scotland’s health and care sector
R demo
- this is about showing what’s possible and giving you a flavour of how R works
- we won’t explain code in detail during this session
- using live open data
Hello world!
There are lots of ways to run R. In this session, we’ll demonstrate using the RStudio Desktop IDE on Windows.
"Hello world!"
[1] "Hello world!"
Packages
R is highly extensible using packages: collections of useful code
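The demo that follows leans on the tidyverse family of packages (readr, dplyr, tidyr, ggplot2 and, in recent releases, lubridate). That setup isn’t spelled out on the slides, but a minimal sketch would look like:

```r
# One-off install (assumed setup step; not shown on the original slides)
install.packages("tidyverse")

# Load at the start of each session: attaches readr, dplyr, tidyr, ggplot2
# and, in tidyverse >= 2.0, lubridate - all used later in this demo
library(tidyverse)
```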
Create variables
url <- "https://www.opendata.nhs.scot/dataset/0d57311a-db66-4eaa-bd6d-cc622b6cbdfa/resource/a5f7ca94-c810-41b5-a7c9-25c18d43e5a4/download/weekly_ae_activity_20251123.csv"
Load some open data
That’s a link to data about weekly A+E activity. It’s large-ish (approximately 40000 rows)
ae_activity <- read_csv(url)
One small bit of cheating: renaming
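The renaming itself isn’t shown on the slide. Assuming the raw file arrives with fourteen longer column names in the order previewed below (an assumption), one sketch in the same style the demo later uses for the locations data would be:

```r
# Shorten the open-data column names (sketch; the exact renaming used in
# the session isn't shown, and this assumes the column order matches)
names(ae_activity) <- c("date", "country", "hb", "loc", "type", "attend",
                        "n", "n_in_4", "n_4", "perc_4", "n_8", "perc_8",
                        "n_12", "perc_12")
```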
Preview
| date | country | hb | loc | type | attend | n | n_in_4 | n_4 | perc_4 | n_8 | perc_8 | n_12 | perc_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20230910 | S92000003 | S08000031 | G513H | Type 1 | All | 1493 | 1415 | 78 | 94.8 | 1 | 0.1 | 0 | 0.0 |
| 20220529 | S92000003 | S08000022 | H103H | Type 1 | Unplanned | 229 | 195 | 34 | 85.2 | 15 | 6.6 | 9 | 3.9 |
| 20250504 | S92000003 | S08000029 | F704H | Type 1 | New planned | 68 | 57 | 11 | 83.8 | 3 | 4.4 | 0 | 0.0 |
| 20240331 | S92000003 | S08000024 | S319H | Type 1 | All | 1130 | 1033 | 97 | 91.4 | 5 | 0.4 | 0 | 0.0 |
| 20221120 | S92000003 | S08000016 | B120H | Type 1 | All | 604 | 472 | 132 | 78.1 | 47 | 7.8 | 31 | 5.1 |
Removing data
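The code for this step isn’t included on the slide; judging by the columns that disappear, a dplyr sketch (an assumption, not the presenter’s code) would be:

```r
# Drop the country code and the pre-calculated percentage columns,
# leaving the counts we actually need for the demo
ae_activity <- ae_activity |>
  select(-country, -starts_with("perc"))
```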
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 20150816 | S08000020 | N101H | Type 1 | Unplanned | 1146 | 1026 | 120 | 7 | 0 |
| 20220306 | S08000019 | V217H | Type 1 | All | 1151 | 795 | 356 | 68 | 3 |
| 20170611 | S08000031 | G107H | Type 1 | All | 1741 | 1599 | 142 | 2 | 0 |
| 20150607 | S08000022 | H202H | Type 1 | All | 587 | 560 | 27 | 3 | 0 |
| 20240107 | S08000028 | W107H | Type 1 | Unplanned | 114 | 103 | 11 | 0 | 0 |
Tidying data
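Again the code isn’t shown, but the visible change is that the yyyymmdd integers become proper dates, which lubridate’s ymd() handles (the demo applies the same ymd() call later when joining):

```r
# Parse the yyyymmdd integers into Date values (sketch; shown here for
# illustration rather than taken from the slide)
ae_activity |>
  mutate(date = ymd(date))
```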
| date | hb | loc | type | attend | n | n_in_4 | n_4 | n_8 | n_12 |
|---|---|---|---|---|---|---|---|---|---|
| 2016-11-13 | S08000031 | G107H | Type 1 | Unplanned | 1726 | 1457 | 269 | 22 | 0 |
| 2021-07-04 | S08000024 | S314H | Type 1 | All | 2247 | 1794 | 453 | 25 | 2 |
| 2020-07-19 | S08000022 | H212H | Type 1 | All | 175 | 171 | 4 | 0 | 0 |
| 2017-11-05 | S08000030 | T101H | Type 1 | All | 1010 | 970 | 40 | 1 | 0 |
| 2021-08-08 | S08000024 | S314H | Type 1 | All | 2465 | 1423 | 1042 | 215 | 59 |
Subset data
We’ll take a selection of 5 health boards to keep things tidy:
boards_sample <- c("NHS Borders", "NHS Fife", "NHS Grampian", "NHS Highland", "NHS Lanarkshire")
Joining data
Those board codes (like S08000020) aren’t very easy to read. Luckily, we can add the proper “NHS Thing & Thing” board names from another data source.
boards <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv")
| HB | HBName | HBDateEnacted | HBDateArchived | Country |
|---|---|---|---|---|
| S08000024 | NHS Lothian | 20140401 | NA | S92000003 |
| S08000021 | NHS Greater Glasgow and Clyde | 20140401 | 20190331 | S92000003 |
| S08000031 | NHS Greater Glasgow and Clyde | 20190401 | NA | S92000003 |
| S08000023 | NHS Lanarkshire | 20140401 | 20190331 | S92000003 |
| S08000016 | NHS Borders | 20140401 | NA | S92000003 |
We can do something very similar with the A&E locations:
locs <- read_csv("https://www.opendata.nhs.scot/dataset/a877470a-06a9-492f-b9e8-992f758894d0/resource/1a4e3f48-3d9b-4769-80e9-3ef6d27852fe/download/ae_hospital_site_list_09_09_2025.csv") |>
select(2:4)
names(locs) <- c("loc_name", "loc", "postcode") # a bit of renaming to make the names easier
And we can add the postcodes:
locs <- locs |>
rowwise() |>
mutate(long = PostcodesioR::postcode_lookup(postcode)$longitude,
lat = PostcodesioR::postcode_lookup(postcode)$latitude) # adding in some location information
We can then join our three datasets together to give us data with the NHS Board names, A&E names, and locations:
ae_activity_locs <- ae_activity |>
filter(attend == "All") |>
left_join(boards, by = join_by(hb == HB)) |>
filter(HBName %in% boards_sample) |>
select(date, HBName, loc, type, n, contains("n_")) |>
mutate(date = ymd(date)) |>
left_join(locs)
| date | HBName | loc | type | n | loc_name | postcode | long | lat |
|---|---|---|---|---|---|---|---|---|
| 2021-03-07 | NHS Borders | B120H | Type 1 | 467 | Borders General Hospital | TD6 9BS | -2.741945 | 55.59548 |
| 2019-12-01 | NHS Grampian | N411H | Type 1 | 569 | Dr Gray’s Hospital | IV301SN | -3.329946 | 57.64538 |
| 2016-07-24 | NHS Lanarkshire | L308H | Type 1 | 1275 | University Hospital Wishaw | ML2 0DP | -3.941738 | 55.77369 |
| 2025-09-21 | NHS Highland | H212H | Type 1 | 196 | Belford Hospital | PH336BS | -5.104701 | 56.81941 |
| 2022-11-13 | NHS Fife | F704H | Type 1 | 1381 | Victoria Hospital (NHS Fife) | KY2 5AH | -3.160138 | 56.12511 |
Basic plots
library(ggplot2)
ae_activity_locs |>
filter(HBName == "NHS Highland") |>
ggplot() +
geom_line(aes(x = date, y = n, colour = loc_name))
Looking across different measures
library(tidyr)
ae_activity_locs |>
filter(loc_name == "Raigmore Hospital") |>
select(date, n_in_4:n_12) |>
pivot_longer(-date) |>
group_by(date = floor_date(date, "month"), name) |>
summarise(value = mean(value)) |>
ggplot(aes(x = date, y = value, colour = name)) +
geom_line() +
geom_smooth(se = F)
Making that re-usable
graphmo <- function(hbname = "NHS Highland"){
ae_activity_locs |>
filter(HBName %in% hbname) |>
ggplot() +
geom_line(aes(x = date, y = n, colour = loc_name)) +
theme(legend.position = "bottom") +
xlab("Date") +
labs(colour = "") # hide the label
}
graphmo(c("NHS Grampian", "NHS Fife"))
A rubbish map
We’ve got latitude and longitude information for our A&E sites, which means we can plot them on a map:
ae_activity_locs |>
ggplot(aes(x = long, y = lat, label = loc_name, colour = HBName)) +
geom_point() +
theme_void() +
theme(legend.position = "none")
That’s not the most useful map I’ve ever seen. Luckily, there’s a package to help us:
Add to a map
ae_activity_locs |>
leaflet::leaflet() |>
leaflet::addTiles() |>
leaflet::addMarkers(~long, ~lat, label = ~loc_name)
Then make that map more useful
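The content of that final map step isn’t captured here. One hedged sketch of “more useful” - summarising total attendances per site and clustering the markers (the aggregation and styling choices are assumptions, not the presenter’s code) - could be:

```r
library(dplyr)
library(leaflet)

# Total attendances per site - an assumed definition of "more useful"
site_totals <- ae_activity_locs |>
  group_by(loc_name, long, lat) |>
  summarise(total_attendances = sum(n, na.rm = TRUE), .groups = "drop")

site_totals |>
  leaflet() |>
  addTiles() |>
  addCircleMarkers(~long, ~lat,
                   radius = ~sqrt(total_attendances) / 40,
                   label = ~paste0(loc_name, ": ",
                                   format(total_attendances, big.mark = ","),
                                   " attendances"),
                   clusterOptions = markerClusterOptions())
```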
Strengths
R offers enormous scope and flexibility, largely because of two features. First, R is built around packages, which let you add specialist functionality to your R installation in a repeatable, standard way. There’s essentially a package for everything - over 20,000 on CRAN at present. Second, R encourages reproducible analytics: you write your script once, then run it many times as your data changes, producing standardised outputs by design.
Together, those features make R a force-multiplier for more ambitious data work: use packages to replicate your existing work in a reproducible way, then use the time saved in routine reporting to improve and extend it. Other features of code-based analytics (version control, for instance) typically make collaboration and the development of more complex projects much smoother than they would be in non-code tools like Excel.
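To make the “write once, run many times” point concrete, a common pattern (a sketch, not something covered in this session) is a parameterised R Markdown report that gets re-rendered whenever the data refreshes:

```r
library(rmarkdown)

# "ae_report.Rmd" is a hypothetical report file, not part of this session;
# the point is that the same script re-runs against updated data and
# produces a standardised output per board
for (board in c("NHS Fife", "NHS Grampian")) {
  render("ae_report.Rmd",
         params = list(board = board),
         output_file = paste0("ae_report_", gsub(" ", "_", board), ".html"))
}
```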
Weaknesses
- it’s code, and it takes some time (months to years) to achieve real fluency
- potentially harder to learn than some competitor languages and tools (Power BI, Python)
- very patchy expertise across H+SC Scotland
- complex information governance (IG) landscape
- messy skills development journey