Scope of the possible with R
R
overview
Session materials
- all materials
- slides
html / pdf
Welcome
- this session is a non-technical overview designed for service leads
Session outline
- Why R, and why this session?
- R demo - take some data, load, tidy, analyse
- Strengths and weaknesses
  - obvious
  - less obvious
- Alternatives
- Skill development
R
- free and open-source
- multi-platform
- large user base
- prominent in health, industry, biosciences
Why this session?
- R can be confusing
  - it’s code-based, and most of us don’t have much code experience
  - it’s used for some inherently complicated tasks
  - it’s a big product with lots of add-ons and oddities
- But R is probably the best general-purpose toolbox we have for data work at present
  - big user base in health and social care
  - focus on health and care-like applications
  - not that hard to learn
  - extensible and flexible
  - capable of enterprise-y, fancy uses
R demo
- this is about showing what’s possible, and giving you a flavour of how R works
- we won’t explain code in detail during this session
- using live open data
Load that data
library(readr)
ae_activity <- read_csv("data/weekly_ae_activity_20240609.csv")
One small bit of cheating: renaming
names(ae_activity) <- c("date", "country", "hb", "loc", "type", "attend", "n_within", "n_4", "perc_4", "n_8", "perc_8", "n_12", "perc_12")
Preview
date | country | hb | loc | type | attend | n_within | n_4 | perc_4 | n_8 | perc_8 | n_12 | perc_12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20220109 | S92000003 | S08000024 | S319H | Emergency Department | 733 | 695 | 38 | 94.8 | 1 | 0.1 | 0 | 0.0 |
20240310 | S92000003 | S08000024 | S314H | Emergency Department | 2273 | 1078 | 1195 | 47.4 | 384 | 16.9 | 177 | 7.8 |
20170611 | S92000003 | S08000024 | S314H | Emergency Department | 2265 | 2181 | 84 | 96.3 | 7 | 0.3 | 1 | 0.0 |
20190310 | S92000003 | S08000028 | W107H | Emergency Department | 120 | 118 | 2 | 98.3 | 0 | 0.0 | 0 | 0.0 |
20180617 | S92000003 | S08000022 | H212H | Emergency Department | 246 | 235 | 11 | 95.5 | 0 | 0.0 | 0 | 0.0 |
Removing data
library(dplyr)
ae_activity <- ae_activity |>
  select(!c(country, contains("perc_")))
date | hb | loc | type | attend | n_within | n_4 | n_8 | n_12 |
---|---|---|---|---|---|---|---|---|
20210502 | S08000020 | N101H | Emergency Department | 904 | 752 | 152 | 14 | 1 |
20230115 | S08000022 | C121H | Emergency Department | 138 | 129 | 9 | 0 | 0 |
20220313 | S08000032 | L302H | Emergency Department | 1198 | 744 | 454 | 87 | 41 |
20240414 | S08000016 | B120H | Emergency Department | 584 | 364 | 220 | 63 | 39 |
20240114 | S08000031 | G513H | Emergency Department | 1295 | 1235 | 60 | 0 | 0 |
Tidying data
ae_activity <- ae_activity |>
  mutate(date = lubridate::ymd(date))
date | hb | loc | type | attend | n_within | n_4 | n_8 | n_12 |
---|---|---|---|---|---|---|---|---|
2024-01-28 | S08000025 | R103H | Emergency Department | 126 | 117 | 9 | 0 | 0 |
2021-02-21 | S08000024 | S308H | Emergency Department | 831 | 667 | 164 | 41 | 7 |
2020-12-13 | S08000028 | W107H | Emergency Department | 70 | 68 | 2 | 0 | 0 |
2021-05-16 | S08000032 | L106H | Emergency Department | 1412 | 1216 | 196 | 25 | 5 |
2020-01-19 | S08000031 | G513H | Emergency Department | 1197 | 1177 | 20 | 0 | 0 |
Subset data
- we’ll take a random selection of 5 health boards to keep things tidy
ae_activity <- ae_activity |>
  filter(hb %in% boards)
date | hb | loc | type | attend | n_within | n_4 | n_8 | n_12 |
---|---|---|---|---|---|---|---|---|
2022-10-23 | S08000019 | V217H | Emergency Department | 1099 | 409 | 690 | 379 | 211 |
2022-12-11 | S08000029 | F704H | Emergency Department | 1356 | 772 | 584 | 168 | 45 |
2024-03-17 | S08000020 | N101H | Emergency Department | 1062 | 474 | 588 | 279 | 69 |
2017-09-17 | S08000020 | N121H | Emergency Department | 304 | 300 | 4 | 0 | 0 |
2021-03-07 | S08000024 | S314H | Emergency Department | 1825 | 1468 | 357 | 39 | 3 |
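The `boards` object used in the filter step isn't defined in these notes. One way it might have been created, assuming a simple random draw of health board codes, is sketched here on a small stand-in data frame. The names `ae_toy` and `ae_subset` are hypothetical, the attendance values are placeholders, and the seed is only there to make the draw repeatable:

```r
library(dplyr)

# Stand-in for ae_activity: ten health board codes taken from the previews,
# two placeholder rows each.
ae_toy <- tibble(
  hb = rep(c("S08000016", "S08000019", "S08000020", "S08000022", "S08000024",
             "S08000025", "S08000028", "S08000029", "S08000031", "S08000032"),
           each = 2),
  attend = seq_len(20)  # placeholder values, not real attendances
)

set.seed(2024)  # assumption: fix the seed so the same boards are drawn each run

# Draw 5 distinct health board codes at random
boards <- sample(unique(ae_toy$hb), 5)

# Keep only the rows belonging to the sampled boards
ae_subset <- ae_toy |>
  filter(hb %in% boards)
```

Without `set.seed()`, each run would pick a different set of boards, which is fine for a demo but awkward for a report that needs to be reproduced.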
Basic plots
library(ggplot2)
ae_activity |>
  ggplot() +
  geom_line(aes(x = date, y = attend, colour = hb, group = loc))
Joining data
ae_activity |>
  left_join(read_csv("data/boards_data.csv"), by = c("hb" = "HB")) |>
  select(!any_of(c("_id", "HB", "HBDateEnacted", "HBDateArchived", "Country"))) |>
  ggplot() +
  geom_line(aes(x = date, y = attend, colour = HBName, group = loc))
and again…
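The map code that follows uses an object called `ae_activity_loc`, which these notes don't define. Presumably it comes from a second join, this time to a hospital lookup. A minimal sketch with placeholder data is given here: the lookup table `hospitals`, its `HospitalCode` key, the hospital names, the counts and the coordinates are all made up for illustration, and only the `HospitalName`, `longitude` and `latitude` column names come from the code in these notes.

```r
library(dplyr)

# Placeholder activity rows: loc codes appear in the previews, counts are invented
ae_toy <- tibble(
  loc = c("N101H", "N101H", "S314H"),
  attend = c(100, 120, 200),
  n_within = c(90, 100, 150)
)

# Hypothetical hospital lookup: names and coordinates are placeholders
hospitals <- tibble(
  HospitalCode = c("N101H", "S314H"),
  HospitalName = c("Hospital A", "Hospital B"),
  longitude = c(-2.0, -3.0),
  latitude = c(57.0, 56.0)
)

# Attach a name and location to every activity row, matching loc to HospitalCode
ae_activity_loc <- ae_toy |>
  left_join(hospitals, by = c("loc" = "HospitalCode"))
```

A left join keeps every activity row even when a hospital is missing from the lookup, so gaps show up as `NA` rather than as silently dropped rows.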
Add to a map
ae_activity_loc |>
  leaflet::leaflet() |>
  leaflet::addTiles() |>
  leaflet::addMarkers(~longitude, ~latitude, label = ~HospitalName)
Then make that map more useful
ae_activity_loc |>
  group_by(HospitalName) |>
  summarise(attend = sum(attend), n_within = sum(n_within), longitude = min(longitude), latitude = min(latitude)) |>
  mutate(rate = paste(HospitalName, "averages", scales::percent(round(n_within / attend, 1)))) |>
  leaflet::leaflet() |>
  leaflet::addTiles() |>
  leaflet::addMarkers(~longitude, ~latitude, label = ~rate)
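To see what ends up in the `rate` label, here is the same expression applied to one row from the first preview (location S319H: `attend` of 733, `n_within` of 695). The location code stands in for the hospital name, and the variable names are illustrative only. Note that `round(..., 1)` rounds the proportion to one decimal place, so the label is effectively to the nearest 10%. The `accuracy` argument of `scales::percent()` is one alternative if a finer figure is wanted.

```r
attend <- 733
n_within <- 695

# As in the map code: proportion rounded to 1 decimal place, then formatted
label_coarse <- paste("S319H", "averages",
                      scales::percent(round(n_within / attend, 1)))

# Alternative: let scales do the rounding, to 0.1 of a percentage point
label_fine <- paste("S319H", "averages",
                    scales::percent(n_within / attend, accuracy = 0.1))

print(label_coarse)  # "S319H averages 90%"
print(label_fine)    # "S319H averages 94.8%"
```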
Then add to reports, dashboards…
Strengths
- enormous scope and flexibility
- a force-multiplier for fancier data work
- helps collaboration within teams, between teams, between orgs
- reproducible analytics
- modular approaches to large projects
- decreasing pain curve: the fancier the project, the better
Weaknesses
- harder to learn than competitors
- very patchy expertise across H+SC Scotland
- complex IG landscape
- messy skills development journey