Why bother with R?
Session outline
- this session is 🥬: a non-technical introduction for pre-beginners
- two minute overview of R
- one minute history of R
- five use cases in H&SC
- R code in practice, and a bit of R vs not-R
- next steps and training
A brief overview of R
R is a free software environment for statistical computing and graphics (r-project.org)
- free and open-source
- multiplatform
- large user base
- prominent in health, industry, biosciences
Who owns R?
Architecture
- R is modular
- 20,000+ packages
- Enormous scope of applications
- Design questions often hinge on finding the right package
- e.g. this presentation was created with Quarto and R, using Revealjs
- So there’s a good reason that R can be confusing: it’s not one thing, but many
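Because R is modular, most code starts by getting hold of a package, and the code later in this session reaches package functions in two different ways. A minimal sketch of both, using the `tools` package that ships with R so nothing needs downloading (the example string is made up, and `NHSRplotthedots` in the commented-out line is just an example name):

```r
# Installing a CRAN package is a one-off step per machine and needs internet,
# so it is commented out here
# install.packages("NHSRplotthedots")

# Option 1: attach a whole package for this session, then call its functions
library(tools)
title_attached <- toTitleCase("where were poisonous books a thing?")

# Option 2: reach a single function with package::function, without attaching
title_namespaced <- tools::toTitleCase("where were poisonous books a thing?")

# Both routes call the same function, so the results should match
identical(title_attached, title_namespaced)
```

Calls such as `knitr::kable()` and `readr::read_csv()` in this session use the second form; `library(infer)` uses the first.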
A brief history of R
it starts with the S programming language
S aimed to “turn ideas into software, quickly and faithfully” (Chambers 1998)
Robert Gentleman and Ross Ihaka (Ihaka 1998)
- Interest in extending the tools available in S
- Early transition to free software model
- 1997: the R “core group” forms
Use cases
Statistical testing
college | finrela |
---|---|
degree | below average |
no degree | below average |
degree | below average |
library(infer)
chisq_test(gss, college ~ finrela) |>
  knitr::kable()
statistic | chisq_df | p_value |
---|---|---|
30.68252 | 5 | 1.08e-05 |
chi-squared test of independence using infer (Couch et al. 2021)
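The three columns in that output (statistic, degrees of freedom, p-value) come from a standard calculation. As a sketch, here is a Pearson chi-squared test of independence on a small, made-up 2 × 3 table of counts (not the gss data), worked by hand and then checked against base R's `chisq.test()`. The infer documentation describes `chisq_test()` as a tidier version of `chisq.test()`, so we assume the two report the same quantities.

```r
# Hypothetical counts: rows are degree status, columns are relative income
observed <- matrix(
  c(20, 30, 10,
    25, 15, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(college = c("degree", "no degree"),
                  finrela = c("below average", "average", "above average"))
)

# Expected counts if the two variables were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic, degrees of freedom (rows - 1) * (columns - 1), and p-value
statistic <- sum((observed - expected)^2 / expected)
chisq_df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(statistic, df = chisq_df, lower.tail = FALSE)

# Base R's built-in test, for comparison with the hand calculation
result <- chisq.test(observed)
c(statistic = statistic, chisq_df = chisq_df, p_value = p_value)
```

For a 2 × 6 table like `college ~ finrela` the same formula gives (2 - 1) × (6 - 1) = 5 degrees of freedom, matching the `chisq_df` column above.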
PRISMA 2020 flow diagrams
PRISMA flow diagram using PRISMA2020 (Haddaway, Pritchard, and McGuinness 2021)
Mapping
Choropleth map using geojsonio
Posters
Conference poster using posterdown (Thorne 2019), waffle (Rudis and Gandy 2017), and ggalluvial (Brunson and Read 2020)
Natural language processing / wordclouds
Sentiment analysis with wordcloud using tidytext (Silge and Robinson 2016), lexicon, sentimentr, and wordcloud
Uses
Who is using it?
- PHS
- widely across SG
- sporadically used across NHSS boards
Why are they using it?
- to replace troublesome systems
- SPSS and other proprietary systems (licensing costs)
- replacing end-of-life systems (e.g. Access)
- to harmonise analysis across a team (common analysis platform)
- SG commitment to open-source code under DSSS
Simplifying analysis
(looking at you, Excel…)
- R separates data from analysis
- “there’s a package for…”
- reproducible analysis
- collaboration
- XmR charts in Excel and in R are a useful object lesson…
XmR charts in Excel (front)
XmR charts in Excel (back)
XmR charts (R)
XmR charts in R
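library(NHSRplotthedots)
library(dplyr)  # for mutate()
# read the data, parse the day-month-year dates, then draw the XmR chart
# with a target line at 50 and limits fixed after the first 20 points
readr::read_csv("data/spc/spc2.csv") |>
  mutate(date = lubridate::dmy(date)) |>
  ptd_spc(members, date, target = 50, fix_after_n_points = 20)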
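Part of the Excel-versus-R lesson is that the XmR arithmetic itself is simple: the centre line is the mean, and the natural process limits sit 2.66 times the average moving range either side of it. A minimal base-R sketch on made-up values (not the spc2.csv data); NHSRplotthedots adds refinements such as fixing limits after a set number of points and, as we understand it, screening outliers, so its limits may not match this bare version exactly.

```r
# Hypothetical monthly values, for illustration only
values <- c(48, 52, 47, 55, 50, 49, 53, 51, 46, 54, 50, 52)

# Moving range: absolute difference between each point and the one before it
moving_range <- abs(diff(values))

# Centre line and average moving range
centre <- mean(values)
mean_mr <- mean(moving_range)

# Natural process limits: mean +/- 2.66 * average moving range
upper_limit <- centre + 2.66 * mean_mr
lower_limit <- centre - 2.66 * mean_mr

# Flag any points outside the limits (the simplest special-cause rule)
outside <- values > upper_limit | values < lower_limit

# Quick base R chart: points, centre line (solid) and limits (dashed)
plot(values, type = "b", xlab = "Month", ylab = "Value",
     ylim = range(values, upper_limit, lower_limit))
abline(h = centre)
abline(h = c(upper_limit, lower_limit), lty = 2)
```

Because the values live in one vector and the formulas in a few named lines, changing the data never means hunting through cell references, which is the contrast with the Excel version.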
R and not-R
- Excel - pain curve…
- Power BI - dashboard, data, built into M365 licences
- Tableau - specialist, cost
- Python
- (hundreds of proprietary data analysis platforms)
Pain curves
R code in practice
- really interesting data on poisonous books (via Data is Plural)
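# load the downloaded data (dplyr provides mutate(), count() and the slice functions)
library(dplyr)
books <- readr::read_csv("data/ArsenicalBooks_CSVtoshare_2024-05-04.xlsx - EmeraldGreenBooks.csv") |>
  mutate(Date = as.numeric(Date))  # make sure Date is numeric so it can be binned and plotted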
- we’ve got data about 266 books
- the oldest book in the set is from 1797, and the youngest from 1892
Where are the poisonous bits of books?
# count the arsenical materials and show the seven most common
books |>
  count(`Arsenical Material`, sort = TRUE) |>
  slice(1:7) |>
  knitr::kable()
Arsenical Material | n |
---|---|
bookcloth | 130 |
paper cover | 49 |
decorative green onlay | 19 |
textblock edges | 16 |
paper covering | 5 |
paper over boards | 5 |
paper spine label | 5 |
When were poisonous books a thing?
library(ggplot2)
# histogram of publication dates in 10-year bins
books |>
  ggplot() +
  geom_histogram(aes(x = Date), binwidth = 10)
Where were poisonous books a thing?
books |>
  tidyr::separate(Imprint, into = c("City", "Publisher"), sep = ": ") |>
  count(City) |>
  slice_max(n, n = 5) |>
  knitr::kable()
City | n |
---|---|
London | 69 |
New York | 29 |
Boston | 18 |
Edinburgh | 16 |
Philadelphia | 16 |
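The `separate()` step relies on every imprint being written as `City: Publisher`. A base-R sketch of the same split, on made-up imprint strings, shows what happens to an entry without the `: ` separator: here it is kept whole as the city, whereas `tidyr::separate()` with its default `fill = "warn"` would, as we understand it, warn and fill the missing `Publisher` with `NA`.

```r
# Hypothetical imprints, for illustration only
imprint <- c("London: Publisher A", "New York: Publisher B",
             "London: Publisher C", "Edinburgh: Publisher D",
             "Boston")

# Keep everything before the first ": " as the city;
# strings without ": " are returned unchanged
city <- sub(": .*$", "", imprint)

# Count books per city, largest first
city_counts <- sort(table(city), decreasing = TRUE)
city_counts
```

Checking a few rows like this before counting is a cheap way to catch imprints that do not follow the expected pattern.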
Next steps
- self-paced
- Swirl
- R4DS (Wickham and Grolemund 2017)
- write/fail/Stack Overflow cycle
- PHS Data Science Knowledge Base
- KIND Learning Network training
R beginners club
- social learning on Teams to get you started in R
- see schedule and materials on our community pages
- all welcome and aimed at complete R beginners