R from scratch

R
beginner
Published

September 24, 2024

Previous attendees have said…

  • 36 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 94% said that this session was pitched correctly

Three random comments from previous attendees
  • Thanks for the training. As a neurodivergent learner I found it really hard to keep up in the previous R session. I however saw your explanation for not recording and I understand why. The idea of notetaking works as it reinforces and also reminds me of parts that I missed. I also found that because I had set up the environment for the last session, it was easier to follow the examples. So it really is a matter of exposure, then familiarity which only comes with practice and time spent on practicing. Thank you
  • Base R broken down into easy to understand language so it’s not confusing. As an R beginner I found this very helpful and look forward to the next session to build on my knowledge and understanding. I work within Data Management where R hasn’t been geared towards with basic training, as the training available is usually geared towards Analytical staff who have more knowledge of R, and with R becoming utilised more within Data Management these kind of sessions are essential allowing for personal development to help with your workstreams.
  • a Ronseal training session - did exactly as it says on the tin :)

Getting started

  • you’ll need some kind of R setup for this training
    • if you already have some type of R installation available (PHS workbench, RStudio Desktop etc) please feel free to use that
    • otherwise, create a free account on posit.cloud
  • Then confirm that you’re able to log-in to that account using the device and the network connection that you’re going to be using during the training session. Of note, some board VPNs (PHS, Lothian, couple of others) seem to cause trouble, and it can take a bit of time to work out fixes and workarounds.
  • Once logged in, please attempt a 2 minute test (below) so that we can be sure everything will be in working order before the first training session:
  1. sign in to your account at https://posit.cloud/
  2. create a New RStudio Project from the big blue button
  3. after 30 seconds or so you should have a new R project open in front of you
  4. in the console - the bottom-left part of the window with all the text in - please type demo(image) at the > prompt and press enter
  5. keep pressing enter until you see graphs start appearing (2-3 times should do it)

hello world

  • R has a console where you can write code, and R will run it for you and show you the results
  • we’ll start with a string, which is how we keep words in R code (single or double quotes, your choice)
  • type a string in the console
"hello world"
[1] "hello world"



  • but we usually write R in script files
  • start a new R script
  • now press Ctrl + Enter to run your script
"hello world"
[1] "hello world"
  • (for the coders, R has implicit print)
# variable
# assignment operator
hw <- "hello world"
hw
[1] "hello world"
# R is case sensitive
# R runs from top to bottom - you can't use an object until you've made it
try(HW) # a way of running broken code and capturing the error messages it provokes
Error in eval(expr, envir, enclos) : object 'HW' not found
HW <- "HELLO WORLD"
HW
[1] "HELLO WORLD"
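
If you're not sure whether you've made an object yet, one option is to ask R. This is a small sketch using exists(), a base R function that isn't otherwise covered in this session. The object name greeting is made up for illustration, and the results assume a fresh R session:

```r
# hypothetical object name, for illustration only
greeting <- "hello world"

exists("greeting") # TRUE - we've made this object
exists("GREETING") # FALSE - R is case sensitive, so this is a different name
```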

functions

# print function
# variables persist
print(hw)
[1] "hello world"
# help for all functions
?print
# look at the help and try to make substr work to pull out "hello"
# arguments
substr(hw, 1, 5)
[1] "hello"
# return value
# assign out of functions
hi <- substr(hw, 1, 5)
# vectors
# combine
c(hw, hw)
[1] "hello world" "hello world"
length(c(hw, hw))
[1] 2
# most functions are vectorised
substr(c(hw, hw), 1, 5)
[1] "hello" "hello"
hh <- substr(c(hw, hw), 1, 5)
# especially nice for maths stuff
# logic
c(4,3,7,55) * 2
[1]   8   6  14 110
c(4,3,7,55) > 10
[1] FALSE FALSE FALSE  TRUE
# indexing
# range operator
hh[1]
[1] "hello"
c("this", "is", "another", "indexing", "example")[3]
[1] "another"
c("this", "is", "another", "indexing", "example")[3:4]
[1] "another"  "indexing"
# length
# typeof
length(hh)
[1] 2
typeof(hh)
[1] "character"
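
A few more indexing patterns, as a sketch rather than a full tour. The vector name words is made up here, and nchar() is a base R function that hasn't appeared above:

```r
words <- c("this", "is", "another", "indexing", "example") # same example vector as above

words[c(1, 5)]      # pick out several positions at once: "this" "example"
words[-1]           # a negative index drops that position
nchar(words)        # nchar() counts characters, and is vectorised too: 4 2 7 8 7
toupper(words[2:3]) # index first, then apply a function: "IS" "ANOTHER"
```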

data types and classes

# vectors
# homogeneous - only one kind of thing per vector
typeof("this is a string")
[1] "character"
typeof(1L)
[1] "integer"
typeof(1)
[1] "double"
typeof(TRUE)
[1] "logical"
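
Because a vector can only hold one kind of thing, R quietly converts values when you mix types with c(). A minimal sketch of what that looks like (the variable name mixed is made up for illustration):

```r
# mixing types in c() silently converts everything to one common type
mixed <- c(1, "two", TRUE)
mixed          # "1" "two" "TRUE"
typeof(mixed)  # "character"

typeof(c(1L, 2.5))   # "double" - integers become doubles
typeof(c(TRUE, 2L))  # "integer" - logicals become integers
```

This is worth knowing about because one stray string in a vector of numbers will turn the whole lot into strings.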
# factors - the odd one
# mainly a way of storing categorical data, especially when you need it in non-alphabetical order
factor(c("thing", "string", "wing", "bling")) # alphabetical
[1] thing  string wing   bling 
Levels: bling string thing wing
ing_things <- factor(c("thing", "string", "wing", "bling"), levels = c("wing", "bling", "string", "thing")) # custom (non-alphabetical) order
ing_things
[1] thing  string wing   bling 
Levels: wing bling string thing
ing_things[2]
[1] string
Levels: wing bling string thing
# the list = a vector of vectors
# ragged - can store different kinds of values together
list(hh, hi, hw, ing_things)
[[1]]
[1] "hello" "hello"

[[2]]
[1] "hello"

[[3]]
[1] "hello world"

[[4]]
[1] thing  string wing   bling 
Levels: wing bling string thing
# names
# give each item its own unique name
named_list <- list("hh" = hh, 
                   "hi" = hi, 
                   "hw" = hw, 
                   "silly_name" = ing_things)
named_list
$hh
[1] "hello" "hello"

$hi
[1] "hello"

$hw
[1] "hello world"

$silly_name
[1] thing  string wing   bling 
Levels: wing bling string thing
# different indexing required for lists
class(named_list[4]) # gets you a smaller list
[1] "list"
# two easy ways of getting vectors out of lists
named_list$silly_name
[1] thing  string wing   bling 
Levels: wing bling string thing
named_list[[4]]
[1] thing  string wing   bling 
Levels: wing bling string thing
# and you can flatten a list into a vector
unlist(named_list)
          hh1           hh2            hi            hw   silly_name1 
      "hello"       "hello"       "hello" "hello world"           "4" 
  silly_name2   silly_name3   silly_name4 
          "3"           "1"           "2" 

session 1 reminders

  • console vs script (Ctrl + Enter to run code from a script)
  • assign values to variables with <-
  • vectors = simplest R data structure (like an ordered group)
  • functions = R verbs that:
    • are case-sensitive, whitespace-insensitive
    • have arguments (roughly = options)
    • return results
    • are usually vectorised (so can be used over all members of a vector)

make some play data

a <- 2
numbers <- c(3,6,5,4,3)
string <- "just a string"
longer_string <- c("this", "is", "a", "length", "seven", "character", "vector")

core numeric operators and functions

arithmetic

1 + 3
[1] 4
numbers * 5 # they're vectorised
[1] 15 30 25 20 15
4 / 3
[1] 1.333333
5 - numbers
[1]  2 -1  0  1  2
8 ^ 0.5
[1] 2.828427

range operator

The range operator is an easy way of making integer sequences:

1:4
[1] 1 2 3 4
5:2
[1] 5 4 3 2

There’s always a fancier way too:

seq(1,3,0.2)
 [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
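
A short sketch of some other ways to build sequences. Naming the arguments to seq() makes it clearer what each number is doing. seq_len() and rev() are base R functions not used elsewhere in this session:

```r
seq(from = 1, to = 3, by = 0.5)       # 1.0 1.5 2.0 2.5 3.0
seq(from = 0, to = 1, length.out = 5) # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                            # 1 2 3 4, same as 1:4
rev(1:4)                              # 4 3 2 1, same as 4:1
```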

equals/inequality

Really important for lots of programming things

a <- 2
a == 2
[1] TRUE
a < 3
[1] TRUE
a >= 2
[1] TRUE
a != 2
[1] FALSE
numbers > 3
[1] FALSE  TRUE  TRUE  TRUE FALSE
numbers[numbers > 3] # filtering with equalities/inequalities
[1] 6 5 4
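
Filtering gets more useful once you combine conditions. A minimal sketch using the same play data; which() is a base R function that hasn't been introduced above:

```r
numbers <- c(3, 6, 5, 4, 3) # same play data as above

numbers[numbers > 3 & numbers < 6]   # combine conditions with & (and): 5 4
numbers[numbers == 3 | numbers == 6] # or with | (or): 3 6 3
which(numbers > 3)                   # positions, rather than values: 2 3 4
sum(numbers > 3)                     # TRUE counts as 1, so this counts the matches: 3
```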

fancy operator bits

5 %% 3 # modulo: the remainder after division
[1] 2
5 %/% 3 # integer division
[1] 1
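
These two operators look obscure, but they turn up a lot in practice. A sketch of two common uses; the minutes example is made up for illustration:

```r
numbers <- c(3, 6, 5, 4, 3) # same play data as above

numbers %% 2 == 0          # is each number even? FALSE TRUE FALSE TRUE FALSE
numbers[numbers %% 2 == 0] # keep only the even ones: 6 4

# hypothetical example: split 130 minutes into hours and minutes
minutes <- 130
minutes %/% 60 # whole hours: 2
minutes %% 60  # minutes left over: 10
```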

core numeric functions

Note that most of these functions are vectorised, but will require you to use c() if you want to supply your values directly (i.e. if you don’t want to make a variable containing your values first). sum() is a rare exception:

sum(1,5,10) # works okay
[1] 16
sum(c(1,5,10)) # but this works fine too, and is easy
[1] 16
mean(c(1,5,10)) # and is the general way you'll need to work if you're supplying values directly to the function
[1] 5.333333
sqrt(a) # square root
[1] 1.414214
sum(numbers)
[1] 21
cumsum(numbers)
[1]  3  9 14 18 21
sqrt(numbers) # square roots
[1] 1.732051 2.449490 2.236068 2.000000 1.732051
mean(numbers) 
[1] 4.2
median(numbers)
[1] 4
min(numbers)
[1] 3
max(numbers)
[1] 6
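
Real data usually has gaps, which R stores as NA. Most of the functions above will return NA if any value is missing, unless you ask them to remove the missing values first with na.rm = TRUE. A small sketch, using a made-up vector called with_gaps:

```r
with_gaps <- c(3, 6, NA, 4, 3) # made-up data with one missing value

mean(with_gaps)               # NA - one missing value makes the whole answer missing
mean(with_gaps, na.rm = TRUE) # 4 - drop the NA first, then average
sum(with_gaps, na.rm = TRUE)  # 16
is.na(with_gaps)              # FALSE FALSE TRUE FALSE FALSE
sum(is.na(with_gaps))         # how many are missing: 1
```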

For odd reasons, there’s no built-in function to find the statistical mode of some numbers. It can be done, but the code is ugly (and exactly the sort of thing we’d usually avoid in beginner’s sessions). Included here for interest only:

# mode
as.numeric(names(sort(-table(numbers)))[1])
[1] 3
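
If you'd rather have something more readable, one possible approach is to wrap the steps up in a function of your own. This is only a sketch: stat_mode is a made-up name (not a built-in), and this version chooses to return every value that ties for most common, which may or may not be what you want:

```r
# stat_mode is a made-up helper name, not a built-in R function
stat_mode <- function(x) {
  counts <- table(x)                          # how often each value appears
  top <- names(counts)[counts == max(counts)] # the value(s) with the highest count
  as.numeric(top)                             # table names are strings, so convert back
}

stat_mode(c(3, 6, 5, 4, 3)) # 3
stat_mode(c(1, 1, 2, 2, 5)) # a tie returns both modes: 1 2
```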

There are also a few other fairly basic functions that you might find helpful:

sd(numbers) # standard deviation
[1] 1.30384
range(numbers) # min and max in one
[1] 3 6
summary(numbers) # good for rapid numeric summaries
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    3.0     3.0     4.0     4.2     5.0     6.0 
table(numbers) # good for finding out what you've got in more complicated vectors
numbers
3 4 5 6 
2 1 1 1 

interlude: joining code together

There are three main ways of doing this. Traditionally, you’d bracket together several functions, and read from the inside out. Fastest to write, hardest to read and fix:

round(sqrt(c(1,5,10)))
[1] 1 2 3

or you can make intervening variables. Messy, but good if you need to be extra careful:

n <- c(1,5,10)
o <- sqrt(n)
p <- round(o)
p
[1] 1 2 3

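or, probably the best way, pipe the code together. In RStudio, Ctrl + Shift + m (Cmd + Shift + m on a Mac) will give you a pipe symbol. If you get %>% rather than |>, switch on the native pipe option in RStudio's Global Options, under Code: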

c(1,5,10) |>
  sqrt() |>
  round()
[1] 1 2 3

Note that the pipe method doesn’t automatically save your output. You’ll need to assign with <- to do that:

temp <- c(1,5,10) |>
  sqrt() |>
  round()

temp
[1] 1 2 3
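
It might help to see what the pipe is actually doing. In this sketch, each piped line gives the same result as the ordinary line above it. Note that the |> pipe needs R version 4.1 or later:

```r
# x |> f() is just another way of writing f(x)
sqrt(16)     # 4
16 |> sqrt() # 4

# the piped value becomes the FIRST argument; other arguments go in as usual
round(3.14159, digits = 2)   # 3.14
3.14159 |> round(digits = 2) # 3.14

c(1, 5, 10) |>
  sqrt() |>
  round(digits = 1) # 1.0 2.2 3.2
```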

basic string functions

tolower(hw)
[1] "hello world"
toupper(hw)
[1] "HELLO WORLD"
tolower(longer_string)
[1] "this"      "is"        "a"         "length"    "seven"     "character"
[7] "vector"   
toupper(longer_string)
[1] "THIS"      "IS"        "A"         "LENGTH"    "SEVEN"     "CHARACTER"
[7] "VECTOR"   
paste(hw, hw)
[1] "hello world hello world"
paste(string, "ed instrument")
[1] "just a string ed instrument"
paste0("question ", numbers)
[1] "question 3" "question 6" "question 5" "question 4" "question 3"
rep(hw, 10)
 [1] "hello world" "hello world" "hello world" "hello world" "hello world"
 [6] "hello world" "hello world" "hello world" "hello world" "hello world"
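
paste() has two arguments that are worth knowing about: sep controls what goes between the pieces, and collapse squashes a whole vector down into a single string. A minimal sketch:

```r
paste("hello", "world", sep = "-")       # "hello-world"
paste0("hello", "world")                 # "helloworld" - paste0 means no separator
paste(c("a", "b", "c"), collapse = ", ") # "a, b, c" - squash a vector into one string
paste0("Q", 1:3, collapse = " / ")       # "Q1 / Q2 / Q3"
nchar("hello world")                     # number of characters: 11
```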

a few more interesting string functions

strsplit(hw, " ") # split a string into pieces and get a list back
[[1]]
[1] "hello" "world"
strsplit(hw, " ") |> # split and unlist back to vector
  unlist()
[1] "hello" "world"
grep("seven", longer_string) # tell me where in a vector a search term is found
[1] 5
grepl("seven", longer_string) # for each item in a vector, tell me whether it contains the search term
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
sort(longer_string) # into alphabetical order
[1] "a"         "character" "is"        "length"    "seven"     "this"     
[7] "vector"   
table(longer_string) # as with numbers
longer_string
        a character        is    length     seven      this    vector 
        1         1         1         1         1         1         1 
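
The TRUE/FALSE values from grepl() are handy for filtering, in the same way as the inequalities earlier. A sketch using the same play data; gsub() is a base R find-and-replace function not covered above:

```r
longer_string <- c("this", "is", "a", "length", "seven", "character", "vector")

longer_string[grepl("e", longer_string)] # keep items containing "e"
# "length" "seven" "character" "vector"
grep("e", longer_string, value = TRUE)   # same result, using grep's value argument
gsub("e", "E", longer_string)            # replace every "e" with "E"
```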

Don’t repeat your code. Long code is hard to read and understand. Three basic design patterns: the function, the loop, the if/else.

write a function

# basic function syntax
# need to run the definition before calling it

function_name <- function(argument){
  # some code doing something to the argument
  argument + 4 # the function will return the last value it produces
}

function_name(3)
[1] 7
# challenge - I'm bored of writing na.rm = TRUE. Could you make mean() automatically ignore the missing values?

new_mean <- function(x){
  mean(x, na.rm = TRUE)
}

new_mean(c(1,4,2,4,NA))
[1] 2.75
# arguments and defaults
# simple function syntax without {}

multo <- function(n1, n2 = 7) n1 * n2

multo(3)  
[1] 21
multo(3, 6)  
[1] 18
# anonymous function syntax is useful, but way too evil for a beginner's course

(\(x) x^2)(5)
[1] 25
(\(x, y) x^y)(5,3)
[1] 125
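
Putting those pieces together, here's a sketch of a slightly more realistic function with several arguments, a default, and a basic input check. The function name bmi and its argument names are made up for illustration:

```r
# bmi is a made-up example function; the names and defaults are illustrative
bmi <- function(weight_kg, height_m, digits = 1) {
  stopifnot(is.numeric(weight_kg), is.numeric(height_m)) # stop early on bad input
  round(weight_kg / height_m^2, digits)
}

bmi(70, 1.75)                        # 22.9
bmi(height_m = 1.75, weight_kg = 70) # named arguments can go in any order
bmi(c(70, 80), c(1.75, 1.80))        # vectorised for free: 22.9 24.7
```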

write a loop

# basic syntax 
for(i in 1:5){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
# seq_along as a sensible safe way to work with vectors
for(i in seq_along(numbers)){ # seq_along converts a vector into sequential integers 1,2,3,4... up to the length of the vector
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
# indexing is the key to working with vectors inside loops
for(i in seq_along(numbers)){
  print(numbers[i]) # with indexing
}
[1] 3
[1] 6
[1] 5
[1] 4
[1] 3
# need to create output first, then collect

loop_output <- vector(mode = "numeric", length = length(numbers))

for(i in seq_along(numbers)){
  loop_output[i] <- numbers[i] * 9 # insert each bit of output into the right place in the output vector
}

loop_output
[1] 27 54 45 36 27
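
Loops are worth understanding, but for simple jobs like this one R usually offers something shorter. A sketch of two alternatives, plus the reason seq_along() is the safe choice. sapply() is a base R function not covered above, and empty is a made-up name:

```r
numbers <- c(3, 6, 5, 4, 3) # same play data as above

numbers * 9 # the vectorised way: 27 54 45 36 27

# sapply() runs a function on each item and collects the results for you
sapply(numbers, function(n) n * 9) # 27 54 45 36 27

# why seq_along() is safer than 1:length(): try an empty vector
empty <- c()
1:length(empty)  # 1 0 - a loop over this would run twice, wrongly
seq_along(empty) # integer(0) - a loop over this runs zero times
```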

write an if/else

# basic syntax
a <- 3

if(a == 3){
  "a is three"
} else {
  "nope"
}
[1] "a is three"
# even easier if your condition is logical
c <- TRUE

if(c){
  "c is true"
} else {
  "nope"
}
[1] "c is true"
# combine conditions
b <- 2

if(a == 3 | b == 3){
  "a and/or b are three"
} else {
  "nope"
}
[1] "a and/or b are three"
if(a == 3 & b == 3){
  "a and b are both three"
} else {
  "nope"
}
[1] "nope"
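d <- c(4,3)

# d == 3 gives two values (FALSE TRUE), but if() needs exactly one
# from R 4.2.0 onwards this code stops with an error about the condition having length > 1
if(d == 3){
  "if/else isn't vectorised"
} else {
  "so this won't work"
}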
# use the ifelse() function if you need vectorised if/else
d <- c(4,3)

ifelse(d == 3, "...and we'll get this one second", "the first item is false, so we'll get this first")
[1] "the first item is false, so we'll get this first"
[2] "...and we'll get this one second"                
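
If you do want a plain if/else to work with a vector, you'll need to boil the vector of TRUE/FALSE values down to a single value first. A sketch using any() and all(), plus the && and || operators, which are the single-value versions of & and | (none of these four are covered above):

```r
d <- c(4, 3)

# any() and all() boil a vector of TRUE/FALSE down to a single value
if (any(d == 3)) {
  "at least one item is three"
} else {
  "nope"
}

all(d == 3) # FALSE

# && and || are the single-value versions of & and |, meant for if()
a <- 3
b <- 2
a == 3 && b == 3 # FALSE
a == 3 || b == 3 # TRUE
```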

We can bring in external code to help us with R. That external code is known as a package. There are thousands of packages in current use, as the relevant pages on CRAN will tell you.

penguins

We need to install packages before we can use them. That only needs to be done once for your R setup. To illustrate, let’s install a package, called palmerpenguins, which contains some interesting data:

install.packages("palmerpenguins")

Once that package is installed, we can use the data (and functions) it contains by attaching them to our current script:

library(palmerpenguins)
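
If you're not sure whether a package is already installed, one way to check is requireNamespace(), a base R function not covered above. This sketch also shows the package::name shortcut for using one item from a package without attaching it. The variable name have_penguins is made up for illustration:

```r
# requireNamespace() returns TRUE if a package is installed, FALSE if not
have_penguins <- requireNamespace("palmerpenguins", quietly = TRUE)
have_penguins

if (have_penguins) {
  # package::name uses one item without attaching the whole package
  print(nrow(palmerpenguins::penguins))
} else {
  message("palmerpenguins isn't installed yet - run install.packages(\"palmerpenguins\")")
}
```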

Once we’ve done that, we’ll have several new items available to use. The most important here is the main penguins dataset:

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

That’s tabular data - so formed into rows and columns, rectangular (so all columns the same lengths etc), and with each column containing only one type of data. Tabular data is probably the most widely used type of data in R. That means that there are lots of tools for working with it. Some basic examples:

nrow(penguins)
[1] 344
ncol(penguins)
[1] 8
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
names(penguins)
[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"             
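
You can also build tabular data by hand, which is a good way of seeing how it fits together: a table is a set of equal-length vectors, one per column. A sketch in base R, using a made-up data frame called birds:

```r
# a tiny made-up data frame: each column is a vector, all the same length
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  mass_g  = c(3750, 5000, 3700)
)

dim(birds)         # rows then columns: 3 2
str(birds)         # compact summary of each column's type
birds$mass_g       # one column, as a vector: 3750 5000 3700
mean(birds$mass_g) # 4150
```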

dplyr

As well as those base-R functions, there are also many packages for working with tabular data. Probably the best-known package is dplyr, which we install and attach in the same way as palmerpenguins:

install.packages("dplyr")
library(dplyr)

The reason that dplyr is so popular is that some of the base-R ways of working with tabular data are a bit messy and hard to read:

penguins$species[1:4] # just to show the first few
[1] Adelie Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo
penguins[["island"]][1:4] # just the first few, again
[1] Torgersen Torgersen Torgersen Torgersen
Levels: Biscoe Dream Torgersen

dplyr generally produces much easier-to-read code, especially when using the pipe to bring together lines of code:

penguins |>
  select(island) # pick out a column by providing a column name
# A tibble: 344 × 1
   island   
   <fct>    
 1 Torgersen
 2 Torgersen
 3 Torgersen
 4 Torgersen
 5 Torgersen
 6 Torgersen
 7 Torgersen
 8 Torgersen
 9 Torgersen
10 Torgersen
# ℹ 334 more rows
penguins |>
  select(-island) # remove a column
# A tibble: 344 × 7
   species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex   
   <fct>            <dbl>         <dbl>             <int>       <int> <fct> 
 1 Adelie            39.1          18.7               181        3750 male  
 2 Adelie            39.5          17.4               186        3800 female
 3 Adelie            40.3          18                 195        3250 female
 4 Adelie            NA            NA                  NA          NA <NA>  
 5 Adelie            36.7          19.3               193        3450 female
 6 Adelie            39.3          20.6               190        3650 male  
 7 Adelie            38.9          17.8               181        3625 female
 8 Adelie            39.2          19.6               195        4675 male  
 9 Adelie            34.1          18.1               193        3475 <NA>  
10 Adelie            42            20.2               190        4250 <NA>  
# ℹ 334 more rows
# ℹ 1 more variable: year <int>
penguins |>
  select(species, flipper_length_mm, island) # select and re-order columns
# A tibble: 344 × 3
   species flipper_length_mm island   
   <fct>               <int> <fct>    
 1 Adelie                181 Torgersen
 2 Adelie                186 Torgersen
 3 Adelie                195 Torgersen
 4 Adelie                 NA Torgersen
 5 Adelie                193 Torgersen
 6 Adelie                190 Torgersen
 7 Adelie                181 Torgersen
 8 Adelie                195 Torgersen
 9 Adelie                193 Torgersen
10 Adelie                190 Torgersen
# ℹ 334 more rows
penguins |>
  select(home_island = island) # select and rename
# A tibble: 344 × 1
   home_island
   <fct>      
 1 Torgersen  
 2 Torgersen  
 3 Torgersen  
 4 Torgersen  
 5 Torgersen  
 6 Torgersen  
 7 Torgersen  
 8 Torgersen  
 9 Torgersen  
10 Torgersen  
# ℹ 334 more rows

A note here: the penguins object that we’re working with is technically called a tibble, which is a tidier kind of data frame. dplyr is built to work with tibbles and ordinary data frames, and most of its functions won’t work on other kinds of data structure, like plain vectors or lists. The main idea underlying dplyr is that the many functions it contains should all work consistently, and work well together. So once you’ve got the hang of select there’s not much new to say about filter, which picks rows based on their values:

penguins |>
  filter(species == "Adelie") # find some rows about Adelie penguins
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 142 more rows
# ℹ 2 more variables: sex <fct>, year <int>
penguins |>
  filter(bill_length_mm > 55) # find the big bills
# A tibble: 5 × 8
  species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>     <fct>           <dbl>         <dbl>             <int>       <int>
1 Gentoo    Biscoe           59.6          17                 230        6050
2 Gentoo    Biscoe           55.9          17                 228        5600
3 Gentoo    Biscoe           55.1          16                 230        5850
4 Chinstrap Dream            58            17.8               181        3700
5 Chinstrap Dream            55.8          19.8               207        4000
# ℹ 2 more variables: sex <fct>, year <int>
penguins |>
  filter(is.na(bill_length_mm)) # find missing data
# A tibble: 2 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen             NA            NA                NA          NA
2 Gentoo  Biscoe                NA            NA                NA          NA
# ℹ 2 more variables: sex <fct>, year <int>
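
filter can take several conditions at once. Here's a sketch using a tiny made-up data frame called birds, so that the results are easy to check by eye. It assumes dplyr is installed:

```r
library(dplyr)

# made-up data, for illustration only
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  island  = c("Torgersen", "Biscoe", "Dream", "Biscoe"),
  mass_g  = c(3750, 5000, 3700, NA)
)

birds |> filter(species == "Gentoo", mass_g > 4000)      # comma means "and": 1 row
birds |> filter(species == "Adelie" | island == "Dream") # | means "or": 2 rows
birds |> filter(species %in% c("Adelie", "Chinstrap"))   # match any of several values: 2 rows
birds |> filter(!is.na(mass_g))                          # keep rows that AREN'T missing: 3 rows
```

Note that the Gentoo with a missing mass is dropped by the first filter: a comparison with NA is neither TRUE nor FALSE, and filter only keeps rows where the condition is TRUE.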

And mutate - which makes new columns - will work in the same way:

penguins |>
  mutate(new_col = 11) |> # every row the same
  select(species, new_col) # so that we can see the new values in the preview
# A tibble: 344 × 2
   species new_col
   <fct>     <dbl>
 1 Adelie       11
 2 Adelie       11
 3 Adelie       11
 4 Adelie       11
 5 Adelie       11
 6 Adelie       11
 7 Adelie       11
 8 Adelie       11
 9 Adelie       11
10 Adelie       11
# ℹ 334 more rows
penguins |>
  mutate(bill_vol = bill_length_mm * bill_depth_mm^2) |> # some calculation
  select(species, bill_vol)
# A tibble: 344 × 2
   species bill_vol
   <fct>      <dbl>
 1 Adelie    13673.
 2 Adelie    11959.
 3 Adelie    13057.
 4 Adelie       NA 
 5 Adelie    13670.
 6 Adelie    16677.
 7 Adelie    12325.
 8 Adelie    15059.
 9 Adelie    11172.
10 Adelie    17138.
# ℹ 334 more rows
penguins |>
  mutate(label = paste("From", island, "island, a penguin of the species", species)) |>
  select(label, body_mass_g) # mutate and then select. You can use your new columns immediately.
# A tibble: 344 × 2
   label                                                  body_mass_g
   <chr>                                                        <int>
 1 From Torgersen island, a penguin of the species Adelie        3750
 2 From Torgersen island, a penguin of the species Adelie        3800
 3 From Torgersen island, a penguin of the species Adelie        3250
 4 From Torgersen island, a penguin of the species Adelie          NA
 5 From Torgersen island, a penguin of the species Adelie        3450
 6 From Torgersen island, a penguin of the species Adelie        3650
 7 From Torgersen island, a penguin of the species Adelie        3625
 8 From Torgersen island, a penguin of the species Adelie        4675
 9 From Torgersen island, a penguin of the species Adelie        3475
10 From Torgersen island, a penguin of the species Adelie        4250
# ℹ 334 more rows

As before, we need to assign with <- to save our changes. Let’s add the bill_vol column to the data now:

penguins_vol <- penguins |>
  mutate(bill_vol = bill_length_mm * bill_depth_mm^2)

arrange sorts the rows by the values in a column, smallest first:

penguins_vol |>
  arrange(bill_vol)
# A tibble: 344 × 9
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
 1 Gentoo  Biscoe           42.9          13.1               215        5000
 2 Gentoo  Biscoe           42            13.5               210        4150
 3 Gentoo  Biscoe           40.9          13.7               214        4650
 4 Adelie  Dream            32.1          15.5               188        3050
 5 Gentoo  Biscoe           43.3          13.4               209        4400
 6 Gentoo  Biscoe           44.9          13.3               213        5100
 7 Gentoo  Biscoe           42.6          13.7               213        4950
 8 Gentoo  Biscoe           42.7          13.7               208        3950
 9 Gentoo  Biscoe           46.1          13.2               211        4500
10 Gentoo  Biscoe           44            13.6               208        4350
# ℹ 334 more rows
# ℹ 3 more variables: sex <fct>, year <int>, bill_vol <dbl>
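
The dplyr functions are designed to chain together with the pipe. A sketch of a small pipeline on a made-up data frame called birds, including desc(), which is dplyr's way of sorting largest-first. It assumes dplyr is installed, and the name heaviest_first is made up for illustration:

```r
library(dplyr)

# a tiny made-up data frame to show the verbs working together
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  mass_g  = c(3750, 5000, 3700, NA)
)

heaviest_first <- birds |>
  filter(!is.na(mass_g)) |>          # drop rows with missing mass
  mutate(mass_kg = mass_g / 1000) |> # new column from an old one
  arrange(desc(mass_kg)) |>          # desc() sorts largest first
  select(species, mass_kg)           # keep just the columns we want to see

heaviest_first
```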

The nice thing about dplyr is that there are several other packages which work in similar ways. This package ecosystem gets called the tidyverse, and is extremely widely used to do data science work in R. A close relative of dplyr is the readr package, which reads in data to R and makes it into tibbles:

install.packages("readr")
library(readr)

penguins_raw <- read_csv("https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins_raw.csv")