R lists
R
beginner
lists
data structures
Introduction
- lists are one of R’s built-in data structures
- lists are roughly equivalent to a vector of vectors
Make and change lists
- list construction can be done in a couple of different ways:
list("1", "2")
[[1]]
[1] "1"
[[2]]
[1] "2"
empty <- vector("list", 3L)
empty
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
- you can assign into lists
empty[[2]] <- 9
empty # ironic
[[1]]
NULL
[[2]]
[1] 9
[[3]]
NULL
empty[[2]] <- NULL
empty # balance restored
[[1]]
NULL
[[2]]
NULL
- lists are ordered:
simple_list <- list("one", "two", "three")
simple_list[1]
[[1]]
[1] "one"
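One detail from the assignment example above is worth a closer look: assigning NULL with double brackets removes the element, which is why the list came back with two items rather than three. A minimal sketch of the difference, using illustrative names (shrunk, kept) that are not part of the session code:

```r
# Start with a list of three NULL placeholders and fill the middle one
empty <- vector("list", 3L)
empty[[2]] <- 9

# Assigning NULL with [[ ]] deletes the element, so the list shrinks
shrunk <- empty
shrunk[[2]] <- NULL
length(shrunk)

# Assigning list(NULL) with [ ] keeps the slot and stores NULL in it
kept <- empty
kept[2] <- list(NULL)
length(kept)
```

So if you want an empty placeholder rather than a shorter list, the single-bracket form with list(NULL) is the one to reach for.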
So what’s good about lists?
- so far, so like vectors
- unlike vectors and df/tibbles, lists are ragged arrays - can store different lengths and kinds of values together
- let’s start with some vectors of different types:
hw <- "hello world"
hi <- substr(hw, 1, 5)
hh <- c(hi, hw)
hcount <- sum(nchar(c(hh, hi, hw)))
ing_things <- factor(c("thing", "string", "wing", "bling"), levels = c("wing", "bling", "string", "thing"))
- now build a list, and discover that you can store these dissimilar items together
list(hh, hi, hw, ing_things, hcount)
[[1]]
[1] "hello" "hello world"
[[2]]
[1] "hello"
[[3]]
[1] "hello world"
[[4]]
[1] thing string wing bling
Levels: wing bling string thing
[[5]]
[1] 32
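To check what ended up in a mixed list like this, it can help to ask each item for its class and length. A small sketch using base R only, with a made-up list (mixed) standing in for the one above:

```r
# A list holding a character vector, a number, and a factor
mixed <- list(c("hello", "hello world"), 32, factor(c("wing", "bling")))

# One class per item: the items keep their own types
item_classes <- vapply(mixed, function(x) class(x)[1], character(1))
item_classes

# One length per item: the items do not need to be the same length
item_lengths <- lengths(mixed)
item_lengths
```

str(mixed) gives a similar overview in a single call.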
Names
- you can name list items
named_list <- list("hh" = hh,
                   "hi" = hi,
                   "hw" = hw,
                   "silly_name" = ing_things,
                   "total_letter_score" = hcount)
- it turns out that you already knew that: df/tibbles are a special case of lists. That means that some familiar friends will work with lists, like using $ to get at named list items:
named_list$hw
[1] "hello world"
named_list$hi
[1] "hello"
paste("we've got a total of", named_list$total_letter_score, "characters in our list, including", named_list$silly_name)
[1] "we've got a total of 32 characters in our list, including thing"
[2] "we've got a total of 32 characters in our list, including string"
[3] "we've got a total of 32 characters in our list, including wing"
[4] "we've got a total of 32 characters in our list, including bling"
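One thing to watch with $ on lists: asking for a name that does not exist returns NULL rather than an error, so a mistyped name can slip through quietly. A short sketch, using a made-up list (scores) and a deliberately wrong name:

```r
scores <- list(total_letter_score = 32, hi = "hello")

# The names a list actually has
names(scores)

# A name that is not in the list gives NULL, with no warning
missing_item <- scores$hcount
is.null(missing_item)

# $ also does partial matching on list names; [[ ]] does not by default
partial <- scores$total
exact_only <- scores[["total"]]
```

If a $ lookup comes back empty, checking names() is usually the quickest way to find out why.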
Indexing lists
- there’s some additional care required when working with indexes and lists though:
named_list$hi # a vector
[1] "hello"
named_list[2] # that's a list
$hi
[1] "hello"
class(mtcars[2]) # likewise, a 1-column df
[1] "data.frame"
- two easy ways of getting vectors out of lists
named_list$silly_name # like df
[1] thing string wing bling
Levels: wing bling string thing
named_list[[2]] # double brackets
[1] "hello"
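The single versus double bracket distinction above can be checked directly. A minimal sketch with a small stand-in list (demo), so it runs on its own:

```r
demo <- list(hh = c("hello", "hello world"), hi = "hello", hw = "hello world")

# Single brackets always return a list, even for one item
single <- demo[2]
class(single)

# Double brackets (or $) return the item itself
double <- demo[[2]]
class(double)

# Single brackets can select several items at once; double brackets cannot
several <- demo[c("hi", "hw")]
names(several)

# $name and [["name"]] get the same thing
identical(demo$hi, demo[["hi"]])
```

A rough rule of thumb: use single brackets to get a smaller list, and double brackets to get at the contents.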
Unmaking lists
- if you’re not too fussy about the resulting structure, you can flatten a list into a named vector
unlist(named_list)
hh1 hh2 hi hw
"hello" "hello world" "hello" "hello world"
silly_name1 silly_name2 silly_name3 silly_name4
"4" "3" "1" "2"
total_letter_score
"32"
- each item is named by a concatenation of the original list item name (like “silly_name”) and an index number - so “silly_name4”. You can retrieve elements from the vector by these names:
unlist(named_list)["silly_name4"]
silly_name4
"2"
- note that everything has been coerced to character - a reminder that vectors can only contain a single data type
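The factor in the output above shows up as "4", "3", "1", "2" rather than as its labels, because unlist() treats a factor as its integer codes when it is mixed with non-factor items. If you would rather keep the labels, one option is to convert factors to character before flattening. A sketch, with a cut-down stand-in list (mixed) and an illustrative name (labelled):

```r
ing_things <- factor(c("thing", "string", "wing", "bling"),
                     levels = c("wing", "bling", "string", "thing"))
mixed <- list(hi = "hello", silly_name = ing_things)

# Flattening directly: the factor appears as its integer codes
codes <- unlist(mixed)
codes

# Converting factors to character first keeps the labels
labelled <- lapply(mixed, function(x) if (is.factor(x)) as.character(x) else x)
labels_kept <- unlist(labelled)
labels_kept
```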
- you can also turn a list with equal/similar length items into a tidy array like a tibble:
tibble::tibble(glurt = list(hh, hh, hh)) |>
  tidyr::unnest_wider(col = glurt, names_sep = "_")
# A tibble: 3 × 2
glurt_1 glurt_2
<chr> <chr>
1 hello hello world
2 hello hello world
3 hello hello world
- things get much more messy when you’re dealing with lists with mixed lengths and types. That’s beyond us today - have a look at R for data science for an introduction to rectangling
What are lists for?
Lists are particularly useful in three cases:
- where you don’t know what type of output you’re going to get. This is the reason that purrr::map returns a list by default - you can collect any output type in a list.
rando <- function(n){
  ifelse(runif(1) > 0.5, list(n), as.character(n)) # sometimes numeric, sometimes char
}
rando(4)
[[1]]
[1] 4
c(4,3,rando(8)) # unfortunate coercion that might happen with vectors
[1] "4" "3" "8"
list(4,3,rando(8)) # happier and safer in a list
[[1]]
[1] 4
[[2]]
[1] 3
[[3]]
[[3]][[1]]
[1] 8
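Because rando() is random, the output above will differ from run to run. Here is a deterministic sketch of the same idea, assuming the purrr package is installed. The helper safe_sqrt() is made up for illustration: it returns a number for numeric input and a string otherwise, and purrr::map() collects both kinds of result in a list without coercing them.

```r
# Hypothetical helper: numeric result for numbers, a message for anything else
safe_sqrt <- function(x) {
  if (is.numeric(x)) sqrt(x) else "not a number"
}

# map() returns a list, so each result keeps its own type
results <- purrr::map(list(4, "a", 9), safe_sqrt)
result_classes <- vapply(results, function(x) class(x)[1], character(1))
result_classes

# When you do know the output type, a typed variant gives a plain vector
roots <- purrr::map_dbl(c(4, 9, 16), sqrt)
roots
```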
- when you’ve got several related bits of data you want to keep together, especially if you’d like to save them to a single portable object:
named_list |>
  readr::write_rds("data/named_list.rds")
rm(list = ls()) # clear environment
named_list <- readr::read_rds("data/named_list.rds")

named_list |>
  list2env(envir = .GlobalEnv) # don't tell any proper programmers about this
<environment: R_GlobalEnv>
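If you want to try the save-and-restore round trip without touching your data folder or clearing your environment, here is a sketch using base R's saveRDS() and readRDS() (which readr's write_rds() and read_rds() wrap) and a temporary file. The list (bundle) and its contents are made up for illustration. It also uses with() as a gentler alternative to list2env(): the list items are available by name inside the call without being copied into the global environment.

```r
# A small bundle of related values
bundle <- list(hw = "hello world", total = 32)

# Write to a temporary file, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(bundle, path)
restored <- readRDS(path)
round_trip_ok <- identical(bundle, restored)
round_trip_ok

# Use the list items by name without unpacking them into the environment
combined <- with(restored, nchar(hw) + total)
combined
```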
- when you’re dealing with wildly complicated data structures. Nice example from R for data science, which is too long-winded for a practical session like this, but a good example of real-world horrible data
repurrrsive::gmaps_cities |> # list columns on list columns
  tidyr::unnest_wider(col = json) |>
  tidyr::unnest_longer(col = results) |>
  tidyr::unnest_wider(col = results)
# A tibble: 7 × 7
city address_components formatted_address geometry place_id types status
<chr> <list> <chr> <list> <chr> <list> <chr>
1 Hous… <list [4]> Houston, TX, USA <named list> ChIJAYW… <list> OK
2 Wash… <list [2]> Washington, USA <named list> ChIJ-bD… <list> OK
3 Wash… <list [4]> Washington, DC, … <named list> ChIJW-T… <list> OK
4 New … <list [3]> New York, NY, USA <named list> ChIJOwg… <list> OK
5 Chic… <list [4]> Chicago, IL, USA <named list> ChIJ7cv… <list> OK
6 Arli… <list [4]> Arlington, TX, U… <named list> ChIJ05g… <list> OK
7 Arli… <list [4]> Arlington, VA, U… <named list> ChIJD6e… <list> OK
Again, getting beyond the session’s aims, but the purrr package is a great help for this kind of nested-nested-nested data:
repurrrsive::gmaps_cities |>
  purrr::pluck(2,1,1,1,"formatted_address")
[1] "Houston, TX, USA"
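To see what pluck() is doing without installing repurrrsive, here is a sketch on a tiny hand-made nested list (cities), loosely shaped like the data above. The structure is invented for illustration and is not the real gmaps_cities layout. It assumes purrr is installed.

```r
# Hypothetical nested data: a list of cities, each with a list of results
cities <- list(
  list(city = "Houston",
       results = list(list(formatted_address = "Houston, TX, USA")))
)

# pluck() walks down the nesting one step per argument
by_pluck <- purrr::pluck(cities, 1, "results", 1, "formatted_address")

# The same path written with chained double brackets
by_brackets <- cities[[1]][["results"]][[1]][["formatted_address"]]
identical(by_pluck, by_brackets)

# Where the path does not exist, pluck() returns a default instead of an error
not_there <- purrr::pluck(cities, 1, "results", 2, "formatted_address", .default = NA)
not_there
```

The default-on-missing behaviour is the main practical difference from chained brackets, which would stop with a "subscript out of bounds" error on the last line's path.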