R lists

R
beginner
lists
data structures
Published

September 20, 2024

Introduction

  • lists are one of R’s built-in data structures
  • lists are roughly equivalent to a vector of vectors: each element of a list can itself be a vector (or any other R object)
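  • a minimal sketch of what "a vector of vectors" means in practice, using base R only (the object names atomic and nested are made up for illustration):

```r
# an atomic vector holds values of a single type
atomic <- c(1, 2, 3)

# a list holds other objects, here three vectors of different types and lengths
nested <- list(c(1, 2, 3), c("a", "b"), TRUE)

is.vector(atomic)  # TRUE
is.vector(nested)  # TRUE: a list counts as a (generic) vector too
is.list(atomic)    # FALSE
is.list(nested)    # TRUE
length(nested)     # 3: one per element, however long each element is
```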

Make and change lists

  • you can construct a list in a couple of different ways:
list("1", "2")
[[1]]
[1] "1"

[[2]]
[1] "2"
empty <- vector("list", 3L)
empty
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL
  • you can assign into lists
empty[[2]] <- 9
empty # ironic
[[1]]
NULL

[[2]]
[1] 9

[[3]]
NULL
empty[[2]] <- NULL # assigning NULL removes the element, so the list shrinks to length 2
empty # balance restored
[[1]]
NULL

[[2]]
NULL
  • lists are ordered:
simple_list <- list("one", "two", "three")
simple_list[1]
[[1]]
[1] "one"
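  • assigning NULL with double brackets deletes an element, as above. If you'd rather blank a slot and keep the list's length, one option is single brackets with list(NULL). A small sketch (slots is a made-up name):

```r
# a fresh three-slot list, built the same way as `empty`
slots <- vector("list", 3L)
slots[[2]] <- 9

# single brackets + list(NULL): the slot is emptied but kept
slots[2] <- list(NULL)
length(slots)  # 3

# double brackets + NULL: the slot is deleted
slots[[2]] <- NULL
length(slots)  # 2
```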

So what’s good about lists?

  • so far, so like vectors
  • unlike vectors and data frames/tibbles, lists can be ragged: they can store values of different lengths and types together
  • let’s start with some vectors of different types:
hw <- "hello world"
hi <- substr(hw, 1, 5)
hh <- c(hi, hw)
hcount <- sum(nchar(c(hh, hi, hw)))
ing_things <- factor(c("thing", "string", "wing", "bling"), levels = c("wing", "bling", "string", "thing"))
  • now build a list, and discover that you can store these dissimilar items together
list(hh, hi, hw, ing_things, hcount)
[[1]]
[1] "hello"       "hello world"

[[2]]
[1] "hello"

[[3]]
[1] "hello world"

[[4]]
[1] thing  string wing   bling 
Levels: wing bling string thing

[[5]]
[1] 32
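  • to check how ragged a list is, lengths() and sapply(x, class) give a quick summary. A self-contained sketch that rebuilds the objects above (mixed is a made-up name for the list):

```r
hw <- "hello world"
hi <- substr(hw, 1, 5)
hh <- c(hi, hw)
hcount <- sum(nchar(c(hh, hi, hw)))
ing_things <- factor(c("thing", "string", "wing", "bling"),
                     levels = c("wing", "bling", "string", "thing"))

mixed <- list(hh, hi, hw, ing_things, hcount)

# length of each element
lengths(mixed)        # 2 1 1 4 1

# class of each element (nchar() returns integers, so their sum is an integer)
sapply(mixed, class)  # "character" "character" "character" "factor" "integer"
```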

Names

  • you can name list items
named_list <- list("hh" = hh, 
                   "hi" = hi, 
                   "hw" = hw, 
                   "silly_name" = ing_things,
                   "total_letter_score" = hcount)
  • it turns out that you already knew that: df/tibbles are a special case of lists. That means that some familiar friends will work with lists - like using $ to get at named list items:
named_list$hw
[1] "hello world"
named_list$hi
[1] "hello"
paste("we've got a total of", named_list$total_letter_score, "characters in our list, including", named_list$silly_name)
[1] "we've got a total of 32 characters in our list, including thing" 
[2] "we've got a total of 32 characters in our list, including string"
[3] "we've got a total of 32 characters in our list, including wing"  
[4] "we've got a total of 32 characters in our list, including bling" 
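  • one thing to watch with $: asking for a name that isn't in the list returns NULL rather than an error, which is easy to miss inside something like paste(). A small sketch with a made-up list called scores:

```r
scores <- list(hi = "hello", total_letter_score = 32)

names(scores)                # "hi" "total_letter_score"

scores$hcount                # NULL: no such name, and no error either
"hcount" %in% names(scores)  # FALSE: a cheap check before relying on a name

scores$total                 # 32: $ also partially matches names
scores[["total"]]            # NULL: [[ ]] wants the exact name by default
scores[["total_letter_score"]]  # 32
```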

Indexing lists

  • there’s some additional care required when working with indexes and lists though:
named_list$hi # a vector
[1] "hello"
named_list[2] # that's a list
$hi
[1] "hello"
class(mtcars[2]) # likewise, a 1-column df
[1] "data.frame"
  • two easy ways of getting vectors out of lists
named_list$silly_name # like df
[1] thing  string wing   bling 
Levels: wing bling string thing
named_list[[2]] # double brackets
[1] "hello"
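  • a compact side-by-side of the two bracket styles, as a sketch on a made-up two-item list called parts:

```r
parts <- list(hi = "hello", hw = "hello world")

# single brackets always give back a list (possibly a shorter one)
class(parts["hi"])            # "list"
length(parts[c("hi", "hw")])  # 2: you can select several elements at once

# double brackets give back the element itself
class(parts[["hi"]])          # "character"
toupper(parts[["hi"]])        # "HELLO"
```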

Unmaking lists

  • if you’re not too fussy about the resulting structure, you can flatten a list into a named vector
unlist(named_list)
               hh1                hh2                 hi                 hw 
           "hello"      "hello world"            "hello"      "hello world" 
       silly_name1        silly_name2        silly_name3        silly_name4 
               "4"                "3"                "1"                "2" 
total_letter_score 
              "32" 
  • each item is named by a concatenation of the original list item name (like “silly_name”) and an index number - so “silly_name4”. You can retrieve elements from the vector by these names:
unlist(named_list)["silly_name4"]
silly_name4 
        "2" 
  • note that everything has been coerced to character, and the factor appears as its integer codes rather than its labels - a reminder that vectors can only contain a single data type
  • you can also turn a list with equal/similar length items into a tidy array like a tibble:
tidyr::tibble(glurt = list(hh, hh, hh)) |>
  tidyr::unnest_wider(col = glurt, names_sep = "_")
# A tibble: 3 × 2
  glurt_1 glurt_2    
  <chr>   <chr>      
1 hello   hello world
2 hello   hello world
3 hello   hello world
  • things get much messier when you’re dealing with lists with mixed lengths and types. That’s beyond us today - have a look at R for data science for an introduction to rectangling
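  • if you'd like factor labels rather than integer codes in the flattened vector, one possible approach is to convert each item to character before calling unlist(). A sketch on a made-up list called small_list:

```r
ing_things <- factor(c("thing", "string", "wing", "bling"),
                     levels = c("wing", "bling", "string", "thing"))
small_list <- list(hi = "hello", silly_name = ing_things)

# plain unlist(): the factor is flattened to its integer codes
codes <- unlist(small_list)
unname(codes)   # "hello" "4" "3" "1" "2"

# convert each item to character first: the labels survive
labels <- unlist(lapply(small_list, as.character))
unname(labels)  # "hello" "thing" "string" "wing" "bling"
```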

What are lists for?

Lists are particularly useful in three cases:

  1. where you don’t know what type of output you’re going to get. This is the reason that purrr::map returns a list by default - you can collect any output type in a list.
rando <- function(n){
  ifelse(runif(1) > 0.5, list(n), as.character(n)) # sometimes a list holding a number, sometimes a character string
}
rando(4)
[[1]]
[1] 4
c(4,3,rando(8)) # unfortunate coercion that might happen with vectors
[1] "4" "3" "8"
list(4,3,rando(8)) # happier and safer in a list
[[1]]
[1] 4

[[2]]
[1] 3

[[3]]
[[3]][[1]]
[1] 8
  2. when you’ve got several related bits of data you want to keep together, especially if you’d like to save them to a single portable object:
named_list |>
  readr::write_rds("data/named_list.rds")
rm(list = ls()) # clear environment

named_list <- readr::read_rds("data/named_list.rds")

named_list |>
  list2env(envir = .GlobalEnv) # don't tell any proper programmers about this
<environment: R_GlobalEnv>
  3. when you’re dealing with wildly complicated data structures. There’s a nice example in R for data science, which is too long-winded for a practical session like this, but shows what real-world horrible data looks like:
repurrrsive::gmaps_cities |> # list columns on list columns
  tidyr::unnest_wider(col = json) |>
  tidyr::unnest_longer(col = results) |>
  tidyr::unnest_wider(col = results) 
# A tibble: 7 × 7
  city  address_components formatted_address geometry     place_id types  status
  <chr> <list>             <chr>             <list>       <chr>    <list> <chr> 
1 Hous… <list [4]>         Houston, TX, USA  <named list> ChIJAYW… <list> OK    
2 Wash… <list [2]>         Washington, USA   <named list> ChIJ-bD… <list> OK    
3 Wash… <list [4]>         Washington, DC, … <named list> ChIJW-T… <list> OK    
4 New … <list [3]>         New York, NY, USA <named list> ChIJOwg… <list> OK    
5 Chic… <list [4]>         Chicago, IL, USA  <named list> ChIJ7cv… <list> OK    
6 Arli… <list [4]>         Arlington, TX, U… <named list> ChIJ05g… <list> OK    
7 Arli… <list [4]>         Arlington, VA, U… <named list> ChIJD6e… <list> OK    
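
The save-and-restore idea from the second case can be tried without touching your project folder. This is a sketch only: it uses base R's saveRDS() and readRDS() (which readr's write_rds() and read_rds() wrap), a temporary file, and a made-up list called bundle:

```r
# several related bits of data kept together in one object
bundle <- list(greeting = "hello world", counts = c(5L, 11L))

# write to a throwaway file rather than a real project path
path <- tempfile(fileext = ".rds")
saveRDS(bundle, path)

# read it back and check nothing changed on the way
restored <- readRDS(path)
identical(bundle, restored)  # TRUE
```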

Again, this goes beyond the session’s aims, but the purrr package is a great help for this kind of nested-nested-nested data:

repurrrsive::gmaps_cities |>
  purrr::pluck(2,1,1,1,"formatted_address")
[1] "Houston, TX, USA"
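
For comparison, the same kind of lookup can be done in base R by chaining double brackets. The sketch below uses a tiny made-up list, cities, that is only loosely shaped like the real data (an assumption for illustration, not the actual gmaps_cities object):

```r
# hypothetical stand-in: a city vector plus a nested "json" element per city
cities <- list(
  city = c("Houston", "Chicago"),
  json = list(
    list(results = list(list(formatted_address = "Houston, TX, USA")),
         status = "OK"),
    list(results = list(list(formatted_address = "Chicago, IL, USA")),
         status = "OK")
  )
)

# step down one level at a time with [[ ]]
cities[[2]][[1]][[1]][[1]][["formatted_address"]]  # "Houston, TX, USA"

# [[ ]] also accepts a vector of positions and walks down recursively
cities[[c(2, 1, 1, 1)]][["formatted_address"]]     # "Houston, TX, USA"
```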