R lists

beginner

lists

data structures

Published

September 20, 2024

Introduction

lists are one of R’s built-in data structures
a list is roughly a vector whose items can themselves be vectors, or any other R object

Make and change lists

list construction can be done in a couple of different ways:

list("1", "2")

[[1]]
[1] "1"

[[2]]
[1] "2"

empty <- vector("list", 3L)
empty

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

you can assign into lists

empty[[2]] <- 9
empty # ironic

[[1]]
NULL

[[2]]
[1] 9

[[3]]
NULL

empty[[2]] <- NULL # assigning NULL removes the item altogether
empty # balance restored, but the list is now one item shorter

[[1]]
NULL

[[2]]
NULL
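
Assigning NULL with double brackets deletes the item rather than storing a NULL in it, which is why the list above went from three items to two. A minimal sketch of the difference, using a throwaway list called `slots` (a name made up for this illustration):

```r
slots <- vector("list", 3L)  # three NULL placeholders
slots[[2]] <- 9

# double-bracket assignment of NULL removes the item, so the list shrinks
slots[[2]] <- NULL
length(slots)                # 2

# to keep the slot and store an actual NULL, assign a list containing NULL
slots <- vector("list", 3L)
slots[[2]] <- 9
slots[2] <- list(NULL)
length(slots)                # 3
```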

lists are ordered:

simple_list <- list("one", "two", "three")
simple_list[1]

[[1]]
[1] "one"

So what’s good about lists?

so far, so like vectors
unlike vectors and df/tibbles, lists are ragged arrays: they can store values of different lengths and kinds together
let’s start with some vectors of different types:

hw <- "hello world"
hi <- substr(hw, 1, 5)
hh <- c(hi, hw)
hcount <- sum(nchar(c(hh, hi, hw)))
ing_things <- factor(c("thing", "string", "wing", "bling"), levels = c("wing", "bling", "string", "thing"))

now build a list, and discover that you can store these dissimilar items together

list(hh, hi, hw, ing_things, hcount)

[[1]]
[1] "hello"       "hello world"

[[2]]
[1] "hello"

[[3]]
[1] "hello world"

[[4]]
[1] thing  string wing   bling 
Levels: wing bling string thing

[[5]]
[1] 32
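
When a list holds dissimilar items like this, it helps to be able to ask what is in each slot. A small sketch using base R only; `mixed` is a name invented here for the list built above:

```r
hw <- "hello world"
hi <- substr(hw, 1, 5)
hh <- c(hi, hw)
hcount <- sum(nchar(c(hh, hi, hw)))
ing_things <- factor(c("thing", "string", "wing", "bling"),
                     levels = c("wing", "bling", "string", "thing"))

mixed <- list(hh, hi, hw, ing_things, hcount)

lengths(mixed)                                   # number of values in each item
item_classes <- vapply(mixed, class, character(1))
item_classes                                     # class of each item
str(mixed)                                       # compact overview of the whole list
```

`lengths()` (with an s) works item by item, whereas `length()` just counts the items in the list.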

Names

you can name list items

named_list <- list("hh" = hh, 
                   "hi" = hi, 
                   "hw" = hw, 
                   "silly_name" = ing_things,
                   "total_letter_score" = hcount)

it turns out that you already knew that: df/tibbles are a special case of lists. That means that some familiar friends will work with lists, like using $ to get at named list items:

named_list$hw

[1] "hello world"

named_list$hi

[1] "hello"

paste("we've got a total of", named_list$total_letter_score, "characters in our list, including", named_list$silly_name)

[1] "we've got a total of 32 characters in our list, including thing" 
[2] "we've got a total of 32 characters in our list, including string"
[3] "we've got a total of 32 characters in our list, including wing"  
[4] "we've got a total of 32 characters in our list, including bling"
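
Names can also be inspected and added after the list is built. A short sketch, using a cut-down stand-in for the list above; the item name `shout` is made up for illustration. One thing to watch: asking for a name that is not in the list returns NULL quietly instead of raising an error.

```r
named_list <- list(hi = "hello", hw = "hello world")  # cut-down stand-in

names(named_list)                            # "hi" "hw"
named_list$shout <- toupper(named_list$hw)   # add a new named item
names(named_list)                            # "hi" "hw" "shout"
named_list[["hw"]]                           # same item as named_list$hw

# hcount was the name of a variable, not of a list item, so this is NULL
is.null(named_list$hcount)                   # TRUE
```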

Indexing lists

there’s some additional care required when working with indexes and lists though:

named_list$hi # a vector

[1] "hello"

named_list[2] # that's a list

$hi
[1] "hello"

class(mtcars[2]) # likewise, a 1-column df

[1] "data.frame"

two easy ways of getting vectors out of lists

named_list$silly_name # like df

[1] thing  string wing   bling 
Levels: wing bling string thing

named_list[[2]] # double brackets

[1] "hello"
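
The difference between single and double brackets can be checked directly. A minimal sketch, again with a cut-down stand-in for the list above:

```r
named_list <- list(hh = c("hello", "hello world"), hi = "hello")  # cut-down stand-in

is.list(named_list["hi"])      # TRUE: single brackets keep the list wrapper
is.list(named_list[["hi"]])    # FALSE: double brackets return the item itself

named_list[c("hh", "hi")]      # single brackets can select several items at once
named_list[["hh"]][2]          # second value inside the hh item: "hello world"
```

A rough rule of thumb: use single brackets when you want a smaller list, and double brackets (or $) when you want the thing inside.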

Unmaking lists

if you’re not too fussy about the resulting structure, you can flatten a list into a named vector

unlist(named_list)

               hh1                hh2                 hi                 hw 
           "hello"      "hello world"            "hello"      "hello world" 
       silly_name1        silly_name2        silly_name3        silly_name4 
               "4"                "3"                "1"                "2" 
total_letter_score 
              "32"

items that held more than one value are named by concatenating the original list item name (like “silly_name”) with an index number, giving “silly_name4”; items that held a single value keep their original name. You can retrieve elements from the vector by these names:

unlist(named_list)["silly_name4"]

silly_name4 
        "2"

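note that everything has been coerced to character, and the factor shows up as its underlying integer codes ("4", "3", "1", "2") rather than its labels - a reminder that vectors can only contain a single data type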
you can also turn a list with equal/similar length items into a tidy array like a tibble:

tidyr::tibble(glurt = list(hh, hh, hh)) |>
  tidyr::unnest_wider(col = glurt, names_sep = "_")

# A tibble: 3 × 2
  glurt_1 glurt_2    
  <chr>   <chr>      
1 hello   hello world
2 hello   hello world
3 hello   hello world
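
For a list whose items all have the same length, base R can do a similar job without tidyr. A rough sketch, assuming a character matrix (or a plain data frame) is an acceptable result; `rows` and `mat` are names invented here:

```r
hh <- c("hello", "hello world")
rows <- list(hh, hh, hh)

mat <- do.call(rbind, rows)    # stack the three vectors as rows of a matrix
dim(mat)                       # 3 2
as.data.frame(mat)             # 3-row data frame with default column names V1, V2
```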

things get much messier when you’re dealing with lists with mixed lengths and types. That’s beyond us today - have a look at R for Data Science for an introduction to rectangling

What are lists for?

Lists are particularly useful in three cases:

where you don’t know what type of output you’re going to get. This is the reason that purrr::map always returns a list - you can collect any output type in a list.

rando <- function(n){
  if (runif(1) > 0.5) n else as.character(n) # sometimes numeric, sometimes char
}
rando(4)

[1] "4"

c(4,3,rando(8)) # unfortunate coercion that might happen with vectors

[1] "4" "3" "8"

list(4,3,rando(8)) # happier and safer in a list

[[1]]
[1] 4

[[2]]
[1] 3

[[3]]
[1] "8"
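
The same idea applies when a function is run many times over: collecting the results in a list keeps each result's type intact. A sketch in base R using lapply(), which plays a similar role to purrr::map here; `flip` is a deterministic stand-in for `rando`, invented so that the result is reproducible:

```r
# returns the number itself for even input, a character string for odd input
flip <- function(n) {
  if (n %% 2 == 0) n else as.character(n)
}

results <- lapply(1:4, flip)                  # a list, so nothing is coerced
result_classes <- vapply(results, class, character(1))
result_classes                                # "character" "integer" "character" "integer"

unlist(results)                               # flattening coerces everything to character
```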

when you’ve got several related bits of data you want to keep together, especially if you’d like to save them to a single portable object:

named_list |>
  readr::write_rds("data/named_list.rds")
rm(list = ls()) # clear environment

named_list <- readr::read_rds("data/named_list.rds")

named_list |>
  list2env(envir = .GlobalEnv) # don't tell any proper programmers about this

<environment: R_GlobalEnv>
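
The same round trip can be sketched in base R with saveRDS() and readRDS(). This version assumes a temporary file is fine for the demo, and unpacks into a fresh environment rather than the global one; `bundle`, `path`, `restored` and `env` are all names made up here:

```r
bundle <- list(hi = "hello", total = 32L)    # made-up example list

path <- tempfile(fileext = ".rds")           # temporary file, so nothing is overwritten
saveRDS(bundle, path)
restored <- readRDS(path)
identical(bundle, restored)                  # TRUE

# unpack the items into a new environment instead of the global one
env <- list2env(restored, envir = new.env())
get("hi", envir = env)                       # "hello"
```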

when you’re dealing with wildly complicated data structures. There’s a nice example in R for Data Science, which is too long-winded for a practical session like this, but shows what horrible real-world data looks like:

repurrrsive::gmaps_cities |> # list columns on list columns
  tidyr::unnest_wider(col = json) |>
  tidyr::unnest_longer(col = results) |>
  tidyr::unnest_wider(col = results)

# A tibble: 7 × 7
  city  address_components formatted_address geometry     place_id types  status
  <chr> <list>             <chr>             <list>       <chr>    <list> <chr> 
1 Hous… <list [4]>         Houston, TX, USA  <named list> ChIJAYW… <list> OK    
2 Wash… <list [2]>         Washington, USA   <named list> ChIJ-bD… <list> OK    
3 Wash… <list [4]>         Washington, DC, … <named list> ChIJW-T… <list> OK    
4 New … <list [3]>         New York, NY, USA <named list> ChIJOwg… <list> OK    
5 Chic… <list [4]>         Chicago, IL, USA  <named list> ChIJ7cv… <list> OK    
6 Arli… <list [4]>         Arlington, TX, U… <named list> ChIJ05g… <list> OK    
7 Arli… <list [4]>         Arlington, VA, U… <named list> ChIJD6e… <list> OK

Again, getting beyond the session’s aims, but the purrr package is a great help for this kind of nested-nested-nested data:

repurrrsive::gmaps_cities |>
  purrr::pluck(2,1,1,1,"formatted_address")

[1] "Houston, TX, USA"
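
For comparison, the same kind of drilling down can be done in base R by chaining double brackets. A small sketch on a hypothetical nested list, `resp`, made up here and only loosely shaped like an API response (it is not the real gmaps_cities data):

```r
# hypothetical nested list for illustration
resp <- list(
  status = "OK",
  results = list(
    list(
      formatted_address = "Houston, TX, USA",
      types = list("locality", "political")
    )
  )
)

resp[["results"]][[1]][["formatted_address"]]   # "Houston, TX, USA"
resp$results[[1]]$types[[2]]                    # "political"
```

Chained brackets get hard to read as the nesting deepens, which is where a helper like purrr::pluck earns its keep.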