Tidy data in Excel
excel
beginner
Session materials
Previous attendees have said…
- 19 previous attendees have left feedback
- 95% would recommend this session to a colleague
- 100% said that this session was pitched correctly
Three random comments from previous attendees
- Enjoyed this today - learned a few new tips’n’tricks! There’s alwasy somehting you pickup even when you think you know the topic. - ‘Text to columns’ skip column functionality - Data Validation Use Case and Circles - Excel ‘Float’ menu keyboard interaction
- Easy to follow and although I would describe myself as intermediate Excel user, I still picked up a few new tricks.
- Good beginnner session. Learned some new techniques that will help my excel use going forward.
Welcome
- this session is for 🌶 Excel beginners
- this is a mainly-practical session, and you’ll need Excel of some sort to follow along
Where does this fit in?
- for this session, you’ll need to be familiar with the Excel basics (getting around in Excel, opening/saving/closing files, and a little bit of A1 referencing)
- we’re going to dodge formatting and formulas as much as possible today
Session outline
- understanding tidy data
- a word of warning
- a practical introduction to making tidy data:
- values
- tables
- autofill
- exercises and demos
Understanding tidy data
- tidy data is not specific to Excel (e.g. this session is partly adapted from chapter 5 of R for Data Science)
- if we tidy our data, we’ll be able to compute it
- standards for tidying data help make for safe and effective analysis
A word of warning
- tidying data can be very slow and complicated
- in Excel, there are lots of advanced tools that can speed things up
- PowerQuery especially
- this is a beginner’s session, so we’ll avoid the more fancy tools
- but if your process takes lots of manual work, it’s definitely worth exploring alternative ways of working
Values
values
is the word we use to describe each bit of information in an Excel spreadsheet. Some examples:- a date, like
2024-06-28
- a number, like
11.2
- a name, like
NHS Grampian
- a cost, like
£12.50
- …
- a date, like
- each value should have its own cell
Entering values
- how would you enter this data:
2018, 2019, 2020, 2021, 2022
? - please now:
- open Excel
- start a new workbook
- add a column header
year
in cellA1
- then add each of those five values in the five cells underneath (down to
A6
)
More values
we’re going to be using some birthrate data from the NRS for this session. We’ll start by adding some birth rate data
this is given as births per 1,000 women in five year age brackets. We’ll start with 25-29 year old mothers
please add the header
25-29yrs
in cellB1
here are the values for our five years:
73.4, 71, 66.8, 69.6, 66.7
Tables
- you should keep your new data in a table
Insert
>Table
- tables allow you to sort and filter your data
- they also act as a useful ‘container’ (or data structure) for your data
Extending tables
30-34yrs |
---|
90.9 |
88.6 |
83.4 |
85.9 |
84.4 |
- please could you add some new data to your table showing births for another age group
- you should be able to drag the blue table corner over your new column to extend the table
Autofill
- one last way of adding values: autofill
- drag again to make a new empty column, and label it
difference
- in
D2
(the first ‘proper’ cell), copy this formula:=C2-B2
. This will calculate the difference in birth rates between the two columns - finally, click the small green corner of that newly-filled cell to autofill the column
Back to tidy data
- we’ve now got some data with:
- each value in a cell
- each variable in a column
- here, this is a maternal age bracket
- each observation in a row
- here, this is a year
- we could work through and extend this data by hand, but we’ll now switch over to some supplied data to save all the typing
- errors are common in manual data-entry
- if you can import data, that’s usually better than re-typing it
Many values per cell
- We often find useful data with more than one value per cell
- this can be helpful for humans
- Excel can’t do anything with this data
Exercise one: many values per cell
- find the
Exercise one
sheet in the exercise file - try calculating an average for each of the groups
- or, if you’re more confident, try plotting the data
Nothing works!
What’s the solution?
- make this data have one value per cell
- we could remove all the percentages by hand - but that’s going to be very slow
Text to Columns
gives us a better tool for the job
Text to Columns
- select a column
- in the
Data
tab of the ribbon, you should find theText to Columns
tool
- note that you can keep, or remove, the percentage column. We’ll skip it, to keep things simple
Try working with that data again
Exercise two: transposed data
- sometimes you’ll find data where the columns and rows have been flipped
- that’s slow to fix by hand, but luckily you can transpose it, which swaps rows and columns
- select your data, and copy/paste into a new cell
- then use the transpose option
- you might need to fix formatting and labels:
- the
Clear formats
option might help this - if you run into trouble, please note that transposing only works on data that is not in a table
Demo one: validation and really messy data
- one of the most time-consuming bits of tidying is checking your values
- we’ll briefly demonstrate the data validation tool
- this allows you to describe what format you think your data should be in, and then highlights anything that doesn’t fit
- we select the
years
column - then open the data validation tool
- then set appropriate validation options, so Whole number between 2010 and 2030
- then select
Circle Invalid Data
- we can now go through and fix anything circled
Demo two: reshaped data
- there are several ways of reshaping data that’s not in a tidy format
- we’ll look at the manual way here, but - as it’s horrible - I’d be keen to encourage you to investigate Power Query or Pivot Tables to reshape if this is a regular part of your working day. PQ takes < 10 seconds, PT not much longer
Manual reshape
- sort the data by year
- then copy and paste blocks of data, making sure to keep the years aligned
- make sure you then copy the age brackets to label the column
- then delete the spare years columns, and the age brackets