PowerQuery - a practical introduction
17/07/2024
Welcome
🌶 this session is a beginner-level introduction to PowerQuery
we’ll get going properly at 14.35
you’ll either Excel desktop (>2007) or Power BI desktop
you’ll also need to have basic competency in Excel or Power BI
if you can’t access the chat, you might need to join our Teams channel: tinyurl.com/kindnetwork
you can find session materials at tinyurl.com/kindtrp
The KIND network
- a social learning space for staff working with knowledge, information, and data across health, social care, and housing in Scotland
- we offer social support, free training, mentoring, community events, …
- Teams channel / mailing list
Excel / Power BI training sessions
- this session is based on our Power BI for Beginners course
- but PowerQuery isn’t specific to Power BI - it’s brilliant for Excel too
Session outline
- what is PowerQuery, and why might you use it?
- getting data into PowerQuery
- basic data wrangling
- joins (merges)
- pivots
How I think of PowerQuery
- the right way to load any real data into Excel
Why might you use it?
- powerful
- easy, to start with at least…
- allows you to standardise data loading from different sources
- allows you to repeat data loading
Getting data into PowerQuery (Excel)
- Open Excel and create a new, blank, workbook
Data
> Get & Transform Data
- Pick
From Web
and supply the following URL - https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-07-25/scurvy.csv
- From the preview, select
Transform data
to open PowerQuery
Getting data into PowerQuery (Power BI)
- Open Power BI and start a new report
- Select
Get data
> From Web
button in the ribbon
- Supply the following URL -
https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-07-25/scurvy.csv
- From the preview, select
Transform data
to open PowerQuery
- If you’ve already got some data in your report that you’d like to transform, use
Tranform data
in the ribbon to open PowerQuery:
:::
Preview your Data
(make sure to click Transform data
)
First view of PowerQuery
First view of PowerQuery
- Queries (= datasets) on the left
- ribbon menu at the top
- Query settings (= the changes you’ll make to your data) on the right
- Excel-ish data preview in the middle
Basic data wrangling
- right click the
study_id
column and select Remove
- the column will disappear
- but you’ll also gain a step in your Applied Steps field
Basic data wrangling
- you can edit this (and any) step
- if you’re code-confident, you can edit the formula to read
= Table.RemoveColumns(#"Changed Type",{"study_id", "treatment"})
to also remove the treatment column
- and you can remove the step using the X next to the step name
- you should also try renaming and reordering your columns in the same way
- once you’ve had a play, please remove the steps until you’re left with the data you started with (so delete back to “Changed Type”)
Replacing values
- we can also work with individual values
- our values are a number, then an underscore, then a word
- we can
Split column
at the underscore
- we could also replace the odd underscores in the treatment column
Reshaping data
- we could go column by column here splitting, but an easier approach would be to re-shape the data
- select the first three columns, and select
Unpivot Other Columns
- then split, change type, and generally tidy up
Add columns
Our treatment and dosing columns are annoying. Let’s make a new, joined, column for them:
- select
Custom Column
- you can click the gearwheel next to the step if you’d like to go back and tweak
Close and load / Close and Apply sends your data back to Excel/Power BI
Getting more fancy: a bigger dataset
A challenge
Can you find the number of responses to each question for your local NHS board? You should be able to re-work this data to give something like:
Some hints
- filtering works like Excel, but is destructive
- unpivoting, doing something to all the rows, then re-pivoting is a great way to work across lots of columns
- you can join structurally similar datasets using
Append Queries as New
- you can deduplicate entire datasets by selecting all the columns and using
Remove Rows
> Remove Duplicates
Feedback and resources
- please can I ask for some feedback - takes less than a minute, completely anonymous, helps people like you find the right training for them