PowerQuery - a practical introduction

excel
intermediate
Published

July 17, 2024

Session materials

Previous attendees have said…

  • 11 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 82% said that this session was pitched correctly

Three random comments from previous attendees
  • Great intro session to get an understanding of the basics.
  • issues in loading data and due to speed of which trainer was moving at could not follow along. transpired i needed an addon to excel and had to leave the session
  • very good, cannot wait for the next level.

Welcome

  • 🌶 this session is a beginner-level introduction to PowerQuery
  • you’ll either Excel desktop (>2007) or Power BI desktop
  • you’ll also need to have basic competency in Excel or Power BI
  • this session is based on our Power BI for Beginners course
  • but PowerQuery isn’t specific to Power BI - it’s brilliant for Excel too

Session outline

  • what is PowerQuery, and why might you use it?
  • getting data into PowerQuery
  • basic data wrangling
  • joins (merges)
  • pivots

What is PowerQuery?

How I think of PowerQuery

  • the right way to load any real data into Excel

Why might you use it?

  • powerful
  • easy, to start with at least…
  • allows you to standardise data loading from different sources
  • allows you to repeat data loading

Getting data into PowerQuery

  1. Open Excel and create a new, blank, workbook
  2. Data > Get & Transform Data
    Get & Transform Data
  3. Pick From Web and supply the following URL - https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-07-25/scurvy.csv
  4. From the preview, select Transform data to open PowerQuery
  1. Open Power BI and start a new report
  2. Select Get data > From Web button in the ribbon Get data from web
  3. Supply the following URL - https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-07-25/scurvy.csv
    supply a URL
  4. From the preview, select Transform data to open PowerQuery
    Select Transform data to open PowerQuery
  5. If you’ve already got some data in your report that you’d like to transform, use Tranform data in the ribbon to open PowerQuery:
    Tranform data in the ribbon

Preview your Data

make sure to click Transform data
(make sure to click Transform data)

First view of PowerQuery

First view of PowerQuery
  • Queries (= datasets) on the left
  • ribbon menu at the top
  • Query settings (= the changes you’ll make to your data) on the right
  • Excel-ish data preview in the middle

Basic data wrangling

  • right click the study_id column and select Remove
    • the column will disappear
    • but you’ll also gain a step in your Applied Steps field
  • you can edit this step
    • if you’re code-confident, you can edit the formula to read = Table.RemoveColumns(#"Changed Type",{"study_id", "treatment"}) to also remove the treatment column
    • and you can remove the step using the X next to the step name
  • you should also try renaming and reordering your columns in the same way
  • once you’ve had a play, please remove the steps until you’re left with the data you started with (so delete back to “Changed Type”)

Replacing values

  • we can also work with individual values
  • our values are a number, then an underscore, then a word
  • we can Split column at the underscore
  • we could also replace the odd underscores in the treatment column

Reshaping data

  • we could go column by column here splitting, but an easier approach would be to re-shape the data
  • select the first three columns, and select Unpivot Other Columns
  • then split, change type, and generally tidy up

Add columns

Our treatment and dosing columns are annoying. Let’s make a new, joined, column for them:

  • select Custom Column
    Custom Column
  • you can click the gearwheel next to the step if you’d like to go back and tweak

Close and load / Close and Apply sends your data back to Excel/Power BI

Getting more fancy: a bigger dataset

A challenge

Can you find the number of responses to each question for your local NHS board? You should be able to re-work this data to give something like:

Some hints

  • filtering works like Excel, but is destructive
  • unpivoting, doing something to all the rows, then re-pivoting is a great way to work across lots of columns
  • you can join structurally similar datasets using Append Queries as New
    Append Queries as New
  • you can deduplicate entire datasets by selecting all the columns and using Remove Rows > Remove Duplicates