This is part of our intermediate-level Power BI desktop course. The course structure is as follows:
- session 1 (this session): a simple build-a-dashboard session as a warmup, reminder about core Power BI techniques, and sharing some suggestions for good workflow practices
- session 2: DAX language foundations, including FILTER and CALCULATE
- session 3 (this session): core PowerQuery functions to manage sources, join data sets, pivot, and tidy values
- session 4: PowerQuery M language foundations, including a basic overview of syntax, a core function toolkit, advanced mode, and reproducibility
- session 5: a second, more ambitious, build-a-dashboard session to put all that PowerQuery and DAX knowledge into practice
No feedback found for this session
Session aim
- to cover core Power Query (PQ) functions and techniques
Where does this fit in?
- session 1: a simple build-a-dashboard session as a warmup, reminder about core Power BI techniques, and sharing some suggestions for good workflow practices
- session 2: DAX language foundations, including FILTER and CALCULATE
- session 3: core PowerQuery functions to manage sources, join data sets, pivot, and tidy values
- session 4: PowerQuery M language foundations, including a basic overview of syntax, a core function toolkit, advanced mode, and reproducibility
- session 5: a second, more ambitious, build-a-dashboard session to put all that PowerQuery and DAX knowledge into practice
Session outline
- thinking about queries - sources, connections, code folding, duplicating/referencing
- data formats for Power BI - tidy data, and key-value pairs, and interconverting with pivots
- joins and relationships
- normalising and the star schema
Getting started
- open Power BI
- create a blank report
- then click
Tranform data
to open PQ
Loading data from Power Query
Tables and sheets
- we want table 6 (
TableQ6_Deaths_by_sex_cause_and_NHS_Board_area
)
- please import that from the navigator
Tidying up
- remove the three
All
rows
- check the data types
- fix the query name to
cause_death
This data isn’t quite tidy
Shaping your data
- two main approaches found in Power BI:
- Excel-like tidy data
- SQL-like relational data
- tidy data is conceptually easier to build
- joins/merges
- splitting columns
- but really Power BI prefers relational data
- more performant
- easier to manage complex datasets
Tidy data
- largely covered in session one, but a quick reminder about splitting and merging
Key-value pair data
Key-value pair data
Pneumonia |
144 |
Asthma |
20 |
Influenza 3 |
2 |
Relational data in two minutes
- break your data into multiple tables
- give each item a key
- build relationships between those keys
Take some wide data
NHS A&A |
3 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
1305 |
NHS A&A |
4 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
975 |
NHS A&A |
5 |
Philips HeartStart FRx Semi-Automatic Defibrillator with Carry Case |
Philips |
1200 |
Data from sja.org.uk
Break into tables
NHS A&A |
3 |
NHS A&A |
4 |
NHS A&A |
5 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
1305 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
975 |
Philips HeartStart FRx Semi-Automatic Defibrillator with Carry Case |
Philips |
1200 |
Add unique keys
NHS A&A |
3 |
ZOLL-1 |
NHS A&A |
4 |
ZOLL-2 |
NHS A&A |
5 |
PHIL-1 |
ZOLL-1 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
1305 |
ZOLL-2 |
AED Plus Fully Automatic Defibrillator |
ZOLL |
975 |
PHIL-1 |
Philips HeartStart FRx Semi-Automatic Defibrillator with Carry Case |
Philips |
1200 |
Star schema
- you might also see Power BI data described as a star schema
image: Wikimedia
Reshaping our data
- we’ll break out the cause of death parts of the data into a new table
- we’ll use the ICD 10 codes as a key
- duplicate query and rename that query to
ICD10
- remove anything other than ICD10 data in
ICD10
(and a note about Table.SelectColumns
)
- remove duplicates via
Remove Rows > Remove Duplicates
- rename
ICD 10 Summary List
to ICD10
- then remove the
Cause of death
column in cause_death
, and rename the ICD10
column to link with the new table
Duplicating vs referencing queries
- a duplicate query is an independent copy of a query that can be tweaked/extended
- a referenced query takes the output of a query, and uses it as the starting-point for additional work
Reshape cause_death
to key-value
- remove the
Scotland
column
- then unpivot everything but
ICD10
and Sex
- rename your new Attribute/Value columns to
board
and n
Create national data for benchmarking
- duplicate
cause_death
- name that new query
national
- remove everything but the national data with
= Table.SelectColumns(#"Filtered Rows",{"ICD 10 Summary List", "Scotland", "Sex"})
More data, same source
- go back to source step of the
cause_death
query
- find the table data in the row
TableQ2b_Vital_events_by_NHS_Board_area
- right-click the
Table
, and select Add as New Query
- rename that new query
population
Make a population table
- remove everything but the first three columns
- remove the
null
row and the SCOTLAND
row
- rename
NHS Board area
to board
- clean and trim the
board
values
- unpivot the population columns to
sex
and population
- replace values and change case in the
sex
column to leave Male
and Female
values