BI for Beginners (service version) Session 3

Power BI
beginner
Published

September 26, 2024

No feedback found for this session

Session outline

  • In this session, we’re going to build part of the GP practice size dashboard from last time
  • But we’ll do this from raw data, rather than the play data we used last time
  • We’ll bring that data in directly from the web
  • We’ll need to wrangle that data to get it to behave

A word of warning

  • data wrangling is one area where the web version of Power BI is much less capable than the desktop version
  • that means that we’re going to use Excel/PowerQuery to do most of the heavy lifing for us here
  • if the desktop version of Power BI is any guide, I’d hope that these tools will eventually be integrated into the web version of Power BI before too long

Data wrangling

Load our data

Task
  1. Open Excel and create a blank workbook
    create a blank workbook
  2. In the Data tab of the ribbon menu, select the From Web option
    select the From Web option
  3. Supply the first URL
    Supply the first URL
  4. Select Load
  5. Repeat with the remaining URLs
  6. Save your workbook

Inspect our data

  • You should be able to see each of your datasets in Excel as separate worksheets (with pretty horrible names) new worksheets in Excel

PowerQuery

Next, we’ll open these three sets of data in PowerQuery by selecting Launch Power Query Editor... from the Get Data menu:
Launch Power Query Editor…

Renaming our data

  • To make life easier, we’ll rename each query by right-clicking the query title on the left-hand side of the screen
    rename each query:
    • call the GP practice details practices
    • call the NHS boards data practices
    • call the demographic data demographics

Task list

  • we can discard some of the practices data
  • we’ll need to join boards and practices to get nice health board names in our dashboard
  • we’ll need to re-shape demographics
  • and we’ll need to send our wrangled data to Power BI

First steps in PowerQuery

  • find your boards data
  • (reminder about the duty of data minimization)
  • we can safely lose several columns:
    • _id, HBDateEnacted, HBDateArchived and Country are all spare
    • right click the column header to Remove
      remove columns

The big idea

  • once you’ve removed a column, you’ll see a change to the Applied Steps area:
    Applied steps are
  • we’ve gained an extra step, showing that we’ve removed a column

Steps

  • if you click an earlier step, you can see the state of your data at that point
  • if you click the X to the left of the step, you can delete (effectively undo) that step
  • if you click the gearwheel to the right of the step name, you can alter the way that step worked
    • if you’re code-confident, you can also tweak the PowerQuery M formula that appears in the formula bar

PowerQuery -> Excel

  • Once you’ve removed the unnecessary columns, click Close & Load to commit your changes
    Commit your changes
  • that will return your data to Excel

Excel -> Power BI

  • from Excel, rename your tables
    • call the GP practice details practices
    • call the NHS boards data practices
    • call the demographic data demographics
  • ensure the .xlsx is saved
  • connect to Power BI as before

Service vs Desktop

  • we’ll need to do this slighly ungainly Excel -> PowerQuery -> Excel -> PowerBI route whenever we tweak our data
  • on the desktop, PowerQuery is integrated into Power BI, which makes things much smoother
  • this might be a good point to explore setting up a connection between your OneDrive and Power BI service as it’ll reduce the messiness of re-re-uploading data as you tweak it

Joining data

  • there’s an HB column in both the boards and practices data
  • we can link these up so that we can use the actual proper health board names in our dashboard
Task
  1. select the HB column and select Merge Queries
    select Merge Queries
  2. fill in the options in the merge pop-up
    merge pop-up
  3. note that this gives us lots of information about how the join is likely to work
  • a slight puzzle: the join initially looks pretty unpromising
    nested columns
  • here, Power Query is effectively telling you that there are several columns packed into this new column
  • we can unpack all or some of the columns from each row using the menu in the column header.
  • I think we only want HBName - but feel free to experiment as you can always backtrack from the query settings pane
Task
  1. select the column(s) you want to keep from the column header
    Select your columns from the expand button
  • you can also access lots of other Excel-ish options by right-clicking the column
  • something we can do there is to rename this column to give a nicer name (which we’ll need to use later while building our visualisation)
Task
  1. please rename your new column to “HBName”
  2. note that you can also do this inside the formula bar at the top of the screen
    Formula bar
    This formula is written in Power Query M, which we’ll discuss later
  3. check that you now have nice board names in the practices data
  4. remember to close & apply

Re-shaping data

  • our demographics data is going to be hard to plot
  • Power BI generally expects data with one variable per column
  • in this data, our useful variables (like the count of patients in particular age or sex brackets) are spread across several columns
  • we’ll reshape it using Power Query to make our life easier
Task
  1. As before, please Transform Data
  2. Use the Select columns tool (or delete as before) to remove the id, Date, HSPC, HB, and SexQF columns
  3. You can do this in Power Query directly with = Table.RemoveColumns(#"Changed Type",{"_id", "Date", "HB", "HSCP", "SexQF"})
  4. Select the PracticeCode and Sex columns, then select Unpivot Other Columns
    Unpivot Other Columns
  • That should give us our data in long form, so that it looks like this:
    long form data
  • this will make it easier to plot in Power BI as we usually need to have one variable (like number of patients in an age/sex bracket) in a single column
  • we can now remove unwanted values by filtering
    remove unwanted values by filtering
Task
  1. From the Attribute column, please remove the QF rows,
  2. From the Attribute column, please remove the AllAges rows
  3. From the Sex column, please remove the All rows

Transforming values

  • the last type of reshaping to show off is about transforming values
Task
  1. Split the Attribute column by Digit to Non-digit
    Digit to non-digit
  2. replace values in the new Attribute.1 column to remove the “Ages”
  3. then in Attribute.2, two replacements to remove the “to”s and replace the “plus” with 99
  4. then rename the columns to age_upper and age_lower
  5. finally, change the data type to whole number for each of those new columns

Tidying up

  • we should now be able to use this data to re-make our GP bubble map from last time.
  • there are some minor bits of tidying up that need to happen first
  • there are a few duplicate practice names, which might cause us problems
    Duplicate practice names
Task
  1. select Transform data
  2. add a Custom column called Practice Name
    Add a custom column
  3. use & to concatenate values: [GPPracticeName] & " (" & [Postcode] & ")"
  4. close and apply
  • Power BI likes to summarize data wherever possible
  • this can cause problems (although I don’t think there’s anything in this example that suffers from it)
Task
  1. Go to the data view
  2. In the practices table, note that you can change the summary options for the PracticeListSize column
    Summary defaults

Re-making our visualisation

Task
  1. Add a map, and set the Postcode to Location, boards.HBName to Legend, PracticeListSize to Bubble Size
  2. Format the map to greyscale
  3. Rename for this visual to suit
  4. add a slicer for Health board
  5. add cards for the list size and practice name
  6. add a stacked column chart for practice sizes

Finishing up

  • next time, we’ll use some of that data we wrangled today to extend our GP dashboard
  • please save your files from today somewhere safe
  • if you have time during the week, come back to the visualizations that we sketched today, and play. What would you like to add to the dashboard?
    • there’s a lot of demographic data to play with
  • Let me know, and we’ll try adding those things next time