Tableau for Beginners (session 3)

Tableau
beginner
Published

September 26, 2024

Warning

This is currently a draft version of the materials for this training session, and so may contain minor errors and inaccuracies.

No feedback found for this session

Session outline

  • today’s session is all about data wrangling
  • we’ll take the raw data files used to produce the data we used last week
  • we’ll import them into Tableau and organize them
  • then we’ll tidy them up as needed
  • we’ll aim to re-create the GP practice map from last week as a final objective

Session files

Getting started

Task
  1. go to https://public.tableau.com/app/
  2. select Web Authoring from the Create menu
    Web Authoring
  3. from the Data menu, select New Data Source
    Select `New Data Source
  4. drag and drop (or use the upload button) to add the 1 - practices.csv file
  5. repeat to add the other three files

Add each files to the data model

Task
  1. now double-click each of your data files to add them to the data model
    add each data file to the data model
  2. Tableau should guess the relationships in your data, and show these relationships by unbroken lines
  3. but one of these won’t work properly potentially broken relationship
  4. link the two Practice Code fields together to relate these parts of the databuild a relationship between practice code fields
  5. you should end up with four data sources, all related together four related data sources

Trying out our data

Task
  1. create a new worksheet create a new worksheet
  2. re-create the map:
    1. Latitude to Rows - and ensure you set this as a dimensioncset latitude as a dimension
    2. Longitude to Columns - again, dimension
    3. GP Practice Name to Detail
    4. HB Name to Colour

Demographics

Task
  1. demographics has lots of cols
  2. unlike the example in session 2, we don’t have a Measure Values item to play with
  3. go back to the data, which is hard to work with
  4. hide all the QFs
  5. now pivot the data
  6. it’ll take some time to perform the extract, but once you’ve returned to your worksheet, you should now be able to plot the demographics
    but it really might take some time

Going an easier way

  • we recommend using Excel and PowerQuery (or other tools) to pivot your data if possible
  • if you don’t have this available, you can find a pre-pivoted dataset in the training materials (3b - demographics_p.csv)
Task
  1. Open your dataset (3 - demographics.csv) with Excel
  2. convert the data to a table by selecting somewhere within it, and pressing Ctrl+T
    Convert to a table
  3. then select Data >> From Table / Range to open PowerQuery
    Open PowerQuery
  4. Use the Remove columns tool to remove the id, Date, HSPC, and HB columns
    Remove unneeded columns

Re-shaping data

Task
  1. Select the PracticeCode and Sex columns, then select Unpivot Other Columns
    unpivot other columns
  • That should give us our data in long form:
    Long form data
  • we can now remove unwanted values by filtering
    Remove values by filtering
Task
  1. From the Attribute column, please remove the QF rows,
  2. From the Attribute column, please remove the AllAges rows
  3. From the Sex column, please remove the All rows

Transforming values

  • the last type of reshaping to show off is about transforming values
Task
  1. Split the Attribute column by Digit to Non-digit
    Split the attribute column
  2. replace values in the new Attribute.1 column to remove the “Ages”
  3. then in Attribute.2, two replacements to remove the “to”s and replace the “plus” with 99
  4. then rename the columns to age_upper and age_lower
  5. finally, change the data type to whole number for each of those new columns

Homework tasks