Example of displaying health data for Greater Manchester over OpenStreetMap data

Visualising data about Greater Manchester authorities on maps with QGIS

This post explains the techniques used at a recent Data Expedition; how a a group visualised grant award data and where the spending might be most needed for the ten authorities that make up Greater Manchester. I will explain the process of preparing the health data set we used, preparing the map, linking the data sets together, and presenting it all in a GIS package. It is intended as an introductory demonstration of such techniques, inspiring you to go further by yourself. It assumes you have a good working knowledge of pivot tables and formulas in Excel.

  1. Requirements
  2. Preparing the source data
  3. Preparing the base map
  4. Joining the source data and the base map
  5. Styling the map
  6. Going further

1. Requirements

The tools you will require to analyse and present your data.
  • A spreadsheet tool capable of creating pivot tables, such as Microsoft Excel.
  • A text editor, such as Notepad (Windows) or TextEdit (Mac).
  • The data you wish to study, with some kind of geographic identifier (such as post code, authority code, country name).
  • A set of shapefiles for the geographic areas in your study, with the same kind of identifier as the data you wish to study, or a way to convert to it.
  • A GIS package, such as the open source QGIS.

^ back to contents

2. Preparing the source data

Some discovery tools include the option to visualise their data on maps directly. This is great, but we want to be able to collect all the data of our study to visualise it for ourselves. This will allow us to create new variables (perhaps ratios of datatypes from two different sources) and to aggregate every source to the same levels.

Find and download a CSV file with source data.

We browsed MappingGM, a wonderful tool that displays many available data sets relating to Greater Manchester. Under the ‘People and communities’ section we found the Access to Health Assets and Hazards Index to be our preferred measure of health across Greater Manchester. The metadata section of MappingGM provided a link to the source data. We created an account with the data holder and downloaded the CSV file. We are interested in the first and last columns: area ‘lsoa11’ and health index ‘ahah’, so you can delete the other columns.

The data we have is detailed for Lower Super Output Areas (LSOA) (2011 definition), that is, areas with a mean population of 1,500. It is also for the whole of England and Wales. We want data at Ward level (2015 definition) for the ten authorities of Greater Manchester. We can download a table of conversion from LSOA (2011) to Ward (2015) as another CSV file, load that in Excel, define it as a Table, then apply it to a VLOOKUP for our health data set, matching on the LSOA2011 field.

The health data is arranged by LSOA2011. Download a conversion table, open in Excel…
If the LSOA to Ward data is defined as Table 1, you can get the LAD15CD and LAD15NM columns in your health data set with VLOOKUP.

Now, the health data set has two new columns populated with VLOOKUP data for the ward area code ‘LAD15CD’ and ward name ‘LAD15NM’. You can filter the name column to select each of the ten districts Bury, Bolton, and so on. This can be a slow operation, so be patient with Excel, remembering to save as a standard Excel workbook xlsx file.

For the filtered, copied health data, use a pivot table to group the wards and get average health index.

To summarise the data into the ten wards, we will make a pivot table. Select and copy all your health data (including the headings), add a new sheet, then do ‘Paste As Values’ (may appear under ‘Paste Special’). This will permanently remove any filtered out wards outside of Greater Manchester, and “fix” the VLOOKUP data in place. The new data needs to be defined as a Table.

In a new sheet, insert a Pivot Table reading from the newly created Table with the following details:

  • In the ‘Rows’ box, choose the ward code ‘LAD15CD’ then ward name ‘LAD15NM’.
  • In the ‘Values’ box, choose the health index ‘ahah’. Summarise the health data with ‘Average’ instead of ‘Sum
  • In the ‘Pivot Table Design’ tab: set ‘Report Layout’ as ‘Show in Tabular Form’, set ‘Subtotals’ as ‘Don’t Show Subtotals’, set ‘Grand Totals’ as ‘Off for Rows and Columns’.
  • Rename the new columns to something sensible (‘LAD15CD’, ‘LAD15NM’ and ‘ahah_avg’).
Export the values of the pivot table to a CSV file. (Yes, Salford is missing in these images, apologies! Well done if you noticed.)

The last stage of preparation in Excel is to select all the pivot table data, copy, create a new sheet, select ‘Paste As Values’. This will ensure that the data you see are the exact cells that we will be giving to the GIS package. Finally, save this sheet as a CSV file, ignoring any warnings about formatting and other sheets, then close the workbook.

^ back to contents

3. Preparing the base map

Thanks to Jamie Whyte and Open Data Manchester for their helpful prior work.

After reading a post from a previous Open Data Manchester session, I found a set of shapes for Greater Manchester authorities (in GeoJSON format). This file contains vectors with the outlines of all ten authorities, each labelled with the LD15CD code that we have prepared already. Click the ‘Download’ link and save the .geojson file.

Add a new vector layer with the Manchester shapefile. Change the projection to get a familiar skew.

Open QGIS and created a new, blank project. To bring in the shapefile, click ‘Layer > Add Layer > Add Vector Layer’, browse for your downloaded .geojson file (you may need to change the ‘Files of type’ filter to show GeoJSON files). The shapes should appear, all the same colour (not necessarily the purple in the image above).

You may be thinking that the shape of Greater Manchester is skewed. That’s because QGIS is rendering the map with a projection that you don’t usually see on maps of Britain. To change this, go to the bottom-right of the screen, click in the box next to the label ‘Render’, tick ‘Enable ‘on the fly’ CRS transformation, select ‘OSGB 1936 / British National Grid’ and click ‘OK’.

Create a description file for your CSV file. Create a layer from the CSV file. The list of layers is updated.

Before we bring in quantitative study data, there is a tricky extra step: we have to create a small text file that describes the CSV file. Create a text file with one line, like: "String", "String", "Real"; the data type for each of the columns, separated by commas, each in double-quotes. The data types could be String, Integer or Real. Save the file with same name as the CSV file, except with the extension .csvt; you may need to save as .txt then rename the file later. Save these two files to the same folder. QGIS will find the .csvt file when you choose the .csv file. (If we don’t do this step, all columns are treated as strings.)

Back in QGIS, click ‘Layer > Add Layer > Add Delimited Text Layer’. Browse to the CSV file, look at the preview to check the right content is coming in. For ‘File format’, click ‘CSV’. For ‘Geometry definition’, click ‘No geometry’ (because we are using the shape files for this). Finally, click ‘OK’. The list of layers (left of the main window) is updated.

^ back to contents

4. Joining the source data and the base map

Add a vector join between the map shapes layer and the study data.

We have two layers: a shape layer, and, above it, a data layer. We now need to tell QGIS how to “join” the layers; in other words, we need to tell QGIS which field is common to both data sets.

In the ‘Layers Panel’ (left of the main window), double-click the shape layer to open the ‘Properties’ window. Click on the ‘Joins’ section and click the green plus icon to add a vector join. For ‘Join field’, select the field from the study data ‘LAD15CD’. For ‘Target field’, select the equivalent field ‘CODE’. Click ‘OK’, and you should now be able to access all the study data fields in the map layer.

^ back to contents

5. Styling the map

Style the map layer with the health index column.

We can style the map layer with the health index column ‘ahah_avg’. We will add a colour scale to shade these map areas from green through to red (remember that a high index value is “bad”).

  • Double-click the map layer again, go to the ‘Style’ section.
  • Change the top option from ‘Simple’ to ‘Graduated’.
  • Change the ‘Column’ to the health index column (its name ‘ahah_avg’ is preceded by the CSV file name). (Only numerical columns can be selected; if yours does not appear, check that your .csvt file was created and saved correctly. Otherwise, you will have to use the top option ‘Categorized’.)
  • Change the ‘Color ramp’ to red -> orange -> yellow -> green and click ‘Invert’.
  • If you are happy with 5 classes, click ‘Classify’. The range of your column will be used to create 5 classes with equal linear range (not necessarily with equal distribution).
  • Click ‘Apply’ or ‘OK’ to see the results.
Add map labels with the ward names column.

We can use a similar procedure to add labels to each authority.

  • Double-click the map layer again, go to the ‘Labels’ section.
  • Change the top option from ‘No labels’ to ‘Show labels for this layer’.
  • Change ‘Label with’ to the ward name variable ‘LAD15NM’ (preceded with the CSV filename).
  • Click ‘Apply’ or ‘OK’ to see the results.

You can save the project to a .qgs file. You can export maps as PNG, JPEG or TIFF format images. If you want to add a legend, title or other details, use the Print Composer function. Full QGIS documentation is available that explains how to use the Print Composer and all other features.

^ back to contents

6. Going further

QGIS lets you easily include OpenStreetMap data beneath your semi-transparent shapes.

We also followed a similar process to study grant award data from 360Giving. This data was available at the ward level, not as small areas as the LSOA offered from the Access to Health Assets and Hazards Index. I also had trouble getting the shapefiles from LSOA to match up with the health index area, so I decided to focus on showing the average value per ward.

This exercise was conducted as part of a one-day workshop, so there are many ways it can all be improved.

^ back to contents

Librarian Data Specialist, The University of Manchester. Supporting teaching, learning and research with financial databases, digital skills and scholarship.