Visualising data about Greater Manchester authorities on maps with QGIS
This post explains the techniques used at a recent Data Expedition; how a a group visualised grant award data and where the spending might be most needed for the ten authorities that make up Greater Manchester. I will explain the process of preparing the health data set we used, preparing the map, linking the data sets together, and presenting it all in a GIS package. It is intended as an introductory demonstration of such techniques, inspiring you to go further by yourself. It assumes you have a good working knowledge of pivot tables and formulas in Excel.
- A spreadsheet tool capable of creating pivot tables, such as Microsoft Excel.
- A text editor, such as Notepad (Windows) or TextEdit (Mac).
- The data you wish to study, with some kind of geographic identifier (such as post code, authority code, country name).
- A set of shapefiles for the geographic areas in your study, with the same kind of identifier as the data you wish to study, or a way to convert to it.
- A GIS package, such as the open source QGIS.
2. Preparing the source data
Some discovery tools include the option to visualise their data on maps directly. This is great, but we want to be able to collect all the data of our study to visualise it for ourselves. This will allow us to create new variables (perhaps ratios of datatypes from two different sources) and to aggregate every source to the same levels.
2.1 Find and download the source data
We browsed MappingGM, a wonderful tool that displays many available data sets relating to Greater Manchester. Under the ‘People and communities’ section we found the Access to Health Assets and Hazards Index to be our preferred measure of health across Greater Manchester. The metadata section of MappingGM provided a link to the source data. We created an account with the data holder and downloaded the CSV file. We are interested in the first and last columns: area ‘lsoa11’ and health index ‘ahah’, so you can delete the other columns.
2.2 Filter and group the data to the ten GM authorities
The data we have is detailed for Lower Super Output Areas (LSOA) (2011 definition), that is, areas with a mean population of 1,500. It is also for the whole of England and Wales. We want data at Ward level (2015 definition) for the ten authorities of Greater Manchester. We can download a table of conversion from LSOA (2011) to Ward (2015) as another CSV file, load that in Excel, define it as a Table, then apply it to a VLOOKUP for our health data set, matching on the LSOA2011 field.
Now, the health data set has two new columns populated with VLOOKUP data for the ward area code ‘LAD15CD’ and ward name ‘LAD15NM’. You can filter the name column to select each of the ten districts Bury, Bolton, and so on. This can be a slow operation, so be patient with Excel, remembering to save as a standard Excel workbook xlsx file.
To summarise the data into the ten wards, we will make a pivot table. Select and copy all your health data (including the headings), add a new sheet, then do ‘Paste As Values’ (may appear under ‘Paste Special’). This will permanently remove any filtered out wards outside of Greater Manchester, and “fix” the VLOOKUP data in place. The new data needs to be defined as a Table.
In a new sheet, insert a Pivot Table reading from the newly created Table with the following details:
- In the ‘Rows’ box, choose the ward code ‘LAD15CD’ then ward name ‘LAD15NM’.
- In the ‘Values’ box, choose the health index ‘ahah’. Summarise the health data with ‘Average’ instead of ‘Sum’
- In the ‘Pivot Table Design’ tab: set ‘Report Layout’ as ‘Show in Tabular Form’, set ‘Subtotals’ as ‘Don’t Show Subtotals’, set ‘Grand Totals’ as ‘Off for Rows and Columns’.
- Rename the new columns to something sensible (‘LAD15CD’, ‘LAD15NM’ and ‘ahah_avg’).
2.3 Export the correct data from the spreadsheet tool
The last stage of preparation in Excel is to select all the pivot table data, copy, create a new sheet, select ‘Paste As Values’. This will ensure that the data you see are the exact cells that we will be giving to the GIS package. Finally, save this sheet as a CSV file, ignoring any warnings about formatting and other sheets, then close the workbook.
3. Preparing the base map
3.1 Download the Greater Manchester shapes
After reading a post from a previous Open Data Manchester session, I found a set of shapes for Greater Manchester authorities (in GeoJSON format). This file contains vectors with the outlines of all ten authorities, each labelled with the LD15CD code that we have prepared already. Click the ‘Download’ link and save the .geojson file.
3.2 Create a new GIS project
Open QGIS and created a new, blank project. To bring in the shapefile, click ‘Layer > Add Layer > Add Vector Layer’, browse for your downloaded .geojson file (you may need to change the ‘Files of type’ filter to show GeoJSON files). The shapes should appear, all the same colour (not necessarily the purple in the image above).
You may be thinking that the shape of Greater Manchester is skewed. That’s because QGIS is rendering the map with a projection that you don’t usually see on maps of Britain. To change this, go to the bottom-right of the screen, click in the box next to the label ‘Render’, tick ‘Enable ‘on the fly’ CRS transformation, select ‘OSGB 1936 / British National Grid’ and click ‘OK’.
3.3 Add the study data
Before we bring in quantitative study data, there is a tricky extra step: we have to create a small text file that describes the CSV file. Create a text file with one line, like:
"String", "String", "Real"; the data type for each of the columns, separated by commas, each in double-quotes. The data types could be
Real. Save the file with same name as the CSV file, except with the extension .csvt; you may need to save as .txt then rename the file later. Save these two files to the same folder. QGIS will find the .csvt file when you choose the .csv file. (If we don’t do this step, all columns are treated as strings.)
Back in QGIS, click ‘Layer > Add Layer > Add Delimited Text Layer’. Browse to the CSV file, look at the preview to check the right content is coming in. For ‘File format’, click ‘CSV’. For ‘Geometry definition’, click ‘No geometry’ (because we are using the shape files for this). Finally, click ‘OK’. The list of layers (left of the main window) is updated.
4. Joining the source data and the base map
We have two layers: a shape layer, and, above it, a data layer. We now need to tell QGIS how to “join” the layers; in other words, we need to tell QGIS which field is common to both data sets.
In the ‘Layers Panel’ (left of the main window), double-click the shape layer to open the ‘Properties’ window. Click on the ‘Joins’ section and click the green plus icon to add a vector join. For ‘Join field’, select the field from the study data ‘LAD15CD’. For ‘Target field’, select the equivalent field ‘CODE’. Click ‘OK’, and you should now be able to access all the study data fields in the map layer.
5. Styling the map
5.1 Apply a colour style
We can style the map layer with the health index column ‘ahah_avg’. We will add a colour scale to shade these map areas from green through to red (remember that a high index value is “bad”).
- Double-click the map layer again, go to the ‘Style’ section.
- Change the top option from ‘Simple’ to ‘Graduated’.
- Change the ‘Column’ to the health index column (its name ‘ahah_avg’ is preceded by the CSV file name). (Only numerical columns can be selected; if yours does not appear, check that your .csvt file was created and saved correctly. Otherwise, you will have to use the top option ‘Categorized’.)
- Change the ‘Color ramp’ to red -> orange -> yellow -> green and click ‘Invert’.
- If you are happy with 5 classes, click ‘Classify’. The range of your column will be used to create 5 classes with equal linear range (not necessarily with equal distribution).
- Click ‘Apply’ or ‘OK’ to see the results.
5.2 Apply a label
We can use a similar procedure to add labels to each authority.
- Double-click the map layer again, go to the ‘Labels’ section.
- Change the top option from ‘No labels’ to ‘Show labels for this layer’.
- Change ‘Label with’ to the ward name variable ‘LAD15NM’ (preceded with the CSV filename).
- Click ‘Apply’ or ‘OK’ to see the results.
5.3 Save the results
You can save the project to a .qgs file. You can export maps as PNG, JPEG or TIFF format images. If you want to add a legend, title or other details, use the Print Composer function. Full QGIS documentation is available that explains how to use the Print Composer and all other features.
6. Going further
We also followed a similar process to study grant award data from 360Giving. This data was available at the ward level, not as small areas as the LSOA offered from the Access to Health Assets and Hazards Index. I also had trouble getting the shapefiles from LSOA to match up with the health index area, so I decided to focus on showing the average value per ward.
This exercise was conducted as part of a one-day workshop, so there are many ways it can all be improved.