Practical 2. Managing data tables - 16062017


www.thiswormyworld.org

Practical 2

Managing data tables and creating spatial data sets using QGIS


Modern Tools for NTDs Control Programmes July 2017


Aim of practical

A key step in epidemiological analyses is to visualise the spatial patterns of infection

and/or disease. This allows for an appreciation of any spatial trends that might be

present, identification of obvious errors, and generation of hypotheses about factors

that may influence the observed patterns. Visualisation is also important for

communicating the findings to the target audience.

Here, we will visualise morbidity data potentially associated with lymphatic filariasis (LF) that were collected in health facilities in 2015, so as to explore the spatial distribution of the incidence of various LF-related morbidities and produce maps to support monitoring of the clinical management of morbidity cases by the LF control programme.

Note: all data used in this practical have been made up for teaching purposes and do not correspond to real secondary data.

Key learning skills

In this practical, you will:

• Convert Excel spreadsheets in compatible formats for QGIS (.CSV files).

• Import and join tables, create spatial joins and summarise attribute data.

• Manage data in the attribute table, including removing and adding new fields.

• Use symbology to visualise quantitative data as point maps and choropleth maps.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported

License. This means that users are free to copy and share this material with others. Requests for

creating new derivatives should be sent to Jorge Cano Ortega ([email protected]).


Data set description

For this practical, we will be working with the geographical data provided in the first practical session and some “made-up” epidemiological data which simulate reported cases of lymphoedema and hydrocele potentially associated with lymphatic filariasis, collected from health facilities across Cameroon in 2015.

• Geographical data set: Cameroon district boundaries.

• Epidemiological data set: number of lymphoedema and hydrocele cases recorded

at community level.

• Demographic data set: population estimate by district. To obtain these

estimates, we have used the gridded population density data sets available at

the WorldPop project (http://www.worldpop.org.uk/). Datasets are provided at

continental and country scale.

File name: CMR_LF_coordinates.csv
Description: Text file containing coordinates for communities where morbidity cases were recorded
Format: CSV
Columns: RecordID, Community (name), Latitude, Longitude

File name: CMR_LF_cases.csv
Description: Text file containing the LF-related morbidity cases reported by 198 villages in Cameroon in 2015
Format: CSV
Columns: RecordID, Community (name), PoT (intervention status), Period, Cases (total LF cases), Hydrocele (number), Lymphedema (number)

File name: CMR_District_population.csv
Description: Text file containing the population estimates in 2015 for each district
Format: CSV
Columns: District, DistrictID, EstPop2015

File name: CMR_adm2.shp
Description: District boundaries for Cameroon
Format: ESRI shapefile


Practical 2

1. Displaying geographical data and changing symbology

Display the shapefile for Cameroon district boundaries (CMR_adm2.shp).

• Click the Add data button.

• A window will open in which to set the location of the data and select the source type. ‘File’ should be selected as Source Type in the new Add Vector Layer dialog. Browse to select the vector layer.

The outline of Cameroon with district boundaries will appear in the data view. We

now need to add information about the sampled locations.

• As we saw in Practical 1, you can modify the Symbol style by double-clicking on ‘Symbol’, or by right-clicking on CMR_adm2.shp to open the LAYER PROPERTIES window. There you can also access other settings of this layer, including Labels, Fields, General Settings, Metadata, Joins, Actions, Diagrams, etc.

• Click on the Style tab and select a pale yellow or ochre/bone as fill colour.


• Add labels to your map by right clicking on CMR_adm2.shp to bring up the Layer

Properties menu.

• Click on the Labels tab and choose the option Show labels for this layer from the drop-down menu. Choose NAME_2 as the field containing the label. You can set the Label Style and change the text font, style, size and other label properties under the Text option. Set 8 as the label Size and bold as the Style.



• To improve the readability of the labels, add a white buffer around them by clicking Buffer and checking the Draw text buffer option. Set the buffer size to 3 and the buffer’s fill colour to white.

• Click the Apply button, and then OK.


• Zoom in to improve the readability of the layer.


2. Geo-referencing epidemiological data

2.1. Geopositioning a set of locations from a data table

GIS data often come in a table or an Excel spreadsheet. If you have a list of

latitude/longitude coordinates and some attributes, you can easily use these data

in your GIS project.

Examine your tabular data source. To import these data into QGIS save them as a

text file which includes at least 2 columns containing the X and Y coordinates. If you

have a spreadsheet, use ‘Save As’ function in your program to save it as a ‘Tab

Delimited File’ or a ‘Comma Separated Values (.CSV)’ file.

Create a layer from a file containing the communities’ coordinates collected during

the mapping survey. A file called CMR_LF_coordinates.csv, which contains the

location coordinates where morbidity cases related to LF infection were reported,

has been provided as a delimited text file (.CSV).

To view a delimited text file as a layer, the text file must contain:

1. A delimited header row of field names.

2. An X (LONGITUDE/LONG) field and a Y (LATITUDE/LAT) field, or a Well Known Text (WKT) field, in the header row. These fields can have any name.

3. X and Y coordinates specified as numbers.
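The three requirements above can be checked before loading a file into QGIS. The following is a minimal sketch in plain Python, not part of QGIS itself; the function name check_delimited_text and the sample rows are hypothetical, and the coordinate values are made up.

```python
import csv
import io

def check_delimited_text(text, x_field, y_field, delimiter=","):
    """Return a list of problems found; an empty list means the file looks usable."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    problems = []
    header = reader.fieldnames or []
    # Requirements 1 and 2: a header row that names the X and Y fields
    for name in (x_field, y_field):
        if name not in header:
            problems.append(f"missing field: {name}")
    if problems:
        return problems
    # Requirement 3: coordinates must be numbers (line 1 is the header)
    for line_no, row in enumerate(reader, start=2):
        for name in (x_field, y_field):
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {name} is not a number ({row[name]!r})")
    return problems

# Made-up sample using the column names of CMR_LF_coordinates.csv
sample = (
    "RecordID,Community,Latitude,Longitude\n"
    "1,Village A,4.05,9.70\n"
    "2,Village B,north,10.15\n"
)
print(check_delimited_text(sample, "Longitude", "Latitude"))
```

In this made-up sample the second data row has a non-numeric latitude, so the function reports one problem for line 3.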

Note: The village coordinates were collected using a geographical coordinate system, WGS84. This coordinate system is set up by default in the most widely used handheld GPS receivers, although coordinates can be displayed in different formats (e.g. degrees, minutes and seconds; degrees and minutes; or decimal degrees).
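Since QGIS needs the coordinates as plain numbers, readings noted in degrees, minutes and seconds have to be converted to decimal degrees before import. A minimal sketch of that arithmetic, with a hypothetical function name and made-up readings:

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and West are negative in decimal-degree notation
    return -value if hemisphere.upper() in ("S", "W") else value

print(dms_to_decimal(4, 30, 0, "N"))   # 4.5
print(dms_to_decimal(9, 45, 0, "E"))   # 9.75
```

Readings in degrees and decimal minutes can be passed with seconds left at 0.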

Load this text file into your map project in one of two ways:

• Use the Layer menu in the main Menu bar: Layer > Add Delimited Text Layer, or

• Click on the Add Delimited Text Layer button in the lateral Menu bar.

• In the dialogue box, click on Browse and specify the path to the text file. Then

in the delimiters section, check the Tab delimiter. If your data are in CSV format

check the comma as the delimiter. The plugin will try to guess the correct x and

y coordinate fields. Change them if the plugin selects the wrong fields. Click OK.


• Select the Coordinate Reference System (see footnote 1), either a geographical or a projected system depending on the coordinate system of the spatial data. Our example uses a geographical coordinate system, WGS84, so we select that. Click OK.

• The data will be imported and displayed in the QGIS canvas. To see the locations clearly, turn off the labels: open the Layer Properties menu of CMR_adm2.shp from the legend, select the Labels tab and choose the No labels option from the drop-down menu.

1 More information about Coordinate Reference Systems is provided in a supplemental file.


• The locations are now displayed as a temporary layer. To make this layer permanent, save it as a new layer. Right-click on the layer and select Save As to save it as a shapefile (in the folder “C:\...\Cameroon_project\Vector_layers”).

• Select the ESRI Shapefile format, name the shapefile CMR_LF_villages_coordinates.shp and enable the option Add saved file to map. Click OK to save and load the new shapefile positioning the surveyed locations.

Remove the temporary layer CMR_LF_coordinates.csv from the Layers Panel by right-clicking on CMR_LF_coordinates.csv and selecting the Remove option from the layer menu.


2.2. Creating spatial joins: joining epidemiological data to layers

The new shapefile only contains spatial information, namely the coordinates (latitude/longitude), together with the name and unique ID of the village where LF-related morbidity data were collected. You need to join the epidemiological information from the CMR_LF_cases.csv file to create a larger table containing all the necessary information.

• CMR_LF_cases.csv contains the number of lymphedema and hydrocele cases presumably related to LF infection at community level.

Note: Use QGIS Browser to explore this file.


• A major constraint of QGIS when importing .CSV files is that any data included in the table are stored as string data, regardless of whether the data are numeric (i.e. integer, double, real, etc.) or dates. To overcome this, we can create a .TXT file using the Notepad application, in which we define the data type of each field, separated by commas.

• The new file MUST be saved using the same name as the .CSV file with the

epidemiological data and as .CSVT format. The new .CSVT file MUST be saved at the

same location as the .CSV. In this way, when importing the .CSV file into our map

project, QGIS will recognise the data type of every field contained.

For instance, CMR_LF_cases.csv includes the number of lymphedema and hydrocele

cases reported by community which is a numeric (integer) data type. We should create

a .CSVT file named CMR_LF_cases.csvt which would include the data type of the fields,

and store it together with the .CSV file.
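As an illustration, the .CSVT file can also be produced with a few lines of Python instead of Notepad. This is a sketch under stated assumptions: the column order comes from the data set description, the types chosen for RecordID, PoT and Period are guesses (the .CSVT files supplied with the practical are the reference), and the type keywords Integer and String follow the convention of the GDAL/OGR CSV driver that QGIS relies on. A temporary folder stands in for the Tables folder.

```python
import os
import tempfile

# Column order taken from the data set description. The types given to
# RecordID, PoT and Period are assumptions made for this sketch.
columns = [
    ("RecordID", "Integer"),
    ("Community", "String"),
    ("PoT", "String"),
    ("Period", "String"),
    ("Cases", "Integer"),
    ("Hydrocele", "Integer"),
    ("Lymphedema", "Integer"),
]

def csvt_line(column_types):
    """Build the single line of quoted, comma-separated types for a .csvt file."""
    return ",".join(f'"{type_name}"' for _, type_name in column_types)

def write_csvt(csv_path, column_types):
    """Write the .csvt next to the .csv, using the same base name."""
    csvt_path = os.path.splitext(csv_path)[0] + ".csvt"
    with open(csvt_path, "w", encoding="utf-8") as handle:
        handle.write(csvt_line(column_types) + "\n")
    return csvt_path

folder = tempfile.mkdtemp()  # stand-in for the folder holding CMR_LF_cases.csv
print(write_csvt(os.path.join(folder, "CMR_LF_cases.csv"), columns))
print(csvt_line(columns))
```

The file holds one type per column, in the same order as the columns of the .CSV file.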


The .CSVT files needed for this practical are provided with the data tables

(“C:\…\Cameroon_project\Tables”).

The Joins tab in layer Properties allows you to join a loaded attribute table to a

loaded vector layer.

To set up a join, you have to define a join layer, a join field and a target field.

In order to join the two files, you must ensure that both your data and your shapefile

share a field or column with common values, called a key. This is often a name or

ID code. In your data table, these identifiers must be unique, meaning one row per

name or ID. Joining works by adding fields from your data table to the shapefile’s

attribute table based on matching values found in the key columns.
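The logic of a key-based join can be sketched in plain Python. This is an illustration of the idea only, not how QGIS implements it; the function name and the sample rows are hypothetical and the values are made up.

```python
def join_by_key(target_rows, join_rows, target_field, join_field):
    """Copy fields from join_rows onto target_rows where the key values match."""
    lookup = {}
    for row in join_rows:
        key = row[join_field]
        # The key must be unique in the data table: one row per ID
        if key in lookup:
            raise ValueError(f"duplicate key in join table: {key!r}")
        lookup[key] = row
    joined = []
    for row in target_rows:
        match = lookup.get(row[target_field], {})
        extra = {k: v for k, v in match.items() if k != join_field}
        joined.append({**row, **extra})
    return joined

# Made-up rows: the shapefile's attribute table and the case table
villages = [
    {"RecordID": 1, "Latitude": 4.05, "Longitude": 9.70},
    {"RecordID": 2, "Latitude": 5.10, "Longitude": 10.15},
]
cases = [{"RecordID": 1, "Hydrocele": 3, "Lymphedema": 1}]

for row in join_by_key(villages, cases, "RecordID", "RecordID"):
    print(row)
```

A village without a matching row in the case table keeps only its own fields here; in QGIS the joined fields of such a feature are shown as NULL.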

Note: Use QGIS Browser to explore data tables of these files and find out the field with

common values. Also, you can look at the shapefile’s attribute table by right-clicking on

the layer name and selecting Open attribute table.

According to our data, we can join CMR_LF_cases.csv with the recently created

shapefile CMR_LF_villages_coordinates.shp using the recordID field.

To join data contained in CMR_LF_cases.csv to the shapefile

CMR_LF_villages_coordinates.shp you must:

• Add CMR_LF_cases.csv using the Add Delimited Text Layer button.

• Click Browse, then find and open your .CSV file. Select the following options: File format - CSV (comma separated values); Record options - First record has field names; and Geometric definitions - No geometry (attribute only table). Click OK to import CMR_LF_cases.csv into QGIS.


• At CMR_LF_villages_coordinates.shp, open the shapefile’s Properties by double-

clicking on the layer name.

• Select Joins tab in the shapefile’s Properties.

• Click the + button to add a new join.

• You will be prompted with the Add vector join tool.

• Join layer will be your “.CSV” data layer. In this case CMR_LF_cases.csv.


• Join field is the key field to be joined on in your “.CSV” data. In this case, RecordID.

• Target field is the key field to be joined on in your shapefile. In this case, select RecordID. The Target and Join fields do not necessarily need to have the same name.

• Click OK.


• Open the ATTRIBUTE TABLE of CMR_LF_villages_coordinates.shp by right-clicking on the layer and selecting Open attribute table, and check that your data have been properly joined.

• To make these joins permanent you must save a new copy.

• Right-click on CMR_LF_villages_coordinates.shp and select Save as.

• Choose a filename and location (“C:\...\Cameroon_project\Vector_layers”) for your new shapefile, and click OK. This new shapefile with the epidemiological data will be called CMR_Community_LF_morbidity.shp.

Note: a shortcoming of the QGIS join function is that it always alters the names of the joined fields. We will rename them back to their original names.


• Remove CMR_LF_villages_coordinates.shp and the data table CMR_LF_cases.csv

from the Layers Panel.

We need to rename the fields back to their original names. One major shortcoming of the QGIS join function is that it prefixes the joined fields with the name of the joined file. Address this problem as follows:

• Double-click on the CMR_Community_LF_morbidity layer and select the Fields tab. This section allows the user to edit the field names.

• Click on the Pencil icon, then double-click on each field name to rename it accordingly:

  - Rename ‘CMR_LF_c_1’ to ‘PoT’
  - Rename ‘CMR_LF_c_2’ to ‘period’
  - Rename ‘CMR_LF_c_3’ to ‘cases’
  - Rename ‘CMR_LF_c_4’ to ‘hydrocele’
  - Rename ‘CMR_LF_c_5’ to ‘hyd_status’
  - Rename ‘CMR_LF_c_6’ to ‘lymphedema’
  - Rename ‘CMR_LF_c_7’ to ‘lym_status’

• Delete the field ‘CMR_LF_cas’ as it is redundant. To remove it, click on ‘CMR_LF_cas’ and then on the Delete field icon. Click OK once the edits are complete.


Each edit made on a layer MUST be saved and updated. Click on the Pencil icon to

come out of the edit mode and click Save to complete edits.


• This newly created shapefile now has both the geometry and the data included

and can be directly imported into our project. It should consist of at least four

files with the same name but different extensions (.SHP, .SHX, .DBF, .PRJ).

Note: These files MUST always remain together to enable the layer to function normally.

• Save the map project as Cameroon_practical_2.qgs, in the

“C:\...\Cameroon_project\Projects” folder. Select File in menu bar and Save

project as.


3. Graph options to represent health outcomes

To take a closer look at the data:

• Right-click on CMR_Community_LF_morbidity.shp in the Layers Panel and select

Open attribute table.

• We may be interested in identifying potential clusters of morbidity cases presumably related to lymphatic filariasis infection, or geographic areas which may concentrate more cases of hydrocele or lymphedema.

A way to get a feel for the data is to visualise them on the map. We will start by

looking at the number of lymphedema cases by community.

• Double-click on the CMR_Community_LF_morbidity.shp layer title to bring up the LAYER PROPERTIES box. Click on the Style tab to open the Style section.

• Decide how you wish the attribute to be presented. As we’re interested in the

range of lymphedema cases, we want symbols that represent quantities of a

numeric attribute value.

• Click Graduated. You have the choice of displaying communities with graduated

colours or graduated symbol size.


• In Column option select lymphedema as your value to be represented.

• Click Class and Mode to choose a classification method (e.g. Equal Interval) and the number of intervals or classes (e.g. 4). The histogram can help you choose the most appropriate method.

• Use the Colour Ramp to colour the graduated variable, or manually colour each

interval by double-clicking on each rank.
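To see what an Equal Interval classification does with the data, the class boundaries can be worked out by hand. The sketch below is plain Python with made-up lymphedema counts; the function names are hypothetical, and treating the upper bound of each class as inclusive is an assumption of this sketch rather than a statement about QGIS.

```python
def equal_interval_breaks(values, classes):
    """Return the class boundaries for an equal-interval classification."""
    low, high = min(values), max(values)
    width = (high - low) / classes
    return [low + i * width for i in range(classes + 1)]

def class_index(value, breaks):
    """Return the 0-based class a value falls into (upper bound inclusive)."""
    for i in range(1, len(breaks)):
        if value <= breaks[i]:
            return i - 1
    return len(breaks) - 2

lymphedema = [0, 2, 3, 5, 8, 12, 20]   # made-up counts per community
breaks = equal_interval_breaks(lymphedema, 4)
print(breaks)  # [0.0, 5.0, 10.0, 15.0, 20.0]
print([class_index(v, breaks) for v in lymphedema])
```

With skewed counts such as these, most communities fall into the lowest class, which is one reason to look at the histogram before settling on a method.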


• Change the symbol size by progressively increasing the size with increasing number of lymphedema cases. Double-click on each symbol and a pop-up will provide access to the different settings which can be modified.

• Once you’re happy with the settings of graduated symbols, click OK.


Now, it’s your turn. Try to display the number of hydrocele cases reported in

Cameroon.

Note: you can manually change a range by double-clicking on the Range column for each interval; a pop-up (ENTER CLASS BOUNDS) will enable you to change the lower and upper values. If you want to change the label of each range, double-click on the Label column. You can also change the legends by double-clicking on them.


Question - Can you see any obvious patterns in the distribution of hydrocele and

lymphedema cases? Do any areas appear to have higher or lower risk? Are these at-

risk areas the same for hydrocele and lymphedema?

• Open File in menu bar and then click on Save Project.

4. Aggregating epidemiological data: choropleth maps

We are interested in identifying districts which may require special attention by the

LF control programme and health care services due to their high prevalence of LF-

related morbidity cases. It would be helpful to show the district-level prevalence of

lymphedema and hydrocele on our map.

There are at least two ways to do this. You can aggregate the prevalence data collected at community level by district within your Excel spreadsheet or your statistical software (e.g. SPSS, Stata, EpiInfo). For this, you need to collapse the data by district using the PivotTable function in Excel, which can be found under the Insert tab (see footnote 2). Then, sum the number of each type of LF-related morbidity reported for the communities located within the same district. Finally, divide by the population estimated for the district at the time of the survey and multiply by 10,000 to calculate the prevalence per 10,000 inhabitants.
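The spreadsheet route described above amounts to a sum per district followed by one division. A minimal sketch in Python, assuming each community row already carries the name of its district; the district names, counts and populations are made up, and the function names are hypothetical.

```python
from collections import defaultdict

def sum_by_district(rows, field):
    """Sum one numeric field over the communities of each district."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["District"]] += row[field]
    return dict(totals)

def prevalence_per(cases, population, multiplier=10_000):
    """Cases per `multiplier` inhabitants."""
    return cases * multiplier / population

# Made-up community records and district populations
community_rows = [
    {"District": "District A", "Hydrocele": 4, "Lymphedema": 1},
    {"District": "District A", "Hydrocele": 6, "Lymphedema": 2},
    {"District": "District B", "Hydrocele": 5, "Lymphedema": 0},
]
population = {"District A": 50_000, "District B": 20_000}

hydrocele = sum_by_district(community_rows, "Hydrocele")
for district, cases in hydrocele.items():
    print(district, prevalence_per(cases, population[district]))
```

For the made-up figures, District A has 10 hydrocele cases among 50,000 inhabitants, i.e. 2 per 10,000, and District B has 5 among 20,000, i.e. 2.5 per 10,000.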

It is also possible to estimate this prevalence within QGIS using the data management

tool called Join attributes by location. This tool enables us to calculate different

types of statistics to summarise data such as sum, mean, min, max and median based

on their location.

In this section, we have focused on this second approach.

Note: It is important to check whether any features (villages) have NULL values for the outcomes we want to aggregate, in our case Hydrocele and Lymphedema. These features should be deleted, otherwise the Join attributes by location tool will crash when running the process.

To find them, open the attribute table and click on the header of the field you want to sort by; the features will be sorted in ascending order.
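The same check can be expressed as a small filter over the attribute rows. This is a hedged sketch in plain Python with made-up records; it treats both a missing value (None) and an empty string as NULL, which is an assumption of the example.

```python
def rows_with_missing(rows, fields):
    """Return the positions of rows where any of the given fields is empty."""
    missing = []
    for index, row in enumerate(rows):
        if any(row.get(name) in (None, "") for name in fields):
            missing.append(index)
    return missing

# Made-up attribute rows
records = [
    {"RecordID": 1, "Hydrocele": 3, "Lymphedema": 1},
    {"RecordID": 2, "Hydrocele": None, "Lymphedema": 2},
    {"RecordID": 3, "Hydrocele": 0, "Lymphedema": ""},
]
to_delete = rows_with_missing(records, ["Hydrocele", "Lymphedema"])
print(to_delete)  # [1, 2]
clean = [row for i, row in enumerate(records) if i not in to_delete]
```

Note that a count of 0 is a valid value and is kept; only empty entries are flagged.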

2 More details in the practical Data Management for mapping (Practical 5) available at the website of the Global Atlas of Helminth Infections project under Training section (GIS basic training)


• Select the features which have NULL values in the Hydrocele and Lymphedema fields. To do this, hold the left mouse button down while you click on them.

• Click on the Toggle Editing Mode button.

• Highlight the rows that have NULL values for hydrocele and lymphedema and click on the Delete selected features tool.

• Click the Toggle Editing Mode button again to accept the changes made to the attribute table. Click on the Save button.

• Under Vector option in the main menu bar, open the Join attributes by location

tool from the list of Data Management Tools.


• We now have to set CMR_adm2.shp as the Target vector layer and CMR_Community_LF_morbidity.shp as the Join vector layer. Select intersect under Geometric predicate. Under Attribute summary, select the option Take summary of intersecting features; you will see the box Statistics for summary (comma separated) [optional] with a list, i.e. ‘sum, mean, min, max, median’. In this instance, we are only interested in the SUM function: keep this and delete the rest. The SUM option will calculate the overall sums of the numeric fields in CMR_Community_LF_morbidity.shp for the features which are “located within” each district defined in CMR_adm2.shp.

• Save the output shapefile as CMR_LF_Districts_level.shp in the folder

“C:\...\Cameroon_project\Vector_layers.”


• Ensure only keep matching records is checked. This will make the new shapefile

keep only those features (districts) where morbidity data were recorded.

• The new shapefile created will be loaded into our map project.

• Open the ATTRIBUTE TABLE of CMR_LF_Districts_level.shp by right-clicking on the layer and selecting Open attribute table.
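What the tool computes can be illustrated with a small point-in-polygon summary. This is a sketch of the idea only, not the QGIS implementation: it assumes simple polygons without holes in planar coordinates, uses a standard ray-casting test, and works on made-up square "districts" and community points. All names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sum_by_polygon(polygons, points, field, keep_only_matching=True):
    """Sum `field` over the points falling inside each polygon."""
    result = {}
    for name, ring in polygons.items():
        hits = [p for p in points if point_in_polygon(p["x"], p["y"], ring)]
        if hits or not keep_only_matching:
            result[name] = {"sum": sum(p[field] for p in hits), "count": len(hits)}
    return result

# Made-up districts (squares) and community points
districts = {
    "District A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "District B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "District C": [(0, 2), (2, 2), (2, 4), (0, 4)],
}
communities = [
    {"x": 0.5, "y": 0.5, "hydrocele": 3},
    {"x": 1.5, "y": 1.0, "hydrocele": 2},
    {"x": 3.0, "y": 1.0, "hydrocele": 4},
]
print(sum_by_polygon(districts, communities, "hydrocele"))
```

District C contains no community, so it is left out of the result, which mirrors the effect of the only keep matching records option.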

You can see that the attribute values from the spatial join layer have been aggregated and summed according to the districts in CMR_adm2.shp. One of the shortcomings of this aggregation tool is that the fields derived from the aggregation are often assigned incorrect field names.

For instance, the summed columns derived from the target fields in the CMR_Community_LF_morbidity layer (i.e. latitude, longitude, cases, hydrocele and lymphedema) have been assigned the following wrong names:

Target field name   Wrong field name assigned   What the correct field name should be
latitude            sumlongitude                sumlatitude
longitude           sumcases                    sumlongitude
cases               sumhydrocele                sumcases
hydrocele           sumlymphedema               sumhydrocele
lymphedema          sumlatitude                 sumlymphedema


• Correct the field names before removing any redundant columns. Click on the Processing menu and select Toolbox. A window called Processing Toolbox will appear.

• Use the search bar at the top of the Processing Toolbox and type “Refactor fields”. The search should bring up the Refactor fields tool. This function allows the user to modify the attributes of a layer, including changing field names and adding or deleting fields. Click on this function to open its menu.


Wrong field name    Correct field name to assign
sumlongitude        Latitude
sumcases            Longitude
sumhydrocele        Cases
sumlymphedema       Hydrocele
sumlatitude         Lymphedema

• In the Refactor fields menu, select CMR_LF_Districts_level as the Input layer to be modified. Correct the field names so that they are in line with those in the above table, deleting any extraneous variables.

• ‘sumlatitude’ should be renamed to ‘Lymphedema’, ‘sumlymphedema’ to ‘Hydrocele’, and so on. Simply double-click on the fields to do this.

• You can now delete the redundant ‘Latitude’ and ‘Longitude’ fields by clicking the delete field button. Click OK, and a new layer called “Refactored” will be added to the Layers Panel. Save it as “CMF_LF_District_morbidity.shp”.
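Because the names are shifted by one position, it is easy to rename a column to the wrong thing. The renaming and clean-up described above can be written down as a mapping, sketched here in plain Python on a single made-up attribute row; the mapping is taken from the table of wrong and correct names, while the row values and the function name are hypothetical.

```python
# Wrong name assigned by the tool -> name the column should carry
corrections = {
    "sumlongitude": "Latitude",
    "sumcases": "Longitude",
    "sumhydrocele": "Cases",
    "sumlymphedema": "Hydrocele",
    "sumlatitude": "Lymphedema",
}
# Summed coordinates carry no meaning, so they are dropped afterwards
redundant = {"Latitude", "Longitude"}

def refactor(row):
    """Rename the shifted columns, then drop the redundant ones."""
    renamed = {corrections.get(name, name): value for name, value in row.items()}
    return {name: value for name, value in renamed.items() if name not in redundant}

# Made-up district row as produced by the aggregation
row = {
    "NAME_2": "District A",
    "sumlongitude": 8.1,
    "sumcases": 19.4,
    "sumhydrocele": 12,
    "sumlymphedema": 9,
    "sumlatitude": 3,
    "Count": 2,
}
print(refactor(row))
```

Fields that are not in the mapping, such as NAME_2 and Count, pass through unchanged.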

Note: ‘Count’ tells us the number of LF community surveys that were aggregated in a district.


• Open File in menu bar and then click on Save Project.

You can now visualise the district-level aggregated data on the map (i.e. produce the choropleth map). Follow the same procedure you used to display the community-level attribute data on the map (see section 3). Think carefully about the most appropriate way to categorise your data (e.g. graduated symbol).

Note: It is useful to look at a histogram of the distribution of your data when choosing your classification. Alternatively, you might want to adjust the ranges manually, as we did in the previous section.

• To establish a range manually, double-click on each Range, having previously fixed the number of Classes (see section 3).

• To change the label names, double-click on each Legend (see section 3).

Note: It is useful to save the defined style so that it can be used with other fields (e.g. Hydrocele) and other projects. The Save Style option appears in the Style drop-down menu. Assign a filename to this new QGIS layer style file (.QML). To reuse the same symbology or style, open the same drop-down menu under the Style option, choose the Load Style button and select the .QML file. You will then have to choose the new classification field that you want to plot.

• To distinguish districts which have not yet reported any morbidity cases, change the Style symbol (colour) of CMR_adm2.shp (e.g. to a solid grey) to create a background that contrasts with the surveyed districts.


The last map lacked epidemiological rigour because it only displayed the total number of cases recorded by district in Cameroon. A more robust measure for reporting LF morbidity would be either prevalence or incidence rates. Prevalence indicates the probability that a member of the population has a given condition at a point in time. It is a way of assessing the overall burden of disease in the population, so it is a useful measure for administrators when assessing the need for services or treatment facilities. Epidemiologists distinguish between point prevalence, the proportion of the population that has the condition at a single point in time, and period prevalence, the proportion that has the condition at any time during a specified period, which includes earlier cases that still have the condition and are still members of the population. In contrast to prevalence, incidence is a measure of the occurrence of new cases of disease (or some other outcome) during a span of time.

In our example, each district has an estimate of its population in 2015. We can therefore estimate the prevalence of hydrocele and lymphedema at the time of the survey by dividing the number of morbidity cases by the estimated population. Because we expect these proportions to be very small, we multiply the result by 100,000, so that the output is interpreted as prevalence per 100,000 inhabitants.
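The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the QGIS workflow; the function name and the example figures are made up.

```python
def prevalence_per_100k(cases, population):
    """Return cases per 100,000 inhabitants, or None if the population is unusable."""
    if population is None or population <= 0:
        return None  # avoid dividing by zero for districts without an estimate
    return cases * 100_000 / population


# Hypothetical district: 12 recorded cases in an estimated population of 48,000.
print(prevalence_per_100k(12, 48_000))  # 25.0
```

Returning None for a missing or zero population keeps such districts visibly separate from districts with a true prevalence of zero.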

To calculate prevalence, load the CMR_District_population.csv file from the EpiData folder and create a join with the CMR_District_LF_morbidity.shp layer.

Remember that you must create a .CSVT file specifying the type of data included in each field of the CMR_District_population table, as shown previously in section 2.2. This file has also been provided in the EpiData folder.
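If you prefer to generate the .CSVT file with a script rather than a text editor, a minimal Python sketch could look like the one below. It assumes the table has three fields typed Integer, String and Integer, in that order; the helper name and the temporary folder are hypothetical and only used for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_csvt(csv_path, field_types):
    """Write a one-line .csvt sidecar next to csv_path and return its path."""
    csvt_path = Path(csv_path).with_suffix(".csvt")
    with open(csvt_path, "w", newline="") as handle:
        # One quoted type per field, in the same order as the CSV columns.
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerow(field_types)
    return csvt_path


# Demonstrated in a temporary folder so that no real data are touched.
with tempfile.TemporaryDirectory() as folder:
    sidecar = write_csvt(
        Path(folder) / "CMR_District_population.csv",
        ["Integer", "String", "Integer"],  # assumed field types
    )
    print(sidecar.name)                # CMR_District_population.csvt
    print(sidecar.read_text().strip())  # "Integer","String","Integer"
```

The sidecar must share the base name of the .CSV file and sit in the same folder for the types to be picked up.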


To join data contained in CMR_District_population.csv to the shapefile

CMR_District_LF_morbidity.shp, you must:

• Add CMR_District_population.csv using the Add Delimited Text File button.

• Click Browse and open the .CSV file. Remember to check “No geometry (attribute only table)”, as this .CSV does not contain any coordinates.

• Open the Properties of CMR_District_LF_morbidity.shp by double-clicking on the layer name.

• Select the Joins tab in the shapefile’s Properties, and click the “+” button to add a new join.

• You will be prompted with the Add vector join tool.

• Join layer will be your “.CSV” data layer, in this case CMR_District_population.csv. Join field is the key field to be joined on in your “.CSV” data, in this case DistrictID.

• Target field is the key field to be joined on in your shapefile. In this case, select ID_2. The Target and Join fields do not need to have the same name.

(Figure callout, .CSVT file content: “Integer”, “String”, “Integer”)


• Click OK.

• Open the shapefile’s ATTRIBUTE TABLE by right-clicking on CMR_District_LF_morbidity.shp and selecting Open Attribute Table, then check that your data have been properly joined.

• To make these joins permanent you must save a new copy.

• Right-click on CMR_District_LF_morbidity.shp and select Save as. Choose a filename and location (“C:\...\Cameroon_project\Vector_layers”) for your new shapefile, and click OK. This new shapefile may be called CMR_District_LF_prevalence.

• Remove CMR_District_LF_morbidity.shp and the data table CMR_District_population.csv from the Layers Panel.

• The new shapefile will have two new fields: CMR_Distri (which contains the district names) and CMR_Dist_1 (which contains the estimated population in 2015). We only need the population data, so rename CMR_Dist_1 to Pop2015 and delete CMR_Distri. We learnt how to remove and rename fields in section 2.2 (pages 19 and 20).


• Finally, we have to create calculated fields for the prevalence of both hydrocele and lymphedema per 100,000 inhabitants. These are the outcomes we ultimately want to display and analyse.

• For this, click on the Field Calculator and create a new field called PrevLYM for the prevalence of lymphedema by district. Choose Decimal number (real) as the output field type and set an output field width of 10 and a precision of 2.

• Use the Expression tab (left panel) to calculate the prevalence of lymphedema cases per 100,000 inhabitants using the formula ("Lymphedema" / "Pop2015") * 100000, where Lymphedema is the number of lymphedema cases and Pop2015 is the estimated population in 2015 for each district. Click OK to derive the prevalence estimates for PrevLYM.

• Repeat the above steps for the prevalence of hydrocele per 100,000 inhabitants. Call the new field PrevHYC.


• Then click the Toggle Editing Mode button again to commit the changes made to the attribute table. Click the Save button in the small window that pops up.
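For readers who want to check the logic of the join and the calculated fields outside QGIS, the steps above can be sketched in plain Python. This is a minimal illustration and not how QGIS performs the join internally: the records are made up, the helper names are hypothetical, and the population column is assumed to be called Pop2015 already.

```python
def join_population(districts, population_rows,
                    target_field="ID_2", join_field="DistrictID"):
    """Left-join population rows onto district records using the key fields."""
    lookup = {row[join_field]: row for row in population_rows}
    joined = []
    for district in districts:
        match = lookup.get(district[target_field])
        record = dict(district)  # copy, so the input list is left unchanged
        record["Pop2015"] = match["Pop2015"] if match else None
        joined.append(record)
    return joined


def add_prevalence(records, case_field, out_field):
    """Add cases per 100,000 inhabitants, rounded to 2 decimals (precision 2)."""
    for record in records:
        pop = record["Pop2015"]
        if pop:
            record[out_field] = round(record[case_field] * 100_000 / pop, 2)
        else:
            record[out_field] = None  # no matching population row
    return records


# Made-up values, for illustration only.
districts = [
    {"ID_2": 1, "Lymphedema": 6, "Hydrocele": 3},
    {"ID_2": 2, "Lymphedema": 10, "Hydrocele": 0},
    {"ID_2": 3, "Lymphedema": 4, "Hydrocele": 1},
]
population = [
    {"DistrictID": 1, "Pop2015": 120_000},
    {"DistrictID": 2, "Pop2015": 80_000},
]

joined = join_population(districts, population)
add_prevalence(joined, "Lymphedema", "PrevLYM")
add_prevalence(joined, "Hydrocele", "PrevHYC")
for row in joined:
    print(row["ID_2"], row["PrevLYM"], row["PrevHYC"])
```

District 3 has no population row in this example, so both prevalence fields stay empty. The same thing happens in the attribute table when a key in the Target field has no match in the Join field, which is why it is worth checking the joined table before calculating.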

You can now visualise the district-level prevalence data on the map (i.e. produce a choropleth map). Follow the same procedure you used to display the community-level attribute data on the map (see section 3). Think carefully about the most appropriate way to categorise your data (i.e. using the Graduated symbol style). Try to display both prevalence fields using the following settings:

- Graduated symbol

- Field: PrevLYM, PrevHYC

- Colour ramp: YlOrRd

- Mode: Natural Breaks (Jenks)

- Classes: 4
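To give an intuition for what the Natural Breaks mode does with these settings, here is a minimal brute-force Python sketch: it tries every way of splitting the sorted values into the requested number of classes and keeps the split with the smallest within-class squared deviation. This is an assumption-laden illustration of the idea only. QGIS's own implementation may differ in detail, the function names are hypothetical, and the prevalence values are made up.

```python
from itertools import combinations


def sum_squared_dev(values):
    """Sum of squared deviations of values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, n_classes):
    """Return the upper bound of each class for the best split found.

    Brute force: fine for a few dozen values, too slow for large layers.
    """
    data = sorted(values)
    if not 1 <= n_classes <= len(data):
        raise ValueError("n_classes must be between 1 and the number of values")
    best_cost, best_edges = None, None
    # Each combination lists the indices at which a new class starts.
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        edges = (0, *cuts, len(data))
        cost = sum(sum_squared_dev(data[a:b]) for a, b in zip(edges, edges[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_edges = cost, edges
    return [data[b - 1] for b in best_edges[1:]]


# Made-up prevalence values per 100,000 inhabitants.
prevalence = [1.0, 1.5, 2.0, 9.0, 10.0, 25.0, 26.0, 60.0]
print(natural_breaks(prevalence, 4))  # [2.0, 10.0, 26.0, 60.0]
```

The breaks fall in the gaps between clusters of similar values, which is why this mode often suits skewed prevalence data better than equal intervals.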


Choropleth map produced for the prevalence of lymphedema (per 100,000 inhabitants)

Choropleth map produced for the prevalence of hydrocele (per 100,000 inhabitants)

Question - Can you see any obvious patterns in the prevalence of hydrocele and

lymphedema cases? Do any districts appear to have higher or lower risk? Are these

at-risk areas the same for hydrocele and lymphedema?

Compare these maps with those produced based on the number of morbidity cases.