Practical 2. Managing data tables - 16062017
www.thiswormyworld.org
Practical 2
Managing data tables and creating
spatial data sets using QGIS
Modern Tools for NTDs Control Programmes July 2017
www.thiswormyworld.org | 2
Aim of practical
A key step in epidemiological analyses is to visualise the spatial patterns of infection
and/or disease. This allows for an appreciation of any spatial trends that might be
present, identification of obvious errors, and generation of hypotheses about factors
that may influence the observed patterns. Visualisation is also important for
communicating the findings to the target audience.
Here, we will visualise morbidity data potentially associated with lymphatic filariasis
(LF) that were collected in health facilities in 2015, so as to explore the spatial
distribution of the incidence of various LF-related morbidities and to produce maps
that support monitoring of the clinical management of morbidity cases by the LF
control programme.
Note: all data used in this practical have been made up for teaching purposes and
do not correspond to real secondary data.
Key learning skills
In this practical, you will:
• Convert Excel spreadsheets into a format compatible with QGIS (.CSV files).
• Import and join tables, create spatial joins and summarise attribute data.
• Manage data in the attribute table, including removing and adding new fields.
• Use symbology to visualise quantitative data as point maps and choropleth
maps.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported
License. This means that users are free to copy and share this material with others. Requests for
creating new derivatives should be sent to Jorge Cano Ortega ([email protected]).
Data set description
For this practical, we will be working with the geographical data provided in the
first practical session and some “made-up” epidemiological data which simulate
reported cases of lymphoedema and hydrocele potentially associated with lymphatic
filariasis collected from health facilities across Cameroon in 2015.
• Geographical data set: Cameroon district boundaries.
• Epidemiological data set: number of lymphoedema and hydrocele cases recorded
at community level.
• Demographic data set: population estimate by district. To obtain these
estimates, we have used the gridded population density data sets available at
the WorldPop project (http://www.worldpop.org.uk/). Datasets are provided at
continental and country scale.
File name: CMR_LF_coordinates.csv
Description: Text file containing coordinates for communities where morbidity cases were recorded
Format: CSV
Columns: RecordID, Community (name), Latitude, Longitude

File name: CMR_LF_cases.csv
Description: Text file containing LF-related morbidity cases reported by 198 villages in Cameroon in 2015
Format: CSV
Columns: RecordID, Community (name), PoT (intervention status), Period, Cases (total LF cases), Hydrocele (number), Lymphedema (number)

File name: CMR_District_population.csv
Description: Text file containing the population estimates in 2015 for each district
Format: CSV
Columns: District, DistrictID, EstPop2015

File name: CMR_adm2.shp
Description: District boundaries for Cameroon
Format: ESRI shapefile
1. Displaying geographical data and changing symbology
Display the shapefile for Cameroon district boundaries (CMR_adm2.shp).
• Click the Add data button.
• A window will open in which to set the location of the data and select the source
type. ‘File’ should be selected as Source Type in the new Add Vector Layer dialog.
Browse to select the vector layer.
The outline of Cameroon with district boundaries will appear in the data view. We
now need to add information about the sampled locations.
• As we saw in Practical 1, you can modify the symbol style by double-clicking on
‘Symbol’, or by right-clicking on CMR_adm2.shp to open the LAYER PROPERTIES window.
You can also access other settings of this layer, including Labels, Fields, General
Settings, Metadata, Joins, Actions, Diagrams, etc.
• Click on the Style tab and select a pale yellow or ochre/bone as the fill colour.
• Add labels to your map by right-clicking on CMR_adm2.shp to bring up the Layer
Properties menu.
• Click on the Labels tab and choose the option Show labels for this layer from
the drop-down menu. Choose NAME_2 as the field containing the label. Under the
Text option you can set the label style and change the text font, style, size and
other label properties. Set the Size to 8 and the Style to bold.
• To improve the readability of the labels, add a white buffer around them by clicking
Buffer and checking the Draw text buffer option. Set the buffer size to 3 and the
buffer’s fill colour to white.
• Click the Apply button, then OK.
• Zoom in to improve the readability of the layer.
2. Geo-referencing epidemiological data
2.1. Geopositioning a set of locations from a data table
GIS data often come in a table or an Excel spreadsheet. If you have a list of
latitude/longitude coordinates and some attributes, you can easily use these data
in your GIS project.
Examine your tabular data source. To import these data into QGIS save them as a
text file which includes at least 2 columns containing the X and Y coordinates. If you
have a spreadsheet, use ‘Save As’ function in your program to save it as a ‘Tab
Delimited File’ or a ‘Comma Separated Values (.CSV)’ file.
Create a layer from a file containing the communities’ coordinates collected during
the mapping survey. A file called CMR_LF_coordinates.csv, which contains the
location coordinates where morbidity cases related to LF infection were reported,
has been provided as a delimited text file (.CSV).
To view a delimited text file as a layer, the text file must contain:
1. A delimited header row of field names.
2. An X (LONGITUDE/LONG) field and a Y (LATITUDE/LAT) field, or a Well Known Text
(WKT) field, named in the header row. These fields can have any name.
3. X and Y coordinates specified as numbers.
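The three requirements above can be checked before loading a file into QGIS. The following is a minimal sketch in Python using only the standard library; the function name check_point_table and the sample rows are illustrative assumptions, not part of the practical's data.

```python
import csv
import io


def check_point_table(text, x_field, y_field, delimiter=","):
    """Return a list of problems found in a delimited text table of points."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    fields = reader.fieldnames or []
    problems = []
    # Requirements 1 and 2: a header row that names the X and Y fields.
    for name in (x_field, y_field):
        if name not in fields:
            problems.append(f"missing field: {name}")
    if problems:
        return problems
    # Requirement 3: every coordinate must parse as a number.
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for name in (x_field, y_field):
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(
                    f"row {row_number}: {name} is not a number ({row[name]!r})"
                )
    return problems


# Made-up sample with one invalid latitude (hypothetical values).
sample = (
    "RecordID,Community,Latitude,Longitude\n"
    "1,Village A,4.05,9.70\n"
    "2,Village B,not recorded,11.52\n"
)
print(check_point_table(sample, "Longitude", "Latitude"))
```

An empty list means the table meets the three requirements; any message points to the row that would need correcting in the spreadsheet before export.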
Note: The village coordinates were collected using a geographical coordinate system;
WGS84. This coordinate system is set up by default in the most widely used handheld
GPS receivers although it can be displayed in different formats (e.g. degree, minutes
and seconds, degree and minutes or in decimal degrees).
Load this text file into your map project in either of two ways:
• Select Layer > Add Delimited Text Layer in the main Menu bar, or
• Click on the Add Delimited Text Layer button in the lateral Menu bar.
• In the dialogue box, click on Browse and specify the path to the text file. Then
in the delimiters section, check the Tab delimiter. If your data are in CSV format
check the comma as the delimiter. The plugin will try to guess the correct x and
y coordinate fields. Change them if the plugin selects the wrong fields. Click OK.
• Select the Coordinate Reference System (see footnote 1), either a geographical or a
projected system depending on the coordinate system of the spatial data. Our example
uses the geographical coordinate system WGS84, so we select that. Click OK.
• The data will be imported and displayed in the QGIS canvas. To see the locations
clearly, turn off the district labels: open the Layer Properties of CMR_adm2.shp
from the legend, select the Labels tab and choose the No labels option from the
drop-down menu.
1 More information about Coordinate Reference Systems is provided in a supplemental file.
• The locations are now displayed as a temporary layer. To make this layer
permanent, save it as a new layer. Right-click on the layer and select Save As to
save it as a shapefile (in the folder “C:\...\Cameroon_project\Vector_layers”).
• Select ESRI Shapefile format, name the shapefile
CMR_LF_villages_coordinates.shp and enable the option Add saved file to map.
Click OK to save and load the new shapefile containing the surveyed locations.
• Remove the temporary layer CMR_LF_coordinates.csv from the Layers Panel by
right-clicking on CMR_LF_coordinates.csv and selecting the Remove option from
the layer menu.
2.2. Creating spatial joins: joining epidemiological data to layers
The new shapefile only contains spatial information: the coordinates
(latitude/longitude), the name and the unique ID of each village where LF-related
morbidity data were collected. You need to join the epidemiological information
from the CMR_LF_cases.csv file to create a larger table containing all the necessary
information.
• CMR_LF_cases.csv contains the number of lymphedema and hydrocele cases
presumably related to LF infection at community level.
Note: Use QGIS Browser to explore this file.
• A major constraint of QGIS when importing .CSV files is that all data in the table
are stored as strings, regardless of whether they are numeric (i.e. integer, double,
real, etc.) or dates. To overcome this, we can use the Notepad application to create
a .TXT file in which we define the data type of each field, separated by commas.
• The new file MUST be saved with the same name as the .CSV file containing the
epidemiological data, but with the .CSVT extension, and in the same location as the
.CSV. In this way, when the .CSV file is imported into our map project, QGIS will
recognise the data type of every field it contains.
For instance, CMR_LF_cases.csv includes the number of lymphedema and hydrocele
cases reported by community which is a numeric (integer) data type. We should create
a .CSVT file named CMR_LF_cases.csvt which would include the data type of the fields,
and store it together with the .CSV file.
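As an alternative to typing the file in Notepad, the .CSVT file can be written by a short script. The sketch below is in Python with the standard library only. The helper name write_csvt, the sample row and the choice of String for the PoT and Period columns are assumptions made for illustration; the column order follows the data set description and should be checked against the real table.

```python
import csv
import tempfile
from pathlib import Path


def write_csvt(csv_path, field_types):
    """Write a .csvt file next to csv_path: one quoted type per column."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle))
    # One type is needed for every column in the header row.
    if len(header) != len(field_types):
        raise ValueError(f"{len(header)} columns but {len(field_types)} types given")
    csvt_path = csv_path.with_suffix(".csvt")  # same name, same folder
    line = ",".join(f'"{field_type}"' for field_type in field_types)
    csvt_path.write_text(line + "\n", encoding="utf-8")
    return csvt_path


# Demonstration on a throwaway table with made-up values (assumption).
with tempfile.TemporaryDirectory() as folder:
    table = Path(folder) / "CMR_LF_cases.csv"
    table.write_text(
        "RecordID,Community,PoT,Period,Cases,Hydrocele,Lymphedema\n"
        "1,Village A,Yes,2015,5,2,3\n",
        encoding="utf-8",
    )
    sidecar = write_csvt(
        table,
        ["Integer", "String", "String", "String", "Integer", "Integer", "Integer"],
    )
    sidecar_name = sidecar.name
    sidecar_text = sidecar.read_text(encoding="utf-8")

print(sidecar_name)
print(sidecar_text)
```

The resulting file holds a single line of quoted type names in the same order as the columns, which is the layout described above.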
The .CSVT files needed for this practical are provided with the data tables
(“C:\…\Cameroon_project\Tables”).
The Joins tab in Layer Properties allows you to join a loaded attribute table to a
loaded vector layer.
To set up a join you have to define a join layer, a join field and a target field.
In order to join the two files, you must ensure that both your data and your shapefile
share a field or column with common values, called a key. This is often a name or
ID code. In your data table, these identifiers must be unique, meaning one row per
name or ID. Joining works by adding fields from your data table to the shapefile’s
attribute table based on matching values found in the key columns.
Note: Use QGIS Browser to explore data tables of these files and find out the field with
common values. Also, you can look at the shapefile’s attribute table by right-clicking on
the layer name and selecting Open attribute table.
According to our data, we can join CMR_LF_cases.csv with the recently created
shapefile CMR_LF_villages_coordinates.shp using the RecordID field.
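The logic of a key-based join can be illustrated outside QGIS. The sketch below, in plain Python, copies fields from a data table onto matching rows of a target table and rejects duplicate keys in the data table. The function join_by_key, the prefix "cases_" and the sample rows are hypothetical; QGIS applies its own naming to joined fields.

```python
def join_by_key(target_rows, join_rows, target_field, join_field, prefix=""):
    """Copy fields from join_rows onto the target rows that share a key value."""
    lookup = {}
    for row in join_rows:
        key = row[join_field]
        # The key must be unique in the data table: one row per ID.
        if key in lookup:
            raise ValueError(f"duplicate key in join table: {key!r}")
        lookup[key] = row

    joined = []
    for row in target_rows:
        match = lookup.get(row[target_field], {})
        # Joined fields get a prefix; the key itself is not repeated.
        extra = {
            prefix + name: value
            for name, value in match.items()
            if name != join_field
        }
        joined.append({**row, **extra})  # unmatched rows keep only their own fields
    return joined


# Made-up rows for illustration only.
villages = [
    {"RecordID": 1, "Community": "Village A"},
    {"RecordID": 2, "Community": "Village B"},
]
cases = [{"RecordID": 1, "Hydrocele": 2, "Lymphedema": 3}]

joined = join_by_key(villages, cases, "RecordID", "RecordID", prefix="cases_")
for row in joined:
    print(row)
```

The second village has no matching record, so it gains no extra fields; in QGIS the corresponding cells would appear as NULL.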
To join data contained in CMR_LF_cases.csv to the shapefile
CMR_LF_villages_coordinates.shp you must:
• Add CMR_LF_cases.csv using the Add Delimited Text Layer button.
• Click Browse, find and open your .CSV file. Select the following options: File
format - CSV (comma separated values); Record options - First record has field
names; and Geometric definitions - No geometry (attribute only table). Click OK to
import CMR_LF_cases.csv into QGIS.
• For CMR_LF_villages_coordinates.shp, open the shapefile’s Properties by
double-clicking on the layer name.
• Select Joins tab in the shapefile’s Properties.
• Click the + button to add a new join.
• You will be prompted with the Add vector join tool.
• Join layer will be your “.CSV” data layer. In this case CMR_LF_cases.csv.
• Join field is the key field to be joined on in your “.CSV” data. In this case
RecordID.
• Target field is the key field to be joined on in your shapefile. In this case, select
RecordID. The Target and Join fields do not need to have the same name.
• Click OK.
• Open the shapefile’s ATTRIBUTE TABLE by right-clicking on
CMR_LF_villages_coordinates.shp and selecting Open attribute table, and check that
your data have been properly joined.
• To make these joins permanent you must save a new copy.
• Right-click on CMR_LF_villages_coordinates.shp and select Save as.
• Choose a filename and location (“C:\...\Cameroon_project\Vector_layers”) for
your new shapefile, and click OK. This new shapefile with the epidemiological
data will be called CMR_Community_LF_morbidity.shp.
Note: A shortcoming of QGIS’ join function is that it always alters the names of the
joined fields. We will rename them back to what they were originally!
• Remove CMR_LF_villages_coordinates.shp and the data table CMR_LF_cases.csv
from the Layers Panel.
We need to rename the fields back to their original names. One major shortcoming
of QGIS’ join function is that it prefixes the joined fields with the file’s name.
Address this problem as follows:
• Double-click on the CMR_Community_LF_morbidity layer and select the Fields
tab. This section allows the user to edit the field names.
• Click on the Pencil icon, then double-click on each field name to rename it
accordingly:
- Rename: ‘CMR_LF_c_1’ to ‘PoT’
- Rename: ‘CMR_LF_c_2’ to ‘period’
- Rename: ‘CMR_LF_c_3’ to ‘cases’
- Rename: ‘CMR_LF_c_4’ to ‘hydrocele’
- Rename: ‘CMR_LF_c_5’ to ‘hyd_status’
- Rename: ‘CMR_LF_c_6’ to ‘lymphedema’
- Rename: ‘CMR_LF_c_7’ to ‘lym_status’
- Delete: ‘CMR_LF_cas’ (we will delete this field as it’s redundant!)
• To delete the redundant field ‘CMR_LF_cas’, click on it and then click on the
Delete field icon. Click OK once the edits are complete.
Each edit made on a layer MUST be saved and updated. Click on the Pencil icon to
come out of the edit mode and click Save to complete edits.
• This newly created shapefile now has both the geometry and the data included
and can be directly imported into our project. It should consist of at least four
files with the same name but different extensions (.SHP, .SHX, .DBF, .PRJ).
Note: These files MUST always remain together to enable the layer to function normally.
• Save the map project as Cameroon_practical_2.qgs, in the
“C:\...\Cameroon_project\Projects” folder. Select File in menu bar and Save
project as.
3. Graph options to represent health outcomes
To take a closer look at the data:
• Right-click on CMR_Community_LF_morbidity.shp in the Layers Panel and select
Open attribute table.
• We may be interested in identifying potential clusters of morbidity cases
presumably related to lymphatic filariasis infection, or geographic areas which
may concentrate more cases of hydrocele or lymphedema.
A way to get a feel for the data is to visualise them on the map. We will start by
looking at the number of lymphedema cases by community.
• Double-click on the CMR_Community_LF_morbidity.shp layer title to bring up
the LAYER PROPERTIES box. Click on the Style tab to open the Style section.
• Decide how you wish the attribute to be presented. As we’re interested in the
range of lymphedema cases, we want symbols that represent quantities of a
numeric attribute value.
• Click Graduated. You have the choice of displaying communities with graduated
colours or graduated symbol size.
• In the Column option select lymphedema as the value to be represented.
• Use Classes and Mode to choose the number of intervals or classes (e.g. 4) and a
classification method (e.g. Equal Interval). The histogram can help you choose the
most appropriate method.
• Use the Colour Ramp to colour the graduated variable, or manually colour each
interval by double-clicking on each rank.
• Change the symbol size so that it increases progressively with the number of
lymphedema cases. Double-click on each symbol and a pop-up will provide access to
the different settings that can be modified.
• Once you’re happy with the settings of the graduated symbols, click OK.
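To make the Equal Interval mode used in these steps more concrete, here is a minimal Python sketch of how equal-width class bounds can be derived and how a value is then assigned to a class. The function names and the case counts are made up, and the rule that a value lying exactly on a bound goes to the upper class is an assumption; QGIS may treat boundary values differently.

```python
def equal_interval_breaks(values, classes):
    """Return classes + 1 bounds that split the data range into equal widths."""
    low, high = min(values), max(values)
    width = (high - low) / classes
    return [low + width * step for step in range(classes + 1)]


def classify(value, breaks):
    """Return the 0-based class of value; the top class includes the maximum."""
    last_class = len(breaks) - 2
    for index in range(last_class):
        if value < breaks[index + 1]:
            return index
    return last_class


# Hypothetical numbers of lymphedema cases in four communities.
case_counts = [0, 4, 10, 20]
breaks = equal_interval_breaks(case_counts, 4)
print(breaks)
print([classify(count, breaks) for count in case_counts])
```

With skewed count data most communities tend to fall in the lowest class under this mode, which is one reason to inspect the histogram before settling on a method.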
Now, it’s your turn. Try to display the number of hydrocele cases reported in
Cameroon.
Note: you can manually change the range by double-clicking on the Range column for each
interval; a pop-up (ENTER CLASS BOUNDS) will enable you to change the lower and
upper values. If you want to change the label of each range, double-click on the Label
column. You can also change the legends by double-clicking on them.
Question - Can you see any obvious patterns in the distribution of hydrocele and
lymphedema cases? Do any areas appear to have higher or lower risk? Are these at-
risk areas the same for hydrocele and lymphedema?
• Open File in menu bar and then click on Save Project.
4. Aggregating epidemiological data: choropleth maps
We are interested in identifying districts which may require special attention by the
LF control programme and health care services due to their high prevalence of LF-
related morbidity cases. It would be helpful to show the district-level prevalence of
lymphedema and hydrocele on our map.
There are at least two ways to do this. You can aggregate the data collected at
community level by district within your Excel spreadsheet or your statistical
software (e.g. SPSS, Stata, EpiInfo). For this, you need to collapse the data
by district using the PivotTable function in Excel, which can be found under the
Insert tab (see footnote 2). This sums the number of each type of LF-related morbidity
reported for the communities located within the same district. Finally,
divide by the estimated population of the district at the time of the survey and
multiply by 100,000 to calculate the prevalence per 100,000 inhabitants.
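The spreadsheet approach just described (sum by district, divide by population, scale to a fixed number of inhabitants) can be sketched in a few lines of Python. All names and figures below are made up for illustration, including the District column, which the community table would need to carry for this approach. The multiplier defaults to 100,000 to match the maps produced later in this practical.

```python
from collections import defaultdict


def prevalence_by_district(community_rows, population, outcome, per=100_000):
    """Sum an outcome by district and express it per `per` inhabitants."""
    totals = defaultdict(int)
    for row in community_rows:
        totals[row["District"]] += row[outcome]  # collapse communities by district
    return {
        district: total * per / population[district]
        for district, total in totals.items()
    }


# Hypothetical community records and district populations.
communities = [
    {"District": "A", "Hydrocele": 3},
    {"District": "A", "Hydrocele": 2},
    {"District": "B", "Hydrocele": 1},
]
district_population = {"A": 50_000, "B": 200_000}

prevalence = prevalence_by_district(communities, district_population, "Hydrocele")
print(prevalence)
```

District A has five cases among 50,000 inhabitants, which gives 10 per 100,000; district B has one case among 200,000, which gives 0.5 per 100,000.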
It is also possible to estimate this prevalence within QGIS using the data management
tool called Join attributes by location. This tool enables us to calculate different
types of statistics to summarise data such as sum, mean, min, max and median based
on their location.
In this section, we have focused on this second approach.
Note: It is important to check whether some features (villages) have NULL values for
the outcomes we want to aggregate, in our case Hydrocele and Lymphedema. These
features should be deleted, otherwise the Join attributes by location tool will
crash when running the process.
To find them, click on the header at the top of the field you want to sort by and the
features will be sorted in ascending order.
2 More details in the practical Data Management for mapping (Practical 5) available at the website of the Global Atlas of Helminth Infections project under Training section (GIS basic training)
• Select the features which have NULL values in the Hydrocele and Lymphedema fields.
To do this, hold the left button of your mouse down while you click on them.
• Click on the Toggle Editing Mode button.
• With the rows that have NULL values for hydrocele and lymphedema highlighted, click
on the Delete selected features tool.
• Click the Toggle Editing Mode button again to accept the changes made to the
attribute table. Click on the Save button.
• Under Vector option in the main menu bar, open the Join attributes by location
tool from the list of Data Management Tools.
• We now have to set CMR_adm2.shp as the Target vector layer and
CMR_Community_LF_morbidity.shp as the Join vector layer. Select intersect under
Geometric predicate. Under Attribute summary select the option Take
summary of intersecting features. You will see the box Statistics for summary
(comma separated) [optional] containing a list, i.e. ‘sum, mean, min, max, median’.
In this instance, we are only interested in the SUM function. Keep this, and delete
the rest. The SUM option will calculate the overall sums of the numeric fields for the
features in CMR_Community_LF_morbidity.shp which are “located within” the districts
defined in CMR_adm2.shp.
• Save the output shapefile as CMR_LF_Districts_level.shp in the folder
“C:\...\Cameroon_project\Vector_layers.”
• Ensure that Only keep matching records is checked. This will make the new shapefile
keep only those features (districts) where morbidity data were recorded.
• The new shapefile will be loaded into our map project.
• Open the shapefile’s ATTRIBUTE TABLE by right-clicking on
CMR_LF_Districts_level.shp and selecting Open attribute table.
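The idea behind summarising points by the polygon they fall in can be sketched in plain Python. This is not how QGIS implements the tool; it is a simplified illustration that uses a ray-casting point-in-polygon test on made-up square "districts" and made-up points, and it ignores holes, multipart polygons and points lying exactly on a boundary.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, not closed."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j -> i) cross the horizontal line through the point?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def sum_points_in_polygons(polygons, points, field):
    """Count the points inside each polygon and sum `field` over them.

    Polygons without any point are left out, which mirrors keeping
    only matching records.
    """
    summary = {}
    for name, ring in polygons.items():
        inside = [p for p in points if point_in_polygon(p["x"], p["y"], ring)]
        if inside:
            summary[name] = {
                "count": len(inside),
                "sum": sum(p[field] for p in inside),
            }
    return summary


# Hypothetical districts (unit squares) and community points.
districts = {
    "West": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "East": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "North": [(0, 2), (4, 2), (4, 4), (0, 4)],
}
communities = [
    {"x": 1.0, "y": 1.0, "hydrocele": 3},
    {"x": 0.5, "y": 1.5, "hydrocele": 2},
    {"x": 3.0, "y": 1.0, "hydrocele": 4},
]

district_summary = sum_points_in_polygons(districts, communities, "hydrocele")
print(district_summary)
```

"North" contains no community and is therefore absent from the result, while "West" aggregates two communities into one row with a count and a sum.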
You can see that the attribute values from the join layer have been
aggregated and summed according to the districts in CMR_adm2.shp. One of the
shortcomings of this aggregation tool is that the fields derived from the aggregation
are often assigned inaccurate field names.
For instance, the summed columns derived from the fields in the
CMR_Community_LF_morbidity layer (i.e. latitude, longitude, cases, hydrocele and
lymphedema) have been assigned the following wrong names:
Target field name    Wrong field name assigned    Correct field name
latitude             sumlongitude                 sumlatitude
longitude            sumcases                     sumlongitude
cases                sumhydrocele                 sumcases
hydrocele            sumlymphedema                sumhydrocele
lymphedema           sumlatitude                  sumlymphedema
• Correct the field names before removing any redundant columns. Click on the
Processing tab and select Toolbox. A window called Processing Toolbox will
appear.
• Use the search bar at the top of the Processing Toolbox and type
“Refactor fields”. The search should bring up the Refactor fields tool.
This function allows the user to modify the attributes of a layer, including
changing field names and adding or deleting fields. Click on this function to open
a menu.
Wrong field name     Correct field name to assign
sumlongitude         Latitude
sumcases             Longitude
sumhydrocele         Cases
sumlymphedema        Hydrocele
sumlatitude          Lymphedema
• In the Refactor fields menu, select CMR_LF_Districts_level as the Input layer to be
modified. Correct the field names so that they are in line with those in the table
above, deleting any extraneous variables.
• ‘sumlatitude’ should be renamed to ‘Lymphedema’, ‘sumlymphedema’ to
‘Hydrocele’, and so on. Simply double-click on the fields to rename them.
• You can now delete the redundant ‘Latitude’ and ‘Longitude’ fields by clicking
the delete field icon. Click OK, and a new layer called “Refactored” will be added
to the Layers Panel. Save it as “CMR_District_LF_morbidity.shp”.
Note: ‘Count’ tells us the number of LF community surveys that were aggregated in a
district.
• Open File in menu bar and then click on Save Project.
You can now visualise the district-level aggregated data on the map (i.e. produce
the choropleth map). Follow the same procedures you used to display the
community-level attribute data on the map (see section 3). Think carefully about
the most appropriate way to categorise your data (e.g. graduated symbol).
Note: It is useful to look at a histogram of the distribution of your data when choosing your
classification. Alternatively, you might want to adjust the range manually as we
did in the previous section.
• To establish a range manually, double-click on each Range, having previously
fixed the number of Classes (see section 3).
• To change the label names, double-click on each Legend (see section 3).
Note: It is useful to save the defined style so that it can be used with other fields (e.g.
Hydrocele) and other projects. The Save Style option appears in the Style drop-down menu.
Assign a filename to this new QGIS layer style file (.QML). To reuse the same symbology or
style, open the same drop-down menu under the Style option, choose Load Style and select
the .QML file. You will then have to choose the new classification field that you want to
plot.
• To distinguish districts which have not yet reported any morbidity cases, change
the symbol style (colour) of CMR_adm2.shp (e.g. to a solid grey) to create a
background contrasting with the surveyed districts.
The last map lacked epidemiological rigour and interest because it only
displayed the total number of cases recorded by district in Cameroon. A more robust
measure for reporting LF morbidity would be either prevalence or incidence rates.
Prevalence indicates the probability that a member of the population has a given
condition at a point in time. It is a way of assessing the overall burden of disease in
the population, so it is a useful measure for administrators when assessing the need
for services or treatment facilities. Epidemiologists make a distinction between
point prevalence, the proportion of the population with the condition at a point in
time, and period prevalence, the proportion with the condition at any time during a
given period, which includes all previous cases that still have the condition and are
still members of the population. In contrast to prevalence, incidence is a measure
of the occurrence of new cases of disease (or some other outcome) during a span of
time.
In our example, each district has an estimate of population in 2015. We can
therefore estimate the prevalence of hydrocele and lymphedema at the time of the
survey by dividing the number of morbidity cases by the estimated population. As
we presume the frequency will be quite small we can multiply the resulting
prevalence by 100,000. Thus, the reporting output is interpreted as prevalence (per
100,000 inhabitants).
To do this, load the CMR_District_population.csv file from the EpiData folder and create
a join with the CMR_District_LF_morbidity.shp layer.
Remember that you must create a .CSVT file specifying the data type of each field of
the CMR_District_population table, as shown previously in section 2.2. This file has
also been provided within the EpiData folder.
To join data contained in CMR_District_population.csv to the shapefile
CMR_District_LF_morbidity.shp, you must:
• Add CMR_District_population.csv using the Add Delimited Text Layer button.
• Click Browse, find and open the .CSV file. Remember to check “No
geometry (attribute only table)” as this .CSV does not contain any coordinates.
• For CMR_District_LF_morbidity.shp, open the shapefile’s Properties by
double-clicking on the layer name.
• Select the Joins tab in the shapefile’s Properties, and click the “+” button to add a
new join.
• You will be prompted with the Add vector join tool.
• Join layer will be your “.CSV” data layer, in this case
CMR_District_population.csv. Join field is the key field to be joined on in
your “.CSV” data, in this case DistrictID.
• Target field is the key field to be joined on in your shapefile. In this case, select
ID_2. The Target and Join fields do not need to have the same name.
Content of the .CSVT file: “Integer”, “String”, “Integer”
• Click OK.
• Open the shapefile’s ATTRIBUTE TABLE by right-clicking on
CMR_District_LF_morbidity.shp and selecting Open attribute table, and check that your
data have been properly joined.
• To make these joins permanent you must save a new copy.
• Right-click on CMR_District_LF_morbidity.shp and select Save as. Choose a
filename and location (“C:\...\Cameroon_project\Vector_layers”) for your new
shapefile, and click OK. This new shapefile may be called
CMR_District_LF_prevalence.
• Remove CMR_District_LF_morbidity.shp and the data table
CMR_District_population.csv from the Layers Panel.
• The new shapefile will have two new fields: CMR_Distri (which contains the district
names) and CMR_Dist_1 (which contains the estimated population in 2015). We
only need the population data, so rename CMR_Dist_1 to Pop2015 and delete
CMR_Distri. We learnt how to rename and remove fields in section 2.2.
• Finally, we have to create calculated fields for the prevalence of both hydrocele
and lymphedema per 100,000 inhabitants. These are the outcomes we ultimately
want to display and analyse.
• For this, click on the Field Calculator and create a new field called PrevLYM
for the prevalence of lymphedema by district. Choose Decimal number (real) as
output field type and set an output field width of 10 and precision of 2.
• Use the Expression tab (left panel) to calculate the prevalence of
lymphedema cases per 100,000 inhabitants using the formula
("Lymphedema"/"Pop2015")*100000, where Lymphedema is the
number of lymphedema cases and Pop2015 is the estimated population in 2015
for every district. Click OK to derive the prevalence estimates for PrevLYM.
• Repeat the above steps for the prevalence of hydrocele per 100,000. Call the new
field PrevHYC.
• Then, click again on the Toggle Editing Mode button to accept the changes
undertaken on the attribute table. Click on Save button in the small window that
eventually pops up.
You can now visualise the district-level prevalence data on the map (i.e. produce
the choropleth map). Follow the same procedures you used to display the
community-level attribute data on the map (see section 3). Think carefully about
the most appropriate way to categorise your data (e.g. graduated symbol). Try to
display both prevalence fields using the following settings:
- Graduated symbol
- Field: PrevLYM, PrevHYC
- Colour ramp: YlOrRd
- Mode: Natural Breaks (Jenks)
- Classes: 4
Choropleth map produced for the prevalence of lymphedema (per 100,000 inhabitants)
Choropleth map produced for the prevalence of hydrocele (per 100,000 inhabitants)
Question - Can you see any obvious patterns in the prevalence of hydrocele and
lymphedema cases? Do any districts appear to have higher or lower risk? Are these
at-risk areas the same for hydrocele and lymphedema?
Compare these maps with those produced based on the number of morbidity cases.