Using EpiData & Epi Info for Windows1
-
Author
hanan-ahmed -
Category
Documents
-
view
48 -
download
2
Embed Size (px)
description
Transcript of Using EpiData & Epi Info for Windows1
-
Using EpiData & Epi-Info
for Windows
Training for Communicable Disease
Control in Local Authorities
Cardiff Council (Strategic Planning & Environment)
March 2007
-
Acknowledgements
i
Acknowledgements
2007 Cardiff Council (Strategic Planning & Environment).
This training guide was produced by Alastair Tomlinson to form part of the
Communicable Disease Lead Officer Training Programme, co-ordinated by the
Wales Centre for Health.
Please send enquiries relating to this training guide to:
Alastair Tomlinson, Chartered Environmental Health Practitioner
Team Leader (Health Improvement)
Public Protection Division
Room 134 City Hall
Cathays Park
Cardiff. CF10 3ND.
029 2087 1845
About the software
Epi Info is a public domain software
package designed for the global community
of public health practitioners and
researchers. It provides for easy form and
database construction, data entry, and analysis with epidemiologic statistics, maps,
and graphs. Epi Info can be downloaded from http://www.cdc.gov/epiinfo
EpiData Software has developed from
securing the principles of Epi Info V6 for DOS
to an independent documentation oriented
system. EpiData can be downloaded from
http://www.epidata.dk
Conventions used in this training guide
Text to be entered on screen is shown in this font.
Directions to drop-down menu items are shown in bold type, e.g. File > SaveFile > SaveFile > SaveFile > Save.
-
Table of Contents
ii
Table of Contents
Acknowledgements...............................................................................i
Table of Contents .................................................................................. ii
Aim and Objectives ..............................................................................1
Outbreak Scenario................................................................................2
Creating a Questionnaire using EpiData.............................................3
Entering Data using EpiData ...............................................................17
Outbreak Investigation using Epi Info Analysis.................................19
Using Analysis with routine COSURV data.........................................35
Other capabilities of EpiData and Epi Info ........................................39
Appendix I Comparison of Epi Info & EpiData ...............................44
Appendix II Contents of course CD-ROM.......................................48
Appendix III Further information & resources.................................49
Appendix IV Worksheet for 2x2 table results .................................52
Appendix V Check code example ................................................53
-
Aim and Objectives
1
Aim and Objectives
Aim of the training
To provide training on the practical use of Epi Info and EpiData in communicable
disease control, with particular reference to:
An outbreak situation
Analysis of routine Cosurv surveillance data
Objectives
By the end of the training delegates will:
Have an understanding of EpiData and Epi-Info for Windows and their component elements
Be able to use EpiData to design a data entry form for a questionnaire in an outbreak situation
Be able to use EpiData to enter outbreak investigation data into a record suitable for analysis in Epi Info for Windows
Be able to use Analysis to obtain useful statistical and epidemiological information from an EpiData / Epi-Info for Windows database for outbreak
investigation purposes
Be able to use Analysis to import routine Cosurv surveillance data into Epi-Info for Windows, and obtain useful statistical and epidemiological information
-
Outbreak Scenario
2
Outbreak Scenario
On the 17th August, you receive a telephone call from a gentleman who reports that
he and several others who attended a buffet following a funeral were suffering
symptoms of food poisoning. The buffet, provided by an external caterer, was held
at a local club following the funeral, and mourners arrived at the club at around
3.00 pm on 14th August. Food left over from the buffet was placed in the main bar
areas of the club for club members to consume later that day.
Initial activity involves obtaining of a list of people who attended the funeral and
others who may have eaten the food provided for the funeral buffet. A list of food
served at the buffet has been obtained from the caterer, and cross-referenced with
initial information gathered from cases. Indications are that around 70-80 people
attended the funeral, and approximately 40-50 of these people may have
experienced symptoms consistent with food poisoning.
Table Table Table Table 1111 ---- List of foods served at the buffet List of foods served at the buffet List of foods served at the buffet List of foods served at the buffet
An Outbreak Control Team has been convened, and has decided to undertake a
cohort study to investigate the outbreak. The OCT assigns you with the following
tasks:
Establish the case definition
Develop a structured questionnaire to investigate the outbreak.
Enter questionnaire data into an appropriate computer database
Analyse the data to describe outbreak and identify exposures associated with illness
This training uses this scenario to introduce the various functions of EpiData and
Epi Info for Windows, and their particular use in outbreak investigation.
sausage rolls chicken rolls salmon sandwiches
pickled onions ham sandwiches egg rolls
corned beef sandwiches ham rolls egg sandwiches
chicken nuggets chicken sandwiches cheese & biscuits
crisps gateaux pasties
-
Creating a Questionnaire using EpiData
3
Creating a Questionnaire using EpiData
Basic Questionnaire Creation
A screenshot of the main EpiData screen is shown.
We want to create a new questionnaire, so select Define DataDefine DataDefine DataDefine Data > New > New > New > New .QES File.QES File.QES File.QES File (It is
also possible to edit an existing questionnaire, by using Define Data > Open .QES Define Data > Open .QES Define Data > Open .QES Define Data > Open .QES
FileFileFileFile). This creates an empty text file into which we can enter information.
On creation of the file, the following toolbar
option also becomes available.
Clicking this button brings up the
Field pick list dialog.
This dialog makes it easy to create different
kinds of fields. As an example, we will create
one or two of the basic field types in our
questionnaire.
-
Creating a Questionnaire using EpiData
4
First, type an appropriate heading into the first line of your questionnaire, such as
Lead Officer Training March 2007.
Then, on the row below, enter Surname: Leave the cursor flashing after the colon. If
the Field pick list is not already showing, click the button to bring it on screen.
Select the Text tab from the pick list. This then gives a short option list of text,
upper-case text, and encryption field. For now well accept the default text
option. Set the field length to 20, then click the Insert button. EpiData inserts a
series of underscore characters after the Surname: label. Underscore characters _
are how EpiData denotes plain text fields. The number of underscores indicates the
maximum length of the field.
On the next line, type Forename: Using the field pick list again, insert another text
field of 15 characters.
Now lets try a different field type dates. On
the next line, type Date of birth: Select the
Date tab from the field pick list. This
presents two lists of options general date
fields on the left, and automatic dates on
the right. General date fields are formatted in
three different ways. For most of us in
Europe, the format is most
natural, so select that. Click the Insert button,
and EpiData inserts the relevant date format
field type.
On the next line, type Gender: and insert a single character Uppercase text field.
EpiData inserts a code, which denotes an uppercase field one character long.
Later, well restrict the entries in this field to either M (male), F (female) or U
(unknown). Below this, add the label Occupation: and insert another 20 character
text field.
We need to be able to record interviewee address details.
First, lets create a house number field. On
the next line type House number:, and then
select the Numeric tab on the field pick list.
Select 3 digits before the decimal point, and
0 digits after it, then click the Insert button.
EpiData inserts ### after the House
number: label. # characters are how EpiData
denotes numeric fields, and again the
number of # characters indicates the
maximum size of the number. (Numbers with
a decimal point appear as ##.##).
-
Creating a Questionnaire using EpiData
5
Add another text field for House name (30 characters), and three more fields for
Street name (30 characters), District (20 characters) and Town (20 characters).
Then add another label for Postcode: and this time add an Uppercase text field of
8 characters. EpiData inserts uppercase fields as with the number of
spaces determining the total length of the field.
Finally, lets add a field for telephone details. Initially it seems like a good idea to
create this as a numeric field, but in doing this we wouldnt be able to record any
text details (such as ext. etc), and its unlikely we would ever want to order our data
by telephone number, so its probably easier to simply create a text field of around
15-20 characters. If you prefer you can create two fields, one for home and one for
other (e.g. work, mobile).
Weve now created the fields for the basic contact details of the interviewee. Before
proceeding onto further work, lets save what weve done so far. Click the Save
button on the toolbar (or select File > SaveFile > SaveFile > SaveFile > Save). Enter an appropriate filename and
location in the dialog box, and click Save.
We can also take a sneak preview of how the questionnaire will appear for those
entering data. Before doing that, lets set a couple of options that determine how
our fields will be named. Click File > OptionsFile > OptionsFile > OptionsFile > Options, and then select the Create data file
tab.
Generally, the automatic field names options is best, since it will try to make sense
of the question (i.e. the text immediately to the left of the field), and will ignore
common words like who, did, or etc. Sometimes it may be preferable to select
the First word in question option. For this exercise, select automatic field names.
-
Creating a Questionnaire using EpiData
6
Later in the module well look at how we can specifically tailor the fieldnames that
EpiData will generate in the data record files. Fieldnames have a maximum length
of 10 characters.
The decision on letter case of field names is mainly one of personal preference
the authors preference is to use upper-case for field names to make them stand
out.
Once you have made your option selections and clicked OK, click the Preview
Data Form button (or select Make Data File > Preview Data FormMake Data File > Preview Data FormMake Data File > Preview Data FormMake Data File > Preview Data Form).
A new tab on the main display will appear, showing the questionnaire with data
entry fields in the relevant places. You can select File > Print Data FFile > Print Data FFile > Print Data FFile > Print Data Formormormorm to get an
idea of how the questionnaire will appear on paper for completion by interviewers.
You can even practice entering data into the form to check that things appear as
you expect them to. For now, its useful just to see how things are going to be
presented. To close the form, select File > Close formFile > Close formFile > Close formFile > Close form, or press CTRL F4.
Currently our questionnaire lets us record interviewees personal details, but not a
lot else. Lets change that by adding some details specific to the event in our
scenario. The first thing to establish is whether the person actually attended the
funeral (they may have been exposed to the food under suspicion at the club bar
after the event).
This introduces us to another important field type: the Boolean field. This is simply a
Yes/No field, but this type of data is often crucial in outbreak investigation, since it
allows us to construct 2x2 tables to assess
relative risk for various exposures.
On a new line at the end of the questionnaire,
type Did you attend the funeral? Select
the Other tab on the field pick list, and select
the Boolean (yes/no) option (the Length field
on the dialog becomes greyed out as it isnt
relevant). Click the Insert button. EpiData
inserts a code, which is how it denotes a
Boolean field.
For clarity, lets also include a question on whether the person attended the
members club we dont know if people from the members club have been
affected, or whether there is crossover with the cohort of funeral attendees, but it
may be important to be able to distinguish between them later. On a new line, type
Did you attend the members club? and insert another Boolean field.
-
Creating a Questionnaire using EpiData
7
Well also use Boolean fields to record whether or not the person was ill, and what
their symptoms were. Add the relevant lines and fields to the questionnaire for the
following fields:
Sometimes people may have described themselves as ill, but do not meet the
actual case definition, so include an additional Case definition met? Boolean
field as well.
Another key set of data to record for those who have suffered symptoms is their
onset date/time, and the duration of symptoms. Go back up the questionnaire, and
add a couple of extra lines after Were you ill? but before the list of symptoms.
Type Onset date: and then insert a general date field. On the next line type Onset
time: and then insert a numeric field with 2 digits before and 2 digits after the
decimal point (##.##). EpiData records time-related information in this numeric,
with the digits before the point representing hours and those after minutes. The 24-
hour clock is used. Then add another field for duration of symptoms 2 digits in
size, intended to be measured in days, and a similar 3 digit field for incubation
period, this time intended to be measured in hours.
In a full outbreak we would probably also include further questions about whether
the person was hospitalised, whether specimens had been submitted, and so on,
together with details of any other household contacts, and maybe other data to
indicate severity of symptoms, but for the purposes of this exercise well skip these
elements.
The final major part of the questionnaire is the recording of relevant exposures.
Comparison of the rates of illness in those exposed and not exposed will enable us
to assess which exposures are most likely to be implicated in the outbreak. For the
purpose of this exercise, well assume that the OCT has decided to focus attention
on the foods consumed at the buffet. In a real life situation, it may be more
appropriate to retain an open mind and include other potentially relevant exposures
that may explain some or all of the illness.
NB avoid use of the ampersand & symbol in questionnaires, since it tends to
cause unexpected display results.
Were you ill?
Diarrhoea
Vomiting
Abdominal pain
Nausea
Pyrexia
Headache
Other aches
Other symptoms (with a separate text field for description)
-
Creating a Questionnaire using EpiData
8
Add a list of Boolean fields for the relevant food items the table from the outbreak
scenario is reproduced below.
Again, in a real situation we might add additional information on quantity eaten or
portion size to investigate the possibility of a dose-response relationship, but for this
exercise we will keep the exposures simple yes/no answers. Finally, add a general
comments text field of around 50 characters to capture any other relevant
information (e.g. perhaps interview was carried out with parent, relative, interpreter
etc).
The only remaining items to add to the questionnaire are some basic administration
fields. We need to have some way of identifying each record as unique, and EpiData
provides a specific field type for this purpose. Its quite useful for this to be easily
seen, so at the very top of the questionnaire, type Record no.: and select the
Other tab on the Field pick list. Select the Auto ID number field type and click
Insert. EpiData inserts an code, which will include an automatically
incrementing number for each new record added. It is also useful to record the date
of interview and the name of the interviewer (initials usually sufficient), so add
appropriate date and uppercase fields for this purpose.
We can also add another date field, the date of entry into EpiData. Again EpiData
can automatically insert this for us select the Date tab on the Field pick list and
select the code from the right-hand list.
The basic questionnaire is now complete. At the moment it isnt particularly easy to
read and this may make it more difficult for interviewers to complete the
questionnaire, and harder for data entry staff to accurately and quickly enter the
results. EpiData includes an Align fields option to help address this problem. Place
the cursor in one of the longer questions/labels, such as Corned beef sandwiches,
or Did you attend the members club? Then select Edit > Align FieldsEdit > Align FieldsEdit > Align FieldsEdit > Align Fields. EpiData
will realign each line of the questionnaire so that the fields appear in a column
making it easier for both interviewers and data entry staff.
Save the questionnaire before we proceed any further.
sausage rolls chicken rolls salmon sandwiches
pickled onions ham sandwiches egg rolls
corned beef sandwiches ham rolls egg sandwiches
chicken nuggets chicken sandwiches cheese & biscuits
crisps gateaux pasties
-
Creating a Questionnaire using EpiData
9
Advanced questionnaire design
In this section well cover some of the techniques and functions provided by
EpiData to help save time on data entry, and to ensure that accurate and reliable
data is entered.
Closer control over fieldnames
To start with, lets look at how our questionnaire looks in data entry mode. Select the
Preview Data Form button to display the data form. Use the TTTTabababab key to cycle through
the fields in the questionnaire. Note that for each field, the fieldname appears in
the status bar at the bottom left of the screen, and next to it information on the type
of data that can be entered (e.g. Alpha: all entries allowed, Date (dmy): 0-
9 and / allowed, Boolean: Y,1,N,0 allowed etc.).
As you cycle through the fields, note the fieldnames that EpiData has automatically
assigned to each field. In the majority of cases, they make perfect sense, but there
are a few where the fieldname doesnt intuitively indicate what the contents of the
field are. This can be particularly important where data analysis is being undertaken
by someone who wasnt involved in the original drafting of the questionnaire (quite
conceivable in a large outbreak with several partner organisations) the last thing
that they need is to be unsure what a relevant item of data actually means.
Fortunately, EpiData allows questionnaire designers greater control over fieldname
selection where necessary.
The default fieldname selected in each case is up to 10 letters long, based on the
text that appears immediately to the left of the field but ignoring common words
such as did or the. As an example, the fieldname for Did you attend the
funeral? is YOUATTENDF, for Were you ill? WEREYOUILL, and for Pickled
onions - PICKLEDONI.
For these and some other fields, we would like to tailor the fieldname to make it a
bit more meaningful. The chief way of doing this is by the use of braces { }, also
known as curly brackets. When automatically selecting fieldnames, EpiData uses
text enclosed in braces in preference to normal text. If the question is {my} first
{field} then the field name will be MYFIELD. Braces offer a powerful method of
defining meaningful field names.
Lets look at a simple example the Were you ill? question. Although the
fieldname does make sense, it doesnt really need to be that long simply the term
ILL would be enough. By putting braces around the word ill in the questionnaire, we
force EpiData to call the field ILL. Modify the text in the questionnaire so that it
looks like this:
Were you {ill}?
Now click Preview Data Form and put the cursor into the Were you ill? field. You
can see from the status bar in the bottom corner that this fieldname is now simply
ILL. Once youve satisfied yourself of this, close the preview (CTRL F4).
-
Creating a Questionnaire using EpiData
10
This is a fairly simple example, but the EpiData capabilities are more sophisticated
than that. EpiData can pull text from more than one set of braces together to create
a fieldname. As another example, consider the Did you attend the funeral/members
club questions. Presently these have fieldnames of YOUATTENDF and YOUATTENDM
respectively not terribly meaningful. But by changing the text in the questionnaire
as follows:
Did you {attend} the {fun}eral? Did you {attend} the members {club}?
we produce fieldnames of ATTENDFUN and ATTENDCLUB, which are far more
intuitive. Check for yourself by clicking Preview Data Form. Notice also that the
braces do not appear on the entry form (and wont appear on a printout either), so it
doesnt affect the ease of use for interviewers and data entry staff.
Go through the table below to update the questions as indicated to generate more
meaningful fieldnames:
Question Current fieldname Modification New fieldname
Date of birth DATEBIRTH {D}ate {o}f {b]irth DOB
Abdominal pain ABDOMINALP {Abdom}inal {pain} ABDOMPAIN
Case definition met? CASEDEFINI {Case def}inition {met}? CASEDEFMET
Sausage rolls SAUSAGEROL {Saus}age {rolls} SAUSROLLS
Pickled onions PICKLEDONI Pickled {onions} ONIONS
Chicken nuggets CHICKENNUG Chicken {nuggets} NUGGETS
Chicken rolls CHICKENROL {Chick}en {rolls} CHICKROLLS
Chicken sandwiches CHICKENSAN {Chick}en {sand}wiches CHICKSAND
Click Preview Data Form to confirm the changes that have been made. Once youre
finished, close the preview and save your modified questionnaire.
Controlling data entry and skipping questions
For some fields it can be useful to place restrictions on the range of data that can
be entered for example the Gender field can only have three sensible values
(male, female, unknown) and it also makes sense to limit the Onset time field to the
valid times represented in the 24 hour clock. There also some fields that can be
filled through calculation for example, age at time of interview, incubation period,
perhaps even case definition in some circumstances which can help with data
accuracy and consistency. Finally data entry can be significantly quicker by using
skips so that the data entry operative doesnt have to cycle through irrelevant
fields (such as symptom fields for an interviewee who wasnt ill).
These functions are all achieved by what EpiData calls checks. Checks are usually
added once a data file has been created based on the layout in a questionnaire.
-
Creating a Questionnaire using EpiData
11
One of the things well do is add a simple calculation to work out a persons age in
years at the time of the interview. Before we create the data file, add a new numeric
field of 2 digits to hold the calculated age. Place it below the date of birth question.
Now our questionnaire has all the fields we need, so we can create the data file that
EpiData will actually store the records in once they are entered. Make sure you have
saved the most recent changes to the questionnaire, then click MakeMakeMakeMake Data File > Data File > Data File > Data File >
Make Data FileMake Data FileMake Data FileMake Data File to display the following dialog:
The .QES file is the file holding the questionnaire details (.QES is the extension that
EpiData uses for all questionnaire files). The currently active file should be
displayed in this box. The data file will be created according to the details shown in
the lower box, and the default setting is the same name as the questionnaire file,
but with a .REC extension, which is the extension used by EpiData for data record
files. If the settings in the dialog look appropriate, click OK.
Youll then be presented with another dialog asking you to give a label to the data
file:
Give the data file an appropriate label (e.g. codename or incident number of the
outbreak).
You should then get a message saying that the data file has been created. EpiData
has also closed the original questionnaire, so we now have a blank screen. Click the
ChecksChecksChecksChecks button, which will then ask you to open a data file select the file we just
created above.
EpiData will then open up the new data form unsurprisingly this has a similar
appearance to that of the Preview Data Form. However, we are currently in
Add/revise checks mode, so it isnt possible to enter data. Youll also notice that
-
Creating a Questionnaire using EpiData
12
the Check file dialog has appeared. The status bar of the dialog shows the name of
the check file that check details will be stored in. This will have the same name as
the REC file, but with a .CHK extension.
The dialog shows the current field that checks
may be added to. Some of the basic check
settings are then shown in the lower part of the
dialog. Well briefly summarise all these options
before looking at some specific examples.
Range,Range,Range,Range, Legal Legal Legal Legal allows you to restrict the range of
values that can be entered in a particular field. A
range is defined by typing the minimum value and
the maximum value separated by a hyphen.
Typing 2-5 defines that only the numbers 2,3,4 or
5 can be entered in the current field. If only a
maximum value is wanted then use -INF (minus
infinity) as the minimum value. If only a minimum
value is wanted then use INF (infinity) as the
maximum value. Typing -INF-5 defines all numbers less than or equal to 5 as legal
entries in the current field. Typing 0-INF defines all positive numbers as legal
entries. Legal values are defined by typing all the accepted values separated by
spaces or commas. Typing 4,6,8,10 defines that only the numbers 4,6,8 or 10 can
be entered in the current field.
JumpsJumpsJumpsJumps are available to help data entry flow. As an example, if a person has
indicated that they suffered illness, data entry is likely to cover the symptoms that
they suffered. If they werent ill, it makes more sense for data entry to skip past
those fields and onto the next section. Jumps are entered by specifying the value,
entering a greater-than-sign (>) and specifying the name of the field to jump to. For
example, in relation to the ILL field, entering Y>ONSETDATE,N>SAUSROLLS would set
up the necessary jumps as suggested in the example. Well look at exactly how we
set this up a little later (since we might also want to automatically set the Case
Definition Met field to No if the person has not been ill). Its also possible to use
AUTOJUMP followed by the fieldname to make the skip take place regardless of the
value inserted in the field.
Must enterMust enterMust enterMust enter is quite simple the Yes/No value defines whether or not a value must
be entered for the question. There will be some fields for which this is useful (e.g.
basic personal details like name, perhaps date of birth and address as well,
together with key data points such as whether they were ill and/or meet the case
definition). There may be other fields where being able to leave the field blank is
useful for example if someone does not remember whether or not they ate Crisps,
it is better that the field is left blank rather than assuming a No answer, which
could distort results.
RepeatRepeatRepeatRepeat if Yes is entered in this rule then the data entered in the previous record
will be repeated in the next new record. Repeated data can be changed during data
entry. This function can save a lot of typing if your forms contain data that changes
-
Creating a Questionnaire using EpiData
13
only rarely in a particular batch of forms (e.g. reporting forms in a surveillance
system). It is probably of less use in an outbreak situation.
Value labelsValue labelsValue labelsValue labels are a set of values combined with text items that explain the meaning
of each value. For example, a field is created to enter information on the sex of the
informant. It is decided that a value of 1 in the field means that the informant is
male and that a value of 2 means the informant is female. If a value label is defined
then a translation table can be shown during data entry if the user presses [F9] (or
the [+] key on the numeric keypad). The value labels in this example would be:
1 Male 2 Female
It is important not to confuse value labels with ranges/legal values although both
place restrictions on the data that can be entered into the field. Decide what you
want and select the appropriate option you may not want to have to go to the
trouble of setting value labels if a simple range is all thats required, and of course
in some situations value labels arent relevant.
These are the basic checks that can be attached to a field through the check file
dialog. In addition, you can click Edit to open the check file editor for the current
field and enter check code manually. This is useful for calculating field values based
on what has already been entered, and for more complicated checks (which are
largely outside the remit of this training but covered in detail in the EpiData help
files).
Lets work through some examples.
Select the ONSETTIME field on the form (or from the dropdown list in the check
dialog). Then type 00.00-23.59 into the Range, LegalRange, LegalRange, LegalRange, Legal box on the dialog. This sets
the range to those relevant to a time setting. Lets see how this actually appears in
the check code itself click the Edit button. This brings up another screen showing
the actual check code relevant to this field.
We can see that the code
starts with the fieldname, and
then the code for the range is
included. The word END
indicates the end of the checks
for this field. As well go on to
see, one field can have several
different types of checks in the
code. Click Cancel to close this
dialog without making any
further changes.
Now select the GENDER field. This time well create Value labels to restrict the
options for entry (and to give a guide to data entry staff). Select the Value label
dropdown list youll see that there are some predefined value labels, including
one for sex. However, the predefined sex labels are based on entering a single
-
Creating a Questionnaire using EpiData
14
digit number, and our gender field is an uppercase text field. So lets instead create
our own value label list. Select the [none] option from the dropdown list, and then
click the + button next to the list.
The edit checks screen appears, with the following text showing:
LABEL Label_GENDER END
We then enter the legal values and relevant labels as follows:
LABEL Label_GENDER M Male F Female U Unknown END
The indenting of the text setting the labels is optional, but makes the code easier to
read. Click Accept and CloseAccept and CloseAccept and CloseAccept and Close to close the window. The value label list now shows
label_gender. This label list can be re-used for other fields if desired perhaps not
so useful in the case of gender, but if for example you wanted to record details of
portion size in relation to each food consumed, you could define one list of value
labels (e.g. small, medium, large) and apply that to each portion size field.
There are a few fields that we would like to be entered for every questionnaire for
example, if we do not know if the person was ill or meets the case definition, it is
difficult to draw any conclusions from any other information they have given us. So
we need to make sure that these fields are set to must enter. Select the ILL field
and change the Must enter option to Yes. Repeat this process for the CASEDEFMET,
ATTENDFUN and ATTENDCLUB fields. Initially it can be tempting to set this option for
most of the fields, but not all fields will be relevant to all interviewees (e.g.
ONSETDATE is only relevant for those who have been ill) and the blank field option
(indicating missing or unknown data) can be important in relation to exposures.
So far weve covered the use of ranges, value labels and must enter checks. Next,
lets consider the use of jumps. Previously we considered that this might be useful
in controlling data entry flow after the Were you ill? question. Select the ILL field.
There are two options for the contents of the field after entry Y or N. (By setting the
Must enter property to Yes, the empty option is not available). Type the following
into the Jumps option:
Y>ONSETDATE
This sets the flow so that the next field selected after a Y is entered will be
ONSETDATE.
Now we need to enter the details for the N option add a comma after the text that
is already there, then type:
N>
-
Creating a Questionnaire using EpiData
15
Instead of typing the name of the appropriate field to jump to, you can also select it
on the screen using the mouse do this now by clicking on the Sausage rolls
field. EpiData automatically inserts the relevant fieldname (SAUSROLLS) into the
Jumps option.
Before we move on, lets look at how
this code looks in the editor. Click the
Edit button to bring up the check code
editor. We can see that the JUMPS
options are laid out line by line, and
MUSTENTER follows it, showing how
more than one check can be included
in relation to one particular field. Even
so, all this was created just by using the
dialog box.
Now lets look at calculating one field based on the information entered into one or
more other fields. As an example, well calculate the age of the respondent at the
time of the event, based on the date of the event (a value we will provide in the
code). Since this is a calculation that will be run once the Date of Birth details have
been entered, we actually need to put the relevant code into that field, so select the
DOB field. We cant create the calculation using the dialog options, so instead click
Edit to bring up the code editor.
The first thing to do is tell EpiData that we want the commands to run after data
entry into the DOB field has finished. We do this using the AFTER ENTRY END code
block, as follows:
DOB AFTER ENTRY END END
Now we can enter the actual code to do the calculation, in between AFTER ENTRY
and the first END command. First, we need to define the date of the event:
DEFINE dateofevent dateofevent = 14/08/1998
This defines a temporary variable that holds the date of the event doing things
this way makes the final formula easier to understand. Now we add the actual
calculation that assigns the age of the person to the AGE field.
AGE = trunc(int(dateofevent - DOB)/365.25)
This might seem a little complicated, but by taking it apart it is easier to understand:
1. First we take the difference in days between the event date and DOB: dateofevent - DOB
-
Creating a Questionnaire using EpiData
16
2. We convert that difference (which EpiData is still treating as a date) into an
integer, using the int function: int(dateofevent - DOB)
3. Convert the result in days to number of years, by dividing by 365.25: int(dateofevent - DOB ) / 365.25
4. Its likely that the result of this calculation isnt going to be a round number, so
we use the trunc function to round the result down to the persons age in
years: trunc(int(dateofevent - DOB) / 365.25)
5. Finally we assign the result of the calculation to the AGE data field: AGE = trunc(int(dateofevent - DOB) / 365.25)
One other thing to do since we are calculating the AGE field, we dont need the
data entry form to actually include that field, so we can skip it and go straight to the
Gender field. We can use the Jumps section of the check dialog to do this, so
Accept and Close the code edits that you have made for the calculation and return
to the check dialog. Because we want to jump straight to the Gender field
regardless of the value entered in the DOB field, we use the AUTOJUMP term, as
follows:
AUTOJUMP GENDER
Thats all the changes we need to make, so click Save and then Close on the check
dialog.
Hopefully this makes some sense, and you can follow how check code can be used
to calculate data for a particular field. If it doesnt, or seems too complicated, dont
worry too much. Knowing how to use the finer points of calculations and check code
is not essential to using EpiData for outbreak investigation but it does open up
some of the power of the program in controlling data entry and consistency, and
saving time.
On the other hand, if this has piqued your interest in using check code for running
calculations and controlling data entry, much more information on how to do this
can be found in the EpiData help files. EpiData follows largely the same check code
rules as Epi Info 6 (the DOS version of Epi Info) so if you have access to old check
code programs used in Epi Info 6, they may still work in EpiData (perhaps with some
minor tweaks).
For now, weve done enough to create a questionnaire to investigate this outbreak,
with some basic checks and calculations in place to help data entry. In the next
section, well look briefly at how we actually enter data into our EpiData data file.
-
Entering Data using EpiData
17
Entering Data using EpiData
From the main EpiData screen, select the Enter DataEnter DataEnter DataEnter Data button (close any open forms
first if necessary). Youll be asked to select a data file choose the data (.REC) file
that you created earlier.
The data entry form that you are probably familiar with by now should appear. This
time you can enter data for real! Note also that the status bar at the bottom of the
screen has some additional buttons for navigating around records in the file.
The table below includes fictional data for three sample records have a go at
entering them into the form. Its good to get into the habit of using the EnterEnterEnterEnter and/or
TabTabTabTab keys to move between the fields, rather than clicking with the mouse because
check code that has been set to run before or after entry of data into a particular
field will not be run if the mouse is used.
As you go through the data entry, note how the check code we included earlier is
working inserting calculated results into the AGE field, jumping fields according to
the data that has been entered, requiring data to be entered into a particular field.
At the end of each record, youll be asked if you want to save the record to disk
click Yes.
Field Record 1 Record 2 Record 3
Interview date 20/08/1998 21/08/1998 24/08/1998
Interviewer AGT DJG KJB
Surname Jones Dickens Jenkins
Forename Stephen Charles Hannah
Date of birth 24/10/1943 12/12/1922 13/10/1992
Gender M M F
Occupation Teacher Retired Schoolchild
House number 24 745
House name Ty Gwyn
Street name Gelligaer Street Mill Lane Newport Road
District Cathays Lisvane Rumney
Town Cardiff Cardiff Cardiff
Postcode CF24 4LA
Home tel. 029 2067 8765 029 2045 3234
Other tel. 07796 423659
Did you attend the
funeral?
Yes Yes Yes
Did you attend the
members club?
No No No
Were you ill? Yes No Yes
Onset date 15/08/1998 16/08/1998
Onset time 13.00 01.00
-
Entering Data using EpiData
18
Duration (days) 3 4
Inc. period (hrs) 22 34
Diarrhoea Y Y
Vomiting Y N
Abdominal pain Y Y
Nausea Y N
Pyrexia N Y
Headache N Y
Other aches N N
Other symptoms Y N
Other symptoms
description Fainted
Case definition met? Y N Y
Sausage rolls Y N Y
Salmon sandwiches N Y N
Pickled onions N Y Y
Corned beef sandwiches N N N
Chicken nuggets Y N Y
Chicken rolls N Y Y
Chicken sandwiches N Y N
Ham sandwiches Y Y N
Ham rolls N N N
Egg rolls Y N Y
Egg sandwiches Y N Y
Pasties Y Y N
Crisps N N Y
Gateaux N Y Y
Cheese & biscuits Y N N
Comments Interview with mother
Basic data entry is as simple as that really.
The only other thing well dwell on here is
navigating between existing records. You can use the additional buttons at the
bottom of the status bar, which show the current record, total number of records, as
well as having buttons for creating a new record and deleting the current record.
When you delete a record, the record is just marked as deleted with the word DEL
in the status bar (and therefore isnt included in any future analysis). However you
can undelete the record by simply clicking the delete button again (or using the
option in the Goto menu).
Now that weve seen how questionnaires can be designed, and data entered using
EpiData, lets look at how we analyse the data using the data analysis tools in Epi
Info for Windows.
-
Outbreak Investigation using Epi Info Analysis
19
Outbreak Investigation using Epi Info Analysis
For this section of the course, well move to using Epi Info for Windows, and
specifically the data analysis elements of the software.1
Epi Info for Windows is based around the idea on working on projects, which are
actually based around the Microsoft Access file format. Epi Info for Windows
provides a full package for designing questionnaires, entering data and carrying out
analysis, but because EpiDatas questionnaire design tools are quicker and easier
to use, we made use of that software instead. So we need to import our EpiData
data file (which is stored in REC format, the same format used by Epi Info 6) into a
format that is usable in Epi Info for Windows.
The first thing we need to do is create an Epi Info project that we can import the
data into. We actually do this by starting the process of designing a new
questionnaire. Using the shortcut on the desktop, or via the Start menu, open the
main Epi Info for Windows menu screen. Click the Make View button to open the
Make/Edit View program. The program starts with a blank screen, so create a new
project by selecting File > NewFile > NewFile > NewFile > New. The Create or Open Project dialog appears:
Browse to the directory containing your working files and type in an appropriate file
name, then click Open (which will create the project file). Youll then be asked to
name the new View that you are creating (view is the term Epi Info uses for a
questionnaire design & data table) as we dont want to create a new
1 Elements of this section of the training have been taken from Introduction to Epi Info for Windows by
Andrew G Dean, available at: http://www.epiinformatics.com/Resources.htm
-
Outbreak Investigation using Epi Info Analysis
20
questionnaire, just click Cancel. Then select File > ExitFile > ExitFile > ExitFile > Exit to return to the main Epi Info
menu.
Now we can import our data into the Epi Info Analysis program. To run the Analysis
element, click the ANALYZE DATA button on the main menu screen. The Analysis
program will then open.
Lets take a quick tour of what the program shows. All the main analysis commands
are shown in the tree view on the left. Clicking on a command will bring up a dialog
that places the command in appropriate form in the program editor at the bottom of
the screen. Results appear as web pages in the Output window, a simplified version
of the Microsoft Internet Explorer browser.
READing data into Analysis
For this section of the training well use some real (anonymous) outbreak data that
broadly matches the scenario weve been working with. Before we do that, though,
well import the data we entered into the questionnaire we created earlier, just to
confirm that it all worked! The first thing we need to do is READ that data into
Analysis. The data is in the REC file format used by Epi Info v6 and EpiData.
Click on the READ command. A dialog box appears so that you can choose a
database and a view. Click the button called CHANGE PROJECT and then use the
-
Outbreak Investigation using Epi Info Analysis
21
dialog that pops up to find the project file you created a moment ago. Once youve
found the file, select it and click the Open button to return to the main READ dialog.
Since we created a brand new project, no views or data tables appear in the list.
Now we need to import our EpiData REC file. On the Data Formats dropdown list,
select Epi6 (which is the same format as EpiData uses). Then, click the button with
three dots next to the Data Source text box (which will now be empty). Select the
REC file we created earlier from the dialog, and select Open. Youll now return to the
READ dialog, which should show something like the screenshot below:
Check that youve selected the right file, then click OK. Epi Info imports the data
table into Analysis. Youll notice that some text will appear in the Analysis Output
window detailing the current view (Epi Infos term for a data table), number of
records and the current date and time. Youll also notice that a command appears
in the Program Editor every time you carry out a task/function the relevant line of
code will appear in the editor.
The data has now been imported into the Epi Info project so if you come back a t
afuture time to carry out more analysis, you can select the relevant view from
within the project, rather than having to import the data again. Of course if the data
has been updated or amended then youll want to import it again to work with the
most recent information.
Well carry out a couple of basic tasks with this data before we import some more
meaningful records.
-
Outbreak Investigation using Epi Info Analysis
22
LISTing basic case details
A common task in outbreak investigation is producing a simple case listing,
including for example name, gender, date of birth, case status, onset date, and so
on.
Click on LIST in the command tree. A dialog box will then appear. Initially, lets go
with the default settings and produce a grid showing all the data, so just click OK. A
grid then appears over the top of the output window, with scrollbars etc, allowing
you to scroll through all the data currently selected. This is a bit overwhelming so we
need to change our parameters a little to limit the information that appears.
Click the LIST command again, but this time use the drop-down list at the top of the
dialog to select the following variables:
SURNAME FORENAME GENDER AGE CASEDEFMET ONSETDATE ONSETTIME INCUBATION DURATION
Note that the items appear in
the dialog in alphabetical order
as you select them however
when the list is displayed the
items will appear from left to
right in the order you actually
select them. We can also
choose to have the listing
formatted in the main output
window, rather than the
temporary grid we got the first
time. To do this, select the Web (HTML) option from the Display Mode list on the
right hand side of the dialog. When youve selected this, some extra options will
appear the one that is most useful is Fields Per Page. Putting a zero into Fields
Per Page ensures that all the data for one record appears on the same line
(otherwise the table can get split after 6 columns, which can be difficult to read).
Click OK to display the list. Epi Info displays Missing to represent blank fields (such
as ONSETDATE for a person who did not suffer illness).
OK, so weve seen that we can import data created in EpiData for analysis in Epi
Info. Lets now import some real outbreak data for some more meaningful analysis.
Click the Read command again, and repeat the process we followed before (except
that you shouldnt need to Change Project this time). Select the OutbreakData.rec
file. If all has gone well, the output window should show a record count of 75
records.
-
Outbreak Investigation using Epi Info Analysis
23
For the rest of this section well work with this data to carry out analysis that fits in
with our outbreak scenario.
Producing FREQuencies for various items of data
Generally it is useful to know what the distribution of our study population is for
example by age, location, gender, occupation, illness etc. These sort of descriptive
statistics are often the starting point for analysis of a set of data. Epi Infos
Frequencies command allows us to do this. Click the Frequencies command in the
command tree to bring up the FREQ dialog. In the dialog box, use the Frequency of
dropdown list to select the following variables:
ILL SEX
PERIOD (meaning
incubation period)
Then click OK. After a
short wait, this produces
a frequency breakdown
for each of these data
variables in the Output
window. Scroll up and
down and note that
each table is accompanied by yellow bars to the right that indicate the frequencies.
Epi Info also estimates 95% confidence intervals for each row in the frequency
table, although for most variables these are likely to be of limited use (particularly in
outbreak investigation). If you want to turn off these statistics, click the Settings
button in the FREQ dialog, and ensure the Statistics option is set to None.
For numeric (i.e. measured on a continuous scale) variables like PERIOD, we might
also wish to identify some other statistics, such as mean and median values. Epi
Infos Means command can provide this data. Click the Means command in the
command tree to bring up the dialog select PERIOD from the Means of dropdown
list, and click OK. This produces a similar table to the one we got with the
Frequencies command, but at the bottom additional statistics are provided
including number of observations, mean, median, mode, standard deviation, and
quartiles.
(As an aside, putting two or more numeric variables into the MEANS dialog means
that Epi Info also runs a test to check for the statistical significance between the
variables. This isnt usually of use in an outbreak investigation, where we are more
interested in the Relative Risk and/or Odds Ratio for various exposures, but can be
of value in other types of epidemiological investigation, such as evaluation of a trial
or intervention, comparing a particular measurement before and after the
intervention).
-
Outbreak Investigation using Epi Info Analysis
24
SELECTing particular groups of records
Sometimes you dont want to run a particular analysis on the entire dataset
perhaps you only want to know about those people who were ill. In Epi Info Analysis,
the SELECT statement limits subsequent analysis to particular records based on
criteria that you specify. Click the Select command to bring up the SELECT dialog.
The Available Variables dropdown list includes all the available variables in the
dataset. Well select only those cases that meet the case definition. Select the
CASEDEF variable, then click the = and Yes buttons to put together the full
statement note that Epi Info uses the notation (+) to denote Yes (it also uses (-)
for No and (.) for Missing (blank) data. Alternatively, you can just type the
statement directly into the dialog. Its possible to construct more complex
conditional statements e.g. requiring more than one condition to be met using
AND, or one of a series of conditions to be met using OR.
Click OK when you are finished, and youll be returned to the main Analysis screen.
The Output window has updated to show the current statement(s) that are applying,
and the number of records now selected (note that this has dropped from 75 to 56).
Run a FREQ CASEDEF command to confirm that only cases have been selected.
-
Outbreak Investigation using Epi Info Analysis
25
If we now run another SELECT command, that will be processed only on the 56
records that we are currently working with (this provides another way for combining
several statements to select a particular subgroup of the data). Lets try this by
clicking the SELECT command again. This time, well select all those aged 50 or
over. Select AGE from the dropdown list of variables, then either type or use the
buttons to produce the statement AGE>=50. Click OK, and see how the Output
window has updated to reflect the second SELECT statement. Run a case LISTing of
AGE and CASEDEF to see the selected records.
To get back to the base dataset of all records, simply click the CANCEL SELECT
command, and click OK in the resulting dialog.
Youll notice that there are also options in the same section of the command tree
for SORTing data. The dialog is self-explanatory so we wont dwell on it here like
the SELECT command, the SORT on the data will apply to all future analysis until a
CANCEL SORT command is issued. Before moving on, cancel any current SELECT or
SORT commands so that you are working with the base dataset again.
Recoding data
We saw earlier that, as you might expect, the individuals in our dataset are spread
across a wide range of ages. This makes it difficult to get an accurate, easily
understood picture from Frequencies or Means results about how the population is
distributed. We can overcome this problem by recoding the age data into a series of
age groups.
To do this, we first need to define a new variable that will hold the age group
information for each case. Click the Define command to bring up the DEFINE dialog.
Defining a Standard variable effectively creates an additional field in our dataset,
into which we can then insert data. (Global and Permanent variables are more
commonly used in more complex Epi Info Analysis programs if you want to know
more about them, read the information in the Epi Info help file). Define the Variable
Name as AGEGROUP, and click OK.
Now that weve defined the AGEGROUP variable, we can recode the age data into that
variable. Click the Recode command to show the RECODE dialog.
-
Outbreak Investigation using Epi Info Analysis
26
We want to recode from the base data in the AGE field, so select that in the From
dropdown list. Our target field for the recoded data is AGEGROUP, so put that into the
To list.
The dialog provides two ways to
recode data: entering the options by
hand into the grid shown, or by
clicking the Fill Ranges button,
which is what well do in this case.
Youll see that the dialog changes to
a simpler format, asking for values
for Start, End and By. Start and End
are self-explanatory the start and
finish points for the recoding. The By
values determines the size or
interval for each group. Enter the
range as follows: Start=10, End=90,
By=10, then click OK. This returns
you to the original dialog, which has
now had the grid filled with recoding
values based on our selections. Notice that the terms LOVALUE and HIVALUE are
used to denote the minimum and maximum values and these rows in the grid are
automatically created for us (which is why we set 10 as the start point, not zero).
Click OK to process the recode.
Now run a FREQuencies command on the AGEGROUP variable. Select the CASEDEF
variable in the Stratify by dropdown list to produce two separate tables: one for
cases and the other for non-cases. Using the recoded variable gives a much more
concise description of the age distribution in the population.
One final note: Epi Info may give an error if you try and recode with more than 10
separate recoding options, so its best to limit the range of rows in the recoding
grid to a maximum of 10 options.
Using the Program editor
Because the AGEGROUP variable we defined is only a standard variable, the data
will be lost when we exit Epi Info Analysis. Rather than having to go through the
whole process again (not too difficult in this case, but potentially irritating if the
recoding groups have been more carefully hand-crafted), it would be useful to have
a way of saving the commands that we produced so that we can issue them again
in future. Fortunately, Epi Info Analysis allows us to do this, using the Program
Editor.
Youve probably already noticed that every time you run a command through a
dialog, text is added to the Program Editor window. Lets look a bit more closely at it.
Scroll through the text and youll see a range of commands, starting with the
READing of the original data, the case LISTing, FREQuencies and MEANS operations,
-
Outbreak Investigation using Epi Info Analysis
27
SELECTing and SORTing of records. At the bottom of the text will be the most recent
commands, including DEFINE AGEGROUP and the RECODE commands.
The Program Editor allows us to save this output into the project as an Epi Info
Program. We could save this entire output as a program in Epi Info, and then
running it again at another date would process all the commands listed one after
another. Thats probably overkill, though, and would take a significant amount of
processing time to come up with the results. More usefully, we could save the
commands relevant to the recoding process. Go through the text in the editor and
delete everything that appears before the DEFINE AGEGROUP statement. Then delete
any text after the END statement at the end of the RECODE code block. The Program
Editor should now contain text that looks something like this:
DEFINE AGEGROUP RECODE AGE TO AGEGROUP LOVALUE - 10 = "10 - 20" 20 - 30 = ">20 - 30" 30 - 40 = ">30 - 40" 40 - 50 = ">40 - 50" 50 - 60 = ">50 - 60" 60 - 70 = ">60 - 70" 70 - 80 = ">70 - 80" 80 - 90 = ">80 - 90" 90 - HIVALUE = ">90" END
Click the Save button in the Program Editor the Save Program dialog will appear:
The Project File field
contains the details of the
current project, and can be
left as it is. Type a name for
your program into the
Program field, put your
name in the Author field
and perhaps include a brief
comment as well. The Date
fields are automatically
filled by Epi Info and are not
editable. When youre
happy, click OK to save the
program. (Note that the
Text File option also exists
to save the program to a
separate text file useful if you want to maintain a repository of useful programs to
import into any Epi Info project.)
Now lets see how this has worked. Exit the Analysis program (theres a button at
the top left) and return to the main Epi Info menu. Click the Analyze Data button to
reopen a new instance of Analysis. Youll need to READ in the outbreak data, so do
that (changing project if necessary), selecting the viewOutbreakData option.
-
Outbreak Investigation using Epi Info Analysis
28
Once the data has been read into Analysis, click the Open button on the Program
Editor. The Read Program dialog will appear this looks very similar to the Save
Program dialog. Click the Program dropdown list and select the program you stored
earlier the author, date and comment details will then appear. Click OK to read
the program into the Program Editor. The program hasnt yet been run, so click the
Run button (note that the Run Command button runs only the command currently
containing the cursor). Since all the commands in our program dont actually
produce any output of their own, we need to run FREQ AGEGROUP to see if things
have worked. You should get the same age group breakdown that we saw a little
earlier.
Accessing previous results & controlling file storage
By now, you might be wondering where all this Analysis output is being stored. Epi
Info Analysis stores output as HTML files (web pages). By default, these files are
stored in the same folder as the project file, but there are several options available
to customise this in the Output section towards the bottom of the command tree.
Epi Info also provides a handy index of all the output that youve produced in work
on the project. First, click on CLOSEOUT to close the output file youve just been
working on, and then click on the hyperlink called RESULTS LIBRARY at the top of
the output in the browser (you might need to scroll up). An index page appears,
showing previous commands that have produced output files. Click on any of the
entries to display it.
There are a wide range of options for customising storage of data, mostly accessed
via the Storing Output command. The most useful of these is the ability to set the
Results Folder where output files are stored perhaps a new subfolder inside your
main project directory. Other settings for archiving data are also available, but get
more involved refer to the Epi Info help file for more details.
Despite all this, there might come a time when you want to define a specific file in
which to store a particular set of output. In the next section, well be producing
some graphs, so now well create a file specifically to store the graph output in. This
is does using the Routeout command, which pops up a simple dialog asking for an
output filename.
Click the button to bring
up the file browser dialog
enter a file name (like
Graphs) and click the Open
button, then click OK in the
main ROUTEOUT dialog.
Any future output will be directed to this file, until a CLOSEOUT command is issued
(when Epi Info will start issuing output files in the default location again.)
-
Outbreak Investigation using Epi Info Analysis
29
Producing simple GRAPHs
A common task in outbreak investigation is to plot an epidemic curve showing the
order and frequency of onset date. Epi Info Analysis has a Graph command that
helps us do this.
Click this command to bring up the GRAPH dialog.
First of all, lets run a simple bar chart based on ONSETDATE. At the top left of the
dialog is the Graph Type dropdown list. The default setting is BAR, so leave that
alone for now. Below that, select ONSETDATE as the Main Variable for the x-axis. For
the y-axis, we want the Count of the main variable, so this is OK as it is. Click OK.
This displays the graph in a separate Epi Graph window, and we can see that the
onset date information is displayed in the graph. This window allows further
customisation of the graph if desired for now well just accept what were given
and return to Epi Info by selecting File > Save & ExitFile > Save & ExitFile > Save & ExitFile > Save & Exit. The graph is then displayed in
the output window, and is also saved as a separate JPEG image file in the results
output directory.
-
Outbreak Investigation using Epi Info Analysis
30
Unfortunately, this graph isnt terribly helpful because the incubation period is
relatively short, separating the cases in hours would be more useful. Fortunately,
the outbreak investigators in this case included a field for incubation period in their
data collection, measured in hours. Previously we ran a MEANS command on the
PERIOD variable, so we know that there is a wide range of values from 5 to 46 hours
and around 30 different values in each case. Producing a Bar chart in this
instance may not be terribly helpful, since well get the same wide range of values
because a bar chart will produce a bar for each individual value represented in the
data set. But because were using a numerical (continuous) field, we can run a
Histogram graph, which gives us more control over the output. Click the Graph
command again, and this time select HISTOGRAM from the Graph Type dropdown
list.
Put PERIOD into the x-axis Main Variable list. Notice also the Interval boxes below.
This is where you can set the intervals that the Histogram bars will be grouped in
select an appropriate value like 6 hours. The 1st Value box can also be set the
default is Auto which will select an appropriate value. However, you can adjust this
yourself if you want to force the x-axis to start at 0 or another fixed point which
makes sense in this case, so set the value to zero. Accept the other settings and
click OK.
The way that the graph appears suggests a point source outbreak, but there is also
a second, smaller peak at around 36 hours. How could this be explained?
One possible explanation could be that these are secondary cases, who did not
attend the buffet, but consumed the food in the club bar later in the evening, or
contracted the illness through person-to-person contact with cases. We can run the
-
Outbreak Investigation using Epi Info Analysis
31
graph again, and this time separate the data into two series, based on whether or
not the case attended the funeral itself. Open the GRAPH dialog again, select
HISTOGRAM as the graph type, PERIOD as the main variable, and an interval of 6
hours, with a 1st value of zero. We still want a count of cases on the y-axis, but we
also want to display bars for the series so select the variable ATTEND in the Bar
for each value of dropdown list. It is possible to set the title, but we can do this
at the customisation stage, so well look at that in a moment.
Click OK to produce the graph in Epi Graph. Now we can see that there are two
separate sets of bars for those who did and did not attend the funeral.
Nevertheless, by customising the graph we can make it easier to see exactly whats
going on. In Epi Graph, select
View > Customization View > Customization View > Customization View > Customization
The Customization dialog allows
us to modify many of the factors
relating to the design and
presentation of the graph. On the
General tab, we can add or
modify the Title and Subtitle, set
font sizes, whether the graph
should appear in colour or
monochrome, apply gridlines, and
determine whether a data table is
produced to go with the graph.
Add a title and subtitle for the
graph if you wish you can leave most of the other settings as they are unless you
particularly want to add grid lines for the y-axis to make it easier to read across the
counts.
Now select the Plot tab in the
dialog. The Plot Style list in the
centre of this tab gives a range of
options for displaying the graph
series. Currently the separate
bars mean that it is hard to pick
out the actual epidemic curves.
We can change the display to
make it easier to see what is
happening select Area from the
Plot Style list (which displays the
series as filled areas).
To see what difference a change
makes, you can click the Apply button, which applies the customisations to the
graph without closing the dialog. It might also help to make the series show in 3D,
so select that option from the bottom left. The other tabs can be left as they are
-
Outbreak Investigation using Epi Info Analysis
32
unless you would like to change the Font, Color or Styles used for the display
these tabs are self-explanatory.
Access to many of these settings can also be obtained by right-clicking on the graph
and selecting from the pop-up menu options that appear.
Once youre happy with the settings, click OK in the dialog to have a proper look at
your graph. Its now easier to see whats been happening. When youre finished you
can select File > Save & ExitFile > Save & ExitFile > Save & ExitFile > Save & Exit to return to the Analysis window. (You can also select
File > ExportFile > ExportFile > ExportFile > Export to export the graph as an image to the clipboard, direct to a printer, or
a file location of your choice).
An example of the type of results that are achievable using Epi Graph is shown
below (you would probably show the graph in a larger format for a full outbreak
report).
0.0
5.0
10.0
0 6 12 18 24 30 36 42 48
Epidemic CurveIncubation period since buffet (hours)
COUNT
PERIOD
ATTEND=Yes ATTEND=No
Weve finished working with the graphs now, so click the Closeout command to stop
routing output to the Graphs file we created earlier.
Producing 2x2 tables to assess relevance of exposures
The chief purpose of conducting an epidemiological study in outbreak investigation
is to try to identify statistically significant associations between certain exposures
and illness. This is normally done through the construction of 2x2 tables,
comparing the rates of illness in those exposed to a particular risk factor to the rate
in those who were not exposed.
Epi Infos Tables command allows us to produce these tables, together with
relevant statistics. Lets investigate the relevance of exposures in our outbreak
example. Well start by looking at consumption of ham sandwiches. Click the Tables
command to bring up the TABLES dialog.
-
Outbreak Investigation using Epi Info Analysis
33
Select the variable HAM in the Exposure Variable list. Our outcome variable for all
this analysis will be CASEDEF, representing whether or not an individual meets the
case definition (the alternative would be ILL, but there may be some people who
have reported illness but do not meet the case definition). Accept all the other
default settings, and click OK.
The 2x2 table will be displayed in the output window, followed by a range of
different statistics. Presentation of the 2x2 table is fairly obvious, but lets spend a
little time considering the various statistics. The statistics produced from the above
analysis are shown below.
Single Table AnalysisSingle Table AnalysisSingle Table AnalysisSingle Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Odds-based Odds Ratio (cross product) 0.6771 0.2368 1.9365 (T) Odds Ratio (MLE) 0.6806 0.2286 1.9691 (M) 0.2038 2.1855 (F) PARAMETERS: Risk-based Risk Ratio (RR) 0.9065 0.6960 1.1807 (T) Risk Difference (RD%) -7.3257 -26.9187 12.2672 (T) (T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square uncorrected 0.5319 0.4658206779 Chi square - Mantel-Haenszel 0.5248 0.4688093873 Chi square - corrected (Yates) 0.2151 0.6428085146 Mid-p exact 0.2409736241 Fisher exact 0.3219542501
In a case-control study, we would be interested in the odds ratio (cross product),
and risk-based measures should not be used. In a cohort study, we are able to use
risk-based parameters as the total population exposed is a known quantity so the
Risk Ratio (also known as the Relative Risk) is the most useful basic measure of
risk. In this case the RR is 0.9065, which suggests very little difference in outcome
based on this exposure (strict interpretation: those eating ham sandwiches were
-
Outbreak Investigation using Epi Info Analysis
34
0.9065 times as likely to be cases as those who did not). Since the 95% Confidence
Interval includes the no difference value of 1, we know that this difference is not
statistically significant at the 95% confidence level.
The statistics provided also include chi-square test results in the form of p-values,
achieved by a number of different statistical procedures. In general, where a
reasonably large dataset/sample size has been used, there is unlikely to be an
important difference between these procedures. However, where the differences
between the procedures are important (e.g. one identifies a statistically significant
result, but another does not), you should seek the advice of an epidemiologist or
statistician to assist with interpretation of the results.
This analysis indicates that consumption of ham sandwiches was not associated
with illness in this outbreak. However, other foodstuffs might be implicated in the
outbreak, and assessments of the strength of association between consumption
and illness for each of the menu items should be completed.
Appendix IV Appendix IV Appendix IV Appendix IV containscontainscontainscontains a a a a worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the
results of this analysis.results of this analysis.results of this analysis.results of this analysis. A completed version of the worksheet is included on the A completed version of the worksheet is included on the A completed version of the worksheet is included on the A completed version of the worksheet is included on the
course CDcourse CDcourse CDcourse CD----ROM.ROM.ROM.ROM.
-
Using Analysis with routine COSURV data
35
Using Analysis with routine COSURV data
So far weve concentrated on the outbreak scenario, but there are also times when
we want to carry out on routinely collected surveillance data, particularly the
information held on Cosurv. Cosurv has the ability to export data into the REC
format readable by Epi Info for Windows. This short section looks at how we export
that data out of Cosurv, and some common analysis tasks that we might wish to
perform.
NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e.
the file EpiXport.REC the file EpiXport.REC the file EpiXport.REC the file EpiXport.REC read on for details) so please make sure that you ha read on for details) so please make sure that you ha read on for details) so please make sure that you ha read on for details) so please make sure that you have taken ve taken ve taken ve taken
a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.
Exporting data from Cosurv
Open Cosurv, and enter your username and password. From the Cosurv main
screen, select Export > EpiInfo & ASCIIExport > EpiInfo & ASCIIExport > EpiInfo & ASCIIExport > EpiInfo & ASCII. The Export dialog will then appear:
Select the dates for the relevant period you want to export to say, all of 2006
then click the Get Records button. The list of records to be exported will appear in a
spreadsheet format in the main part of the dialog. You can use the scrollbars to
move around the list and explore the data that will be exported. One word of
warning: if a specific record has been selected in that dialog, and then the Export
button is clicked, only that record will be exported (but potentially hundreds of
times, depending on the size of the dataset). So if you have unwittingly clicked on a
specific record, try clicking off the dialog somewhere to deselect it. This may be a
bug in Cosurv, or it may be intended behaviour either way, it probably isnt what
you want to happen.
-
Using Analysis with routine COSURV data
36
If youre happy with the list thats shown, click the Export button. Cosurv will process
the export you may see a DOS program window pop up briefly, as Cosurv actually
uses one of the old Epi Info 6 DOS utilities to carry out the export. Once the export
has been processed, youll be returned to the main Cosurv screen.
The exported data has been placed into an Epi Info REC file named EpiXport.REC
(NB the data has also been exported in ASCII delimited text format in a file named
csExport.txt a format suitable for import into spreadsheet software such as Excel).
But where is it? The location of this file depends on your current Cosurv settings.
You can find these out from the main Cosurv screen by selecting System > Setup > System > Setup > System > Setup > System > Setup >
DistrictDistrictDistrictDistrict, which brings up the District settings dialog.
The final box on the General tab shows the location for Epi Info export files. When
youve done an export, its strongly recommended that you move or rename this file
appropriately to stop it being overwritten by future exports.
The data that youre provided with for this training has been anonymised, with key
patient-identifying data (names, address, telephone numbers, employer details, id
numbers etc removed). Obviously when working with live data these will be
available, and this is something to bear in mind since it is important that data
protection and confidentiality guidelines are followed. The Cosurv data system itself
is encrypted, i.e. data can only be accessed through the password-protected Cosurv
database itself anyone trying to access the datafiles via other methods will just
read encrypted gobbledegook. However, data exported in Epi Info (or ASCII text)
format is not encrypted, and can be read by anyone with a text editor or
spreadsheet. Therefore, you need to keep careful control over how exported data is
stored and manipulated.
-
Using Analysis with routine COSURV data
37
Importing data into Epi Info
Since the data is now in the REC file format, importing the data into an Epi Info
project follows the same process as discussed earlier in relation to the outbreak
data. In reality, you might want to create a separate new Epi Info project for this
routine data, but for the purposes of this training, well use the same project that
weve been working with so far. So use the READ command in Epi Info Analysis,
select Epi6 from the Data Formats dropdown list, and then browse to select the
Cosurv2005.REC data file. Click OK to import the data into Epi Info Analysis.
Common analysis tasks for routine surveillance data
Lets briefly look at some of the common analysis tasks that we might want to
complete with the Cosurv data. Of course, Cosurv itself has its own reporting
mechanisms through the District Reports section, but these apply only to certain
predetermined formats using Epi Info allows us full control over the data and its
presentation. The data supplied is the anonymised surveillance reports held by
Cardiff Council for the calendar year 2005.
First of all, lets generate some basic frequencies. Click the Frequencies command
and include the following fields in the dialog:
SEX Gender of the case
LOCALITY Locality (in Cardiff, we use electoral ward)
DISEASE Disease (as used in NOIDS return)
TRANSMISS Suspected mode of transmission
CONTRACTED Suspected source home, commercial, unknown
The usual list of frequency tables will appear. You may or may not be surprised by
the results!
It might also be interesting to study the distribution of cases by age in the
population. Age data is stored in two ways in the Cosurv data system the DOB
field (date of birth), and the combination of the fields AGEY (age in years) and AGEM
(months passed of current year) so a case aged 18 months at the time of the data
entry would have AGEY=1, AGEM=6. We saw before that age has such a wide range
of values that it is often easier to recode age information into age groups for
analysis.
Let us reuse the program we created earlier in the outbreak analysis. Load the
program into the Program Editor using the Open button. We need to make one
minor modification the variable we use for age is AGEY, whereas in the program it
refers to AGE make any necessary modifications and then run the program. Now
run FREQ AGEGROUP to see the distribution of cases by age group (equally, we
could use this variable to construct a bar graph, or even a population pyramid by
stratifying by gender).
-
Using Analysis with routine COSURV data
38
Well now focus a bit more on the data that specifically relates to cases of Food
Poisoning. To do this we need to select only those cases where DISEASE=Food
Poisoning. Click the SELECT command and pick the DISEASE variable from the
dropdown list, then type =Food Poisoning into the dialog after the variable
name. (Alternatively, you can use the DISCODE variable and type =22, which achieves
the same thing and is quicker and easier to finish but of course you have to know
that the DISCODE for Food Poisoning is 22).
Now lets look at the various types of organism that have been associated with
cases of food poisoning so run a Frequencies command for the ORGANISM field.
Youll note that organisms such as Campylobacter are basically caught under one
field, but because most isolates of Salmonella are sent to HPA Colindale for further
typing, the full breakdown of species is available for these cases.
As a final exercise, lets investigate seasonal variation in notification of food
poisoning cases. Select the Graph command, and choose the LINE graph type.
Select the WEEK field as the main variable for the x-axis (WEEK is the NOIDS week
during which the case was notified), and leave the other default settings as they
are, since what we are looking for is the count of cases in each week. The graph will
be produced (and as before, you can continue to tinker with the display settings,
titles etc). The line is quite jagged, because there is inevitably some variation from
week to week anyway if you wanted to flatten this out a bit, you could produce a
similar graph in HISTOGRAM type, with an INTERVAL of either 2 or 4 weeks to
aggregate the data a bit closer (fortnightly figures are quite useful).
What Wha