Using EpiData & Epi Info for Windows1

of 57 /57
Using EpiData & Epi-Info for Windows Training for Communicable Disease Control in Local Authorities Cardiff Council (Strategic Planning & Environment) March 2007

Embed Size (px)



Transcript of Using EpiData & Epi Info for Windows1

  • Using EpiData & Epi-Info

    for Windows

    Training for Communicable Disease

    Control in Local Authorities

    Cardiff Council (Strategic Planning & Environment)

    March 2007

  • Acknowledgements



    2007 Cardiff Council (Strategic Planning & Environment).

    This training guide was produced by Alastair Tomlinson to form part of the

    Communicable Disease Lead Officer Training Programme, co-ordinated by the

    Wales Centre for Health.

    Please send enquiries relating to this training guide to:

    Alastair Tomlinson, Chartered Environmental Health Practitioner

    Team Leader (Health Improvement)

    Public Protection Division

    Room 134 City Hall

    Cathays Park

    Cardiff. CF10 3ND.

    029 2087 1845

    [email protected]

    About the software

    Epi Info is a public domain software

    package designed for the global community

    of public health practitioners and

    researchers. It provides for easy form and

    database construction, data entry, and analysis with epidemiologic statistics, maps,

    and graphs. Epi Info can be downloaded from

    EpiData Software has developed from

    securing the principles of Epi Info V6 for DOS

    to an independent documentation oriented

    system. EpiData can be downloaded from

    Conventions used in this training guide

    Text to be entered on screen is shown in this font.

    Directions to drop-down menu items are shown in bold type, e.g. File > SaveFile > SaveFile > SaveFile > Save.

  • Table of Contents


    Table of Contents


    Table of Contents .................................................................................. ii

    Aim and Objectives ..............................................................................1

    Outbreak Scenario................................................................................2

    Creating a Questionnaire using EpiData.............................................3

    Entering Data using EpiData ...............................................................17

    Outbreak Investigation using Epi Info Analysis.................................19

    Using Analysis with routine COSURV data.........................................35

    Other capabilities of EpiData and Epi Info ........................................39

    Appendix I Comparison of Epi Info & EpiData ...............................44

    Appendix II Contents of course CD-ROM.......................................48

    Appendix III Further information & resources.................................49

    Appendix IV Worksheet for 2x2 table results .................................52

    Appendix V Check code example ................................................53

  • Aim and Objectives


    Aim and Objectives

    Aim of the training

    To provide training on the practical use of Epi Info and EpiData in communicable

    disease control, with particular reference to:

    An outbreak situation

    Analysis of routine Cosurv surveillance data


    By the end of the training delegates will:

    Have an understanding of EpiData and Epi-Info for Windows and their component elements

    Be able to use EpiData to design a data entry form for a questionnaire in an outbreak situation

    Be able to use EpiData to enter outbreak investigation data into a record suitable for analysis in Epi Info for Windows

    Be able to use Analysis to obtain useful statistical and epidemiological information from an EpiData / Epi-Info for Windows database for outbreak

    investigation purposes

    Be able to use Analysis to import routine Cosurv surveillance data into Epi-Info for Windows, and obtain useful statistical and epidemiological information

  • Outbreak Scenario


    Outbreak Scenario

    On the 17th August, you receive a telephone call from a gentleman who reports that

    he and several others who attended a buffet following a funeral were suffering

    symptoms of food poisoning. The buffet, provided by an external caterer, was held

    at a local club following the funeral, and mourners arrived at the club at around

    3.00 pm on 14th August. Food left over from the buffet was placed in the main bar

    areas of the club for club members to consume later that day.

    Initial activity involves obtaining of a list of people who attended the funeral and

    others who may have eaten the food provided for the funeral buffet. A list of food

    served at the buffet has been obtained from the caterer, and cross-referenced with

    initial information gathered from cases. Indications are that around 70-80 people

    attended the funeral, and approximately 40-50 of these people may have

    experienced symptoms consistent with food poisoning.

    Table Table Table Table 1111 ---- List of foods served at the buffet List of foods served at the buffet List of foods served at the buffet List of foods served at the buffet

    An Outbreak Control Team has been convened, and has decided to undertake a

    cohort study to investigate the outbreak. The OCT assigns you with the following


    Establish the case definition

    Develop a structured questionnaire to investigate the outbreak.

    Enter questionnaire data into an appropriate computer database

    Analyse the data to describe outbreak and identify exposures associated with illness

    This training uses this scenario to introduce the various functions of EpiData and

    Epi Info for Windows, and their particular use in outbreak investigation.

    sausage rolls chicken rolls salmon sandwiches

    pickled onions ham sandwiches egg rolls

    corned beef sandwiches ham rolls egg sandwiches

    chicken nuggets chicken sandwiches cheese & biscuits

    crisps gateaux pasties

  • Creating a Questionnaire using EpiData


    Creating a Questionnaire using EpiData

    Basic Questionnaire Creation

    A screenshot of the main EpiData screen is shown.

    We want to create a new questionnaire, so select Define DataDefine DataDefine DataDefine Data > New > New > New > New .QES File.QES File.QES File.QES File (It is

    also possible to edit an existing questionnaire, by using Define Data > Open .QES Define Data > Open .QES Define Data > Open .QES Define Data > Open .QES

    FileFileFileFile). This creates an empty text file into which we can enter information.

    On creation of the file, the following toolbar

    option also becomes available.

    Clicking this button brings up the

    Field pick list dialog.

    This dialog makes it easy to create different

    kinds of fields. As an example, we will create

    one or two of the basic field types in our


  • Creating a Questionnaire using EpiData


    First, type an appropriate heading into the first line of your questionnaire, such as

    Lead Officer Training March 2007.

    Then, on the row below, enter Surname: Leave the cursor flashing after the colon. If

    the Field pick list is not already showing, click the button to bring it on screen.

    Select the Text tab from the pick list. This then gives a short option list of text,

    upper-case text, and encryption field. For now well accept the default text

    option. Set the field length to 20, then click the Insert button. EpiData inserts a

    series of underscore characters after the Surname: label. Underscore characters _

    are how EpiData denotes plain text fields. The number of underscores indicates the

    maximum length of the field.

    On the next line, type Forename: Using the field pick list again, insert another text

    field of 15 characters.

    Now lets try a different field type dates. On

    the next line, type Date of birth: Select the

    Date tab from the field pick list. This

    presents two lists of options general date

    fields on the left, and automatic dates on

    the right. General date fields are formatted in

    three different ways. For most of us in

    Europe, the format is most

    natural, so select that. Click the Insert button,

    and EpiData inserts the relevant date format

    field type.

    On the next line, type Gender: and insert a single character Uppercase text field.

    EpiData inserts a code, which denotes an uppercase field one character long.

    Later, well restrict the entries in this field to either M (male), F (female) or U

    (unknown). Below this, add the label Occupation: and insert another 20 character

    text field.

    We need to be able to record interviewee address details.

    First, lets create a house number field. On

    the next line type House number:, and then

    select the Numeric tab on the field pick list.

    Select 3 digits before the decimal point, and

    0 digits after it, then click the Insert button.

    EpiData inserts ### after the House

    number: label. # characters are how EpiData

    denotes numeric fields, and again the

    number of # characters indicates the

    maximum size of the number. (Numbers with

    a decimal point appear as ##.##).

  • Creating a Questionnaire using EpiData


    Add another text field for House name (30 characters), and three more fields for

    Street name (30 characters), District (20 characters) and Town (20 characters).

    Then add another label for Postcode: and this time add an Uppercase text field of

    8 characters. EpiData inserts uppercase fields as with the number of

    spaces determining the total length of the field.

    Finally, lets add a field for telephone details. Initially it seems like a good idea to

    create this as a numeric field, but in doing this we wouldnt be able to record any

    text details (such as ext. etc), and its unlikely we would ever want to order our data

    by telephone number, so its probably easier to simply create a text field of around

    15-20 characters. If you prefer you can create two fields, one for home and one for

    other (e.g. work, mobile).

    Weve now created the fields for the basic contact details of the interviewee. Before

    proceeding onto further work, lets save what weve done so far. Click the Save

    button on the toolbar (or select File > SaveFile > SaveFile > SaveFile > Save). Enter an appropriate filename and

    location in the dialog box, and click Save.

    We can also take a sneak preview of how the questionnaire will appear for those

    entering data. Before doing that, lets set a couple of options that determine how

    our fields will be named. Click File > OptionsFile > OptionsFile > OptionsFile > Options, and then select the Create data file


    Generally, the automatic field names options is best, since it will try to make sense

    of the question (i.e. the text immediately to the left of the field), and will ignore

    common words like who, did, or etc. Sometimes it may be preferable to select

    the First word in question option. For this exercise, select automatic field names.

  • Creating a Questionnaire using EpiData


    Later in the module well look at how we can specifically tailor the fieldnames that

    EpiData will generate in the data record files. Fieldnames have a maximum length

    of 10 characters.

    The decision on letter case of field names is mainly one of personal preference

    the authors preference is to use upper-case for field names to make them stand


    Once you have made your option selections and clicked OK, click the Preview

    Data Form button (or select Make Data File > Preview Data FormMake Data File > Preview Data FormMake Data File > Preview Data FormMake Data File > Preview Data Form).

    A new tab on the main display will appear, showing the questionnaire with data

    entry fields in the relevant places. You can select File > Print Data FFile > Print Data FFile > Print Data FFile > Print Data Formormormorm to get an

    idea of how the questionnaire will appear on paper for completion by interviewers.

    You can even practice entering data into the form to check that things appear as

    you expect them to. For now, its useful just to see how things are going to be

    presented. To close the form, select File > Close formFile > Close formFile > Close formFile > Close form, or press CTRL F4.

    Currently our questionnaire lets us record interviewees personal details, but not a

    lot else. Lets change that by adding some details specific to the event in our

    scenario. The first thing to establish is whether the person actually attended the

    funeral (they may have been exposed to the food under suspicion at the club bar

    after the event).

    This introduces us to another important field type: the Boolean field. This is simply a

    Yes/No field, but this type of data is often crucial in outbreak investigation, since it

    allows us to construct 2x2 tables to assess

    relative risk for various exposures.

    On a new line at the end of the questionnaire,

    type Did you attend the funeral? Select

    the Other tab on the field pick list, and select

    the Boolean (yes/no) option (the Length field

    on the dialog becomes greyed out as it isnt

    relevant). Click the Insert button. EpiData

    inserts a code, which is how it denotes a

    Boolean field.

    For clarity, lets also include a question on whether the person attended the

    members club we dont know if people from the members club have been

    affected, or whether there is crossover with the cohort of funeral attendees, but it

    may be important to be able to distinguish between them later. On a new line, type

    Did you attend the members club? and insert another Boolean field.

  • Creating a Questionnaire using EpiData


    Well also use Boolean fields to record whether or not the person was ill, and what

    their symptoms were. Add the relevant lines and fields to the questionnaire for the

    following fields:

    Sometimes people may have described themselves as ill, but do not meet the

    actual case definition, so include an additional Case definition met? Boolean

    field as well.

    Another key set of data to record for those who have suffered symptoms is their

    onset date/time, and the duration of symptoms. Go back up the questionnaire, and

    add a couple of extra lines after Were you ill? but before the list of symptoms.

    Type Onset date: and then insert a general date field. On the next line type Onset

    time: and then insert a numeric field with 2 digits before and 2 digits after the

    decimal point (##.##). EpiData records time-related information in this numeric,

    with the digits before the point representing hours and those after minutes. The 24-

    hour clock is used. Then add another field for duration of symptoms 2 digits in

    size, intended to be measured in days, and a similar 3 digit field for incubation

    period, this time intended to be measured in hours.

    In a full outbreak we would probably also include further questions about whether

    the person was hospitalised, whether specimens had been submitted, and so on,

    together with details of any other household contacts, and maybe other data to

    indicate severity of symptoms, but for the purposes of this exercise well skip these


    The final major part of the questionnaire is the recording of relevant exposures.

    Comparison of the rates of illness in those exposed and not exposed will enable us

    to assess which exposures are most likely to be implicated in the outbreak. For the

    purpose of this exercise, well assume that the OCT has decided to focus attention

    on the foods consumed at the buffet. In a real life situation, it may be more

    appropriate to retain an open mind and include other potentially relevant exposures

    that may explain some or all of the illness.

    NB avoid use of the ampersand & symbol in questionnaires, since it tends to

    cause unexpected display results.

    Were you ill?



    Abdominal pain




    Other aches

    Other symptoms (with a separate text field for description)

  • Creating a Questionnaire using EpiData


    Add a list of Boolean fields for the relevant food items the table from the outbreak

    scenario is reproduced below.

    Again, in a real situation we might add additional information on quantity eaten or

    portion size to investigate the possibility of a dose-response relationship, but for this

    exercise we will keep the exposures simple yes/no answers. Finally, add a general

    comments text field of around 50 characters to capture any other relevant

    information (e.g. perhaps interview was carried out with parent, relative, interpreter


    The only remaining items to add to the questionnaire are some basic administration

    fields. We need to have some way of identifying each record as unique, and EpiData

    provides a specific field type for this purpose. Its quite useful for this to be easily

    seen, so at the very top of the questionnaire, type Record no.: and select the

    Other tab on the Field pick list. Select the Auto ID number field type and click

    Insert. EpiData inserts an code, which will include an automatically

    incrementing number for each new record added. It is also useful to record the date

    of interview and the name of the interviewer (initials usually sufficient), so add

    appropriate date and uppercase fields for this purpose.

    We can also add another date field, the date of entry into EpiData. Again EpiData

    can automatically insert this for us select the Date tab on the Field pick list and

    select the code from the right-hand list.

    The basic questionnaire is now complete. At the moment it isnt particularly easy to

    read and this may make it more difficult for interviewers to complete the

    questionnaire, and harder for data entry staff to accurately and quickly enter the

    results. EpiData includes an Align fields option to help address this problem. Place

    the cursor in one of the longer questions/labels, such as Corned beef sandwiches,

    or Did you attend the members club? Then select Edit > Align FieldsEdit > Align FieldsEdit > Align FieldsEdit > Align Fields. EpiData

    will realign each line of the questionnaire so that the fields appear in a column

    making it easier for both interviewers and data entry staff.

    Save the questionnaire before we proceed any further.

    sausage rolls chicken rolls salmon sandwiches

    pickled onions ham sandwiches egg rolls

    corned beef sandwiches ham rolls egg sandwiches

    chicken nuggets chicken sandwiches cheese & biscuits

    crisps gateaux pasties

  • Creating a Questionnaire using EpiData


    Advanced questionnaire design

    In this section well cover some of the techniques and functions provided by

    EpiData to help save time on data entry, and to ensure that accurate and reliable

    data is entered.

    Closer control over fieldnames

    To start with, lets look at how our questionnaire looks in data entry mode. Select the

    Preview Data Form button to display the data form. Use the TTTTabababab key to cycle through

    the fields in the questionnaire. Note that for each field, the fieldname appears in

    the status bar at the bottom left of the screen, and next to it information on the type

    of data that can be entered (e.g. Alpha: all entries allowed, Date (dmy): 0-

    9 and / allowed, Boolean: Y,1,N,0 allowed etc.).

    As you cycle through the fields, note the fieldnames that EpiData has automatically

    assigned to each field. In the majority of cases, they make perfect sense, but there

    are a few where the fieldname doesnt intuitively indicate what the contents of the

    field are. This can be particularly important where data analysis is being undertaken

    by someone who wasnt involved in the original drafting of the questionnaire (quite

    conceivable in a large outbreak with several partner organisations) the last thing

    that they need is to be unsure what a relevant item of data actually means.

    Fortunately, EpiData allows questionnaire designers greater control over fieldname

    selection where necessary.

    The default fieldname selected in each case is up to 10 letters long, based on the

    text that appears immediately to the left of the field but ignoring common words

    such as did or the. As an example, the fieldname for Did you attend the

    funeral? is YOUATTENDF, for Were you ill? WEREYOUILL, and for Pickled

    onions - PICKLEDONI.

    For these and some other fields, we would like to tailor the fieldname to make it a

    bit more meaningful. The chief way of doing this is by the use of braces { }, also

    known as curly brackets. When automatically selecting fieldnames, EpiData uses

    text enclosed in braces in preference to normal text. If the question is {my} first

    {field} then the field name will be MYFIELD. Braces offer a powerful method of

    defining meaningful field names.

    Lets look at a simple example the Were you ill? question. Although the

    fieldname does make sense, it doesnt really need to be that long simply the term

    ILL would be enough. By putting braces around the word ill in the questionnaire, we

    force EpiData to call the field ILL. Modify the text in the questionnaire so that it

    looks like this:

    Were you {ill}?

    Now click Preview Data Form and put the cursor into the Were you ill? field. You

    can see from the status bar in the bottom corner that this fieldname is now simply

    ILL. Once youve satisfied yourself of this, close the preview (CTRL F4).

  • Creating a Questionnaire using EpiData


    This is a fairly simple example, but the EpiData capabilities are more sophisticated

    than that. EpiData can pull text from more than one set of braces together to create

    a fieldname. As another example, consider the Did you attend the funeral/members

    club questions. Presently these have fieldnames of YOUATTENDF and YOUATTENDM

    respectively not terribly meaningful. But by changing the text in the questionnaire

    as follows:

    Did you {attend} the {fun}eral? Did you {attend} the members {club}?

    we produce fieldnames of ATTENDFUN and ATTENDCLUB, which are far more

    intuitive. Check for yourself by clicking Preview Data Form. Notice also that the

    braces do not appear on the entry form (and wont appear on a printout either), so it

    doesnt affect the ease of use for interviewers and data entry staff.

    Go through the table below to update the questions as indicated to generate more

    meaningful fieldnames:

    Question Current fieldname Modification New fieldname

    Date of birth DATEBIRTH {D}ate {o}f {b]irth DOB

    Abdominal pain ABDOMINALP {Abdom}inal {pain} ABDOMPAIN

    Case definition met? CASEDEFINI {Case def}inition {met}? CASEDEFMET

    Sausage rolls SAUSAGEROL {Saus}age {rolls} SAUSROLLS

    Pickled onions PICKLEDONI Pickled {onions} ONIONS

    Chicken nuggets CHICKENNUG Chicken {nuggets} NUGGETS

    Chicken rolls CHICKENROL {Chick}en {rolls} CHICKROLLS

    Chicken sandwiches CHICKENSAN {Chick}en {sand}wiches CHICKSAND

    Click Preview Data Form to confirm the changes that have been made. Once youre

    finished, close the preview and save your modified questionnaire.

    Controlling data entry and skipping questions

    For some fields it can be useful to place restrictions on the range of data that can

    be entered for example the Gender field can only have three sensible values

    (male, female, unknown) and it also makes sense to limit the Onset time field to the

    valid times represented in the 24 hour clock. There also some fields that can be

    filled through calculation for example, age at time of interview, incubation period,

    perhaps even case definition in some circumstances which can help with data

    accuracy and consistency. Finally data entry can be significantly quicker by using

    skips so that the data entry operative doesnt have to cycle through irrelevant

    fields (such as symptom fields for an interviewee who wasnt ill).

    These functions are all achieved by what EpiData calls checks. Checks are usually

    added once a data file has been created based on the layout in a questionnaire.

  • Creating a Questionnaire using EpiData


    One of the things well do is add a simple calculation to work out a persons age in

    years at the time of the interview. Before we create the data file, add a new numeric

    field of 2 digits to hold the calculated age. Place it below the date of birth question.

    Now our questionnaire has all the fields we need, so we can create the data file that

    EpiData will actually store the records in once they are entered. Make sure you have

    saved the most recent changes to the questionnaire, then click MakeMakeMakeMake Data File > Data File > Data File > Data File >

    Make Data FileMake Data FileMake Data FileMake Data File to display the following dialog:

    The .QES file is the file holding the questionnaire details (.QES is the extension that

    EpiData uses for all questionnaire files). The currently active file should be

    displayed in this box. The data file will be created according to the details shown in

    the lower box, and the default setting is the same name as the questionnaire file,

    but with a .REC extension, which is the extension used by EpiData for data record

    files. If the settings in the dialog look appropriate, click OK.

    Youll then be presented with another dialog asking you to give a label to the data


    Give the data file an appropriate label (e.g. codename or incident number of the


    You should then get a message saying that the data file has been created. EpiData

    has also closed the original questionnaire, so we now have a blank screen. Click the

    ChecksChecksChecksChecks button, which will then ask you to open a data file select the file we just

    created above.

    EpiData will then open up the new data form unsurprisingly this has a similar

    appearance to that of the Preview Data Form. However, we are currently in

    Add/revise checks mode, so it isnt possible to enter data. Youll also notice that

  • Creating a Questionnaire using EpiData


    the Check file dialog has appeared. The status bar of the dialog shows the name of

    the check file that check details will be stored in. This will have the same name as

    the REC file, but with a .CHK extension.

    The dialog shows the current field that checks

    may be added to. Some of the basic check

    settings are then shown in the lower part of the

    dialog. Well briefly summarise all these options

    before looking at some specific examples.

    Range,Range,Range,Range, Legal Legal Legal Legal allows you to restrict the range of

    values that can be entered in a particular field. A

    range is defined by typing the minimum value and

    the maximum value separated by a hyphen.

    Typing 2-5 defines that only the numbers 2,3,4 or

    5 can be entered in the current field. If only a

    maximum value is wanted then use -INF (minus

    infinity) as the minimum value. If only a minimum

    value is wanted then use INF (infinity) as the

    maximum value. Typing -INF-5 defines all numbers less than or equal to 5 as legal

    entries in the current field. Typing 0-INF defines all positive numbers as legal

    entries. Legal values are defined by typing all the accepted values separated by

    spaces or commas. Typing 4,6,8,10 defines that only the numbers 4,6,8 or 10 can

    be entered in the current field.

    JumpsJumpsJumpsJumps are available to help data entry flow. As an example, if a person has

    indicated that they suffered illness, data entry is likely to cover the symptoms that

    they suffered. If they werent ill, it makes more sense for data entry to skip past

    those fields and onto the next section. Jumps are entered by specifying the value,

    entering a greater-than-sign (>) and specifying the name of the field to jump to. For

    example, in relation to the ILL field, entering Y>ONSETDATE,N>SAUSROLLS would set

    up the necessary jumps as suggested in the example. Well look at exactly how we

    set this up a little later (since we might also want to automatically set the Case

    Definition Met field to No if the person has not been ill). Its also possible to use

    AUTOJUMP followed by the fieldname to make the skip take place regardless of the

    value inserted in the field.

    Must enterMust enterMust enterMust enter is quite simple the Yes/No value defines whether or not a value must

    be entered for the question. There will be some fields for which this is useful (e.g.

    basic personal details like name, perhaps date of birth and address as well,

    together with key data points such as whether they were ill and/or meet the case

    definition). There may be other fields where being able to leave the field blank is

    useful for example if someone does not remember whether or not they ate Crisps,

    it is better that the field is left blank rather than assuming a No answer, which

    could distort results.

    RepeatRepeatRepeatRepeat if Yes is entered in this rule then the data entered in the previous record

    will be repeated in the next new record. Repeated data can be changed during data

    entry. This function can save a lot of typing if your forms contain data that changes

  • Creating a Questionnaire using EpiData


    only rarely in a particular batch of forms (e.g. reporting forms in a surveillance

    system). It is probably of less use in an outbreak situation.

    Value labelsValue labelsValue labelsValue labels are a set of values combined with text items that explain the meaning

    of each value. For example, a field is created to enter information on the sex of the

    informant. It is decided that a value of 1 in the field means that the informant is

    male and that a value of 2 means the informant is female. If a value label is defined

    then a translation table can be shown during data entry if the user presses [F9] (or

    the [+] key on the numeric keypad). The value labels in this example would be:

    1 Male 2 Female

    It is important not to confuse value labels with ranges/legal values although both

    place restrictions on the data that can be entered into the field. Decide what you

    want and select the appropriate option you may not want to have to go to the

    trouble of setting value labels if a simple range is all thats required, and of course

    in some situations value labels arent relevant.

    These are the basic checks that can be attached to a field through the check file

    dialog. In addition, you can click Edit to open the check file editor for the current

    field and enter check code manually. This is useful for calculating field values based

    on what has already been entered, and for more complicated checks (which are

    largely outside the remit of this training but covered in detail in the EpiData help


    Lets work through some examples.

    Select the ONSETTIME field on the form (or from the dropdown list in the check

    dialog). Then type 00.00-23.59 into the Range, LegalRange, LegalRange, LegalRange, Legal box on the dialog. This sets

    the range to those relevant to a time setting. Lets see how this actually appears in

    the check code itself click the Edit button. This brings up another screen showing

    the actual check code relevant to this field.

    We can see that the code

    starts with the fieldname, and

    then the code for the range is

    included. The word END

    indicates the end of the checks

    for this field. As well go on to

    see, one field can have several

    different types of checks in the

    code. Click Cancel to close this

    dialog without making any

    further changes.

    Now select the GENDER field. This time well create Value labels to restrict the

    options for entry (and to give a guide to data entry staff). Select the Value label

    dropdown list youll see that there are some predefined value labels, including

    one for sex. However, the predefined sex labels are based on entering a single

  • Creating a Questionnaire using EpiData


    digit number, and our gender field is an uppercase text field. So lets instead create

    our own value label list. Select the [none] option from the dropdown list, and then

    click the + button next to the list.

    The edit checks screen appears, with the following text showing:


    We then enter the legal values and relevant labels as follows:

    LABEL Label_GENDER M Male F Female U Unknown END

    The indenting of the text setting the labels is optional, but makes the code easier to

    read. Click Accept and CloseAccept and CloseAccept and CloseAccept and Close to close the window. The value label list now shows

    label_gender. This label list can be re-used for other fields if desired perhaps not

    so useful in the case of gender, but if for example you wanted to record details of

    portion size in relation to each food consumed, you could define one list of value

    labels (e.g. small, medium, large) and apply that to each portion size field.

    There are a few fields that we would like to be entered for every questionnaire for

    example, if we do not know if the person was ill or meets the case definition, it is

    difficult to draw any conclusions from any other information they have given us. So

    we need to make sure that these fields are set to must enter. Select the ILL field

    and change the Must enter option to Yes. Repeat this process for the CASEDEFMET,

    ATTENDFUN and ATTENDCLUB fields. Initially it can be tempting to set this option for

    most of the fields, but not all fields will be relevant to all interviewees (e.g.

    ONSETDATE is only relevant for those who have been ill) and the blank field option

    (indicating missing or unknown data) can be important in relation to exposures.

    So far weve covered the use of ranges, value labels and must enter checks. Next,

    lets consider the use of jumps. Previously we considered that this might be useful

    in controlling data entry flow after the Were you ill? question. Select the ILL field.

    There are two options for the contents of the field after entry Y or N. (By setting the

    Must enter property to Yes, the empty option is not available). Type the following

    into the Jumps option:


    This sets the flow so that the next field selected after a Y is entered will be


    Now we need to enter the details for the N option add a comma after the text that

    is already there, then type:


  • Creating a Questionnaire using EpiData


    Instead of typing the name of the appropriate field to jump to, you can also select it

    on the screen using the mouse do this now by clicking on the Sausage rolls

    field. EpiData automatically inserts the relevant fieldname (SAUSROLLS) into the

    Jumps option.

    Before we move on, lets look at how

    this code looks in the editor. Click the

    Edit button to bring up the check code

    editor. We can see that the JUMPS

    options are laid out line by line, and

    MUSTENTER follows it, showing how

    more than one check can be included

    in relation to one particular field. Even

    so, all this was created just by using the

    dialog box.

    Now lets look at calculating one field based on the information entered into one or

    more other fields. As an example, well calculate the age of the respondent at the

    time of the event, based on the date of the event (a value we will provide in the

    code). Since this is a calculation that will be run once the Date of Birth details have

    been entered, we actually need to put the relevant code into that field, so select the

    DOB field. We cant create the calculation using the dialog options, so instead click

    Edit to bring up the code editor.

    The first thing to do is tell EpiData that we want the commands to run after data

    entry into the DOB field has finished. We do this using the AFTER ENTRY END code

    block, as follows:


    Now we can enter the actual code to do the calculation, in between AFTER ENTRY

    and the first END command. First, we need to define the date of the event:

    DEFINE dateofevent dateofevent = 14/08/1998

    This defines a temporary variable that holds the date of the event doing things

    this way makes the final formula easier to understand. Now we add the actual

    calculation that assigns the age of the person to the AGE field.

    AGE = trunc(int(dateofevent - DOB)/365.25)

    This might seem a little complicated, but by taking it apart it is easier to understand:

    1. First we take the difference in days between the event date and DOB: dateofevent - DOB

  • Creating a Questionnaire using EpiData


    2. We convert that difference (which EpiData is still treating as a date) into an

    integer, using the int function: int(dateofevent - DOB)

    3. Convert the result in days to number of years, by dividing by 365.25: int(dateofevent - DOB ) / 365.25

    4. Its likely that the result of this calculation isnt going to be a round number, so

    we use the trunc function to round the result down to the persons age in

    years: trunc(int(dateofevent - DOB) / 365.25)

    5. Finally we assign the result of the calculation to the AGE data field: AGE = trunc(int(dateofevent - DOB) / 365.25)

    One other thing to do since we are calculating the AGE field, we dont need the

    data entry form to actually include that field, so we can skip it and go straight to the

    Gender field. We can use the Jumps section of the check dialog to do this, so

    Accept and Close the code edits that you have made for the calculation and return

    to the check dialog. Because we want to jump straight to the Gender field

    regardless of the value entered in the DOB field, we use the AUTOJUMP term, as



    Thats all the changes we need to make, so click Save and then Close on the check


    Hopefully this makes some sense, and you can follow how check code can be used

    to calculate data for a particular field. If it doesnt, or seems too complicated, dont

    worry too much. Knowing how to use the finer points of calculations and check code

    is not essential to using EpiData for outbreak investigation but it does open up

    some of the power of the program in controlling data entry and consistency, and

    saving time.

    On the other hand, if this has piqued your interest in using check code for running

    calculations and controlling data entry, much more information on how to do this

    can be found in the EpiData help files. EpiData follows largely the same check code

    rules as Epi Info 6 (the DOS version of Epi Info) so if you have access to old check

    code programs used in Epi Info 6, they may still work in EpiData (perhaps with some

    minor tweaks).

    For now, weve done enough to create a questionnaire to investigate this outbreak,

    with some basic checks and calculations in place to help data entry. In the next

    section, well look briefly at how we actually enter data into our EpiData data file.

  • Entering Data using EpiData


    Entering Data using EpiData

    From the main EpiData screen, select the Enter DataEnter DataEnter DataEnter Data button (close any open forms

    first if necessary). Youll be asked to select a data file choose the data (.REC) file

    that you created earlier.

    The data entry form that you are probably familiar with by now should appear. This

    time you can enter data for real! Note also that the status bar at the bottom of the

    screen has some additional buttons for navigating around records in the file.

    The table below includes fictional data for three sample records have a go at

    entering them into the form. Its good to get into the habit of using the EnterEnterEnterEnter and/or

    TabTabTabTab keys to move between the fields, rather than clicking with the mouse because

    check code that has been set to run before or after entry of data into a particular

    field will not be run if the mouse is used.

    As you go through the data entry, note how the check code we included earlier is

    working inserting calculated results into the AGE field, jumping fields according to

    the data that has been entered, requiring data to be entered into a particular field.

    At the end of each record, youll be asked if you want to save the record to disk

    click Yes.

    Field Record 1 Record 2 Record 3

    Interview date 20/08/1998 21/08/1998 24/08/1998

    Interviewer AGT DJG KJB

    Surname Jones Dickens Jenkins

    Forename Stephen Charles Hannah

    Date of birth 24/10/1943 12/12/1922 13/10/1992

    Gender M M F

    Occupation Teacher Retired Schoolchild

    House number 24 745

    House name Ty Gwyn

    Street name Gelligaer Street Mill Lane Newport Road

    District Cathays Lisvane Rumney

    Town Cardiff Cardiff Cardiff

    Postcode CF24 4LA

    Home tel. 029 2067 8765 029 2045 3234

    Other tel. 07796 423659

    Did you attend the


    Yes Yes Yes

    Did you attend the

    members club?

    No No No

    Were you ill? Yes No Yes

    Onset date 15/08/1998 16/08/1998

    Onset time 13.00 01.00

  • Entering Data using EpiData


    Duration (days) 3 4

    Inc. period (hrs) 22 34

    Diarrhoea Y Y

    Vomiting Y N

    Abdominal pain Y Y

    Nausea Y N

    Pyrexia N Y

    Headache N Y

    Other aches N N

    Other symptoms Y N

    Other symptoms

    description Fainted

    Case definition met? Y N Y

    Sausage rolls Y N Y

    Salmon sandwiches N Y N

    Pickled onions N Y Y

    Corned beef sandwiches N N N

    Chicken nuggets Y N Y

    Chicken rolls N Y Y

    Chicken sandwiches N Y N

    Ham sandwiches Y Y N

    Ham rolls N N N

    Egg rolls Y N Y

    Egg sandwiches Y N Y

    Pasties Y Y N

    Crisps N N Y

    Gateaux N Y Y

    Cheese & biscuits Y N N

    Comments Interview with mother

    Basic data entry is as simple as that really.

    The only other thing well dwell on here is

    navigating between existing records. You can use the additional buttons at the

    bottom of the status bar, which show the current record, total number of records, as

    well as having buttons for creating a new record and deleting the current record.

    When you delete a record, the record is just marked as deleted with the word DEL

    in the status bar (and therefore isnt included in any future analysis). However you

    can undelete the record by simply clicking the delete button again (or using the

    option in the Goto menu).

    Now that weve seen how questionnaires can be designed, and data entered using

    EpiData, lets look at how we analyse the data using the data analysis tools in Epi

    Info for Windows.

  • Outbreak Investigation using Epi Info Analysis


    Outbreak Investigation using Epi Info Analysis

    For this section of the course, well move to using Epi Info for Windows, and

    specifically the data analysis elements of the software.1

    Epi Info for Windows is based around the idea on working on projects, which are

    actually based around the Microsoft Access file format. Epi Info for Windows

    provides a full package for designing questionnaires, entering data and carrying out

    analysis, but because EpiDatas questionnaire design tools are quicker and easier

    to use, we made use of that software instead. So we need to import our EpiData

    data file (which is stored in REC format, the same format used by Epi Info 6) into a

    format that is usable in Epi Info for Windows.

    The first thing we need to do is create an Epi Info project that we can import the

    data into. We actually do this by starting the process of designing a new

    questionnaire. Using the shortcut on the desktop, or via the Start menu, open the

    main Epi Info for Windows menu screen. Click the Make View button to open the

    Make/Edit View program. The program starts with a blank screen, so create a new

    project by selecting File > NewFile > NewFile > NewFile > New. The Create or Open Project dialog appears:

    Browse to the directory containing your working files and type in an appropriate file

    name, then click Open (which will create the project file). Youll then be asked to

    name the new View that you are creating (view is the term Epi Info uses for a

    questionnaire design & data table) as we dont want to create a new

    1 Elements of this section of the training have been taken from Introduction to Epi Info for Windows by

    Andrew G Dean, available at:

  • Outbreak Investigation using Epi Info Analysis


    questionnaire, just click Cancel. Then select File > ExitFile > ExitFile > ExitFile > Exit to return to the main Epi Info


    Now we can import our data into the Epi Info Analysis program. To run the Analysis

    element, click the ANALYZE DATA button on the main menu screen. The Analysis

    program will then open.

    Lets take a quick tour of what the program shows. All the main analysis commands

    are shown in the tree view on the left. Clicking on a command will bring up a dialog

    that places the command in appropriate form in the program editor at the bottom of

    the screen. Results appear as web pages in the Output window, a simplified version

    of the Microsoft Internet Explorer browser.

    READing data into Analysis

    For this section of the training well use some real (anonymous) outbreak data that

    broadly matches the scenario weve been working with. Before we do that, though,

    well import the data we entered into the questionnaire we created earlier, just to

    confirm that it all worked! The first thing we need to do is READ that data into

    Analysis. The data is in the REC file format used by Epi Info v6 and EpiData.

    Click on the READ command. A dialog box appears so that you can choose a

    database and a view. Click the button called CHANGE PROJECT and then use the

  • Outbreak Investigation using Epi Info Analysis


    dialog that pops up to find the project file you created a moment ago. Once youve

    found the file, select it and click the Open button to return to the main READ dialog.

    Since we created a brand new project, no views or data tables appear in the list.

    Now we need to import our EpiData REC file. On the Data Formats dropdown list,

    select Epi6 (which is the same format as EpiData uses). Then, click the button with

    three dots next to the Data Source text box (which will now be empty). Select the

    REC file we created earlier from the dialog, and select Open. Youll now return to the

    READ dialog, which should show something like the screenshot below:

    Check that youve selected the right file, then click OK. Epi Info imports the data

    table into Analysis. Youll notice that some text will appear in the Analysis Output

    window detailing the current view (Epi Infos term for a data table), number of

    records and the current date and time. Youll also notice that a command appears

    in the Program Editor every time you carry out a task/function the relevant line of

    code will appear in the editor.

    The data has now been imported into the Epi Info project so if you come back a t

    afuture time to carry out more analysis, you can select the relevant view from

    within the project, rather than having to import the data again. Of course if the data

    has been updated or amended then youll want to import it again to work with the

    most recent information.

    Well carry out a couple of basic tasks with this data before we import some more

    meaningful records.

  • Outbreak Investigation using Epi Info Analysis


    LISTing basic case details

    A common task in outbreak investigation is producing a simple case listing,

    including for example name, gender, date of birth, case status, onset date, and so


    Click on LIST in the command tree. A dialog box will then appear. Initially, lets go

    with the default settings and produce a grid showing all the data, so just click OK. A

    grid then appears over the top of the output window, with scrollbars etc, allowing

    you to scroll through all the data currently selected. This is a bit overwhelming so we

    need to change our parameters a little to limit the information that appears.

    Click the LIST command again, but this time use the drop-down list at the top of the

    dialog to select the following variables:


    Note that the items appear in

    the dialog in alphabetical order

    as you select them however

    when the list is displayed the

    items will appear from left to

    right in the order you actually

    select them. We can also

    choose to have the listing

    formatted in the main output

    window, rather than the

    temporary grid we got the first

    time. To do this, select the Web (HTML) option from the Display Mode list on the

    right hand side of the dialog. When youve selected this, some extra options will

    appear the one that is most useful is Fields Per Page. Putting a zero into Fields

    Per Page ensures that all the data for one record appears on the same line

    (otherwise the table can get split after 6 columns, which can be difficult to read).

    Click OK to display the list. Epi Info displays Missing to represent blank fields (such

    as ONSETDATE for a person who did not suffer illness).

    OK, so weve seen that we can import data created in EpiData for analysis in Epi

    Info. Lets now import some real outbreak data for some more meaningful analysis.

    Click the Read command again, and repeat the process we followed before (except

    that you shouldnt need to Change Project this time). Select the OutbreakData.rec

    file. If all has gone well, the output window should show a record count of 75


  • Outbreak Investigation using Epi Info Analysis


    For the rest of this section well work with this data to carry out analysis that fits in

    with our outbreak scenario.

    Producing FREQuencies for various items of data

    Generally it is useful to know what the distribution of our study population is for

    example by age, location, gender, occupation, illness etc. These sort of descriptive

    statistics are often the starting point for analysis of a set of data. Epi Infos

    Frequencies command allows us to do this. Click the Frequencies command in the

    command tree to bring up the FREQ dialog. In the dialog box, use the Frequency of

    dropdown list to select the following variables:


    PERIOD (meaning

    incubation period)

    Then click OK. After a

    short wait, this produces

    a frequency breakdown

    for each of these data

    variables in the Output

    window. Scroll up and

    down and note that

    each table is accompanied by yellow bars to the right that indicate the frequencies.

    Epi Info also estimates 95% confidence intervals for each row in the frequency

    table, although for most variables these are likely to be of limited use (particularly in

    outbreak investigation). If you want to turn off these statistics, click the Settings

    button in the FREQ dialog, and ensure the Statistics option is set to None.

    For numeric (i.e. measured on a continuous scale) variables like PERIOD, we might

    also wish to identify some other statistics, such as mean and median values. Epi

    Infos Means command can provide this data. Click the Means command in the

    command tree to bring up the dialog select PERIOD from the Means of dropdown

    list, and click OK. This produces a similar table to the one we got with the

    Frequencies command, but at the bottom additional statistics are provided

    including number of observations, mean, median, mode, standard deviation, and


    (As an aside, putting two or more numeric variables into the MEANS dialog means

    that Epi Info also runs a test to check for the statistical significance between the

    variables. This isnt usually of use in an outbreak investigation, where we are more

    interested in the Relative Risk and/or Odds Ratio for various exposures, but can be

    of value in other types of epidemiological investigation, such as evaluation of a trial

    or intervention, comparing a particular measurement before and after the


  • Outbreak Investigation using Epi Info Analysis


    SELECTing particular groups of records

    Sometimes you dont want to run a particular analysis on the entire dataset

    perhaps you only want to know about those people who were ill. In Epi Info Analysis,

    the SELECT statement limits subsequent analysis to particular records based on

    criteria that you specify. Click the Select command to bring up the SELECT dialog.

    The Available Variables dropdown list includes all the available variables in the

    dataset. Well select only those cases that meet the case definition. Select the

    CASEDEF variable, then click the = and Yes buttons to put together the full

    statement note that Epi Info uses the notation (+) to denote Yes (it also uses (-)

    for No and (.) for Missing (blank) data. Alternatively, you can just type the

    statement directly into the dialog. Its possible to construct more complex

    conditional statements e.g. requiring more than one condition to be met using

    AND, or one of a series of conditions to be met using OR.

    Click OK when you are finished, and youll be returned to the main Analysis screen.

    The Output window has updated to show the current statement(s) that are applying,

    and the number of records now selected (note that this has dropped from 75 to 56).

    Run a FREQ CASEDEF command to confirm that only cases have been selected.

  • Outbreak Investigation using Epi Info Analysis


    If we now run another SELECT command, that will be processed only on the 56

    records that we are currently working with (this provides another way for combining

    several statements to select a particular subgroup of the data). Lets try this by

    clicking the SELECT command again. This time, well select all those aged 50 or

    over. Select AGE from the dropdown list of variables, then either type or use the

    buttons to produce the statement AGE>=50. Click OK, and see how the Output

    window has updated to reflect the second SELECT statement. Run a case LISTing of

    AGE and CASEDEF to see the selected records.

    To get back to the base dataset of all records, simply click the CANCEL SELECT

    command, and click OK in the resulting dialog.

    Youll notice that there are also options in the same section of the command tree

    for SORTing data. The dialog is self-explanatory so we wont dwell on it here like

    the SELECT command, the SORT on the data will apply to all future analysis until a

    CANCEL SORT command is issued. Before moving on, cancel any current SELECT or

    SORT commands so that you are working with the base dataset again.

    Recoding data

    We saw earlier that, as you might expect, the individuals in our dataset are spread

    across a wide range of ages. This makes it difficult to get an accurate, easily

    understood picture from Frequencies or Means results about how the population is

    distributed. We can overcome this problem by recoding the age data into a series of

    age groups.

    To do this, we first need to define a new variable that will hold the age group

    information for each case. Click the Define command to bring up the DEFINE dialog.

    Defining a Standard variable effectively creates an additional field in our dataset,

    into which we can then insert data. (Global and Permanent variables are more

    commonly used in more complex Epi Info Analysis programs if you want to know

    more about them, read the information in the Epi Info help file). Define the Variable

    Name as AGEGROUP, and click OK.

    Now that weve defined the AGEGROUP variable, we can recode the age data into that

    variable. Click the Recode command to show the RECODE dialog.

  • Outbreak Investigation using Epi Info Analysis


    We want to recode from the base data in the AGE field, so select that in the From

    dropdown list. Our target field for the recoded data is AGEGROUP, so put that into the

    To list.

    The dialog provides two ways to

    recode data: entering the options by

    hand into the grid shown, or by

    clicking the Fill Ranges button,

    which is what well do in this case.

    Youll see that the dialog changes to

    a simpler format, asking for values

    for Start, End and By. Start and End

    are self-explanatory the start and

    finish points for the recoding. The By

    values determines the size or

    interval for each group. Enter the

    range as follows: Start=10, End=90,

    By=10, then click OK. This returns

    you to the original dialog, which has

    now had the grid filled with recoding

    values based on our selections. Notice that the terms LOVALUE and HIVALUE are

    used to denote the minimum and maximum values and these rows in the grid are

    automatically created for us (which is why we set 10 as the start point, not zero).

    Click OK to process the recode.

    Now run a FREQuencies command on the AGEGROUP variable. Select the CASEDEF

    variable in the Stratify by dropdown list to produce two separate tables: one for

    cases and the other for non-cases. Using the recoded variable gives a much more

    concise description of the age distribution in the population.

    One final note: Epi Info may give an error if you try and recode with more than 10

    separate recoding options, so its best to limit the range of rows in the recoding

    grid to a maximum of 10 options.

    Using the Program editor

    Because the AGEGROUP variable we defined is only a standard variable, the data

    will be lost when we exit Epi Info Analysis. Rather than having to go through the

    whole process again (not too difficult in this case, but potentially irritating if the

    recoding groups have been more carefully hand-crafted), it would be useful to have

    a way of saving the commands that we produced so that we can issue them again

    in future. Fortunately, Epi Info Analysis allows us to do this, using the Program


    Youve probably already noticed that every time you run a command through a

    dialog, text is added to the Program Editor window. Lets look a bit more closely at it.

    Scroll through the text and youll see a range of commands, starting with the

    READing of the original data, the case LISTing, FREQuencies and MEANS operations,

  • Outbreak Investigation using Epi Info Analysis


    SELECTing and SORTing of records. At the bottom of the text will be the most recent

    commands, including DEFINE AGEGROUP and the RECODE commands.

    The Program Editor allows us to save this output into the project as an Epi Info

    Program. We could save this entire output as a program in Epi Info, and then

    running it again at another date would process all the commands listed one after

    another. Thats probably overkill, though, and would take a significant amount of

    processing time to come up with the results. More usefully, we could save the

    commands relevant to the recoding process. Go through the text in the editor and

    delete everything that appears before the DEFINE AGEGROUP statement. Then delete

    any text after the END statement at the end of the RECODE code block. The Program

    Editor should now contain text that looks something like this:

    DEFINE AGEGROUP RECODE AGE TO AGEGROUP LOVALUE - 10 = "10 - 20" 20 - 30 = ">20 - 30" 30 - 40 = ">30 - 40" 40 - 50 = ">40 - 50" 50 - 60 = ">50 - 60" 60 - 70 = ">60 - 70" 70 - 80 = ">70 - 80" 80 - 90 = ">80 - 90" 90 - HIVALUE = ">90" END

    Click the Save button in the Program Editor the Save Program dialog will appear:

    The Project File field

    contains the details of the

    current project, and can be

    left as it is. Type a name for

    your program into the

    Program field, put your

    name in the Author field

    and perhaps include a brief

    comment as well. The Date

    fields are automatically

    filled by Epi Info and are not

    editable. When youre

    happy, click OK to save the

    program. (Note that the

    Text File option also exists

    to save the program to a

    separate text file useful if you want to maintain a repository of useful programs to

    import into any Epi Info project.)

    Now lets see how this has worked. Exit the Analysis program (theres a button at

    the top left) and return to the main Epi Info menu. Click the Analyze Data button to

    reopen a new instance of Analysis. Youll need to READ in the outbreak data, so do

    that (changing project if necessary), selecting the viewOutbreakData option.

  • Outbreak Investigation using Epi Info Analysis


    Once the data has been read into Analysis, click the Open button on the Program

    Editor. The Read Program dialog will appear this looks very similar to the Save

    Program dialog. Click the Program dropdown list and select the program you stored

    earlier the author, date and comment details will then appear. Click OK to read

    the program into the Program Editor. The program hasnt yet been run, so click the

    Run button (note that the Run Command button runs only the command currently

    containing the cursor). Since all the commands in our program dont actually

    produce any output of their own, we need to run FREQ AGEGROUP to see if things

    have worked. You should get the same age group breakdown that we saw a little


    Accessing previous results & controlling file storage

    By now, you might be wondering where all this Analysis output is being stored. Epi

    Info Analysis stores output as HTML files (web pages). By default, these files are

    stored in the same folder as the project file, but there are several options available

    to customise this in the Output section towards the bottom of the command tree.

    Epi Info also provides a handy index of all the output that youve produced in work

    on the project. First, click on CLOSEOUT to close the output file youve just been

    working on, and then click on the hyperlink called RESULTS LIBRARY at the top of

    the output in the browser (you might need to scroll up). An index page appears,

    showing previous commands that have produced output files. Click on any of the

    entries to display it.

    There are a wide range of options for customising storage of data, mostly accessed

    via the Storing Output command. The most useful of these is the ability to set the

    Results Folder where output files are stored perhaps a new subfolder inside your

    main project directory. Other settings for archiving data are also available, but get

    more involved refer to the Epi Info help file for more details.

    Despite all this, there might come a time when you want to define a specific file in

    which to store a particular set of output. In the next section, well be producing

    some graphs, so now well create a file specifically to store the graph output in. This

    is does using the Routeout command, which pops up a simple dialog asking for an

    output filename.

    Click the button to bring

    up the file browser dialog

    enter a file name (like

    Graphs) and click the Open

    button, then click OK in the

    main ROUTEOUT dialog.

    Any future output will be directed to this file, until a CLOSEOUT command is issued

    (when Epi Info will start issuing output files in the default location again.)

  • Outbreak Investigation using Epi Info Analysis


    Producing simple GRAPHs

    A common task in outbreak investigation is to plot an epidemic curve showing the

    order and frequency of onset date. Epi Info Analysis has a Graph command that

    helps us do this.

    Click this command to bring up the GRAPH dialog.

    First of all, lets run a simple bar chart based on ONSETDATE. At the top left of the

    dialog is the Graph Type dropdown list. The default setting is BAR, so leave that

    alone for now. Below that, select ONSETDATE as the Main Variable for the x-axis. For

    the y-axis, we want the Count of the main variable, so this is OK as it is. Click OK.

    This displays the graph in a separate Epi Graph window, and we can see that the

    onset date information is displayed in the graph. This window allows further

    customisation of the graph if desired for now well just accept what were given

    and return to Epi Info by selecting File > Save & ExitFile > Save & ExitFile > Save & ExitFile > Save & Exit. The graph is then displayed in

    the output window, and is also saved as a separate JPEG image file in the results

    output directory.

  • Outbreak Investigation using Epi Info Analysis


    Unfortunately, this graph isnt terribly helpful because the incubation period is

    relatively short, separating the cases in hours would be more useful. Fortunately,

    the outbreak investigators in this case included a field for incubation period in their

    data collection, measured in hours. Previously we ran a MEANS command on the

    PERIOD variable, so we know that there is a wide range of values from 5 to 46 hours

    and around 30 different values in each case. Producing a Bar chart in this

    instance may not be terribly helpful, since well get the same wide range of values

    because a bar chart will produce a bar for each individual value represented in the

    data set. But because were using a numerical (continuous) field, we can run a

    Histogram graph, which gives us more control over the output. Click the Graph

    command again, and this time select HISTOGRAM from the Graph Type dropdown


    Put PERIOD into the x-axis Main Variable list. Notice also the Interval boxes below.

    This is where you can set the intervals that the Histogram bars will be grouped in

    select an appropriate value like 6 hours. The 1st Value box can also be set the

    default is Auto which will select an appropriate value. However, you can adjust this

    yourself if you want to force the x-axis to start at 0 or another fixed point which

    makes sense in this case, so set the value to zero. Accept the other settings and

    click OK.

    The way that the graph appears suggests a point source outbreak, but there is also

    a second, smaller peak at around 36 hours. How could this be explained?

    One possible explanation could be that these are secondary cases, who did not

    attend the buffet, but consumed the food in the club bar later in the evening, or

    contracted the illness through person-to-person contact with cases. We can run the

  • Outbreak Investigation using Epi Info Analysis


    graph again, and this time separate the data into two series, based on whether or

    not the case attended the funeral itself. Open the GRAPH dialog again, select

    HISTOGRAM as the graph type, PERIOD as the main variable, and an interval of 6

    hours, with a 1st value of zero. We still want a count of cases on the y-axis, but we

    also want to display bars for the series so select the variable ATTEND in the Bar

    for each value of dropdown list. It is possible to set the title, but we can do this

    at the customisation stage, so well look at that in a moment.

    Click OK to produce the graph in Epi Graph. Now we can see that there are two

    separate sets of bars for those who did and did not attend the funeral.

    Nevertheless, by customising the graph we can make it easier to see exactly whats

    going on. In Epi Graph, select

    View > Customization View > Customization View > Customization View > Customization

    The Customization dialog allows

    us to modify many of the factors

    relating to the design and

    presentation of the graph. On the

    General tab, we can add or

    modify the Title and Subtitle, set

    font sizes, whether the graph

    should appear in colour or

    monochrome, apply gridlines, and

    determine whether a data table is

    produced to go with the graph.

    Add a title and subtitle for the

    graph if you wish you can leave most of the other settings as they are unless you

    particularly want to add grid lines for the y-axis to make it easier to read across the


    Now select the Plot tab in the

    dialog. The Plot Style list in the

    centre of this tab gives a range of

    options for displaying the graph

    series. Currently the separate

    bars mean that it is hard to pick

    out the actual epidemic curves.

    We can change the display to

    make it easier to see what is

    happening select Area from the

    Plot Style list (which displays the

    series as filled areas).

    To see what difference a change

    makes, you can click the Apply button, which applies the customisations to the

    graph without closing the dialog. It might also help to make the series show in 3D,

    so select that option from the bottom left. The other tabs can be left as they are

  • Outbreak Investigation using Epi Info Analysis


    unless you would like to change the Font, Color or Styles used for the display

    these tabs are self-explanatory.

    Access to many of these settings can also be obtained by right-clicking on the graph

    and selecting from the pop-up menu options that appear.

    Once youre happy with the settings, click OK in the dialog to have a proper look at

    your graph. Its now easier to see whats been happening. When youre finished you

    can select File > Save & ExitFile > Save & ExitFile > Save & ExitFile > Save & Exit to return to the Analysis window. (You can also select

    File > ExportFile > ExportFile > ExportFile > Export to export the graph as an image to the clipboard, direct to a printer, or

    a file location of your choice).

    An example of the type of results that are achievable using Epi Graph is shown

    below (you would probably show the graph in a larger format for a full outbreak





    0 6 12 18 24 30 36 42 48

    Epidemic CurveIncubation period since buffet (hours)




    Weve finished working with the graphs now, so click the Closeout command to stop

    routing output to the Graphs file we created earlier.

    Producing 2x2 tables to assess relevance of exposures

    The chief purpose of conducting an epidemiological study in outbreak investigation

    is to try to identify statistically significant associations between certain exposures

    and illness. This is normally done through the construction of 2x2 tables,

    comparing the rates of illness in those exposed to a particular risk factor to the rate

    in those who were not exposed.

    Epi Infos Tables command allows us to produce these tables, together with

    relevant statistics. Lets investigate the relevance of exposures in our outbreak

    example. Well start by looking at consumption of ham sandwiches. Click the Tables

    command to bring up the TABLES dialog.

  • Outbreak Investigation using Epi Info Analysis


    Select the variable HAM in the Exposure Variable list. Our outcome variable for all

    this analysis will be CASEDEF, representing whether or not an individual meets the

    case definition (the alternative would be ILL, but there may be some people who

    have reported illness but do not meet the case definition). Accept all the other

    default settings, and click OK.

    The 2x2 table will be displayed in the output window, followed by a range of

    different statistics. Presentation of the 2x2 table is fairly obvious, but lets spend a

    little time considering the various statistics. The statistics produced from the above

    analysis are shown below.

    Single Table AnalysisSingle Table AnalysisSingle Table AnalysisSingle Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Odds-based Odds Ratio (cross product) 0.6771 0.2368 1.9365 (T) Odds Ratio (MLE) 0.6806 0.2286 1.9691 (M) 0.2038 2.1855 (F) PARAMETERS: Risk-based Risk Ratio (RR) 0.9065 0.6960 1.1807 (T) Risk Difference (RD%) -7.3257 -26.9187 12.2672 (T) (T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square uncorrected 0.5319 0.4658206779 Chi square - Mantel-Haenszel 0.5248 0.4688093873 Chi square - corrected (Yates) 0.2151 0.6428085146 Mid-p exact 0.2409736241 Fisher exact 0.3219542501

    In a case-control study, we would be interested in the odds ratio (cross product),

    and risk-based measures should not be used. In a cohort study, we are able to use

    risk-based parameters as the total population exposed is a known quantity so the

    Risk Ratio (also known as the Relative Risk) is the most useful basic measure of

    risk. In this case the RR is 0.9065, which suggests very little difference in outcome

    based on this exposure (strict interpretation: those eating ham sandwiches were

  • Outbreak Investigation using Epi Info Analysis


    0.9065 times as likely to be cases as those who did not). Since the 95% Confidence

    Interval includes the no difference value of 1, we know that this difference is not

    statistically significant at the 95% confidence level.

    The statistics provided also include chi-square test results in the form of p-values,

    achieved by a number of different statistical procedures. In general, where a

    reasonably large dataset/sample size has been used, there is unlikely to be an

    important difference between these procedures. However, where the differences

    between the procedures are important (e.g. one identifies a statistically significant

    result, but another does not), you should seek the advice of an epidemiologist or

    statistician to assist with interpretation of the results.

    This analysis indicates that consumption of ham sandwiches was not associated

    with illness in this outbreak. However, other foodstuffs might be implicated in the

    outbreak, and assessments of the strength of association between consumption

    and illness for each of the menu items should be completed.

    Appendix IV Appendix IV Appendix IV Appendix IV containscontainscontainscontains a a a a worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the worksheet containing an empty table for you to record the

    results of this analysis.results of this analysis.results of this analysis.results of this analysis. A completed version of the worksheet is included on the A completed version of the worksheet is included on the A completed version of the worksheet is included on the A completed version of the worksheet is included on the

    course CDcourse CDcourse CDcourse CD----ROM.ROM.ROM.ROM.

  • Using Analysis with routine COSURV data


    Using Analysis with routine COSURV data

    So far weve concentrated on the outbreak scenario, but there are also times when

    we want to carry out on routinely collected surveillance data, particularly the

    information held on Cosurv. Cosurv has the ability to export data into the REC

    format readable by Epi Info for Windows. This short section looks at how we export

    that data out of Cosurv, and some common analysis tasks that we might wish to


    NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e. NB This process will overwrite any existing/previous exports that you have done (i.e.

    the file EpiXport.REC the file EpiXport.REC the file EpiXport.REC the file EpiXport.REC read on for details) so please make sure that you ha read on for details) so please make sure that you ha read on for details) so please make sure that you ha read on for details) so please make sure that you have taken ve taken ve taken ve taken

    a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.a copy of any previous exports if the data is still required.

    Exporting data from Cosurv

    Open Cosurv, and enter your username and password. From the Cosurv main

    screen, select Export > EpiInfo & ASCIIExport > EpiInfo & ASCIIExport > EpiInfo & ASCIIExport > EpiInfo & ASCII. The Export dialog will then appear:

    Select the dates for the relevant period you want to export to say, all of 2006

    then click the Get Records button. The list of records to be exported will appear in a

    spreadsheet format in the main part of the dialog. You can use the scrollbars to

    move around the list and explore the data that will be exported. One word of

    warning: if a specific record has been selected in that dialog, and then the Export

    button is clicked, only that record will be exported (but potentially hundreds of

    times, depending on the size of the dataset). So if you have unwittingly clicked on a

    specific record, try clicking off the dialog somewhere to deselect it. This may be a

    bug in Cosurv, or it may be intended behaviour either way, it probably isnt what

    you want to happen.

  • Using Analysis with routine COSURV data


    If youre happy with the list thats shown, click the Export button. Cosurv will process

    the export you may see a DOS program window pop up briefly, as Cosurv actually

    uses one of the old Epi Info 6 DOS utilities to carry out the export. Once the export

    has been processed, youll be returned to the main Cosurv screen.

    The exported data has been placed into an Epi Info REC file named EpiXport.REC

    (NB the data has also been exported in ASCII delimited text format in a file named

    csExport.txt a format suitable for import into spreadsheet software such as Excel).

    But where is it? The location of this file depends on your current Cosurv settings.

    You can find these out from the main Cosurv screen by selecting System > Setup > System > Setup > System > Setup > System > Setup >

    DistrictDistrictDistrictDistrict, which brings up the District settings dialog.

    The final box on the General tab shows the location for Epi Info export files. When

    youve done an export, its strongly recommended that you move or rename this file

    appropriately to stop it being overwritten by future exports.

    The data that youre provided with for this training has been anonymised, with key

    patient-identifying data (names, address, telephone numbers, employer details, id

    numbers etc removed). Obviously when working with live data these will be

    available, and this is something to bear in mind since it is important that data

    protection and confidentiality guidelines are followed. The Cosurv data system itself

    is encrypted, i.e. data can only be accessed through the password-protected Cosurv

    database itself anyone trying to access the datafiles via other methods will just

    read encrypted gobbledegook. However, data exported in Epi Info (or ASCII text)

    format is not encrypted, and can be read by anyone with a text editor or

    spreadsheet. Therefore, you need to keep careful control over how exported data is

    stored and manipulated.

  • Using Analysis with routine COSURV data


    Importing data into Epi Info

    Since the data is now in the REC file format, importing the data into an Epi Info

    project follows the same process as discussed earlier in relation to the outbreak

    data. In reality, you might want to create a separate new Epi Info project for this

    routine data, but for the purposes of this training, well use the same project that

    weve been working with so far. So use the READ command in Epi Info Analysis,

    select Epi6 from the Data Formats dropdown list, and then browse to select the

    Cosurv2005.REC data file. Click OK to import the data into Epi Info Analysis.

    Common analysis tasks for routine surveillance data

    Lets briefly look at some of the common analysis tasks that we might want to

    complete with the Cosurv data. Of course, Cosurv itself has its own reporting

    mechanisms through the District Reports section, but these apply only to certain

    predetermined formats using Epi Info allows us full control over the data and its

    presentation. The data supplied is the anonymised surveillance reports held by

    Cardiff Council for the calendar year 2005.

    First of all, lets generate some basic frequencies. Click the Frequencies command

    and include the following fields in the dialog:

    SEX Gender of the case

    LOCALITY Locality (in Cardiff, we use electoral ward)

    DISEASE Disease (as used in NOIDS return)

    TRANSMISS Suspected mode of transmission

    CONTRACTED Suspected source home, commercial, unknown

    The usual list of frequency tables will appear. You may or may not be surprised by

    the results!

    It might also be interesting to study the distribution of cases by age in the

    population. Age data is stored in two ways in the Cosurv data system the DOB

    field (date of birth), and the combination of the fields AGEY (age in years) and AGEM

    (months passed of current year) so a case aged 18 months at the time of the data

    entry would have AGEY=1, AGEM=6. We saw before that age has such a wide range

    of values that it is often easier to recode age information into age groups for


    Let us reuse the program we created earlier in the outbreak analysis. Load the

    program into the Program Editor using the Open button. We need to make one

    minor modification the variable we use for age is AGEY, whereas in the program it

    refers to AGE make any necessary modifications and then run the program. Now

    run FREQ AGEGROUP to see the distribution of cases by age group (equally, we

    could use this variable to construct a bar graph, or even a population pyramid by

    stratifying by gender).

  • Using Analysis with routine COSURV data


    Well now focus a bit more on the data that specifically relates to cases of Food

    Poisoning. To do this we need to select only those cases where DISEASE=Food

    Poisoning. Click the SELECT command and pick the DISEASE variable from the

    dropdown list, then type =Food Poisoning into the dialog after the variable

    name. (Alternatively, you can use the DISCODE variable and type =22, which achieves

    the same thing and is quicker and easier to finish but of course you have to know

    that the DISCODE for Food Poisoning is 22).

    Now lets look at the various types of organism that have been associated with

    cases of food poisoning so run a Frequencies command for the ORGANISM field.

    Youll note that organisms such as Campylobacter are basically caught under one

    field, but because most isolates of Salmonella are sent to HPA Colindale for further

    typing, the full breakdown of species is available for these cases.

    As a final exercise, lets investigate seasonal variation in notification of food

    poisoning cases. Select the Graph command, and choose the LINE graph type.

    Select the WEEK field as the main variable for the x-axis (WEEK is the NOIDS week

    during which the case was notified), and leave the other default settings as they

    are, since what we are looking for is the count of cases in each week. The graph will

    be produced (and as before, you can continue to tinker with the display settings,

    titles etc). The line is quite jagged, because there is inevitably some variation from

    week to week anyway if you wanted to flatten this out a bit, you could produce a

    similar graph in HISTOGRAM type, with an INTERVAL of either 2 or 4 weeks to

    aggregate the data a bit closer (fortnightly figures are quite useful).

    What Wha