
Introduction to Epi-Info

An introduction to Epi-Info

Gavin Shaddick

Minsk, February 2000


Table of Contents

A brief introduction
Starting Epi-Info
Introduction to the Analysis program
    Browsing the data
    First look at the data
    One-way frequency tables
    Entering Commands and Variable names using Menus
    Displaying Categorical Variables
    Statistics on Continuous Variables
    Displaying Continuous Variables
    Saving variables and leaving Epi-Info
Analysis of Quantitative variables
    Introduction
    Using the Means Command
    Linear Regression
Analysis of Categorical Data
    Introduction
    Using the Tables command
Using the Statcalc program for stratified analysis
    Introduction
    Linear trend in proportions
Reading a data file into Epi-info
    Creating a questionnaire (qes) File
    Variable types
    Creating a .rec file
    Importing the data


A brief introduction

Epi-info is a multi-purpose computer package designed for use by epidemiological researchers. It contains smaller programs for Survey Design (Epiaid), Questionnaire Design and Report Writing (Eped), Data Entry (Enter), Data Checking (Check), Data Analysis (Analysis), Simple Statistics (Statcalc), and Importing and Exporting files (Import, Export). There is also a separate package for mapping (EpiMap).

The package is made available by WHO and CDC as public domain software and can be downloaded (free of charge) from http://www.cdc.gov/epo/epi/epiinfo.htm .

Starting Epi-Info

Epi-info is a DOS-based program that uses pull-down menus. It can be driven from the keyboard, although a mouse can also be used. The cursor keys (the up arrow and the down arrow) move up and down the menus so that you can see the descriptions of the programs. Note that on a colour display an alternative way of moving up and down is to press the highlighted letter for the program you require.

Here we concentrate on the Analysis program.

Introduction to the Analysis program

Position the cursor bar on Analysis and read the description on the right hand side. Press ENTER to select Analysis. The screen goes blank for a few seconds and then the Analysis screen appears. The screen is split into two – the upper window is headed Output and the lower, smaller, window is headed Commands. The cursor is on the Command window against the EPI> prompt. At the top of the screen are two lines giving the status information:

Dataset: <None>          Free memory: 262K
Use READ to choose a dataset

This indicates that we have not yet specified the name of the dataset to be analysed, and hints how to do it. It also states the amount of free memory.

To load a dataset, we use the read command. For example, if the file is called itpexamp we type

read itpexamp

The full name of an Epi-info file ends with .rec, so the actual name of the file is itpexamp.rec, but Epi-info allows the extension to be omitted.

Note that you should enter the whole path as well as the file name, for example a:\itpexamp.rec

The name of the file and the number of records appear at the top of the screen, indicating that the file has been found and read. We also see that all records have been selected (as so far we have not specified any criteria for selecting or rejecting records).


Browsing the data

You can browse the data by pressing F4. As you pass through the different columns, you can see what type of variables they contain at the top of the screen.

If you press F4, Full screen mode is selected; this shows a single record in its entirety.

Pressing F5 will start Split mode, which combines both modes: browse in the top window and Full screen in the bottom.

Note that although we entered browse by pressing F4 in the analysis mode, we could have also typed browse at the prompt.

First look at the data

When starting to look at any new set of data, one of the first steps is to check that the values of the variables are sensible and that they correspond to the codes defined in the coding schedule or other documentation about the data. For the categorical variables, we might do one-way tables to check that only the specified codes occur and to check for missing values, for example in the sex field there should only be the values 1 and 2. For continuous variables, we need to obtain summary statistics (mean, standard deviation, minimum, maximum) and to check that these are what we expect.


One-way frequency tables

We start by producing one-way tables for the categorical variables. At the prompt type

tables sex

The resulting table appears in the Output window

This shows that there are 45 males (1’s) and 35 females (2’s) together with percentages, the total number of records (80) and summary statistics (sum, mean and standard deviation). Ignore for now the Student’s t-distribution.
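The arithmetic behind a one-way table can be sketched in Python. This is only an illustration: the list of codes is rebuilt from the counts quoted above (45 ones and 35 twos) rather than read from itpexamp.rec, and the variable names are hypothetical.

```python
from collections import Counter

# Rebuild the sex codes from the counts quoted above (1 = male, 2 = female).
# In practice these values would come from the data file itself.
sex = [1] * 45 + [2] * 35

counts = Counter(sex)
total = sum(counts.values())

# Print each code with its frequency and percentage, as a one-way table does
for code in sorted(counts):
    percent = 100 * counts[code] / total
    print(f"{code}  {counts[code]:3d}  {percent:5.1f}%")
print(f"Total {total}")

# Any code other than 1 or 2 would point to a data entry error
unexpected = [code for code in counts if code not in (1, 2)]
print("Unexpected codes:", unexpected)
```

The last two lines show the kind of check described earlier: a one-way table makes codes outside the coding schedule easy to spot.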

Exercise:

Repeat the tables command for the observed ages (observeage). (Note that in many cases age would have a large number of possible values and so a frequency table might be large and unwieldy and so other commands would be used - however here we have a small range of ages and the table can be quite useful)

What is the youngest age ?

What is the average (mean, mode and median ) age ?

How many 13 year olds are there ?

How many children are 13 years or younger ?


What percentage of the children are 13 years or younger ?

Entering Commands and Variable names using Menus

We have used the tables command by typing it at the command prompt. It is also possible to enter commands by selecting them from a list of commands. Similarly, it is possible to select the variable names from a list of variables.

If, for example, we wanted to construct a table of sex, we would press F2, the Commands key, which brings up a list of possible commands. The tables command is in the General section; by highlighting it and pressing ENTER, the command is ‘pasted’ into the command line. Now press the Variables function key, F3, and a list of the variables will be shown. Highlighting sex and pressing ENTER will paste it onto the command line, which can now be entered, giving the same results as when we typed in the command by hand.

If you want to pick more than one variable in this way, as will be the case when we do two way tables, you can tag groups of variables using the plus (+) and minus (-) signs. For example, select sex and press +, and then select observeage and press +. You will see that these two variables have been tagged (marked) by a small sign; press ENTER and both of them will appear in the command line. This also works for more than 2 variables.

Displaying Categorical Variables

The distribution of each of the categorical variables can be displayed using either a bar chart or a pie chart. At the command prompt type (or select)

pie sex

A pie chart should appear on the screen, showing the percentage of males and females. A bar chart can be produced using the command bar

bar sex

Exercise:

Produce pie and bar charts for thyroid medication (THYRMEDICA).

Statistics on Continuous Variables

The command for obtaining summary statistics for continuous variables is means, for example

means height

The output is the same as for the tables command – a frequency table followed by summary statistics. Because there are so many different values for height, the table is much longer. For continuous variables a frequency table is not much use – except for checking for suspicious values. The means command (unlike the tables command) allows us to suppress the frequency table and print only the summary statistics. This is achieved as follows


means height /n

The full specification of the means command includes a grouping variable, but for now we are dealing with all the data together. At this stage we do not want to subdivide the data into groups, for example males and females separately. We need a way of forming one group with all the records in it. This is done as follows:

let groupall = 1

This creates a new variable groupall which has the value 1 for every record. Thus to group the data by groupall will cause all the records to be included in one group. If you browse the data you can see the new variable. We can now use the means command

means height groupall /n

This produces an entirely different output – no frequency table and a total of 11 statistics.

Exercise:

What are the mean and standard deviation of the WEIGHTs of the 80 children ?

What are the median and interquartile range (75th percentile – 25th percentile) of the weights of the 80 children ?

What is the range of the weights ? (minimum – maximum)
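As a reminder of what these summary statistics measure, here is a minimal Python sketch. The weights are hypothetical values invented for illustration, not the contents of itpexamp.rec, and packages interpolate percentiles in slightly different ways, so quartiles computed here need not match Epi-info's to the last decimal place.

```python
import statistics

# Hypothetical weights in kg (not the values in itpexamp.rec)
weights = [30, 32, 35, 38, 40, 41, 45, 50]

mean = statistics.mean(weights)
sd = statistics.stdev(weights)      # sample standard deviation (n - 1 divisor)
median = statistics.median(weights)

# statistics.quantiles with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1                       # 75th percentile minus 25th percentile

value_range = (min(weights), max(weights))

print(mean, sd, median, iqr, value_range)
```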

Displaying Continuous Variables

Neither bar charts nor pie charts are sensible ways of displaying continuous variables with a lot of different values. Try one of the commands on height and you will see that the result is not very useful.

Bar charts have individual separated bars and are used to display categorical variables for which the order of the categories is irrelevant. Histograms are used for continuous variables.

Usually a continuous variable is grouped before the histogram is drawn. However, if the variable has a relatively small number of distinct values a histogram can sometimes give a good representation of the distribution.

histogram observeage

Exercise:

Are the observed ages of the children approximately Normally distributed ?

Try doing a histogram of the heights of the children

histogram height


If there are a lot of different values, then the resulting histogram can be less useful. We might want to group the variable. To group the height variable we need to create a new variable, which we will call htgp, which will have a grouping interval of 10cm. To form the groups we use the let statement to divide height by 10 and assign the result to htgp. Because we want the new variable to have integer values rather than exact values with decimal places, we use the div operator (this is the way that Epi-info does integer division; the traditional / will give the exact answer)

let htgp = height div 10
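The effect of integer division on a continuous variable can be sketched in Python, where the // operator plays the role that div plays in Epi-info. The heights below are hypothetical, and the sketch assumes that div simply drops the fractional part of the quotient for positive values.

```python
# Hypothetical heights in cm (not taken from itpexamp.rec)
heights = [98.5, 104.0, 112.3, 119.9, 120.0, 137.6, 141.2]

# Integer division by 10: divide, then drop the fractional part
htgp = [int(h // 10) for h in heights]

# Each group g covers heights from g*10 up to, but not including, (g+1)*10
for h, g in zip(heights, htgp):
    print(f"{h:6.1f} cm -> group {g} ({g * 10}-{g * 10 + 9} cm)")
```

Note that 119.9 and 120.0 fall into different groups, which is why the choice of grouping interval matters when the histogram is read.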

Before looking at the histogram, see the effect of the let statement by getting a frequency distribution for htgp

tables htgp

You will see that 8 height groups have been created. Now type

histogram htgp

Exercise:

Are the heights Normally distributed ?


Repeat this process for weights. Create a new variable called wtgp again using the let command and the div operator, choosing a sensible grouping interval.

Are the weights approximately Normally distributed ?

Saving variables and leaving Epi-Info

If you have created new variables, you might want to save them for use the next time you use Epi-Info. You could re-write the original data file, but it is recommended that you save to a new file. To do this you first need to route the output and then to designate a file to which the new dataset (including both the old and new variables) will be saved. If we wanted to save our new dataset to a file called itpnew.rec we would type

route itpnew.rec

Again, it is important that you put in the full path for the file, e.g. a:\itpnew.rec

Then, to write the data to that file, type

write recfile

To leave Epi-Info, press F10 to leave Analysis and return to the main Epi-Info menu, and then press F10 again (or select Quit) to leave Epi-Info.

Analysis of Quantitative variables

Introduction

Here, we are going to use Epi-info to analyse data in the form of continuous variables, i.e. quantitative variables measured on a continuous scale. We shall use the means command to compare continuous variables classified by categorical variables. We shall also see how two continuous variables can be compared using scatter and regress.

Using the Means Command

In this section we use the ungrouped HEIGHT variable, and ask the hypothetical question of whether a child’s weight varies according to its sex and height.

One of the advantages of using statistical packages is that it is easy to examine the data visually before proceeding to formal statistical analysis. This is one way of checking whether the assumptions made in the analysis are reasonable. We can examine a scatter plot of the data

scatter sex height

Exercise:

Execute the above command. Note that the first variable is put on the x-axis and the second on the y-axis.

Guess the mean height for each sex


Mean height for sex = 1
Mean height for sex = 2

Does this suggest an association between height and sex ?

Are there any outlying observations ?

These graphs are not really suitable for presenting the data, since it is difficult to discern the distribution of height where the points are crowded together. An alternative graphical presentation to illustrate the variation in height according to sex is to use histograms.

Exercise:

Type the following:

let htgrp = height div 10
histogram htgrp

This produces a histogram of all the values of height. Epi-info allows us to use subgroups of the data with the select command.

Type:

select sex=1
histogram htgrp

to see a histogram of height for males only. Note that the result of the select command is shown at the top left of the screen as Criteria: sex=1

Exercise:

Plot a histogram for the heights of females. Is there any difference ?

What happens if we forget to type select before select sex=2 ? (remember to type select again before the next section!)

Recall that we used the means command to derive summary statistics for a variable. Remind yourself of the reason for having to create a new variable with a single value to get the means output for a single variable

let groupall = 1
means weight groupall /n

Make sure that you understand the output. Now, to calculate the statistics for each sex separately, type

means height sex /n

The first part is the same summary as you have already seen, but subdivided by each level of sex

Exercise:

Do these results compare with what you guessed when you looked at the scatter plot ?


Linear Regression

If we are examining the relationship between two continuous variables, such as height and weight, we might start by drawing a scatter diagram before proceeding to formal statistical tests.

scatter height weight

Exercise:

Does there appear to be a straight line relationship between the two variables ? If so, guess the best straight line. Now estimate the slope of the best straight line as follows:

Pick two points, A and B, towards the ends of your line (A at the bottom, B at the top). Write down the values of height and weight for each point.

At point A: height = ____ weight = ____

At point B: height = ____ weight = ____

The slope of the line is (heightB - heightA)/(weightB - weightA)

What do you calculate the slope to be ?

We can use Epi-info to perform linear regression using the regress command. To use this to perform a linear regression of height on weight type:

regress height weight

Note that the regress command requires the dependent (response) variable, which goes on the y-axis, to be given first, followed by the independent (explanatory) variable, which goes on the x-axis. We are given the correlation, together with 95% confidence limits.

Exercise:What does this correlation tell you about the relationship between height and weight ?

The program then gives us output which tests the null hypothesis that the slope of the line is equal to zero (i.e. no relationship between the two variables). The next part of the output is the estimated regression coefficients. These are estimates of the parameters α and β in the formula:

height = α + β x weight

The estimate of the parameter β is labeled as the β-coefficient for variable weight and is the slope of the line. The estimate of parameter α is labeled as the Y-intercept, i.e. the value of height when weight=0.
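To make the slope and the Y-intercept concrete, the following Python sketch computes both the rough two-point slope from the earlier exercise and the least-squares estimates. The four (weight, height) pairs are hypothetical values chosen for illustration, and the code is a plain least-squares calculation, not a reproduction of Epi-info's own routine.

```python
# Hypothetical (weight, height) pairs; not taken from itpexamp.rec
weight = [20, 30, 40, 50]
height = [112, 124, 141, 155]

# Rough estimate from two points A and B near the ends of the line:
# change in height divided by change in weight
slope_by_eye = (height[-1] - height[0]) / (weight[-1] - weight[0])

# Least-squares estimates of the slope and the Y-intercept
n = len(weight)
mean_w = sum(weight) / n
mean_h = sum(height) / n
sxy = sum((w - mean_w) * (h - mean_h) for w, h in zip(weight, height))
sxx = sum((w - mean_w) ** 2 for w in weight)
slope = sxy / sxx
intercept = mean_h - slope * mean_w


def predict(w):
    """Predicted height for a given weight from the fitted line."""
    return intercept + slope * w


print(slope_by_eye, slope, intercept, predict(35))
```

The two slope estimates are close but not identical, because the least-squares line uses every point rather than just the two end points.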

Exercise:

What is the equation of the fitted line ? How does it compare to the estimate that you previously calculated ?

Using the equation of the Epi-info fitted line, calculate the following


the predicted value of height for weight = 120
the predicted value of height for weight = 170

How do these values compare with what you would have got using your original equation ?

Note the value of the y-intercept in the absurd case that weight=0. This apparently ridiculous result arises because the relationship between height and weight is not linear over the entire range of the data, although it does look to be a reasonable approximation over the range we are examining. One of the reasons for checking the data graphically is to check whether the relationship might be linear, or whether a curve might be a better description.

Epi-info will plot the regression line on the graph for you.

scatter height weight /r

Analysis of Categorical Data

Introduction

In this session, we aim to use Epi-info for the analysis of categorical data. In particular we will construct and interpret two-way tables of categorical data, test the association between 2 categorical variables using the chi-squared test, analyse the association between two binary variables in the presence of one or more confounding variables, and test for a linear trend in proportions.

Using the Tables command

We start by asking whether THYROIDILL (whether the child has ever had thyroid disease) is associated with a child’s SEX. The variable THYROIDILL has the value 1 for yes, 2 for no, and 3 for don’t know, and sex takes values 1 for male and 2 for female. For the purposes of this teaching exercise, we will ignore the cases where thyroidill takes the value 3, as we require a binary variable, i.e. one that can only take 2 values. In order to use just that subset of the data we use the select statement

select thyroidill < 3

Since these are now two categorical variables, we can look at their associations using a two way table, using the tables command.

tables thyroidill sex

Exercise:

The program prints out the required 2x2 table. Examine the table and decide which part of the output indicates whether there is an association between the two variables.

It would be easier to see the measure of association if the table had percentages on it. We can request these using the set command

set percents=on


Now repeat the tables command

tables thyroidill sex

This time the table appears with row and column percentages in its cells. The row percentages are printed first, with an arrow beside them pointing to the denominator on the right. The column percentages are printed underneath the row percentages. However, the table has a rather muddled appearance and you have to look at it quite carefully to see what the percentages and cell counts are. It would be better if there were more space between the four cells, or if there were lines between them. This can be achieved by switching on lines with the set command.

set lines=on

We now concentrate on the statistics provided with the table. We are given, with confidence limits, the odds ratio and relative risk, together with chi-squared statistics with and without continuity correction (“Yates corrected”). There is also a “Mantel-Haenszel” chi-squared statistic, which is really for stratified analyses and can be ignored for the time being.

Exercise:

Check that you can calculate

(i) the odds ratio
(ii) the relative risk
(iii) the chi-square statistics, with and without continuity correction

What is the response variable ?
Which is the explanatory variable ?
What do you conclude about the association between the two variables ?

Note that Epi-info assumes that the response variable is the column variable in the table, the one listed second in the tables command.
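For checking hand calculations, the textbook formulas for a 2x2 table can be sketched in Python. The function name and the cell counts are hypothetical, the layout follows the convention just described (response in the columns), and these are the standard formulas rather than a transcription of Epi-info's code.

```python
def two_by_two(a, b, c, d):
    """Statistics for a 2x2 table laid out as

               response +   response -
        row 1      a            b
        row 2      c            d
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Risk of a positive response in row 1 relative to row 2
    relative_risk = (a / (a + b)) / (c / (c + d))
    # Product of the four marginal totals
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    chi2 = n * diff ** 2 / margins
    # Yates continuity correction: shrink the difference by n/2, never below 0
    chi2_yates = n * max(diff - n / 2, 0) ** 2 / margins
    return odds_ratio, relative_risk, chi2, chi2_yates


# Hypothetical counts, chosen only to show the arithmetic
print(two_by_two(20, 30, 10, 40))
```

Both chi-squared statistics have 1 degree of freedom; the corrected value is always the smaller of the two.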

Using the Statcalc program for stratified analysis

Introduction

Often when we are investigating the relationship between two variables, we want to take into account the effect of other variables that have associations with both the response and explanatory variables we are interested in. Epi-info can be used to allow for such confounding variables, using the tables command and also a separate module called statcalc.

To illustrate these methods, we use another dataset, on the use of bed nets and the presence of enlarged spleens in two villages in Africa.

The data is as follows:

                Village A                     Village B
                Spleen enlarged               Spleen enlarged
                yes        no    Total        yes        no    Total
With nets       12 (50%)   12    24           15 (22%)   52    67
Without nets    42 (59%)   29    71           4 (25%)    12    16
Total           54 (57%)   41    95           19 (23%)   64    83


                Both villages combined
                Spleen enlarged
                yes        no    Total
With nets       27 (30%)   64    91
Without nets    46 (53%)   41    87
Total           73 (41%)   105   178

A stratified analysis is necessary here, because village is a confounding factor, being related both to the response variable (enlarged spleen) and the explanatory variable (bed-net use).

We can conduct this analysis using the statcalc module of Epi-info. We start this module from the Epi-info menu (after exiting Analysis by pressing the F10 key).

You are given the choice of three options – choose the first option, Tables (2x2, 2xn). You are then faced with the traditional, “Exposure by Disease” table:

                Disease
                +      -
Exposure   +
           -

Note that, once again the disease (response variable) must be the column variable and exposure the row variable.

Exercise:

We now have to enter the data for the two villages combined, entering cell counts only and not totals, as follows

Type 27 and press ENTER. Notice that the cursor automatically goes to the next cell
Type 64 and press ENTER
Type 46 and press ENTER
Type 41 and press ENTER

If you have entered the four cell counts correctly, press F4 to request the analysis of the table. A set of statistics, similar to those produced from the tables command used earlier is given.


Exercise:

What are the values of the

(i) Relative risk ?
(ii) Yates corrected chi-squared test ?

Now we have to enter the data separately for the two villages. Press the ENTER key twice to return to the blank table.

Exercise:

Enter the data for the first table (village A), then press F4 to get the analysis. For village A, what are the values of the

(iii) Relative risk ?
(iv) Yates corrected chi-squared test ?

To enter the data for village B, press the F2 key and proceed as before. For village B, what are the values of the

(v) Relative risk ?
(vi) Yates corrected chi-squared test ?

To get the summary analysis, press the ENTER key. What are the values of

(vii) The crude RR ?
(viii) The summary RR ?


(ix) The Mantel-Haenszel summary chi-square ?

What are your conclusions about the relationship between bed-net use and presence of an enlarged spleen ?

What was the effect of controlling for the confounding variable, village ?
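The difference between a crude and a stratified relative risk can be sketched in Python using the cell counts from the bed-net tables. This sketch assumes that the summary RR is the Mantel-Haenszel weighted estimate; the handout does not state which formula Statcalc uses, so treat the code as an illustration of the idea and check the figures against the Statcalc output.

```python
# Cell counts from the bed-net tables, per village:
# (nets & enlarged, nets & not enlarged, no nets & enlarged, no nets & not enlarged)
villages = {
    "A": (12, 12, 42, 29),
    "B": (15, 52, 4, 12),
}


def relative_risk(a, b, c, d):
    """Risk of an enlarged spleen with nets relative to without nets."""
    return (a / (a + b)) / (c / (c + d))


# Crude RR: add the villages together cell by cell, ignoring village
crude_cells = [sum(cells) for cells in zip(*villages.values())]
crude_rr = relative_risk(*crude_cells)

# Mantel-Haenszel weighted RR: combine the strata without pooling them
numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in villages.values())
denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in villages.values())
summary_rr = numerator / denominator

for name, cells in villages.items():
    print("Village", name, relative_risk(*cells))
print("Crude", crude_rr, "Summary", summary_rr)
```

Comparing the crude value with the village-specific and summary values shows how much of the apparent effect of bed nets is due to the difference between the villages.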

Linear trend in proportions

Another technique in the analysis of tables is the statistical test for a linear trend in proportions, and this can also be performed within Statcalc. This test may be used in the analysis of a 2xc table (2 rows and c columns), where the column variable is an ordered categorical variable such as age-group. It provides a more sensitive test for the association between the two variables. As an example, we use the following data showing the proportions of women with early age at menarche by their triceps skinfold thickness.

                          Triceps Skinfold Group
                          Small      Intermediate   Large      Total
Age at      < 12 years    15 (9%)    29 (13%)       36 (19%)   80
menarche    12+ years     156        197            150        503
Total                     171        226            186        583

Note that there is an increasing trend in the proportion of women with early age at menarche as skinfold thickness increases.

Exercise:

The usual 2x3 chi-square test can be carried out within Statcalc by selecting the (2x2,2xn) option, as before. The table has 2 rows and 3 columns, but Epi-info requires the data to be entered as 2 columns and 3 rows. Starting with the blank table, enter the data (cell counts only, no totals) for the ‘Small’ group (from column 1 of the table), remembering to press ENTER after each number has been entered. Now enter the cell counts for the ‘Intermediate’ group (column 2 from the table above). Now just continue typing to enter a third row of numbers for the ‘Large’ group. After the third row, press F4 to get the analysis. Statistics for the table are displayed

(i) What is the value of the chi-square ?
(ii) How many degrees of freedom are there ?
(iii) What is the p-value ?

Now we will carry out a test for trend in proportions. To do this we have to assign a numerical score to each column (the skinfold group from the table), namely 1, 2 and 3.

In order to perform a chi-squared test for trend, return to the Statcalc menu by pressing F10. Notice that the third option is chi-squared test for trend .

You will be asked to enter the following:

Exposure score Cases Controls

For each column of the table above:
the exposure score is the value of the numerical score;
cases refers to the number of women with age at menarche <12 years;


controls refers to the number of women with age at menarche 12+ years;

Press ENTER after each entry. After entering the last number, press the F4 key to calculate the statistics.

Exercise:
(i) What is the value of the chi-squared test for trend?
(ii) How many degrees of freedom are there?
(iii) What is the P-value?

Note that the odds ratios are given, using the ‘Small’ category as the baseline. They increase from one category to the next, which confirms the initial observation that the proportion of women with early age at menarche increases as skinfold thickness increases.
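The odds ratios themselves are simple to reproduce by hand. As a minimal sketch in Python (the function name odds_ratios is our own), each group's odds of early menarche are divided by the odds in the ‘Small’ baseline group:

```python
# (group name, cases, controls), with the baseline group listed first.
groups = [("Small", 15, 156), ("Intermediate", 29, 197), ("Large", 36, 150)]

def odds_ratios(rows):
    """Return the odds ratio of each group relative to the first (baseline) group."""
    _, base_cases, base_controls = rows[0]
    base_odds = base_cases / base_controls
    return {name: (cases / controls) / base_odds for name, cases, controls in rows}

ratios = odds_ratios(groups)
for name, ratio in ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

The baseline group always has an odds ratio of 1 by construction.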

Exercise: Perform a similar analysis on the following data, which describe 128 children aged under 12 years who were followed up during the malaria season to record which of them experienced clinical attacks of malaria. The results by age group were:

Age-group  Number getting malaria  Total  Percentage getting malaria
1-2        19                      30
3-4        16                      24
5-6        12                      26
7-8        11                      27
9-11       7                       21

(i) Calculate the percentage of children in each age group who contracted malaria
(ii) Conduct a significance test to assess if there is any evidence of age-related variation in malarial morbidity
(iii) Carry out a test for trend in proportions
(iv) What are your conclusions?

Reading a data file into Epi-info
There are three steps to entering your own data into Epi-Info. If you have a comma delimited file of data they are:
(i) Create a questionnaire file detailing the variables within the dataset
(ii) Convert the questionnaire file to an empty .rec file ready to take the data
(iii) Read the data into the .rec file

Creating a questionnaire (qes) File
From the main menu, select the Eped program, which is a very simple word processor. We are going to create a file layout for our data and save it in a text file with a .qes file extension. This step can also be done, probably more easily, in a ‘proper’ word processor such as MS Word; just make sure the file is saved as text with a .qes file extension, and not as a Word document or with a .txt file extension.

The layout of the file will consist of names for all the variables in your data file and some information to define them. An example of a file layout is given in itpexamp.qes:

PERSONCOD ##
SEX #
OBSERVEAGE ##
THYROIDILLIS #
THYRMEDICAMIS #
THYRFAMILYIS #
IODSALTIS #
SEAPRODUCTIS #
OBSERVEGORMON #
OBSERVEIOD #
OBSERVEVITAMINE #
HEIGHT ###
WEIGHT ##
RIGHTSHIRINA #.##
RIGHTTOLSHINA #.##
RIGHTDLINA #.##
LEFTSHIRINA #.##
LEFTTOLSHINA #.##
LEFTDLIN #.##
IODURINE #.##

Next to the variable names are special characters that define the type and length of the variable. The first variable in this file is the person code (PERSONCOD), which is a number that can take up to two digits, i.e. it can be from 0 to 99. The ## characters imply that the variable is of numeric type and is made up of 2 digits.

Variable types
There are four basic variable types allowed within Epi-Info. Once the variable type is defined, Epi-Info will only allow data of that type to be entered for that variable. The variable types are as follows:

(i) Numeric variables are defined using the # character. The number of characters defines the length, so that ### defines a variable with 3 digits. A decimal point can also be included, so that ###.## can hold numbers between -99.99 and 999.99.

(ii) Text variables are defined using the _ (underline/underscore) character, with the number of characters defining the length. Alternatively, a text variable can be defined as upper case, using the ‘A’ character between less-than and greater-than signs, e.g. <A>.

(iii) Date variables are defined as a date format between less-than and greater-than signs, e.g. <DD/MM/YY> defines a variable in the form day, month, year.

(iv) Logical or yes/no variables are defined as <Y>.
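Before passing a long layout to Epi-Info it can help to check it by eye or by script. The following is a minimal sketch in Python (not part of Epi-Info) that classifies a field definition according to the four types just described. The function name describe_field is our own, and it recognises only the forms mentioned in this section.

```python
def describe_field(spec):
    """Classify one Epi-Info field definition, e.g. '##', '#.##', '____' or '<Y>'."""
    # Numeric: only # characters, with at most one decimal point.
    if spec and set(spec) <= {"#", "."} and "#" in spec and spec.count(".") <= 1:
        whole, _, fraction = spec.partition(".")
        return {"type": "numeric", "digits": len(whole), "decimals": len(fraction)}
    # Text: a run of underscores, one per character.
    if spec and set(spec) == {"_"}:
        return {"type": "text", "length": len(spec)}
    if spec == "<A>":
        return {"type": "upper-case text"}
    if spec == "<Y>":
        return {"type": "yes/no"}
    # Date: a format built from D, M, Y and / between < and >.
    inner = spec[1:-1]
    if spec.startswith("<") and spec.endswith(">") and inner and set(inner) <= set("DMY/"):
        return {"type": "date", "format": inner}
    raise ValueError(f"unrecognised field definition: {spec!r}")

# A few of the definitions from itpexamp.qes.
for name, spec in [("PERSONCOD", "##"), ("HEIGHT", "###"), ("IODURINE", "#.##")]:
    print(name, describe_field(spec))
```

A definition that does not match any of the four types raises an error, which is usually a sign of a typing mistake in the .qes file.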

Creating a .rec file
When we have created a .qes file, for example itpexamp.qes, we need to convert it to a .rec file, for example itpexamp.rec, which uses the definitions we have put in our questionnaire file.

Open the Enter program from the menu. You are asked for the name of a .rec file; since it does not exist yet, type the name you want to give it. You now want to create a new data file from a .qes file, which is choice 2; you will then be asked to enter the filename of your .qes file. You will now see the questionnaire (file layout) that you have created, with spaces highlighted for each of the variables. We need go no further within Enter, as we have now created a .rec file into which we can load our pre-prepared data.

Importing the data
From the menu select the Import program. You are presented with a screen asking for two file names and for information on the layout of the data to be read.

In the first space enter the name of the .rec file that contains the file layout. The second filename required is the name of the file that contains the data, for example itpexamp.csv which is comma separated. In the case of comma separated data, choose the ‘delim’ option for delimited (or separated by something). The other option ‘fixed’ is for when the data is in a very rigid structure, which is unlikely to be the case if the data has been extracted from a common spreadsheet or statistical package, such as Excel or S-plus.

The data will now have been loaded into the .rec file and is ready for use in Analysis.
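If the import does not go as expected, one thing worth checking outside Epi-Info is that every line of the comma separated file has one value per variable in the layout. The following is a minimal sketch in Python under that assumption. The three-variable layout and the data shown are hypothetical stand-ins for a real .qes file and its .csv file, and the function name is our own.

```python
import csv
import io

# Hypothetical layout and data, standing in for a .qes file and its .csv file.
layout = ["PERSONCOD", "SEX", "HEIGHT"]
csv_text = "1,1,165\n2,2,158\n3,1\n"

def rows_with_wrong_field_count(text, n_fields):
    """Return the 1-based line numbers whose number of fields differs from n_fields."""
    reader = csv.reader(io.StringIO(text))
    return [line_no for line_no, row in enumerate(reader, start=1)
            if len(row) != n_fields]

bad_lines = rows_with_wrong_field_count(csv_text, len(layout))
print("Lines to check:", bad_lines)
```

For a real file, the text would be read from disk instead of being written inline.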
