Post on 03-Jan-2016
Organizing and Displaying Epidemiologic Data
with Tables and Graphs
Displaying Data
Learning Objectives
– Discuss the difference between tables and graphs for written reports versus oral presentations
– Create and interpret one- and two-variable tables
– Create and interpret a line graph
– Create and interpret an epidemic curve
– Create and interpret one- and two-variable bar charts
– Describe when to use each type of table, graph, and chart
Can you summarize the age and sex of the case-patients at a glance?
Case No.  Date of Onset  Age  Sex
1         21 Nov         9    M
2         21 Nov         39   M
3         22 Nov         29   F
4         21 Nov         10   M
5         22 Nov         55   F
6         22 Nov         11   M
Can you summarize the age and sex of the case-patients at a glance?
Case No. Age Sex
1 9 M
2 39 M
3 29 F
4 10 M
5 55 F
6 11 M
7 9 M
8 7 F
9 17 M
10 10 M
Case No. Age Sex
11 10 M
12 6 M
13 9 M
14 40 M
15 40 F
16 10 M
17 11 M
18 43 F
19 71 F
20 9 F
Case No. Age Sex
21 38 F
22 34 F
23 9 M
24 10 M
25 6 F
26 11 M
27 9 M
28 41 M
29 6 M
30 11 M
Case No. Age Sex
31 10 M
32 31 F
33 8 F
34 9 M
35 10 F
36 11 M
37 38 M
38 11 M
39 7 M
40 16 F
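One way to answer the question above is to tally the line listing by sex and by age group. The following is a minimal Python sketch: the 40 age/sex pairs are copied from the listing, while the three age bands (<10, 10-19, >=20) and all variable names are illustrative assumptions rather than part of the original slides.

```python
from collections import Counter

# Age and sex for cases 1-40, in case-number order, copied from the line listing.
RAW = ("9M 39M 29F 10M 55F 11M 9M 7F 17M 10M "
       "10M 6M 9M 40M 40F 10M 11M 43F 71F 9F "
       "38F 34F 9M 10M 6F 11M 9M 41M 6M 11M "
       "10M 31F 8F 9M 10F 11M 38M 11M 7M 16F")
cases = [(int(rec[:-1]), rec[-1]) for rec in RAW.split()]

def age_group(age):
    """Assign an age to one of three illustrative, mutually exclusive bands."""
    if age < 10:
        return "<10"
    if age < 20:
        return "10-19"
    return ">=20"

# One-variable frequency tables: one for sex, one for age band.
sex_counts = Counter(sex for _, sex in cases)
age_counts = Counter(age_group(age) for age, _ in cases)

print("Sex      # Cases")
for sex in ("M", "F"):
    print(f"{sex:<8} {sex_counts[sex]}")
print("Age      # Cases")
for band in ("<10", "10-19", ">=20"):
    print(f"{band:<8} {age_counts[band]}")
```

With these bands, the tally shows 26 males and 14 females, and 28 of the 40 case-patients younger than 20 years, which is hard to see from the raw listing.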
Basic Methods for Organizing and Presenting Data
– Data can be organized through creation of:
  – Tables
  – Graphs
  – Charts
Why organize and present data?
– To summarize when the data set has too many records to look at individually
– To become familiar with the data before analysis, and to catch errors
– To look for (and display):
  – Patterns
  – Trends
  – Relationships
  – Exceptions / outliers
– To communicate findings to others
Written vs. Oral Presentation
– Written: time unlimited; details OK; white, grey and black
– Oral: time < 1 min; less detail; colors possible
How to organize data
– Identify what data you have
– Use tables and graphs to summarize; catch errors; identify patterns, relationships
– Decide how best to summarize the data to communicate the findings
– Use tables and graphs to communicate the findings effectively
Tables
– Data are arranged in rows and columns
– Quantitative information
– Usually presents frequency of occurrence of some event or characteristic in different subgroups
Type of injury by sex, Port-au-Prince field hospital, Haiti, January 13 – May 28, 2010
          Earthquake-related injury   Other injury   Total
Male      74                          259            333
Female    85                          151            236
Unknown   3                           9              12
Total     162                         419            581
CDC. Post-earthquake injuries treated at a field hospital - Haiti, 2010. MMWR 59:1673-1677.
Elements labeled on the slide:
– Descriptive title (what, where, when)
– Clear, concise labels
– Column, row, cell
– Row totals and column totals
– "Unknown" category, if needed
– Footnote, source
Types of Tables
– 1-variable table (frequency distribution)
  – Range of values of a single variable
  – Number of observations with each value
– 2-variable table
  – Counts shown according to 2 variables at once
– 3-variable table
  – Counts shown according to 3 variables at once
– Composite (combination) tables
Example of 1-Variable Table: Tuberculosis Cases by Sex, U.S., 2009
Sex       # Cases
Males     6,990
Females   4,544
Unknown   11
Total     11,545
Table 1. Number of Reported Cases of Tuberculosis, by Sex, United States, 2009
CDC. Reported Tuberculosis in the U.S., 2009. Atlanta: CDC, October 2010.
Example of 1-Variable Table: Tuberculosis Cases by Age, U.S., 2009
Age Group (years)   # Cases
< 5                 401
5 – 14              245
15 – 24             1,274
25 – 44             3,893
45 – 64             3,434
≥ 65                2,292
Unknown             6
Total               11,545
Table 2. Number of Reported Cases of Tuberculosis, by Age, United States, 2009
CDC. Reported Tuberculosis in the U.S., 2009. Atlanta: CDC, October 2010.
Example of 1-Variable Table, with Percent Column
Age Group (years)   # Cases   Percent
< 5                 401       3.5%
5 – 14              245       2.1%
15 – 24             1,274     11.0%
25 – 44             3,893     33.7%
45 – 64             3,434     29.7%
≥ 65                2,292     19.9%
Unknown             6         0.1%
Total               11,545    100.0%
Table 2. Number of Reported Cases of Tuberculosis, by Age, United States, 2009
CDC. Reported Tuberculosis in the U.S., 2009. Atlanta: CDC, October 2010.
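The percent column is each count divided by the table total. A short Python sketch of that arithmetic follows; the counts are taken from Table 2, and the plain-ASCII group labels and variable names are stand-ins chosen for illustration.

```python
# Counts by age group from Table 2; the dictionary keys are ASCII stand-ins
# for the age-group labels printed on the slide.
counts = {
    "<5": 401, "5-14": 245, "15-24": 1274, "25-44": 3893,
    "45-64": 3434, ">=65": 2292, "Unknown": 6,
}

total = sum(counts.values())

# Percent of all reported cases in each age group, rounded to one decimal.
percents = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(f"{'Age group':<10} {'# Cases':>8} {'Percent':>8}")
for group, n in counts.items():
    print(f"{group:<10} {n:>8,} {percents[group]:>7.1f}%")
print(f"{'Total':<10} {total:>8,} {100:>7.1f}%")
```

Because each percentage is rounded separately, the rounded values will not always sum to exactly 100.0; the total row is therefore printed as 100.0% directly rather than summed from the rounded column.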
Creating Categories
– Mutually exclusive, all inclusive
– Choices:
  – Standard categories for the disease
  – Equal intervals
  – Equal numbers within each group
– Include category for unknown values
– When analyzing data, begin with more categories, then collapse into a smaller number of categories for presentation
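The points above can be sketched in Python. This is only an illustration: the age bands follow the tuberculosis tables in this deck, the sample ages are hypothetical, and the coarser grouping used for "collapsing" is an assumption made for the example.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of each band, paired with its label. Every non-negative age
# falls in exactly one band (mutually exclusive, all inclusive).
LOWER_BOUNDS = [0, 5, 15, 25, 45, 65]
LABELS = ["<5", "5-14", "15-24", "25-44", "45-64", ">=65"]

# Hypothetical collapse of the detailed bands into fewer bands for presentation.
COLLAPSE = {
    "<5": "<15", "5-14": "<15",
    "15-24": "15-44", "25-44": "15-44",
    "45-64": ">=45", ">=65": ">=45",
    "Unknown": "Unknown",
}

def categorize(age):
    """Return the detailed age band, or 'Unknown' for missing or invalid ages."""
    if age is None or age < 0:
        return "Unknown"
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]

# Hypothetical ages, including boundary values and one missing value.
ages = [3, 5, 14, 15, 44, 45, 64, 65, 80, None]

detailed = Counter(categorize(a) for a in ages)
collapsed = Counter(COLLAPSE[categorize(a)] for a in ages)

print("Detailed: ", dict(detailed))
print("Collapsed:", dict(collapsed))
```

Keeping the detailed bands during analysis and collapsing only for presentation means no information is lost if a different grouping turns out to be more informative.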
Some Standard Categories in U.S.
– Notifiable diseases: < 1 year, 1–4, 5–9, 10–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, ≥ 60, Not stated, Total
– P&I mortality: < 28 days, 28 d – 1 yr, 1–14, 15–24, 25–44, 45–64, 65–74, 75–84, ≥ 85, Unknown, Total
– NCHS mortality: < 1 year, 1–4, 5–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, ≥ 85, Not stated, Total
– HIV/AIDS: < 5 years, 5–12, 13–14, 15–19, 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59, 60–64, ≥ 65, Total
Two-Variable Tables
– Shows counts according to two variables simultaneously
– Also called “cross-tab” or contingency tables
Example of Two-variable Table
Age Group   Females   Males   Unk   Total
< 5         187       214     0     401
5 – 14      119       126     0     245
15 – 24     559       713     2     1,274
25 – 44     1,641     2,247   5     3,893
45 – 64     1,153     2,278   3     3,434
≥ 65        882       1,409   1     2,292
Unknown     3         3       0     6
Total       4,544     6,990   11    11,545
Table 3. Number of Reported Cases of Tuberculosis, by Age and Sex, United States, 2009
CDC. Reported Tuberculosis in the U.S., 2009. Atlanta: CDC, October 2010.
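Row and column totals in a two-variable table can be recomputed from the cells, which is also a quick way to catch transcription errors. The sketch below uses the cell counts from the table above; the ASCII age-group keys and variable names are illustrative.

```python
# Cell counts (Females, Males, Unknown sex) by age group, from the table above.
cells = {
    "<5":      (187, 214, 0),
    "5-14":    (119, 126, 0),
    "15-24":   (559, 713, 2),
    "25-44":   (1641, 2247, 5),
    "45-64":   (1153, 2278, 3),
    ">=65":    (882, 1409, 1),
    "Unknown": (3, 3, 0),
}

# Row totals: sum across sex within each age group.
row_totals = {group: sum(values) for group, values in cells.items()}

# Column totals: sum across age groups within each sex.
col_totals = tuple(sum(column) for column in zip(*cells.values()))

grand_total = sum(row_totals.values())

print("Row totals:   ", row_totals)
print("Column totals:", dict(zip(("Females", "Males", "Unk"), col_totals)))
print("Grand total:  ", grand_total)
```

The recomputed column totals are 4,544 females, 6,990 males, and 11 of unknown sex, matching the one-variable table by sex (Table 1), and both margins add up to 11,545.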
Example of Two-by-Two Table
Drank from stream near Campsite 6?
        Ill   Well   Total
Yes     18    4      22
No      5     39     44
Total   23    43     66
Example of Two-by-Two Table
Drank from stream near Campsite 6?
        Ill   Well   Total   Attack Rate (%)
Yes     18    4      22      81.8%
No      5     39     44      11.4%
Total   23    43     66
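The attack rate in each row is the number ill divided by the row total (ill plus well), expressed as a percentage. A minimal Python sketch using the counts from the two-by-two table; the function and variable names are illustrative.

```python
# (ill, well) counts by exposure, from the two-by-two table.
exposure = {"Yes": (18, 4), "No": (5, 39)}

def attack_rate(ill, well):
    """Percent of people in the exposure group who became ill."""
    return 100 * ill / (ill + well)

rates = {group: attack_rate(ill, well) for group, (ill, well) in exposure.items()}

for group, (ill, well) in exposure.items():
    print(f"Drank from stream: {group:<3}  ill={ill:>2}  total={ill + well:>2}  "
          f"attack rate={rates[group]:.1f}%")
```

This reproduces the 81.8% and 11.4% shown on the slide: 18 of 22 among those who drank from the stream, and 5 of 44 among those who did not.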
Example of Three-variable Table
              Females            Males
Age group     U.S.     Other     U.S.     Other     Total
< 5           167      20        183      31        401
5–14          82       37        72       54        245*
15–24         178      377       207      499       1,274*
25–44         411      1,215     635      1,591     3,893*
45–64         463      669       1,172    1,080     3,434*
65+           365      509       631      761       2,292*
Total         1,667*   2,829*    2,900*   4,019*    11,545*
* Totals include cases with missing age, sex, or birth country
Table 3. Number of Reported Cases of Tuberculosis, by Age, Sex, and Birth Country, United States, 2009
CDC. Reported Tuberculosis in the U.S., 2009. Atlanta: CDC, October 2010.
Composite (Combination) Tables
– Combines two or more 1-way or 2-way tables
– Uses limited space efficiently
– Well suited for written and oral presentations, but simple tables must be prepared first
Composite Table Example
Ortiz, Katz, Mahmoud, et al. J Infect Dis 2007;196:1685-1691
Why Tables?
– When there are too many records, summarize in a table (or graph)
– Allow you to identify, explore, understand, and present distributions, trends, relationships, variations, and exceptions in the data
– Tables serve as basis for graphs – always create a table first!
Some Tips for Creating Printed Tables
– Keep it simple
– Should be self-explanatory
– Title (what, where, when) with table number
– Label each row and column clearly and concisely
– Include units of measurement (years, mg/dl, etc.)
– Show totals for rows and columns
– Explain codes, abbreviations, symbols
– Note any exclusions in a footnote
– Note source in a footnote
Graphs
– Display quantitative data using a set of coordinates
– Rectangular graphs (x, y coordinates) most common
– x axis along bottom = method of classification, often time
– y axis along side = frequency, usually number, percent or rate
Graphs: Advantages and Disadvantages
Advantages
– Easy to understand and interpret
– Reveal patterns in data
  – Useful for generating hypotheses
  – Useful before formal data analysis
Disadvantage
– Loss of detail
Graph Types
– Arithmetic-scale line graph
– Histogram
– Many other types, not covered in this lecture
  – Semilogarithmic-scale line graph
  – Frequency polygon
  – Cumulative frequency curve
  – Survival curve
  – Scatter diagram
Arithmetic Scale Line Graph
[Figure: arithmetic-scale line graph; y-axis: # Cases]
– Intervals on x-axis are equal
– Intervals on y-axis are equal
– Start y-axis at 0; use scale breaks only if you must
– Useful to portray data collected over time
Creating a Line Graph
– Make x-axis longer than y-axis (best ratio 5:3)
– X-axis: match x-axis scale to intervals used during data collection
– Y-axis:
  – Always start y-axis with 0
  – Identify largest value, round up for maximum Y value
  – Select reasonable intervals for y-axis
– Plot data
– Create title
– Add comments, footnotes
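The y-axis steps above (start at 0, round the largest value up, pick reasonable intervals) can be sketched as a small helper. This is one possible heuristic, not a rule from the slides: the function name `nice_axis`, the target of about five intervals, and the 1-2-5 step choice are all assumptions. The example value of 481,530 is the 1963 measles case count cited later in this deck.

```python
import math

def nice_axis(max_value, target_intervals=5):
    """Return (axis_max, step) for a y-axis that starts at 0.

    The step is the smallest of 1, 2, 5, or 10 times a power of ten that
    yields at most `target_intervals` intervals; axis_max is the largest
    data value rounded up to a multiple of that step.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    raw_step = max_value / target_intervals
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            break
    axis_max = math.ceil(max_value / step) * step
    return axis_max, step

axis_max, step = nice_axis(481530)
ticks = list(range(0, axis_max + step, step))
print("y-axis maximum:", axis_max)
print("tick marks:", ticks)
```

For 481,530 this gives a maximum of 500,000 with tick marks every 100,000, so the highest point sits just below the top of the axis.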
Creating a Line Graph: X-axis and Y-axis
[Figure: empty axes, with the x-axis and y-axis labeled]
Creating a Line Graph: Complete X-axis, Label X-axis
[Figure: x-axis completed and labeled; data for years 1960–2008]
Creating a Line Graph: Complete Y-axis, Label Y-axis
[Figure: y-axis completed and labeled "Number of Cases"; annotation: 481,530 cases in 1963]
Creating a Line Graph: Plot the data
[Figure: data plotted; y-axis: Number of Cases]
Creating a Line Graph: Add Title
[Figure: line graph, Number of Reported Cases of Measles by Year, United States, 1960–2008; y-axis: Number of Cases]
Creating a Line Graph: Add Comments, Footnotes, Source
[Figure: line graph, Number of Reported Cases of Measles by Year, United States, 1960–2008; y-axis: Number of Cases; annotation: vaccine licensed]
CDC. Summary of Notifiable Diseases, U.S., 2008. Atlanta: CDC, June 2010.
Graph with Inset
[Figure: line graph with inset, Number of Reported Cases of Measles by Year, United States, 1960–2008; y-axis: Number of Cases; annotation: vaccine licensed]
CDC. Summary of Notifiable Diseases, U.S., 2008. Atlanta: CDC, June 2010.
[Figure: line graph, Age-Adjusted Death Rates for Leading Causes of Death, United States, 1987-2005; y-axis: deaths per 100,000]
Comments on Arithmetic-Scale Line Graph
– Method of choice for plotting rates over time
– X-axis almost always time (rarely, age)
– Y-axis can be counts, proportions, or rates
  – Y-axis should start with 0
  – Determine largest value of Y needed to plot
  – Round off that number and divide into intervals
– Set distance on either axis represents same quantity anywhere on that axis
– Good for comparing 2 or more sets of data
Histogram
– “Epidemic curve” in outbreak investigations
– Frequency distribution of quantitative data
– x axis continuous, usually time (onset or diagnosis date)
– No spaces between adjacent columns, i.e., adjacent columns “touch”
– Easiest to interpret with equal class (x) intervals
– Column height proportional to number of observations in that interval
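The data behind an epidemic curve are simply counts of cases per equal time interval, with empty intervals kept so that the x-axis stays continuous. The Python sketch below uses one-day intervals and a text rendering; the onset dates are hypothetical and the function name is an assumption made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical onset dates for seven cases (not taken from any outbreak in this deck).
onsets = [
    date(2024, 3, 1),
    date(2024, 3, 2), date(2024, 3, 2), date(2024, 3, 2),
    date(2024, 3, 3), date(2024, 3, 3),
    date(2024, 3, 5),
]

def epidemic_curve(onset_dates):
    """Return [(day, count), ...] for every day from first to last onset.

    Days with no cases are included with a count of 0 so that the
    intervals on the x-axis remain equal and continuous.
    """
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    rows = []
    while day <= last:
        rows.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return rows

curve = epidemic_curve(onsets)
for day, n in curve:
    # One "X" per case; column height is proportional to the count.
    print(f"{day:%d %b}  {'X' * n}")
```

Note that 4 March appears with no cases rather than being skipped; dropping it would distort the shape of the curve.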
Histogram
[Figure: epidemic curve, Number of Cases of Salmonella Enteritidis by Date of Onset, Chicago, February 2000. X-axis: date and time of symptom onset, Feb. 13–21. Each box represents one case; the date of the party is marked; there are no spaces between adjacent columns.]
[Figure: epidemic curve, Number of Cases of Salmonella Enteritidis by Date of Onset, Chicago, February 2000. X-axis: date and time of symptom onset, Feb. 13–21. Legend: probable case, culture-confirmed case; the date of the party is marked.]
Charts
– Display quantitative data using only one coordinate
– Most appropriate for comparing data with discrete categories
– Common types include:
  – Bar charts
  – Pie charts
  – Maps
  – Other
Bar Charts
– Can be vertical or horizontal
– Use for variable with discrete, non-linear categories, such as county
– Has space between “columns”, since categories are not continuous
– 4 types: simple, grouped, stacked, 100%
– Best type depends on desired emphasis
[Figure: simple bar chart, Reported TB Cases by Race/Ethnicity, United States, 2001]
[Figure: horizontal bar chart, HCV Prevalence by Selected Groups, United States. Axis: average percent anti-HCV positive. Groups shown: hemophilia, injecting drug users, surgeons, hemodialysis, general population adults, military personnel, STD clients, pregnant women.]
[Figure: grouped bar chart, Number of Reported Tuberculosis Cases by Birth Country and Year, U.S., 1991-2007; y-axis: Number of Cases]
[Figure: stacked bar chart, Number of Reported Tuberculosis Cases by Birth Country and Year, U.S., 1991-2007; y-axis: Number of Cases]
100% Component Bar Chart
– All bars same height (100%)
– Components shown as proportions of the total, not actual values
– Good for comparing how components contribute to the whole within a group
– Not useful for comparing relative sizes of the components across different groups because the denominator changes
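To prepare data for a 100% component bar chart, each group's counts are converted to percentages of that group's own total. The sketch below does this for two age groups from the tuberculosis-by-age-and-sex table earlier in this deck; the choice of those two groups and the variable names are illustrative assumptions.

```python
# Counts by sex for two age groups, from the two-variable TB table (2009).
groups = {
    "15-24": {"Females": 559, "Males": 713, "Unknown": 2},
    "45-64": {"Females": 1153, "Males": 2278, "Unknown": 3},
}

def to_percent(components):
    """Convert one group's counts to percentages of that group's total."""
    total = sum(components.values())
    return {name: 100 * n / total for name, n in components.items()}

percent = {group: to_percent(parts) for group, parts in groups.items()}

for group, parts in percent.items():
    shares = ", ".join(f"{name} {value:.1f}%" for name, value in parts.items())
    print(f"{group}: {shares}")
```

Each bar sums to 100%, so the chart shows that males make up a larger share of cases at ages 45-64 (about 66%) than at ages 15-24 (about 56%), but it hides the fact that the older group has far more cases overall because each bar has a different denominator.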
[Figure: 100% component bar chart, Number of Reported Tuberculosis Cases by Birth Country and Year, U.S., 1991-2007; y-axis: Proportion of Cases]
Pie Charts
– Show components of a whole
– Size of “slice” = proportional contribution of each component
– Hard to compare two or more pie charts
– Begin at 12 o’clock with largest slice and proceed clockwise
– Provide label and percent for each slice
– Don’t use 3-D!
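The layout rule above can be sketched as a calculation of where each slice starts and ends, measured in degrees clockwise from 12 o'clock. The percentages are those shown on the Reported TB Cases by Race/Ethnicity pie chart in this deck; ordering all remaining slices from largest to smallest is an assumption, since the slide only says to begin with the largest.

```python
# Slice percentages as printed on the TB-by-race/ethnicity pie chart (2001).
slices = {
    "Hispanic": 25,
    "Black, non-Hispanic": 30,
    "Asian/Pacific Islander": 22,
    "White, non-Hispanic": 21,
    "American Indian/Alaska Native": 1,
}

# The printed percentages sum to 99 because of rounding, so scale by the sum.
total = sum(slices.values())

# Largest slice first, then proceed clockwise in descending order (assumption).
ordered = sorted(slices.items(), key=lambda item: item[1], reverse=True)

layout = []
start = 0.0  # 0 degrees = 12 o'clock
for label, pct in ordered:
    sweep = 360 * pct / total
    layout.append((label, round(start, 1), round(start + sweep, 1)))
    start += sweep

for label, begin, end in layout:
    print(f"{label:<32} {begin:>6.1f} to {end:>6.1f} degrees  ({slices[label]}%)")
```

Each slice's angular width is proportional to its share of the whole, and the label with its percent is printed alongside, as the slide recommends.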
[Figure: pie chart, Reported TB Cases by Race/Ethnicity, United States, 2001]
– Hispanic (25%)
– Black, non-Hispanic (30%)
– Asian/Pacific Islander (22%)
– White, non-Hispanic (21%)
– American Indian/Alaska Native (1%)
Some Tips for Creating Printed Graphs
– Should be self-explanatory
– Title (what, where, when) with figure number
– Label each axis clearly and concisely
– Include units of measurement (years, mg/dl, etc.)
– In epidemiology, start Y-axis at zero
– Epidemic curve = histogram
Selecting the Right Presentation Method 1
Type of Graph or Diagram: Application
– Arithmetic scale graph: number, proportion or rate over time
– Histogram:
  1. Frequency distribution for a continuous variable
  2. Number of cases during an epidemic (epidemic curve) or over time
Selecting the Right Presentation Method 2
Type of Graph or Diagram: Application
– Simple bar chart: compare the size or frequency of different categories of the same variable
– Grouped bar chart: compare the size or frequency of different categories across 2 or more variables
– Stacked bar chart: compare totals and display component parts for 2 or more categories of second variable
– Pie chart: display parts of a whole
Question 1: What’s Wrong With This Graph?
[Figure: line graph, Reported Tuberculosis Cases, United States, 1981-2007; x-axis: Year; y-axis: No. of Cases]
Source: http://wonder.cdc.gov/tb-v2007.html
Answer 1: Misleading
[Figure: line graph, Reported Tuberculosis Cases, United States, 1981-2007; x-axis: Year; y-axis: No. of Cases]
Source: http://wonder.cdc.gov/tb-v2007.html
Question 2: What’s Wrong With This Epi Curve?
[Figure: epidemic curve, Number of Cases of Gastroenteritis, Warehouse Workers, TN, August 2003; the catered dinner is marked]
* Not counted as case
Question 3: What’s Wrong With This Graph?
[Figure: Rate* of Invasive Pneumococcal Disease by Age Group, United States, 1998]
* Rate per 100,000 population
[Figure: Rate* of Invasive Pneumococcal Disease by Age Group, U.S., 1998]
* Rate per 100,000 population
Question 4: What’s Wrong With This Table?
Age Group (years)   # Cases
< 15                15
15 – 20             351
20 – 25             842
25 – 30             895
30 – 35             1,097
35 – 40             1,367
40 – 45             1,023
45 – 55             982
55+                 284
Total               6,862
Number of Reported Cases of Syphilis (P&S) by Age, United States, 2002
Number of Reported Cases of Syphilis (P&S) by Age, United States, 2002
Age Group (years)   # Cases
< 15                15
15 – 19             351
20 – 24             842
25 – 29             895
30 – 34             1,097
35 – 39             1,367
40 – 44             1,023
45 – 54             982
≥ 55                284
Total               6,862
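The difference between the two versions of this table appears to be that the first uses overlapping age groups (for example, age 20 falls in both "15 – 20" and "20 – 25"). A check like this can be automated. The Python sketch below treats each printed range as inclusive whole years, with the youngest group taken as ages 0-14; that reading, and the function name, are assumptions made for illustration.

```python
def check_categories(bounds):
    """Report overlaps and gaps between consecutive age groups.

    `bounds` is a list of (low, high) pairs in whole years, both inclusive,
    sorted by age; high is None for an open-ended final group.
    """
    def label(low, high):
        return f"{low}+" if high is None else f"{low}-{high}"

    problems = []
    for (low1, high1), (low2, high2) in zip(bounds, bounds[1:]):
        pair = f"{label(low1, high1)} and {label(low2, high2)}"
        if high1 is None or low2 <= high1:
            problems.append(f"overlap between {pair}")
        elif low2 > high1 + 1:
            problems.append(f"gap between {pair}")
    return problems

# Age groups as printed in the first version of the table.
first_version = [(0, 14), (15, 20), (20, 25), (25, 30), (30, 35),
                 (35, 40), (40, 45), (45, 55), (55, None)]

# Age groups in the second version of the table.
second_version = [(0, 14), (15, 19), (20, 24), (25, 29), (30, 34),
                  (35, 39), (40, 44), (45, 54), (55, None)]

for name, bounds in (("first", first_version), ("second", second_version)):
    found = check_categories(bounds)
    print(f"{name} version: {len(found)} problem(s)")
    for problem in found:
        print("  ", problem)
```

Under these assumptions the first version has seven overlapping boundaries and the second has none, so the second satisfies the "mutually exclusive, all inclusive" rule from the Creating Categories slide.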
Summary
– Data can be organized through the creation of tables, graphs and charts
– The purpose of creating these visual displays is to:
  1. verify and analyze the data
  2. explore patterns and trends
  3. communicate information to others
– An effective figure should be interpretable without any additional information
Summary 2
– Tables can illustrate the number of people with particular characteristics and can provide valuable information about relationships between 2 variables
– Line graphs are useful for showing patterns or trends over some variable, usually time
– Histograms are most commonly used in epidemiology for epidemic curves (cases by time)
– Bar charts provide a visual display of data from a one-variable table, but grouped bar charts can show 2 variables
Conclusion
– Choose the tool that best serves the data and purpose
– Start with tables
– Use appropriate titles and labels
– Print ≠ PowerPoint
– KISS (message, colors, dimensions)