TIPS FOR SURVEY DATA ANALYSIS
Presented By: Dr. Michael Kaylen
University of Missouri
INTRODUCTION
• SURVEY DATA ANALYSIS INVOLVES TRANSFORMING SURVEY DATA INTO INFORMATION.
DATA -> INFORMATION
INFORMATION: NUMBER OF TRAVELERS IN MO, BY STATE OF ORIGIN AND MONTH.
STATE      MONTH_1    MONTH_2    MONTH_3      TOTAL
MO       1,411,300  1,408,444    663,828  3,483,571
IL         498,092    369,995     95,497    963,584
KS         261,961    331,999    104,022    697,982
AR         148,049     85,801     80,671    314,521
CA         181,330     42,411     75,195    298,936
OK         162,171     75,132     42,820    280,123
TX          60,200    107,057     74,255    241,511
OTHER      726,492    739,200    500,286  1,965,978
TOTAL    3,449,595  3,160,039  1,636,573  8,246,206
INTRODUCTION
• DATA -> INFORMATION
• TIPS FOCUS ON
  - EXCEL PIVOT TABLES
  - WEIGHTED DATA
• APPLICATION TO HOUSEHOLD PANEL DATA
• MONTHLY SURVEYS OF HOUSEHOLDS
• 3 LEVELS OF DATA
  - HOUSEHOLD (DEMOGRAPHICS)
  - TRIP (# TRAVELING, STATES VISITED, ETC.)
  - STATE (# NIGHTS BY LODGING TYPE, EXPENDITURES, ETC.)
• SIMULATED DATA
HOUSEHOLD PANEL DATA
• HOUSEHOLD LEVEL DATA (HOUSE! - 54,824 OBSERVATIONS)
  - HOUSEHOLD ID
  - MONTH
  - # TRIPS
  - ORIGIN STATE
  - HOUSEHOLD INCOME RANGE
  - TWO WEIGHTS
• TRIP LEVEL DATA (TRIP! - 21,144 OBSERVATIONS)
  - HOUSEHOLD LEVEL DATA
  - # HOUSEHOLD MEMBERS ON TRIP
  - PRIMARY TRIP PURPOSE
  - PRIMARY TRANSPORTATION MODE
  - (0/1) CODE FOR EACH STATE
  - THREE WEIGHTS
• STATE LEVEL DATA (STATE! - 23,225 OBSERVATIONS)
  - HOUSEHOLD AND TRIP LEVEL DATA
  - DETAILED STATE
  - # NIGHTS BY LODGING TYPE
  - EXPENDITURES BY CATEGORY
  - (0/1) CODE FOR ACTIVITIES
  - THREE WEIGHTS
EXCEL PIVOT TABLES
ANALYZE DATA USING 3 OPERATIONS:
1. GROUP DATA INTO CATEGORIES
• EX. - CREATE A PIVOTTABLE
PUT CURSOR ANYWHERE IN DATA TABLE, WORKSHEET HOUSE.
CLICK ON INSERT TAB
CLICK ON PIVOT TABLE ICON
CLICK OK
To Group: Drag Fields to Row/Column Labels
Cross-tab using both Row and Column Labels
EXCEL PIVOT TABLES
ANALYZE DATA USING 3 OPERATIONS:
1. GROUP DATA INTO CATEGORIES
2. SUMMARIZE DATA USING CALCULATIONS
• COUNT, SUM, AVERAGE, MAXIMUM, MINIMUM, STANDARD DEVIATION
• EX.- LOOK AT NUMBER OF HOUSEHOLDS IN SAMPLE, BY STATE OF ORIGIN AND MONTH.
Change the type of calculation by clicking on the drop-down menu
CLICK ON “VALUE FIELD SETTINGS”
Click on Count, then OK
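For readers working outside Excel, the same group-and-count operation can be sketched in Python. This is only an illustration: the rows below are made-up stand-ins for the HOUSE worksheet, and the tuple layout (household ID, origin state, month) is an assumption, not the actual column order.

```python
from collections import Counter

# Hypothetical household rows: (household_id, origin_state, month).
households = [
    (1, "MO", 1),
    (2, "MO", 1),
    (3, "IL", 1),
    (4, "MO", 2),
    (5, "KS", 2),
    (6, "IL", 2),
    (7, "IL", 2),
]

# Group by (state, month) and count rows, like a PivotTable with state as
# Row Labels, month as Column Labels, and Count as the calculation.
counts = Counter((state, month) for _, state, month in households)

states = sorted({state for _, state, _ in households})
months = sorted({month for _, _, month in households})

# Print one line per state, one count per month.
for state in states:
    cells = [str(counts[(state, month)]) for month in months]
    print(state, *cells)
```

A state and month combination with no sampled households shows up as 0, which makes thin cells easy to spot.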
EXCEL PIVOT TABLES
ANALYZE DATA USING 3 OPERATIONS:
1. GROUP DATA INTO CATEGORIES
2. SUMMARIZE DATA USING CALCULATIONS
3. FILTER RESULTS
• CAN BE USED TO VIEW A SUBSET OF RESULTS
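Filtering can be pictured as keeping only the rows that meet a condition before summarizing. The sketch below uses invented rows and hypothetical field names (`state`, `month`, `trips`); it is not the PivotTable's internal logic, only an equivalent calculation.

```python
# Hypothetical household rows with made-up values.
rows = [
    {"state": "MO", "month": 1, "trips": 2},
    {"state": "IL", "month": 1, "trips": 1},
    {"state": "MO", "month": 2, "trips": 3},
    {"state": "KS", "month": 2, "trips": 0},
    {"state": "MO", "month": 2, "trips": 1},
]

# Filter: view only households whose origin state is MO.
mo_rows = [row for row in rows if row["state"] == "MO"]

# Summarize the filtered subset: sum of trips by month.
trips_by_month = {}
for row in mo_rows:
    trips_by_month[row["month"]] = trips_by_month.get(row["month"], 0) + row["trips"]

print(trips_by_month)
```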
WEIGHTED DATA
• WEIGHTS ARE USED TO PROJECT SAMPLE DATA TO A POPULATION
EX. – A HOUSEHOLD WEIGHT OF 10,000 MEANS THAT PARTICULAR HOUSEHOLD “REPRESENTS” 10,000 HOUSEHOLDS IN THE POPULATION
WEIGHTED DATA
• THE DESIGN WEIGHT OF A SAMPLE ELEMENT IS THE INVERSE OF ITS INCLUSION PROBABILITY
EX. – IF 20,000 HOUSEHOLDS ARE CHOSEN BY SIMPLE RANDOM SAMPLING FROM 100,000,000 HOUSEHOLDS, THE INCLUSION PROBABILITY IS 20,000/100,000,000 AND THE DESIGN WEIGHT IS 100,000,000/20,000 = 5,000
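The arithmetic in this example can be checked with a few lines of Python. The function name `design_weight` is made up for this sketch, and it assumes simple random sampling, where every household has the same inclusion probability n/N.

```python
def design_weight(sample_size: int, population_size: int) -> float:
    """Design weight under simple random sampling.

    The inclusion probability is sample_size / population_size, so its
    inverse is population_size / sample_size.
    """
    return population_size / sample_size


weight = design_weight(sample_size=20_000, population_size=100_000_000)
print(weight)
```

Multiplying the weight by the sample size recovers the population size, which is a quick sanity check on any set of design weights.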
WEIGHTED DATA
• CALIBRATION WEIGHTS - COMPUTED USING DATA ON AUXILIARY VARIABLES (E.G., DEMOGRAPHICS)
• “BALANCE” SAMPLE DATA.
EX. – IF STUDYING TRAVEL TO MO AND THE SAMPLE UNDER-REPRESENTS NEIGHBORING STATES, CALIBRATION INCREASES THE WEIGHTS OF HOUSEHOLDS FROM THOSE STATES.
WEIGHTED DATA
CALCULATIONS WITH WEIGHTS
• To estimate population totals:
$$\hat{X} = \sum_{i=1}^{n} w_i X_i$$
where $w_i$ is the weight and $X_i$ the value for observation $i$.
Ex. – To estimate the total number of household trips, create a new variable:
WT_HH * HH_Trips
PivotTable: Estimated Number of Household Trips, by Month
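A rough Python equivalent of this PivotTable is shown below. The rows are invented, and the tuple layout (month, WT_HH, HH_Trips) is an assumption made for the sketch; only the product WT_HH * HH_Trips comes from the slide.

```python
from collections import defaultdict

# Hypothetical household rows: (month, WT_HH, HH_Trips).
house = [
    (1, 5000.0, 2),
    (1, 4000.0, 0),
    (2, 6000.0, 1),
    (2, 5000.0, 3),
]

# New variable WT_HH * HH_Trips, summed within each month.
est_trips = defaultdict(float)
for month, wt_hh, hh_trips in house:
    est_trips[month] += wt_hh * hh_trips

for month in sorted(est_trips):
    print(month, est_trips[month])
```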
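WEIGHTED DATA
CALCULATIONS WITH WEIGHTS
• To estimate population totals:
$$\hat{X} = \sum_{i=1}^{n} w_i X_i$$
• To estimate population averages:
$$\bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
where $w_i$ is the weight and $X_i$ the value for observation $i$.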
PivotTable: Including Sum of Household Weights, by Month
Calculation of Avg. Number of Trips per Household
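The average calculation divides the weighted sum of trips by the sum of weights within each month. The sketch below uses invented rows with an assumed layout (month, WT_HH, HH_Trips).

```python
from collections import defaultdict

# Hypothetical household rows: (month, WT_HH, HH_Trips).
house = [
    (1, 5000.0, 2),
    (1, 5000.0, 0),
    (2, 6000.0, 1),
    (2, 4000.0, 3),
]

weighted_trips = defaultdict(float)  # sum of WT_HH * HH_Trips by month
weight_sums = defaultdict(float)     # sum of WT_HH by month
for month, wt_hh, hh_trips in house:
    weighted_trips[month] += wt_hh * hh_trips
    weight_sums[month] += wt_hh

# Weighted average = weighted total / sum of weights.
avg_trips = {m: weighted_trips[m] / weight_sums[m] for m in weight_sums}
print(avg_trips)
```

In this made-up data the weighted average for month 2 is 1.8 trips per household, while the unweighted average of the same two rows would be 2.0, which shows why the sum of weights, not the row count, belongs in the denominator.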
POTPOURRI
• Monitor sum of weights over all observations, by strata.
  - Weight totals should reflect population numbers.
• Monitor number of observations, by strata (e.g., month, state).
  - Sample size is critical to accuracy.
POTPOURRI
• Be careful when projecting to a population other than the one the sample was designed for.
Ex. 1 – Sampled Households, but interested in Household Trips (e.g., What percent of all household trips included travel in MO?).
POTPOURRI
- TRIP! contains detailed data on trips, each row (observation) corresponding to one trip.
- Already used data in HOUSE! to estimate 138,511,079 household trips taken during 3 months.
- Problem: household weights over all trips in TRIP! sum to only 124,116,209
POTPOURRI
Why the discrepancy?
Sampled households could only provide details for up to 3 trips, regardless of the number of trips actually taken.
Solution: create a new weight
$$\text{WT\_HHTrip} = \begin{cases} \text{WT\_HH}, & \text{if Trips} \le 3 \\ \text{WT\_HH} \times \text{Trips} / 3, & \text{if Trips} > 3 \end{cases}$$
Calculation of WT_HHTrip
PivotTable showing Sum of WT_HHTrip, grouped by TR_VisitMO
About 2.9% of all HH trips included MO.
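The trip-level weight can be sketched in Python as follows. This assumes the rule is WT_HH for households with 3 or fewer trips and WT_HH * Trips / 3 otherwise, so that the up-to-3 detailed trips stand in for all trips taken. The rows and the function name `wt_hhtrip` are invented for illustration.

```python
def wt_hhtrip(wt_hh: float, trips: int) -> float:
    """Trip weight: scale up when only 3 of the household's trips were detailed."""
    if trips <= 3:
        return wt_hh
    return wt_hh * trips / 3


# Hypothetical TRIP rows: (WT_HH, total trips taken by household, TR_VisitMO).
# Household A (weight 1000) took 6 trips but reported 3 of them.
# Household B (weight 2000) took 2 trips and reported both.
trip_rows = [
    (1000.0, 6, 1),
    (1000.0, 6, 0),
    (1000.0, 6, 0),
    (2000.0, 2, 0),
    (2000.0, 2, 0),
]

total = sum(wt_hhtrip(wt, trips) for wt, trips, _ in trip_rows)
mo_total = sum(wt_hhtrip(wt, trips) for wt, trips, visit_mo in trip_rows if visit_mo == 1)
share_mo = mo_total / total

print(total, mo_total, share_mo)
```

Here the new weights sum to 10,000, the same as the household-level estimate 1000 * 6 + 2000 * 2, which is the consistency the adjustment is meant to restore.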
POTPOURRI
• Be careful when projecting to a population other than the one the sample was designed for.
Ex. 1 – Sampled Households, but interested in Household Trips.
Ex. 2 – Sampled Households, but interested in Travelers (e.g., What percent of all travelers visited MO?).
POTPOURRI
- The original data set contains two numbers of potential interest for each detailed trip: the number of people in the travel party and the number of household members in the travel party.
- Problem: which numbers to use?
POTPOURRI
Solution: Since the sampling design was based on households, not travel parties, use the number of household members in the travel party.
WT_PersTrip = WT_HHTrip * TR_HHMemTot
PivotTable showing Sum of WT_PersTrip, grouped by TR_VisitMO
About 2.9% of all travelers visited MO.
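A small Python sketch of the traveler-level calculation is given below. The formula WT_PersTrip = WT_HHTrip * TR_HHMemTot is from the slide; the rows and their layout (WT_HHTrip, TR_HHMemTot, TR_VisitMO) are invented for illustration.

```python
# Hypothetical trip rows: (WT_HHTrip, TR_HHMemTot, TR_VisitMO).
trip_rows = [
    (2000.0, 2, 1),
    (2000.0, 1, 0),
    (4000.0, 3, 0),
]

# Person-trip weight: trip weight times household members on the trip.
pers_weights = [(wt * members, visit_mo) for wt, members, visit_mo in trip_rows]

total_travelers = sum(w for w, _ in pers_weights)
mo_travelers = sum(w for w, visit_mo in pers_weights if visit_mo == 1)
share_mo = mo_travelers / total_travelers

print(total_travelers, mo_travelers, share_mo)
```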
Thank You!
Questions, Comments?