![Page 1: Analyzing Surveys Marcos Carzolio Associate Collaborator for LISA PhD Student Department of Statistics, VT Laboratory for Interdisciplinary Statistical.](https://reader036.fdocuments.in/reader036/viewer/2022081516/5519a543550346ce608b468a/html5/thumbnails/1.jpg)
Analyzing Surveys
Marcos Carzolio
Associate Collaborator for LISA
PhD Student, Department of Statistics, VT
Laboratory for Interdisciplinary Statistical Analysis
Outline
• Data Cleaning and Preprocessing
  • Outlier Detection
  • Missing Value Imputation
• Visualizing and Understanding Data
  • Boxplots, Histograms, and Scatterplots
  • Correlation Matrices
• Analyzing Data
  • Contingency Tables
  • Analysis of Variance (ANOVA)
  • Regression
Laboratory for Interdisciplinary Statistical Analysis
LISA helps VT researchers benefit from the use of Statistics
www.lisa.stat.vt.edu
Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)
Our goal is to improve the quality of research and the use of statistics at Virginia Tech.
How can LISA help?
• Formulate research question.
• Screen data for integrity and unusual observations.
• Implement graphical techniques to showcase the data – what is the story?
• Develop and implement an analysis plan to address research question.
• Help interpret results.
• Communicate! Help with writing the report or giving the talk.
• Identify future research directions.
Laboratory for Interdisciplinary Statistical Analysis
Collaboration: From our website, request a meeting for personalized statistical advice.
Great advice right now: Meet with LISA before collecting your data.
Short Courses: Designed to help graduate students apply statistics in their research.
Walk-In Consulting: Monday through Friday 1-3 pm in 401 Hutcheson. Also, Tuesdays 1-3 pm in ICTAS Café X and Thursdays 1-3 pm in GLC Video Conf. Room, for questions requiring <30 mins.
All services are FREE for VT researchers.
Designing Experiments • Analyzing Data • Interpreting Results • Grant Proposals • Using Software (R, SAS, JMP, Minitab...)
Some Useful Resources
• R Statistical Computing Software
  • Can be downloaded for free from: http://www.r-project.org/
  • R Studio, a free Integrated Development Environment: http://rstudio.org/
• For a more interactive and user-friendly experience, try JMP
  • Downloadable from the Virginia Tech software library: http://www2.ita.vt.edu/software/department/products/sas/jmp/index.html
• Amelia II: A Program for Missing Data
  • Visit: http://gking.harvard.edu/amelia/
Types of Survey Data
| Data Type | Description | Examples | Statistics |
|---|---|---|---|
| Nominal | Data with no intrinsic relative meaning behind labels | Strawberry, Banana, Hispanic | Mode |
| Ordinal | Data with an ordered structure | Small, Extra Large, Likert Scale* | Median and Percentiles |
| Interval (continuous or discrete) | Data with meaningful difference relations | Degrees in Celsius, Birthdates, GPS Coordinates | Mean, Standard Deviation, Correlation |
| Ratio (continuous or discrete) | Data with scale relations | Weight, Income, Length | Mean, Standard Deviation, Correlation |
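The pairing of data type and summary statistic in the table can be sketched in a few lines. This is only an illustration in Python using the standard library; the function name `summarize`, the type labels, and all response values are invented for the example and do not come from the slides.

```python
import statistics

# Hypothetical survey responses, invented for illustration only.
fruit = ["Strawberry", "Banana", "Banana", "Strawberry", "Banana"]  # nominal
likert = [1, 2, 2, 3, 4, 4, 5]   # ordinal, coded 1 (disagree) to 5 (agree)
weight_kg = [58.0, 61.5, 70.2, 75.0, 82.3]                          # ratio


def summarize(values, data_type):
    """Return the summary statistics the table suggests for a data type."""
    if data_type == "nominal":
        # Labels have no order, so only the most common label is meaningful.
        return {"mode": statistics.mode(values)}
    if data_type == "ordinal":
        # Order is meaningful, so the median and quartiles can be reported.
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return {"q1": q1, "median": median, "q3": q3}
    if data_type in ("interval", "ratio"):
        # Differences are meaningful, so mean and standard deviation apply.
        return {"mean": statistics.mean(values),
                "sd": statistics.stdev(values)}
    raise ValueError(f"unknown data type: {data_type}")


print(summarize(fruit, "nominal"))
print(summarize(likert, "ordinal"))
print(summarize(weight_kg, "ratio"))
```

Note that the quartiles are interpolated, which treats the Likert codes as numbers; whether that is acceptable for a given ordinal scale is a judgment call.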
Outlier Detection and Handling
• Outliers are data points that deviate far from the main body of data so as to arouse suspicion about their origins
• Visualize your data
  • Boxplots, histograms, and scatterplots
• Only remove outliers that are verifiable errors
• Extremeness in observations is not in itself cause for data removal
• R Package ‘outliers’
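One common way to turn "deviates far from the main body of data" into a screening rule is the boxplot convention of flagging points more than 1.5 interquartile ranges beyond the quartiles. The slides do not prescribe this rule; it is assumed here for illustration, and the function name and height values below are hypothetical. In keeping with the advice above, the sketch only flags candidates for inspection and removes nothing.

```python
import statistics


def flag_outlier_candidates(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical heights in inches; 720 looks like a data-entry slip for 72.
heights_in = [62, 64, 65, 66, 67, 68, 69, 70, 72, 720]

candidates = flag_outlier_candidates(heights_in)
print(candidates)  # [720]
```

A flagged value such as 720 should be checked against the original questionnaire before anything is changed; an extreme but genuine response stays in the data.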
Missing Value Imputation
• Imputation is the process of filling in the missing values of a dataset
• Before considering imputation, try going after respondents for their true answers
• Can be very tricky (Come to LISA for help)
• If only one or two missing values are present in a vast dataset, use the mean of available values as a “best guess”
Honaker, James et al., AMELIA II: A Program for Missing Data
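The "mean as a best guess" advice can be sketched as follows. This is a minimal Python illustration, not the Amelia II method: the function name `impute_mean`, the `max_missing` guard, and the use of `None` to mark a missing answer are all assumptions made for the example.

```python
import statistics


def impute_mean(values, max_missing=2):
    """Fill missing entries (None) with the mean of the observed values.

    Refuses to run when more than max_missing entries are missing, since
    mean imputation is only suggested for one or two gaps in a large dataset.
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    if n_missing > max_missing:
        raise ValueError(
            f"{n_missing} missing values: consider multiple imputation instead"
        )
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical responses with one missing answer.
print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```

Filling gaps with the mean shrinks the variance of the variable, which is one reason the slide limits it to a handful of missing values.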
Visualizing Your Data
Boxplots
SAS/GRAPH(R) 9.2: Statistical Graphics Procedures Guide, Second Edition
Visualizing Your Data
Histograms
Visualizing Your Data
Scatter Plots
Understanding Your Data
Correlation Matrices
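A correlation matrix holds the Pearson correlation between every pair of numeric variables. The sketch below computes one by hand in Python; the column names and values are hypothetical, and the dictionary-of-dictionaries layout is just one convenient choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of column names to their Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical numeric survey columns, invented for illustration.
survey = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
}

matrix = correlation_matrix(survey)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Per the data-type table earlier in the talk, this statistic is meant for interval or ratio variables.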
Contingency Tables
• Tabulates the number of responses in each category
• Helps to visualize the distribution of data
• Use χ² approximate test for independence

        Pearson's Chi-squared test

data:  tab
X-squared = 0.7658, df = 2, p-value = 0.6819

Warning message:
In chisq.test(tab) : Chi-squared approximation may be incorrect
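To show what lies behind the output above, here is a hand computation of Pearson's statistic in Python. The table counts are hypothetical (the slide's `tab` is not shown), and the function name is invented. As far as I know, R issues the warning when some expected cell counts fall below 5, so the sketch reports those cells as well. The p-value line uses the fact that a chi-squared variable with 2 degrees of freedom has upper tail exp(-x/2); it is valid only for df = 2.

```python
import math


def chi_square_independence(table):
    """Pearson chi-squared statistic, df, and expected counts for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical 2 x 3 table of response counts, invented for illustration.
tab = [[12, 5, 7],
       [9, 3, 5]]

stat, df, expected = chi_square_independence(tab)
small_cells = [e for row in expected for e in row if e < 5]
print(f"X-squared = {stat:.4f}, df = {df}")
print(f"{len(small_cells)} expected counts below 5")

# Closed-form p-value, valid only when df == 2.
p_value = math.exp(-stat / 2)

# Sanity check on the slide's numbers: X-squared = 0.7658 with df = 2.
p_slide = math.exp(-0.7658 / 2)
print(f"{p_slide:.4f}")  # 0.6819
```

When many expected counts are small, an exact test or merging sparse categories may be safer than relying on the approximation.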
Analysis of Variance
• Technique used to test the differences between groups
• Always plot your data before doing analyses
Call:
   aov(formula = resp_height ~ gender)

Terms:
                 gender Residuals
Sum of Squares  297.744   588.567
Deg. of Freedom       1        39
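The sums of squares in the output above are the ingredients of the F test. The sketch below, in Python, computes them for a one-way layout and then forms the F statistic implied by the slide's table. The function name and the two small groups are hypothetical; only the numbers 297.744, 588.567, 1, and 39 come from the slide.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hypothetical measurements for two groups, invented for illustration.
result = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(result)

# F statistic implied by the sums of squares shown on the slide.
f_from_slide = (297.744 / 1) / (588.567 / 39)
print(round(f_from_slide, 2))  # 19.73
```

An F value this large on 1 and 39 degrees of freedom points to a clear difference in mean height between the two gender groups, which a boxplot of the data should also show.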
Regression
• Actually a generalization of ANOVA
• Again, always plot your data
Call:
lm(formula = exercise ~ dad_height)

Residuals:
    Min      1Q  Median      3Q     Max
-5.9866 -3.4205 -0.3236  2.6709 14.0949

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -7.8573    10.7968  -0.728    0.471
dad_height    0.1938     0.1546   1.253    0.218

Residual standard error: 4.381 on 37 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.04073,   Adjusted R-squared:  0.0148
F-statistic: 1.571 on 1 and 37 DF,  p-value: 0.2179
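A simple least-squares fit can be written out by hand to show where the slope, intercept, and R-squared in the output above come from. The Python sketch below drops incomplete pairs before fitting, mirroring the "observations deleted due to missingness" note. The function names and the data are hypothetical; the last lines only re-derive the adjusted R-squared from the slide's own figures (37 residual degrees of freedom with one predictor implies 39 complete observations).

```python
def fit_line(x, y):
    """Least-squares fit of y on x; pairs with a missing value are dropped.

    Returns (slope, intercept, r_squared, n_used).
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in pairs)
    ss_tot = sum((b - mean_y) ** 2 for _, b in pairs)
    return slope, intercept, 1 - ss_res / ss_tot, n


def adjusted_r_squared(r2, n, p=1):
    """Adjust R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data with one missing predictor value.
slope, intercept, r2, n_used = fit_line([1, 2, None, 3, 4],
                                        [3, 5, 6, 7, 9])
print(slope, intercept, r2, n_used)

# Re-derive the slide's adjusted R-squared from its multiple R-squared.
adj_from_slide = adjusted_r_squared(0.04073, n=39, p=1)
print(round(adj_from_slide, 4))  # 0.0148
```

With an R-squared near 0.04 and a slope p-value of 0.218, the slide's fit gives little evidence that `dad_height` predicts `exercise`.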
Other Useful Resources
• A PowerPoint on more automated outlier detection techniques:
  • http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf
• R Package ‘outliers’:
  • http://cran.r-project.org/web/packages/outliers/outliers.pdf
• On multiple imputation:
  • http://sites.stat.psu.edu/~jls/mifaq.html#bayes