Andrew Ho Harvard Graduate School of Education Tuesday, January 22, 2013 S-052 Shopping – Applied Data Analysis

Page 1

Andrew Ho, Harvard Graduate School of Education

Tuesday, January 22, 2013

S-052 Shopping – Applied Data Analysis

Page 2

Disciplined Perception: Experts vs. Novices

Page 3

What You've Learned

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

A single outcome variable
• Continuous, interval scaled (noncategorical)
• May be transformed to meet the regression assumptions of independent and identically normally distributed residuals centered on 0

A single predictor variable…
• May be transformed to achieve linearity
• May be dichotomous or polychotomous

Multiple predictor variables

Interactions: Products of predictors

Quadratic/Polynomial Regression for nonlinear relationships
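To make the regression ideas on this slide concrete, here is a minimal Python sketch, assuming NumPy is available. It fits a multiple regression by ordinary least squares with a continuous predictor, a dichotomous predictor, their interaction (a product of predictors), and a quadratic term, then checks that the residuals are centered on 0. The simulated data, the coefficient values, and the helper names `design_matrix` and `fit_ols` are hypothetical and for illustration only; the course itself works in Stata.

```python
import numpy as np


def design_matrix(x1, x2):
    """Hypothetical helper: intercept, two predictors, their product, and x1 squared."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2])


def fit_ols(X, y):
    """Hypothetical helper: ordinary least squares fit of y = X @ beta + residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, residuals


rng = np.random.default_rng(52)
n = 200
x1 = rng.normal(size=n)                            # continuous predictor
x2 = rng.binomial(1, 0.5, size=n).astype(float)    # dichotomous predictor
# Simulated (made-up) data-generating process, for illustration only
y = 1.0 + 0.5 * x1 + 2.0 * x2 - 0.8 * x1 * x2 + 0.3 * x1 ** 2 + rng.normal(size=n)

X = design_matrix(x1, x2)
beta, residuals = fit_ols(X, y)
print("estimates (intercept, x1, x2, x1*x2, x1^2):", np.round(beta, 2))
print("mean residual:", residuals.mean())  # essentially 0 because X includes an intercept
```

Including the intercept column is what forces the OLS residuals to average to zero; whether they are also independent and normally distributed still has to be checked with separate diagnostics such as residual plots or a normality test.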

Page 4

What you will learn: The S-052 Roadmap

Multiple Regression Analysis: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical data points.

Are the data longitudinal? Use individual growth modeling.

If your residuals are not independent, replace OLS by GLS regression analysis.
• Specify a multilevel model.

If time is a predictor, you need discrete-time survival analysis…

If your outcome is categorical, you need to use…
• Discriminant analysis
• Multinomial logistic regression analysis (polychotomous outcome)
• Binomial logistic regression analysis (dichotomous outcome)

If you have more predictors than you can deal with,
• Create taxonomies of fitted models and compare them.
• Conduct a principal components analysis.
• Form composites of the indicators of any common construct.
• Use cluster analysis.

If your outcome vs. predictor relationship is non-linear,
• Transform the outcome or predictor.
• Use non-linear regression analysis.

Page 5

What you will learn: The S-052 Roadmap, 8 Units

1. Taxonomies of Regression Models
2. Nonlinear Regression
3. Nonindependent Residuals
4. Logistic Regression
5. Discrete-Time Survival Analysis
6. Forming Composites
7. Cluster Analysis
8. Factor Analysis

Page 6

Disciplined Perception: Gender in Math Instruction

http://www.edweek.org/ew/articles/2013/01/16/17gender.h32.html
http://ftp.iza.org/dp6453.pdf

Page 7

Disciplined Perception: Massive Open Online Courses

Page 8

The Flow of S-052. Two steps forward. One step back.

Scared!

This sounds familiar!

Logistic regression isn't so ba-

Ack, Discrete Time what now?

Whoa, fixed and random effects?

Clustering… seems intuitive

Principal components?!

Final project

Page 9

How you'll spend your time in S-052, Part I: What we'll do in class

Each unit has a three-part structure:

I. Research Questions and Data Sets
• What predicts attrition in massive open online courses?
• Do teacher qualifications have a particularly strong impact when female teachers teach girls?
• What characteristics set Academy Award-winning actors and movies apart from their competition?

II. Delve into the new statistical content that the RQs (and the unit) demand
• What aspect of the model do we need to learn more about?
• How do we represent this aspect of the model algebraically and graphically?
• What assumptions are we making, and how do we evaluate whether they make sense?

III. Interpreting and presenting results
• How do we interpret computer output?
• What conclusions can we draw, and what conclusions don't necessarily follow?
• How do we write up our results: in words, graphs, tables, PowerPoints?
• How do we communicate results to both technical and non-technical audiences?

Lectures with your questions: Active participation is encouraged, time permitting.

Note-taking: On laptops (in laptop zones at the edges or the back of the lecture hall) or on printouts of the handouts.

Please be courteous: No cellphones, email, web surfing, IM, texting, or other electronic distractions during class.

Page 10

How you'll spend your time in S-052, Part II: What you'll do outside of class

Assignments
• Six homework assignments, each consisting of one or more datasets and questions that guide you through a complete analysis (1/2 of your grade). Submitting in pairs is mandatory for all assignments!
• One final exam, completed individually, will give you a chance to review all the material in a comprehensive series of analyses (1/2 of your grade).

Individual and group work
• Our strong emphasis on collaboration reflects our philosophy that learning statistics is like learning a language: it must be spoken actively and in a participatory context.
• It also reflects the realities of today's team-driven statistical practice.
• Work in study groups as you like, but write and submit homework as pairs.
• The final exam must be completed individually.

Weekly Sections
• All students will have a "homeroom" section and TF on Tuesday, Wednesday, or Thursday afternoon, to be scheduled via a Doodle poll.
• Sections both reinforce and supplement lecture content, with Stata labs, additional examples, and opportunities for questions.
• Attendance is not mandatory but is strongly, strongly encouraged.

Course website: http://isites.harvard.edu/icb/icb.do?keyword=k92522
Instructor office hours: http://andrew-ho-office-hours.wikispaces.com

Page 11

Six things you should do before the first class meeting, next Tuesday

1. Make sure you have the prerequisites
• A solid regression class (S-030, S-040, or equivalent)
• Experience fitting regression models with statistical software (Stata or other)

2. Register for the course: http://www.gse.harvard.edu/about/administration/registration/cross_registration.html
• Note that GSAS, HBS, HLS, HMD, HSDM, GSD, HDS, and HSPH students must fill out a new online cross-registration form.

3. Read the School's policy on plagiarism
• All written work submitted is to be in your own words or those of your partner.

4. Familiarize yourself with the S-052 website
• Bookmark the site: http://isites.harvard.edu/icb/icb.do?keyword=k92522
• Read the syllabus; it includes many more details and represents our learning contract.

5. Decide how you want to access Stata
• Visit the LTC on Gutman 3.
• Google "HGSE ordering Stata".
• Think about whether it makes sense for you to purchase a Stata license.

6. Get used to accessing the handouts before class
• I'll be posting the 1st handout to the website before class next week.
• You don't have to read it, but you may find it helpful to bring it.

Hope to see you next Tuesday, 10AM, in Larsen G08!