Stat 139 Lec01 - An Overview - 4 Per Page


1/25/2015


Lecture 1 Outline

• Course logistics and details

• What is Stat 139?

• A few example problems

• A quick R demonstration


Stat 139

• Prereqs:

  – AP Stat, Stat 100, 101, 102, 104 (Intro Stat); or 110 (Intro Prob).

  – Math 21a & 21b (Multivariable Calculus & Linear Algebra)

• The course material goes further than what you learned in an intro stat course, and addresses the question: what happens when assumptions are not met?


Kevin’s Contact Info 

• Kevin’s office: Science Center, Room SC-614

• Office Hours:

• Mon 11am-noon and Fri 11am-noon

• Also by appointment (via email)

• Phone numbers:

• Statistics Department: (617) 495-5496

• My office (SC-614): (617) 495-8711

• Email: [email protected] (best way to get hold of me).


Teaching Staff

Teaching Fellows (may not be complete):

 – Ryan Lee: [email protected]

 – Patrick Xu: [email protected]

• Teaching fellows will be teaching sections, holding

office hours, answering questions via email, and

grading HW’s and exams. 


Course Website

• Course website:

https://canvas.harvard.edu/courses/2421

• There you will find (eventually):

  – Syllabus

  – Administrative Announcements

  – Lecture Notes

  – R Tutorial (including download and install instructions)

  – Assigned Homeworks (HW #1)

  – Other Study Material (practice exams, web links, etc.)


Class Meetings

• Lectures:

  – Mon, Wed, & Fri, 10am-11am, SC-Hall E

• Sections:

  – Optional (but strongly recommended) weekly section to discuss homework, do extra problems, and review difficult concepts.

  – Held mostly Wed and Thurs afternoons.

  – No sections this week (they begin the week of Feb. 2).

• Look for an announcement on the course website for permanent times (OH’s too).


Lecture Notes

• Paper copies will NOT be handed out at the beginning of

lecture after this week (we will provide copies on Wed and

Fri).

• They will loosely follow the order of the text, and will reference specific sections in the text.

• Lecture notes will be posted online at least 24 hours in

advance. An email will be sent when they are posted.

• Notes are very concise; you are encouraged to add your own annotations and develop your own notes.

• Occasionally mistakes appear in lecture notes; corrected

versions will be posted after class.


Recommended Textbook (not required)

• Statistical Sleuth: A Course in Methods of Data Analysis, Ramsey & Schafer, 3rd edition. Amazon link:

www.amazon.com/Statistical-Sleuth-Course-Methods-Analysis/dp/1133490670

• Some of the homework problems will be assigned from the text, but they will always be reproduced for you on the assignment.

• From time to time, specific reading assignments may come from the text as well.

• Exams will be based directly on the lectures. Apart from the specific assigned readings, nothing from the text that was not covered in the lectures, notes, or HW’s will appear on exams.

R Software (+ RStudio)

• R will be used throughout the course, and it is required on most homework assignments (including the first).

Reasons for R:

 – Completely free software. Can be downloaded from http://cran.r-project.org/

 – Available for PC, Mac, Linux, and even iPhone and iPad!

 – Flexible stat toolkit, access to cutting-edge methods, powerful graphics capabilities, large and vibrant community, unlimited possibilities.

• RStudio helps organize/streamline the program: http://www.rstudio.com/

• Tutorials this week and early next week


R Help Guides (all found on course website)

• On the course website:

https://canvas.harvard.edu/courses/2421/files/folder/R+Guides

• R for Beginners by E. Paradis

• Using R for Data Analysis and Graphics: Introduction, Code and Commentary by J.H. Maindonald

• Simple R - Using R for Intro Statistics by J. Verzani

• The R Guide by W.J. Owen

• An Introduction to R by L.H. Lam

• Comprehensive introduction to R:

http://cran.r-project.org/doc/manuals/R-intro.pdf  


Exams

• 2 Midterms, both 10-11am (in class):

  – Wed, March 4th

  – Wed, April 10th

• Final Exam: date and time TBD (May 8-16)

• You will be allowed one reference sheet for the first midterm, 2 sheets for the second midterm, and 3 sheets for the final exam.


Homeworks

• Posted to the course website on Fridays. Due the following Friday.

• HW #1 will be posted soon, and will be due Friday, Feb. 6.

• Hard copies must be handed in to the 3rd floor HW boxes.

• We allow one late HW, no questions asked (due by the beginning of the following lecture).

• Other late homework will only be accepted with an official University excuse (either from UHS or from your resident dean’s office). NO HW scores will be dropped!


HW Collaboration

• You are encouraged to discuss homework with other students (and with the instructor and TFs, of course), but you must write your final answers yourself, in your own words.

• Solutions prepared “in committee” or by copying or paraphrasing someone else’s work are not acceptable; your handed-in assignment must represent your own thoughts. All computer output you submit must come from work that you have done yourself.

• Please indicate on your problem sets the names of the students with whom you worked.


Course Grading

Component     Weight 1   Weight 2   Weight 3
Homework        30%        30%        30%
Project         15%        15%        15%
Midterm 1       20%         5%        20%
Midterm 2        5%        20%        20%
Final Exam      30%        30%        15%
Total          100%       100%       100%

Your overall score for the course will be the highest of the 3 weighting schemes presented above. Final course letter grades are not assigned according to fixed percentages of A's, B's, etc.
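As a rough illustration of the "highest of the 3 weighting schemes" rule, the sketch below computes all three weighted scores for one student and keeps the largest. It assumes the two Midterm rows of the table refer to the first and second midterm, in that order; the component scores are made-up numbers, and the object names (weights, scores, scheme.scores) are only for illustration.

```r
# Weights for the three schemes, copied from the grading table.
# ASSUMPTION: the two "Midterm" rows are midterm 1 and midterm 2, in that order.
weights = rbind(
  scheme1 = c(hw = 0.30, project = 0.15, midterm1 = 0.20, midterm2 = 0.05, final = 0.30),
  scheme2 = c(hw = 0.30, project = 0.15, midterm1 = 0.05, midterm2 = 0.20, final = 0.30),
  scheme3 = c(hw = 0.30, project = 0.15, midterm1 = 0.20, midterm2 = 0.20, final = 0.15)
)

# Hypothetical component scores (in percent) for one student
scores = c(hw = 90, project = 85, midterm1 = 70, midterm2 = 88, final = 80)

# Weighted score under each scheme, then the best of the three
scheme.scores = drop(weights %*% scores)
overall = max(scheme.scores)

scheme.scores
overall
```

For this hypothetical student the weak first midterm is down-weighted under the second scheme, so that scheme gives the highest overall score.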


Lecture 1 Outline

• Course logistics and details

• What is Stat 139?

• A few example problems

• A quick R demonstration


Sleuthing: What’s in a word? 

• Merriam-Webster (m-w.com):

 – to act as a detective, search for information

 – to search for and discover

• Urban Dictionary:

 – To play the role of detective, to gather facts and information usually in the traditional Sherlock Holmes inconspicuous way.

 – A proper Sleuth needs to be intelligent, witty, and always a few steps ahead others. …. His wisdom is his greatest asset.


Richard D. De Veaux, Paul F. Velleman, Amstat News, Sept 2008 

Mathematics has a long history of prodigies and geniuses, with many of the most famous luminaries showing their genius at remarkably early ages…

… but why not Statistics? 


Course Goal: learn statistical judgment

1. Improve understanding of statistical reasoning and measures of uncertainty.

2. Learn to translate long computer output into a short summary of results, in scientific as well as everyday language.

3. Expand your statistical toolkit and, at the same time, deepen your understanding.

Change the way you reason about the world.


Lies, Damned Lies, and Statistics

Reasons?

• Conclusion is not supported by the method used (e.g., causation vs. association).

• Assumptions of the method are not satisfied, i.e., the model does NOT fit the data (e.g., units are not independent, relationship is nonlinear, etc.).

• Unreliable source of the data themselves, or poor data collection techniques.


Course Goal #3, Expanded: Statistical Toolkit and Understanding

a. Based on the formulated question of interest, be able to choose the appropriate statistical tool;

b. Know its assumptions and understand the consequences of violating each of them;

c. Clean the data and prepare them for analysis;

d. Fit the chosen model using R and check model fit qualitatively and quantitatively;

e. Formulate exactly what can be inferred from the results in a language common to all scientists as well as in layman's terms;

f. Understand the limitations of the model.


Lecture 1 Outline

• Course logistics and details

• What is Stat 139?

• A few example problems

• A quick R demonstration

Examples of misleading conclusions if key statistical principles are ignored

Bethany L. Peters & Edward Stringham (2006). "No Booze? You May Lose: Why Drinkers Earn More Money Than Nondrinkers."

Examining the General Social Survey, we find that self-reported drinkers earn 10-14 percent more than abstainers, which replicates results from other data sets. […] These results suggest that social drinking leads to increased social capital.

• What could possibly go wrong with this argument?

• What are the relevant statistical principles or concepts here?
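One way to see what could go wrong is a small simulation in which drinking has no effect on income at all, yet drinkers still appear to earn more. The sketch below is purely hypothetical: the confounder ("sociability"), the effect sizes, and all variable names are invented for illustration and are not taken from the General Social Survey or from the paper.

```r
set.seed(139)
n = 5000

# Hypothetical confounder that influences both drinking and income
sociability = rnorm(n)

# More sociable people are more likely to be drinkers
drinker = rbinom(n, 1, plogis(sociability))

# Income depends on sociability only: drinking has NO causal effect here
income = 50000 + 5000 * sociability + rnorm(n, sd = 10000)

# Naive comparison: drinkers vs. abstainers
naive = lm(income ~ drinker)

# Comparison that adjusts for the confounder
adjusted = lm(income ~ drinker + sociability)

coef(naive)["drinker"]     # clearly positive, despite no causal effect
coef(adjusted)["drinker"]  # close to zero once sociability is accounted for
```

In real observational data the confounders are usually not all measured, so the adjusted comparison is not available this cleanly; the point is only that an association of this kind does not by itself support a causal claim.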


Space Shuttle Challenger crash in 1986

• Was caused by a failure of the O-rings used to control the flow of fuel gases.

• On the day of the launch, the outside temperature was unusually low (31°F).

• The previous shuttles were launched at temperatures between 53°F and 81°F.

• A statistical model showed an association between cold temperatures and O-ring failures, but the evidence was not conclusive (partly due to the small sample size).

What are the relevant statistical principles or concepts here?


Subprime mortgage crisis

• In 2007, the US economy entered a mortgage crisis followed by a recession.

• A proximate cause was the rise in subprime lending.

• Many subprime loans were packaged into mortgage-backed securities (MBS) and ultimately defaulted.

• Subsequently, some flaws were highlighted in models used to price and rate securities based on mortgages:

 – Assumptions on housing prices,

 – Assumptions on correlation between defaults.
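To get a feel for why the assumption on correlation between defaults matters, here is a toy simulation. It is a sketch under strong assumptions, not a description of the models actually used to price MBS: each loan defaults when a latent normal score falls below a threshold, and a shared "economy" factor (weight rho) makes the scores correlated. The portfolio size, default probability, rho value, and function name are all invented for illustration.

```r
set.seed(139)
n.loans = 100      # hypothetical pool of 100 loans
p.default = 0.05   # hypothetical 5% default probability per loan
nsims = 5000

# Number of defaults in the pool, simulated nsims times.
# rho = 0 gives independent defaults; rho > 0 adds a shared factor.
simulate.defaults = function(rho) {
  threshold = qnorm(p.default)
  replicate(nsims, {
    common = rnorm(1)          # shared shock hitting every loan
    own = rnorm(n.loans)       # loan-specific shocks
    latent = sqrt(rho) * common + sqrt(1 - rho) * own
    sum(latent < threshold)
  })
}

indep = simulate.defaults(0)
corr = simulate.defaults(0.3)

# Average number of defaults is about the same in both cases...
c(independent = mean(indep), correlated = mean(corr))

# ...but the bad scenarios are far worse when defaults are correlated
c(independent = quantile(indep, 0.99), correlated = quantile(corr, 0.99))
```

Each loan has the same default probability in both settings, so a model that gets the marginal rate right can still badly understate the chance of many loans defaulting together.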


Lecture 1 Outline

• Course logistics and details

• What is Stat 139?

• A few example problems

• A quick R demonstration

An R Demonstration

• A friend of mine said that this winter has been much milder than last year, to date.

• Is there evidence of this in the data?

• How should we collect the data?

• What summary statistics should we measure?

• What comparison should we make?

• What statistical model or test should we use?


An R Demonstration (cont.)

# Read the daily weather data from a CSV file chosen interactively
f = file.choose()
data = read.csv(f)

# Convert to Date class so that date comparisons and arithmetic work
data$Date = as.Date(data$Date)
data$maxtemp = data$Max.TemperatureF
last.date = max(data$Date)

# This winter to date, and the same stretch of days one year earlier
winter15 = data[data$Date >= as.Date("2015-01-01"), ]
winter14 = data[data$Date >= as.Date("2014-01-01") &
                data$Date <= last.date - 365, ]

# Visualize the data
boxplot(winter14$maxtemp, winter15$maxtemp, col=c("rosybrown","green3"))

# As a 2-sample unpooled t-test
t.test(winter14$maxtemp, winter15$maxtemp)

# As a 2-sample pooled t-test
t.test(winter14$maxtemp, winter15$maxtemp, var.equal=TRUE)

# As a 2-sample paired t-test (needs the same number of days in each winter)
t.test(winter14$maxtemp, winter15$maxtemp, paired=TRUE)

# As a Rank Sum test
w.test = wilcox.test(winter14$maxtemp, winter15$maxtemp)
w.test

# As a Resampled test
diff.obs = mean(winter14$maxtemp) - mean(winter15$maxtemp)
combined.sample = c(winter14$maxtemp, winter15$maxtemp)
n14 = length(winter14$maxtemp)
n.all = length(combined.sample)
nsims = 10000
diff.sim = rep(NA, nsims)
for(i in 1:nsims){
  # Shuffle the pooled temperatures and split them into two groups again
  resampled.temp = sample(combined.sample, n.all)
  diff.sim[i] = mean(resampled.temp[1:n14]) -
    mean(resampled.temp[(n14+1):n.all])
}
# Two-sided p-value: share of shuffles at least as extreme as the observed difference
mean(abs(diff.sim) >= abs(diff.obs))
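The demonstration above reads a CSV chosen interactively, so it cannot be run without the data file. As a rough, self-contained sketch of the resampling step only, the code below wraps it in a hypothetical helper, perm.test(), and applies it to simulated temperatures. The numbers are made up (they are not the weather data used in lecture), and the helper name and its arguments are assumptions for illustration.

```r
set.seed(139)

# Simulated (NOT real) daily max temperatures, 26 days in each winter
last.winter = rnorm(26, mean = 30, sd = 8)
this.winter = rnorm(26, mean = 38, sd = 8)

# Two-sided permutation test for a difference in means.
# Shuffles the pooled values, splits them into groups of the original
# sizes, and counts how often the shuffled difference is at least as
# large in absolute value as the observed one.
perm.test = function(x, y, nsims = 10000) {
  diff.obs = mean(x) - mean(y)
  combined = c(x, y)
  nx = length(x)
  diff.sim = replicate(nsims, {
    shuffled = sample(combined)
    mean(shuffled[1:nx]) - mean(shuffled[-(1:nx)])
  })
  mean(abs(diff.sim) >= abs(diff.obs))
}

p.perm = perm.test(last.winter, this.winter)
p.welch = t.test(last.winter, this.winter)$p.value

# The two p-values usually tell a similar story for data like these
c(permutation = p.perm, welch = p.welch)
```

The permutation p-value is only accurate to roughly 1/nsims, so a reported value of 0 should be read as "smaller than about 1 in nsims", not as exactly zero.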


Some Logistical Details

• R tutorials this week and next: Wed, Jan 28 to Mon, Feb 2. Very basic introduction. Note: schedule may change a bit.

 – Wed: 7-8pm in SC-107

 – Thurs: 7-8pm in Hall A; 8-9pm & 9-10pm in SC-B09

 – Fri: 12-1pm, 1-2pm, 2-3pm in SC-B09

 – Sun: 4-5pm in SC-B09

 – Mon: 7-8pm in SC-B09

• Sections will begin next week (Feb 2).

• TF OH schedule to come; starts Feb 2.

• First HW due next Friday, 2/6 @ 2pm. Will be posted by the end of this week.


The Last Word

• Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there".
