Stat 139 Lec01 - An Overview - 4 Per Page
7/23/2019 Stat 139 Lec01 - An Overview - 4 Per Page
http://slidepdf.com/reader/full/stat-139-lec01-an-overview-4-per-page 1/8
1/25/2015
Lecture 1 Outline
• Course logistics and details
• What is Stat 139?
• A few example problems
• A quick R demonstration
Stat 139
• Prereq’s:
• AP Stat, Stat 100, 101, 102, 104, (Intro Stat);
or 110 (Intro Prob).
• Math 21a & 21b (Multivariable Calculus &
Linear Algebra)
• The course material goes further than what
you learned in an intro stat course, and
addresses the question: what happens when
assumptions are not met?
Kevin’s Contact Info
• Kevin’s office: Science Center, Room SC-614
• Office Hours:
• Mon 11am-noon and Fri 11am-noon
• Also by appointment (via email)
• Phone numbers:
• Statistics Department: (617) 495-5496
• My office (SC-614): (617) 495-8711
• Email: [email protected] (best way to get a
hold of me).
Teaching Staff
• Teaching Fellows (may not be complete):
– Ryan Lee: [email protected]
– Patrick Xu: [email protected]
• Teaching fellows will be teaching sections, holding
office hours, answering questions via email, and
grading HW’s and exams.
Course Website
• Course website:
https://canvas.harvard.edu/courses/2421
• There you will find (eventually):
• Syllabus
• Administrative Announcements
• Lecture Notes
• R Tutorial (including download and install instructions)
• Assigned Homeworks
• HW #1
• Other Study Material (practice exams, web links, etc.)
Class Meetings
• Lectures:
• Mon, Wed, & Fri, 10am – 11am, SC-Hall E
• Sections
• Optional (but strongly recommended) weekly section to discuss homework, do extra problems, and review difficult concepts.
• Held mostly Wed and Thurs afternoons
• No sections this week (begin week of Feb. 2).
• Look for announcement on the course website for permanent times (OH’s too).
Lecture Notes
• Paper copies will NOT be handed out at the beginning of
lecture after this week (we will provide copies on Wed and
Fri).
• They will loosely follow the order of the text, and will reference specific sections in the text.
• Lecture notes will be posted online at least 24 hours in
advance. An email will be sent when they are posted.
• Notes are very concise – you are encouraged to add your
own annotations and develop your own notes.
• Occasionally mistakes appear in lecture notes; corrected
versions will be posted after class.
Recommended Textbook (not required)
• Statistical Sleuth: A Course in Methods of Data Analysis, Ramsey & Schafer, 3rd edition. Amazon Link:
www.amazon.com/Statistical-Sleuth-Course-Methods-Analysis/dp/1133490670
• Some of the homework problems will be assigned from the text, but will always be reproduced for you on the assignment.
• From time to time, specific reading assignments may come from the text as well.
• Exams will be based directly on the lectures. Apart from the specific assigned readings, they will include nothing from the text that has not appeared in the lectures, notes, or HW’s.
R Software (+ RStudio)
• R will be used throughout the course and it is required on most homework assignments (including the first).
• Reasons for R:
– Completely free software. Can be downloaded from
http://cran.r-project.org/
– Available for PC, Mac, Linux, and even iPhone and iPad!
– Flexible stat toolkit, access to cutting-edge methods, powerful graphics capabilities, large and vibrant community, unlimited possibilities.
• RStudio helps organize/streamline the program: http://www.rstudio.com/
• Tutorials this week and early next week
R Help Guides (all found on course website)
• On the course website:
https://canvas.harvard.edu/courses/2421/files/folder/R+Guides
• R for Beginners by E. Paradis
• Using R for Data Analysis and Graphics. Introduction, Code
and Commentary by JH Maindonald
• Simple R - Using R for Intro Statistics by J. Verzani
• The R Guide by W.J. Owen
• An Introduction to R by LH Lam
• Comprehensive introduction to R:
http://cran.r-project.org/doc/manuals/R-intro.pdf
Exams
• 2 Midterms, both 10-11am (in class)
Wed, March 4th
Wed, April 10th
• Final Exam, Date and Time TBD (May 8 – 16)
• You will be allowed one reference sheet for the
first midterm, 2 sheets for the second midterm, and
3 sheets for the final exam.
Homeworks
• Posted to course website on Fridays. Due the following Friday.
• HW #1 will be posted soon, and will be due Friday,
Feb. 6
• Hard Copies must be handed in to the 3rd floor HW boxes.
• We allow one late HW, no questions asked (due by the beginning of the following lecture).
• Other late homework will only be accepted with an official University excuse (either from UHS or from your resident dean’s office). NO HW scores will be dropped!
HW Collaboration
• You are encouraged to discuss homework with other
students (and with the instructor and TFs, of course),
but you must write your final answers yourself, in your
own words.
• Solutions prepared “in committee” or by copying or
paraphrasing someone else’s work are not acceptable;
your handed-in assignment must represent your own
thoughts. All computer output you submit must come
from work that you have done yourself.
• Please indicate on your problem sets the names of
the students with whom you worked.
Course Grading
Component     Weight 1   Weight 2   Weight 3
Homework      30%        30%        30%
Project       15%        15%        15%
Midterm 1     20%        5%         20%
Midterm 2     5%         20%        20%
Final Exam    30%        30%        15%
Total         100%       100%       100%
Your overall score for the course will be the highest of the 3 weighting schemes presented above. Final course letter grades are not assigned according to fixed percentages of A's, B's, etc.
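The "highest of the 3 weighting schemes" rule can be sketched in R as follows. This is only an illustration: the helper name overall_score and the example scores are made up, and it assumes the two Midterm rows of the table refer to the first and second midterm, in that order.

```r
# Weights for each scheme, in the order:
# homework, project, midterm 1, midterm 2, final exam
schemes <- rbind(
  weight1 = c(0.30, 0.15, 0.20, 0.05, 0.30),
  weight2 = c(0.30, 0.15, 0.05, 0.20, 0.30),
  weight3 = c(0.30, 0.15, 0.20, 0.20, 0.15)
)

# overall_score is a hypothetical helper; scores are percentages (0-100)
# given in the same component order as the weights above
overall_score <- function(scores) {
  by_scheme <- as.vector(schemes %*% scores)
  names(by_scheme) <- rownames(schemes)
  list(by_scheme = by_scheme, overall = max(by_scheme))
}

# Made-up student: solid midterms, weaker final exam
example <- overall_score(c(90, 85, 80, 88, 70))
example$by_scheme
example$overall
```

For this made-up student the third scheme, which puts the least weight on the final exam, gives the highest score and would be the one that counts.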
Lecture 1 Outline
• Course logistics and details
• What is Stat 139?
• A few example problems
• A quick R demonstration
Sleuthing: What’s in a word?
• Merriam-Webster (m-w.com):
– to act as a detective, search for information
– to search for and discover
• Urban dictionary:
– To play the role of detective, to gather facts and information usually in the traditional Sherlock Holmes inconspicuous way.
– A proper Sleuth needs to be intelligent, witty, and always a few steps ahead others. … His wisdom is his greatest asset.
Richard D. De Veaux, Paul F. Velleman, Amstat News, Sept 2008
Mathematics has a long history of prodigies and geniuses, with many of the most
famous luminaries showing their genius at remarkably early ages…
… but why not Statistics?
Course Goal: learn statistical judgment
1. Improve understanding of statistical reasoning and
measures of uncertainty.
2. Learn to translate long computer output to a short summary of results in scientific as well as common
languages.
3. Expand your statistical toolkit and, at the same time,
deepen your understanding.
Change the way you reason about the world.
Lies, Damned Lies, and Statistics
Reasons?
• Conclusion is not supported by the method used
(e.g., causation vs. association)
• Assumptions of the method are not satisfied, i.e.,
the model does NOT fit the data (e.g., units are
not independent, relationship is nonlinear etc.)
• Unreliable source of the data themselves or poor
data collection techniques.
Course Goal #3, Expanded: Statistical
Toolkit and Understanding
a. Based on formulated question of interest, be able to
choose the appropriate statistics tool;
b. Know its assumptions and understand the consequences
of violating each of them;
c. Clean the data and prepare them for analysis;
d. Fit the chosen model using R and check model-fit
qualitatively and quantitatively;
e. Formulate exactly what can be inferred from the results
in a language common to all scientists as well as in
layman's terms;
f. Understand the limitations of the model.
Lecture 1 Outline
• Course logistics and details
• What is Stat 139?
• A few example problems
• A quick R demonstration
Examples of misleading conclusions if
key statistical principles are ignored
Bethany L. Peters & Edward Stringham (2006). "No Booze? You
May Lose: Why Drinkers Earn More Money Than
Nondrinkers."
Examining the General Social Survey, we find that self-reported
drinkers earn 10-14 percent more than abstainers, which
replicates results from other data sets. […] These results
suggest that social drinking leads to increased social capital.
• What could possibly go wrong with this argument?
• What are the relevant statistical principles or concepts here?
Space Shuttle Challenger crash in 1986
• Was caused by a failure of the O-rings used to seal the joints of the solid rocket boosters against hot combustion gases.
• On the day of the launch the outside temperature was
unusually low (31°F).
• The previous shuttles were launched at temperatures between
53°F and 81°F.
• A statistical model showed an association between cold
temperatures and O-ring failures, but the evidence was not
conclusive (partly due to the small sample size).
What are the relevant statistical principles or concepts here?
Subprime mortgage crisis
• In 2007, the US economy entered a mortgage crisis followed by a recession.
• A proximate cause was the rise in subprime lending.
• Many subprime loans were packaged into mortgage-backed securities (MBS) and ultimately defaulted.
• Subsequently, some flaws were highlighted in models used to price and rate securities based on mortgages:
– Assumptions on housing prices,
– Assumptions on correlation between defaults.
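To see why the assumed correlation between defaults matters, here is a toy simulation. It is not any model that was actually used to price or rate securities. Every number in it (pool size, default probability, correlation, loss threshold) is hypothetical, and the shared-shock setup is just one simple way to make defaults correlated while keeping each loan's default probability the same.

```r
# Share of simulated loan pools in which more than `threshold` of loans default.
# Each loan defaults when a latent standard normal variable falls below a cutoff;
# rho is the correlation between the latent variables of any two loans.
# prob_heavy_losses is a hypothetical helper name.
prob_heavy_losses <- function(rho, n_loans = 100, p_default = 0.05,
                              threshold = 0.15, nsims = 5000) {
  cutoff <- qnorm(p_default)
  heavy <- replicate(nsims, {
    common <- rnorm(1)          # shock shared by every loan in the pool
    own <- rnorm(n_loans)       # loan-specific shocks
    latent <- sqrt(rho) * common + sqrt(1 - rho) * own
    mean(latent < cutoff) > threshold
  })
  mean(heavy)
}

set.seed(139)
p_indep <- prob_heavy_losses(rho = 0)
p_corr <- prob_heavy_losses(rho = 0.3)
c(independent = p_indep, correlated = p_corr)
```

Each loan has the same 5% chance of default in both runs. Only the dependence between loans changes, yet heavy losses across the pool become far more likely when defaults are correlated.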
Lecture 1 Outline
• Course logistics and details
• What is Stat 139?
• A few example problems
• A quick R demonstration
An R Demonstration
• A friend of mine said that this winter has been much
milder than last year, to date.
• Is there evidence of this in the data?
• How should we collect the data?
• What summary statistics should we measure?
• What comparison should we make?
• What statistical model or test should we use?
An R Demonstration (cont.)
# Load the daily weather data from a CSV file chosen interactively
f <- file.choose()
data <- read.csv(f)
n <- nrow(data)
# Convert dates so comparisons and date arithmetic work
# (assumes YYYY-MM-DD dates and rows in date order)
data$Date <- as.Date(data$Date)
data$maxtemp <- data$Max.TemperatureF

# This winter to date, and the same calendar period one year earlier
winter15 <- data[data$Date >= as.Date("2015-01-01"), ]
winter14 <- data[data$Date >= as.Date("2014-01-01") &
                 data$Date <= (data$Date[n] - 365), ]

# Visualize the data
boxplot(winter14$maxtemp, winter15$maxtemp, col = c("rosybrown", "green3"))

# As a 2-sample unpooled t-test
t.test(winter14$maxtemp, winter15$maxtemp)

# As a 2-sample pooled t-test
t.test(winter14$maxtemp, winter15$maxtemp, var.equal = TRUE)

# As a 2-sample paired t-test (pairs the same calendar days; needs equal lengths)
t.test(winter14$maxtemp, winter15$maxtemp, paired = TRUE)

# As a Rank Sum test
w.test <- wilcox.test(winter14$maxtemp, winter15$maxtemp)
w.test

# As a Resampled test
diff.obs <- mean(winter14$maxtemp) - mean(winter15$maxtemp)
combined.sample <- c(winter14$maxtemp, winter15$maxtemp)
n14 <- length(winter14$maxtemp)
nsims <- 10000
diff.sim <- rep(NA, nsims)
for (i in 1:nsims) {
  # Shuffle the pooled temperatures and split them into two pseudo-winters
  resampled.temp <- sample(combined.sample, length(combined.sample))
  diff.sim[i] <- mean(resampled.temp[1:n14]) -
    mean(resampled.temp[(n14 + 1):length(combined.sample)])
}
# Two-sided p-value: share of shuffled differences more extreme than observed
mean(abs(diff.sim) > abs(diff.obs))
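The resampled test can also be wrapped in a reusable function. The sketch below is a minimal illustration, not the course's code. The function name perm_test and the simulated temperatures are assumptions, chosen so the example runs without the weather CSV. It counts ties with >= rather than >, a common convention for "at least as extreme".

```r
# Two-sided permutation test for a difference in means.
# perm_test is a hypothetical helper name, not a base R function.
perm_test <- function(x, y, nsims = 10000) {
  diff.obs <- mean(x) - mean(y)
  combined <- c(x, y)
  nx <- length(x)
  diff.sim <- replicate(nsims, {
    shuffled <- sample(combined)   # random relabelling of the pooled data
    mean(shuffled[1:nx]) - mean(shuffled[-(1:nx)])
  })
  list(diff.obs = diff.obs,
       p.value = mean(abs(diff.sim) >= abs(diff.obs)))
}

# Simulated (made-up) daily highs standing in for the two winters
set.seed(139)
fake14 <- rnorm(25, mean = 30, sd = 8)
fake15 <- rnorm(25, mean = 38, sd = 8)
result <- perm_test(fake14, fake15, nsims = 2000)
result$diff.obs
result$p.value
```

Because the labels are shuffled without replacement, this is a permutation test: it asks how often a random split of the pooled temperatures produces a mean difference as large as the one observed.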
Some Logistical Details
• R tutorials this week and next: Wed, Jan 28 – Mon, Feb 2.
Very basic introduction. Note: schedule may change a bit.
– Wed: 7-8pm in SC-107.
– Thurs: 7-8pm in Hall A, 8-9 & 9-10pm in SC-B09
– Fri: 12-1, 1-2, 2-3pm in SC-B09
– Sun: 4-5pm in SC-B09
– Mon: 7-8pm in SC-B09
• Sections will begin next week (Feb 2).
• TF OH schedule to come; starts Feb 2.
• First HW due next Friday, 2/6 @ 2pm. Will be posted by the end of this week.
The Last Word
• Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."