Stat 111 - Lecture 1 - Introduction 1
Welcome to Statistics 111
Alex Braunstein
The goal of this course is to develop basic tools for data analysis, probability, and statistical methods. Key topics covered in the course include exploratory data analysis, regression, probability, estimation, and hypothesis testing.
Syllabus notes: website
• All handouts will be available on the website:
http://stat.wharton.upenn.edu/~braunsf/stat111.html
• Website also contains my contact information
• Link on website for getting a Wharton class account if you are not a Wharton student
  • Helpful if you want to use Wharton computer labs
Syllabus notes: Homeworks
• Homeworks will be handed out at the beginning of every week
• ~5 homeworks in all
• Homeworks will be submitted at the beginning of class on Mondays
• You are encouraged to work together on homework, but homeworks are to be completed separately and handed in individually.
• Do not copy from another person.
• No late homeworks will be accepted!!
• Late homeworks will get a score of zero, without exception
• Your lowest homework grade is not included in the final grade
Syllabus Notes: Midterm Exam
• The midterm will be held on the following date:
Monday, June 15th (in class)
• No makeup midterm examination!
• A missing midterm exam counts as a zero score
• Consider taking this class in the fall or spring if you cannot attend the midterm!
Student Questionnaire
• Fill out a questionnaire and hand it in before the break
• I will try to incorporate some of the subjects that interest you into future lectures
Course Overview
• Collecting Data
• Exploring Data
• Probability Intro.
• Inference
  • Comparing Variables: Means, Proportions
  • Relationships between Variables: Regression, Contingency Tables
Out in public: You do statistics ?!?
• I hated that class in college!
• That was the most boring class ever!
• Lame.
Big Picture Ideas
• Statistics is all about uncertainty
• Focus as much on what we don’t know (or haven’t observed) as on what we know
• Formulating the question that we want to answer is often the most difficult part
• Statistics is part mathematics, part roll-up-your-sleeves-and-get-thinking.
Science and Skepticism
• We always need to be cautious about conclusions based on data
  • Possible sources of bias and confounding?
  • How might things have gone wrong?
• A little bit of skepticism is a good thing!
Statistical Modeling
• Inference: using mathematical models of uncertainty to answer questions
• Connect probability concepts to our data
• We cannot make claims without using models and making assumptions
• Are the assumptions reasonable?
After the break
• Collecting Data: Design of Experiments
• Sections 3.1-3.2 in Moore, McCabe and Craig
• First couple of classes will not involve much math at all, but we will get into lots of data analysis after that!
Break!
• Hand in questionnaire
• 5 minutes
Stat 111 - Lecture 2 - Experiments
Outline for Second Half of Lecture
• Introduction to Experiments
• Sources of Bias in Experiments
• Techniques for Avoiding Bias
  • Matching
  • Randomization
  • Block Designs
  • Blinding and Double-Blinding
• Experiments vs. Observational Studies
• Association vs. Causation
Experiments
• Used to address a specific question
• Often used to examine causal effects
• E.g., medical trials, education interventions
[Diagram: Population -> Experimental Units -> Treatment Group (Treatment -> Result) and Control Group (No Treatment -> Result); stages labeled 1 to 4]
• Can we just look at difference in results to get the causal effect of the treatment?
• Depends on whether the experiment was done well
• Many possible sources of bias in design of experiments
Sources of Bias
• An experiment or study is biased if it systematically favors a particular outcome
1. Subjects are not representative of the population
2. Treatment and control groups are inherently different on some lurking or confounding variable
3. Subjects are influenced by knowing they are in treatment or control groups
4. Evaluator of outcomes is influenced by knowing which subjects are in treatment or control groups
[Diagram repeated: Population -> Experimental Units -> Treatment Group (Treatment -> Result) and Control Group (No Treatment -> Result); stages labeled 1 to 4]
Bias 1: Non-representative units
• If your subjects are not representative of the population, you won’t be able to generalize the results even if the experiment is well done
• Example:
  • Treatment group: High Level NICUs
  • Control Group: Low Level NICUs
  • Problem: classification of NICUs is different from state to state, so a hospital that might qualify as a high level NICU in one state might not in another
• Observed differences between the groups cannot be generalized from one state to another
Bias 2: Confounding/Lurking Variables
• Treatment group and control group are different on some variable that also influences the outcome
• A confounding variable means that we can’t attribute difference in outcomes to just the treatment
  • Part of the difference may be due to the confounding variable, not the treatment
• Simple example: a breast cancer drug trial where only women receive the treatment and only men receive the control
  • Gender becomes a confounding variable
  • Are treatment vs control outcomes different due to the treatment or gender differences between groups?
Bias 3: Subject knows treatment assignment
• A subject’s outcome is influenced by knowing that he/she is in a treatment or control group
  • E.g., drug trials: patients improve just because they think they are receiving the drug
• Solution: blinded experiment with placebo
  • Placebo appears to be the treatment, so no subject (treatment or control) knows their true treatment assignment
  • Subjects in the control group may still improve slightly; this is often called “the placebo effect”
Bias 4: Evaluator knows treatment assignment
• Person evaluating outcome (e.g., doctor in drug trial) may also be influenced by knowing who receives treatment
• Not a problem if outcome is something indisputable, such as death!
• This is a problem for more subjective measures like pain reduction or results from social programs
• Solution: double-blinded experiment where neither subjects nor evaluators know treatment assignments
Association vs Causation
• In the presence of a confounding variable, we can only conclude there is an association between treatment and outcome, not causation
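The point above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the confounder, the probabilities 0.8 and 0.2, and the effect size 2.0 are hypothetical values chosen for illustration, and the treatment is built to have no causal effect at all.

```python
import random

def simulate(n=10000, seed=0):
    """Simulate a confounder that drives both treatment and outcome.

    The treatment has NO causal effect here (an assumption built into
    this toy simulation).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                        # hypothetical confounder
        treated = rng.random() < (0.8 if z else 0.2)  # confounder affects who gets treated
        outcome = 2.0 * z + rng.gauss(0, 1)           # outcome depends only on the confounder
        rows.append((z, treated, outcome))
    return rows

def mean(values):
    return sum(values) / len(values)

def diff_in_means(rows):
    """Average outcome of treated units minus average outcome of untreated units."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

rows = simulate()
naive = diff_in_means(rows)
# Compare treated vs untreated separately within each level of the confounder
within = [diff_in_means([r for r in rows if r[0] == z]) for z in (False, True)]

print("naive difference:", round(naive, 2))
print("differences within confounder levels:", [round(d, 2) for d in within])
```

The naive comparison shows a clear association between treatment and outcome, while the comparisons within each level of the confounder are close to zero, which is the true causal effect in this simulation.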
Examples: “Reporters are stupid”
• Children who watch many hours of TV get lower grades in school on average than those who watch less TV
  • Does this mean that TV causes poor grades?
  • What are potential confounding variables?
• People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar
  • Does this mean that sweeteners cause weight gain?
  • What is probably happening here?
One solution: Matching
• Make sure that treatment and control groups are very similar on observed variables like race, gender, age, etc.
  • Block designs: divide subjects into blocks with similar observed variables before dividing them into treatment vs control
• Special case: Matched Pairs
  • Subjects are matched up into pairs, then one member of each pair gets treatment and the other gets control
• Example: Dandruff experiment
  • Treatment applied to one side and control to other side of head
  • No reason to expect difference in sides except for treatment
Another Solution: Randomization
• Problem with matching is that you cannot usually match on unobserved characteristics (e.g., genetics)
  • E.g., cholesterol drug trial: can’t match treatment and control groups on genetic predisposition for high cholesterol
• Randomly assign subjects to treatment or control
• Random assignment should lead to groups that are similar or balanced on both observed and unobserved confounding variables
• Example: student questionnaire earlier in class - each form you filled out was randomly assigned either a 1 or 2
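Random assignment can be sketched in a few lines of code. This is a minimal illustration, not the procedure used for the class questionnaire; the function name `randomize`, the subject labels, and the seed are hypothetical.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group.

    Shuffles a copy of the list and cuts it in half, so every subject is
    equally likely to land in either group.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels, for illustration only
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=111)

print("treatment:", treatment)
print("control:  ", control)
```

Because the split ignores every characteristic of the subjects, observed or unobserved, neither group is systematically favored on any of them.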
Randomization of In-Class Survey
• Check to see if groups are balanced:
Variable                      Treatment    Control
Average Height
Average Shoe Size
Average Number of Siblings
• There are differences, but are they “significant”?
  • Later on in the course, we will be able to answer questions like this
• Of course, we can’t check the balance for unobserved variables... we just have to trust the randomization process
  • This is why good science needs to be replicable
Even Better: Randomization + Matching
• Randomization generally leads to treatment and control groups that are evenly balanced, but you can still get unlucky and get unbalanced groups
• Example: randomly placing 20 people (10 males, 10 females) into treatment and control groups.
  • How many males will end up in treatment group?
  • Ideally, we would have 5 males in treatment group, and 5 males in control group (balanced)
  • However, there is a chance of getting 9 males in treatment and 1 male in control group (unbalanced)
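The chances in this example can be worked out by counting. The sketch below assumes the 20 people are split into two groups of exactly 10, which the example does not state explicitly; the function name is hypothetical.

```python
from math import comb

def prob_males_in_treatment(k, males=10, females=10, group_size=10):
    """Probability that exactly k males land in the treatment group.

    Assumes the treatment group is a uniformly random subset of
    `group_size` people (hypergeometric counting).
    """
    total = comb(males + females, group_size)
    return comb(males, k) * comb(females, group_size - k) / total

for k in (5, 9):
    print(f"P({k} males in treatment) = {prob_males_in_treatment(k):.6f}")
```

Under that assumption, a perfectly balanced split (5 males in each group) happens only about 34% of the time, and the 9-to-1 split has probability of about 0.05%: rare, but not impossible.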
Even Better: Randomization + Matching
• Randomized Blocks: randomize within blocks of observed variables
• Example:
  • Divide up subjects into males and females first, then randomly assign treatment or control to subjects in each group separately
  • Guarantees that equal number of males end up in treatment group and control group (same with females)
• Randomized Matched Pairs: randomly decide which member of each pair gets treatment vs. control
• Example:
  • For each head in dandruff experiment, randomly assign which side of head to get dandruff shampoo vs. control
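A randomized block design can be sketched as follows. This is an illustrative sketch only: the function `randomized_blocks`, the `block_of` argument, and the subject encoding are hypothetical, and each block is assumed to have an even number of subjects.

```python
import random
from collections import defaultdict

def randomized_blocks(subjects, block_of, seed=None):
    """Randomly assign treatment/control separately within each block.

    `block_of` maps a subject to its block label (for example, gender).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)          # randomize order within the block
        half = len(members) // 2
        for s in members[:half]:
            assignment[s] = "treatment"
        for s in members[half:]:
            assignment[s] = "control"
    return assignment

# 10 males and 10 females, encoded as (gender, id) pairs for illustration
subjects = [("M", i) for i in range(10)] + [("F", i) for i in range(10)]
assignment = randomized_blocks(subjects, block_of=lambda s: s[0], seed=111)

males_treated = sum(1 for s, g in assignment.items() if s[0] == "M" and g == "treatment")
females_treated = sum(1 for s, g in assignment.items() if s[0] == "F" and g == "treatment")
print("males in treatment:", males_treated, "females in treatment:", females_treated)
```

Whatever the seed, exactly half of each block ends up in treatment, so the unlucky 9-to-1 split cannot occur on the blocking variable.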
Experiments vs. Observational Studies
• Often, we want the causal effect of some treatment, but our data are from an observational study
  • Observational studies examine effects of some variable but without the advantages of a controlled experiment
  • No treatment is applied in observational studies
• Example: health effects of smoking
  • Unethical to randomly impose a treatment
  • Could there be some confounding variable that explains health differences between smokers and non-smokers?
• Very risky to make causal statements from observational data, since we cannot avoid bias!
Health Effects of Chocolate
• Report to European Society of Sexual Medicine:
  • 153 Italian women filled out sexual function questionnaires
  • “intriguing correlation”: sexual function/desire significantly greater among chocolate-eaters
• Observational study: association does not imply causation!
• Confounding: average age is 35 among frequent chocolate-eaters, compared with 40.4 in non-chocolate group
Next Class - Lecture 2
• Collecting Data:
  • Surveys and Sampling
  • Graphical summaries of a single variable
• Moore, McCabe and Craig: Sections 3.3 and 1.1