Lecture 1, STAT 651. Copyright (c) Bani Mallick

  • date post

    20-Dec-2015

Copyright (c) Bani Mallick 1

Lecture 1

STAT 651

Topics in Lecture #1

Welcome and basic mechanics of the course

Samples and populations

Relative frequency histograms

The sample mean

Book Sections Covered in Lecture #1

Chapter 1

Chapter 3.3, pages 46-53

The Web Site

Go to http://stat.tamu.edu/~bmallick/651/651.html

Please make sure to check the web site regularly for notes from me and the TA

I apologize in advance for any typos

Emails and all that

The TA will answer detailed questions about homework

Check the web site for the TA name, office, email address and office hours

My email ([email protected]) should only be used as a last resort, or to set appointments (Spring 2004 only).

Office Hours (Spring 2003 only)

My office hours are as follows (Spring 2003)

Tuesdays: 11:00-12:30, 4:00-5:00

Thursdays: 11:00-12:30

The TA will also have office hours

I am not available outside the office hours

Printing The Lectures

The lectures are set up as PowerPoint files. You can download them from the STAT651 web site

You can print them 2 or 3 per page

Go to “file”, then “print”. A little box will open, and in the bottom left you will see “print what”. It should simply say “slides”, but you can click to open the available “handout” options

Other Web Material

All homework assignments

All data sets

Who Am I?

You can check out my personal web site

http://stat.tamu.edu/~bmallick

Course Mechanics

Exams (3).

You are encouraged to prepare “cheat sheets”, 3 pages for each exam.

No formulae memorization: this is an applied statistics class

Course Mechanics

You may bring the book to exams, but the cheat sheets will be more useful.

I will expect you to be able to interpret computer output: both mechanically and conceptually

The exams are multiple choice. Exam scores are curved.

Course Mechanics

We will use SPSS.

SPSS is available throughout campus

Once you learn SPSS, other packages such as SAS will be easy

The TA can give you help with SPSS

SPSS

• You are entitled to get SPSS at no additional cost

• Go to http://cis.tamu.edu/customer-sales/sell/student.php (Ignore any statement about cost)

• Go to http://stat.tamu.edu/~mspeed/spss for help

Course Mechanics

Course rules (dates of exams, percent they count, policy on late homework, policy on missed exams) are available at the class web site.

Please print them out and please read them.

NHANES

National Health and Nutrition Examination Survey

The major survey whereby the federal government monitors nutrition and health in the U.S.

I will focus on women aged 30-50

First some important definitions

NHANES

Population: The entire collection of individuals of interest

In NHANES, the population is all women in the U.S. aged 30-50

Since there are millions of such women, it is impractical to figure out the health and nutrition for all of them: it would cost billions of dollars to do so

NHANES

Sample: A subset of the population that is measured in lieu of measuring everyone in the population

Since we want the sample to represent the population, the goal is to make sure we sample a representative subset of the population

In NHANES, women were sampled at random from the population; the randomness is meant to ensure that the sample is representative.

Samples and Populations

Warning: I will make a big deal about the difference between samples and populations

You will be asked multiple questions on every exam about this distinction.

They will be phrased in various ways.

This is the conceptually hardest part of this course

The sample is not the population: learn this!

Variables

What we measure: variables are things that we measure in a sample and population

They can be numerical: your height

They can be binary: your gender

They can be categorical: preference in soft drinks (Pepsi, Coke, Dr. Pepper, None, Other)

Random Variation

Different samples lead to different outcomes: This is a hard conceptual point

First we will do an experiment, then discuss the implications

Random Variation

Different samples lead to different outcomes: consider heights of males in this class

Sample #1: males whose SSNs end in 1, 2, 3 or 4

Sample #2: males whose SSNs end in 6, 7, 8 or 9

Note how the numbers will not be identical
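
The experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster below (last SSN digit, height in inches) is invented for the example and is not class data.

```python
from statistics import mean

# Hypothetical roster: (last digit of SSN, height in inches).
# These values are made up purely for illustration.
roster = [
    (1, 70), (2, 68), (3, 73), (4, 69),
    (6, 71), (7, 66), (8, 74), (9, 72),
    (0, 67), (5, 70),
]

# Sample #1: SSN ends in 1, 2, 3 or 4.
sample_1 = [height for digit, height in roster if digit in (1, 2, 3, 4)]
# Sample #2: SSN ends in 6, 7, 8 or 9.
sample_2 = [height for digit, height in roster if digit in (6, 7, 8, 9)]

# Both samples come from the same class, yet their means differ.
mean_1 = mean(sample_1)
mean_2 = mean(sample_2)
print(mean_1, mean_2)
```

Neither sample mean is "the" class mean; each is one possible outcome of choosing a subset.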

Random Variation

Different samples lead to different outcomes: samples do not equal populations

One of the main goals of statistics: ascertain how far a sample result is from the population result

For example, how far is the sample mean height of 10 males from the population mean height?

This will require probability statements
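
A rough simulation can make the question concrete. The sketch below assumes a made-up population of 1,000 heights (simulated, not real data), draws several random samples of 10, and reports how far each sample mean lands from the population mean. In a real study the population mean is unknown; only in a simulation can we compute it directly.

```python
import random
from statistics import mean

random.seed(651)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 simulated male heights in inches.
population = [random.gauss(70, 3) for _ in range(1000)]
population_mean = mean(population)

# Draw five random samples of size 10 and record each sample mean.
sample_means = [mean(random.sample(population, 10)) for _ in range(5)]

# Distance of each sample mean from the population mean.
errors = [m - population_mean for m in sample_means]

for m, e in zip(sample_means, errors):
    print(f"sample mean = {m:.2f}, distance from population mean = {e:+.2f}")
```

The distances vary from sample to sample, which is why statements about them have to be phrased in terms of probability.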

A Warning!

Fancy statistical methods cannot rescue garbage data

Fancy statistical methods can help you gain insight into your data, over and above what seems obvious on its face

You should always worry about whether the sampled results are representative of the population, and whether your sample allows you to make inferences about the population.

Histograms

A graphical means of looking at a sample from a population.

Can be used to compare two populations.

Allows you to judge central tendency, variation, and other odd features of the data

A very useful graphical tool

Relative Frequency Histograms

Simplest graphical technique to describe a sample.

Divide range of variable into intervals of nearly equal length.

Plot the % of the data which falls in each interval.

Computers have various ways of choosing the intervals.

You’ll not do these by hand, ever, with me

Relative Frequency Histograms

Numerical Example: ages 26,29,30,34,37,38,39,41,43,45

Interval (selected arbitrarily by me)      26-30   31-35   36-40   41-45   46-50
Count # in each interval                       3       1       3       3       0
% in each interval (relative frequencies)     30      10      30      30       0
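
The same counts and percentages can be reproduced with a short Python sketch. The variable names are illustrative choices; the ages and intervals are the ones from the example above.

```python
ages = [26, 29, 30, 34, 37, 38, 39, 41, 43, 45]
intervals = [(26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]

# Count how many ages fall in each interval (endpoints included).
counts = [sum(low <= age <= high for age in ages) for low, high in intervals]

# Relative frequency: the count as a percentage of the sample size.
percents = [100 * count / len(ages) for count in counts]

for (low, high), count, pct in zip(intervals, counts, percents):
    print(f"{low}-{high}: count = {count}, percent = {pct:.0f}%")
```

Because every observation falls in exactly one interval, the percentages add up to 100.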

Numerical Example

Intervals        26-30   31-35   36-40   41-45   46-50
% in interval       30      10      30      30       0

[Figure: relative frequency histogram of the ages; x-axis intervals 26-30, 31-35, 36-40, 41-45, 46-50; y-axis percent, 0 to 30]

NHANES

Two subpopulations (yes, populations can have subpopulations)

Subpopulation #1: All women in U.S. aged 30-50 and healthy in 1980 who developed breast cancer by 1995

Subpopulation #2: All women in U.S. aged 30-50 and healthy in 1980 who did not develop breast cancer by 1995

NHANES

Two samples

Sample #1: 59 women in U.S. aged 30-50 and healthy in 1980 who developed breast cancer by 1995

Sample #2: 60 women in U.S. aged 30-50 and healthy in 1980 who did not develop breast cancer by 1995

NHANES

One Variable Measured on each (sub)population

Saturated Fat intake in Diet:

This was measured by a 24-hour recall: they asked each woman once what she had eaten the previous day, and computed saturated fat

This is a terrible measure of saturated fat intake (garbage data?), but all that is available

I would have done multiple days, at least

NHANES: What do we Expect?

Saturated Fat intake in Diet:

One would expect that the women who developed breast cancer tended to have higher levels of saturated fat in their diet.

What do the relative frequency histograms say?

NHANES Saturated Fat

Relative Frequency Histograms (the scales are the same). What do you see?

[Figure: relative frequency histograms in two panels, Cancer and Healthy; x-axis ticks at 25, 50, 75, 100; y-axis Percent, 0% to 30%]

NHANES log(Saturated Fat)

Relative Frequency Histograms (the scales are the same). What do you see?

[Figure: relative frequency histograms in two panels, Cancer and Healthy; x-axis Log(Saturated Fat), ticks at 2.00, 3.00, 4.0; y-axis Percent, 0% to 15%]

Construction in SPSS

• I will now show you a few things about SPSS

Construction in SPSS

• Select graphs in SPSS menu

• Select interactive

• Select Histogram

• Select percent instead of count for a relative frequency histogram

• Place variable of interest on X-axis

Construction in SPSS

• Select variable defining the populations and put it in “Panel Variables”

• The histograms will be side-by-side. I like them one on top of the other

• Double click on graph (may need to do this twice)

• A menu will pop up, go to “Arrangement”

Construction in SPSS

• Select “Down then Across”

• Then take over to PowerPoint (copy and paste)

• Click on the histogram in your PowerPoint presentation, and convert it to a Microsoft picture

• Change sizes, and edit as you wish

What Histograms Say

• Because each box is a relative frequency (percentage), you can use a histogram to learn a few things about the population

• You can also use them to compare two populations

• Whether one population has generally larger values

• Whether one population is more closely clumped

What percentage of the healthy women ate less than 25 grams of saturated fat?

[Figure: relative frequency histograms of saturated fat in two panels, Cancer and Healthy; x-axis ticks at 25, 50, 75, 100; y-axis Percent, 0% to 30%]

What percentage of the healthy women ate less than 25 grams of saturated fat?

Look at the 3 bars, of about 18%, 20% and 28%, for a total of about 66%

[Figure: relative frequency histograms of saturated fat in two panels, Cancer and Healthy; x-axis ticks at 25, 50, 75, 100; y-axis Percent, 0% to 30%]

Histograms and Shifts: note how bottom plot has higher values

[Figure: relative frequency histograms of "response" in two stacked panels (panel codes .00 and 1.00), a sample from population A and a sample from population B; x-axis from -2 to 3; y-axis Percent, 0% to 12%]

Histograms and Variability: note how top plot has more concentrated values

[Figure: relative frequency histograms of "v2" in two stacked panels (panel codes .00 and 1.00), a sample from population A and a sample from population B; x-axis from -4 to 4; y-axis Percent, 0% to 20%]

The Population Mean

• In many problems, the goal is to make inference about the population mean of a numerical variable, e.g., saturated fat intake

• Define in words what you mean by the population mean!

The Population Mean

• In many problems, the goal is to make inference about the population mean of a numerical variable, e.g., saturated fat intake

• You’re right! The population mean is the average of all the outcomes in the population

• It cannot be measured, hence we take samples.

• BTW, what’s an average?

The Sample Mean

• Formal definition: If the sample is of size n and the data are X_1, ..., X_n, then the sample mean is

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]

• This is the sum over all the observed values, divided by the number of observations

Sample Mean: Example

\bar{X} = the sum over all the observed values, divided by the number of observations

Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5

n =

sum =

\bar{X} =

Sample Mean: Example

\bar{X} = the sum over all the observed values, divided by the number of observations

Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5

n = 12

sum =

\bar{X} =

Sample Mean: Example

\bar{X} = the sum over all the observed values, divided by the number of observations

Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5

n = 12

sum = 3

\bar{X} =

Sample Mean: Example

\bar{X} = the sum over all the observed values, divided by the number of observations

Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5

n = 12

sum = 3

\bar{X} = 3/12 = 0.25
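
For anyone who wants to check the arithmetic by computer, here is a minimal Python sketch of the same calculation. The variable names are illustrative; the data are the twelve values from the example.

```python
data = [-4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5]

n = len(data)        # number of observations
total = sum(data)    # sum over all the observed values
x_bar = total / n    # sample mean: the sum divided by n

print(f"n = {n}, sum = {total}, sample mean = {x_bar}")
```

The result matches the hand calculation: 12 observations summing to 3 give a sample mean of 0.25.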