Statistical thinking
I.4 Sampling Lecture Notes
1. Statistical Thinking
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. – H. G. Wells, author of “War of the Worlds”
Definition: Statistics is the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.
2. Three Phases of Statistics
• Collect the data
• Analyze the data
– order the data
– graphical displays
– numerical calculations (such as mean and standard deviation)
• Interpret the results
– use proper statistical techniques to substantiate or refute hypothesized statements
– match data to the appropriate technique
– determine whether the proper assumptions are satisfied
3. Two types of statistics
• Descriptive statistics – summarize and describe a characteristic for some group
• Inferential statistics – estimate, infer, predict, or conclude something about a larger group
4. Examples
Descriptive          Inferential
Batting Average      Polls
Yards Per Carry      Medical Studies
Test Scores          Market Surveys
5. Two types of data
• Quantitative data – values recorded on a natural numerical scale
• Qualitative data – classified into categories
6. Quantitative Data
• Weight of subjects in medical sample
• Height of buildings in Chicago
• Temperatures per day at Antarctica Weather Station
7. Qualitative Data
• Gender of subjects in medical sample
• Political affiliation of respondents in a poll
• Class (fresh, soph, jr, sr) of Math 101 students
8. Vocabulary
• The population is the entire set of objects (people or things) under consideration.
• A sample is a subset of the population that is available for the analysis.
• A bias is a favoring of certain outcomes over others.
• A census collects data from each member of the population.
• A statistic is a statement of numerical information about a sample.
• A parameter is a statement of numerical information about a population.
9. Census versus Sample
Would you use a census or a sample to determine the following:
• Project the winner of an election
• Calculate a baseball player’s batting average
• Predict whether it will rain tomorrow
• Test whether the soup is too salty
• Calculate Shaq’s free throw average
• Use a market study to determine a new flavor of toothpaste
• Report the Dow Jones Average
• Generalize a medical study to other groups
• The average score on the first test
10. Dealing with bias
Bias in some form occurs in the collecting of most, if not all, sets of data.
The bias may come from
• the portion of the population surveyed
• the phrasing of the questions
11. Examples
• “Dewey defeats Truman” projection of Chicago Tribune based on 1948 telephone poll
• “Are you in favor of Illinois banning cell phones in cars? Dial *91 on your cellular phone to vote.”
• “Do you feel budget cuts are more important than humanitarian programs that would need to be cut to obtain a balanced budget?”
12. Methods for Choosing Samples
• Judgement Sample
– Use the opinion of person(s) deemed qualified to choose members of the sample.
– Example: to investigate study habits of athletes, ask their coaches and teachers.
• Simple Random Selection
– Use random numbers to select the sample.
– Page 315 Random Digit Table:
72985547555515086461
• Stratified Sampling
– Divide the population into relatively homogeneous groups, draw a sample from each group, and take their union.
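The two random methods above can be sketched in a few lines of Python. This is only an illustration: the population of Math 101 students grouped by class year, the helper names `simple_random_sample` and `stratified_sample`, and the seed are all assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members using random numbers; every subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, seed=None):
    """Draw a simple random sample from each group and take the union."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, k_per_stratum))
    return chosen

# Hypothetical population: Math 101 students grouped by class year.
strata = {
    "fresh": [f"fresh-{i}" for i in range(40)],
    "soph": [f"soph-{i}" for i in range(30)],
    "jr": [f"jr-{i}" for i in range(20)],
    "sr": [f"sr-{i}" for i in range(10)],
}
everyone = [s for members in strata.values() for s in members]

srs = simple_random_sample(everyone, 8, seed=1)
strat = stratified_sample(strata, 2, seed=1)
print(srs)
print(strat)
```

A simple random sample of 8 may miss the small senior group entirely, while the stratified sample always contains 2 students from each class year.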
13. Goals of a good sample
• from the correct population
• chosen in an unbiased way
• large enough to reflect total population
14. Normal Distribution of Random Events
Toss a coin 100 times and count the number of heads.
How many heads would you expect?
• about 50
• exactly 50
It does not seem reasonable that the count will be exactly 50.
We would not be surprised if the number of heads turned out to be 48 or 51 or even 55.
We would be surprised to see 80 heads, and would begin to suspect that the coin was not fair.
15. Coin Toss Data
Experiment: A coin is tossed n = 100 times.
The experiment is repeated 1000 times.
Here are the results:
16. Frequency Table: No. of Heads
Heads  Freq     Heads  Freq     Heads  Freq
  1      0       45     54       58     27
 ...     0       46     49       59     19
 34      0       47     54       60     11
 35      2       48     66       61     11
 36      2       49     89       62      5
 37      2       50     70       63      4
 38      2       51     77       64      2
 39      5       52     85       65      0
 40     14       53     62       66      0
 41     16       54     57       67      1
 42     25       55     52       68      0
 43     30       56     40      ...      0
 44     31       57     36      100      0
mean = 50.296
stand dev = 5.100
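As a check on the two summary values, the mean and standard deviation can be recomputed from the frequency table. The sketch below assumes the standard deviation divides by the number of experiments (1000) rather than by 999. The notes do not say which form was used, though at this sample size the two differ only in the third decimal place.

```python
import math

# Frequencies from the table: number of heads -> number of experiments.
# Head counts with frequency 0 are omitted; they do not affect the result.
freq = {
    35: 2, 36: 2, 37: 2, 38: 2, 39: 5, 40: 14, 41: 16, 42: 25, 43: 30, 44: 31,
    45: 54, 46: 49, 47: 54, 48: 66, 49: 89, 50: 70, 51: 77, 52: 85, 53: 62,
    54: 57, 55: 52, 56: 40, 57: 36, 58: 27, 59: 19, 60: 11, 61: 11, 62: 5,
    63: 4, 64: 2, 67: 1,
}

total = sum(freq.values())                                  # number of experiments
mean = sum(h * f for h, f in freq.items()) / total          # weighted mean
variance = sum(f * (h - mean) ** 2 for h, f in freq.items()) / total
std_dev = math.sqrt(variance)

print(total, round(mean, 3), round(std_dev, 3))
```

The frequencies sum to 1000 experiments, and the computed values agree with the reported mean of 50.296 and standard deviation of 5.100.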
17. Coin Toss Histogram
[Figure: histogram of the number of heads in each 100-toss experiment; horizontal axis runs from 30 to 70.]
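The coin-toss experiment is easy to rerun as a simulation. The sketch below is not the data behind the notes: it uses Python's pseudo-random generator with an arbitrary seed, so the individual counts will differ, but the shape of the histogram and the summary values should be similar.

```python
import random
import statistics
from collections import Counter

def heads_in_tosses(rng, n=100):
    """Count heads in n tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n))

rng = random.Random(2024)  # arbitrary seed, chosen only for repeatability
counts = [heads_in_tosses(rng) for _ in range(1000)]  # repeat the experiment 1000 times

sim_mean = statistics.fmean(counts)
sim_sd = statistics.pstdev(counts)
print(f"mean = {sim_mean:.3f}, stand dev = {sim_sd:.3f}")

# Crude text histogram: one '#' per 5 experiments with that head count.
tally = Counter(counts)
for heads in range(30, 71):
    print(f"{heads:3d} {'#' * (tally[heads] // 5)}")
```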
18. Sampling Distributions
If we could examine all possible samples of size n of a population, then the frequency distribution of the means of these samples is approximately normally distributed (the approximation improves as n grows).
• $\mu$ = the mean over the entire population
• $\sigma$ = the standard deviation over the entire population
• $\bar{x}$ = the mean of the sampling distribution
• $\sigma_{\bar{x}}$ = the standard deviation of the sampling distribution
19. Two Rules
Rule 1. $\bar{x} = \mu$
Rule 2. $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
We are assuming in Rule 2 that the size of the entire population is much larger than the sample size n.
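Both rules can be checked numerically. The sketch below makes several assumptions for illustration only: a made-up population of 50,000 values spread uniformly between 0 and 100, a sample size of n = 25, and 4,000 random samples standing in for "all possible samples".

```python
import math
import random
import statistics

rng = random.Random(0)  # arbitrary seed for repeatability

# Hypothetical population, much larger than the sample size.
population = [rng.uniform(0, 100) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(4_000)]

mean_of_means = statistics.fmean(sample_means)   # Rule 1: should be close to mu
sd_of_means = statistics.pstdev(sample_means)    # Rule 2: close to sigma / sqrt(n)
predicted = sigma / math.sqrt(n)

print(f"mu = {mu:.2f}, mean of sample means = {mean_of_means:.2f}")
print(f"sigma / sqrt(n) = {predicted:.2f}, sd of sample means = {sd_of_means:.2f}")
```

Because only a finite number of samples is drawn, the agreement is approximate rather than exact.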
20. Two Outcome Situations
Situation: Two outcomes (for–against; heads–tails; yes–no)
p = percent in favor
q = percent opposed
Written as decimals, p + q = 1. Why?
21. Example
• 29% of Americans favor Bush’s handling of the War in Iraq,
• while 71% do not.
• p = .29 q = .71
• p + q = .29 + .71 = 1
22. Quantifying the Data
• We count a for (or yes) vote as $X_1 = 1$
• and an against (or no) vote as $X_2 = 0$
• Out of 100 people, we would expect
• 100p yes votes and 100q no votes
23. To calculate the mean
Outcome (out of 100 cases):
Vote               Frequency    Freq × $X_i$
$X_1 = 1$ (yes)    100p         100p
$X_2 = 0$ (no)     100q         0
Total                           100p
So the mean $\mu = \dfrac{100p}{100} = p$
24. Standard Deviation
Out of 100 cases,
Vote         Freq    $(X_i - \mu)^2$    Freq × $(X_i - \mu)^2$
$X_1 = 1$    100p    $(1 - p)^2$        $100p(1 - p)^2$
$X_2 = 0$    100q    $(0 - p)^2$        $100q(0 - p)^2$
Total                                   $100p(1 - p)^2 + 100q(0 - p)^2$
25. Calculating standard deviation
First divide the Total by n = 100 cases:
$$
\begin{aligned}
\frac{\text{Total}}{100} &= p(1 - p)^2 + q(0 - p)^2 \\
&= p(1 - p)^2 + qp^2 \\
&= pq^2 + qp^2 && [1 - p = q] \\
&= pq(q + p) \\
&= pq && [\text{because } p + q = 1]
\end{aligned}
$$
Then to get σ, take the square root:
$\sigma = \sqrt{pq}$
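The result of the derivation can be checked by brute force: build the 100 cases explicitly and compute their mean and standard deviation directly. The values p = .29 and q = .71 are borrowed from the earlier example; any pair with p + q = 1 would do.

```python
import math
import statistics

p, q = 0.29, 0.71            # illustrative values; any p + q = 1 works
cases = [1] * 29 + [0] * 71  # 100p yes votes (X = 1) and 100q no votes (X = 0)

mu = statistics.fmean(cases)
sigma = statistics.pstdev(cases)   # divides by n = 100, as in the derivation

print(mu, sigma, math.sqrt(p * q))
```

The directly computed standard deviation matches the square root of pq up to floating-point rounding.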
26. The p–q Rule
Suppose a coin has probability p of landing heads and q = 1 − p of landing tails.
(A value other than $p = \frac{1}{2}$ means the coin is not “fair.”)
The variable X that records a head (X = 1) versus a tail (X = 0) has mean $\mu = p$ and standard deviation $\sigma = \sqrt{pq}$.
27. Bush Popularity Example
29% think Bush is doing a good job
71% do not
p = .29 and q = .71
µ = p = .29
$\sigma = \sqrt{pq} = \sqrt{(.29)(.71)} = .4538$
28. Fair Coin Toss
Heads = 1, Tails = 0
With a fair coin, we expect the percentage of heads to be 50%:
p = .5 and q = .5
µ = p = .5
$\sigma = \sqrt{pq} = \sqrt{(.5)(.5)} = \sqrt{.25} = .5$
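The two worked examples apply the same rule with different values of p, so a small helper covers both. The function name `pq_rule` is made up for this sketch.

```python
import math

def pq_rule(p):
    """Return (mean, standard deviation) for a 0/1 outcome with P(X = 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return p, math.sqrt(p * q)

for p in (0.29, 0.5):
    mu, sigma = pq_rule(p)
    print(f"p = {p}: mean = {mu}, sigma = {sigma:.4f}")
```

Note that σ is largest at p = .5 and shrinks toward 0 as p approaches 0 or 1.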
29. Percents versus Actual Numbers
Sometimes our calculations are in terms of percents and sometimes they are given as actual numbers.
For example, suppose we flip a coin 340 times.
We would expect to have roughly 170 heads (and 170 tails).
We expect the percentage of heads to be $\frac{170}{340} = \frac{1}{2}$, or 50%.
p = 0.5 is the number used in our formulas (along with q = .5).
To convert from the percentage to the actual number of expected heads, simply multiply p by n.
In this case, we expect $\frac{1}{2} \times 340 = 170$ heads.
30. Percents versus Actual Numbers Cont’d
The p–q formula computes the standard deviation σ for the population when we are thinking in terms of percent.
The formula $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ computes the standard error of the mean when we are thinking in terms of percent.
To convert to actual numbers, multiply $\sigma_{\bar{x}}$ by n.
By properties of the square root function, $\dfrac{\sigma}{\sqrt{n}} \cdot n = \sigma \cdot \sqrt{n}$.
31. Percents versus Actual Numbers
Flip a coin 340 times and count the number of heads.
Mean and Standard Deviation for the Entire Population
$\mu = \frac{1}{2} = 0.5$,  $\sigma = \sqrt{.5 \times .5} = 0.5$
Mean and Standard Deviation for Sample Size of n = 340 tosses
In terms of percents:
$\bar{x} = \mu = 0.5$,  $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{340}} = .027$
In terms of actual numbers, multiply by n = 340:
mean = 0.5 × 340 = 170, stan. dev. = .02712 × 340 ≈ 9.22 (the standard error is left unrounded here; .027 × 340 would give 9.18)
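The arithmetic for the 340-toss example can be reproduced as follows. The variable names are chosen for this sketch only. It computes the standard deviation of the count two ways, by multiplying the standard error by n and by using σ · √n directly, and shows the effect of rounding too early.

```python
import math

n = 340
p = q = 0.5
sigma = math.sqrt(p * q)               # population standard deviation (p-q rule)
se_percent = sigma / math.sqrt(n)      # standard error, in terms of percents
se_count_a = se_percent * n            # convert to actual numbers: multiply by n
se_count_b = sigma * math.sqrt(n)      # same quantity written as sigma * sqrt(n)

print(round(se_percent, 3), round(se_count_a, 2), round(se_count_b, 2))
print(round(0.027 * n, 2))             # rounding the standard error first loses accuracy
```

Both routes give about 9.22, while multiplying the rounded value .027 by 340 gives 9.18.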
32. Interpretation
Since the sampling distribution is approximately normally distributed with mean 170 and standard deviation of 9.2, the 68–95–99.7 rule tells us:
If you flip a fair coin 340 times, you would expect the number of heads to be
between 161 and 179 about 68% of the time [1 standard deviation]
between 152 and 188 about 95% of the time [2 standard deviations]
between 142 and 198 about 99.7% of the time [3 standard deviations]
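The three intervals come from the mean plus or minus 1, 2, and 3 standard deviations, rounded to whole numbers of heads. As a rough check on the normal approximation, the sketch below also computes the exact probability of landing in each interval. The exact binomial count is an addition for comparison and is not part of the notes.

```python
import math

n, p = 340, 0.5
mean = n * p
sd = math.sqrt(n * p * (1 - p))

def binomial_prob_between(lo, hi):
    """Exact P(lo <= heads <= hi) for n tosses of a fair coin."""
    return sum(math.comb(n, k) for k in range(lo, hi + 1)) / 2 ** n

results = []
for k in (1, 2, 3):
    lo = round(mean - k * sd)   # lower end, rounded to a whole number of heads
    hi = round(mean + k * sd)   # upper end
    results.append((lo, hi, binomial_prob_between(lo, hi)))
    print(f"{k} standard deviation(s): {lo} to {hi}, exact probability {results[-1][2]:.4f}")
```

The exact probabilities need not equal 68%, 95%, and 99.7% precisely, because the interval ends are rounded to whole numbers and the normal curve is only an approximation.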
33. Coin–Toss Model
• Suppose a coin has probability p of landing heads and q = 1 − p of landing tails.
• Suppose we flip the coin n times and record x, the number of heads for each sample.
• The values of x will be approximately normally distributed with mean and standard deviation given as follows:
              Population              Sample Distribution      Sample Distribution
                                      (Percents)               (Actual Numbers)
Mean          $p$                     $p$                      $p \cdot n$
Stan. Dev.    $\sigma = \sqrt{pq}$    $\sigma / \sqrt{n}$      $\sigma \cdot \sqrt{n}$
34. Comparison with Previous Experiment
Toss a coin n = 100 times
              Actual Value    Predicted Value
Mean          50.296          50
Stan. Dev.    5.100           5
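The predicted values in this comparison follow from the coin-toss model with p = .5 and n = 100. A minimal sketch of the model as a function is shown here; the name `coin_toss_model` and the dictionary keys are invented for illustration.

```python
import math

def coin_toss_model(p, n):
    """Return (mean, standard deviation) for each column of the coin-toss model."""
    q = 1 - p
    sigma = math.sqrt(p * q)                               # p-q rule
    return {
        "population": (p, sigma),
        "sample_percent": (p, sigma / math.sqrt(n)),       # in terms of percents
        "sample_count": (p * n, sigma * math.sqrt(n)),     # in actual numbers
    }

model = coin_toss_model(0.5, 100)
print(model["sample_count"])  # (50.0, 5.0)
```

The predicted mean of 50 and standard deviation of 5 are close to the observed 50.296 and 5.100 from the 1000 repeated experiments.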