Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
02-250-01
Lecture 4
A Quick Review
• The entire area under the normal curve can be considered to be a proportion of 1.00
• A proportion of .50 lies to the left of the mean, and a proportion of .50 lies to the right of the mean
Area Under the Normal Distribution and Z-Scores
Normal distribution with z-score points of reference:
Properties of Area Under the Normal Distribution
• Since the normal curve is a bell shape, the proportion of scores between whole z-scores is not equal
• For example, .3413 of the scores lie between the z-scores of 0 (the mean) and 1 (or -1), while only .1359 of the scores lie between the z-scores of 1 and 2 (or -1 and -2)
Properties of Area Under the Normal Distribution
[Figure: normal curve with the x-axis marked Z = -3 -2 -1 0 +1 +2 +3. Areas on each side of the mean: .3413 (0 to 1), .1359 (1 to 2), .0215 (2 to 3), .0013 (beyond 3)]
Properties of Area Under the Normal Distribution
Z-scores*     Proportion under the curve
-1 to +1      .6826 (.3413+.3413)
-2 to +2      .9544
-3 to +3      .9974
-4 to +4      1.0000
*Z-scores are expressed in standard deviation units, i.e., a z-score of -1 represents one standard deviation below (to the left of) the mean
Normal Distribution Example
• A study of 2500 University of Windsor students showed that the average amount of sleep lost in the week prior to writing a statistics exam (in hours) was normally distributed with μ = 7.79 and σ = 1.75 (don’t worry, this isn’t real data!)
• This distribution is shown with the abscissa (x-axis) marked in raw score and z-score units:
Normal Distribution Example
[Figure: normal curve with the same areas on each side of the mean (.3413, .1359, .0215, .0013) and the x-axis marked in both units:]
X = 2.54 4.29 6.04 7.79 9.54 11.29 13.04
Z = -3 -2 -1 0 +1 +2 +3
Example cont.
• We can see from this diagram that 34.13% of U of W students lost between 6.04 and 7.79 hours of sleep in the week prior to a stats test (between z=-1 and z=0)
• 13.59% of students lost between 9.54 and 11.29 hours of sleep in that week (between z=+1 and z=+2)
• 49.87% of students lost between 2.54 & 7.79 hours of sleep (between z=-3 and z=0) (.0215+.1359+.3413 = .4987 = 49.87%)
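The percentages read off the diagram can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the printed table; the helper name `proportion_between` is made up for illustration and is not part of the lecture.

```python
from statistics import NormalDist

# Sleep-loss example from the slides: mean 7.79 hours, standard deviation 1.75 hours
sleep = NormalDist(mu=7.79, sigma=1.75)

def proportion_between(dist, low, high):
    """Proportion of scores between two raw scores (area under the curve)."""
    return dist.cdf(high) - dist.cdf(low)

one_below = proportion_between(sleep, 6.04, 7.79)    # z = -1 to z = 0
one_to_two = proportion_between(sleep, 9.54, 11.29)  # z = +1 to z = +2
three_below = proportion_between(sleep, 2.54, 7.79)  # z = -3 to z = 0

print(round(one_below, 4), round(one_to_two, 4), round(three_below, 4))
```

The computed areas should agree with the slide values to about four decimal places; small differences can come from rounding in the printed figures.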
Properties of Area Under the Normal Distribution
• The symbol z_α is used to denote the z-score having area α (alpha) to its right under the normal curve
• The proportion of area under the curve between the mean and a z-score can be found with the help of a table (Table E.10, Howell, p. 452) and a little math…
• In this example, we want to know the area between the mean and z = 0.20:
• Look under the column “mean to z” at z=0.20
• The proportion = 0.0793
• Therefore, .0793 (or almost 8%) is the proportion of data scores between the mean and the score that has a z score of 0.20
Example cont.
• This means that the region between the mean and z = 0.20 has an area under the curve of 0.0793:
[Figure: normal curve with Z = 0 and Z = 0.20 marked; area from the mean to z = .0793, area beyond z = .4207]
Example cont.
• Since half of the normal distribution has an area of .5000, we can determine the area beyond z = .20 by subtracting the area from the mean to z = .20 from .5000:
• Area beyond z=.20 = .5000 - .0793
• Area beyond z=.20 = .4207
• (Note: If you look at the “smaller portion” in the table, you will see it’s .4207)
Example cont.
• Since the normal curve is symmetrical, the area between the mean and z = -.20 is equal to the area between the mean and z = +.20:
[Figure: normal curve with Z = -0.20, 0, and +0.20 marked; area from the mean to each z = .0793, area beyond each z = .4207]
Normal Distribution Table
• Table E.10 has 3 columns: Mean to z, Larger portion, Smaller portion
Table: Mean to z
Table: Larger Portion
Table: Smaller Portion
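The three table columns can also be computed rather than looked up. This is a hedged sketch, assuming Python's `statistics.NormalDist` as a stand-in for Table E.10; the function name `table_columns` is hypothetical and chosen only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def table_columns(z):
    """Return (mean to z, larger portion, smaller portion) for a z-score."""
    z = abs(z)  # the curve is symmetrical, so the sign does not change the areas
    below = standard_normal.cdf(z)  # area to the left of |z|
    mean_to_z = below - 0.5         # area between the mean and z
    larger = below                  # proportion in the body
    smaller = 1 - below             # proportion in the tail
    return round(mean_to_z, 4), round(larger, 4), round(smaller, 4)

print(table_columns(0.20))
print(table_columns(-0.20))
```

For z = 0.20 this should reproduce the .0793 and .4207 values used in the example, and the same areas for z = -0.20.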
A Couple of Notes
• 1) Always report proportions (area under the curve) to four decimal places. This means that if you report an area as a percentage, it will have two decimal places (e.g., .7943 = 79.43%)
• 2) When using Table E.10, be careful not to confuse z=.20 with z=.02 (this is a common mistake)
• 3) Remember that a negative z value has the same proportion under the curve as the positive z value because the normal distribution is symmetrical
• 4) When working on z-score problems, it is highly recommended that you draw a normal distribution and plot the mean, x, and their corresponding z-scores
Another Example!
• We often want to know what the area between two scores is, as in this example:
• Assume that the marks in this class are normally distributed with μ = 69.5 and σ = 7.4. What proportion of students have marks between 50 and 80?
Example: Area Between 2 Scores
1) Calculate the z-scores for X values (50 & 80)
z = (50-69.5)/7.4 = -19.5/7.4 = -2.64
z = (80-69.5)/7.4 = 10.5/7.4 = 1.42
2) Find the proportions between the mean and both z-scores (consult Table E.10)
For z = -2.64, .4959 is the proportion between the mean and z.
For z = 1.42, .4222 is the proportion between the mean and z.
Example: Area Between 2 Scores
• Third, add these proportions together to find your answer:
.4959 + .4222 = .9181
• This means that 91.81% of students have Stats marks between 50 and 80
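The three steps above can be sketched in code. This is an illustration only, assuming Python's `statistics.NormalDist` in place of Table E.10; the helper `z_score` is a made-up name, and it rounds z to two decimals to mimic a table lookup.

```python
from statistics import NormalDist

MU, SIGMA = 69.5, 7.4  # class marks from the example
standard_normal = NormalDist()

def z_score(x, mu, sigma):
    """z rounded to two decimals, as it would be looked up in a printed table."""
    return round((x - mu) / sigma, 2)

z_low = z_score(50, MU, SIGMA)   # step 1: z for X = 50
z_high = z_score(80, MU, SIGMA)  # step 1: z for X = 80

# Steps 2 and 3: the area between the two z-scores equals
# (mean to z_low) + (mean to z_high) because they lie on opposite sides of the mean
area = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(z_low, z_high, round(area, 4))
```

Without the rounding of z, the area would differ slightly in the fourth decimal place, which is expected when comparing software output with a two-decimal table.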
Smaller and Larger Portions
• Smaller portion = proportion in the tail
• Larger portion = proportion in the body
• Using the same data (μ = 69.5 and σ = 7.4) we can calculate areas using the Smaller and Larger Portions in the Normal Distribution table:
• Find the proportion of students who have stats marks of less than 80.6
• z = (80.6-69.5)/7.4 = +1.5
Larger Portion
• Area below z = +1.5 = 0.9332
This means that 93.32% of students had a mark of 80.6 or less in this class
Smaller Portion
• Find the proportion of students who have marks of 76.93 or better:
• z = (76.93-69.5)/7.4 = 1.00
• Area in smaller portion = .1587
• This means that 15.87% of students in this class had a mark of 76.93 or better
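The larger-portion and smaller-portion lookups can be sketched as follows, again assuming Python's `statistics.NormalDist` rather than the printed table. The function names `portion_below` and `portion_above` are hypothetical, and z is rounded to two decimals to match a table lookup.

```python
from statistics import NormalDist

MU, SIGMA = 69.5, 7.4  # class marks from the example
standard_normal = NormalDist()

def portion_below(x):
    """Proportion of marks below x (the larger portion when x is above the mean)."""
    z = round((x - MU) / SIGMA, 2)
    return standard_normal.cdf(z)

def portion_above(x):
    """Proportion of marks at or above x (the smaller portion when x is above the mean)."""
    return 1 - portion_below(x)

print(round(portion_below(80.6), 4))   # marks of less than 80.6
print(round(portion_above(76.93), 4))  # marks of 76.93 or better
```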
Converting Back to X
• Assume μ = 30 and σ = 5, what raw scores correspond to z=-1.00 and z=+1.5?
z = (X - μ)/σ
Therefore X = μ + (z)(σ)
X = 30 + (-1.00)(5) = 25
X = 30 + (1.5)(5) = 37.5
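Converting a z-score back to a raw score is a one-line rearrangement of the z formula. A minimal sketch, with the function name `raw_score` chosen only for illustration:

```python
def raw_score(z, mu, sigma):
    """Rearranged z formula: X = mu + z * sigma."""
    return mu + z * sigma

x_low = raw_score(-1.00, 30, 5)
x_high = raw_score(1.5, 30, 5)
print(x_low, x_high)
```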
Proportion
• What proportion of scores lie between z=-1.00 and z=+1.50?
• Area from mean to z=-1.00 = .3413
• Area from mean to z=+1.50 = .4332
• Add them together to get the proportion that lies between these two z-scores: .3413+.4332 = .7745
Finding the Number of Observations
• In this example, if we know the sample size (e.g., n=212), we can calculate how many people lie between z=-1.00 and z=+1.50:
• Area between z=-1.00 and z=+1.50 = .7745 (see the last slide)
• Multiply the proportion by n:
(.7745)(212) = 164.19
Approximately 164 people
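The same count can be reproduced in a few lines. This sketch assumes Python's `statistics.NormalDist` for the area; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()
n = 212  # sample size from the example

# Proportion of scores between z = -1.00 and z = +1.50
proportion = standard_normal.cdf(1.50) - standard_normal.cdf(-1.00)

# Expected number of people in that range
expected_count = proportion * n
print(round(proportion, 4), round(expected_count))
```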
And a Little More
• Finally, we can find a z-score from the table if we know the proportion of scores (i.e., we can work backwards):
• Suppose the birth weight of newborns (in pounds) is normally distributed with μ = 7.73 and σ = 0.83
• What birth weight identifies the top (heaviest) 10% of newborns?
Example cont.
• Look at Table E.10 and find the z-score that identifies the top proportion of 0.1000: look in the smaller portion column (the tail)
[Figure: normal curve with the upper tail of area .1000 shaded, cut off at z = ?]
Example cont.
• Looking in the smaller portion column, we find that
z=1.28 has an area of .1003
z=1.29 has an area of .0985
Which do we pick?
• Pick the one that is closest to an area of .1000: this is z=1.28
Example cont.
• Now solve for X:
X = (z)(σ) + μ
X = (1.28)(0.83) + 7.73 = 1.06 + 7.73 = 8.79
So any weight equal to or greater than 8.79 pounds is in the top 10% of birth weights
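Working backwards from a proportion to a score corresponds to the inverse of the cumulative distribution. A hedged sketch, assuming Python's `statistics.NormalDist.inv_cdf` in place of scanning the smaller portion column:

```python
from statistics import NormalDist

birth_weight = NormalDist(mu=7.73, sigma=0.83)  # pounds, from the example

# The top 10% starts where 90% of the area lies below
z_cutoff = NormalDist().inv_cdf(0.90)  # z-score with .1000 in the upper tail
cutoff = birth_weight.inv_cdf(0.90)    # the same point expressed in pounds

print(round(z_cutoff, 2), round(cutoff, 2))
```

Because software does not round z to two decimals before converting, the cutoff can differ from the table-based answer in the later decimal places.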
Probability
• Everything that can possibly happen has some likelihood of happening: probability is a measure of that likelihood
• Probability: The quantitative expression of likelihood of occurrence
Probability
• Probability is a ratio of frequencies
• The numerator (top) is the frequency of the outcome of interest
• The denominator (bottom) is the frequency of all possible outcomes
Coin Toss Example
• If a fair* coin is tossed in the air, it can land on either heads or tails
• This means a coin has 2 possible outcomes
• If we want to know the probability of tossing a fair* coin and having it land on heads, we calculate as follows:
*Note: fair means a normal coin, one that is not weighted differently
Coin Toss
p = (Frequency of interest) / (Frequency of all possible outcomes)
For a coin toss, this is: 1/2
The probability of the coin landing on heads is: p(heads) = ½, or p(heads) = .5
Another Example
• Suppose there are 90 students in a class, 59 of them are women and 31 are men
• If one of the students is chosen at random, the probability of choosing a woman is:
p(woman) = 59/90
More Probability
• If the entire class was women (i.e., there were no male students), the probability of choosing a woman would be 90/90
• If the entire class was men, the probability of choosing a woman would be 0/90
More Probability
• As a numerical value, probabilities can range from 0.00 to 1.00
• The numerator can range from a minimum of 0 to a maximum equal to the denominator
Express Yourself!
• Probability can be expressed as a fraction, e.g., p(woman) = 59/90
• Or as a decimal fraction: p(woman) = .6556
• Although probabilities are not usually expressed as percentages (e.g., 65.56%), they often are in popular media
Probability cont.
• Even if we do not know the actual observed frequencies (e.g., the number of women), probabilities can be determined theoretically
• Without throwing a die, we can deduce the probability of landing on a 5
Die Example cont.
• We know the die has 6 sides - 6 possible outcomes
• We are only interested in one side (the 5), so the probability of landing on a 5 is:
p(5) = 1/6 = 0.1667
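The coin, class, and die examples all use the same ratio of frequencies. A small sketch using Python's `fractions.Fraction`, with the function name `probability` chosen purely for illustration, shows the fraction and decimal forms side by side:

```python
from fractions import Fraction

def probability(frequency_of_interest, frequency_of_all_outcomes):
    """Probability as a ratio of frequencies, kept as an exact fraction."""
    return Fraction(frequency_of_interest, frequency_of_all_outcomes)

p_heads = probability(1, 2)    # fair coin landing on heads
p_woman = probability(59, 90)  # choosing a woman from the class of 90
p_five = probability(1, 6)     # die landing on a 5

for label, p in [("heads", p_heads), ("woman", p_woman), ("5", p_five)]:
    # Fraction form, then the decimal form rounded to four places
    print(label, p, round(float(p), 4))
```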
Probability and the Normal Distribution
• The normal distribution can be thought of as a probability distribution. Here’s how:
• We know (from Table E.10) the proportion of scores that fall above or below a given z score
• If you were to randomly pick a score from a sample of scores, what is the probability that you would pick a score that has a corresponding z score of .40 or greater?
Probability and the Normal Distribution
• The proportion of scores above or below a given z score is the same as the probability of selecting a score above or below the z score
e.g., the probability of selecting a score from a normal distribution that has a z score of .40 or greater is .3446 (the area in the smaller portion of z = .40)
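As a quick check of this value, the tail area can be computed directly. This is a sketch that assumes Python's `statistics.NormalDist` in place of the table:

```python
from statistics import NormalDist

# Probability of selecting a score with a z score of .40 or greater
# (the smaller portion for z = .40)
p_tail = 1 - NormalDist().cdf(0.40)
print(round(p_tail, 4))
```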
Example #1
• Suppose people’s scores on a personality test are normally distributed with a mean of 50 and a population standard deviation of 10.
• If you were to pick a person completely at random, what is the probability that you would pick someone with a score on this personality test that is higher than 60?
Example #1
• Step #1: Write down what you know
μ = 50, σ = 10, X = 60
• Step #2: What do you want to find?
p(X > 60)
• Step #3: Draw the normal distribution, write in the mean, standard deviation, and the X and shade the area you are looking for
Example #1, Step #3
[Figure: normal curve with the x-axis marked X: 20 30 40 50 60 70 80]
Example #1
• Step #4: Calculate z score(s)
z = (X - μ)/σ = (60 - 50)/10 = 10/10 = 1.00
• Step #5: Use Table E.10 to find the probability of selecting a score in your shaded area
Here we want p(X > 60) or p(z > 1.00)
Look up the smaller portion of z=1.00
p(z > 1.00) = .1587
Example #1
• Step #6: Interpret:
The probability of picking someone at random who has a personality test score of 60 or greater is .1587
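Steps 4 and 5 of Example #1 can be sketched in code, assuming Python's `statistics.NormalDist` as a substitute for Table E.10; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma, x = 50, 10, 60  # Step 1: what we know

z = (x - mu) / sigma          # Step 4: z score for X = 60
p = 1 - NormalDist().cdf(z)   # Step 5: smaller portion (area above z)

print(z, round(p, 4))
```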
Example #2
• Length of time spent waiting in line to buy tickets at the movies is normally distributed with a mean of 12 minutes and a population standard deviation of 3 minutes.
• If you go to see a movie, what is the probability that you will wait in line to buy tickets for between 7.5 and 15 minutes?
Example #2
• Step #1: Write down what you know
μ = 12, σ = 3, X1 = 7.5, X2 = 15
• Step #2: What do you want to find?
p(7.5 < X < 15)
• Step #3: Draw the normal distribution, write in the mean, standard deviation, and both X scores and shade the area you are looking for
Example #2, Step #3
[Figure: normal curve with the x-axis marked X: 3 6 7.5 9 12 15 18 21]
Example #2
• Step #4: Calculate z score(s)
z1 = (X1 - μ)/σ = (7.5 - 12)/3 = -4.5/3 = -1.50
z2 = (X2 - μ)/σ = (15 - 12)/3 = 3/3 = 1.00
• Step #5: Use Table E.10 to find the probability of selecting a score in your shaded area
Here we want p(7.5 < X < 15) or p(-1.50 < z < 1.00)
Mean to z for z = 1.00 is .3413
Mean to z for z = -1.50 is .4332
Example #2
• Add the two areas together! (Each represents the mean to z, so adding them together gives you the overall shaded area) = .3413+.4332=.7745
p(-1.50 < z < 1.00) = .7745
Example #2
• Step #6: Interpret:
The probability of waiting in line to buy tickets at the movie for between 7.5 and 15 minutes is .7745. (Note: This means that you will wait in line for between 7.5 and 15 minutes 77.45% of the time).
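Example #2 can be checked the same way. This is a sketch under the assumption that Python's `statistics.NormalDist` stands in for Table E.10; the names `wait`, `z1`, and `z2` are chosen for illustration.

```python
from statistics import NormalDist

wait = NormalDist(mu=12, sigma=3)  # minutes in line, from the example

z1 = (7.5 - wait.mean) / wait.stdev  # z score for X1 = 7.5
z2 = (15 - wait.mean) / wait.stdev   # z score for X2 = 15

# Area between the two scores: (mean to z1) + (mean to z2),
# since the scores lie on opposite sides of the mean
p_between = wait.cdf(15) - wait.cdf(7.5)

print(z1, z2, round(p_between, 4))
```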