Basic Statistics
![Page 1: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/1.jpg)
Basic Statistics
Measures of Variability
![Page 2: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/2.jpg)
Measures of Variability
• The Range
• Deviation Score
• The Standard Deviation
• The Variance
![Page 3: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/3.jpg)
STRUCTURE OF STATISTICS
STATISTICS
• DESCRIPTIVE: TABULAR, GRAPHICAL, NUMERICAL
• INFERENTIAL: CONFIDENCE INTERVALS, TESTS OF HYPOTHESIS
Continuing with numerical approaches (NUMERICAL).
![Page 4: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/4.jpg)
STRUCTURE OF STATISTICS: NUMERICAL DESCRIPTIVE MEASURES
• DESCRIPTIVE: TABULAR, GRAPHICAL, NUMERICAL
• NUMERICAL: CENTRAL TENDENCY, VARIABILITY, SYMMETRY
![Page 5: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/5.jpg)
STRUCTURE OF STATISTICS: NUMERICAL DESCRIPTIVE MEASURES
• NUMERICAL: CENTRAL TENDENCY, VARIABILITY, SYMMETRY
• VARIABILITY: RANGE, VARIANCE, STANDARD DEVIATION
![Page 6: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/6.jpg)
You are an elementary school teacher who has been assigned a class of fifth graders whose mean IQ is 115. Because children with an IQ of 115 can handle more complex, abstract material, you plan many sophisticated projects for the year.
Do you think your projects will succeed?
[Figure: IQ distribution of the general population, with the axis marked at 55, 70, 85, 100, 115, 130, 145 and a region labeled 85%]
We need the variability of IQs in the class!
![Page 7: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/7.jpg)
Having graduated from college, you are considering two offers of employment, one in sales and the other in management. The pay is about the same for both. After checking the statistics for salespersons and managers at the library, you find that those who have been working for 5 years in each type of job also have similar average salaries.
Can you conclude that the pay for two occupations is equal?
Is the average salary enough? We need the variability!
[Figure: salary distributions for Management and Sales, with labels "Much less", "$20,000", and "Much more"]
![Page 8: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/8.jpg)
Group of scores → Central Tendency measures → Single score
IQ of 100 students → Mean IQ = 118
![Page 9: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/9.jpg)
Central Tendency Measures
???
Measures of Central Tendency do not tell you the differences that exist among the scores
More homogeneous
![Page 10: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/10.jpg)
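Central Tendency
Same Mean, Different Variability. So What?
[Figure: four distributions (1 to 4) with the same mean and different spreads; a point at 60 is marked with the question "How many are out here?"]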
![Page 11: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/11.jpg)
1. The Range = the difference between the largest score (Xmax) and the smallest score (Xmin).
Scores: 25, 21, 22, 23, 28, 26, 24, 29
In order: 21, 22, 23, 24, 25, 26, 28, 29
Range = 29 - 21 = 8
A large range means there is a lot of variability in the data.
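The range calculation above can be sketched in a few lines of Python. The function name `value_range` is only an illustrative choice; the scores are the ones shown on the slide.

```python
# Scores from the slide, in their original (unsorted) order
scores = [25, 21, 22, 23, 28, 26, 24, 29]

def value_range(data):
    """Return the range: the largest value minus the smallest value."""
    return max(data) - min(data)

print(sorted(scores))        # the scores arranged in order
print(value_range(scores))   # 29 - 21 = 8
```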
![Page 12: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/12.jpg)
Drawbacks of The Range
Scores: 10, 28, 26, 27, 29
In order: 10, 26, 27, 28, 29
Range = 29 - 10 = 19
Without the extreme score 10: Range = 29 - 26 = 3
The Range depends on only the two extreme scores.
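A small sketch of how a single extreme score drives the range, using the five scores from this slide. Treating 10 as the extreme score to drop is an assumption made for illustration.

```python
def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

scores = [10, 28, 26, 27, 29]
# Assumption for illustration: 10 is the extreme score we set aside
without_extreme = [x for x in scores if x != 10]

print(value_range(scores))           # 29 - 10 = 19
print(value_range(without_extreme))  # 29 - 26 = 3
```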
![Page 13: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/13.jpg)
Because the range is determined by just two scores in the group, it ignores the spread of all scores except the largest and smallest.
One aberrant score, or outlier, can greatly increase the range.
Range and Extreme Observations
![Page 14: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/14.jpg)
Range and Measurement Scales
Variable: Country Code | SES | F | Age
Scores: 1, 2, 3 | 1, 2, 3 | 1, 2, 3 | 1, 2, 3
Range: 3-1=2 | 3-1=2 | 3-1=2 | 3-1=2
Country Code: 1=American, 2=Asian, 3=Mexican
SES: 1=Upper, 2=Middle, 3=Lower
Before you determine the Range, all scores must be arranged in order.
![Page 15: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/15.jpg)
3. The Variance
[Figure: a scatter of individual scores ranging from 27 to 95]
Differences among Scores
![Page 16: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/16.jpg)
Total Variability = Sum of Individual Variability
How can you determine the variability of each individual in a group?
[Figure: scores 72, 67, 70, and 55 with the differences between them]
The amount of individual difference depends entirely on the comparison criterion.
![Page 17: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/17.jpg)
Can you figure out the total amount of difference among the scores?
Can you figure out how much each score differs from the other scores?
![Page 18: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/18.jpg)
Scores: 45, 46, 47, 48, 49, 50, 51, 52, 53
Reference score? Mean Score
You need a common criterion for computing Total Variability.
![Page 19: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/19.jpg)
Reference score? 49
You need a common criterion for computing Total Variability.
Score: 45, 46, 47, 48, 49, 50, 51, 52, 53
Deviation: -4, -3, -2, -1, 0, +1, +2, +3, +4
Deviation Scores
A deviation score tells you how far a particular score deviates, or differs, from the mean.
![Page 20: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/20.jpg)
DEVIATION SCORE = (Xi - Mean)
A score a great distance from the mean will have a large deviation score.
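As a minimal sketch, deviation scores for the nine scores used on the earlier slides (45 through 53) can be computed like this; the variable names are illustrative.

```python
# The nine scores from the earlier slides; their mean is 49
scores = [45, 46, 47, 48, 49, 50, 51, 52, 53]

mean = sum(scores) / len(scores)

# Deviation score = (Xi - Mean) for every score
deviations = [x - mean for x in scores]

for x, d in zip(scores, deviations):
    print(f"{x}: {d:+.0f}")
```

Scores far from 49, such as 45 and 53, get the largest deviation scores in absolute size.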
[Figure: scores A through F plotted at different distances from the Mean]
![Page 21: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/21.jpg)
Total amount of variability?!
Sum of all distance values!
Sum of Deviation Scores
No way!
conceptually
mathematically
![Page 22: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/22.jpg)
The idea makes sense…but
If you compute the sum of the deviation scores, it always equals zero!
Sum of deviation scores = (-4) + (-3) + (-2) + (-1) + 0 + (1) + (2) + (3) + (4) = 0
![Page 23: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/23.jpg)
The Sum of Absolute Deviation Scores
Sum of absolute deviation scores = 4 + 3 + 2 + 1 + 0 + 1 + 2 + 3 + 4 = 20
The sum of absolute deviations is rarely used as a measure of variability because the process of taking absolute values does not provide meaningful information for inferential statistics.
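Both sums can be checked with a short sketch, again assuming the nine scores 45 through 53 with mean 49. The names `sum_dev` and `sum_abs_dev` are illustrative.

```python
scores = list(range(45, 54))          # 45, 46, ..., 53
mean = sum(scores) / len(scores)      # 49.0
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel out
sum_dev = sum(deviations)

# Taking absolute values removes the signs before summing
sum_abs_dev = sum(abs(d) for d in deviations)

print(sum_dev)      # 0.0
print(sum_abs_dev)  # 20.0
```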
![Page 24: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/24.jpg)
Sum of Squares of deviation scores
“SS”
Conceptually
And
Mathematically
![Page 25: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/25.jpg)
Sum of Squares of Deviation Scores, SS
Instead of working with the absolute values of deviation scores, it is preferable to (1) square each deviation score and (2) sum them to obtain a quantity known as the Sum of Squares.
$$SS = \sum_{i} (X_i - \bar{X})^2$$
$$SS = (-4)^2 + (-3)^2 + (-2)^2 + (-1)^2 + 0^2 + (1)^2 + (2)^2 + (3)^2 + (4)^2 = 16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16 = 60$$
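The two steps (square each deviation, then sum) translate directly into code. This is a sketch using the scores 45 through 53; `sum_of_squares` is an illustrative helper name.

```python
def sum_of_squares(data):
    """Sum of squared deviations from the mean (SS)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

scores = list(range(45, 54))   # deviations run from -4 to +4
ss = sum_of_squares(scores)
print(ss)                      # 60.0
```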
![Page 26: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/26.jpg)
Group of scores "A": SS(A) = 30
Group of scores "B": SS(B) = 40
Can you say that the variability of the data in Group B is greater than that of the data in Group A?
So !
![Page 27: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/27.jpg)
What happens to SS when we look at some data?
Group A: 3, 4 (Mean = 3.5)
Group B: 3, 4, 3, 4 (Mean = 3.5)
$$SS_A = (3 - 3.5)^2 + (4 - 3.5)^2 = 0.50$$
$$SS_B = (3 - 3.5)^2 + (4 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 = 1.00$$
![Page 28: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/28.jpg)
$$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$$
SS tends to increase as the number of data points (N) increases.
SS is not appropriate for comparing variability among groups of unequal size.
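A quick sketch of this limitation: repeating the same two values doubles SS even though the spread of the data has not changed. The group contents follow the 3, 4 example from the slides; the helper name is illustrative.

```python
def sum_of_squares(data):
    """Sum of squared deviations from the mean (SS)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

small_group = [3, 4]          # N = 2
large_group = [3, 4, 3, 4]    # same values repeated, N = 4

ss_small = sum_of_squares(small_group)
ss_large = sum_of_squares(large_group)

print(ss_small)   # 0.5
print(ss_large)   # 1.0
```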
How can you overcome this limitation of SS? Use the mean.
![Page 29: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/29.jpg)
If SS is divided by N, the resulting value is the mean of the squared deviation scores (Mean Square): the VARIANCE.
$$S^2 = \frac{SS}{N} = \frac{\sum (X_i - \bar{X})^2}{N}$$
![Page 30: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/30.jpg)
Group A: 3, 4 (Mean = 3.5)
Group B: 3, 4, 3, 4 (Mean = 3.5)
$$V_A = \frac{(3 - 3.5)^2 + (4 - 3.5)^2}{2} = \frac{0.50}{2} = 0.25$$
$$V_B = \frac{(3 - 3.5)^2 + (4 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2}{4} = \frac{1.00}{4} = 0.25$$
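Dividing SS by N puts groups of different sizes on the same footing. A minimal sketch for the two groups, assuming the smaller group is 3, 4 and the larger is 3, 4, 3, 4; `variance_n` is an illustrative name for the divide-by-N variance.

```python
def variance_n(data):
    """Variance as SS divided by N (mean of the squared deviations)."""
    mean = sum(data) / len(data)
    ss = sum((x - mean) ** 2 for x in data)
    return ss / len(data)

v_small = variance_n([3, 4])         # 0.50 / 2
v_large = variance_n([3, 4, 3, 4])   # 1.00 / 4

print(v_small)   # 0.25
print(v_large)   # 0.25
```

Both groups have the same variance, which matches the intuition that their scores are equally spread out.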
![Page 31: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/31.jpg)
Variance
Population Variance
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Sample Variance
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
![Page 32: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/32.jpg)
POPULATION VARIANCE
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
• $\sigma^2$: sigma squared (population variance)
• $X$: individual value
• $\mu$: population mean
• $N$: population size
![Page 33: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/33.jpg)
SAMPLE VARIANCE
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
• $s^2$: sample variance
• $X$: individual value
• $\bar{X}$: sample mean
• $n - 1$: sample size minus 1 (the degrees of freedom)
The sample variance ($s^2$) is used to estimate the population variance ($\sigma^2$).
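Python's standard `statistics` module offers both versions: `pvariance` divides by N and `variance` divides by n - 1. The sketch below applies them to the scores 45 through 53 from the deviation-score slides (SS = 60); treating those scores first as a whole population and then as a sample is only for illustration.

```python
import statistics

scores = list(range(45, 54))   # nine scores, SS = 60

# Population variance: SS / N = 60 / 9
pop_var = statistics.pvariance(scores)

# Sample variance: SS / (n - 1) = 60 / 8
sample_var = statistics.variance(scores)

print(pop_var)
print(sample_var)   # 7.5
```

The sample variance is always the larger of the two, because the same SS is divided by a smaller number.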
![Page 34: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/34.jpg)
Why n-1 instead of n?
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
![Page 35: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/35.jpg)
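[Figure: a sample of size n drawn from a population; the population is labeled 100, the sample is labeled "? < 100 < ?", and the gap between them is labeled "Sampling error"]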
![Page 36: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/36.jpg)
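The sum of squared deviations is smaller about $\bar{X}$ than about any other value.
Hence, in a sample, the value of $\sum (X - \bar{X})^2 / n$ would be less than (or at most equal to) $\sum (X - \mu)^2 / n$.
$$\frac{\sum (X - \mu)^2}{n} \ge \frac{\sum (X - \bar{X})^2}{n}$$
Ideally, a sample variance would be based on $\sum (X - \mu)^2$. This is impossible since $\mu$ is not known if one has only a sample of n cases. $\mu$ is substituted by $\bar{X}$.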
![Page 37: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/37.jpg)
$$\underbrace{\frac{\sum (X - \mu)^2}{n}}_{\text{Ideal sample variance}} \ge \frac{\sum (X - \bar{X})^2}{n}$$
One could correct for this bias by dividing by a factor somewhat less than n: n-1.
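The bias can be seen exactly, without simulation, by listing every possible sample from a tiny population. The sketch below is a hypothetical illustration: the population 2, 4, 6, 8 and the sample size n = 2 (drawn with replacement) are assumptions, not values from the slides.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # hypothetical population, mean 5
n = 2                       # sample size, drawn with replacement

def ss(data):
    """Sum of squared deviations from the data's own mean."""
    m = Fraction(sum(data), len(data))
    return sum((x - m) ** 2 for x in data)

# True population variance: SS / N
pop_var = ss(population) / len(population)

# All 16 equally likely samples of size 2
samples = list(product(population, repeat=n))

# Average of the two candidate estimators over every sample
avg_div_n = sum(ss(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(pop_var)             # 5
print(avg_div_n)           # 5/2, too small on average
print(avg_div_n_minus_1)   # 5, matches the population variance
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 gives the right value on average.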
![Page 38: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/38.jpg)
Degrees of freedom
Sample: n = 5
$$\bar{X} = \frac{\sum X}{n} \quad\Rightarrow\quad 5 = \frac{?}{5}$$
We know that ? must equal 25.
If we know that the mean is equal to 5, and the first 4 scores add to 18, then the last score MUST equal 7.
Only n-1 scores are free to change.
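The same arithmetic as a short sketch: once the mean and the first four scores are fixed, the fifth score has no freedom left. Variable names are illustrative.

```python
n = 5
mean = 5
first_four_total = 18   # sum of the four scores that were free to vary

# The mean fixes the total: sum of X = mean * n
required_total = mean * n

# The last score is whatever is left over
last_score = required_total - first_four_total

print(required_total)   # 25
print(last_score)       # 7
```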
![Page 39: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/39.jpg)
4. Standard Deviation SD
The positive square root of the variance
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$
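In Python's standard `statistics` module, `pstdev` and `stdev` give the population and sample standard deviations. A sketch using the scores 45 through 53 from the earlier slides (SS = 60), treated both ways for illustration:

```python
import math
import statistics

scores = list(range(45, 54))   # nine scores, SS = 60

# Population SD: square root of SS / N
sigma = statistics.pstdev(scores)

# Sample SD: square root of SS / (n - 1)
s = statistics.stdev(scores)

print(sigma, math.sqrt(60 / 9))
print(s, math.sqrt(60 / 8))
```

Unlike the variance, the standard deviation is expressed in the same units as the original scores.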
![Page 40: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/40.jpg)
The Standard Deviation and the Mean with Normal Distribution
![Page 41: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/41.jpg)
Normal Distribution
Relationship between $\mu$ and $\sigma$
[Figure: normal curve with the axis marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ]
![Page 42: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/42.jpg)
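Normal Distribution
Relationship between $\bar{X}$ and S
[Figure: normal curve with the axis marked at -3S, -2S, -1S, +1S, +2S, +3S; about 68% of scores lie within ±1S, 95% within ±2S, and 99.7% within ±3S]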
![Page 43: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/43.jpg)
EMPIRICAL RULE
• For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within 1 standard deviation of the mean; approximately 95% within 2 standard deviations of the mean; and approximately 99.7% within 3 standard deviations of the mean.
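For an exactly normal distribution, the share of observations within k standard deviations of the mean equals erf(k / √2). The sketch below evaluates that with the standard `math.erf` function; `within_k_sd` is an illustrative helper name.

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))   # percent within k SDs
```

Real data that are only roughly bell-shaped will match these percentages only approximately.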
![Page 44: Basic Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062410/568156fe550346895dc4a403/html5/thumbnails/44.jpg)
You can approximately reproduce your data!
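If a set of data has Mean = 50 and SD = 10, then…
[Figure: normal curve with the axis marked at 20, 30, 40, 50, 60, 70, 80; about 68% of the data fall between 40 and 60, about 95% between 30 and 70, and about 99.7% between 20 and 80]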
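The intervals for this example follow from adding and subtracting multiples of the SD. A minimal sketch, assuming the data are roughly bell-shaped so that the empirical rule applies:

```python
mean, sd = 50, 10

# Interval within k standard deviations of the mean, for k = 1, 2, 3
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"within {k} SD: {low} to {high}")
```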