I. Introduction to Data and Statistics
![Page 1: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/1.jpg)
A. Basic terms and concepts
Data set
- variable
- observation
- data value
![Page 2: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/2.jpg)
[Table: example data set for the Central Gulf States. Observations: TX, LA, AL, MS. Variables: age > 65, age < 19, Rent ($).]
![Page 3: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/3.jpg)
B. Primary and Secondary data
1. Primary data
- original data
- collected for a specific purpose
- sample design and procedures
- time and $
![Page 4: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/4.jpg)
2. Secondary data
- archival data
- agency or organization
- organized in a set format
- time and $
- data quality an issue
- sample design
![Page 5: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/5.jpg)
C. Individual and spatially aggregated data
[Figure: individual units (State 1, State 2, State 3, State 4) aggregated into a Region]
![Page 6: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/6.jpg)
D. Discrete and Continuous data
1. Discrete
![Page 7: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/7.jpg)
2. Continuous
![Page 8: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/8.jpg)
E. Qualitative and Quantitative data
1. Qualitative (categorical)
Ex: land cover, sex, political party, race
2. Quantitative
Ex: population, precipitation, grades
![Page 9: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/9.jpg)
II. Scales of Measurement
A. Nominal
B. Ordinal
C. Interval
D. Ratio
For comparison, variables must use the same scale of measurement
![Page 10: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/10.jpg)
A. Nominal
- Mutually exclusive
- Exhaustive
Ex:
Name: George = 1, Wanda = 2, Bob = 3
Land Cover: Forested = 45, urban = 39, etc...
Climate regimes: polar = 1, temperate = 2, tropical = 3
Sex: Male = 1, Female = 2
![Page 11: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/11.jpg)
B. Ordinal
- ranked data
- arbitrary
- comparisons
- not a set interval between rankings
Ex:
Places rated (cities, beaches…)
Level of satisfaction (poor, ok, good)
![Page 12: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/12.jpg)
C. Interval
- separated by absolute differences
- does not have an absolute zero
Ex:
- temperature
- elevation
![Page 13: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/13.jpg)
D. Ratio
- separated by absolute differences
- absolute zero
Ex:
- precipitation
- tree growth
- income
![Page 14: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/14.jpg)
III. Graphing procedures (univariate)
A. frequency histogram
B. cumulative histogram
![Page 15: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/15.jpg)
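A. frequency histogram
[Figure: frequency histogram with a frequency polygon overlaid. Vertical axis: Freq. (#, %). Horizontal axis: income, grades, running from (-) to (+) with ticks at 0, 50, 100.]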
![Page 16: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/16.jpg)
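B. Cumulative frequency histogram
[Figure: cumulative frequency histogram with a cumulative frequency polygon overlaid. Vertical axis: Cumulative Freq. (#, %). Horizontal axis: running from (-) to (+) with ticks at 0, 50, 100.]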
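As a rough sketch of how the counts behind a frequency histogram and its cumulative version could be tabulated, the Python snippet below bins a small list of grades. The grade values, the bin width of 10, and the helper name `frequency_table` are illustrative assumptions and do not come from the slides.

```python
from itertools import accumulate


def frequency_table(values, bin_width, low, high):
    """Count how many values fall in each bin between low and high.

    Bins are [low, low + w), [low + w, low + 2w), ...; the last bin also
    includes the upper limit. Values are assumed to lie within [low, high].
    """
    n_bins = int((high - low) / bin_width)
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - low) // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical grades, invented for illustration only.
grades = [52, 67, 71, 75, 78, 81, 84, 88, 93, 100]

counts = frequency_table(grades, bin_width=10, low=50, high=100)
cumulative = list(accumulate(counts))                      # running total (#)
percent = [100 * c / len(grades) for c in cumulative]      # running total (%)

for i, (c, cum, pct) in enumerate(zip(counts, cumulative, percent)):
    lower = 50 + 10 * i
    print(f"{lower}-{lower + 10}: freq={c} cumulative={cum} ({pct:.0f}%)")
```

Plotting `counts` as bars gives the frequency histogram; plotting `cumulative` (or `percent`) gives the cumulative version, which always ends at the total number of observations (or 100%).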
![Page 17: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/17.jpg)
IV. Descriptive Statistics (univariate)
- summary of data characteristics
- inferential; extend sample to a larger population
A. Measures of Central Tendency
B. Measures of Dispersion
C. Measures of Shape
![Page 18: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/18.jpg)
A. Measures of Central Tendency
• attempt to define the most typical value of a larger data set
1. Mode
2. Median
3. Mean (average)
![Page 19: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/19.jpg)
1. Mode
• value that occurs most frequently
• only measure of central tendency appropriate for nominal level data
• works better for grouped data, not raw values
• many data sets will not have two identical data values
![Page 20: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/20.jpg)
2. Median
• the middle value from a set of ranked observations
• equal number of observations on either side
• appropriate when data is heavily skewed
• interval or ratio level data, not nominal
![Page 21: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/21.jpg)
3. Mean (average), $\bar{X} = \sum x_i / n$
• most commonly used value of central tendency
• interval or ratio level data
• sensitive to outliers
• most easily understood
• assumptions:
  • unimodal
  • symmetric distribution
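The three measures can be compared on a small example. The sketch below uses Python's standard `statistics` module on a made-up data set with one large outlier (the values are an assumption chosen for illustration), to show why the median is preferred for skewed data while the mean is sensitive to outliers.

```python
import statistics

# Hypothetical observations; the value 40 is a deliberate outlier.
values = [2, 3, 3, 4, 5, 6, 40]

data_mode = statistics.mode(values)      # most frequent value
data_median = statistics.median(values)  # middle value of the ranked data
data_mean = statistics.mean(values)      # sum of x_i divided by n

print("mode:", data_mode)
print("median:", data_median)
print("mean:", data_mean)
```

Here the single outlier pulls the mean well above both the mode and the median, which stay close to where most of the observations lie.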
![Page 22: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/22.jpg)
[Figure: Normal distribution on an axis from (-) 0 to 100 (+); mode, median, and mean marked at 50]
![Page 23: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/23.jpg)
[Figure: distribution on an axis from (-) 0 to 100 (+), midpoint 50, with mode, median, and mean marked]
![Page 24: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/24.jpg)
B. Measures of Dispersion
• provide information about distribution of data
1. Range
2. Standard deviation
3. Coefficient of variation
![Page 25: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/25.jpg)
1. Range
difference between largest and smallest value
• simplest measure of dispersion
• easy to calculate
• can be misleading
  • ignores all other values
  • does not take into account clustering of data
![Page 26: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/26.jpg)
2. Standard deviation
• the average deviation of each value from the mean
• based on the mean
• better indicator of the dispersion of the entire sample (in comparison to the range)
• scale dependent value
![Page 27: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/27.jpg)
3. Coefficient of variation
• standard deviation / mean
• allows you to compare dispersion independent of scale
• should be used to make comparisons where there are differences in mean
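The following sketch computes all three dispersion measures for a small made-up sample and for the same sample multiplied by 10. The data values and helper names are assumptions for illustration. The slides do not say whether a sample or population standard deviation is intended, so the sample form (`statistics.stdev`, dividing by n - 1) is used here as an assumption.

```python
import statistics


def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical sample with mean 50, and the same sample on a 10x larger scale.
small_scale = [15, 40, 50, 60, 85]
large_scale = [v * 10 for v in small_scale]

for name, data in (("small", small_scale), ("large", large_scale)):
    print(
        name,
        "range:", value_range(data),
        "std dev:", round(statistics.stdev(data), 2),
        "C.V.:", round(coefficient_of_variation(data), 3),
    )
```

The range and standard deviation grow tenfold with the scale of the data, while the coefficient of variation is the same for both samples, which is why it suits comparisons between data sets with different means.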
![Page 28: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/28.jpg)
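[Figure: distribution on an axis from (-) 0 to 100 (+), with values spanning 15 to 85 around the mean at 50]
Range: 85 - 15 = 70
$\bar{X} = 50$
Std. dev. ~ $\sum (x_i - \bar{X})$
C.V. = Std. dev. / mean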
![Page 29: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/29.jpg)
C.V. = Std. dev. / mean
![Page 30: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/30.jpg)
C. Measures of Shape
1. Skewness
2. Kurtosis
![Page 31: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/31.jpg)
Leptokurtic
Mesokurtic
Platykurtic
![Page 32: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/32.jpg)
(-) skew
(+) skew
Symmetrical (bell shaped)
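The slides name skewness and kurtosis without giving formulas. One common choice, assumed here, is the moment-based definition: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3 (so a normal distribution scores 0). The sketch below applies it to two made-up data sets.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: > 0 leptokurtic, < 0 platykurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data sets, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print("symmetric skew:", skewness(symmetric))
print("right-tailed skew:", skewness(right_tailed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A positive result from `skewness` corresponds to the (+) skew case, a negative result to the (-) skew case, and a value near zero to a symmetrical distribution.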
![Page 33: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/33.jpg)
Mean Center

| I.D. | Xi  | Yi  |
|------|-----|-----|
| A    | 2.8 | 1.5 |
| B    | 1.6 | 3.8 |
| C    | 3.5 | 3.3 |
| D    | 4.4 | 2.0 |
| E    | 4.3 | 1.1 |
| F    | 5.2 | 2.4 |
| G    | 4.9 | 3.5 |
![Page 34: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/34.jpg)
[Figure: points A (2.8, 1.5), B (1.6, 3.8), C (3.5, 3.3), D (4.4, 2.0), E (4.3, 1.1), F (5.2, 2.4), G (4.9, 3.5) plotted on a horizontal axis from 0 to 6 and a vertical axis from 0 to 4]
![Page 35: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/35.jpg)
[Figure: points A (2.8, 1.5), B (1.6, 3.8), C (3.5, 3.3), D (4.4, 2.0), E (4.3, 1.1), F (5.2, 2.4), G (4.9, 3.5) plotted on a horizontal axis from 0 to 6 and a vertical axis from 0 to 4, with the Mean Center marked at (3.81, 2.51)]
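The mean center is the mean of the X coordinates paired with the mean of the Y coordinates. A short Python sketch using the seven points A to G from the slides reproduces the value (3.81, 2.51); the function and variable names are illustrative assumptions.

```python
# Coordinates of points A to G as listed on the slides.
points = {
    "A": (2.8, 1.5),
    "B": (1.6, 3.8),
    "C": (3.5, 3.3),
    "D": (4.4, 2.0),
    "E": (4.3, 1.1),
    "F": (5.2, 2.4),
    "G": (4.9, 3.5),
}


def mean_center(coords):
    """Return (mean of X, mean of Y) for a collection of (x, y) pairs."""
    coords = list(coords)
    n = len(coords)
    mean_x = sum(x for x, _ in coords) / n
    mean_y = sum(y for _, y in coords) / n
    return mean_x, mean_y


center_x, center_y = mean_center(points.values())
print(round(center_x, 2), round(center_y, 2))
```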
![Page 36: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/36.jpg)
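Weighted Mean Center

| I.D. | Xi  | Yi  | f (w) |
|------|-----|-----|-------|
| A    | 2.8 | 1.5 | 5     |
| B    | 1.6 | 3.8 | 20    |
| C    | 3.5 | 3.3 | 8     |
| D    | 4.4 | 2.0 | 4     |
| E    | 4.3 | 1.1 | 6     |
| F    | 5.2 | 2.4 | 5     |
| G    | 4.9 | 3.5 | 3     |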
![Page 37: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/37.jpg)
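[Figure: points A to G plotted on a horizontal axis from 0 to 6 and a vertical axis from 0 to 4, labeled with their weights: A (5), B (20), C (8), D (4), E (6), F (5), G (3)]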
![Page 38: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/38.jpg)
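| I.D. | Xi  | Yi  | f (w) | wXi  | wYi  |
|------|-----|-----|-------|------|------|
| A    | 2.8 | 1.5 | 5     | 14   | 7.5  |
| B    | 1.6 | 3.8 | 20    | 32   | 76   |
| C    | 3.5 | 3.3 | 8     | 28   | 26.4 |
| D    | 4.4 | 2.0 | 4     | 17.6 | 8.0  |
| E    | 4.3 | 1.1 | 6     | 25.8 | 6.6  |
| F    | 5.2 | 2.4 | 5     | 26   | 12   |
| G    | 4.9 | 3.5 | 3     | 14.7 | 10.5 |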
![Page 39: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/39.jpg)
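[Figure: points A to G plotted on a horizontal axis from 0 to 6 and a vertical axis from 0 to 4, labeled with their weights: A (5), B (20), C (8), D (4), E (6), F (5), G (3), with the Weighted Mean Center marked at (3.10, 2.88)]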
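The weighted mean center divides the sum of the weighted coordinates by the sum of the weights. The sketch below uses the coordinates and weights f (w) from the slides and reproduces (3.10, 2.88); the function and variable names are illustrative assumptions.

```python
# (Xi, Yi, weight) for points A to G, as listed on the slides.
weighted_points = {
    "A": (2.8, 1.5, 5),
    "B": (1.6, 3.8, 20),
    "C": (3.5, 3.3, 8),
    "D": (4.4, 2.0, 4),
    "E": (4.3, 1.1, 6),
    "F": (5.2, 2.4, 5),
    "G": (4.9, 3.5, 3),
}


def weighted_mean_center(rows):
    """Return (sum(w * x) / sum(w), sum(w * y) / sum(w))."""
    rows = list(rows)
    total_weight = sum(w for _, _, w in rows)
    sum_wx = sum(w * x for x, _, w in rows)
    sum_wy = sum(w * y for _, y, w in rows)
    return sum_wx / total_weight, sum_wy / total_weight


wx, wy = weighted_mean_center(weighted_points.values())
print(round(wx, 2), round(wy, 2))
```

Point B carries a weight of 20 out of a total of 51, so the weighted center is pulled toward B relative to the unweighted mean center of (3.81, 2.51).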
![Page 40: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/40.jpg)
![Page 41: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/41.jpg)
Correlation
- Bivariate relationship
1. Direction: negative or positive
2. Strength of relationship: perfect, strong, weak, no
Scattergrams
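The slides describe direction and strength qualitatively and do not name a specific statistic. One widely used measure of linear correlation, assumed here, is Pearson's r: its sign gives the direction, and its magnitude (0 to 1) gives the strength. The sketch below computes it from scratch for made-up data; all values and names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical variables, invented for illustration.
xs = [1, 2, 3, 4, 5]
perfect_positive = [2, 4, 6, 8, 10]   # rises exactly with xs
perfect_negative = [10, 8, 6, 4, 2]   # falls exactly as xs rises
weak = [3, 1, 4, 2, 3]                # little linear pattern

print("perfect positive:", pearson_r(xs, perfect_positive))
print("perfect negative:", pearson_r(xs, perfect_negative))
print("weak:", round(pearson_r(xs, weak), 3))
```

Because Pearson's r measures only linear association, a threshold or curvilinear pattern in a scattergram can produce a low r even when the two variables are clearly related.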
![Page 42: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/42.jpg)
[Figure: scattergram showing positive (direct) correlation]
![Page 43: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/43.jpg)
[Figure: scattergram showing negative (inverse) correlation]
![Page 44: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/44.jpg)
[Figure: scattergram showing perfect correlation]
![Page 45: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/45.jpg)
[Figure: scattergram showing strong correlation]
![Page 46: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/46.jpg)
[Figure: scattergram showing weak correlation]
![Page 47: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/47.jpg)
[Figure: scattergram captioned "No correlation ??"]
![Page 48: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/48.jpg)
[Figure: scattergram showing controlled correlation]
![Page 49: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/49.jpg)
[Figure: scattergram showing controlled correlation (clumping)]
![Page 50: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/50.jpg)
![Page 51: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/51.jpg)
[Figure: scattergram showing a threshold relationship]
![Page 52: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/52.jpg)
[Figure: scattergram showing a curvilinear relationship]
![Page 53: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/53.jpg)
![Page 54: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/54.jpg)
![Page 55: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/55.jpg)
![Page 56: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/56.jpg)
![Page 57: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/57.jpg)
![Page 58: I. Introduction to Data and Statistics](https://reader035.fdocuments.in/reader035/viewer/2022062500/56815225550346895dc06ae0/html5/thumbnails/58.jpg)