Introduction to Statistics. Objectives: Understand certain statistical concepts terminology...


Introduction to Statistics


Objectives:
• Understand certain statistical concepts & terminology
• Describe types of measurement scales
• Differentiate between descriptive and inferential statistics
• Identify measures of central tendency and understand their uses
• Identify measures of dispersion


Introduction

• Statistics - a set of concepts, rules, and procedures that help us to:
– organize numerical information in the form of tables, graphs, and charts;
– understand statistical techniques underlying decisions that affect our lives and well-being; and
– make informed decisions.


Descriptive Statistics

• Statistics is a branch of mathematics designed to allow people to accomplish two goals:

1. The first is to accurately describe data and trends in data (descriptive statistics).

Descriptive statistics: Collection, classification, analysis, and interpretation of data.

- Any method or formula which yields some number and tells us about a set of data is referred to as descriptive statistics.


2. The second is to make predictions on future behavior, based on current data (predictive statistics).

Predictive statistics: Using statistics generated from a sample in order to make predictions. This is also often called inferential statistics.


Terminology

• Data - facts, observations, and information that come from investigations. There are two types of data:

1. Measurement data, sometimes called quantitative data -- the result of using some instrument to measure something (e.g., test score, weight).


data

2. Categorical data, also referred to as frequency or qualitative data. Things are grouped according to some common property (or properties) and the number of members of each group is recorded (e.g., males/females, vehicle type).


Variable

• Variable - a property of an object or event that can take on different values. For example, college major is a variable that takes on values like mathematics, computer science, English, psychology, etc.

• Discrete Variable - a variable with a limited number of values (e.g., gender (male/female), employee level (junior/senior)).


Variable

• Continuous Variable - a variable that can take on many different values; in theory, any value between the lowest and highest points on the measurement scale.

• Independent Variable - a variable that is manipulated, measured, or selected by the researcher as an antecedent condition to an observed behavior. In a hypothesized cause-and-effect relationship, the independent variable is the cause and the dependent variable is the outcome or effect.


• Dependent Variable - a variable that is not under the experimenter's control -- the data. It is the variable that is observed and measured in response to the independent variable.

• Qualitative Variable - a variable based on categorical data.

• Quantitative Variable - a variable based on quantitative data.


Types of Measurement Scales

1. Nominal:

For qualitative data with distinct categories that have no inherent order. For example, German, French, and Italian are distinct categories but are not ordered in any way.


2. Ordinal: For data with distinct categories in which ordering (or ranking) is implied, although the distance between categories is not defined. A good example is the Likert scale that you see on many surveys:

1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree.


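3. Interval: For quantitative data with an ordered scale in which the interval between data values is meaningful, but which has no true zero. Temperature in degrees Celsius is an example: the difference between 10 and 20 degrees is the same as between 20 and 30 degrees, yet 20 degrees is not "twice as hot" as 10 degrees. Contrast this with a merely ordinal scale such as rank in the military. Clearly a major is higher ranked than a captain, but how much higher? Does a major have twice the authority of a captain? It is impossible to say. You can only say that a major is higher ranked.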


4. Ratio: For quantitative data which have an inherently defined zero and for which the ratio of data values is meaningful. Weight in kilograms is a very good example, since the ratio of one weight to another is well defined: 50 kg is indeed twice as heavy as 25 kg.


Two Types of Statistics

• Descriptive statistics of a POPULATION
• Relevant notation (Greek):
– $\mu$ mean
– $N$ population size
– $\sum$ sum

• Inferential statistics of SAMPLES from a population.
– Assumptions are made that the sample reflects the population in an unbiased form. Roman notation:
– $\bar{X}$ mean
– $n$ sample size
– $\sum$ sum


Measures of Central Tendency

• These measures describe the center of a distribution, i.e. the typical score or value in the data.
– Mean
– Median
– Mode


What is “Mean”?

The “mean” of some data is the average score or value, such as the average age of an MPA student or average weight of professors that like to eat donuts.

Mean of a sample: $\bar{X} = (\sum X)/n$

Mean of a population: $\mu = (\sum X)/N$


Mean

• The mean is the most common measure of central tendency and the one that can be mathematically manipulated. It is defined as the average of a distribution and is equal to $\sum X / N$. Simply, the mean is computed by summing all the scores in the distribution ($\sum X$) and dividing that sum by the total number of scores ($N$).


Mean

• The mean is the balance point in a distribution such that if you subtract the mean from each value in the distribution and sum all of these deviation scores, the result will be zero.

Example: 2, 5, 8, 10, 12, 17
Mean = 54/6 = 9
Deviations: -7, -4, -1, 1, 3, 8; their sum is zero.
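The balance-point property can be checked with a few lines of Python. This is only an illustrative sketch using the six scores from the example; the variable names are chosen here for readability.

```python
# Scores from the example above.
scores = [2, 5, 8, 10, 12, 17]

# Mean: sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Deviation score: each value minus the mean.
deviations = [x - mean for x in scores]

print(mean)             # 9.0
print(deviations)       # [-7.0, -4.0, -1.0, 1.0, 3.0, 8.0]
print(sum(deviations))  # 0.0
```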


Problem of being “mean”

• The main problem associated with the mean value of some data is that it is sensitive to outliers (extreme values).

• For example, the average weight of 10 students would be pulled upward if one of them weighed 200 kg.
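A small sketch of this effect, using Python's standard `statistics` module. The ten weights below are hypothetical values invented for illustration (nine ordinary weights plus one student at 200 kg); they do not come from the slides.

```python
import statistics

# Hypothetical weights in kg: nine typical students and one outlier (200 kg).
weights_with_outlier = [55, 58, 60, 62, 64, 66, 68, 70, 72, 200]
weights_without_outlier = weights_with_outlier[:-1]

mean_with = statistics.mean(weights_with_outlier)
mean_without = statistics.mean(weights_without_outlier)
median_with = statistics.median(weights_with_outlier)
median_without = statistics.median(weights_without_outlier)

# The single extreme value shifts the mean by more than 13 kg,
# while the median moves by only 1 kg.
print(mean_with, mean_without)      # 77.5 and roughly 63.9
print(median_with, median_without)  # 65.0 and 64
```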


The Median

• Because the mean can be sensitive to extreme values, the median is sometimes more useful and more representative of a typical value.

• The median is simply the middle value among the ordered scores of a variable (it is located by its position in the ordered data rather than computed arithmetically from all the scores).


The median

• The median is the score that divides the distribution into halves; half of the scores are above the median and half are below it when the data are arranged in numerical order. The median is also referred to as the score at the 50th percentile in the distribution.


• When we have an odd number of observations, the median location formula, (N + 1)/2, yields an integer that gives the position of the median in the numerically ordered distribution. For example, in the distribution of numbers (3 1 5 4 9 9 8), N = 7, so the median location is the 4th value.

• When applied to the ordered distribution (1 3 4 5 8 9 9), the value 5 is the median; three scores are above 5 and three are below 5.


The Median

• If there were only 6 values (1 3 4 5 8 9), the median location in this case is half-way between the 3rd and 4th scores (4 and 5) or 4.5.


What is the Median?

Boxer            Weight
Schmuggles       165
Bopsey           213
Pallitto         189
Homer            187
Schnickerson     165
Levin            148
Honkey-Doorey    251
Zingers          308
Boehmer          151
Queenie          132
Googles-Boop     199
Calzone          227
Mean             194.6

Weights in rank order: 132, 148, 151, 165, 165, 187, 189, 199, 213, 227, 251, 308

Rank order and choose middle value.

If the number of values is even, then the median is the average of the two values in the middle.
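The rank-and-pick procedure can be sketched as follows, using the twelve boxer weights from the table. The dictionary and variable names are illustrative choices, not part of the original slides.

```python
# Boxer weights from the table above.
boxers = {
    "Schmuggles": 165, "Bopsey": 213, "Pallitto": 189, "Homer": 187,
    "Schnickerson": 165, "Levin": 148, "Honkey-Doorey": 251, "Zingers": 308,
    "Boehmer": 151, "Queenie": 132, "Googles-Boop": 199, "Calzone": 227,
}

ordered = sorted(boxers.values())  # rank order
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    # Odd count: the single middle value is the median.
    median = ordered[mid]
else:
    # Even count: average the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2

mean = sum(ordered) / n

print(median)          # 188.0 (average of 187 and 189)
print(round(mean, 1))  # 194.6
```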


The Mode

• Mode - The mode of a distribution is simply defined as the most frequent or common response or value for a variable.

• Multiple modes are possible: bimodal or multimodal.


Figuring the Mode

Boxer            Weight
Schmuggles       165
Bopsey           213
Pallitto         189
Homer            187
Schnickerson     165
Levin            148
Honkey-Doorey    251
Zingers          308
Boehmer          151
Queenie          132
Googles-Boop     199
Calzone          227

What is the mode?

Answer: 165 (it occurs twice; every other weight occurs only once).
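One way to find the mode programmatically is to count how often each value occurs and keep the value or values with the highest count. The sketch below uses `collections.Counter` from the Python standard library and returns a list so that bimodal or multimodal data are handled too.

```python
from collections import Counter

# Boxer weights from the table above.
weights = [165, 213, 189, 187, 165, 148, 251, 308, 151, 132, 199, 227]

counts = Counter(weights)       # value -> number of occurrences
highest = max(counts.values())  # the largest frequency

# Keep every value that reaches the largest frequency (there may be several).
modes = [value for value, count in counts.items() if count == highest]

print(modes)  # [165]
```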


Percentiles

• If we know the median, then we can go up or down and rank the data as being above or below certain thresholds.

• You may be familiar with standardized tests: scoring at the 90th percentile means your score was higher than 90% of the rest of the sample.


To calculate the kth percentile (where k is any number between zero and one hundred), do the following steps:

1. Order all the values in the data set from smallest to largest.

2. Multiply k percent by the total number of values, n. This number is called the index.

3. If the index obtained in Step 2 is not a whole number, round it up to the nearest whole number and go to Step 4a. If the index obtained in Step 2 is a whole number, go to Step 4b.


4a. (Index is not a whole number) Count the values in your data set from left to right (from the smallest to the largest value) until you reach the number indicated by Step 3. The corresponding value in your data set is the kth percentile.

4b. (Index is a whole number) Count the values in your data set from left to right until you reach the number indicated by Step 2. The kth percentile is the average of that corresponding value in your data set and the value that directly follows it.


For example, suppose you have 25 test scores, and in order from lowest to highest they look like this: 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99. To find the 90th percentile for these (ordered) scores, start by multiplying 90% times the total number of scores, which gives 90% × 25 = 0.90 × 25 = 22.5 (the index). Rounding up to the nearest whole number, you get 23.

Counting from left to right (from the smallest to the largest value in the data set), you go until you find the 23rd value in the data set. That value is 98, and it is the 90th percentile for this data set.

Check: 22 of the 25 scores lie below 98, and 22/25 = 0.88.


Now say you want to find the 20th percentile. Start by taking 0.20 x 25 = 5 (the index); this is a whole number, so proceed from Step 3 to Step 4b, which tells you the 20th percentile is the average of the 5th and 6th values in the ordered data set (62 and 66). The 20th percentile then comes to (62 + 66) ÷ 2 = 64.

The median (the 50th percentile) for the test scores is the 13th score: 77.
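The percentile steps above can be sketched as a short Python function. The function name `percentile` is a hypothetical choice, and the sketch assumes k is a whole number strictly between 0 and 100 so that integer arithmetic can decide whether the index is a whole number. Other textbooks and libraries use different percentile conventions, so results from other tools may differ slightly.

```python
def percentile(values, k):
    """Return the k-th percentile using the counting steps described above.

    Assumes k is an integer with 0 < k < 100 and values is non-empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")

    ordered = sorted(values)               # Step 1: order the values
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)  # Step 2: index = k% of n

    if remainder != 0:
        position = whole + 1               # Step 3: round the index up
        return ordered[position - 1]       # Step 4a: value at that position
    # Step 4b: average the value at the index and the value that follows it
    return (ordered[whole - 1] + ordered[whole]) / 2


scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(percentile(scores, 90))  # 98
print(percentile(scores, 20))  # 64.0
print(percentile(scores, 50))  # 77
```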


Measures of Dispersion

• Measures of dispersion tell us about variability in the data.

• Basic question: how much do values differ for a variable from the minimum to maximum, and what is the distance among scores in between? We use:
– Range
– Standard Deviation
– Variance


• Remember that we said in order to assemble information from data, i.e. to make an inference, we need to see variability in our variables.

• Measures of dispersion give us information about how much our variables vary from the mean, because if they don’t, it is difficult to infer anything from the data. Dispersion is also known as the spread or range of variability.


The Range

• Range = highest - lowest

r = h – l, where h is the highest value and l is the lowest value

• In other words, the range gives us the distance between the minimum and maximum values of a variable.

• Understanding this statistic is important in understanding your data, especially for management and diagnostic purposes.


Example:

Problem: Cheryl took 7 math tests in one marking period. What is the range of her test scores?

89, 73, 84, 91, 87, 77, 94

Solution: Ordering the test scores from least to greatest, we get:

73, 77, 84, 87, 89, 91, 94

highest - lowest = 94 - 73 = 21
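In code, the range does not require sorting; taking the maximum and minimum is enough. A minimal sketch with Cheryl's scores:

```python
# Cheryl's seven test scores, in the order given in the problem.
scores = [89, 73, 84, 91, 87, 77, 94]

# Range = highest value minus lowest value.
score_range = max(scores) - min(scores)

print(score_range)  # 21
```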


The Standard Deviation

• A standardized measure of distance from the mean.

• Very useful and something you will often read about when making predictions or other statements about the data.


Standard Deviation

• most popular and important measure of variability

• a measure of how far all of the individual scores in the distribution are from a standard (mean)


Standard Deviation

[Figure: distributions around the same mean; low variability gives a small SD, high variability gives a large SD]


Formula for Standard Deviation

$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

where:
– $\sqrt{\ }$ = square root
– $\sum$ = sum (sigma)
– $X$ = score for each point in data
– $\bar{X}$ = mean of scores for the variable
– $n$ = sample size (number of observations or cases)


Example: Calculate the SD for the following values: 4, 2, 5, 8, 6.

1. Calculate the mean: (4 + 2 + 5 + 8 + 6)/5 = 5
2. Calculate the deviation from the mean for each value in the sample: 4-5=-1, 2-5=-3, 5-5=0, 8-5=3, 6-5=1
3. Square each deviation and sum the squares: 1 + 9 + 0 + 9 + 1 = 20
4. Calculate the standard deviation:

$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{20}{4}} = \sqrt{5} \approx 2.24$
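The same calculation can be followed step by step in Python. This is a sketch of the sample formula (dividing by n - 1); the cross-check at the end uses `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

values = [4, 2, 5, 8, 6]
n = len(values)

# Step 1: the mean.
mean = sum(values) / n                               # 5.0

# Steps 2 and 3: square each deviation from the mean, then sum the squares.
squared_deviations = [(x - mean) ** 2 for x in values]
sum_of_squares = sum(squared_deviations)             # 20.0

# Step 4: divide by (n - 1) to get the variance, then take the square root.
variance = sum_of_squares / (n - 1)                  # 5.0
sd = math.sqrt(variance)

print(round(sd, 2))  # 2.24

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(values))
```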


Variance

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

• Note that this is the same equation as for the standard deviation, except that no square root is taken.

• It is not often directly reported in research but instead is a building block for other statistical methods.


Standard Deviation of the Mean or the standard error (SE)

• It is the variation in means of repeated samples.

• SE = standard deviation divided by the square root of n: $SE = S / \sqrt{n}$.


Coefficient of Variation

• It measures variability in relation to the mean (or average).

• Used to compare the relative dispersion of more than one data set. Data to be compared may be in the same units or in different units, with similar means or with different means.

CV = standard deviation divided by the mean, multiplied by 100 to express it as a percentage.

$CV = (S / M) \times 100$
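Both the standard error and the coefficient of variation follow directly from the standard deviation. The sketch below reuses the five values 4, 2, 5, 8, 6 purely as illustrative data; the variable names are assumptions made here.

```python
import math
import statistics

values = [4, 2, 5, 8, 6]
n = len(values)

mean = statistics.mean(values)  # M
sd = statistics.stdev(values)   # S, the sample standard deviation (n - 1)

# Standard error: standard deviation divided by the square root of n.
se = sd / math.sqrt(n)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = sd / mean * 100

print(round(se, 2))  # 1.0
print(round(cv, 1))  # 44.7
```

Because the CV is unit-free, it can be compared across data sets measured in different units, which the raw standard deviation cannot.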


Goal of Graphing?

1. Presentation of Descriptive Statistics

2. Presentation of Evidence

3. Some people understand subject matter better with visual aids.

4. Provide a sense of the underlying data generating process (data pattern).


Normal Distribution

• Most widely used continuous distribution
• Also known as the Gaussian distribution
• Symmetric


Graphing Data: Histograms


Graphing Data: Bar Graph


Pie Charts:

Proportions of Donut-Eating Professors by Weight Class

[Figure: pie chart with weight classes 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+]


Line Graphs: A Time Series


Frequency Distribution Table

VAR00003

Valid    Frequency   Percent   Valid Percent   Cumulative Percent
1.00         2          7.7          7.7               7.7
2.00         3         11.5         11.5              19.2
3.00         3         11.5         11.5              30.8
4.00         5         19.2         19.2              50.0
5.00         4         15.4         15.4              65.4
6.00         2          7.7          7.7              73.1
7.00         4         15.4         15.4              88.5
8.00         2          7.7          7.7              96.2
9.00         1          3.8          3.8             100.0
Total       26        100.0        100.0
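The percent and cumulative percent columns can be derived from the frequencies alone. The sketch below starts from the nine frequencies in the table and rebuilds the other columns; the variable names are illustrative, and it assumes there are no missing cases, so "Percent" and "Valid Percent" coincide.

```python
# Frequencies for the values 1 through 9 of VAR00003, as in the table above.
frequencies = {1: 2, 2: 3, 3: 3, 4: 5, 5: 4, 6: 2, 7: 4, 8: 2, 9: 1}

total = sum(frequencies.values())

rows = []
running = 0  # running count used for the cumulative percent
for value, count in frequencies.items():
    running += count
    percent = round(100 * count / total, 1)
    cumulative = round(100 * running / total, 1)
    rows.append((value, count, percent, cumulative))

for value, count, percent, cumulative in rows:
    print(f"{value:>5} {count:>9} {percent:>8} {cumulative:>10}")
print(f"Total {total:>9}")
```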


Properties of a Distribution

• Shape
– symmetric vs. skewed
– unimodal vs. multimodal

• Central Tendency
– where most of the data are
– mean, median, and mode

• Variability (spread)
– how similar the scores are
– range, variance, and standard deviation


Representing a Distribution

• Often it is helpful to visually represent distributions in various ways.

• Graphs
– continuous variables (histogram, line graph)
– categorical variables (pie chart, bar chart)

• Tables
– frequency distribution table


Shape of a Distribution

• Symmetrical (normal)
– scores are equally distributed about the central tendency (i.e., mean)


Shape of a Distribution

• Skewed
– extreme high or low scores can skew the distribution in either direction

[Figure: two panels, negative skew (long tail to the left) and positive skew (long tail to the right)]


Shape of a Distribution

• Unimodal

• Multimodal

[Figure: multimodal distribution with a minor mode and a major mode]


Central Tendency

• Mode: the most frequent score
– good for nominal scales (eye color)

• Median: the middle score
– separates the bottom 50% and the top 50% of the distribution
– good for skewed distributions (net worth)


Central Tendency

• Mean: the arithmetic average
– add all of the scores and divide by the total number of scores
– This is the preferred measure of central tendency (takes all of the scores into account)

population: $\mu = \frac{\sum X}{N}$

sample: $\bar{X} = \frac{\sum X}{n}$


Central Tendency

• Is the mean always the best measure of central tendency?

• No, skew pulls the mean in the direction of the skew


Central Tendency and Skew

If negative skew:

[Figure: negatively skewed distribution with the mode, median, and mean marked]


Central Tendency and Skew

If positive skew:

[Figure: positively skewed distribution with the mode, median, and mean marked]


Normal Distribution

• Gives us a picture of the variability and central tendency.


Normal Distribution

$P(Y \le \mu) = 0.50$

$P(\mu - \sigma \le Y \le \mu + \sigma) = 0.68$

$P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.95$


Standard Deviation

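• In a normal distribution (which is perfectly symmetrical), about 2/3 of the scores will fall within +/- 1 standard deviation of the mean (suppose the mean is 6.4 and SD = 3.27).

[Figure: normal curve with the mean at 6.4, -1 SD at 3.13, and +1 SD at 9.67]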
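These proportions can be checked numerically with `statistics.NormalDist` (available in the Python standard library from version 3.8). The sketch assumes a normal distribution with the mean of 6.4 and SD of 3.27 used in the slide.

```python
from statistics import NormalDist

# Normal distribution with the slide's mean and standard deviation.
dist = NormalDist(mu=6.4, sigma=3.27)

low = dist.mean - dist.stdev   # one SD below the mean
high = dist.mean + dist.stdev  # one SD above the mean

# Probability of falling within 1 SD and within 2 SD of the mean.
within_1sd = dist.cdf(high) - dist.cdf(low)
within_2sd = (dist.cdf(dist.mean + 2 * dist.stdev)
              - dist.cdf(dist.mean - 2 * dist.stdev))

print(round(low, 2), round(high, 2))  # 3.13 9.67
print(round(within_1sd, 2))           # 0.68
print(round(within_2sd, 2))           # 0.95
```

The "2/3" in the slide is a rounded version of the 0.68 figure.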