Descriptive statistics

DESCRIPTIVE STATISTICS: Adapted from the Presentation of Mrs. Zennifer L. Oberio. Presented by: Balbido, Aileen U.; Latap, Kenneth John R.; Tejuco, Kerwin Chester C.

Transcript of Descriptive statistics

Page 1: Descriptive statistics

DESCRIPTIVE STATISTICS

Adapted from the Presentation of Mrs. Zennifer L. Oberio

Presented by: Balbido, Aileen U.; Latap, Kenneth John R.; Tejuco, Kerwin Chester C.

Page 2: Descriptive statistics

STATISTICS

The study of how to collect, organize, analyze, and interpret numerical information

DESCRIPTIVE STATISTICS

Page 3: Descriptive statistics

STATISTICS: Two Categories

• Descriptive Methods: Used to summarize or describe a data set (a set of measurements obtained on some variable)

• Inferential Methods: Used to draw conclusions or to make inferences about a population based on the observation of a sample

Page 4: Descriptive statistics

Nature of Statistical Data/Levels of Measurement

1. Nominal
2. Ordinal
3. Interval
4. Ratio

Importance: The nature of a set of data may suggest the use of particular statistical techniques

Page 5: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

1. NOMINAL

Categories or Qualities

Numbers are used simply as labels for groups or classes.

Numbers convey no numerical information.

Examples: 1 for YES, 2 for NO; 1 – Red, 2 – Yellow, 3 – Green

Page 6: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

2. ORDINAL

Data may be ordered, using inequalities, according to their size or quality.

Example: Mohs’ Scale of Hardness: 1 – Talc, 2 – Gypsum, 3 – Calcite, 4 – Fluorite

Page 7: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

2. ORDINAL

Example: Mohs’ Scale of Hardness: 1 – Talc, 2 – Gypsum, 3 – Calcite, 4 – Fluorite

Data may be ranked (but there is no indication of how much of the variable exists).

3 > 2: Calcite is harder than gypsum.

Differences and ratios between data values are meaningless, so neither of the following statements is justified:

2 – 1 = 4 – 3: "The difference in hardness between gypsum and talc is equal to the difference in hardness between fluorite and calcite."

4 ÷ 2 = 2: "Fluorite is twice as hard as gypsum."

Page 8: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

3. INTERVAL

Differences between data values represent equal amounts in the magnitude of the variable measured.

There is no true zero (a zero that marks the complete absence of the variable measured).

Example: Temperatures in degrees Fahrenheit and degrees Celsius

Page 9: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

3. INTERVAL

Example: Temperatures in degrees Fahrenheit and degrees Celsius

Ranking and taking differences are permitted.

100˚F > 98˚F: 100˚F is warmer than 98˚F.

100˚F - 98˚F = 52˚F - 50˚F: The same amount of heat is required to raise the temperature of an object from 98˚F to 100˚F and from 50˚F to 52˚F.

Ratios are meaningless.

It is not true that 100˚F is twice as hot as 50˚F: in degrees Celsius, 100˚F is 37.8˚C and 50˚F is 10˚C, a ratio of about 3.8 rather than 2.
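The Fahrenheit example can be checked numerically. The short Python sketch below converts the same pair of temperatures to Celsius. The ratio changes with the scale, but equal Fahrenheit differences stay equal. The helper name `fahrenheit_to_celsius` is illustrative only.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


hot, mild = 100.0, 50.0

# Ratio computed on the Fahrenheit scale
ratio_f = hot / mild
# Ratio of the same two temperatures computed on the Celsius scale
ratio_c = fahrenheit_to_celsius(hot) / fahrenheit_to_celsius(mild)

print(f"Fahrenheit ratio: {ratio_f:.2f}")  # Fahrenheit ratio: 2.00
print(f"Celsius ratio:    {ratio_c:.2f}")  # Celsius ratio:    3.78

# Differences, by contrast, remain equal after conversion
diff_high = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(98)
diff_low = fahrenheit_to_celsius(52) - fahrenheit_to_celsius(50)
print(f"{diff_high:.3f} C vs {diff_low:.3f} C")  # 1.111 C vs 1.111 C
```

Because the two ratios disagree, the ratio is a property of the scale rather than of the temperatures. That is why ratios are not meaningful at the interval level.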

Page 10: Descriptive statistics

Nature of Statistical Data/ Levels of Measurement

4. RATIO

Has a true zero as a starting point for all measurements.

Example: length, height, elapsed time, volume

Ranking, taking differences, and taking ratios are permitted.
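The four levels can be summarized as a lookup of which operations are meaningful at each level, following the slides above. This is a minimal sketch. The names `PERMITTED` and `is_permitted`, and the operation labels, are hypothetical.

```python
# Operations the slides describe as meaningful at each level of measurement.
# Each level permits everything the level before it permits, plus one more operation.
PERMITTED = {
    "nominal": {"classify"},                               # numbers are only labels
    "ordinal": {"classify", "rank"},                       # e.g. Mohs' hardness
    "interval": {"classify", "rank", "difference"},        # e.g. degrees F or C
    "ratio": {"classify", "rank", "difference", "ratio"},  # e.g. length, elapsed time
}


def is_permitted(level: str, operation: str) -> bool:
    """Return True if `operation` is meaningful for data at the given `level`."""
    return operation in PERMITTED[level]


print(is_permitted("ordinal", "difference"))  # False
print(is_permitted("interval", "ratio"))      # False
print(is_permitted("ratio", "ratio"))         # True
```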

Page 11: Descriptive statistics

Generating Data

The Mach Pyramid: Is this a pyramid with a small square projecting out toward you?

OR

Is this a room with the small square as the far wall?

Page 12: Descriptive statistics

Generating Data

The Mach Pyramid

Within a 1-minute period, how long (in seconds) does it take for a person to see this as a pyramid?

Page 13: Descriptive statistics

The Data Set from 25 subjects (The Mach Pyramid Test)

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

Page 14: Descriptive statistics

Descriptive Statistics – provides a picture of the data set

1. What is the shape of the distribution? Do the values tend to fall into some recognizable pattern?

2. What is the location of the variable? That is, where are the numbers centered?

3. How much variation is involved? Are the values widely dispersed, or are they all fairly close in value?

Page 15: Descriptive statistics

Picturing the Distribution

The distribution of a data set is a listing of the frequencies of occurrence of the measurements in the data set.

Tools:

1. Stem-and-Leaf Display

2. Dot plot

3. Frequency Distributions

4. Graphical Presentations
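Since the distribution is defined above as a listing of frequencies, a frequency distribution of the Mach Pyramid data can be produced in a few lines of Python. The sketch uses `collections.Counter`, and the data are the 25 values from the slides.

```python
from collections import Counter

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

freq = Counter(data)  # maps each value to the number of times it occurs

print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {sum(freq.values()):>9}")  # Total should equal the 25 subjects
```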

Page 16: Descriptive statistics

DOTPLOT: Data Set from 25 subjects (The Mach Pyramid Test)

[Dot plot of the values 1 2 2 3 3 3 4 4 5 5 5 6 6 7 7 7 7 8 8 9 9 10 10 11 11: one dot per subject, stacked above a number line from 1 to 11]
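A plain-text dot plot is one way to picture the same data without graphics software. The function below stacks one `*` per observation above its value. The function name `text_dotplot` and the 3-character column layout are illustrative choices, not taken from the slides.

```python
from collections import Counter

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def text_dotplot(values, lo, hi):
    """Return a text dot plot: one '*' per observation, stacked above its value."""
    counts = Counter(values)
    height = max(counts.values())
    rows = []
    # Build rows from the top down; a column gets a '*' if its count reaches that level.
    for level in range(height, 0, -1):
        rows.append("".join(" * " if counts[v] >= level else "   " for v in range(lo, hi + 1)))
    # Axis labels, each centered in a 3-character column so 10 and 11 line up.
    rows.append("".join(f"{v:^3}" for v in range(lo, hi + 1)))
    return "\n".join(rows)


print(text_dotplot(data, 1, 11))
```

The tallest stack sits above 7, the mode.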

Page 17: Descriptive statistics

The Shape of The Distribution

Symmetric bell-shaped (data tend to cluster about a center point)

Skewed right or positively skewed (data are clustered off-center to the left)

Skewed left or negatively skewed (data are clustered off-center to the right)

Page 18: Descriptive statistics

Describing the ‘Center’ of the Data Set: Measures of Central Location

1. Mean: The ratio of the sum of all the values in the data set to the total number of values in the data set.

2. Median: The middle value of an ordered data set if the total number of values is odd. If the total number of values is even, it is the mean of the two middle values.

3. Mode: The most frequently occurring value in the data set.
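The three definitions above translate directly into code. This is a from-scratch sketch rather than a library API; the function names are illustrative. `modes` returns a list because the most frequent value need not be unique.

```python
from collections import Counter


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(median([3, 1, 2]))     # 2    (odd n: middle value)
print(median([4, 1, 3, 2]))  # 2.5  (even n: mean of 2 and 3)
print(modes([1, 2, 2, 3]))   # [2]
```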

Page 19: Descriptive statistics

Measures of Central Location

1. Mean = 6.12: The ratio of the sum of all the values in the data set to the total number of values in the data set.

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

Mean = sum of the values ÷ number of values = 153 ÷ 25 = 6.12

Data set from 25 subjects

Page 20: Descriptive statistics

Measures of Central Location

1. Mean = 6.12

2. Median = 6: The middle value of an ordered data set if the total number of values is odd. If the total number of values is even, it is the mean of the two middle values. Here n = 25 is odd, so the median is the 13th ordered value.

3. Mode = 7: The most frequently occurring value in the data set (7 occurs four times).

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

Data set from 25 subjects
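The three values on this slide can be confirmed with Python's standard `statistics` module. `statistics.multimode` (Python 3.8+) is used instead of `statistics.mode` because it reports every value tied for most frequent.

```python
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

print(sum(data), len(data))        # 153 25
print(statistics.mean(data))       # 6.12
print(statistics.median(data))     # 6
print(statistics.multimode(data))  # [7]
```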

Page 21: Descriptive statistics

Describing the ‘Center’ of the Data Set: Measures of Central Location

1. Mean: Summarizes all the information in the data set.

2. Median: Splits the data set into two halves; there are an equal number of values above and below it.

3. Mode: The most common value in the data set.

Page 22: Descriptive statistics

Measures of Central Location: Mean, Median, Mode

MEAN

Merits: Takes into account every value in the data set. Always exists and is unique. Most useful of the three for inferential statistics.

Shortcomings: Can be influenced by extremely high or low values (outliers).

Page 23: Descriptive statistics

Measures of Central Location: Mean, Median, Mode

MEDIAN

Merits: Not easily affected by outliers (extreme values). Always exists and is unique.

Shortcomings: Less reliable than the mean: the medians of many samples drawn from the same population will vary more widely than the corresponding sample means.
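The contrast between the mean's sensitivity to outliers and the median's resistance can be seen by changing a single value. The change below is hypothetical: one 11-second response is replaced by 60 seconds, the longest time possible in the 1-minute test. The mean moves by about 2 seconds, and the median does not move at all.

```python
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

# Hypothetical change: the slowest subject takes the full 60 seconds instead of 11
with_outlier = sorted(data)[:-1] + [60]

print(statistics.mean(data), statistics.median(data))                  # 6.12 6
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 8.08 6
```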

Page 24: Descriptive statistics

Measures of Central Location: Mean, Median, Mode

MODE

Merits: Requires no calculation, only counting.

Shortcomings: Not a stable measure, since it depends on only a few values. May not exist. May not be unique.
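The last two shortcomings can be illustrated with `statistics.multimode`, which returns every value tied for the highest frequency. When every value occurs equally often, it returns all of them. By the slide's convention, that means the data set has no mode in the usual sense.

```python
import statistics

print(statistics.multimode([1, 2, 2, 3, 3]))  # [2, 3]        two modes: not unique
print(statistics.multimode([1, 2, 3, 4]))     # [1, 2, 3, 4]  all values tie: no mode in the usual sense
print(statistics.multimode([1, 2, 2, 3]))     # [2]           a single mode
```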

Page 25: Descriptive statistics

The Shape of The Distribution

Symmetric bell-shaped

Skewed right or positively skewed

Skewed left or negatively skewed

Page 26: Descriptive statistics

Locating the “Centers” in the DOTPLOT

Mean = 6.12, Median = 6, Mode = 7

[Dot plot of the data on a number line from 1 to 11, with the mean, median, and mode marked]

Page 27: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

1. Range: The largest value minus the smallest value in a data set.

2. Semi-interquartile range (SIQR): One half of the difference between the 75th percentile and the 25th percentile.

3. Standard Deviation: The square root of the average of the squared deviations from the mean (for a sample, the sum of the squared deviations is divided by n - 1 rather than n).

Page 28: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

1. Range = 10

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

Data set from 25 subjects

The largest value minus the smallest value in a data set

Range = 11 - 1 = 10
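The range calculation above amounts to one subtraction. As a short Python sketch:

```python
# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

smallest, largest = min(data), max(data)
data_range = largest - smallest  # uses only the two extreme values
print(f"Range = {largest} - {smallest} = {data_range}")  # Range = 11 - 1 = 10
```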

Page 29: Descriptive statistics

Measures of Variability: Range, SIQR, SD

RANGE

Merits: A ‘quick and easy’ indication of variability.

Shortcomings: Provides no indication of the dispersion of the values that fall between the two extremes. A relatively unstable measure of variability, because it can be influenced by a change in the highest or lowest value.

Page 30: Descriptive statistics

Measures of Variability: Range, SIQR, SD

SEMI-INTERQUARTILE RANGE

Merits: More resistant to extreme values than the range.

Shortcomings: Does not use all the values in the data set for its computation.
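The range's instability and the SIQR's resistance can be compared by changing only the largest value, using the same hypothetical 60-second response as before. This sketch takes quartiles from Python's `statistics.quantiles(..., method="inclusive")`. That percentile convention differs from the one used later in these slides, so its SIQR (2.0) is not the slides' 2.375. The point here is only that the SIQR does not change while the range does.

```python
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def range_and_siqr(values):
    """Return (range, SIQR), with quartiles from statistics.quantiles(method="inclusive")."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), (q3 - q1) / 2


# Hypothetical change: the slowest subject takes the full 60 seconds instead of 11
changed = sorted(data)[:-1] + [60]

print(range_and_siqr(data))     # (10, 2.0)
print(range_and_siqr(changed))  # (59, 2.0)
```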

Page 31: Descriptive statistics

Measures of Variability: Range, SIQR, SD

STANDARD DEVIATION

Merits: Uses all the values in the data for its computation.

Page 32: Descriptive statistics

Descriptive Statistics: What to Use?

Considerations in choosing:

• The scale of the measurement represented by the data set

• The shape of the distribution

• The intended use of the descriptive statistics for further statistical analysis

Page 33: Descriptive statistics

Describing the ‘Spread’ and the Center of the Data

Mean = 6.12, Median = 6, Mode = 7

1. Range = 10

Indicates the variation between the smallest and the largest values in the data set, but does not tell how much the other values vary.

Page 34: Descriptive statistics

Describing the ‘Spread’ and the Center of the Data

Mean = 6.12, Median = 6, Mode = 7

2. Semi-interquartile range = 2.375

Describes the spread of the middle half of the values: roughly 25% of the total items lie within one SIQR below the median, and roughly 25% within one SIQR above it. If the SIQR is small, the values are concentrated near the median.

1 SIQR below the median: 3.625 (6 - 2.375)

1 SIQR above the median: 8.375 (6 + 2.375)

Page 35: Descriptive statistics

Describing the ‘Spread’ and the Center of the Data

3. Standard Deviation = 2.934: A measure of variability based on how far each value is from the mean of the data set

Empirical Rule for symmetric bell-shaped distributions:

• About 68% of the values will lie within 1 standard deviation of the mean.

• About 95% of the values will lie within 2 standard deviations of the mean.

• About 99.7% of the values will lie within 3 standard deviations of the mean.
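The Empirical Rule can be checked against the Mach Pyramid data by counting how many values fall within 1, 2, and 3 standard deviations of the mean. The data are only roughly bell-shaped, so the percentages should not be expected to match 68/95/99.7 exactly; here 60% fall within 1 SD. The sketch uses `statistics.stdev`, the sample SD with n - 1 in the denominator, matching the formula on the slides.

```python
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

xbar = statistics.mean(data)  # 6.12
s = statistics.stdev(data)    # sample SD (n - 1 denominator), about 2.934

for k in (1, 2, 3):
    lo, hi = xbar - k * s, xbar + k * s
    inside = sum(lo <= x <= hi for x in data)  # each True counts as 1
    print(f"within {k} SD [{lo:.3f}, {hi:.3f}]: {inside}/{len(data)} = {inside / len(data):.0%}")
# First line printed: within 1 SD [3.186, 9.054]: 15/25 = 60%
```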

Page 36: Descriptive statistics

Describing the ‘Spread’ and the Center of the Data

3. Standard Deviation = 2.934

If the standard deviation is small, the values are concentrated near the mean. If the standard deviation is LARGE, the values are scattered widely about the mean.

Mean = 6.12, Median = 6, Mode = 7

1 SD below the mean: 3.186 (6.12 - 2.934)

1 SD above the mean: 9.054 (6.12 + 2.934)

Page 37: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

2. Semi-interquartile range: One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

75th percentile: the value in the data set that is exceeded by 25% of the total number of items in the set.

25 × 0.75 = 18.75, so 18.75 is the rank of the 75th percentile; it falls between the 18th and 19th items.

The 18th and 19th items are both 8, so the 75th percentile = 8.

Page 38: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

2. Semi-interquartile range: One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

25th percentile: the value in the data set that is exceeded by 75% of the total number of items in the set.

25 × 0.25 = 6.25, so 6.25 is the rank of the 25th percentile; it falls between the 6th and 7th items.

6th item = 3, 7th item = 4

25th percentile = 3 + (0.25)(4 - 3) = 3.25
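The percentile procedure worked through on the last two slides can be written as a small function. The rank is n × p; the function takes the item at the whole-number part of the rank and interpolates toward the next item using the fractional part. The name `percentile_rank_np` is hypothetical. Returning the smallest value when the rank is at most 1 is an assumption, since the slides do not cover that case.

```python
import math

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def percentile_rank_np(values, p):
    """Percentile by the slides' method: rank = n * p, then linear interpolation
    between the item at that rank and the next item (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = n * p                    # e.g. 25 * 0.25 = 6.25
    if rank <= 1:
        return ordered[0]           # assumption: ranks at or before the first item give the smallest value
    k = math.floor(rank)            # 1-based position of the lower item
    frac = rank - k                 # fractional part, used as the interpolation weight
    lower = ordered[k - 1]
    upper = ordered[min(k, n - 1)]  # next item (clamped at the last item)
    return lower + frac * (upper - lower)


print(percentile_rank_np(data, 0.75))  # 8.0
print(percentile_rank_np(data, 0.25))  # 3.25
```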

Page 39: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

2. Semi-interquartile range = 2.375: One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

75th percentile = 8

25th percentile = 3.25

SIQR = ½ (8 - 3.25)

SIQR = 2.375
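The SIQR depends on the percentile convention, and software does not always follow the rank = n × p rule used on these slides. The sketch below compares the slides' result with the two conventions offered by Python's `statistics.quantiles`. Different tools can report different SIQR values for the same data.

```python
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

# Slides' convention (rank = n * p with interpolation): 25th = 3.25, 75th = 8
siqr_slides = (8 - 3.25) / 2

# Two conventions offered by Python's standard library
q1_ex, _, q3_ex = statistics.quantiles(data, n=4, method="exclusive")  # the default
q1_in, _, q3_in = statistics.quantiles(data, n=4, method="inclusive")

print(siqr_slides)          # 2.375
print((q3_ex - q1_ex) / 2)  # 2.5  (quartiles 3.5 and 8.5)
print((q3_in - q1_in) / 2)  # 2.0  (quartiles 4.0 and 8.0)
```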

Page 40: Descriptive statistics

Describing the ‘Spread’ of the Data: Measures of Variability

3. Standard Deviation = 2.934: The square root of the average of the squared deviations from the mean

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

$$ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{206.64}{24}} = \sqrt{8.61} \approx 2.934 $$

where $\bar{x}$ = 6.12 is the mean and n = 25 is the number of values.
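The formula above can be evaluated step by step in Python and checked against the standard library. `statistics.stdev` also divides by n - 1. `statistics.pstdev` divides by n and is shown only for contrast.

```python
import math
import statistics

# Mach Pyramid Test: seconds for each of the 25 subjects (values from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

n = len(data)
xbar = sum(data) / n                       # mean, 6.12
ss = sum((x - xbar) ** 2 for x in data)    # sum of squared deviations from the mean
sd_sample = math.sqrt(ss / (n - 1))        # divide by n - 1, as in the formula above

print(round(ss, 2))                        # 206.64
print(round(sd_sample, 3))                 # 2.934
print(round(statistics.stdev(data), 3))    # 2.934 (library sample SD, n - 1)
print(round(statistics.pstdev(data), 3))   # 2.875 (population SD, divides by n)
```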