Some definitions In Statistics. A sample: Is a subset of the population.


A sample:

Is a subset of the population


In statistics:

One draws conclusions about the population based on data collected from a sample


Reasons:

Cost

It is less costly to collect data from a sample than from the entire population.

Accuracy


Accuracy

Data from a sample sometimes leads to more accurate conclusions than data from the entire population.

Costs saved by using a sample can be directed to obtaining more accurate observations on each case in the sample.


Types of Samples

Different types of samples are determined by how the sample is selected.


Convenience Samples

In a convenience sample the subjects that are most convenient to the researcher are selected as objects in the sample.

This is not a very good procedure for inferential statistical analysis, but it is useful for exploratory preliminary work.


Quota samples

In quota samples subjects are chosen conveniently until quotas are met for different subgroups of the population.

This also is useful for exploratory preliminary work.


Random Samples

Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
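To make the definition concrete, here is a small simulation sketch in Python. It is illustrative only: the four-case population and the sample size are hypothetical, and the standard-library `random.Random.sample` (which draws without replacement) stands in for the selection mechanism. If the sampling is random in the sense above, each of the 6 possible samples of size 2 should turn up in roughly 1/6 of the repeated draws.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical population of 4 cases
n = 2                              # sample size
draws = 60_000                     # number of repeated samples to simulate

rng = random.Random(0)             # fixed seed so the run is reproducible
# Record each sample as a sorted tuple, so ("B", "A") and ("A", "B") count as the same sample.
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(draws))

possible = list(combinations(population, n))  # all C(4, 2) = 6 possible samples
for s in possible:
    # For a random sample, each proportion should be close to 1/6 (about 0.167).
    print(s, round(counts[s] / draws, 3))
```

Exact equality of the proportions is not expected from any finite number of draws; the point is that no sample is systematically favoured, which is what a convenience or quota scheme cannot guarantee.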


Convenience samples and quota samples are useful for preliminary studies. It is, however, difficult to assess the accuracy of estimates based on these sampling schemes.

Sometimes, however, one has to be satisfied with a convenience sample and assume that it is equivalent to a random sampling procedure.


[Figure: Population, Sample, Case; each case has values of the Variables X, Y, Z]


Some other definitions


A population statistic (parameter):

Any quantity computed from the values of variables for the entire population.


A sample statistic:

Any quantity computed from the values of variables for the cases in the sample.


Since only cases from the sample are observed:

– only sample statistics are computed

– these are used to make inferences about population statistics

– it is important to be able to assess the accuracy of these inferences
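The distinction between the two kinds of statistic can be sketched in Python with a simulated population, since a real population is rarely available in full. Everything here is an assumption for illustration: the population is 10,000 normally distributed IQ-like scores (mean 100, standard deviation 15) and the sample size is 50. The parameter uses every case; the sample statistic uses only the sampled cases and differs from the parameter by some error, which is exactly what we need to be able to assess.

```python
import random
import statistics

rng = random.Random(244)  # fixed seed; the population below is simulated, not real data

# Hypothetical population: 10,000 simulated IQ-like scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Population statistic (parameter): computed from every case in the population.
mu = statistics.mean(population)

# Sample statistic: computed only from the 50 cases in a random sample.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
print(f"error of the estimate:       {x_bar - mu:+.2f}")
```

In practice only `x_bar` would be computable; `mu` is known here only because the population was simulated.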


To download lectures:

1. Go to the Stats 244 web site

a) through PAWS, or

b) by going to the website of the Department of Mathematics and Statistics -> People -> Faculty -> W.H. Laverty -> Stats 244 -> Lectures.

2. Then

a) select the lecture

b) right-click and choose Save As.


To print lectures:

1. Open the lecture using MS PowerPoint.

2. Select the menu item File -> Print.


The following dialogue box appears:


In the “Print what” box, select Handouts.


Set Slides per page to 6 or 3.


6 slides per page will result in the least amount of paper being printed

[Figure: handout layout with slides 1 to 6 arranged in three rows of two]


3 slides per page leaves room for notes.

[Figure: handout layout with slides 1, 2 and 3 stacked vertically, leaving space beside each for notes]


Organizing and describing Data


Techniques for continuous variables


The Grouped Frequency Table and the Histogram


To Construct

• A Grouped frequency table

• A Histogram


1. Find the maximum and minimum of the observations.

2. Choose non-overlapping intervals of equal width (The Class Intervals) that cover the range between the maximum and the minimum.

3. The endpoints of the intervals are called the class boundaries.

4. Count the number of observations in each interval (The cell frequency - f).

5. Calculate the relative frequency: relative frequency = f/N


Data Set #3

The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program.

           Verbal   Math   Initial Reading   Final Reading
Student      IQ      IQ      Achievement       Achievement
  1          86      94         1.1               1.7
  2         104     103         1.5               1.7
  3          86      92         1.5               1.9
  4         105     100         2.0               2.0
  5         118     115         1.9               3.5
  6          96     102         1.4               2.4
  7          90      87         1.5               1.8
  8          95     100         1.4               2.0
  9         105      96         1.7               1.7
 10          84      80         1.6               1.7
 11          94      87         1.6               1.7
 12         119     116         1.7               3.1
 13          82      91         1.2               1.8
 14          80      93         1.0               1.7
 15         109     124         1.8               2.5
 16         111     119         1.4               3.0
 17          89      94         1.6               1.8
 18          99     117         1.6               2.6
 19          94      93         1.4               1.4
 20          99     110         1.4               2.0
 21          95      97         1.5               1.3
 22         102     104         1.7               3.1
 23         102      93         1.6               1.9


Class interval   Verbal IQ   Math IQ
70 to 80             1           1
80 to 90             6           2
90 to 100            7          11
100 to 110           6           4
110 to 120           3           4
120 to 130           0           1

In this example the upper endpoint is included in the interval. The lower endpoint is not.
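The five construction steps can be checked against this table with a short Python sketch. It assumes the interval convention just stated (upper endpoint included, lower excluded), starts the classes at 70 with width 10, and uses the Verbal and Math IQ values transcribed from Data Set #3; the function name `grouped_frequency_table` is our own, not from any library.

```python
# Verbal IQ and Math IQ for the 23 students of Data Set #3.
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]

def grouped_frequency_table(data, start, width, n_classes):
    """Rows (lower, upper, f, f/N) for class intervals (lower, upper]:
    the upper endpoint is included and the lower endpoint is not."""
    N = len(data)
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        f = sum(1 for x in data if lower < x <= upper)  # cell frequency
        rows.append((lower, upper, f, f / N))           # relative frequency = f/N
    return rows

# Step 1: the range that the class intervals must cover.
print("Verbal IQ min/max:", min(verbal_iq), max(verbal_iq))  # 80 119
print("Math IQ min/max:", min(math_iq), max(math_iq))        # 80 124

# Steps 2 to 5: six classes of width 10 starting at 70.
verbal = grouped_frequency_table(verbal_iq, 70, 10, 6)
maths = grouped_frequency_table(math_iq, 70, 10, 6)
for (lo, hi, fv, rv), (_, _, fm, rm) in zip(verbal, maths):
    print(f"{lo} to {hi}: Verbal f = {fv} ({rv:.3f}), Math f = {fm} ({rm:.3f})")
```

Because 80 is the minimum of both variables, the lower-endpoint-excluded convention is what places it in the 70 to 80 class rather than 80 to 90.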


Histogram – Verbal IQ

[Figure: frequency (axis 0 to 8) for the class intervals 70 to 80 through 120 to 130]


Histogram – Math IQ

[Figure: frequency (axis 0 to 12) for the class intervals 70 to 80 through 120 to 130]


Example

• In this example we are comparing (for two drugs A and B) the time to metabolize the drug.

• 120 cases were given drug A.

• 120 cases were given drug B.

• Data on time to metabolize each drug is given on the next two slides


Drug A

22.6 17.8 18.8 10.5 6.5 11.8
31.5 6.3 7.2 3.5 4.7 5.1
7.2 11.4 12.9 12.7 5.3 18.0
13.0 6.4 6.3 20.1 7.4 4.1
11.2 8.1 13.6 25.3 2.5 9.0
6.4 5.7 4.3 11.2 18.7 6.5
4.8 3.2 7.5 2.0 5.6 15.4
3.5 13.4 14.1 1.8 2.3 3.9
11.9 7.8 21.9 22.0 7.9 4.8
4.1 16.8 7.4 5.1 6.8 6.3
6.7 9.0 8.8 20.1 12.3 4.3
6.7 8.9 10.5 7.0 10.1 17.4
6.0 10.5 12.6 6.0 14.9 11.3
7.7 13.1 14.9 8.0 19.2 2.7
11.7 6.4 6.2 6.0 10.8 30.0
11.7 21.9 2.9 3.8 9.3 3.1
8.5 6.3 5.2 13.6 14.9 10.9
30.0 6.2 3.8 8.5 11.8 3.3
7.2 5.4 9.7 9.8 12.7 28.3
10.0 17.2 19.6 33.5 1.5 6.4


Drug B

4.2 12.8 3.2 7.8 3.2 8.8
10.4 5.4 5.0 5.1 5.1 14.1
8.2 6.0 4.9 5.9 17.0 2.5
13.4 4.3 2.7 10.3 20.9 15.3
10.5 6.0 14.3 12.4 8.1 5.2
5.6 7.3 9.6 4.7 4.8 7.8
19.0 5.9 10.6 6.3 9.3 11.4
4.5 10.2 2.8 9.4 24.1 9.2
25.9 10.4 12.9 4.5 2.6 10.6
3.2 2.7 4.2 3.3 13.7 3.7
5.5 4.6 2.7 7.5 5.1 5.0
7.8 3.5 5.4 12.6 8.8 8.5
6.0 2.9 4.4 4.1 5.0 12.1
5.3 3.0 5.7 3.0 9.7 8.5
4.8 4.6 7.7 4.8 4.1 6.9
10.8 13.4 5.8 5.3 7.7 12.1
5.4 8.3 4.1 9.3 8.3 8.0
25.2 2.9 11.5 8.8 5.9 4.1
6.6 15.1 12.3 10.9 6.0 2.3
5.1 4.0 5.1 7.4 16.0 2.8


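Grouped frequency tables (as in the IQ example, each interval includes its upper endpoint but not its lower endpoint)

Class interval   Drug A   Drug B
0 to 4             15       19
4 to 8             43       54
8 to 12            26       26
12 to 16           15       15
16 to 20            9        2
20 to 24            6        1
24 to 28            1        3
28 to 32            4        0
32 to 36            1        0
36 to 40            0        0
40 to 44            0        0
44 to 48            0        0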


Histogram – drug A (time to metabolize)

[Figure: frequency (axis 0 to 60) for the class intervals in the table above]


Histogram – drug B (time to metabolize)

[Figure: frequency (axis 0 to 60) for the class intervals in the table above]


Some comments about histograms

• The width of the class intervals should be chosen so that the number of intervals with a frequency less than 5 is small.

• This means that the width of the class intervals can decrease as the sample size increases.


• If the width of the class intervals is too small, the frequency in each interval will be either 0 or 1.

• The histogram will look like this


• If the width of the class intervals is too large, one class interval will contain all of the observations.

• The histogram will look like this


• Ideally one wants the histogram to appear as seen below.

• This will be achieved by making the width of the class intervals as small as possible and only allowing a few intervals to have a frequency less than 5.

[Figure: histogram with class intervals 60 - 65 through 150 - 155; frequency axis 0 to 80]


• As the sample size increases the histogram will approach a smooth curve.

• This is the histogram of the population

[Figure: histogram with class intervals 60 - 65 through 150 - 155; frequency axis 0 to 80]


N = 25

[Figure: histogram for N = 25; class intervals 60 - 70 through 140 - 150; frequency axis 0 to 10]


N = 100

[Figure: histogram for N = 100; class intervals 60 - 70 through 140 - 150; frequency axis 0 to 30]


N = 500

[Figure: histogram for N = 500; class intervals 60 - 65 through 150 - 155; frequency axis 0 to 80]


N = 2000

[Figure: histogram for N = 2000; narrow class intervals labelled 62 - 64 through 142 - 144; frequency axis 0 to 140]


N = ∞

[Figure: smooth curve for N = ∞; x axis 50 to 150; vertical axis 0 to 0.03]
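The approach to a smooth curve can be imitated by simulation. The sketch below assumes, purely for illustration, a normal population with mean 100 and standard deviation 15 (similar in scale to the curve above, but not taken from the source). Bar heights are scaled to relative frequency per unit width so that they can be compared with the curve, and the class intervals here are half-open [a, a + width), which makes no practical difference for continuous simulated data. The class widths 10, 10, 5 and 2 mirror those in the N = 25, 100, 500 and 2000 figures.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def histogram_density(sample, lo, hi, width):
    """Bar heights for class intervals [lo, lo + width), ..., scaled so total bar area is 1.

    Observations outside [lo, hi) are ignored.
    """
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for x in sample:
        k = int((x - lo) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / (len(sample) * width) for c in counts]

MU, SIGMA = 100, 15     # assumed population, for illustration only
rng = random.Random(1)  # fixed seed so the run is reproducible
for n, width in [(25, 10), (100, 10), (500, 5), (2000, 2)]:
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    heights = histogram_density(sample, 50, 150, width)
    midpoints = [50 + width * (k + 0.5) for k in range(len(heights))]
    gap = max(abs(h - normal_pdf(m, MU, SIGMA)) for h, m in zip(heights, midpoints))
    print(f"N = {n:4d}, width = {width:2d}: largest bar-to-curve gap = {gap:.4f}")
```

For a single simulated run the printed gaps need not shrink at every step, but they will usually get smaller as N grows, which is the behaviour the sequence of figures illustrates.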


Comment: the proportion of area under a histogram between two points approximates the proportion of cases in the sample between those two values, and estimates the corresponding proportion in the population.


Example: The following histogram displays the birth weight (in kg) of n = 100 births.

[Figure: histogram of birth weight with class boundaries 0.085, 0.113, 0.142, 0.17, 0.198, 0.227, 0.255, 0.283, 0.312, 0.34, 0.369, 0.397, 0.425, 0.454 and 0.482 kg; labelled bar frequencies, left to right: 1, 1, 3, 10, 11, 19, 17, 20, 12, 4, 1, 1; frequency axis 0 to 25]


Find the proportion of births that have a birthweight less than 0.34 kg.


Proportion = (1+1+3+10+11+19+17)/100 = 0.62
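The same arithmetic in Python, as a quick check. Because every class interval has the same width, the proportion of bar area below 0.34 kg equals the proportion of counts. The frequency list is read from the bar labels in the figure; treating the first seven labelled bars as those below 0.34 kg follows the worked answer above.

```python
# Labelled bar frequencies of the birth-weight histogram, left to right (n = 100).
# Assumption (from the worked answer): the first seven labelled bars lie below 0.34 kg.
freqs = [1, 1, 3, 10, 11, 19, 17, 20, 12, 4, 1, 1]

n = sum(freqs)               # 100 births in total
below_034 = sum(freqs[:7])   # 1 + 1 + 3 + 10 + 11 + 19 + 17 = 62
proportion = below_034 / n

# With equal class widths, the share of bar area equals the share of counts.
print(proportion)  # 0.62
```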


The Characteristics of a Histogram

• Central Location (average)

• Spread (Variability, Dispersion)

• Shape


Central Location

[Figure: histogram on an x axis from 0 to 25, vertical axis 0 to 0.14]


Spread, Dispersion, Variability

[Figure: histogram on an x axis from 0 to 25, vertical axis 0 to 0.14]


Shape – Bell Shaped (Normal)

[Figure: bell-shaped histogram on an x axis from 0 to 25, vertical axis 0 to 0.14]


Shape – Positively skewed

[Figure: positively skewed histogram on an x axis from 0 to 25, vertical axis 0 to 0.16]


Shape – Negatively skewed

[Figure: negatively skewed histogram on an x axis from 0 to 25, vertical axis 0 to 0.16]


Shape – Platykurtic

[Figure: platykurtic curve on an x axis from -3 to 3]


Shape – Leptokurtic

[Figure: leptokurtic curve on an x axis from -3 to 3]


Shape – Bimodal

[Figure: bimodal curve on an x axis from -3 to 3]


The Stem-Leaf Plot

An alternative to the histogram


Each number in a data set can be broken into two parts

– A stem

– A Leaf


Example

Verbal IQ = 84

– Stem = tens digit = 8

– Leaf = units digit = 4


Example

Verbal IQ = 104

– Stem = hundreds and tens digits = 10

– Leaf = units digit = 4


To Construct a Stem-Leaf Diagram

• Make a vertical list of “all” stems.

• Then, beside each stem, make a horizontal list of its leaves.


Example

The data on N = 23 students

Variables

• Verbal IQ

• Math IQ

• Initial Reading Achievement Score

• Final Reading Achievement Score



We now construct a stem-leaf diagram of Verbal IQ.


A vertical list of the stems:

8

9

10

11

12

We now list the leaves beside each stem.


8

9

10

11

12

86 104 86 105 118 96 90 95 105 84

94 119 82 80 109 111 89 99 94 99

95 102 102



8 6 6 4 2 0 9

9 6 0 5 4 9 4 9 5

10 4 5 5 9 2 2

11 8 9 1

12


8 0 2 4 6 6 9

9 0 4 4 5 5 6 9 9

10 2 2 4 5 5 9

11 1 8 9

12

The leaves may be arranged in order.
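The construction can be sketched in Python for integer data such as Verbal IQ. This is a minimal illustration, assuming stem = value // 10 and leaf = units digit as in the examples above; `stem_leaf` is a hypothetical helper name. Passing the stems explicitly keeps the empty stem 12 that the diagram shows, and `ordered=False` reproduces the unsorted version with leaves in data order.

```python
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

def stem_leaf(data, stems, ordered=True):
    """Map each stem to its list of leaves (stem = value // 10, leaf = units digit)."""
    rows = {s: [] for s in stems}
    for x in data:
        rows[x // 10].append(x % 10)  # raises KeyError if a stem is missing from `stems`
    if ordered:
        for leaves in rows.values():
            leaves.sort()
    return rows

for stem, leaves in stem_leaf(verbal_iq, range(8, 13)).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```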


The stem-leaf diagram is equivalent to a histogram



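Rotating the stem-leaf diagram, we have:

[Figure: the rotated stem-leaf diagram, which looks like a histogram with classes starting at 80, 90, 100, 110 and 120]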


The two-part stem-leaf diagram

Sometimes you want to break each stem into two lines:

* for leaves 0,1,2,3,4

a second line for leaves 5,6,7,8,9


Stem-leaf diagram for Initial Reading Achievement

1. 0124444455556666677789

2. 0

This diagram as it stands does not give an accurate picture of the distribution.


We try breaking the stems into two parts:

1.* 01244444

1. 55556666677789

2.* 0

2.


The five-part stem-leaf diagram

If the two-part stem-leaf diagram is not adequate, you can break each stem into five lines:

* for leaves 0,1

t for leaves 2,3

f for leaves 4,5

s for leaves 6,7

a fifth line for leaves 8,9


We try breaking the stems into five parts:

1.* 01

1.t 2

1.f 444445555

1.s 66666777

1. 89

2.* 0
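The five-part diagram can be generated the same way. The sketch below works under two stated assumptions: the Initial Reading Achievement scores are stored in tenths (1.4 becomes 14) to avoid extracting digits from floating-point numbers, and the fifth line (leaves 8 and 9), which the diagram above leaves unmarked, is labelled `·` following Tukey's usual convention. The helper name `five_part_stem_leaf` is hypothetical.

```python
# Initial Reading Achievement scores from Data Set #3, stored in tenths (1.1 -> 11).
initial_ra_tenths = [11, 15, 15, 20, 19, 14, 15, 14, 17, 16, 16, 17,
                     12, 10, 18, 14, 16, 16, 14, 14, 15, 17, 16]

# Line markers for leaves 0-1, 2-3, 4-5, 6-7 and 8-9 ("·" for the last line is assumed).
MARKERS = ["*", "t", "f", "s", "·"]

def five_part_stem_leaf(values_tenths):
    """Return (label, leaves) rows of a five-part stem-leaf diagram, leaves in order."""
    leaves = {}
    for v in sorted(values_tenths):
        stem, leaf = divmod(v, 10)  # 1.4 -> stem 1, leaf 4
        leaves.setdefault((stem, leaf // 2), []).append(leaf)
    first, last = min(leaves), max(leaves)  # keep every line between the extremes
    rows = []
    for stem in range(first[0], last[0] + 1):
        for part in range(5):
            if first <= (stem, part) <= last:
                digits = "".join(str(d) for d in leaves.get((stem, part), []))
                rows.append((f"{stem}.{MARKERS[part]}", digits))
    return rows

for label, digits in five_part_stem_leaf(initial_ra_tenths):
    print(f"{label:<4} {digits}")
```

Counting the leaves this way also confirms that the 23 scores contain no value of 1.3.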


Stem-Leaf Diagrams

[Figure: stem-leaf diagrams of Verbal IQ, Math IQ, Initial RA and Final RA]


Some Conclusions

• Math IQ, Verbal IQ seem to have approximately the same distribution

• “bell shaped” centered about 100

• Final RA seems to be larger than initial RA and more spread out

• Improvement in RA

• Amount of improvement quite variable


Numerical Measures

• Measures of Central Tendency (Location)

• Measures of Non Central Location

• Measures of Variability (Dispersion, Spread)

• Measures of Shape


Measures of Central Tendency (Location)

• Mean

• Median

• Mode

[Figure: Central Location, shown on a histogram with x axis 0 to 25]


Measures of Non-central Location

• Quartiles, Mid-Hinges

• Percentiles

[Figure: Non-Central Location, shown on a histogram with x axis 0 to 25]


Measures of Variability (Dispersion, Spread)

• Variance, standard deviation

• Range

• Inter-Quartile Range

[Figure: Variability, shown on a histogram with x axis 0 to 25]


Measures of Shape

• Skewness

• Kurtosis

[Figure: example histograms and curves illustrating skewness and kurtosis]


Measures of Central Location (Mean)

Summation Notation

Let $x_1, x_2, x_3, \dots, x_n$ denote a set of n numbers.

Then the symbol

$$\sum_{i=1}^{n} x_i$$

denotes the sum of these n numbers:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$$


Example

Let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the symbol

$$\sum_{i=1}^{5} x_i$$

denotes the sum of these 5 numbers:

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$$


Meaning of parts of summation notation

$$\sum_{i=m}^{n} (\text{expression in } i)$$

– i: the quantity changing in each term of the sum

– m: the starting value for i

– n: the final value for i

– the expression in i: each term of the sum


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the symbol

$$\sum_{i=2}^{4} x_i^3$$

denotes the sum of these 3 numbers:

$$\sum_{i=2}^{4} x_i^3 = x_2^3 + x_3^3 + x_4^3 = 15^3 + 21^3 + 7^3 = 3375 + 9261 + 343 = 12979$$
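Summation notation maps directly onto a loop over i. In this sketch `summation` is a hypothetical helper (not a library function) that takes the expression in i as a function, together with the starting value m and the final value n; it reproduces both worked sums above.

```python
# x_1, ..., x_5 from the example table, keyed by the 1-based index i.
x = {1: 10, 2: 15, 3: 21, 4: 7, 5: 13}

def summation(term, m, n):
    """Sum of term(i) for i = m, m + 1, ..., n (both ends inclusive)."""
    return sum(term(i) for i in range(m, n + 1))

print(summation(lambda i: x[i], 1, 5))       # 66
print(summation(lambda i: x[i] ** 3, 2, 4))  # 12979
```

Using a dictionary keyed by i keeps the 1-based subscripts of the notation, rather than shifting everything to Python's 0-based list indices.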


Mean

Let x1, x2, x3, … xn denote a set of n numbers.

Then the mean of the n numbers is defined as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


Then the mean of the 5 numbers is:

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2$$
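In code, the mean is simply the sum divided by the count. This minimal sketch uses the five numbers from the example; Python's standard `statistics.mean` gives the same value.

```python
import statistics

x = [10, 15, 21, 7, 13]       # x_1, ..., x_5 from the example

x_bar = sum(x) / len(x)       # (10 + 15 + 21 + 7 + 13) / 5 = 66 / 5
print(x_bar)                  # 13.2
print(statistics.mean(x))     # the standard-library mean agrees
```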


Interpretation of the Mean

Let x1, x2, x3, … xn denote a set of n numbers.

Then the mean, $\bar{x}$, is the centre of gravity of the n numbers.

That is, if we drew a horizontal line and placed a weight of one at each value of $x_i$, then the balancing point of that system of masses is at the point $\bar{x}$.
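The balancing-point interpretation can be checked numerically. With unit weights, the net turning effect about a point c is the sum of the deviations $x_i - c$, which equals $n\bar{x} - nc$ and is therefore zero exactly when c is the mean. The helper `net_moment` below is hypothetical and written only for this illustration.

```python
x = [10, 15, 21, 7, 13]
x_bar = sum(x) / len(x)   # 13.2

def net_moment(values, pivot):
    """Total turning effect of unit weights about the pivot: sum of (x_i - pivot).

    Zero means the system balances at the pivot; positive means it tips to the right.
    """
    return sum(v - pivot for v in values)

print(net_moment(x, x_bar))  # approximately 0 (up to floating-point rounding)
print(net_moment(x, 13))     # 1: tips to the right of 13
print(net_moment(x, 14))     # -4: tips to the left of 14
```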


[Figure: unit weights at x1, x2, x3, x4, …, xn on a horizontal line, balancing at $\bar{x}$]


In the example:

[Figure: unit weights at 7, 10, 13, 15 and 21 on an axis marked 0, 10, 20, balancing at $\bar{x}$ = 13.2]


The mean, $\bar{x}$, is also approximately the centre of gravity of a histogram.

[Figure: histogram with class intervals 60 - 70 through 140 - 150, with its balancing point $\bar{x}$ marked]


The Median

Let x1, x2, x3, … xn denote a set of n numbers.

Then the median of the n numbers is defined as the number that splits the numbers into two equal parts.

To evaluate the median we arrange the numbers in increasing order.


If the number of observations is odd there will be one observation in the middle.

This number is the median.

If the number of observations is even there will be two middle observations.

The median is the average of these two observations


Example

Again let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.

i 1 2 3 4 5

xi 10 15 21 7 13


The numbers arranged in order are:

7 10 13 15 21

The unique “middle” observation, 13, is the median.


Example 2

Let x1, x2, x3, x4, x5, x6 denote the 6 numbers:

23 41 12 19 64 8

Arranged in increasing order these observations would be:

8 12 19 23 41 64

There are two “middle” observations, 19 and 23.


Median = average of the two “middle” observations:

$$\text{Median} = \frac{19 + 23}{2} = \frac{42}{2} = 21$$
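Both cases can be written as one small function. This is a sketch for illustration (Python's `statistics.median` already does the same job): it sorts the data, returns the middle observation when n is odd, and averages the two middle observations when n is even. It assumes a non-empty list.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # odd n: the unique middle observation
    return (s[mid - 1] + s[mid]) / 2  # even n: average of the two middle observations

print(median([10, 15, 21, 7, 13]))      # 13
print(median([23, 41, 12, 19, 64, 8]))  # 21.0
```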


Data Set #3: totals and means

For the 23 students of Data Set #3, the column totals and means are:

           Verbal   Math   Initial Reading   Final Reading
             IQ      IQ      Achievement       Achievement
Total       2244    2307        35.1              48.3
Means       97.57  100.30       1.526             2.100


Computing the Median

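[Figure: stem-leaf diagrams of Verbal IQ, Math IQ, Initial RA and Final RA]

With n = 23 observations, the median is the middle observation: the 12th observation in increasing order.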


Summary

           Verbal   Math   Initial Reading   Final Reading
             IQ      IQ      Achievement       Achievement
Means       97.57  100.30       1.526             2.100
Median      96      97          1.5               1.9
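The summary table can be reproduced from the transcribed data with Python's `statistics` module. This sketch assumes the values are exactly those in the Data Set #3 table; the dictionary keys are our own short labels.

```python
from statistics import mean, median

# Data Set #3, transcribed from the table (students 1 to 23 in order).
data = {
    "Verbal IQ": [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
                  82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102],
    "Math IQ": [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
                91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93],
    "Initial RA": [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
                   1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6],
    "Final RA": [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
                 1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9],
}

for name, values in data.items():
    # With n = 23, median() returns the 12th value in increasing order.
    print(f"{name:<10}  n = {len(values)}  mean = {mean(values):.3f}  median = {median(values)}")
```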


Some Comments

• The mean is the centre of gravity of a set of observations. The balancing point.

• The median splits the observations into two parts of approximately 50% each.


• The median splits the area under a histogram in two parts of 50%

• The mean is the balancing point of a histogram

[Figure: skewed histogram; the median splits the area into two parts of 50%, and the mean $\bar{x}$ is marked at the balancing point]


• For symmetric distributions the mean and the median will be approximately the same value.

[Figure: symmetric histogram with 50% of the area on each side of the median, which coincides with $\bar{x}$]


• For positively skewed distributions the mean exceeds the median.

• For negatively skewed distributions the median exceeds the mean.

[Figure: skewed histogram with the median splitting the area into two parts of 50%, and $\bar{x}$ marked]


• An outlier is a “wild” observation in the data

• Outliers occur because of:

– errors (typographical and computational)

– extreme cases in the population


• The mean is altered to a significant degree by the presence of outliers

• Outliers have little effect on the value of the median

• This is a reason for using the median in place of the mean as a measure of central location

• On the other hand, the mean is the best measure of central location when the data are normally distributed (bell-shaped).
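A small illustration of the effect of an outlier, reusing the five numbers from the earlier example and introducing a hypothetical typing error (21 entered as 210). The median is unchanged, while the mean jumps from 13.2 to 51.

```python
from statistics import mean, median

clean = [10, 15, 21, 7, 13]          # the five numbers from the earlier example
with_outlier = [10, 15, 210, 7, 13]  # hypothetical typing error: 21 entered as 210

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(values):6.1f}, median = {median(values)}")
```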