Basic Concepts and Measures of Location - HKU …nursing.hku.hk/biostats/lab_ft/lecture1.pdf

Page 1: Basic Concepts and Measures of Location - Daniel Y.T. Fong (Email: dytfong@hku.hk), NURS4302

Basic Concepts and Measures of Location

Daniel Y.T. Fong (Email: dytfong@hku.hk)

NURS4302 - STATISTICS

School of Nursing, The University of Hong Kong

Common Comments to a course on Statistics

1. Difficult!

2. Not relevant to Nursing!

Florence Nightingale (南丁格爾, 1820-1910)

Common Symptoms of Failed Students (student claims)

1. Did not come to class (they claimed they learnt statistics before)

2. Did not study notes (they claimed they learnt statistics before)

3. Shared assignments (they claimed they worked together)

4. Did not ask (teachers) for help! (they had friends who studied Statistics before)

Most of us live comfortably with some level of uncertainty

What makes statistics unique is its ability to QUANTIFY UNCERTAINTY, to make it PRECISE. This allows statisticians to make CATEGORICAL STATEMENTS, with complete assurance – about their level of uncertainty

Learning Objectives

1. To know the key concepts underlying statistics

2. To know what statistics can do

3. To identify the type of data

4. To understand the use of different location measures

The Key Concepts

- Variability

- Population and sample

Variability?

1 + 2 + 3 + 4 = ?

Answers: 10, 10, 10, 10

No!

Variability?

1 + 2 + · · · + 20 = ?

Answers: 210, 156, ?, 250

Yes! Due to uncertainty!

Variability is the Reality

The number of patients you saw each day varies

Your blood pressure varies

Your body temperature varies

Your height varies

Your weight varies

Your mood varies

What Statistics is About …

• Managing variability

• Quantifying uncertainty and the strength of evidence of an experiment (evidence based)

Problems Solvable by Statistics (1)

~ www.polaroid.com

Like to capture laughter with your camera?

        Men    Women
Yes     47%    58%

Significantly different?

A telephone survey of 1013 subjects, conducted during 11-14 March 2004 in the US

Descriptive Statistics

Inferential Statistics

Problems Solvable by Statistics (2)

Group 1: New drug (10 hypertensive patients)

Group 2: Placebo (11 hypertensive patients)

Were their BPs after treatment different?

If yes, how different were they?

Descriptive Statistics

Inferential Statistics

So, What is Statistics?

Descriptive Statistics + Inferential Statistics

Descriptive Statistics

• For characterization

• For reporting

• To summarize data at hand

Inferential Statistics

• For decision making

• To make generalizations outside the data at hand

Population

The prevalence of back pain in HK?

Quality of life in Asians?

The BP of a patient after calcium intake?

Population

The entire collection of individual units, which can be people or measurements, about which information is desired

What do you want to do?

• The whole HK population

• All Asians

• All possible times after calcium intake in the patient

Often NOT manageable!

(Random) Sample

500 subjects in HK

500 subjects in each Asian country

1, 4, 26 weeks after calcium intake

Population

The entire collection of individual units, which can be people or measurements, about which information is desired

• The whole HK population

• All Asians

• All possible times after calcium intake in the patient

Sample

A subset of the population selected for study

Must be manageable!

Population and Sample

- The fundamental concept on which statistics is based

Population: parameters (unknown)

• Mean BP

• Proportion of females

Sample: descriptive statistics (known)

• Sample mean BP

• Sample proportion of females
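The distinction between an unknown population parameter and a known sample statistic can be illustrated with a small simulation. This is only a sketch: the population below is simulated, and its size, its mean of 120 and its spread of 15 are made-up values chosen for illustration, not figures from the lecture. The sample size of 500 echoes the "500 subjects in HK" example.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated BP readings (made-up values).
population = [random.gauss(120, 15) for _ in range(100_000)]

# Parameter: in practice unknown, because the whole population is not manageable.
population_mean = mean(population)

# Statistic: computed from a manageable random sample, so it is known.
sample = random.sample(population, 500)
sample_mean = mean(sample)

print(f"Population mean BP (parameter): {population_mean:.1f}")
print(f"Sample mean BP (statistic):     {sample_mean:.1f}")
```

In a real study only the second number is available; inferential statistics is about what it says regarding the first.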

Q & A

1. In statistical terms, a population

• consists only of people

• may be finite

• may be infinite

• can be any set of things in which we are interested

True or False?

Classification of Data

- Affects the way the data are summarized and analyzed

Summarize? variables

Two Types of Data/Variable

Quantitative - takes values in numbers

• e.g. Age, Temperature

• Statistical analysis is always feasible

Qualitative - takes values in words

• e.g. Gender, Educational level, Religion

• Statistical analysis may not be feasible

Remarks

When is Statistical Analysis of Qualitative Data Feasible?

Only when the qualitative data can be quantified

Gender: Female = 1, Male = 0

Educational level: Tertiary or above = 3, College = 2, Secondary = 1, Primary = 0

Quantifiable?

Categorical data: quantifiable qualitative data
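A minimal sketch of how qualitative responses might be quantified with the codes shown above. The codebook names, the `encode` helper and the example responses are hypothetical and used only for illustration.

```python
# Codebooks following the codes on the slide (names are hypothetical).
GENDER_CODES = {"Male": 0, "Female": 1}
EDUCATION_CODES = {
    "Primary": 0,
    "Secondary": 1,
    "College": 2,
    "Tertiary or above": 3,
}


def encode(values, codebook):
    """Replace each category label with its numeric code."""
    return [codebook[value] for value in values]


# Made-up responses, for illustration only.
genders = ["Female", "Male", "Female"]
education = ["Secondary", "Tertiary or above", "Primary"]

gender_coded = encode(genders, GENDER_CODES)
education_coded = encode(education, EDUCATION_CODES)

print(gender_coded)
print(education_coded)
```

The gender codes are labels only, whereas the education codes also preserve the ranking of the levels.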

Two Types of Quantitative Data

1. Continuous data

■ May take values between any two plausible values (uncountable)

■ e.g. age

■ Between 2 and 3, one can be 2.1, 2.11, 2.111, 2.1111, etc.

2. Discrete data

■ May take no values between two plausible values (countable)

■ e.g. number of hospital admissions per day

■ Between 2 and 3, there are no values.

Levels of Measurement (1-2)

1. Nominal- categorical data without ranking order

2. Ordinal- categorical data with ranking order

Levels of Measurement (3-4)

3. Interval - quantitative data without a well-defined zero, e.g. temperature (in °C)

4. Ratio - quantitative data with a well-defined zero

Interval and ratio data receive the same statistical treatment

Measurement Hierarchy

Example: Age

1. Nominal - categorical data without ranking order: Even, Odd

2. Ordinal - categorical data with ranking order: 20 or below, (20, 30], (30, 40], (40, 50], ..., 60 or above

3. Interval/Ratio - quantitative data: Age

Decreasing information from Interval/Ratio to Ordinal to Nominal

Data Types

Quantitative (takes numerical values)

• Discrete (whole numbers), e.g. number of accidents, household size

• Continuous (takes decimal places), e.g. height, weight

Qualitative/Categorical (takes coded numerical values)

• Ordinal (ranking order exists), e.g. Poor/Average/Good

• Nominal (no ranking order), e.g. gender, race

Q & A

2. In statistical terms, …

• Type of delivery (vaginal versus cesarean) is categorical

• White blood cell count is continuous

• Examination result (pass versus fail) is ordinal

• Military rank (private, sergeant, etc.) is nominal

True or False?

Descriptive Statistics

- To summarize data

- Mean, Median and Mode (measures of location)

Why Descriptive Statistics?

6 2 13 17 22 9 19 7 10 5

0 11 7 19 24 16 3 7 13 4

12 29 3 4 33 1 2 6 13 3

25 30 13 25 16 30 12 10 14 2

20 30 4 2 6 12 31 10 3 3

8 24 8 8 4 8 26 12 12 15

2 8 8 20 15 6 14 21 3 8

10 11 10 23 10 14 13 35 22 17

4 10 4 0 20 53 19 5 12 8

11 20 4 13 17 12 11 15 10 2

• Presenting the raw data is often infeasible

• Presenting the distribution can still be overwhelming

• Good to have only a few numbers to summarize a dataset

[Figure: Depression levels of 100 cancer patients. Histogram; x-axis: Depression, in bins of width 5 from 0 to 60; y-axis: Frequency, from 0 to 30]
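A minimal sketch of how the 100 raw values listed above could be tallied into bins of width 5, as in the histogram. Two assumptions are made here: that the raw values are the depression levels shown in the figure, and that each bin includes its lower bound but not its upper bound (the slide does not say how boundary values such as 5 or 10 are assigned).

```python
from collections import Counter

# The 100 raw values from the slide, row by row.
depression = [
    6, 2, 13, 17, 22, 9, 19, 7, 10, 5,
    0, 11, 7, 19, 24, 16, 3, 7, 13, 4,
    12, 29, 3, 4, 33, 1, 2, 6, 13, 3,
    25, 30, 13, 25, 16, 30, 12, 10, 14, 2,
    20, 30, 4, 2, 6, 12, 31, 10, 3, 3,
    8, 24, 8, 8, 4, 8, 26, 12, 12, 15,
    2, 8, 8, 20, 15, 6, 14, 21, 3, 8,
    10, 11, 10, 23, 10, 14, 13, 35, 22, 17,
    4, 10, 4, 0, 20, 53, 19, 5, 12, 8,
    11, 20, 4, 13, 17, 12, 11, 15, 10, 2,
]

BIN_WIDTH = 5  # bins 0-5, 5-10, ..., 55-60; lower bound included (assumption)

# Integer division maps each value to the index of its bin.
counts = Counter(score // BIN_WIDTH for score in depression)

# Print a text histogram, one row per bin.
for index in range(12):
    lower = index * BIN_WIDTH
    upper = lower + BIN_WIDTH
    print(f"{lower:>2} - {upper:<2} {counts[index]:>3} {'#' * counts[index]}")
```

Even this twelve-row table is a lot to report, which is the motivation for summarizing a dataset with only a few numbers.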

The Mean (平均數)

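The average value

• Easy to calculate and handle

• Uses all data at hand

• Sensitive to aberrant values

Sample data: $X_1, X_2, X_3, X_4, X_5$, e.g. 6, 2, 13, 17, 22

Sum: $\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$, e.g. 6+2+13+17+22 = 60

Mean: $\bar{X} = \left(\sum_{i=1}^{5} X_i\right)/5$, e.g. 60/5 = 12

New comer: $X_6 = 900$

Now, $\bar{X} = (60 + 900)/6 = 160$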
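The calculation can be reproduced in a few lines of Python. This is a sketch using the slide's five values and the newcomer of 900; the function and variable names are chosen here for illustration.

```python
def sample_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


sample = [6, 2, 13, 17, 22]
with_newcomer = sample + [900]  # the aberrant sixth value

mean_before = sample_mean(sample)        # 60 / 5
mean_after = sample_mean(with_newcomer)  # 960 / 6

print(mean_before)  # 12.0
print(mean_after)   # 160.0
```

A single aberrant value moves the mean from 12 to 160, a value larger than five of the six observations.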

Use of the Mean – Example

1999/2000 Household Expenditure Survey

~ Census & Statistics Department

The Median(中位數)

The middle value, i.e. there are 50% of values below the median

Sample data: 6, 2, 13, 17, 22

Rank: 2, 6, 13, 17, 22

Median: 13

Values below 13 = 2/5 = 40%?

[Figure: the sample data on a number line, with 50% marked at the median and the points 12.5 and 13.5 labelled]

The Median – Revisit of New Comer

Sample data: 6, 2, 13, 17, 22

Rank: 2, 6, 13, 17, 22

Median: 13

Sample data: 6, 2, 13, 17, 22, 900

Rank: 2, 6, 13, 17, 22, 900

Median: (13+17)/2 = 15

[Figure: the sample data including 900 on a number line, with 50% marked at the median]
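The rank-then-pick-the-middle procedure can be sketched as follows. The helper name `median_of` is chosen here for illustration; Python's standard `statistics.median` gives the same results.

```python
def median_of(values):
    """Rank the values, then take the middle one (or average the middle two)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ranked[middle]
    # Even number of values: average the two middle values.
    return (ranked[middle - 1] + ranked[middle]) / 2


sample = [6, 2, 13, 17, 22]
with_newcomer = sample + [900]

median_before = median_of(sample)        # middle of 2, 6, 13, 17, 22
median_after = median_of(with_newcomer)  # (13 + 17) / 2

print(median_before)  # 13
print(median_after)   # 15.0
```

The newcomer of 900 shifts the median only from 13 to 15, which is what "robust" means for the median.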

Use of the Median – Example

2001 Population Census

The Mode (眾數)

The most popular value

Sample data: 6, 2, 13, 17, 22

Frequency: 1, 1, 1, 1, 1

Mode: All or None

Sample data: 6, 2, 13, 17, 22, 22

Frequency: 1, 1, 1, 1, 2

Mode: 22

Sample data: 6, 2, 13, 2, 17, 22, 22

Frequency (of 6, 2, 13, 17, 22): 1, 2, 1, 1, 2

Mode: 2 and 22

The Mode – Another Example

The most popular value

• May not be unique

• Impractical for continuous data

• But the least commonly used descriptive statistic

Sample data: Height of students in the class

Frequency: Likely 1 for all

Mode: All or None
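The frequency counts and modes in the three numeric examples can be checked with a short sketch. The helper name `modes` is chosen here for illustration; it returns every value that attains the highest frequency, so it returns all values when each occurs once (the "All or None" case).

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    frequency = Counter(values)
    highest = max(frequency.values())
    return [value for value, count in frequency.items() if count == highest]


no_clear_mode = modes([6, 2, 13, 17, 22])         # every value occurs once
single_mode = modes([6, 2, 13, 17, 22, 22])       # 22 occurs twice
two_modes = modes([6, 2, 13, 2, 17, 22, 22])      # 2 and 22 both occur twice

print(no_clear_mode)  # [6, 2, 13, 17, 22]
print(single_mode)    # [22]
print(two_modes)      # [2, 22]
```

For continuous measurements such as height, nearly every value is distinct, so this function would typically return the whole dataset.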

Use of the Mode – Example

Guffaws (狂笑)

Chuckle (輕笑)

Giggle (傻笑)

A telephone survey, conducted during 11-14 March 2004 in US, of 1013 subjects

Cackle (格格地笑)

Snort (闭嘴低声咯咯轻笑)

8%

31%

41%

3%

6%

Total Men Women

49%

41%

Mode: Chuckle (Total), Chuckle (Men), Giggle (Women)

~ www.polaroid.com

Measures of Location

Median (middle value, 中位數)

Advantages:
1. "Robust", i.e. not affected by aberrant values

Disadvantages:
1. Does not use all the data
2. Not easy to manipulate mathematically

Mean (average value, 平均數)

Advantages:
1. Is the "expected" value
2. Uses all the data
3. Easy to calculate

Disadvantages:
1. Not robust to aberrant values
2. Can be difficult to interpret due to aberrant values

Mode (most popular value, 眾數)

Advantages:
1. Can be useful for discrete and categorical measurements

Disadvantages:
1. Not useful for continuous data
2. May not be unique
3. Does not use all the data

Q & A

3. Among the descriptive statistics,

• the mean does not use all the data

• there can be more than one mode in a dataset

• the median does not use all the data

True or False?