Lec. biostatistics


Transcript of Lec. biostatistics

Page 1: Lec. biostatistics

Public Health Methodologies

Biostatistics


Page 2: Lec. biostatistics

Dr. Riaz A. Bhutto 2

Data

• Data is a collection of facts, such as values or measurements.

OR

• Data is information that has been translated into a form that is more convenient to move or process.

OR

• Data are any facts, numbers, or text that can be processed by a computer.

3/3/2012

Page 3: Lec. biostatistics


Statistics

Statistics is the study of the collection, summarizing, organization, analysis, and interpretation of data.


Page 4: Lec. biostatistics

Vital statistics

Vital statistics is the collection, summarization, organization, analysis, presentation, and interpretation of data related to vital events of life such as births, deaths, marriages, divorces, health, and disease.

Page 5: Lec. biostatistics


Biostatistics

Biostatistics is the application of statistical techniques to scientific research in health-related fields, including medicine, biology, and public health.


Page 6: Lec. biostatistics

Descriptive Statistics

The term descriptive statistics refers to statistics that are used to describe and summarize a set of data. When using descriptive statistics in this sense, every member of a group or population is measured. A good example of descriptive statistics is a census, in which all members of a population are counted.

Page 7: Lec. biostatistics

Inferential or Analytical Statistics

Inferential statistics are used to draw conclusions and make predictions about a population based on the analysis of numeric data from a sample.

Page 8: Lec. biostatistics

Primary & Secondary Data

• Raw or Primary data: data as collected, still containing a lot of unnecessary, irrelevant, and unwanted information.

• Treated or Secondary data: data that has been treated to remove this unnecessary, irrelevant, and unwanted information.

• Cooked data: data that was not genuinely collected and is false and fictitious.

Page 9: Lec. biostatistics

Ungrouped & Grouped Data

• Ungrouped data: data presented or observed individually. For example, if we observe the number of children in 6 families:

2, 4, 6, 4, 6, 4

• Grouped data: identical values grouped by frequency. For example, the above data on children in 6 families can be grouped as:

No. of children    Families
2                  1
4                  3
6                  2

or alternatively we can make classes:

No. of children    Frequency
2 - 4              4
5 - 7              2
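The grouping on this slide can be reproduced with a short Python sketch. The variable names (`children`, `frequency_table`, `class_counts`) are illustrative and not part of the lecture; the class bounds are taken as inclusive, which is an assumption consistent with the counts shown.

```python
from collections import Counter

# Number of children observed in 6 families (ungrouped data from the slide).
children = [2, 4, 6, 4, 6, 4]

# Group identical values by frequency.
frequency = Counter(children)
frequency_table = sorted(frequency.items())

# Group into the class intervals used on the slide (bounds assumed inclusive).
classes = [(2, 4), (5, 7)]
class_counts = {
    f"{low} - {high}": sum(low <= c <= high for c in children)
    for low, high in classes
}

for value, count in frequency_table:
    print(value, count)
for label, count in class_counts.items():
    print(label, count)
```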

Page 10: Lec. biostatistics

Variable

A variable is a characteristic or value that can change or differ from one individual to another. For example: age, height, weight, blood pressure, etc.

Page 11: Lec. biostatistics

Types of Variable

• Independent variable: typically the variable representing the value being manipulated or changed. For example, smoking.

• Dependent variable: the observed result of the independent variable being manipulated. For example, carcinoma (cancer) of the lung.

• Confounding variable: a variable associated with both the exposure and the disease. For example, age is a factor in many events.

Page 12: Lec. biostatistics


Categories of DATA

9/3/2012

Page 13: Lec. biostatistics

Quantitative or Numerical data

Quantitative data is information that can be counted or expressed numerically (as numbers), for example:

2, 4, 6, 8.5, 10.5

Page 14: Lec. biostatistics

Quantitative or Numerical data (cont.)

This data is of two types:

1. Discrete Data: it is in whole numbers or values and has no fraction. For example:
   Number of children in a family = 4
   Number of patients in hospital = 320

2. Continuous Data (Infinite Number): measured on a continuous scale. It can be in fractions. For example:
   Height of a person = 5 feet 6 inches (5′6″)
   Temperature = 92.3 °F

Page 15: Lec. biostatistics

Qualitative or Categorical data

This is non-numerical data, such as Male/Female or Short/Tall. It is of two types:

1. Nominal Data: it has a series of unordered categories (one cannot tick more than one at a time). For example:
   Sex = Male / Female
   Blood group = O / A / B / AB

2. Ordinal or Ranked Data: it has distinct ordered (ranked) categories. For example:
   Measurement of height can be = Short / Medium / Tall
   Degree of pain can be = None / Mild / Moderate / Severe

Page 16: Lec. biostatistics


Measures of Central Tendency & Variation (Dispersion)

Page 17: Lec. biostatistics

Measures of Central Tendency

These are quantitative indices that describe the center of a distribution of data. They are (the three Ms):

• Mean
• Median
• Mode

Page 18: Lec. biostatistics

Mean

The mean, or arithmetic mean, is also called the AVERAGE and is calculated only for numerical data. For example:

• What is the average age of the children in years?

Child          1  2  3  4  5  6  7
Age (years)    6  4  4  3  2  4  5

Formula:

$$\bar{X} = \frac{\sum X}{n}$$

$$\text{Mean} = \frac{6 + 4 + 4 + 3 + 2 + 4 + 5}{7} = \frac{28}{7} = 4 \text{ years}$$
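As a quick check of the arithmetic, here is a minimal Python sketch of the mean formula (sum of the values divided by n). It uses the seven ages from the calculation line; the variable names are illustrative.

```python
from statistics import mean

# Ages (in years) used in the slide's calculation.
ages = [6, 4, 4, 3, 2, 4, 5]

total = sum(ages)        # sum of all values
n = len(ages)            # number of values
manual_mean = total / n  # the formula: sum of X divided by n

print(total, n, manual_mean)
print(mean(ages))        # the standard library gives the same value
```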

Page 19: Lec. biostatistics

Median

• It is the central-most value. For example, what is the central value in the data 2, 3, 4, 4, 4, 5, 6?

• If we divide the sorted data into two equal groups, 2, 3, 4 | 4 | 4, 5, 6, then 4 is the central-most value.

• The formula gives the position of the central value in the sorted data (here n is the total number of values):

$$\text{Median position} = \frac{n + 1}{2} = \frac{7 + 1}{2} = \frac{8}{2} = 4$$

so the median is the 4th value, which is 4.
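The (n + 1)/2 rule gives a position in the sorted data, not the median itself. A small Python sketch makes the distinction explicit; it assumes an odd number of values, as on the slide (for an even n the usual convention, which `statistics.median` follows, is to average the two middle values).

```python
from statistics import median

# Data from the slide, sorted before locating the middle position.
values = sorted([2, 3, 4, 4, 4, 5, 6])
n = len(values)

# For an odd n, (n + 1) / 2 is the 1-based position of the median.
position = (n + 1) // 2
manual_median = values[position - 1]  # convert to a 0-based index

print(position, manual_median)
print(median(values))
```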

Page 20: Lec. biostatistics

Mode

• It is the most frequently occurring (repeated) value in a set of observations. Examples:

• No mode. Raw data: 10.3, 4.9, 8.9, 11.7, 6.3, 7.7
• One mode. Raw data: 2, 3, 4, 4, 4, 5, 6
• More than one mode. Raw data: 21, 28, 28, 41, 43, 43
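The three cases above can be checked with a small helper. `modes` is a hypothetical function written for this illustration; it follows the slide's convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values, or [] when no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # no mode
print(modes([2, 3, 4, 4, 4, 5, 6]))             # one mode
print(modes([21, 28, 28, 41, 43, 43]))          # more than one mode
```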

Page 21: Lec. biostatistics

Measures of Dispersion

These are quantitative indices that describe the spread of a data set. They are:

• Range
• Mean deviation
• Variance
• Standard deviation
• Coefficient of variation
• Percentile

Page 22: Lec. biostatistics

Range

It is the difference between the highest and lowest values in a data series. For example:

The ages (in years) of 10 children are 2, 6, 8, 10, 11, 14, 1, 6, 9, 15. Here the range of age is 15 − 1 = 14 years.
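In code the range is simply the maximum minus the minimum. A minimal Python sketch with the ages from this slide (the variable names are illustrative):

```python
# Ages (in years) of the 10 children from the slide.
ages = [2, 6, 8, 10, 11, 14, 1, 6, 9, 15]

# Range = highest value minus lowest value.
age_range = max(ages) - min(ages)

print(max(ages), min(ages), age_range)
```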

Page 23: Lec. biostatistics

Mean Deviation

This is the average deviation of all observations from the mean.

$$\text{Mean Deviation} = \frac{\sum |X - \bar{X}|}{n}$$

Here X = value, $\bar{X}$ = mean, and n = total number of values.

Page 24: Lec. biostatistics

Mean Deviation Example

A student took 5 exams in a class and had scores of 92, 75, 95, 90, and 98. Find the mean deviation for her test scores.

• First step: find the mean.

$$\bar{X} = \frac{\sum X}{n} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$$

Page 25: Lec. biostatistics

• 2nd step: find the mean deviation.

X = value, X̄ = mean, X − X̄ = deviation from the mean, |X − X̄| = absolute value of the deviation (ignoring ± signs)

X      X̄     X − X̄    |X − X̄|
92     90     2         2
75     90     −15       15
95     90     5         5
90     90     0         0
98     90     8         8

Total: ∑X = 450, ∑|X − X̄| = 30, n = 5

$$\text{Mean Deviation} = \frac{\sum |X - \bar{X}|}{n} = \frac{30}{5} = 6$$

The average deviation from the mean is 6.
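The two steps of this example (find the mean, then average the absolute deviations) can be sketched in Python as follows. The variable names are illustrative; the scores are the ones from the example.

```python
# Exam scores from the example.
scores = [92, 75, 95, 90, 98]
n = len(scores)

# Step 1: the mean.
mean_score = sum(scores) / n

# Step 2: deviations from the mean, their absolute values, and the average.
deviations = [x - mean_score for x in scores]
absolute_deviations = [abs(d) for d in deviations]
mean_deviation = sum(absolute_deviations) / n

print(mean_score)
print(deviations)
print(sum(absolute_deviations), mean_deviation)
```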

Page 26: Lec. biostatistics

Variance

• It is a measure of variability which takes into account the difference between each observation and the mean.

• The variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1.

• Sample variance is s² and population variance is σ².

Page 27: Lec. biostatistics

Variance (cont.)

• The variance is defined as the average of the squared differences from the mean.

• To calculate the variance, follow these steps:
   • Work out the mean (the simple average of the numbers).
   • Then for each number: subtract the mean and square the result (the squared difference).
   • Then work out the average of those squared differences.

Page 28: Lec. biostatistics

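Example: The household size of 5 families was recorded as follows: 2, 5, 4, 6, 3. Calculate the variance for the above data.

Step 1: list the values (X). Step 2: find the mean (X̄ = 20/5 = 4). Step 3: find each deviation from the mean (X − X̄). Step 4: square each deviation.

X    X̄    X − X̄    (X − X̄)²
2    4    −2        4
5    4    1         1
4    4    0         0
6    4    2         4
3    4    −1        1

Step 5: add up the squared deviations: ∑(X − X̄)² = 10

Step 6: divide by n:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{10}{5} = 2$$

s² = 2 persons²

Dividing by n − 1 = 4 instead, as in the sample variance definition, gives 10/4 = 2.5 persons².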
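The household example can be reproduced in Python. This sketch computes the variance both ways, dividing by n (as the worked example does) and by n − 1 (as the sample variance definition does), and compares the results with the standard library's `pvariance` and `variance`. The variable names are illustrative.

```python
from statistics import pvariance, variance

# Household sizes of the 5 families from the example.
household_sizes = [2, 5, 4, 6, 3]
n = len(household_sizes)

mean_size = sum(household_sizes) / n
squared_deviations = [(x - mean_size) ** 2 for x in household_sizes]
sum_sq = sum(squared_deviations)

variance_n = sum_sq / n                # divide by n (population form)
variance_n_minus_1 = sum_sq / (n - 1)  # divide by n - 1 (sample form)

print(mean_size, sum_sq)
print(variance_n, pvariance(household_sizes))
print(variance_n_minus_1, variance(household_sizes))
```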

Page 29: Lec. biostatistics

Standard Deviation

• The Standard Deviation is a measure of how spread out numbers are.

• Its symbol is σ (the Greek letter sigma) for a population and s for a sample.

• The formula is easy: it is the square root of the Variance, i.e. $s = \sqrt{s^2}$.

• SD is the most useful measure of dispersion.

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \quad (\text{if } n > 30)$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \quad (\text{if } n < 30)$$
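The two divisors can be captured in one small function. `standard_deviation` and its `sample` flag are hypothetical names used only for this sketch; the data are the household sizes 2, 5, 4, 6, 3 from the variance example, and the results are compared with the standard library's `pstdev` (divide by n) and `stdev` (divide by n − 1).

```python
import math
from statistics import pstdev, stdev

def standard_deviation(values, sample=False):
    """Square root of the variance; divides by n - 1 when sample=True, else by n."""
    n = len(values)
    mean_value = sum(values) / n
    sum_sq = sum((x - mean_value) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq / divisor)

data = [2, 5, 4, 6, 3]

sd_n = standard_deviation(data)                      # divide by n
sd_n_minus_1 = standard_deviation(data, sample=True) # divide by n - 1

print(sd_n, pstdev(data))
print(sd_n_minus_1, stdev(data))
```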

Page 30: Lec. biostatistics

Example

You and your friends have just measured the heights of your dogs (in millimeters):

• The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

• Find out the Mean, the Variance, and the Standard Deviation.

Page 31: Lec. biostatistics

Your first step is to find the Mean:

Answer:

$$\text{Mean} = \frac{600 + 470 + 170 + 430 + 300}{5} = \frac{1970}{5} = 394$$

so the mean (average) height is 394 mm.

[Chart: the dogs' heights with the mean (394 mm) marked]

Page 32: Lec. biostatistics

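Now, we calculate each dog's difference from the Mean:

600 − 394 = 206, 470 − 394 = 76, 170 − 394 = −224, 430 − 394 = 36, 300 − 394 = −94

To calculate the Variance, take each difference, square it, and then average the result:

$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{108{,}520}{5} = 21{,}704$$

So, the Variance is 21,704.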

Page 33: Lec. biostatistics

And the Standard Deviation is just the square root of the Variance, so:

Standard Deviation: σ = √21,704 = 147.32... = 147 (to the nearest mm)

And the good thing about the Standard Deviation is that it is useful. Now we can show which heights are within one Standard Deviation (147 mm) of the Mean, that is, between 247 mm and 541 mm: 470 mm, 430 mm, and 300 mm.

• So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small.
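The whole dog-height example can be traced end to end in a few lines of Python. This is a sketch that follows the slides in dividing by n (treating the five dogs as the whole population); the variable names are illustrative.

```python
import math

# Shoulder heights of the five dogs, in millimeters.
heights_mm = [600, 470, 170, 430, 300]
n = len(heights_mm)

mean_height = sum(heights_mm) / n
differences = [h - mean_height for h in heights_mm]

# Variance: average of the squared differences (divide by n, as on the slides).
variance = sum(d ** 2 for d in differences) / n
sd = math.sqrt(variance)

# Heights that lie within one standard deviation of the mean.
within_one_sd = [h for h in heights_mm if abs(h - mean_height) <= sd]

print(mean_height, differences)
print(variance, round(sd))
print(within_one_sd)
```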