Statistics introduction

Post on 13-Apr-2017


Transcript of Statistics introduction

STATISTICS IN MEDICINE

What is statistics ?

Statistics involves data or information, eg.

• Information regarding a country
• Information about weather
• Information on health status
• Information about a patient
• Information on normal subjects

WHO

Unesco

CDC

What is statistics ?

Statistics is the science dealing with the collection, analysis, presentation & interpretation of data

When is statistics useful in day-to-day life ?

• weather report
• in a shop
• in an examination
• in match making
• in lottery

Weather report

• rain will be expected in the central area
• rest of the country most likely will be dry

Likelihood, chance

ColomboWeather

In a shop

• what will you buy?
• how to select?
• which one is good?
• how much is it worth?

Previous knowledge

Banana

In an examination

• what is the pass mark?
• who has sound knowledge & who has poor knowledge?
• half knowledge, should you pass?

performance, cut off point

In match making

• in matching two people
• previously known details
• details of the family
• based on these a prediction is made

does it always work ???

prediction

In lottery

• choice between number to win and number to lose
• previous experience
• guesswork
• prediction

prediction

Summary

• Likelihood
• Chance
• Previous knowledge
• Cut off point
• Guesswork
• Prediction
• Probability

Statistics

When is statistics useful in medicine ?

decision making on diagnosis, eg. a patient with headache

• do all signs and symptoms fit into already known diseases?
• previous knowledge, experience from previous patients
• guesswork
• does it work ???

When is statistics useful in medicine ?

decision making on treatment, eg. a patient with headache

• which drug is to be given among analgesics A, B, C
• previous knowledge, experience of using them
• guesswork
• does it work ???

When is statistics useful in medicine ?

acquiring new knowledge, eg. how does blood flow against gravity

• planning a study
• doing the study
• arriving at conclusions

When is statistics useful in medicine ?

surveys of diseases in a population, eg. how far malaria has spread

• planning a study
• doing the study
• arriving at conclusions

Summary

Statistics is useful in medicine in different areas

Research studies use statistics in data analysis

Variables

Task: write 3 variables and 1 constant found in the human body, eg. height as a variable

• almost all biological features show variation
• it is extremely difficult to find a feature which does not vary ???

Variables

What is the basis of this ‘variation’ ?

fundamentally genetic, but environmental factors are always important

Variables

• height
• weight
• blood pressure
• pulse rate
• body temperature
• size of a swelling
• social status: income

Variables

response to a question

• Do you like to treat a patient with AIDS? Y / N / undecided
• Do you agree with what I say in this lecture? Y / N / ?

Attitudes - Opinions - What we feel

Types of variables

a) nominal
b) ordinal
c) interval
d) ratio

Nominal variables

• Qualitative classification
• Distinct categories
• No ranking

eg: gender, race, color, city

males

females

Ordinal variables

• Qualitative classification
• Categories have order or rank

eg. Socioeconomic status

• Upper
• Middle (upper & lower)
• Lower

Interval & ratio variables

• Quantitative classification
• Interval variable has no absolute zero, eg. temperature in °C
• Ratio variable has an absolute zero, eg. Kelvin temperature, time, space
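The practical difference between the two scales can be sketched in a few lines of Python. The two temperature readings below are hypothetical values chosen for illustration, not data from the lecture. Differences agree on both scales, but a ratio is only meaningful on the scale with an absolute zero.

```python
# Two illustrative temperature readings in degrees Celsius (hypothetical values).
celsius_a, celsius_b = 10.0, 20.0


def to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to kelvin, a scale with an absolute zero."""
    return celsius + 273.15


# Differences are meaningful on both interval and ratio scales, and they agree.
diff_celsius = celsius_b - celsius_a
diff_kelvin = to_kelvin(celsius_b) - to_kelvin(celsius_a)

# Ratios are only meaningful on the ratio scale.
naive_ratio = celsius_b / celsius_a  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
true_ratio = to_kelvin(celsius_b) / to_kelvin(celsius_a)  # only slightly above 1

print("difference:", diff_celsius, round(diff_kelvin, 2))
print("ratio on Celsius scale:", naive_ratio)
print("ratio on Kelvin scale:", round(true_ratio, 3))
```

The Celsius zero is an arbitrary reference point, which is why dividing one Celsius reading by another gives a misleading figure.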

Methods of collecting data

• questionnaire
• interview
• measurement using special instrument
• BHT studies (Bed Head Ticket)
• Postal surveys

Different types of sampling

• Simple random sample
• Systematic sampling
• Stratified sampling
• Cluster sampling

Population Sample

Simple random sample

• each subject in the population has an equal chance of getting selected to the sample
• each subject is given a number
• subjects are selected using random numbers
• use random tables or computer generated random numbers

random table

20 17 42 28 31 17 59 66 38 61 03 51 10 55 92 52 44 25 88
74 49 04 03 08 33 53 70 11 54 48 94 60 49 57 38 65 15 40
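The computer generated alternative to a random table can be sketched as follows. This is a minimal illustration, assuming subjects are numbered 1 to N; the function name `simple_random_sample`, the population size of 100, the sample size of 10 and the seed are all hypothetical choices, not values from the lecture.

```python
import random


def simple_random_sample(population_size: int, sample_size: int, seed=None):
    """Draw subject numbers so that every subject has an equal chance of selection.

    Subjects are assumed to be numbered 1..population_size (an illustrative
    convention). Sampling is without replacement, so no subject is picked twice.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    subject_numbers = range(1, population_size + 1)
    return sorted(rng.sample(subject_numbers, sample_size))


# Hypothetical study: pick 10 subjects out of a population of 100.
sample = simple_random_sample(population_size=100, sample_size=10, seed=2017)
print(sample)
```

Unlike reading along a printed random table, where a repeated number has to be skipped by hand, `random.sample` never returns the same subject twice.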

Non-random sample

• this is a biased sample
• certain subjects have a higher probability of getting selected to the sample
• in certain situations randomisation is not possible, either due to practical difficulties or difficulty in finding subjects

Statistical concepts

In order to arrive at conclusions data are analysed

Conclusions are based on concepts just like geometric theorems

Descriptive statistics: background information

Inferential statistics: hypothesis testing

Central Tendency

A single value representing a dataset, eg. pulse rate

Measures of Central Tendency

• Mean ($\bar{x}$): the average, $\bar{x} = \frac{\sum x}{n}$
• Mode: commonest value or the most frequent value
• Median: central value

example

In a dataset 1 2 2 3 3 3 4 4 5

• mean = 3
• mode = 3
• median = 3
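The three measures for this dataset can be checked with a short Python sketch. It uses only the standard library `statistics` module; the variable names are illustrative.

```python
import statistics

# The dataset from the example.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = statistics.mean(data)      # sum of values / number of values = 27 / 9 = 3
median = statistics.median(data)  # central value of the sorted data (the 5th of 9)
mode = statistics.mode(data)      # most frequent value (3 occurs three times)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

All three measures agree here because the dataset is symmetrical around 3.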

Variation

• Central value gives only the representative figure
• variation of data set is not shown

Measures of Variation

Range: from minimum to maximum
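A small Python sketch shows why a central value alone is not enough. The first dataset is the one from the earlier example; the second, `constant_data`, is a hypothetical dataset added only for contrast. Both have the same mean, but their ranges differ.

```python
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # dataset from the earlier example
constant_data = [3, 3, 3, 3, 3, 3, 3, 3, 3]  # hypothetical dataset with no variation

for label, values in [("example", data), ("constant", constant_data)]:
    minimum, maximum = min(values), max(values)
    spread = maximum - minimum  # width of the range
    print(label, "mean:", statistics.mean(values),
          "range:", minimum, "to", maximum, "width:", spread)

# Values kept for reference below.
data_min, data_max = min(data), max(data)
data_spread = data_max - data_min
constant_spread = max(constant_data) - min(constant_data)
```

The range depends only on the two extreme values, so a single unusual observation can change it considerably.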

Charts or graphs

• bar chart
• histogram

Bar chart

[Figure: bar chart of patients with headache; x-axis: males, females; y-axis: frequency (0 to 35)]

generally used for categorical variables

Pie chart

[Figure: pie chart of patients with headache; segments: males, females]

Multiple bar chart

[Figure: multiple bar chart of average income in 3 years (1986, 1987, 1988); y-axis: average income (0 to 250); series: male, female]

Histogram

[Figure: histogram of age of employee; x-axis: age (23.0 to 38.0); y-axis ticks 0 to 50; Std. Dev = 3.35, Mean = 29.4, N = 292]

for continuous variables

Frequency distribution curve

Standard deviation (SD)

• This is an accurate measure of variability
• Calculated for continuous data
• Based on deviations of data from the mean

Standard deviation (SD)

If the data set is 1, 2, 2, 3, 3, 3, 4, 4, 5

• mean will be 3
• deviations are calculated: (3-1), (3-2), (3-2), (3-3), (3-3), (3-3), (3-4), (3-4), (3-5) = 2, 1, 1, 0, 0, 0, -1, -1, -2
• deviations are squared: 4, 1, 1, 0, 0, 0, 1, 1, 4
• mean of squared deviations is calculated = 12/9
• square root is taken = root of 12/9

Standard deviation (SD)

data    deviation    square of deviation
1       3-1 = 2      4
2       3-2 = 1      1
2       3-2 = 1      1
3       3-3 = 0      0
3       3-3 = 0      0
3       3-3 = 0      0
4       3-4 = -1     1
4       3-4 = -1     1
5       3-5 = -2     4
total                12

variance = 12/9

SD = $\sqrt{12/9}$ ≈ 1.15
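The same calculation can be followed step by step in Python. This is a sketch of the procedure shown on the slides, which divides by n (here 9); the variable names are illustrative. The standard library function `statistics.pstdev` uses the same divisor and is included as a cross-check.

```python
import math
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = sum(data) / len(data)                 # 27 / 9 = 3
deviations = [mean - x for x in data]        # deviation of each value from the mean
squared = [d ** 2 for d in deviations]       # squaring removes the signs
variance = sum(squared) / len(data)          # mean of squared deviations = 12 / 9
sd = math.sqrt(variance)                     # square root of the variance

print("sum of squared deviations:", sum(squared))
print("variance:", round(variance, 3))
print("SD:", round(sd, 3))
print("statistics.pstdev:", round(statistics.pstdev(data), 3))
```

Note that `statistics.stdev` divides by n - 1 instead of n, a convention often used when the data are a sample from a larger population, so it gives a slightly larger value than the one calculated here.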

Standard deviation (SD)

In the previous example of 292 subjects

• mean = 29.41 yrs
• SD = 3.35 yrs
• minimum = 23.00 yrs
• maximum = 37.83 yrs

(n = 292)

Coefficient of variation (COV)

$\mathrm{COV} = \frac{\mathrm{SD}}{\text{mean}} \times 100\,\%$

• COV has no units
• Variations between two variables could be compared using COV (eg. blood pressure and pulse rate)
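A brief Python sketch of such a comparison follows. The blood pressure and pulse rate readings are hypothetical values made up for illustration, and the helper name `coefficient_of_variation` is likewise an assumption. The SD is computed with divisor n, matching the worked example in this lecture.

```python
import statistics


def coefficient_of_variation(values):
    """Return SD / mean x 100 (a percentage), using the divisor-n SD."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical readings from five subjects (illustrative only).
systolic_bp_mmhg = [110, 120, 130, 125, 115]
pulse_rate_bpm = [60, 72, 80, 90, 68]

cov_bp = coefficient_of_variation(systolic_bp_mmhg)
cov_pulse = coefficient_of_variation(pulse_rate_bpm)

print("COV of blood pressure (%):", round(cov_bp, 1))
print("COV of pulse rate (%):", round(cov_pulse, 1))
```

Because the units cancel in SD / mean, the two percentages can be compared directly even though one variable is measured in mmHg and the other in beats per minute.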

Normal distribution

• generally continuous variables in the human body such as height and weight have a definite pattern
• It is a symmetrical, bell-shaped curve
• mean, mode and median are similar
• this type of distribution is called “normal distribution”

Normal distribution

[Figure: histogram of VAR00001; x-axis: 1.0 to 9.0; y-axis ticks 0 to 12; Std. Dev = 2.02, Mean = 5.0, N = 50]


• symmetrical
• bell-shaped
• mean, mode, median similar
• also called a Gaussian curve

Normal distribution

mean

mode

median
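The closeness of the mean and the median in a normal distribution can be illustrated with simulated data. This is a sketch only: the heights below are generated by the computer, and the mean of 170 cm, SD of 10 cm, sample size and seed are hypothetical choices, not measurements from the lecture.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Simulated heights in cm drawn from a normal (Gaussian) distribution.
# Mean 170 and SD 10 are illustrative assumptions.
heights_cm = [rng.gauss(170, 10) for _ in range(10_000)]

mean = statistics.mean(heights_cm)
median = statistics.median(heights_cm)

print("mean:", round(mean, 2))
print("median:", round(median, 2))
print("difference:", round(abs(mean - median), 2))
```

The mode is left out because simulated continuous values rarely repeat; in practice it is read from the tallest bar of a histogram.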