1 Descriptive Studies
Uploaded by muditsjoshi
8/8/2019 1 Descriptive Studies
Statistical methods fall into two broad areas:
Descriptive statistics
Inferential statistics
Descriptive statistics
Descriptive statistics merely describe, organize, or summarize data; they refer only to the actual data available.
Examples include the mean blood pressure of a group of patients and the success rate of a surgical procedure.
Inferential statistics
Inferential statistics involve making inferences that go beyond the actual data.
They usually involve inductive reasoning (i.e., generalizing to a population after having observed only a sample).
Examples include the mean blood pressure of all Americans and the expected success rate of a surgical procedure in patients who have not yet undergone the operation.
POPULATIONS, SAMPLES, AND ELEMENTS
A population is the universe about which an investigator wishes to draw conclusions; it need not consist of people but may be a population of measurements.
Strictly speaking, if an investigator wants to draw conclusions about the blood pressure of Americans, the population consists of the blood pressure measurements, not the Americans themselves.
A sample is a subset of the population: the part that is actually being observed or studied.
Because researchers rarely can study whole populations, inferential statistics are almost always needed to draw conclusions about a population when only a sample has actually been studied.
A single observation, such as one person's blood pressure, is an element, denoted by X.
The number of elements in a population is denoted by N, and the number of elements in a sample by n.
A population therefore consists of all the elements from $X_1$ to $X_N$, and a sample consists of n of these N elements.
Most samples used in biomedical research are probability samples: samples in which the researcher can specify the probability of any one element in the population being included.
For example, if someone is picking a sample of 4 playing cards at random from a pack of 52 cards, the probability that any one card will be included is 4/52.
Probability samples permit the use of inferential statistics, whereas non-probability samples allow only descriptive statistics to be used.
There are four basic kinds of probability samples:
Simple random samples
Stratified random samples
Cluster samples
Systematic samples
Simple random samples
The simple random sample is the simplest kind of probability sample.
It is drawn in such a way that every element in the population has an equal probability of being included, as in the playing card example above.
A random sample is defined by the method of drawing the sample, not by the outcome.
If four hearts were picked out of the pack of cards, this does not in itself mean that the sample is not random.
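A minimal Python sketch of simple random sampling, using the playing-card example. The deck representation and the helper names (`simple_random_sample`, `estimate_inclusion_probability`) are illustrative assumptions; the simulation only approximates the exact inclusion probability of 4/52 by repeating the draw many times.

```python
import random

# A standard 52-card deck: 13 ranks in each of 4 suits.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]


def simple_random_sample(population, n, rng):
    """Draw n elements without replacement; every element is equally likely."""
    return rng.sample(population, n)


def estimate_inclusion_probability(card, n=4, trials=100_000, seed=1):
    """Estimate how often one particular card lands in a random sample of n cards."""
    rng = random.Random(seed)
    hits = sum(card in simple_random_sample(DECK, n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(simple_random_sample(DECK, 4, random.Random()))
    # Should come out close to 4/52, about 0.077
    print(estimate_inclusion_probability(("A", "spades")))
```

Because randomness is a property of the method, a run that happens to return four hearts is still a simple random sample.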
A sample is representative if it closely resembles the population from which it is drawn.
All types of random samples tend to be representative, but they cannot guarantee representativeness.
Nonrepresentative samples can cause serious problems. (Four hearts are clearly not representative of all the cards in a pack.)
A sample or a result demonstrates bias if it consistently errs in a particular direction.
For example, in drawing a sample of 10 from a population consisting of 500 white people and 500 black people, a sampling method that consistently produces more than 5 white people would be biased.
Biased samples are therefore unrepresentative, and true randomization is proof against bias.
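The idea of bias as a *consistent* error can be illustrated by repeating a sampling method many times and averaging the result. This is a sketch under stated assumptions: the population list is ordered with all white people first, and `first_ten_draw` is a deliberately flawed, hypothetical method used only for contrast with a simple random sample.

```python
import random

# Hypothetical population from the example: 500 white and 500 black people,
# listed (as an assumption) with all white people first.
POPULATION = ["white"] * 500 + ["black"] * 500


def count_white(sample):
    return sum(person == "white" for person in sample)


def random_draw(rng):
    """Simple random sample of 10: every person is equally likely to be chosen."""
    return rng.sample(POPULATION, 10)


def first_ten_draw(rng):
    """Hypothetical flawed method: ignores rng and always takes the first 10 on the list."""
    return POPULATION[:10]


def average_white_count(draw, trials=10_000, seed=0):
    """Average number of white people per sample over many repeated draws."""
    rng = random.Random(seed)
    return sum(count_white(draw(rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(average_white_count(random_draw))     # close to 5: no consistent error
    print(average_white_count(first_ten_draw))  # 10.0: consistently too many
```

Individual random samples still vary around 5; what makes the second method biased is that its error always points the same way.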
Stratified random samples
In a stratified random sample, the population is first divided into relatively internally homogeneous groups, or strata, from which random samples are then drawn.
This stratification results in greater representativeness.
For example, instead of drawing one sample of 10 people from a total population consisting of 500 white and 500 black people, one random sample of 5 could be taken from each ethnic group (or stratum) separately, thus guaranteeing that the sample has the same racial balance as the population.
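A minimal sketch of stratified random sampling in Python: a separate simple random sample of fixed size is drawn from each stratum. The ID labels and the function name `stratified_random_sample` are illustrative assumptions, not part of the original example.

```python
import random


def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical population matching the example: 500 white and 500 black
# people, identified only by made-up ID labels.
population = {
    "white": [f"W{i}" for i in range(500)],
    "black": [f"B{i}" for i in range(500)],
}

sample = stratified_random_sample(population, n_per_stratum=5, seed=42)
for stratum, members in sample.items():
    print(stratum, members)  # always exactly 5 per stratum
```

Unlike a single random draw of 10, this design can never yield, say, 8 people from one group and 2 from the other.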
Cluster samples
Cluster samples may be used when it is too expensive or laborious to draw a simple random or stratified random sample.
For example, in a survey of 100 medical students in the United States, an investigator might start by selecting a random set of groups or "clusters", such as a random set of 10 U.S. medical schools, and then interview all the students in those 10 schools.
This method is much more economical and practical than trying to take a random sample of 100 directly from the population of all U.S. medical students.
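A rough sketch of cluster sampling: whole clusters are chosen at random, and then every element inside the chosen clusters is kept. The sampling frame below (150 made-up schools with random roster sizes) and the name `cluster_sample` are hypothetical and exist only to make the example runnable.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical sampling frame: 150 made-up schools, each with a list of student IDs.
frame_rng = random.Random(7)
schools = {
    f"school_{s:03d}": [f"s{s:03d}_{i}" for i in range(frame_rng.randint(50, 200))]
    for s in range(150)
}

selected = cluster_sample(schools, n_clusters=10, seed=1)
students = [student for roster in selected.values() for student in roster]
print(len(selected), "schools,", len(students), "students interviewed")
```

Only the list of clusters needs to be enumerated up front; individual rosters are needed just for the clusters actually chosen, which is where the economy comes from.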
Systematic samples
These involve selecting elements in a systematic way, such as every fifth patient admitted to a hospital or every third baby born in a given area.
This type of sampling usually provides the equivalent of a simple random sample without actually using randomization.
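A minimal sketch of systematic sampling: take every k-th element of an ordered list. The admission list and the function name `systematic_sample` are hypothetical; consistent with the text, no randomization is used (some designs pick `start` at random, but that is a variant, not required here).

```python
def systematic_sample(ordered_elements, k, start=0):
    """Select every k-th element of an ordered list, beginning at index `start`."""
    if k < 1 or not 0 <= start < k:
        raise ValueError("need k >= 1 and 0 <= start < k")
    return ordered_elements[start::k]


# Hypothetical admission list: patients numbered 1-100 in order of admission.
admissions = list(range(1, 101))
every_fifth = systematic_sample(admissions, k=5, start=4)
print(every_fifth)  # patients 5, 10, 15, ..., 100
```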
Sampling problems are common in clinical research.
For example, if a researcher advertises in a newspaper to recruit people suffering from a particular problem, whether it is acne, diabetes, or depression, the people who respond form a self-selected sample, which is probably not representative of the population of all people with this problem.
Similarly, if a dermatologist reports on the results of a new treatment for acne which he has been using with his patients, the sample may not be representative of all people with acne, as it is likely that only people with more severe acne (or with good insurance coverage!) seek treatment from a dermatologist.
In any case, his practice is probably limited to people in a particular geographic, climatic, and possibly ethnic area.
In this case, although his study may be valid as far as his own patients are concerned (this is called internal validity), it may not be valid to generalize his findings to people with acne in general (so the study may lack external validity).
PROBABILITY
The probability of an event is denoted by p.
Probabilities are usually expressed as decimal fractions (sometimes as percentages) and must lie between zero (impossibility) and one (absolute certainty).
The probability of an event cannot be negative.
The probability of an event can also be expressed as the ratio of the number of outcomes in which the event occurs to the total number of equally likely possible outcomes.
For example, if a fair coin were tossed an infinite number of times, heads would appear on 50% of the tosses; therefore, the probability of heads, or p(heads), is 0.50.
If a random sample of 10 people were drawn an infinite number of times from a population of 100 people, each person would be included in the sample 10% of the time; therefore, p(being included in any one sample) is 0.10.
The probability of an event not occurring is equal to one minus the probability that it will occur; this is denoted by q.
In the above example, the probability of any one person not being included in any one sample, q, is therefore 1 - p = 1 - 0.10 = 0.90.
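A short Python sketch of p and q, under the assumptions of the examples above. Exact fractions are used for the sampling example, and a simulated fair coin stands in for the "infinite number of tosses"; a finite run only approaches 0.50. The helper names `complement` and `relative_frequency_of_heads` are illustrative.

```python
import random
from fractions import Fraction


def complement(p):
    """q: the probability that an event does NOT occur."""
    return 1 - p


# Sample of 10 drawn at random from 100 people: p(included) = 10/100.
p_included = Fraction(10, 100)
q_not_included = complement(p_included)
print(p_included, q_not_included)  # 1/10 9/10


def relative_frequency_of_heads(tosses, seed=0):
    """Long-run proportion of heads for a simulated fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


print(relative_frequency_of_heads(100_000))  # approaches p(heads) = 0.50
```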
There are three main methods of calculating probability:
The ADDITION rule
The MULTIPLICATION rule
The BINOMIAL DISTRIBUTION
Addition rule
The addition rule of probability states that the probability of any one of several particular events occurring is equal to the sum of their individual probabilities, provided the events are mutually exclusive, i.e., they cannot both happen.
Because the probability of picking a heart from a deck of cards is 0.25, and the probability of picking a diamond is also 0.25, this rule states that the probability of picking a card that is either a diamond or a heart is 0.25 + 0.25 = 0.50. Because no card can be both a heart and a diamond, these events meet the requirement of mutual exclusiveness.
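A minimal sketch of the addition rule with exact fractions, cross-checked by counting cards directly. The deck encoding (ranks 1 to 13) and the function names are illustrative assumptions.

```python
from fractions import Fraction

SUITS = ("hearts", "diamonds", "clubs", "spades")
DECK = [(rank, suit) for suit in SUITS for rank in range(1, 14)]  # 52 cards


def p_suit(suit):
    """Probability of drawing a card of the given suit from a full deck."""
    return Fraction(sum(s == suit for _, s in DECK), len(DECK))


def addition_rule(*probabilities):
    """P(A or B or ...) for mutually exclusive events: the sum of the parts."""
    return sum(probabilities, Fraction(0))


rule = addition_rule(p_suit("hearts"), p_suit("diamonds"))
# Cross-check by counting directly: cards that are a heart or a diamond.
direct = Fraction(sum(s in ("hearts", "diamonds") for _, s in DECK), len(DECK))
print(rule, direct)  # 1/2 1/2
```

For events that can occur together, such as "heart" and "ace", the plain sum (13/52 + 4/52 = 17/52) would count the ace of hearts twice; the true probability is 16/52, which is why the rule requires mutual exclusiveness.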
Multiplication rule
The multiplication rule of probability states that the probability of two or more statistically independent events all occurring is equal to the product of their individual probabilities.
If the lifetime probability of a person developing cancer is 0.25, and the lifetime probability of developing schizophrenia is 0.01, the lifetime probability that a person might have both cancer and schizophrenia is 0.25 × 0.01 = 0.0025, provided that the two illnesses are independent; in other words, that having one illness neither increases nor decreases the risk of having the other.
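The multiplication rule reduces to a product, sketched here in Python (`math.prod` requires Python 3.8 or later). The probabilities are the illustrative lifetime figures from the example, and the result is only valid under the stated independence assumption.

```python
import math


def multiplication_rule(*probabilities):
    """P(A and B and ...) for statistically independent events: the product."""
    return math.prod(probabilities)


p_cancer = 0.25          # lifetime probability from the example
p_schizophrenia = 0.01   # lifetime probability from the example
p_both = multiplication_rule(p_cancer, p_schizophrenia)
print(round(p_both, 6))  # 0.0025
```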
Binomial Distribution
The probability that a specific combination of mutually exclusive, independent events will occur can be determined by the use of the binomial distribution.
A binomial distribution is one in which there are only two possibilities, such as yes/no, male/female, or healthy/sick.
If an experiment has exactly two outcomes, one of which is generally termed success, the binomial distribution gives the probability of obtaining an exact number of successes in a series of independent trials.
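The binomial probability of exactly k successes in n independent trials, each with success probability p, is C(n, k) p^k (1 - p)^(n - k). A minimal Python version of that formula follows (`math.comb` requires Python 3.8+); the coin-toss example is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# The probabilities of all possible outcomes (0..n successes) sum to 1.
print(sum(binomial_pmf(k, 5, 0.5) for k in range(6)))
```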
A typical use of the binomial distribution is in genetic counseling.
Inheritance of a disorder such as phenylketonuria follows a binomial distribution: there are two possible events, inheriting the disease and not inheriting the disease, and the possibilities are independent (if a child in a family inherits the disorder, this does not affect the chance of another child inheriting it).
A physician could therefore use the binomial distribution to inform a couple who are carriers of the disease how probable it is that some specific combination of events might occur, such as the probability that, if they have two children, neither will inherit the disease.
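A worked sketch of the two-children question. It assumes an autosomal recessive disorder (phenylketonuria is one) with both parents carriers, so each child independently has a 1/4 chance of inheriting the disease; that 1/4 is the assumption the whole calculation rests on. Exact fractions keep the answers readable.

```python
from fractions import Fraction
from math import comb

# Assumption: autosomal recessive disorder with both parents carriers,
# so each child independently has a 1/4 chance of inheriting the disease.
P_AFFECTED = Fraction(1, 4)


def prob_affected_children(k, n_children, p=P_AFFECTED):
    """Binomial probability that exactly k of n_children inherit the disease."""
    return comb(n_children, k) * p**k * (1 - p) ** (n_children - k)


for k in range(3):
    print(k, "affected:", prob_affected_children(k, 2))
# 0 affected: 9/16
# 1 affected: 3/8
# 2 affected: 1/16
```

So under these assumptions the probability that neither of two children inherits the disease is (3/4)² = 9/16, or about 0.56.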
Types of Data
The choice of an appropriate statistical technique depends upon the type of data in question.
Data fall on one of four scales of measurement:
Nominal
Ordinal
Interval
Ratio
Nominal scale data
Nominal scale data are divided into qualitative categories or groups, such as male/female, urban/rural, or red/green.
There is no implication of order or ratio.
Nominal data that fall into only two groups are called dichotomous data.
Ordinal scale data
Ordinal scale data can be placed in a meaningful order, e.g., the ranking of students.
However, there is no information about the size of the interval: no conclusion can be drawn about whether the difference between the first and second students is the same as that between the second and third.
Interval scale data
Interval scale data are like ordinal data in that they can be placed in a meaningful order.
In addition, they have meaningful intervals between items, which are usually measured quantities, e.g., temperature in degrees Celsius.
However, because interval scales do not have an absolute zero, ratios of scores are not meaningful: e.g., 100 °C is not twice as hot as 50 °C.
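A small Python illustration of why ratios fail on an interval scale: converting to kelvin, whose zero is absolute zero (-273.15 °C) and which therefore behaves as a ratio scale, shows the "true" ratio of 100 °C to 50 °C is only about 1.155. The function name `celsius_to_kelvin` is illustrative.

```python
ABSOLUTE_ZERO_C = -273.15  # 0 K expressed in degrees Celsius


def celsius_to_kelvin(celsius):
    """Kelvin has a true zero point, so it behaves as a ratio scale."""
    return celsius - ABSOLUTE_ZERO_C


naive_ratio = 100 / 50  # 2.0, but 0 °C is not an absence of heat
true_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)
print(naive_ratio, round(true_ratio, 3))  # 2.0 1.155
```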
Ratio scale data
A ratio scale has the same properties as an interval scale; however, meaningful ratios exist because there is an absolute zero.
Most biomedical variables form a ratio scale: weight in pounds, time in seconds or days, blood pressure in mm Hg, and pulse rate in beats per minute are all ratio data.
A pulse rate of zero indicates an absolute lack of pulse. Therefore it is correct to say that a pulse rate of 120 BPM is twice that of 60 BPM.
Discrete variables
Discrete variables can take only certain values and nothing in between.
For example, the number of patients in a hospital census may be 200 or 201, but it cannot be anything in between these two; the number of syringes used in a clinic on any given day may increase or decrease only by units of one.
Continuous variables
Continuous variables may take any value (typically between certain limits).
Most biomedical variables are continuous (e.g., a patient's weight, height, age, and blood pressure). However, the process of measuring or reporting continuous variables reduces them to discrete variables.
Blood pressure may be reported to the nearest whole millimeter of mercury, weight to the nearest pound, and age to the nearest year.
FREQUENCY DISTRIBUTIONS
A set of unorganized data is difficult to digest and understand.
Consider a study of the serum cholesterol levels of a sample of 200 men: a list of the 200 levels would be of little value in itself.
A simple first way of organizing the data is to list all the possible values between the highest and the lowest in order, recording the frequency (f) with which each score occurs.
This forms a frequency distribution.
If the highest serum cholesterol level were 260 mg/dl and the lowest were 161 mg/dl, the frequency distribution would list every value from 260 down to 161 mg/dl together with its frequency.
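A minimal Python sketch of an (ungrouped) frequency distribution, assuming whole-number scores. The handful of levels below are made up for illustration and are not the 200-man sample; `collections.Counter` returns 0 for values that never occur, so gaps appear with f = 0.

```python
from collections import Counter


def frequency_distribution(scores):
    """List every whole-number value from highest to lowest with its frequency f."""
    counts = Counter(scores)
    return [(value, counts[value]) for value in range(max(scores), min(scores) - 1, -1)]


# Hypothetical serum cholesterol levels (mg/dl), not the 200-man sample itself.
levels = [182, 175, 182, 190, 178, 175, 182]
for value, f in frequency_distribution(levels):
    print(value, f)
```

With a range of 161 to 260 mg/dl this produces 100 rows, which is why grouping is usually the next step.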
Grouped frequency distributions
Data can be made more manageable by creating a grouped frequency distribution.
Individual scores are grouped (between 5 and 20 groups are usually appropriate).
Each group of scores encompasses an equal class interval.
In this example there are 10 groups with a class interval of 10 (161 to 170, 171 to 180, and so on).
Interval (mg/dl)   Frequency f   Relative f (%)   Cumulative f (%)
251-260                      5              2.5              100.0
241-250                     13              6.5               97.5
231-240                     19              9.5               91.0
221-230                     18              9.0               81.5
211-220                     38             19.0               72.5
201-210                     72             36.0               53.5
191-200                     14              7.0               17.5
181-190                     12              6.0               10.5
171-180                      5              2.5                4.5
161-170                      4              2.0                2.0
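Grouping whole-number scores into equal class intervals can be sketched in a few lines of Python. The defaults (lowest class starting at 161, width 10, 10 classes) are chosen to match the example; the function name and the sample scores are illustrative assumptions, and scores outside the classes are simply not counted.

```python
def grouped_frequency(scores, lowest=161, width=10, n_classes=10):
    """Count whole-number scores falling in equal class intervals
    (161-170, 171-180, ...), listed from the lowest class upward."""
    groups = []
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width - 1
        f = sum(lower <= s <= upper for s in scores)
        groups.append((f"{lower}-{upper}", f))
    return groups


# Hypothetical levels (mg/dl) for illustration only.
print(grouped_frequency([161, 170, 171, 199, 205, 260]))
```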
Relative frequency distributions
A grouped frequency distribution can be transformed into a relative frequency distribution, which shows the percentage of all the elements that fall within each class interval.
The relative frequency of elements in any given class interval is found by dividing f, the frequency (or number of elements) in that class interval, by n (the sample size, which in this case is 200).
Multiplying the result by 100 converts it into a percentage.
Thus, this distribution shows, for example, that 19% of this sample had serum cholesterol levels between 211 and 220 mg/dl.
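The calculation f / n × 100 applied to the frequencies in the serum cholesterol table, as a minimal Python sketch. Multiplying by 100 before dividing keeps these particular values exact in floating point.

```python
# Class intervals (mg/dl) and frequencies f from the serum cholesterol table (n = 200).
TABLE = [
    ("161-170", 4), ("171-180", 5), ("181-190", 12), ("191-200", 14),
    ("201-210", 72), ("211-220", 38), ("221-230", 18), ("231-240", 19),
    ("241-250", 13), ("251-260", 5),
]


def relative_frequencies(table):
    """Percentage of all elements in each class interval: f / n * 100."""
    n = sum(f for _, f in table)
    return [(interval, 100 * f / n) for interval, f in table]


for interval, pct in relative_frequencies(TABLE):
    print(f"{interval}: {pct:.1f}%")
```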
Cumulative frequency distributions
A cumulative frequency distribution is also expressed as a percentage; it shows the percentage of elements lying within and below each class interval.
Although a group may be called the 211-220 group, this group actually includes the range of scores that lie from 210.5 up to and including 220.5, so these figures are the exact lower and upper limits of the group.
The relative frequency column shows that 2% of the distribution lies in the 161-170 group and 2.5% lies in the 171-180 group; therefore, a total of 4.5% of the distribution lies at or below a score of 180.5, as shown by the cumulative frequency column.
A further 6% of the distribution lies in the 181-190 group; therefore, a total of (2 + 2.5 + 6) = 10.5% lies at or below a score of 190.5.
A man with a serum cholesterol level of 190 mg/dl can be told that roughly 10% of this sample had lower levels than his, and approximately 90% had scores above his.
The cumulative frequency of the highest group (251-260) must be 100, showing that 100% of the distribution lies at or below a score of 260.5.
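The running totals described above can be sketched in Python with `itertools.accumulate`, pairing each cumulative percentage with the exact upper limit of its class (e.g. 180.5). The frequencies are those of the serum cholesterol table; the function name is illustrative.

```python
from itertools import accumulate

# Class intervals (mg/dl) and frequencies f from the serum cholesterol table (n = 200).
TABLE = [
    ("161-170", 4), ("171-180", 5), ("181-190", 12), ("191-200", 14),
    ("201-210", 72), ("211-220", 38), ("221-230", 18), ("231-240", 19),
    ("241-250", 13), ("251-260", 5),
]


def cumulative_percentages(table):
    """Percentage of elements lying within and below each class interval,
    paired with the exact upper limit of that interval."""
    n = sum(f for _, f in table)
    running_totals = accumulate(f for _, f in table)
    rows = []
    for (interval, _), running_f in zip(table, running_totals):
        exact_upper = int(interval.split("-")[1]) + 0.5
        rows.append((interval, exact_upper, 100 * running_f / n))
    return rows


for interval, upper, cum_pct in cumulative_percentages(TABLE):
    print(f"{interval}: {cum_pct:.1f}% at or below {upper}")
```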
Presentation of Statistical Data
Statistical data, once collected, must be arranged purposively in order to bring out the important points clearly and strikingly.
Therefore the manner in which statistical data are presented is of utmost importance.
There are several methods of presenting data: tables, charts, diagrams, graphs, pictures, and special curves.
Methods of presenting data
Tables
Diagrams
Bar charts
Histogram
Frequency polygon
Pie charts
Pictogram
Bar charts
To display nominal scale data, a bar graph is typically used. For example, if a group of 100 men had a mean serum cholesterol value of 212 mg/dl, and a group of 100 women had a mean value of 185 mg/dl, the means of these two groups could be presented as a bar graph.
Bar graphs are identical to frequency histograms, except that each rectangle on the graph is clearly separated from the others by a space, showing that the data form separate categories (such as male and female) rather than continuous groups.
Bar chart
Histogram
Frequency polygon
For ratio or interval scale data, a frequency distribution may be drawn as a frequency polygon, in which the midpoints of each class interval are joined by straight lines.
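The points of a frequency polygon are just (class midpoint, f) pairs. This sketch computes them for the serum cholesterol table; plotting is left out so it needs only the standard library, and the name `polygon_points` is illustrative. The pairs could be handed to any plotting tool.

```python
# Class intervals (mg/dl) and frequencies f from the serum cholesterol table.
TABLE = [
    ((161, 170), 4), ((171, 180), 5), ((181, 190), 12), ((191, 200), 14),
    ((201, 210), 72), ((211, 220), 38), ((221, 230), 18), ((231, 240), 19),
    ((241, 250), 13), ((251, 260), 5),
]


def polygon_points(table):
    """(midpoint, f) pairs; a frequency polygon joins these with straight lines."""
    return [((lower + upper) / 2, f) for (lower, upper), f in table]


print(polygon_points(TABLE)[:3])  # [(165.5, 4), (175.5, 5), (185.5, 12)]
```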
A cumulative frequency distribution can also be presented graphically as a polygon.
Cumulative frequency polygons typically form a characteristic S-shaped curve known as an ogive.
Pie chart
Instead of comparing the lengths of bars, the areas of segments of a circle are compared.
The area of each segment depends upon its angle.
It is often necessary to indicate the percentages in the segments, as it may not be easy to compare the areas of segments.
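Since each segment's area is proportional to its angle, the angle is simply the category's share of the total multiplied by 360 degrees. A minimal sketch with made-up counts for three hypothetical categories:

```python
def pie_angles(counts):
    """Central angle (degrees) of each segment: its share of the total times 360."""
    total = sum(counts.values())
    return {label: 360 * c / total for label, c in counts.items()}


# Hypothetical counts for three categories.
print(pie_angles({"A": 50, "B": 30, "C": 20}))  # {'A': 180.0, 'B': 108.0, 'C': 72.0}
```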
Pie chart
Pictogram
Pictograms are a popular method of presenting data to the layman.
Small pictures or symbols are used to present the data.
For example, a picture of a doctor may be used to represent the population per physician.
Fractions of the picture can be used to represent numbers smaller than the value of a whole symbol.
Centiles and other quantiles
The cumulative frequency polygon and the cumulative frequency distribution both illustrate the concept of centile (or percentile) rank, which states the percentage of observations that fall below any particular score.
In the case of a grouped frequency distribution, centile ranks state the percentage of observations that fall within or below any given class interval.
Centile ranks provide a way of giving information about one individual score in relation to all the other scores in a distribution.
For example, the cumulative frequency column of the frequency table above shows that 91% of the observations fall below 240.5 mg/dl, which therefore represents the 91st centile (which can be written as $C_{91}$).
A man with a serum cholesterol level of 240 mg/dl lies at the 91st centile: about 9% of the scores in the sample are higher than his.
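A sketch of looking up a centile rank in a grouped distribution: find the class interval containing the score and report that class's cumulative percentage. The (exact upper limit, cumulative %) pairs are taken from the serum cholesterol table; `centile_rank` is an illustrative name, and scores outside 160.5 to 260.5 are rejected because the table says nothing about them.

```python
from bisect import bisect_left

# (exact upper limit, cumulative %) for each class interval of the
# serum cholesterol table, lowest class first.
CUMULATIVE = [
    (170.5, 2.0), (180.5, 4.5), (190.5, 10.5), (200.5, 17.5), (210.5, 53.5),
    (220.5, 72.5), (230.5, 81.5), (240.5, 91.0), (250.5, 97.5), (260.5, 100.0),
]
UPPER_LIMITS = [upper for upper, _ in CUMULATIVE]


def centile_rank(score):
    """Percentage of observations within or below the class interval containing score."""
    if not 160.5 <= score <= 260.5:
        raise ValueError("score lies outside the tabulated range")
    index = bisect_left(UPPER_LIMITS, score)  # first class whose upper limit >= score
    return CUMULATIVE[index][1]


print(centile_rank(240))  # 91.0
print(centile_rank(190))  # 10.5
```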
Centile ranks are widely used in reporting scores on educational tests.
They are one member of a family of values called quantiles, which divide distributions into a number of equal parts.
Centiles divide a distribution into 100 equal parts.
Other quantiles include quartiles, which divide the data into 4 parts, and deciles, which divide a distribution into 10 parts.
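Python's standard `statistics.quantiles` (Python 3.8+) returns the n - 1 cut points that divide data into n equal parts, which covers quartiles, deciles, and centiles with one call. The values below are randomly generated, hypothetical cholesterol-like numbers; note that `quantiles` uses the "exclusive" method by default, and other interpolation conventions give slightly different cut points.

```python
import random
import statistics

# Hypothetical cholesterol-like values (mg/dl), generated only for illustration.
rng = random.Random(3)
values = [round(rng.gauss(210, 20)) for _ in range(200)]

quartiles = statistics.quantiles(values, n=4)    # 3 cut points -> 4 parts
deciles = statistics.quantiles(values, n=10)     # 9 cut points -> 10 parts
centiles = statistics.quantiles(values, n=100)   # 99 cut points -> 100 parts

print("quartiles:", quartiles)
print("median:", statistics.median(values))      # matches the middle quartile
```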