Experimental Design

72
Experimental Design Nilesh B. Patel, PhD Dept Medical Physiology University of Nairobi Kenya

description

Design of Experiments.. A brief note

Transcript of Experimental Design

  • Experimental Design

    Nilesh B. Patel, PhDDept Medical Physiology

    University of NairobiKenya

  • What is science?

    The systemic study of the structure and behavior of the physical world, involving experimentation and measurement and the development of theories to describe the results of these activities.

    Cambridge International Dictionary of English

  • What is an experiment?

    A test done in order to learn something or to discover whether something works or is true (plausible).

    Experiments are done to prove the hypothesis is false.

    Only falsification is possible with certainty; proving something can never be done with certainty

  • What can be studied with the methods of science?

    1. The capital of Kenya is Nairobi.2. Holland is twice the size of Kenya.3. All humans are mortal.4. All rabbits are grey.5. All ziwats are blue.6. Mentally ill people are possessed by an evil

    spirit that7. Mentally ill people are possessed by an evil

    spirit that cannot be detected by any known means.

    Adapted from: Circadian Physiology. Roberto Refinetti. CRC Press 2000

  • Galileo Galilei(1564-1642)

    Father of modern physics Father of modern

    observational astronomy Father of modern science Pioneered the use of

    quantitative experiments whose results could be analyzed with mathematical precision

    How many of you accept that the sun goes round the earth?How many of you accept that the earth goes round the sun?

  • Types of Experiments

    Experiments are done to test a hypothesis; a prediction.

    Observational Experimental Quasi-experimental

  • Experimental Design

    Whether or not a studys findings are useful or not depends crucially on design.

    No matter how ingenious or important an idea for an experiment might be, if the study is badly designed, its worthless.

    An excellent procedure for determining cause and effect.

    Does night cause day or does day cause night?

  • Three Aims of Research Validity

    Results actually show what it is that you intent them to show

    Reliable Potentially replicable by yourself or anyone else

    Important Subjective

    Research can possess all the above qualities but still be essentially trivial.

    Research cannot be important if its findings are unreliable or invalid.

  • Score or Measurement Consists of a number made up of different

    components1) A true measure of the thing we want to measure2) A measure of other things3) Systemic (non-random) basis: measuring other

    things inadvertently.4) Random (non-systemic) error, which should cancel

    out over large number of observations. We want the score or measurement to consist

    of as much true score, as possible and little of the other factors (validity and reliability).

  • Types of Measurement Nominal Ordinal Interval Ratio

    Is what you measured, a real measure of what you are interested in?

    It is a measure what you are interested in. Random (non-systematic) errors. Non-random errors (systemic) errors.

  • Evidence for the intellectual inferiority of women

    Paul Broca (19th Century) make careful measurement of brain weight and found Caucasian men have larger brains than Caucasian women, who in turn have a larger brain than negros or any other non-Caucasian for that matter.

    Modern brains were supposedly heavier than medieval brains, and French brains were heavier than German brains.

    The brain weights differences were considered to reflect differences in intelligence between these different groups.

  • MWM and disruption of biological clock where the measure is the time to reach the platform.

    Anesthesia and learning.

  • Experimental Design

    This is important so that the results can be interpreted and other causes can be excluded from the most likely conclusion.

    The results have to be valid, reliable and generalizable (important)

  • Simplest Experiment

    MeasurementMeasurementBaseline Treatment

    There is a difference so my treatment has an effect

    Time animal handler effectHabituation

  • The need of control group

  • Lazaro Sapallanzani 1793 Found that bats and owls differ in their ability to

    avoid objects in the dark. With hoods the bats could not avoid the objects. So the bats have better night vision than owls. Covered with transparent hoods; the bats have a

    problem (key control experiment). Blind the bats and they are fine Wax in the ears Pierce invented the ultrasound microphone and

    Griffin walked in with a bat. Association is not causation but can be.

  • TreatmentExperimental

    TreatmentControl

    Measurement

    Measurement

    Experiment with control group

    Great!!! There is a difference so my treatment has an effect.

  • fr

    e

    q

    u

    e

    n

    c

    y

    2cm height 100 cm

    treatmentcontrol

    Population distribution

  • Randomization

  • Random allocation

    TreatmentExperimental

    TreatmentControl

    Measurement

    Measurement

    Post-test/control groupIndependent variable and dependent variable

    Measurement

    MeasurementMeasurement

    MeasurementTreatmentExperimental

    TreatmentControl

  • Between-group Design

    Advantages Simplicity Less chance of practice or fatigue Useful when it is impossible for an individual

    to participate in all experimental conditions Disadvantages

    Expense in terms of time, effort and number

  • Treatment

    Treatment MeasurementNo Treatment

    Measurement MeasurementNo Treatment

    Measurement

    Repeated-Measure Design

  • Repeated-Measures Design

    Advantages Economy Sensitivity

    Disadvantages Carry-over effect The need for the condition to be reversible

  • Treatment

    Treatment MeasurementNo Treatment

    Measurement MeasurementNo Treatment

    Measurement

    Repeated-Measure Design

  • Ra

    n

    d

    o

    m

    a

    l

    l

    o

    c

    a

    t

    i

    o

    n

    TreatmentLevel A

    TreatmentControl

    Measurement

    Measurement

    Measurement

    Measurement Measurement

    MeasurementTreatmentLevel C

    TreatmentLevel B

    Measurement

    Measurement

    Multiple Levels of Independent Variable4 levels

  • Latin Square Design

    To deal with the problem of order effects in within-subject designs.

    The order of the various conditions is counter-balanced so each possible order of condition occurs just once.

  • Example

    Three conditions A, B, and C.

    Avoid systemic order effect

    6 possible combinations of order

    Order of conditions or trials

    Group 1 A B CGroup 2 B C AGroup 3 C A B

    James Brown EDGAR programmewww.jic.bbsrc.ac.uk/services/statistics/edgar.htm

  • Multi-Factorial Design

    Two or three independent variables in the same study

    Not recommended to use more than 2 as the analysis gets quite complicated and conclusion have to be qualified.

    Gender and treatment Day and performance

  • Correlations

    How changes in one variable alters another variable?

    Can have more than one variable and determine the contribution or weight or % that each variable has on the measured value.

    Multi-factorial, multi-dimensions, e.g. physics

    Correlations are not cause and effect

  • Statistical Analysis

  • STATISTICS

    Descriptive Inferential

    Calculation of probabilityTests of significance

    Graphs/tablesNumerical summary

  • Descriptive statistics summarizes the data obtained

    Mode, Median, Mean Range, Quartiles, Squared Sum, Variance,

    Standard deviation

    Inferential statistics is used to reach certain conclusions that can be applied to public health planning or patient care.

    Modern knowledge methods and objective decision making process

  • Measurement

    Types of measurements Nominal Ordinal Interval Ratio

    Is what you measured, a real measure of what you are interested in?

    It is a measure what you are interested in. Random (non-systematic) errors. Non-random errors (systemic) errors.

  • Descriptive Statistics

    Summarize the data

    Measure of central tendency Spread of data

    Standard deviation (sd)Standard error of the mean (sem)

    rannge

    Graphs Tables

    MeanMode

    Median

    Line graphsHistogramsPie charts

  • Measuring the accuracy of the mean

  • The mean is a statistic that predicts the likely score of a person or measurement.

    Most of the time, it will not a number in the data or measurement a hypothetical value.

    Mean number of children is 2.5 per family in the UK.

  • The mean will only be a prefect representation of the data if all of the scores collected are the same.

  • 01

    2

    3

    4

    5

    6

    0 2 4 6 8

    mean

    mean

    The closer the mean is to all the data points the more representative is the mean of the data

  • Distribution of data Means are the same

    Distribution of the scores is different

    Range

  • Distribution measures

    Range Quartile Mean deviation Sum of squared deviation Mean of the sum of squared deviation

    variance Standard deviation Standard error of the mean (sem)

  • Standard deviation and percentage of data

    68%

    96%

    1 sd

    2 sd1.96 sd 95%

  • Population Parameters

    (mu, mean) (sigma, standard

    deviation)

    SampleParameters

    X (mean)

    Sd (standard deviation

    estimate

    Population and sample parameters

  • Standard error of the mean (sem)

    How well does the sample mean represent the population mean?

    This is after all the purpose of research. If we take different samples from the same

    population, the mean for some samples will be similar and for other different.

    But usually we do not take many samples, we take usually 1 sample.

    So how good an estimate of the population mean is it?

  • Population and sample parameters

    X = 10

    X = 10

    X = 12X = 9

    X = 11

    X = 11

    X = 10

  • Why Inferential Statistics?

  • Descriptive statistics does not help to answer research questions.

    Most of modern research is hypothesis driven. There are two common hypothesis

    (predictions) inherent1. Experimental hypothesis (HA)

    The experimental manipulation will have an effect.2. Null hypothesis (H0)

    The experimental manipulation will have no effect

  • So what if there is a difference between the means?

    Two samples from the same population will have different means.

    Samples taken from the extremes of the distribution will have large difference in the means.

    So large differences suggest that the means are from two different populations, but how can we be certain that it not due to sampling from the extreme of the distribution.

    Application of inferential statistic.

  • The question

    what is the probability that the difference between the means is due to chance and not due to my manipulation?

  • Carry out tests of significance The tests of significance give the probability of obtaining

    the difference between the means by chance. Choice of test depends on the type of measurement and

    the experimental design. Whichever test you use you will end up with a number

    the test statistic,e.g. t, z, F. What is the probability of obtaining that value of the test

    statistic for the means and the spread of data that I have?

    On what basis do I or anyone else accept that my manipulation did indeed have an effect?

  • Test Statistic A test statistic (t, F, z) is a statistic with a known

    distribution. e.g. Age and death

    Test statistic = systemic variation / unsystemicvariation.

    The exact form of this equation changes from test to test, but essentially the comparison is the amount of variance created by an experimental effect against the amount of variance due to random factors.

    If the experiment has had an effect then it will create more variance than random factors alone.

    Go back

  • Ronald Aylmer Fisher1890-1962

    "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

  • Fisher tea test

    Is the milk added after or before the tea is added?

    The level of significance p < 0.05. There is a 5% or less probability that the

    difference between the means is due to chance. I am therefore willing to accept with a 95%

    confidence that the difference I obtained was due to my manipulation and not due to chance.

    My results are significant but are they important?

  • Which test to use?

    Sample size Data distribution Homogeneity of variance Independent measurements or repeated

    measurements Comparison between 2 groups Comparison between 3 or more groups Type of data

  • Parametric or Non-parametric

  • Parametric Tests 1. Arithmetic means

    Interval or ratio data2. Variance in the one condition = variance in the

    other conditionHomogeneity of variance (Levenes test)Sphericity assumption (Mauchlys test)

    3. Normal distribution1. Tests

    Plot histogramKolmogorov-Smirnov testShapiro-Wilk test

  • Parametric tests

    Students t-testIndependent t-testDependent t-test

    Analysis of Variance (ANOVA)One-Way ANOVATwo-Way ANOVARepeated measures ANOVAMixed ANOVA

  • Students t-test

    Only two groups of data Sample size < 30 Ratio difference between the means /

    estimated s.e. of the difference between those two sample means.

  • Independent t-test unpaired

    Two groups Different participants, i.e. each participant

    contributes only one data point Normal distribution Levenes test not significant (p = 0.492) Presenting the data:

    m = 24.2, se = 1.49; m = 20.00, se = 1.30.t(18) = 2.12, p < 0.05, r = 0.45.

  • Analysis of Variance (ANOVA) For 3 or more experimental group Generates a F ratio Tells whether the means of the samples are significantly

    different. But not which means. Additional tests need to be done to determine which

    means are statistically different (planned comparison or post-hoc tests).

    Types of ANOVA One-Way ANOVA Two-Way ANOVA Repeated measures ANOVA Mixed ANOVA

  • One-Way ANOVA

    3 or more groups Different participants in each group One-way for 1 independent variable Groups are independent, i.e. data in one group

    does not influence the data in another group Homogeneity of variance

    If Levenes test is significant Use non-parametric tests or transform the data

  • Output from SPSSSum of squares

    df Mean square

    F Sig

    Between groups

    450.664 5 90.133 269.733 0.000

    Within groups

    38.094 114 0.334

    Total 488.758

    Presentation of resultsF (5, 114) = 269.73, p < 0.05

  • Post-hoc tests9 REGWQ or HSD

    Equal sample size and homogeneity of variance9 Bonferroni

    Conservative, effected by group variance9 Gabriels Procedure

    Slight differences in sample size9 Hochbergs GT2

    Big differences in sample size9 Games-Howell Procedure

    If the variances are not equal

  • Assumption of sphericity Equality of variance of the differences between

    treatment levels Mauchlys test

    Violation of sphericity is loss of power, i.e. increased probability of Type II error

    Correction df by Greenhouse-Geisser estimate of sphericity Huyn-Feldt estimate of sphericity Lowest possible estimate of sphericity (lower-bound)

    If > 0.75 use Huyn-Feldt If < 0.75 use Greenhouse-Geiser

  • Some sort of result

    Mauchlys test indicated that the assumption of sphericity had been violated, 2(5) = 13.12, p < 0.05,therefore degrees of freedom were corrected using Huyn-Feldt estimates of sphericity ( = 0.85). The result showed that the number of women eyed-up was significantly affected by the amount of alcohol drunk, F(2.55, 48.40) = 4.73, p < 0.05, r = 0.40)

  • Non-Parametric Test

    Assumption-free tests Less strict about the distribution of the data

    being analyzed. Thus raw scores are used Data is ranked

    Lowest score rank 1, next lowest rank 2, etc Analysis is carried out on the ranks Less power than parameteric tests hence higher

    chance of Type II error.

  • Basic tests

    Mann-Whitney test= independent t-test

    Wilcoxon Signed-Rank test= dependent t-test

    Kruskal-Wallis test = one-way ANOVA

    Friedmans ANOVA= one-way repeated measures ANOVA

  • Mann-Whitney Test = independent t-test

    Two conditions and different participants in each condition

    Men (M dn = 27) did not seem to differ from dogs (M dn = 24) in the amount of dog-like behaviour they displayed (U = 194.5, ns)

    Graphs are usually box and whisker plots. Use of the median

    Non-parametric tests are testing the difference in ranks; not the difference in means.

    Means are biased by outliers; whereas ranks are not.

  • Kruskal-Wallis Test = one-way ANOVA

    More than 2 conditions and different participants have been used in each condition.

    The test statistic is H and has chi-squared distribution Childrens fear beliefs about clowns were significantly

    affected by the format of information given to them (H(3) = 17.06, p < 0.01). Mann-Whitney test were used to follow-up this finding. A Bonferroni correction was applied and so all effects are reported at a 0.167 level of significance. It appears that fear beliefs were significantly higher after the adverts compared to the control, U = 37.50, r = -.60.

  • Claude Bernard(1813-1878)

    Those whose minds are bound and cramped. They make poor observations, because they choose among the results of their experiments, observation, and reading only what suits their object, neglecting whatever is unrelated to it and carefully setting aside everything which might tend toward the idea they wish to combat.

  • Ignaz Semmelweis(1818-1865)

    Savior of mothers 10-35% mortality Death of colleague Jacob

    Kolletschka (1847) Introduced lime washing

    of hands 12.24% to 2.38%. The Etiology, Concept

    and Prophylaxis of Childhood Fever (1861).

  • Semmelweis Reflex

    Dismissing or rejecting out of hand information, automatically, without thought, inspection, or experiment.

  • References Andy Field and Graham Hole. How to Design

    and Report Experiments. Sage Publications. London. 2006.

    Martin Bland. An Introduction to Medical Statistics. 2nd ed. Oxford Medical Publications. 1996.

    Norman T. J. Bailey. Statistical Methods in Biology. 3rd Ed. Cambridge University Press. 1995.

    Beth Dawson and Robert G. Trapp. Basic and Clinical Biostatistics. Lange Medical Books/McGraw-Hill. 3rd ed. 2001.

    Experimental Design What is science?What is an experiment? What can be studied with the methods of science?Galileo Galilei(1564-1642)Types of ExperimentsExperimental Design Three Aims of Research Score or Measurement Types of Measurement Evidence for the intellectual inferiority of women Experimental DesignSimplest Experiment The need of control groupLazaro Sapallanzani 1793 Randomization Between-group Design Repeated-Measure DesignRepeated-Measures Design Repeated-Measure DesignMultiple Levels of Independent Variable4 levelsLatin Square Design ExampleMulti-Factorial Design Correlations Statistical AnalysisMeasurement Descriptive StatisticsMeasuring the accuracy of the meanDistribution of data Distribution measures Standard deviation and percentage of dataPopulation and sample parametersStandard error of the mean (sem)Population and sample parametersWhy Inferential Statistics?So what if there is a difference between the means?The questionCarry out tests of significanceTest StatisticRonald Aylmer Fisher1890-1962Fisher tea testWhich test to use?Parametric or Non-parametricParametric Tests Parametric testsStudents t-testIndependent t-test unpaired Analysis of Variance (ANOVA)One-Way ANOVAOutput from SPSSPost-hoc testsAssumption of sphericitySome sort of result Non-Parametric TestBasic tests Mann-Whitney TestKruskal-Wallis TestClaude Bernard(1813-1878)Ignaz Semmelweis(1818-1865)Semmelweis ReflexReferences