Statistics: The Science of Learning from Data Data Collection Data Analysis Interpretation...

Statistics: The Science of Learning from Data

• Data Collection

• Data Analysis

• Interpretation

• Prediction

• Take Action

• W.E. Deming: “The value of statistics and statisticians is to make predictions that form the basis for action”

Data Collection

• Observational Studies – to study correlation in variables. Prediction: OK. Inferring causation: No!

• Sampling Surveys – to estimate population totals, ratios, etc.

• Experimental Designs – to study cause and effect relationships: “If you want to predict what will happen in the future when you do something”

The only way to find out what happens when you manipulate a variable is to go ahead and manipulate it, then observe the result!

Typical Purposes for Experimentation

• To determine principal causes of variation in a measured response

• To find conditions that give rise to a maximum or minimum response

• To determine whether there is a difference (and how big that difference is) between responses achieved at different settings of controllable variables

• To obtain a mathematical model in order to predict future responses, when controllable variables are changed

Goals of This Class

• Students should be able to choose an experimental design plan that is appropriate for the research problem at hand

• Students should be able to construct the design (including performing proper randomization and determining the required number of replicates)

• Execute the plan to collect the data (or advise a researcher to do it)

• Determine the appropriate model to fit the data

• Fit the model to the data and check the appropriateness of the model

• Interpret and explain the results in a meaningful way to answer the research question

Some Basic Definitions

• Experiment or Run – experimenter changes at least one of the items under study and observes the effect of his action

• Experimental Unit – the “material” under study upon which something is changed

• Treatment Factor or Independent Variable – a variable under study which is controlled at some level during a given experiment and varied from experiment to experiment, at the will of the experimenter


• Treatment Factor Levels – the different settings of the treatment factor that will be used throughout the course of experimentation

• Background or Lurking Variable – a variable the experimenter is unaware of, or cannot control, that may affect the outcome

• Response or Dependent Variable – measurements of experimental units that depend upon the settings of the factors


• Effect – Change in the response caused by a change in the factor level

• Replication – more than one experimental unit assigned to the same combination of treatment factor levels

• Repeated measurements (Duplicates) – more than one measurement of the same characteristic of an experimental unit

• Subsamples (observational units) – random subsamples taken from the larger experimental unit


• Experimental Design – Collection of experiments or runs to be made

• Confounded Factors – two or more factors are changed at the same time resulting in confused effects

• Biased Factor – Background variable changes when factor is changed resulting in confused effect.


• Experimental Error – the difference between the response for a given experiment and the long run average of all potential experiments that could be made at the same factor settings. This is usually caused by inherent differences in experimental units

• Sources of noise – anything that could cause the response for one experiment to be different than another (treatment factors, nuisance factors – variation in experimental units)
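The definitions of effect and experimental error above can be written compactly. This is only a sketch in notation that is not in the original slides: y_{ij} is assumed to denote the response of the j-th run made at factor setting i, \mu_i the long run average of all potential runs at that setting, and \epsilon_{ij} the experimental error.

```latex
% Response = long run average at setting i + experimental error
y_{ij} = \mu_i + \epsilon_{ij}, \qquad \epsilon_{ij} = y_{ij} - \mu_i

% Effect of changing the factor from setting 1 to setting 2
\text{effect} = \mu_2 - \mu_1
```

Under this notation, replication gives several values of y_{ij} for the same i, which is what allows the size of the experimental error to be estimated.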

Examples of Experimental Units

Medical Experiments – human subjects

Agriculture – individual plots of land

Manufacturing – batch of raw materials

If an experiment has to be run over a period of time with observations collected sequentially over time, the time of the run (or conditions that exist at the time of the run) or trial may be regarded as the experimental unit

Experimental units should be representative of the material and conditions to which the conclusions of the experiment are applied

Blocking

• The act of grouping the experimental units together into similar groups or Blocks

• Each treatment factor level will be tested on at least one experimental unit within each Block

Purpose of Blocking

• Increase precision of treatment factor level comparisons by comparing treatment factor levels within homogeneous groups of experimental units

• Broaden the scope of the results by including blocks which are representative of all conditions where conclusions are to be applied
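As a rough illustration of blocking, the sketch below assigns every treatment factor level once within each block, in a separate random order per block. The function name, the block and plot labels, and the choice of exactly one unit per level per block are assumptions made for this example, not part of the original slides.

```python
import random

def randomized_complete_block(blocks, treatments, seed=None):
    """Assign every treatment level once within each block, in random order.

    blocks: dict mapping a block label to the list of experimental units in it.
    treatments: list of treatment factor levels.
    Each block must hold exactly as many units as there are treatment levels.
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # independent randomization inside each block
        for unit, treatment in zip(units, shuffled):
            plan[unit] = (block, treatment)
    return plan

# Hypothetical example: two fields (blocks) with three plots each.
blocks = {"field1": ["p1", "p2", "p3"], "field2": ["p4", "p5", "p6"]}
plan = randomized_complete_block(blocks, ["A", "B", "C"], seed=1)
for unit, (block, treatment) in plan.items():
    print(unit, block, treatment)
```

Because each level appears in every block, comparisons between levels can be made within a block, where the units are more alike.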

Randomization

• The act of assigning treatment factor levels to experimental units in a random manner (utilizing a table of random numbers or randomization computer algorithm)

Purpose of Randomization

• Prevent experimenter bias

• Prevent systematic bias

• Ensure independence of experimental error
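A randomization computer algorithm of the kind mentioned above might look like the following minimal sketch. It assumes an equal number of replicates per treatment factor level; the function name and the unit and level labels are hypothetical.

```python
import random

def completely_randomized_design(units, treatments, replicates, seed=None):
    """Randomly assign treatment levels to units, `replicates` units per level."""
    if len(units) != len(treatments) * replicates:
        raise ValueError("need exactly len(treatments) * replicates units")
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    labels = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(labels)        # the random assignment happens here
    return dict(zip(units, labels))

# Hypothetical example: eight units, two levels, four replicates of each.
units = [f"unit{i}" for i in range(1, 9)]
assignment = completely_randomized_design(units, ["low", "high"], replicates=4, seed=42)
for unit, level in assignment.items():
    print(unit, level)
```

Recording the seed alongside the plan lets someone else regenerate and check the same assignment.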

Types of Experimental Designs

• Classify sources of variation

• Screen important factors

• Constrained optimization

• Unconstrained optimization

• Mechanistic modeling

Planning Experiments

• Define objectives

• Identify experimental units

• Define meaningful and measurable response

• List independent and lurking variables

• Run pilot tests

• Make flow diagram of the experimental procedure for each run

• Choose experimental design

• Determine number of replicates

• Randomize experimental conditions to experimental units

• Describe the method of data analysis

• Provide timetable and budget
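For the "determine number of replicates" step, one common rough starting point is the normal-approximation formula for comparing two treatment factor levels, n = 2(z_{1-α/2} + z_{power})² σ² / δ². The sketch below is not taken from the slides: it assumes a two-sided test, equal replicates at both levels, a guessed standard deviation of experimental error `sigma` (for instance from pilot tests), and a smallest difference worth detecting `delta`. All names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_level(delta, sigma, alpha=0.05, power=0.80):
    """Approximate replicates per level needed to detect a difference `delta`.

    Uses the normal approximation; a calculation based on the t distribution
    would give a slightly larger number when the answer is small.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance level
    z_power = z.inv_cdf(power)            # desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Detecting a difference equal to one standard deviation of experimental error.
print(replicates_per_level(delta=1.0, sigma=1.0))
```

Halving `delta` roughly quadruples the required number of replicates, which is why the size of a practically important difference is worth settling before the experiment is designed.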

Example 1

Problem: Orange cookies spread out while cooking

Recipe is the same for both chocolate and orange cookies up to the point of adding the syrup

Baking time and temperature are the same for both

What is the purpose of experimenting?

Hypothesis: Maybe the baking temperature must be modified for the orange cookie recipe

Plan: Vary the oven temperature from one sheet of orange cookies to the next, and measure the diameter of each cookie and calculate the average for each tray of cookies.

The Plan

Run   Temperature   Average Diameter
1     350°
2     350°
3     360°
4     360°
5     370°
6     370°
7     380°
8     380°

What is the response?

What is an experiment?

What is the experimental design?

Are there any replicates or repeated measurements?

Could blocking or randomization help?

What is the treatment factor?

What is the experimental unit?

What other sources of noise exist (besides treatment factor levels)?

Example 2

Problem: Want to increase average flight time of paper helicopters made from one 8.5×11 sheet of paper

Adding to the tail width and length only increases Fw; therefore, hold them constant at their minimum

Hypothesis: Changing wing-length and wing-width should affect average flight time, and if so an optimal combination should exist

Plan: Construct four different prototypes to test, test each repeatedly, compare the average flight time

What are the treatment factor(s)?

What is the experimental unit?

What other sources of variation exist (besides treatment factor levels)?
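One way to arrive at four prototypes is to take every combination of two wing lengths and two wing widths. The slides do not say how the four prototypes were chosen, so the 2×2 layout and the level labels below are assumptions for illustration only.

```python
from itertools import product

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]

# Hypothetical levels; actual dimensions would be chosen by the experimenter.
factors = {"wing_length": ["short", "long"], "wing_width": ["narrow", "wide"]}
prototypes = full_factorial(factors)
for number, prototype in enumerate(prototypes, start=1):
    print(number, prototype)
```

Listing all combinations, rather than changing both dimensions together, keeps the two factors from being confounded.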

Tomato Experiment (Box, Hunter and Hunter, 1978)

Will plants be planted far enough apart to prevent fertilizer bleeding over?

Why only 11 plants?

How will yields be measured?

Analysis 1 - Plot the Data!

[Figure: Tomato Example. Yield (lbs.) plotted against position, with separate plotting symbols for A and B.]

Observations:

• quite a bit of variation (factor of 2-3 in yield, low to high)

• one possible outlier

• evidence of trend toward decreasing yield along the row from position 1 - 11

Conclusion: It matters more where you plant than what fertilizer you use! Has the positional trend been noted before?
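To put rough numbers beside a plot like this, the average yield per fertilizer can be compared with the least-squares slope of yield against position. The sketch below uses made-up numbers purely to show the calculation; they are not the tomato yields from the slides, and the function names are hypothetical.

```python
from statistics import mean

def least_squares_slope(x, y):
    """Slope of the least-squares line of y on x (change in y per unit of x)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def means_by_treatment(treatments, y):
    """Average response for each treatment label."""
    groups = {}
    for label, value in zip(treatments, y):
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}

# Made-up illustration data, NOT the yields from the tomato experiment.
position = [1, 2, 3, 4, 5, 6]
treatment = ["A", "B", "A", "B", "B", "A"]
yield_lbs = [30.0, 28.0, 26.0, 24.0, 22.0, 20.0]

print("slope per position:", least_squares_slope(position, yield_lbs))
print("mean by treatment:", means_by_treatment(treatment, yield_lbs))
```

A slope that is large relative to the difference between the treatment means is the numerical counterpart of the observation that position matters more than fertilizer.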
