Chapter 5: Producing Data


Page 1: Chapter 5: Producing Data

Chapter 5: Producing Data

“An approximate answer to the right question is worth a good deal more than the exact answer to an approximate question.”

John Tukey

Page 2: Chapter 5: Producing Data

5.1 Designing Samples (p. 245-261) (Overview)

One must design the sampling process very carefully in order to obtain reliable statistical information.

Meaningful and useful results can be produced by good sampling techniques, many of which involve the use of chance.

Worthless data is produced by bad sampling techniques.

Page 3: Chapter 5: Producing Data

Definitions

Voluntary response sample: consists of people who choose themselves by responding to a general appeal. Example: listeners who call in to respond to a talk show question.

Two variables are confounded when their effects on a response variable cannot be distinguished from one another. See Example 5.2 in the textbook, in which the explanatory variable (the reading of favorable propaganda) and the events of history are confounded.

Page 4: Chapter 5: Producing Data

Definitions (cont’d.)

Statistical Inference: provides methods for giving “reasonable” answers to specific questions by examining data.

Population: group from which information is desired.

Sample: part of the population that is examined in an attempt to obtain information about the population.

Page 5: Chapter 5: Producing Data

Definitions (cont’d.)

Sampling Frame: the list of individuals from which a sample is actually selected. Example:

Population: adult residents of Delaware County

Sampling Frame: voter registration roll

Design: the method that is used to select the sample.

Page 6: Chapter 5: Producing Data

Definitions (cont’d.)

Convenience Sample: a sample made up of the individuals who are easiest to reach. Examples:

Opinions offered by shoppers entering or leaving a WaWa or Borders in Springfield (used by Daily Times)

Opinions offered by students of a Catholic school (used by Catholic Standard and Times)

Biased Sample: a sample chosen by a method that systematically favors certain outcomes.

Page 7: Chapter 5: Producing Data

Definitions (cont’d.)

Simple random sample (SRS) of size n: a sample chosen in such a way that every set of n individuals has an equal chance of being the sample actually selected.

Sometimes this is easier said than done! It can be tricky to obtain an SRS.

Probability sample: each member of the population is given a known chance of being chosen.
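The SRS definition above can be illustrated with a short Python sketch. The sampling frame, the labels, and the function name `simple_random_sample` are hypothetical and chosen only for illustration; the sketch assumes the frame is available as a complete list.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n from a sampling frame (a list of individuals)."""
    rng = random.Random(seed)
    # rng.sample picks n distinct items without replacement, so every
    # set of n individuals is equally likely to be the sample.
    return rng.sample(frame, n)

# Hypothetical sampling frame of 100 labelled individuals
frame = [f"person_{i:03d}" for i in range(100)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Fixing the seed only makes the draw repeatable for checking; in practice the seed would be left unset.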

Page 8: Chapter 5: Producing Data

Definitions (cont’d.)

Stratified Random Sample:

Steps:

1. The population is divided into groups called strata.

2. An SRS is chosen from each stratum.

3. The SRSs are combined into one sample.

Reasons:

To reduce the variation of the estimators

Administrative convenience

Less expensive

Estimates are needed for subgroups of the population
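The three steps of stratified random sampling can be sketched in Python as follows. The strata names, their sizes, and the per-stratum sample sizes are made-up values for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take an SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        # Step 2: an SRS of the requested size from this stratum
        combined.extend(rng.sample(members, sizes[name]))
    # Step 3: the per-stratum SRSs together form the full sample
    return combined

# Step 1: a hypothetical population already divided into strata
strata = {
    "freshmen": [f"F{i}" for i in range(40)],
    "seniors": [f"S{i}" for i in range(60)],
}
sizes = {"freshmen": 4, "seniors": 6}
sample = stratified_sample(strata, sizes, seed=2)
print(sample)
```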

Page 9: Chapter 5: Producing Data

Definitions (cont’d.)

Multistage sample design: selects successively smaller groups within the population in stages.

Undercoverage occurs when some groups in the population are left out in the process of choosing the sample.

Nonresponse occurs when an individual cannot be contacted or refuses to cooperate.

Response bias refers to a variety of things that can lead to an incorrect or false response.

Page 10: Chapter 5: Producing Data

Final Thoughts:

The wording of the question can greatly influence the response.

A poorly worded question can confuse those who are attempting to answer it.

Page 11: Chapter 5: Producing Data

5.2 Designing Experiments (p. 265-284) An Overview

There are good and bad techniques for producing data.

Important and effective statistical practices are the use of random sampling and randomized comparative experiments.

The use of chance is vital in statistical design.

Page 12: Chapter 5: Producing Data

Concepts and Definitions

In an observational study, NO treatment is imposed on the individuals in the study. Variables of interest are measured, usually over a period of time.

In an experiment, a treatment is imposed on the individuals in the study. Responses to the treatment are observed.

Page 13: Chapter 5: Producing Data

Definitions (cont’d.)

Experimental units are the individuals on which the experiment is performed, i.e., the participants in the experiment.

A treatment is a specific experimental condition that is applied to the experimental units.

A placebo is a dummy treatment that can have no physical effect on an experimental unit. Commonly called a “sugar pill.”

Page 14: Chapter 5: Producing Data

Definitions (cont’d.)

The control group receives the placebo. This group helps the experimenter to control the effects of any lurking variables.

The treatment group receives the treatment.

Page 15: Chapter 5: Producing Data

Definitions (cont’d.)

Completely randomized experimental design:

All experimental units are allocated at random among the treatments
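A minimal sketch of a completely randomized allocation, assuming 20 numbered units and two treatments. The unit labels, treatment names, and the function `completely_randomized` are illustrative assumptions.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Allocate all units at random among the treatments."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # put the units in a random order
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out in turn, so group sizes differ by at most one
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = list(range(1, 21))  # hypothetical experimental units 1..20
groups = completely_randomized(units, ["treatment", "placebo"], seed=3)
print(groups)
```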

Statistically significant result:

An observed effect so large that it would rarely occur by chance alone.

Page 16: Chapter 5: Producing Data

Three Principles of Experimental Design

1. CONTROL

Needed to counter the effects of lurking variables.

Comparison is the simplest form of control.

Experiments should compare two or more treatments in order to avoid confounding the effect of the treatment with some other influence.

2. RANDOMIZATION

Subjects are assigned to treatments by pure chance.

Creates groups that are similar (except for chance variation).

A table of random digits can be used to choose the units for each group.

3. REPLICATION

The experiment should be done on many subjects to reduce chance variation in the results.

Page 17: Chapter 5: Producing Data

Definitions (cont’d.)

In a double blind experiment, neither the subjects nor the people who have contact with them know which treatment a subject is receiving.

A block design minimizes variation.

Block: a group of experimental units or subjects that are similar in ways that are expected to affect the response to the treatments.

Treatments are assigned randomly within each block.

A form of control.
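Randomizing separately within each block, as described above, might look like the following sketch. The two blocks ("men" and "women"), the unit labels, and the function `randomized_block` are hypothetical.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Assign treatments at random separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomize only within this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of similar subjects
blocks = {
    "men": ["M1", "M2", "M3", "M4"],
    "women": ["W1", "W2", "W3", "W4"],
}
assignment = randomized_block(blocks, ["treatment", "placebo"], seed=4)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because each block is shuffled on its own, every block contributes equally to each treatment group, which is what keeps the blocking variable from being confounded with the treatment.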

Page 18: Chapter 5: Producing Data

Definitions (cont’d.)

Matched pairs:

A common form of blocking that compares two treatments. The subjects in each pair are “alike.”

Common forms:

1. Pairs of similar subjects: within each pair, a random process decides which subject receives the treatment and which receives the placebo. The pairs are observed at a later time to see whether the treatment had any effect.

2. Test scores from a before-after situation: each individual takes a before-test, receives some type of treatment, and then takes an after-test. Purpose: to see whether the treatment improves test performance.
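For the before-after form of matched pairs, each individual serves as their own pair, so the natural summary is the difference in scores per individual. The sketch below uses made-up scores purely for illustration; `mean_improvement` is a hypothetical helper.

```python
def mean_improvement(before, after):
    """Average of the per-individual differences (after minus before)."""
    if len(before) != len(after):
        raise ValueError("each individual needs both a before and an after score")
    diffs = [a - b for b, a in zip(before, after)]
    return sum(diffs) / len(diffs)

# Hypothetical test scores for four individuals
before = [70, 65, 80, 75]
after = [74, 66, 85, 75]
print(mean_improvement(before, after))  # 2.5
```

A positive average suggests improvement, but whether it is larger than chance variation alone would produce is a question for statistical inference.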

Page 19: Chapter 5: Producing Data

5.3 Simulation Experiments (p. 286-296) An Overview

Empirical probabilities relating to real-life situations can be obtained by simulation.

Chance outcomes can be imitated by using:

Random number generators (tables, calculators, computers)

Dice, cards, spinners

Page 20: Chapter 5: Producing Data

Simulation

The imitation of chance behavior in an attempt to gain information about a real-life situation

randInt( can be used on your TI-84 Plus to generate random integers.
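For readers working on a computer instead of a calculator, Python's standard `random.randint` plays a similar role: it returns an integer between the two bounds, both included. The wrapper name `rand_int` and its `count` parameter are assumptions made for this sketch.

```python
import random

def rand_int(lower, upper, count=1, seed=None):
    """Return `count` random integers between lower and upper, inclusive."""
    rng = random.Random(seed)
    return [rng.randint(lower, upper) for _ in range(count)]

# Imitate five rolls of a fair six-sided die
rolls = rand_int(1, 6, count=5, seed=5)
print(rolls)
```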

Page 21: Chapter 5: Producing Data

Steps in Creating a Simulation Model

1. State the problem or describe the experiment.

2. State the assumptions.

3. Assign digits to represent outcomes.

4. Simulate many repetitions.

5. State your conclusions.

Page 22: Chapter 5: Producing Data

When Trials are Completed

Determine the empirical probability by calculating the ratio: the number of trials in which the outcome of interest occurred, divided by the total number of trials.
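The simulation steps and the ratio above can be put together in one small sketch. The problem chosen here (the chance that two dice sum to 7) is an assumed example, not one taken from the slides, and the function name `simulate_sum_seven` is hypothetical.

```python
import random

def simulate_sum_seven(trials, seed=None):
    """Estimate P(sum of two dice is 7) by simulation.

    Problem: how often do two dice sum to 7?
    Assumptions: both dice are fair and the rolls are independent.
    Assignment: the integers 1..6 stand for the six faces of a die.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):  # run many repetitions
        die1 = rng.randint(1, 6)
        die2 = rng.randint(1, 6)
        if die1 + die2 == 7:
            successes += 1
    # Empirical probability = outcomes of interest / total trials
    return successes / trials

estimate = simulate_sum_seven(10_000, seed=6)
print(f"Empirical probability: {estimate:.3f}")
```

With many trials the estimate should settle near the theoretical value of 1/6, though any single run will differ slightly because of chance variation.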