Elementary Statistics Professor K. Leppel. Introduction and Data Collection.

Post on 01-Jan-2016


Elementary Statistics

Professor K. Leppel

Introduction and Data Collection

Definitions

Population: All observations of interest in a given context

Sample: A subset of a population

Example 1: Suppose you are the president of Widener University.

Population: All Widener students.

Sample: All Widener students taking classes in the School of Business Administration.

Example 2: Suppose you are the head of the Economics Dept.

Population: All Widener students taking Economics classes.

Sample: All Widener students taking Professor Leppel’s classes.

More Definitions

Population parameter or parameter: numerical characteristics of a population

Sample statistics: numerical characteristics of a sample

Deductive vs. Inductive Reasoning

Deductive:

population → sample

general → specific

Probability

Example: Suppose you have a bowl with 2 red marbles & 3 green ones. If you pick one, what is the probability that the marble is green?
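The marble question can be worked out directly. The short sketch below assumes every marble is equally likely to be picked, which the example implies but does not state; the variable names are illustrative.

```python
from fractions import Fraction

# Bowl contents taken from the example: 2 red marbles and 3 green ones.
red, green = 2, 3

# Assumption: each marble is equally likely to be picked.
p_green = Fraction(green, red + green)

print(p_green)         # 3/5
print(float(p_green))  # 0.6
```

Here the makeup of the population (the bowl) is known, and the reasoning runs from it to a single draw.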

Deductive vs. Inductive Reasoning

Inductive:

sample → population

specific → general

Statistics

Example: If you take a poll & note the voting preferences of the people sampled, you will be able to draw some conclusions about the votes of the population.
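A small simulation can illustrate the poll idea. Everything in this sketch is hypothetical: the population of 100,000 voters, the 55% share preferring candidate A, the sample size of 1,000, and the random seed are assumptions made only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = prefers candidate A, 0 = prefers someone else.
population = [1] * 55_000 + [0] * 45_000

# Poll a simple random sample of 1,000 voters (without replacement).
sample = rng.sample(population, k=1_000)

# The sample statistic is used to estimate the population parameter.
estimate = sum(sample) / len(sample)
true_share = sum(population) / len(population)

print("sample share:    ", estimate)
print("population share:", true_share)
```

In a real poll the population share is unknown; it appears here only so the estimate can be compared with it.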

Sampling with & without Replacement

Sampling without replacement: once an element of a population has been selected as part of a sample, it cannot be selected again.

Sampling with replacement: an element of a population that has been selected as part of a sample can be selected again.
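The two sampling schemes can be sketched with Python's standard `random` module. The five-element population and the seed below are made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed, chosen only so the sketch is repeatable
population = ["A", "B", "C", "D", "E"]  # hypothetical population elements

# Without replacement: each element can appear at most once in the sample.
without_replacement = rng.sample(population, k=3)

# With replacement: the same element may be drawn more than once,
# so the sample can even be larger than the population.
with_replacement = rng.choices(population, k=8)

print(without_replacement)
print(with_replacement)
```

Because the population has only five elements, a sample of eight drawn with replacement must contain repeats.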

Random sampling vs. non-random sampling

Random sampling or probability sampling: sampling in which the probability of inclusion of each element in the population is known.

Non-random sampling or judgment sampling: sampling in which judgment is exercised in deciding which elements of the population to include in the sample.

Simple Random Sample

A sample of n elements is a simple random sample if sampling is performed such that every combination of n elements has an equal chance of being the sample selected.
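To make "every combination of n elements" concrete, the sketch below lists all possible samples of size 2 from a hypothetical four-element population and then draws one of them. The population and the names used are assumptions for illustration.

```python
import random
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D"]  # hypothetical population
n = 2

# Every possible sample of n elements (order does not matter).
all_samples = list(combinations(population, n))
print(len(all_samples))  # 6

# Under simple random sampling, each combination has the same chance.
chance_each = 1 / comb(len(population), n)

# random.sample draws without replacement, each subset being equally likely.
sample = random.sample(population, n)
print(sample)
```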

Two Types of Studies

1. Observational or comparative studies

The analyst examines historical relationships among variables of interest.

Problem: Deriving cause & effect relationships from historical data is difficult because important environmental factors are generally not controlled & not stable.

2. Direct experimentation or controlled studies

The investigator directly manipulates factors that affect a variable of interest.

Control Group

To understand the effect of a “treatment,” we need to compare a group that received a treatment with a group that received no treatment. The “no-treatment” group is the control group.
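As a rough illustration of the comparison, the sketch below takes the difference between the average outcome of a treatment group and that of a control group. The outcome scores are made up for illustration and are not real data.

```python
from statistics import mean

# Made-up outcome scores, for illustration only (not real data).
treatment_group = [12, 15, 14, 16, 13]  # received the treatment
control_group = [10, 11, 12, 10, 12]    # received no treatment

# A simple estimate of the treatment effect: the difference in group averages.
estimated_effect = mean(treatment_group) - mean(control_group)
print(estimated_effect)
```

Without the control group there would be nothing to compare the treatment group's average against.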

Two types of errors

1. Systematic errors or bias:

These errors cause measurement to be incorrect in some systematic way.

They are caused by inaccuracies or deficiencies in the measuring instrument.

Systematic errors persist even when the sample size is increased.

2. Random error or sampling error:

These errors arise from a large number of uncontrolled factors - chance.

Random errors decrease on average as the sample size is increased.
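The different behavior of the two kinds of error can be seen in a simulation. In this sketch the true value, the size of the bias, and the spread of the chance error are all hypothetical numbers, and the chance error is assumed to be normally distributed.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # hypothetical true quantity being measured
BIAS = 2.0          # hypothetical constant error of the instrument
NOISE_SD = 5.0      # hypothetical spread of the chance error


def average_measurement(n, bias):
    """Average of n readings, each off by a fixed bias plus random noise."""
    readings = [TRUE_VALUE + bias + rng.gauss(0, NOISE_SD) for _ in range(n)]
    return mean(readings)


for n in (10, 1_000, 100_000):
    error = average_measurement(n, BIAS) - TRUE_VALUE
    print(n, round(error, 3))
```

As n grows, the random part of the error averages out, so the printed error settles near the bias of 2.0 rather than near zero.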

Some of the variables with which you will work are qualitative and some are quantitative.

Qualitative variables are categorical and can be subdivided into nominal and ordinal measures.

Quantitative variables are numerical and can be subdivided into interval and ratio measures.

Qualitative (categorical) variables that are nominal have no order to them.

Example 1: U.S. citizenship (yes, no)

Example 2: On what continent were you born? (N. America, S. America, Africa, Antarctica, Asia, Australia, Europe)

Sex (male, female) is sometimes considered as a nominal variable. However, if you take into consideration intersex individuals, who can have any of a variety of anatomical conditions that don’t fit the typical definitions of female or male, you no longer have a simple nominal measure.

Qualitative (categorical) variables that are ordinal have an implied ranking of a characteristic.

Example 1: student class (freshman, sophomore, junior, senior)

Example 2: customer service satisfaction (very dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied)

Switching to quantitative (numerical) variables, interval variables are a bit tricky.

They are measured on an ordered scale in which the difference between measurements is meaningful. However, there is no true zero point where there is none of a specific characteristic. Also, if the measure is twice as large, that does not imply that there is twice as much of the characteristic.

Example 1: Intelligence. A person with an IQ of 150 is much more intelligent than a person with an IQ of 100, while a person with an IQ of 140 is somewhat more intelligent than a person with an IQ of 125. However, what would an IQ of 0 mean? And a person with an IQ of 200 is not twice as smart as one with an IQ of 100.

Example 2: SAT scores

Quantitative (numerical) variables that are ratio variables have true zero points, and ratios work in the expected way.

Example 1: Income. A person with zero income has no earnings or other source of money. And someone who has income of $100,000 has twice as much money coming in as someone who has income of $50,000.

Example 2: Age
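The four measurement levels (nominal, ordinal, interval, ratio) build on one another: each level supports the comparisons of the levels before it plus one more. The sketch below is one way to summarize that; the labels for the comparisons and the function name are illustrative choices, not standard terminology from the slides.

```python
# Levels of measurement, from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The comparison that first becomes meaningful at each level (labels are illustrative).
NEW_COMPARISON = {
    "nominal": "same_or_different",
    "ordinal": "greater_or_less",
    "interval": "difference",
    "ratio": "ratio",
}


def meaningful_comparisons(level):
    """Return all comparisons that make sense at the given measurement level."""
    idx = LEVELS.index(level)
    return [NEW_COMPARISON[name] for name in LEVELS[: idx + 1]]


print(meaningful_comparisons("ordinal"))
print(meaningful_comparisons("interval"))

# Income is a ratio variable, so this ratio is meaningful:
print(100_000 / 50_000)  # 2.0
```

For an interval variable such as IQ, "ratio" is missing from the list, which matches the point that an IQ of 200 is not twice an IQ of 100.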