Chris Morgan, MATH G160 [email protected] April 6, 2012 Lecture 27 Chapter 1 (and 7.8 for some...
Chris Morgan, MATH G160, [email protected]
April 6, 2012, Lecture 27
Chapter 1 (and 7.8 for some reason):
Statistical Applications and Types of Data
C. Morgan, STAT 225, Fall 2011
A few definitions:
Data – measurements from which information and knowledge are derived, facts and figures collected, analyzed, and summarized
Data Set – a collection of data, usually put in table form
Element – a single cell in a dataset
Observation – a subject on which data is being collected, makes up the rows of a dataset
Variable – any characteristic of an observation, makes up the columns of a dataset
An example of a data set (each pair of columns is sorted by its own tuition, from lowest to highest):

University        In-State Tuition    University        Out-of-State Tuition
Iowa              $14,828             Minnesota         $24,245
Indiana           $16,298             Iowa              $30,202
Wisconsin         $16,667             Wisconsin         $31,927
Purdue            $18,190             Ohio State        $33,768
Michigan State    $19,542             Indiana           $34,958
Ohio State        $19,584             Purdue            $35,742
Minnesota         $19,864             Penn State        $35,960
Michigan          $21,029             Michigan State    $37,442
Illinois          $23,372             Illinois          $37,514
Penn State        $24,096             Michigan          $45,193
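The definitions of observation, variable and element can be tied to the tuition table with a short Python sketch. It uses only plain lists and dicts. Each university is stored as one observation and each tuition figure as one variable. The table sorts its two halves independently, so the sketch re-joins them by university name. The variable names `in_state` and `out_of_state` and the helper `tuition_gap` are illustrative choices, not part of the original slides.

```python
# Each dict is one observation (a row); each key is a variable (a column).
# Figures are copied from the tuition table, re-joined by university name.
universities = [
    {"university": "Iowa", "in_state": 14828, "out_of_state": 30202},
    {"university": "Indiana", "in_state": 16298, "out_of_state": 34958},
    {"university": "Wisconsin", "in_state": 16667, "out_of_state": 31927},
    {"university": "Purdue", "in_state": 18190, "out_of_state": 35742},
    {"university": "Michigan State", "in_state": 19542, "out_of_state": 37442},
    {"university": "Ohio State", "in_state": 19584, "out_of_state": 33768},
    {"university": "Minnesota", "in_state": 19864, "out_of_state": 24245},
    {"university": "Michigan", "in_state": 21029, "out_of_state": 45193},
    {"university": "Illinois", "in_state": 23372, "out_of_state": 37514},
    {"university": "Penn State", "in_state": 24096, "out_of_state": 35960},
]


def tuition_gap(row):
    """Out-of-state minus in-state tuition (dollars) for one observation."""
    return row["out_of_state"] - row["in_state"]


# An element is a single cell: one variable's value for one observation.
purdue = next(row for row in universities if row["university"] == "Purdue")
print(purdue["in_state"])  # 18190

# Derive a new variable and list universities from largest to smallest gap.
for row in sorted(universities, key=tuition_gap, reverse=True):
    print(f'{row["university"]:<15} ${tuition_gap(row):,}')
```

Storing one dict per row keeps the "rows are observations, columns are variables" picture visible in the code itself.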
Types of Data (part I):
Quantitative (numerical; can be discrete or continuous)
- can be measured or counted (length, volume, weight, cost, etc.)
- measured on interval or ratio scales; includes percentages
- interval scales do not have a “natural” zero (e.g., temperature in °F: 0 °F does not mean “no temperature”)
- ratio scales do have a “natural” zero (e.g., weight: 0 oz means no weight)
Qualitative (categorical)
- is observed, not measured (beauty, taste, texture, smell, color, etc.)
- labels or names used to identify an attribute of each element
- nominal: categories have no natural order (gender, religion, race)
- ordinal: categories have a natural order (class year, pain rating, salsa hotness)
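A quick numerical check shows why the zero point matters. This sketch uses standard unit conversions (°F = °C × 9/5 + 32, and 1 kg ≈ 2.20462 lb); the helper names are illustrative. On an interval scale, doubling a Celsius temperature does not double it in Fahrenheit. On a ratio scale, doubling a weight in kilograms also doubles it in pounds.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462


# Interval scale: 0 degrees is an arbitrary point, so ratios depend on the unit.
ratio_celsius = 20 / 10                     # 2.0
ratio_fahrenheit = c_to_f(20) / c_to_f(10)  # 68 / 50 = 1.36

# Ratio scale: 0 kg means "no weight", so ratios survive a change of unit.
ratio_kg = 20 / 10                          # 2.0
ratio_lb = kg_to_lb(20) / kg_to_lb(10)      # 2.0 (up to floating-point rounding)

print(ratio_celsius, ratio_fahrenheit, ratio_kg, ratio_lb)
```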
Quantitative
- Height of wheat thin: 1’ ¼’’
- Weight of wheat thin: 1.06 oz
- 22 servings per container
- 11 wheat thins = 1 serving size
Qualitative
- Yellow Box and brown wheat thins
- texture is smooth and yet slightly bumpy
- Chris is obsessed with them
- incredibly delicious!
What type of variable is each of the following? (qualitative or quantitative)
- GPA
- The amount of goodness in every wheat thin
- Time it takes to run a mile
- How many wheat thins I can stuff in my mouth at once
- Smoking status
- Income
- The number of places you’d rather be than here
Types of Data (part II):
Cross-sectional data
- observes many subjects at one point in time
- e.g., How many wheat thins each of you can eat at once
- e.g., Number of people who fall asleep in class today
- e.g., The class's opinion on the best ice cream flavor
- e.g., Your height today
Time series
- observes one subject or many subjects over time
- e.g., Average number of wheat thins each of you can eat every week
- e.g., Number of students who fall asleep at least once this semester
- e.g., A student's test scores over the semester
- e.g., Your height from age 7 to 22
Data Collection
• Existing Sources
• Surveys
• Observational Studies
• Experiments
Existing Sources
- Look at what others have already collected; many people and companies already have existing databases:
• www.census.gov
• www.swivel.com
• www.who.org
• www.cdc.gov
Surveys
- go out and ask people for their opinions
- ask people for information
Observational Studies
- Watch subjects over time and record results
- e.g., comparing sales at different grocery stores in West Lafayette (we simply observe their sales records; no treatment is applied to any group)
- Look up past data and analyze outcomes
Experiments
- design a study to answer specific questions
- apply a specific treatment to see whether it produces any effect
- have a control group
- use random samples
Statistical Inferences
• Population: the set of all elements of interest in a particular study
• Sample: a subset of the population
• Census: the process of conducting a survey to collect data for the entire population
• Sample Survey: the process of conducting a survey to collect data for a sample
Why sample? Logistics, cost, limitations, etc.
Statistical Inference: using data from a sample to estimate characteristics of a population
Statistical Inference Example:
• Take a census by counting the number of “e”s in the given paragraph.
• Take a sample by randomly selecting a line, counting the number of “e”s in it, and then multiplying by the number of lines in the paragraph.
• How close are we?
Elegant, extravagant elephants entertain every evening at seven. They serve escargot and eggs benedict. Eight elderly elegant elephants elevate themselves to the expensive entrance with elevators exceeding expectations. Eating everything edible, elephants expand exponentially. “Excellent!” the entertained elephants express after the entertaining entrees were served. Everything was expedited by the energetic efforts of the executive elephant empress. Everyone was entertained to excess and enjoyed the edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.
• Total “e” count (census): 126
• I randomly chose line #3, with an “e” count of 12: 12 × 12 lines = 144
• I randomly chose line #10, with an “e” count of 11: 11 × 12 lines = 132
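The census-versus-sample comparison can be written out in a few lines of Python. This is a sketch under stated assumptions:

- the helper names `census_count` and `one_line_estimate` are hypothetical;
- the count is treated as case-insensitive, so a capital "E" counts (the slide does not say either way);
- the last part only re-checks the slide's arithmetic, using its reported census total of 126 and its 12 lines.

```python
import random


def census_count(lines, letter="e"):
    """Census: count the letter in every line (case-insensitive here)."""
    return sum(line.lower().count(letter) for line in lines)


def one_line_estimate(lines, letter="e", rng=None):
    """Sample: count the letter in one randomly chosen line, then multiply
    by the number of lines, as described on the slide."""
    rng = rng or random.Random()
    line = rng.choice(lines)
    return line.lower().count(letter) * len(lines)


# Re-checking the slide's arithmetic (12 lines, census total 126).
TOTAL, N_LINES = 126, 12
for line_no, count in [(3, 12), (10, 11)]:
    estimate = count * N_LINES
    print(f"line #{line_no}: estimate {estimate}, off by {estimate - TOTAL:+d}")
# line #3: estimate 144, off by +18
# line #10: estimate 132, off by +6
```

Each single-line estimate lands near, but not exactly on, the census value. That gap is the sampling error that statistical inference has to account for.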
Sampling Methods
• Stratified Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
Sampling Methods – Simple Random Sampling (SRS)
• Finite population: A sample of size n from a finite population of size N is selected such that each possible sample of size n has the same probability of being selected.
• Infinite population: A sample is selected from a population in such a way that each element has the same probability of being selected.
• Sampling With Replacement: Elements are put back in the population after being selected, so the same element can be chosen more than once.
• Sampling Without Replacement: Elements are not replaced after being selected and are therefore chosen at most once for a sample.
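The finite-population definition can be made concrete with a worked count. Assume sampling without replacement, where the order of selection does not matter. The number of possible samples is then a binomial coefficient, and each sample has probability one over that count. For the NFL example (N = 32 teams, n = 8), this works out as:

$$P(\text{any particular sample}) = \frac{1}{\binom{N}{n}}, \qquad \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

$$\binom{32}{8} = \frac{32 \cdot 31 \cdot 30 \cdot 29 \cdot 28 \cdot 27 \cdot 26 \cdot 25}{8!} = \frac{424{,}097{,}856{,}000}{40{,}320} = 10{,}518{,}300$$

Under simple random sampling, each of the 10,518,300 possible groups of 8 teams is equally likely to be the one selected.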
Sampling Methods – SRS example
Say I want to take a sample of NFL football teams:
1. Make a list of all the teams.
2. Randomly select 8 teams.
- Without replacement: select one team at a time, then remove the chosen team from the list.
- With replacement: select one team at a time, but do not remove the chosen team from the list.
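The two procedures on this slide can be sketched in Python. This is a minimal illustration with two assumptions:

- the team names are placeholders (Team 1 through Team 32, since the NFL has 32 teams), not the real list;
- `draw_teams` is a hypothetical helper that follows the slide's steps literally. It picks a random position in the list, then either removes that team (without replacement) or leaves it in place (with replacement).

```python
import random


def draw_teams(teams, n, replace, rng):
    """Draw n teams one at a time, following the slide's procedure.

    replace=False: remove each chosen team from the list (no repeats).
    replace=True:  leave each chosen team in the list (repeats possible).
    """
    pool = list(teams)  # copy so the caller's list is not modified
    chosen = []
    for _ in range(n):
        i = rng.randrange(len(pool))
        chosen.append(pool[i] if replace else pool.pop(i))
    return chosen


# Placeholder names; substitute the real list of 32 NFL teams.
teams = [f"Team {i}" for i in range(1, 33)]
rng = random.Random(2012)  # fixed seed so the illustration is reproducible

without_replacement = draw_teams(teams, 8, replace=False, rng=rng)
with_replacement = draw_teams(teams, 8, replace=True, rng=rng)
print(without_replacement)
print(with_replacement)
```

Python's standard library offers the same two behaviors directly. `random.sample(teams, 8)` draws without replacement, and `random.choices(teams, k=8)` draws with replacement.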
Stratified Sampling
- Divides the population into groups called strata
- Takes a simple random sample (SRS) from each stratum
- e.g., divide students by class year and take a random sample from each class year
Cluster Sampling
- divides the population into groups called clusters
- takes an SRS of clusters
- every element in each selected cluster is part of the sample
Systematic Sampling
- number the units in the population from 1 to N, and decide on the sample size n that you want or need
- set k = N/n; randomly select one of the first k elements, and then select every kth element thereafter
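The systematic procedure translates directly into code. This is a minimal sketch, assuming N is an exact multiple of n; the slide does not cover the case where N/n is not a whole number. `systematic_sample` is a hypothetical helper that returns unit numbers between 1 and N.

```python
import random


def systematic_sample(N, n, rng=None):
    """Return unit numbers (1..N) for a systematic sample of size n.

    Assumes N is a multiple of n, so k = N / n is a whole number.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    rng = rng or random.Random()
    k = N // n
    start = rng.randint(1, k)             # random pick among the first k units
    return list(range(start, N + 1, k))   # then every kth unit after it


print(systematic_sample(20, 4, random.Random(1)))
```

For example, `systematic_sample(20, 4)` uses k = 5 and returns one of the five lists [1, 6, 11, 16] through [5, 10, 15, 20].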
Convenience Sampling
- Easiest sampling method; usually the cheapest and simplest to implement
- e.g., fliers on campus asking people to participate in surveys or other studies
- e.g., choosing a single box of wheat thins to determine quality, instead of sampling from twenty boxes
- Not a probability sample
Judgment Sampling
- not scientific at all
- based on one sampler's opinion
- does this one sample (one observation in this case) represent the whole population? Why or why not?
Sampling Methods: example
Suppose there are 20 subjects, numbered 1 to 20, and we want a sample of 4:
• Convenience Sample: select subjects 1-4
• Stratified Random Sample: divide the 20 subjects into 4 non-overlapping groups of 5 subjects each, and randomly choose 1 subject from every group
• Cluster Sample: divide the 20 subjects into 10 non-overlapping groups of 2 subjects each, randomly choose 2 of these groups, and include the subjects in the 2 chosen groups in the sample
• Systematic Sample: randomly choose 1 of the first 5 subjects, for example 4, then choose 4, 9, 14, 19 for the sample
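The 20-subject example can be simulated to compare the four selection rules side by side. This sketch assumes the groups are formed from consecutive subject numbers: 1-5, 6-10, and so on for the strata, and 1-2, 3-4, and so on for the clusters. The slide only says the groups are non-overlapping, so this grouping is an illustrative choice. The fixed seed just makes the run reproducible.

```python
import random

subjects = list(range(1, 21))   # subjects numbered 1-20
rng = random.Random(7)          # fixed seed for a reproducible illustration

# Convenience sample: just take the first four subjects.
convenience = subjects[:4]

# Stratified: 4 non-overlapping strata of 5; one random draw per stratum.
strata = [subjects[i:i + 5] for i in range(0, 20, 5)]
stratified = [rng.choice(stratum) for stratum in strata]

# Cluster: 10 non-overlapping clusters of 2; pick 2 clusters at random
# and keep every subject in them.
clusters = [subjects[i:i + 2] for i in range(0, 20, 2)]
chosen_clusters = rng.sample(clusters, 2)
cluster = [s for c in chosen_clusters for s in c]

# Systematic: k = 20 / 4 = 5; random start among the first 5, then every 5th.
start = rng.randint(1, 5)
systematic = list(range(start, 21, 5))

for name, sample in [("convenience", convenience), ("stratified", stratified),
                     ("cluster", cluster), ("systematic", systematic)]:
    print(f"{name:<12} {sample}")
```

Every design yields 4 subjects. They differ in which subsets are possible: the stratified sample always has one subject per stratum, while the cluster sample always contains whole pairs.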
Bias
Bias is any deviation of the expected result of your survey from the true population value
Sources of bias include:
- poorly worded questions
- bad communication
- sensitive questions that some people may not want to answer, or may answer dishonestly
- data-entry errors (human error)
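Here is a small simulation that makes the definition of bias concrete for a sensitive question. The numbers are purely hypothetical. Suppose 30% of a population has cheated on a test, but each of them admits it only half the time, while everyone else truthfully answers "no". The expected share of "yes" answers is then 0.30 × 0.50 = 0.15. The survey is therefore biased by -0.15 no matter how many people are asked.

```python
import random

TRUE_RATE = 0.30   # hypothetical: true share who have cheated on a test
ADMIT_RATE = 0.50  # hypothetical: chance a cheater answers "yes" honestly


def expected_reported_rate(true_rate, admit_rate):
    """Expected share of 'yes' answers when only some cheaters admit it."""
    return true_rate * admit_rate


def simulate_survey(n, true_rate, admit_rate, rng):
    """Simulate n respondents and return the observed share of 'yes' answers."""
    yes = 0
    for _ in range(n):
        cheated = rng.random() < true_rate
        if cheated and rng.random() < admit_rate:
            yes += 1
    return yes / n


expected = expected_reported_rate(TRUE_RATE, ADMIT_RATE)
bias = expected - TRUE_RATE  # expected result minus true population value
observed = simulate_survey(10_000, TRUE_RATE, ADMIT_RATE, random.Random(42))
print(f"expected {expected:.2f}, bias {bias:+.2f}, one simulated survey {observed:.3f}")
```

A larger sample makes `observed` settle closer to 0.15, not to the true 0.30. This is why bias cannot be fixed simply by asking more people.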
Avoiding Bias
• Confusing wording? If you have to read it more than once to understand what it's saying
• Asking something no one would remember?
  - What were you doing between 8:00 and 8:15 on November 5th, 2005?
• Leading the question to a certain answer
  - Would you advocate a recycling plan that would help reduce landfill mass?
  - Would you pass a bill outlawing the shipment of oil from Alaska to Russia due to the large death rate of the baby seals?
• Something really embarrassing that they wouldn't answer honestly
  - Do you always wash your hands after using the restroom?
  - Have you ever cheated on a test?
  - Have you ever done drugs?
• Date-sensitive question
  - How safe do you feel at Purdue University? (What if this was asked right after the Virginia Tech shootings?)