Things gone bye.. How to Predict The Future Either the world is driven completely by random chance...
-
Upload
stephanie-hood -
![Page 1: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/1.jpg)
Things gone bye.
![Page 2: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/2.jpg)
How to Predict The Future
• Either the world is driven completely by random chance events (and your best bet for predicting the future is using Tarot cards or a Magic 8 Ball™), or there are detectable patterns in the world.
• If you talk to a preschool teacher or a PhD in math, they will tell you that math is all about pattern detection.
![Page 3: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/3.jpg)
What We Want….
• We want to do deterministic modeling where we’re able to fill in a table like this:

| Weeks of gestation (in weeks) | Weight at birth (in lbs) |
|---|---|
| 29 weeks | |
| 38 weeks | |
| 39 weeks | |
| 40 weeks | |
| … | |

…and express it with a simple formula like this: lbs = weeks * something, where the “something” is the β (beta) value.
![Page 4: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/4.jpg)
What (else) We Want….
• Once we have made guesses at those numbers, we want to say how confident we are that they are right.
![Page 5: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/5.jpg)
The Process
• The process of going from a single predictor or a set of predictors to a predicted outcome is called statistical modeling.
• People get far too excited about figuring out which statistic (with accompanying p-value anxiety) to use for the factors that are used in models.
![Page 6: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/6.jpg)
The Steps
• Say what you are testing.
• Note the scale (nominal, ordinal, interval) of all the predictors.
• Describe the predictors numerically and graphically.
– Measures of central tendency and variability
• Look for association between the predictors and the outcome.
• Look at the strength of the association.
• Look for interactions.
![Page 7: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/7.jpg)
What is a model and why care?
• The predictors and the outcomes can be on a continuous scale (time in days) or categorical factors (mom smoked, yes or no).
• Generally we try to use all the information available when we make a prediction about the future.
– The amount of blood ejected each time the heart beats (continuous scale) as opposed to whether or not the heart is beating
– The number of cancer cells seen on a slide (or the presence or absence of malignant cells)
• The models we build are remarkably similar regardless of whether we have categorical or continuous outcomes.
![Page 8: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/8.jpg)
The Structure of a Model
• All the models I learned in school were formulated at their core like this:
Outcome = baseline + predictor + predictor
• The math can get ugly very quickly depending on the properties of the outcome (continuous, count, categories) but the core idea is that these models are all using additive contributions from some predictors!
Baby’s weight = some number + (weeks * a number) + a number
– “weeks * a number” is the impact of time
– the final “a number” is the impact of being a smoker
![Page 9: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/9.jpg)
What Makes a Bad Model
• Predicts some outcomes poorly
• Is strongly influenced by a small number of data points
• Shows systematic patterns in how it fails to predict
![Page 10: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/10.jpg)
Goals
I see modeling as having two goals:
• Estimate parameters.
– How much weight gain occurs each week as a baby is developing?
• Estimate how well the model describes your data. (Is your guess precise?)
– How far off will my guess be when I predict the next child?
– Are there regions where my guesses are far off, like premature or late deliveries?
– Is there a lot of variability at one point and not at others?
– Can I see any problems when I fit the model to THIS data?
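Both goals can be sketched in a few lines of R. The data here are simulated, and every number (the baseline of -8, the gain of 0.39 lbs per week, the error SD of 1 lb) is an assumption made up for illustration, not an estimate from real births.

```r
# Simulated illustration only: all parameter values are assumptions.
set.seed(2)
weeks  <- runif(200, min = 29, max = 41)
weight <- -8 + 0.39 * weeks + rnorm(200, mean = 0, sd = 1)
fit <- lm(weight ~ weeks)

# Goal 1: estimate the parameter (weight gain per week) with its uncertainty.
ci <- confint(fit, "weeks", level = 0.95)
print(ci)

# Goal 2: how far off might the guess be for the next child born at 40 weeks?
next_child <- predict(fit, newdata = data.frame(weeks = 40),
                      interval = "prediction")
print(next_child)  # columns: fit (the guess), lwr and upr (the plausible range)
```

The confidence interval speaks to the parameter; the prediction interval speaks to a single new baby, so it is the wider of the two ideas.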
![Page 11: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/11.jpg)
Looking for Errors
• Statisticians use the word “error” differently than everyone else.
– You know that you will not have perfect prediction. Instead, you will be off. That is error. It does not mean somebody made a mistake! It just means you can’t make a perfect prediction.
– Specifying how far you will be off is the fun and interesting part of statistics. The rest is just math.
![Page 12: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/12.jpg)
Looking at Errors
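Outcome = baseline + predictor + predictor + error
Baby weight = some number + (weeks * a number) + a number + a number drawn from a bell-shaped distribution
– “weeks * a number” is the impact of time
– the next “a number” is the impact of being a smoker
– the number drawn from a bell-shaped distribution is the error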
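Here is a minimal R sketch of that structure. It builds fake data from exactly this recipe and then asks `lm()` to recover the numbers. The baseline, weekly gain, smoking effect, smoking rate, and error spread are all invented for illustration.

```r
# Simulated illustration only: all parameter values below are assumptions.
set.seed(1)
n      <- 500
weeks  <- runif(n, min = 29, max = 41)     # weeks of gestation
smoker <- rbinom(n, size = 1, prob = 0.2)  # 1 = mom smoked, 0 = did not

# Outcome = baseline + predictor + predictor + error
weight <- -8 + 0.39 * weeks - 0.5 * smoker + rnorm(n, mean = 0, sd = 1)

fit <- lm(weight ~ weeks + smoker)
coef(fit)  # estimates of the baseline, the weekly gain, and the smoking effect
```

The three coefficients line up with "some number", "weeks * a number", and the smoker's "a number" in the formula above.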
![Page 13: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/13.jpg)
Looking for Errors
• Hopefully you will see that, given any specific predictor value, your guessed values for the outcome will be close to the values you actually observe in the outcome. Also, any observed outcome values that stray too far from your guess are unlikely.
• That pattern of how far off your guesses are from your observed data can frequently be described by a bell-shaped (“normal”) histogram. So, if you measure errors between your prediction and the observed outcomes, the distribution should be “normal.”
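One way to look at the errors in R, sketched on simulated data (the coefficients and the error SD are assumptions): pull the residuals out of the fitted model and check whether their histogram looks bell-shaped.

```r
# Simulated illustration only: all parameter values are assumptions.
set.seed(3)
weeks  <- runif(300, min = 29, max = 41)
weight <- -8 + 0.39 * weeks + rnorm(300, mean = 0, sd = 1)
fit <- lm(weight ~ weeks)

errors <- resid(fit)        # observed value minus the model's guess
hist(errors, breaks = 20)   # should look roughly bell-shaped
qqnorm(errors)              # points close to the reference line
qqline(errors)              #   are consistent with normal errors
```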
![Page 14: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/14.jpg)
Guesses and Errors
[Figure: two histograms for births at 40 weeks. The histogram of actual weights is centered on the model’s guess of 7.5 lbs, with 5.5 lbs and 9.5 lbs marked in the tails. The histogram of errors is centered on 0 error (the child was 7.5 lbs): most errors are off by just a bit, and I rarely guessed way too low or way too high.]
![Page 15: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/15.jpg)
Variance vs. Standard Error
• The variability around a continuous outcome is frequently described as a variance. The variability around samples in a sampling distribution is frequently described as a standard error.
– There are patterns in the variability affected by the number of people in the sample.
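A small R sketch of that pattern, using a made-up population of 40-week birth weights (mean 7.5 lbs as in the earlier guess; the SD of 1 lb is an assumption). The variance of the outcome stays about the same as the sample grows, while the standard error of the mean shrinks.

```r
# Simulated illustration only: the SD of 1 lb is an assumption.
set.seed(4)
sizes <- c(10, 100, 1000)
summary_by_n <- t(sapply(sizes, function(n) {
  w <- rnorm(n, mean = 7.5, sd = 1)
  c(n = n, variance = var(w), std_error = sd(w) / sqrt(n))
}))
print(round(summary_by_n, 3))
```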
![Page 16: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/16.jpg)
Looking at Errors
• There are some kinds of errors that you will be unwilling to accept.
• If I want to predict the number of times an evil lackey proposes marriage to a mad scientist, I will not accept a negative number!
• If I am predicting the chance of someone developing cancer, I will not accept a number less than 0% or greater than 100%.
• Specifying the type of errors is a critical part of building a model.
![Page 17: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/17.jpg)
More on Errors
• In addition to specifying the range of legal values, another critical component is specifying the variability in the errors.
– You have met several probability distributions which let you quantify what is an unusual score given a few parameters describing your data.
• Continuous outcomes
– Uniform, Normal, T, F
• Categorical outcomes
– The Binomial, Bernoulli, Chi-square
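As a rough R illustration of "quantifying what is an unusual score", here is one continuous and one categorical example. The SD of 1 lb, the group of 20 moms, and the smoking probability of 0.2 are all assumed values chosen for the example.

```r
# Continuous outcome: how unusual is a baby of 9.5 lbs or more if 40-week
# weights are Normal with mean 7.5 lbs and SD 1 lb? (The SD is assumed.)
p_heavy <- pnorm(9.5, mean = 7.5, sd = 1, lower.tail = FALSE)

# Categorical outcome: how unusual is seeing 9 or more smokers among 20 moms
# if each mom smokes with probability 0.2? (Both numbers are assumed.)
p_many <- pbinom(8, size = 20, prob = 0.2, lower.tail = FALSE)

c(p_heavy = p_heavy, p_many = p_many)
```

Small probabilities mean the score would be unusual under the distribution you specified.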
![Page 18: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/18.jpg)
Ordinary Least Squares
• Perhaps the easiest models to draw and understand are ones where you have a continuous outcome like weight and a continuous predictor like time.
• The model is just a line….
• Y = mX + b
Weight = estimated weight gain each week after conception * number of weeks + weight at 0 weeks
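To show there is no magic in the line, this R sketch computes m and b from the standard least-squares formulas and compares them with what `lm()` reports. The data are simulated and the coefficients used to generate them are assumptions.

```r
# Simulated illustration only: all parameter values are assumptions.
set.seed(5)
weeks  <- runif(100, min = 29, max = 41)
weight <- -8 + 0.39 * weeks + rnorm(100, mean = 0, sd = 1)

# Least-squares slope and intercept from the closed-form formulas
m <- cov(weeks, weight) / var(weeks)
b <- mean(weight) - m * mean(weeks)

# The same line from lm()
fit <- lm(weight ~ weeks)
rbind(by_hand = c(b = b, m = m), lm = unname(coef(fit)))
```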
![Page 19: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/19.jpg)
Maximum Likelihood Visual
[Figure: plot of FETAL_WGT_ (axis from 1000 to 5000) against GWKS_DEL (axis from 30 to 40).]
![Page 20: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/20.jpg)
Bad Models
• All models are wrong.
• Your data is sacred (after you remove the pregnant men) and you fit models to the data. You do not fit data to a model. That difference is not a minor semantic detail.
![Page 21: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/21.jpg)
Poor Predictions
• Sometimes you have data points that are not well fit by the model. Go to extreme measures to document those points. If the data point is not a true error, then run the analysis with it and without it. Include the point(s) in all your plots with a special symbol, and if one person changes your inferences, consider excluding them.
– You may have different subgroups that you have not identified yet.
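A sketch of the "with it and without it" check in R. The data are simulated, and the suspect point (a 12 lb baby delivered at 30 weeks) is a hypothetical value planted on purpose.

```r
# Simulated illustration only: all values below are assumptions.
set.seed(6)
weeks  <- runif(50, min = 29, max = 41)
weight <- -8 + 0.39 * weeks + rnorm(50, mean = 0, sd = 1)

# Add one hypothetical, very heavy early delivery as the suspect point.
weeks   <- c(weeks, 30)
weight  <- c(weight, 12)
suspect <- length(weeks)
keep    <- seq_along(weeks) != suspect

fit_all  <- lm(weight ~ weeks)
fit_drop <- lm(weight ~ weeks, subset = keep)

# Compare the estimated weekly gain with and without the suspect point.
c(with_point    = coef(fit_all)[["weeks"]],
  without_point = coef(fit_drop)[["weeks"]])

# Plot every point, marking the suspect one with a special symbol.
plot(weeks, weight, pch = ifelse(keep, 1, 8))
abline(fit_all, lty = 1)   # solid: all points
abline(fit_drop, lty = 2)  # dashed: suspect point excluded
```

If the two slopes tell different stories, report both and say why the point is special.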
![Page 22: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/22.jpg)
A True Outlier
[Figure: plot of FETAL_WGT_ (axis from 1000 to 5000) against GWKS_DEL (axis from 30 to 40), with one point labeled “Induced because of HUGE size”.]
![Page 23: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/23.jpg)
Looking at Residuals
• A critical step in examining the quality of a model is graphically looking at the residuals.
• Residuals are the differences between the estimated values and the observed values for each person/critter/observation.
• Look for curves, changing variability across the range of values or changes over time.
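As one example of what to look for, this R sketch simulates data whose error spread grows with the predictor (an assumption built in on purpose) and plots residuals against fitted values. A funnel shape is the sign of changing variability.

```r
# Simulated illustration only: the error SD is made to grow with x.
set.seed(7)
x <- runif(300, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(300, mean = 0, sd = 0.5 * x)
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit), xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)  # residuals should scatter evenly around this line

# Numeric check: residual spread in the lower vs upper half of x
spread <- tapply(resid(fit), x > median(x), sd)
print(spread)
```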
![Page 24: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/24.jpg)
Patterns in Residuals
From Crawley: Statistical Computing
![Page 25: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/25.jpg)
Curve Fitting
• Linear models can model curves
– The math is not too bad….
• You can use explicit mathematical formulas. If you see curves in your residuals, you can use things like:
– Polynomials or inverse polynomials
– Exponentials
– Power functions
![Page 26: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/26.jpg)
Nonlinear Regression
• Often the formulas to describe your data are extraordinarily complicated and you want to use non-linear or non-parametric modeling instead.
• Key words you will see include:
– Non-parametric smoothing
  • Lowess regression
  • Spline regression
– GAM
– Tree models
![Page 27: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/27.jpg)
A Bad Fit
• What happens when you fit a straight linear model to curvilinear data?
[Figure: two panels plotting Y = Size (0 to 140) against X = Age (0 to 50), with a residual marked.]
Is this better than a flat line at the mean?
![Page 28: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/28.jpg)
Is it good?
• A tiny p-value does not mean a good model!
• Where on the output does it tell you that this is a good or a poor model?
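To make the point concrete, here is an R sketch that fits a straight line to curved data. The saturating curve and the noise level are assumptions chosen for illustration. The slope's p-value is tiny, yet the residuals show the line is systematically off.

```r
# Simulated illustration only: the curve and noise level are assumptions.
set.seed(9)
age  <- runif(200, min = 0, max = 50)
size <- 140 * (1 - exp(-0.1 * age)) + rnorm(200, mean = 0, sd = 5)

fit <- lm(size ~ age)  # straight line fit to curved data
p_value <- summary(fit)$coefficients["age", "Pr(>|t|)"]
print(p_value)         # tiny

# Yet the line misses in a systematic way:
young_error  <- mean(resid(fit)[age < 5])              # line guesses too high
middle_error <- mean(resid(fit)[age > 15 & age < 30])  # line guesses too low
c(young = young_error, middle = middle_error)
```

The p-value only says the slope is not zero. It says nothing about whether a line is the right shape.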
![Page 29: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/29.jpg)
Residuals?
[Figure: Y = Size (0 to 140) plotted against X = Age (0 to 50).]
Flatten the line, then look up and down to see if you are systematically off.
![Page 30: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/30.jpg)
Curve Fitting!
• You can build a model that has a curve using a polynomial… the degree of the polynomial determines how many “bends” appear in a curve. So a 2nd degree polynomial would use x and x² while a 3rd degree polynomial would use x, x², and x³. These squared or cubed values don’t do anything especially complicated. They are just like adding new variables.
![Page 31: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/31.jpg)
Polynomials
[Figure: two panels plotting Y = Size (0 to 140) against X = Age (0 to 50), one for each polynomial fit below.]
size = intercept + X * something + X² * something else
poly2 = lm(y~poly(x,2))
size = intercept + X * something + X² * something else + X³ * another thing
poly3 = lm(y~poly(x,3))
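A runnable version of those two fits, sketched on simulated curved data (the curve and noise level are assumptions; `x` and `y` stand in for age and size). One thing worth knowing: `poly(x, 2)` uses orthogonal polynomials by default, so its coefficients are not the "something" values on X and X² directly, although the fitted curve is the same. Adding `raw = TRUE` gives coefficients on x and x² themselves.

```r
# Simulated illustration only: the curve and noise level are assumptions.
set.seed(8)
x <- runif(200, min = 0, max = 50)
y <- 140 * (1 - exp(-0.1 * x)) + rnorm(200, mean = 0, sd = 5)

poly1 <- lm(y ~ x)           # straight line, for comparison
poly2 <- lm(y ~ poly(x, 2))
poly3 <- lm(y ~ poly(x, 3))

# Residual sum of squares for each model (smaller means closer to the data)
sapply(list(line = poly1, poly2 = poly2, poly3 = poly3), deviance)

# F-tests for whether each extra term improves on the previous model
anova(poly1, poly2, poly3)

# Same 2nd degree curve, but with coefficients on x and x^2 themselves
poly2_raw <- lm(y ~ poly(x, 2, raw = TRUE))
coef(poly2_raw)
```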
![Page 32: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/32.jpg)
Generalized Linear Models
• You will eventually move out of the realm of predicting continuous outcomes with normal error. When you do, you will move into the realm of Generalized Linear Models (GLM).
• You want to have a linear model predicting an outcome where you restrict the possible outcome values (e.g., only allow values between 0 and 1) and deal with errors not being consistently normal across the entire range.
• You can change (transform) your outcome and model this with just another linear model similar to what I have shown.
![Page 33: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/33.jpg)
GLM in English
• If you are predicting the number of bacteria you see in a Petri dish, you cannot possibly see a negative number of bacteria. A GLM can be written so that your predicted values cannot be negative.
• Contrast this with the baby weight example where with a bit of bad data for your predictor value, you could have the formula spit out a negative weight or a baby weighing a ton.
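The contrast can be sketched in R with simulated bacteria counts. The predictor `hours`, the coefficients, and the deliberately bad input value of -20 are all hypothetical.

```r
# Simulated illustration only: predictor name and coefficients are assumptions.
set.seed(10)
hours    <- runif(100, min = 0, max = 10)
bacteria <- rpois(100, lambda = exp(0.5 + 0.3 * hours))  # counts, never negative

ols  <- lm(bacteria ~ hours)
pois <- glm(bacteria ~ hours, family = poisson(link = "log"))

# Feed both models a bad predictor value far outside the observed range.
bad <- data.frame(hours = -20)
ols_guess  <- predict(ols,  newdata = bad)                     # can go negative
pois_guess <- predict(pois, newdata = bad, type = "response")  # stays above zero
c(ols = unname(ols_guess), poisson = unname(pois_guess))
```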
![Page 34: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/34.jpg)
GLM
• Instead of modeling like this:
Outcome = baseline + predictor + predictor + error (normal/bell-shaped)
• You can model with GLM like this:
Tweaked outcome = baseline + predictor + predictor + not normal error
For example: log(odds of event) = baseline + predictor * β1 + predictor * β2 + binomial error
![Page 35: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/35.jpg)
Ordinary Regression
• So, the ordinary least squares regression models are really just a case of GLM. In these cases I specify that the tweak to the outcome is to just make the outcome identical to what it was originally and the error is normal.
• The tweak to the outcome is called the link, and in this case the link is called identity.
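You can check this in R: fitting the same simulated data (coefficients are assumptions) with `lm()` and with `glm()` using normal (gaussian) errors and an identity link should give matching estimates.

```r
# Simulated illustration only: all parameter values are assumptions.
set.seed(11)
weeks  <- runif(100, min = 29, max = 41)
weight <- -8 + 0.39 * weeks + rnorm(100, mean = 0, sd = 1)

ols          <- lm(weight ~ weeks)
glm_identity <- glm(weight ~ weeks, family = gaussian(link = "identity"))

rbind(lm = coef(ols), glm = coef(glm_identity))
same <- all.equal(coef(ols), coef(glm_identity))
print(same)
```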
![Page 36: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/36.jpg)
Mort = 389 - 5.98 * latitude
![Page 37: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/37.jpg)
Link Functions
• The tweaks to the outcome are called links:
• Identity link = predicting a continuous outcome (baby weight)
• Log link = if you can’t have negative values
• Logit link = if you have to restrict the range to between 0 and 1
• There are other links.
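In R, each family object carries its link and the inverse of the link, so you can see what each tweak does to a few arbitrary values on the "baseline + predictor + predictor" scale. The three input values below are arbitrary.

```r
# Each family object stores its link (linkfun) and the inverse link (linkinv).
identity_link <- gaussian(link = "identity")
log_link      <- poisson(link = "log")
logit_link    <- binomial(link = "logit")

eta <- c(-5, 0, 5)  # arbitrary values of baseline + predictor + predictor

back_identity <- identity_link$linkinv(eta)  # unchanged: any real number
back_log      <- log_link$linkinv(eta)       # exp(eta): never negative
back_logit    <- logit_link$linkinv(eta)     # squeezed between 0 and 1

rbind(identity = back_identity, log = back_log, logit = back_logit)
```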
![Page 38: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/38.jpg)
Error Structure
• Why bother to specify an error structure other than normal?
– Strong skew, kurtosis errors, bounded errors, negative counts
• The shape of the error distribution is not always a bell-shaped curve. Rather than worrying about the math to describe those curves, you simply need to know that different types of data have different error structures.
– Normal errors: continuous outcomes
– Poisson errors: counts
– Binomial errors: proportions
– Gamma errors: variation
![Page 39: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/39.jpg)
Binary Response
• If you are not dealing with a continuous outcome, or count data, you will likely have a binary (yes/no scored as 1 or 0) outcome.
• Clearly you need to do some major tweaking to the outcome because linear models, as we have seen, can predict very large and small numbers.
• Also, the variability of a binary outcome is very different from a continuous variable.
![Page 40: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/40.jpg)
Logistic Regression
• The solution is to specify a link that limits values to be between 0 and 1 (think of the changed outcome as being the probability of being scored 1) and use an error term that behaves well with binary outcomes.
• This is a GLM with a logit link and binomial errors.
• This kind of analysis is so popular that most people don’t know it is a GLM. Rather, they know it only as logistic regression.
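A minimal logistic regression sketch in R. The binary outcome (`low_bw`, 1 = low birth weight) and the coefficients used to simulate it are hypothetical and chosen only so the example runs.

```r
# Simulated illustration only: outcome and coefficients are assumptions.
set.seed(12)
weeks  <- runif(300, min = 29, max = 41)
p_true <- plogis(20 - 0.6 * weeks)             # chance of a 1 drops with weeks
low_bw <- rbinom(300, size = 1, prob = p_true)  # binary outcome scored 1 or 0

logit_fit <- glm(low_bw ~ weeks, family = binomial(link = "logit"))
coef(logit_fit)  # baseline and weekly effect, on the log(odds) scale

# Predicted probabilities stay within 0 and 1, even for absurd inputs.
probs <- predict(logit_fit, newdata = data.frame(weeks = c(0, 35, 80)),
                 type = "response")
print(probs)
```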
![Page 41: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/41.jpg)
Logistic model
log(odds of high) = -17.81 + 0.4539 * latitude
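To read a model like this, convert the log(odds) back to odds and probabilities. The R sketch below uses the two coefficients from the formula above; the latitudes plugged in are arbitrary examples, not values from the data.

```r
# Coefficients copied from the formula above; the latitudes are arbitrary.
log_odds <- function(latitude) -17.81 + 0.4539 * latitude

latitude <- c(30, 35, 40, 45)
data.frame(latitude,
           log_odds = log_odds(latitude),
           odds     = exp(log_odds(latitude)),
           prob     = plogis(log_odds(latitude)))  # plogis() undoes the logit

# Latitude at which the odds are even (log(odds) of 0, probability of 0.5)
even_odds_latitude <- 17.81 / 0.4539
print(even_odds_latitude)
```

Each one-unit step in latitude adds 0.4539 to the log(odds), which multiplies the odds by exp(0.4539).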
![Page 42: Things gone bye.. How to Predict The Future Either the world is driven completely by random chance events (and your best bet for predicting the future.](https://reader035.fdocuments.in/reader035/viewer/2022062412/5a4d1b0b7f8b9ab05998adb8/html5/thumbnails/42.jpg)
So Long (and thanks for all the fish)
• Drop by and say hi or send me an email if you have questions in the future.