Things you learn on - University of Minnesota Duluth, rregal/documents/5411_2010/... · Things you...


Things you learn on one road can be really helpful on later roads

Brainy CEO with a Philosophy of Fun

Minneapolis/St. Paul Business Journal, January 6, 2006

by Steve LeBeau, Managing Editor, http://www.bizjournals.com/twincities/stories/2006/01/09/focus1.html

Robert Senkler knows his business from the bottom up, because that's where he started: as an actuarial trainee 31 years ago.

His career path first emerged in second grade, when he realized he was a math whiz. He attended the University of Minnesota Duluth, partly because it had a great mathematics department but also because it was close to some family land where he could hunt.

• After Senkler graduated from UMD, he
  – hired a writing coach
  – joined Toastmasters to work on his speaking skills

• Writing and speaking matter

• Read
  – Browse the current books
  – Reading effective writing helps guide your own writing
  – John Grisham or Stephen King work fine

• Speak
  – Take a speech class
  – Act in a play

Career Advice

“Be something you love and understand”

Ronnie Van Zant

Consulting Advice

John Wilder Tukey, Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics, died 26 July 2000 in New Brunswick, New Jersey, following a heart attack.

The introduction of new terminology to capture distinctive concepts would become a Tukey trademark. For example, he coined the contraction “bit” for “binary digit”. Tukey is credited with the first printed use of the word “software” to refer to computer programs; he observed that the software might well prove to be more valuable than the hardware.

The saying that “an approximate solution of the exact problem is more useful than the exact solution of an approximate problem” has often been attributed to Tukey.

Understanding the Questions

• For example, if your interest is in biological applications:
  – Take real biology classes
  – Go to biology and medicine seminars
  – Read published reports
  – Work in a biology lab
    – Computational tasks
    – Web design
    – Volunteer

• Similarly if you have other areas of interest

Data Analysis

Plot the data

[Figure: scatter plot of Fetal Weight and Placental Weight, both axes on log scales]

Computing

• Take CS or MIS classes
• Take other classes with computing
  – GIS (Geographic Information Systems) class
  – Bioinformatics
• Excel, SAS and R
• SAS Certification
  – The Little SAS Book
  – SAS Certification Guide
  – Base Programming
  – Advanced Programming

However

• If you try to do everything, or even to do everything as well as you can, you aren't prioritizing
• “Everything in moderation, including moderation” (Lost Horizon)
• Have a few things that you have done thoroughly and well
  – See recommendations ↓
• Get to know at least three people well enough that they can write you good letters of recommendation
• Recommendations are more than grades
  – Are you a good team member?
  – Do you have leadership experience?
  – Are you able to solve problems effectively?
  – Evidence of this may come from outside class

Design and Data Analysis

• Plan carefully before collecting data
• Avoid bias and unexplained variability
  – Randomization
  – Blinding
  – Placebo, sham surgery
  – Blocking
  – Record covariates and other factors that might influence results
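As a minimal sketch of randomization combined with blocking, the Python below builds a randomized complete block plan: each treatment appears once in every block, in an independently shuffled order. The function name, the site and trap labels, and the seed are hypothetical illustrations, not part of the course materials.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomized complete block plan (hypothetical helper): every treatment
    appears once per block, in an independently shuffled order."""
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)     # separate random permutation for this block
        plan[block] = order
    return plan

# Hypothetical example: three sites (blocks) and three trap types (treatments).
plan = randomize_within_blocks(["site1", "site2", "site3"], ["A", "B", "C"], seed=2010)
for block, order in plan.items():
    print(block, order)
```

Recording the seed alongside the plan lets someone else reproduce the assignment later.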

• Understand the questions
• Know how the experiment was conducted
  – Dependence between values
    – Pairing
    – Blocking
    – Fish in tanks, students in schools, …
    – Repeated measures for the same animal, person, …

• Check and clean the data
  – Plot individual data points
  – Check for out-of-bounds entries, e.g., Age = 240
  – Check consistency of entries, e.g., no record should be both male and hysterectomy
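The out-of-bounds and consistency checks can be automated with a few explicit rules. This is a sketch under assumptions: the field names, the 0 to 120 age range, and the records themselves are hypothetical. Rules like these supplement plotting the individual data points; they do not replace it.

```python
# Hypothetical records; the field names and the allowed age range are assumptions.
records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 240, "sex": "F", "hysterectomy": False},
    {"id": 3, "age": 51, "sex": "M", "hysterectomy": True},
]

def check_record(rec, min_age=0, max_age=120):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    if not (min_age <= rec["age"] <= max_age):
        problems.append(f"age out of bounds: {rec['age']}")
    if rec["sex"] == "M" and rec["hysterectomy"]:
        problems.append("inconsistent: male with hysterectomy")
    return problems

for rec in records:
    for problem in check_record(rec):
        print(f"record {rec['id']}: {problem}")
```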

• Check assumptions
  – Independence: how was the experiment conducted?
  – Plot the original data points
  – Normality
    – PDP polymerase significant after log transformation
    – Insect traps significant after log transformation
  – Equal variances
• Fixing violated assumptions
  – Transformations sometimes
  – Other methods, e.g., Weibull models
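One reason a log transformation can help with unequal variances: when the spread of a group grows in proportion to its level (multiplicative variation), taking logs turns that into a constant shift, so the group standard deviations become equal. The two groups below are hypothetical, chosen so that one is exactly 10 times the other.

```python
import math
import statistics

# Hypothetical groups whose spread grows with their level (multiplicative variation).
low = [1.0, 2.0, 4.0]
high = [10.0, 20.0, 40.0]

def sd(values):
    return statistics.stdev(values)

def log_sd(values):
    """Sample standard deviation after a natural-log transformation."""
    return statistics.stdev([math.log(v) for v in values])

print("raw SDs:", sd(low), sd(high))          # high group's SD is 10 times larger
print("log SDs:", log_sd(low), log_sd(high))  # equal after the log transformation
```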

• Check assumptions: plot residuals
  – Especially with more complicated data, where it's hard to plot all factors at one time effectively
• At least:
  – Residuals versus predicted values
  – Normal plot of residuals
  – Cook's distance
• It is also helpful to plot residuals
  – vs. factors in the model
  – vs. other variables, such as technician
  – in time order of collection, to check for drift over time
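Plots are the main tool here, but as a rough numeric complement to plotting residuals in time order, one can compute the least-squares slope of the residuals against collection order. This is a hypothetical helper with made-up residuals; a slope clearly away from zero is a prompt to look for drift, not a formal test.

```python
def drift_slope(residuals):
    """Least-squares slope of residuals against collection order 1..n.
    A slope far from zero suggests drift over time."""
    n = len(residuals)
    order = range(1, n + 1)
    mean_t = (n + 1) / 2
    mean_r = sum(residuals) / n
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(order, residuals))
    sxx = sum((t - mean_t) ** 2 for t in order)
    return sxy / sxx

# Hypothetical residuals, listed in the order the data were collected.
steady = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
drifting = [-0.8, -0.6, -0.3, -0.1, 0.1, 0.4, 0.5, 0.8]
print(drift_slope(steady), drift_slope(drifting))
```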

• Cook's distance (see answers to Lab 6)
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model, because it pulls the fitted line toward itself

[Figure: scatter plot of Y versus X]

• Cook's distance (see answers to Lab 6)
  – Ross Garberich's master's project: “The Economic Utilization of Patients with Refractory Angina”
  – Here one patient had a very large influence on the results
  – The patient had 30 lifetime angioplasties
  – When the angioplasty counts were grouped into a “5 or more angioplasties” category, the Cook's distance plot was fine
• Cook's distance > 0.5 is often considered big
• Cook's distance > 1 is often considered a definite problem
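The Cook's distance points above can be checked numerically. This minimal sketch for a simple linear regression uses the standard formula D_i = e_i² h_ii / (p s² (1 − h_ii)²) with p = 2 parameters, and flags points using the 0.5 and 1 rules of thumb. The data are hypothetical: five points on the line y = x plus one point far out in x. That last point's residual (about −0.96) is no larger in size than the one at x = 2 and smaller than those at x = 1 and x = 5, yet its Cook's distance is about 29, because its leverage is about 0.94.

```python
def cooks_distance(x, y):
    """Residuals and Cook's distances for a simple linear regression y = b0 + b1*x.

    D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2, with p = 2 parameters.
    """
    n, p = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)           # residual mean square
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_ii
    d = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, d

# Hypothetical data: five points on y = x plus one point far out in x.
x = [1, 2, 3, 4, 5, 15]
y = [1, 2, 3, 4, 5, 0]
resid, d = cooks_distance(x, y)
for xi, e, di in zip(x, resid, d):
    flag = "definite problem" if di > 1 else ("big" if di > 0.5 else "")
    print(f"x={xi:>2}  residual={e:6.2f}  Cook's D={di:6.2f}  {flag}")
```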

• Include factors in models that improve predictions
  – Include temperatures when comparing brands of syrups
  – Include the covariate Use when comparing keyboard pain
  – Include blocks = sites when comparing insect traps

• Don't overfit
  – Hypothesis tests
  – Information criteria: Akaike Information Criterion (AIC)
  – AICc: corrected for small sample sizes
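For least-squares fits with normal errors, AIC is commonly computed (up to an additive constant) as n ln(SSE/n) + 2k, and AICc adds 2k(k + 1)/(n − k − 1). Conventions differ on whether k includes the error variance, so treat the counting of k below as an assumption; only differences between models fitted to the same data matter. The SSE values in the usage lines are hypothetical.

```python
import math

def aic_least_squares(sse, n, k):
    """AIC for a normal linear model fitted by least squares, up to an
    additive constant: n*ln(SSE/n) + 2k, with k estimated parameters."""
    return n * math.log(sse / n) + 2 * k

def aicc_least_squares(sse, n, k):
    """AICc adds the small-sample correction 2k(k+1)/(n-k-1); needs n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic_least_squares(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical comparison on n = 12 observations: smaller is better.
for name, sse, k in [("simpler model", 20.0, 3), ("bigger model", 17.0, 5)]:
    print(name, aic_least_squares(sse, 12, k), aicc_least_squares(sse, 12, k))
```

The AICc penalty grows quickly as k approaches n, which is exactly the overfitting situation it is meant to guard against.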

Overfitting Polynomial

[Figure: polynomial fits to the data, Y versus X]

• The fourth-order polynomial (in pink)
  – Fits the data exactly (Error SS = 0)
  – But likely would not work well for predicting at x = 0.5 or x = 1.5
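A least-squares polynomial of degree 4 fitted to 5 distinct x values has as many coefficients as data points, so it is the interpolating polynomial and its Error SS is exactly 0. The sketch evaluates that polynomial in Lagrange form. The y values are hypothetical, made up to lie near the line y = 5 + 20x used in the simulation code on these slides, so the predictions at x = 0.5 and x = 1.5 can be compared with that line.

```python
def lagrange_eval(xs, ys, x0):
    """Evaluate the unique polynomial of degree len(xs)-1 through (xs, ys) at x0."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x0 - xj) / (xi - xj)
        total += yi * basis
    return total

xs = [1, 2, 3, 4, 5]
ys = [24.6, 45.1, 64.3, 85.9, 104.2]  # hypothetical noisy values near y = 5 + 20x

error_ss = sum((y - lagrange_eval(xs, ys, x)) ** 2 for x, y in zip(xs, ys))
print("Error SS:", error_ss)  # zero: five coefficients, five points
for x0 in (0.5, 1.5):
    print(x0, lagrange_eval(xs, ys, x0), "vs line", 5 + 20 * x0)
```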

Using Regression Class Formulas

SE(ŷ) at x = 6, fitting to x = 1, 2, 3, 4, 5

Model       Regression class formulas   Simulations (1000)
Linear      1.0                         1.4
Quadratic   2.1                         2.1
Cubic       4.9                         4.8
Quartic     15.8                        15.6

The formulas are simplest for orthogonal polynomials
• Inverting a diagonal matrix XᵀX is simple

$$\hat{y} = \hat\beta_0 + \hat\beta_1 P_1(x) + \hat\beta_2 P_2(x) + \hat\beta_3 P_3(x) + \hat\beta_4 P_4(x)$$

$$P_1(x) = x - \bar{x}$$

$$P_2(x) = (x - \bar{x})^2 - \frac{a^2 - 1}{12}$$

$$P_3(x) = (x - \bar{x})^3 - \frac{3a^2 - 7}{20}(x - \bar{x})$$

$$P_4(x) = (x - \bar{x})^4 - \frac{3a^2 - 13}{14}(x - \bar{x})^2 + \frac{3(a^2 - 1)(a^2 - 9)}{560}$$

where a is the number of equally spaced x values, one unit apart (here a = 5 and x̄ = 3).

Using these orthogonal polynomials:
• Type I and Type III SS are the same
• SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term
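Because orthogonal polynomial columns make XᵀX diagonal, the variance of a fitted value splits into separate terms: Var(ŷ(x₀)) = σ²[1/a + Σⱼ Pⱼ(x₀)² / Σᵢ Pⱼ(xᵢ)²]. The sketch implements P₁ to P₄ for a equally spaced values one unit apart and evaluates this at x₀ = 6 for x = 1, ..., 5, assuming σ = 1 (matching std=1 in the simulation code). Rounded to one decimal it gives 1.0, 2.1, 4.9 and 15.8, the regression class formula values for the linear through quartic fits.

```python
import math

def orth_polys(x, a, xbar):
    """Orthogonal polynomials P1..P4 for a equally spaced x values, unit spacing."""
    u = x - xbar
    return [
        u,
        u ** 2 - (a ** 2 - 1) / 12,
        u ** 3 - (3 * a ** 2 - 7) / 20 * u,
        u ** 4 - (3 * a ** 2 - 13) / 14 * u ** 2 + 3 * (a ** 2 - 1) * (a ** 2 - 9) / 560,
    ]

def se_fitted(xs, x0, degree, sigma=1.0):
    """SE of the fitted value at x0 for a polynomial of the given degree.

    X^T X is diagonal, so each coefficient contributes its own variance term.
    """
    a = len(xs)
    xbar = sum(xs) / a
    columns = [orth_polys(x, a, xbar) for x in xs]
    at_x0 = orth_polys(x0, a, xbar)
    var = 1 / a  # intercept: the column of ones has sum of squares a
    for j in range(degree):
        var += at_x0[j] ** 2 / sum(row[j] ** 2 for row in columns)
    return sigma * math.sqrt(var)

xs = [1, 2, 3, 4, 5]
for degree, name in enumerate(["Linear", "Quadratic", "Cubic", "Quartic"], start=1):
    print(f"{name:9s} {se_fitted(xs, 6, degree):.2f}")
```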

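%let b0=5; %let b1=20; %let std=1; %let nsim=1000;

/* Simulate nsim data sets at time = 1..5, plus time = 6 with y missing so GLM predicts it */
data polynomial;
   call streaminit(0);
   do isim=1 to &nsim;
      do time=1 to 5 by 1;
         mu=&b0+&b1*time;
         y=rand('normal', mu, &std);
         output;
      end;
      time=6; y=.; output;
   end;
run;

/* Fit each simulated data set; close the listing destination to avoid printing every fit */
ods listing close;
proc glm data=polynomial;
   model y = time;
   output out=outp p=pred;
   by isim;
run;
quit;
ods listing;

/* The standard deviation of the predictions at time = 6 estimates SE(yhat) at x = 6 */
proc means data=outp;
   where time = 6;
   var pred;
run;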
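For readers without SAS, here is a rough Python counterpart of the simulation using only the standard library. It assumes the same settings (b0 = 5, b1 = 20, std = 1, nsim = 1000, x = 1 to 5) and, like the SAS code, fits only the linear model. random.gauss plays the role of rand('normal', ...), and the standard deviation of the predictions at x = 6 is the simulated SE; it should land near the formula value sqrt(1/5 + 9/10) ≈ 1.05.

```python
import random
import statistics

def simulate_se_at_6(b0=5.0, b1=20.0, std=1.0, nsim=1000, seed=None):
    """Rough counterpart of the SAS simulation: in each simulated data set,
    fit a straight line to x = 1..5 and record the fitted value at x = 6."""
    rng = random.Random(seed)
    xs = [1, 2, 3, 4, 5]
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    preds = []
    for _ in range(nsim):
        ys = [rng.gauss(b0 + b1 * x, std) for x in xs]  # like rand('normal', mu, &std)
        ybar = sum(ys) / len(ys)
        slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        preds.append(ybar + slope * (6 - xbar))  # least-squares prediction at x = 6
    # Standard deviation across simulations, as PROC MEANS reports for pred
    return statistics.stdev(preds)

print(simulate_se_at_6(seed=1))
```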

For Balanced Factorial Models and/or Blocks

• The complexity of comparing treatment effects is the same whether we include another factor or blocks
  – Either way, it is a difference between means with the same number of values in each mean
• The only downside is fewer df for estimating the variance

• Pay attention to interactions
  – Main effects need to be interpreted very carefully when there are interactions
• In the KCT example, K does not have a significant main effect, but K definitely affects yield
  – But differently for low and high temperatures
  – Don't summarize K effects with means over both temperatures
  – Summarize K effects separately for low and high temperatures
• In the age at metamorphosis data in Assignment 7
  – B and C do not have significant main effects
  – But based on significant interactions, both B and C do affect age at metamorphosis
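A small numeric illustration of why main effects can mislead when there is an interaction. The cell means are hypothetical, not the KCT data: K raises yield by 10 at low temperature and lowers it by 10 at high temperature, so the K main effect averaged over temperatures is 0 even though K clearly matters.

```python
# Hypothetical cell means of yield (not the KCT data).
cell_means = {
    ("low K", "low temp"): 40.0,
    ("high K", "low temp"): 50.0,
    ("low K", "high temp"): 55.0,
    ("high K", "high temp"): 45.0,
}

def k_effect(temp):
    """Simple effect of K at one temperature: high K minus low K."""
    return cell_means[("high K", temp)] - cell_means[("low K", temp)]

simple_effects = {t: k_effect(t) for t in ("low temp", "high temp")}
main_effect = sum(simple_effects.values()) / len(simple_effects)

print("K effect by temperature:", simple_effects)  # +10 at low, -10 at high
print("K main effect (averaged):", main_effect)    # 0.0 hides both effects
```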

Account for Unbalanced Data

• Use least squares means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
  – For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
• If effects are correlated, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn't cover

• Adjust effects of keyboard on pain according to how much the keyboards were used

• Adjust comparisons of male and female mice to account for different mixes of young and old mice

• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
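A sketch of why least squares means adjust for imbalance, using hypothetical mouse data in the spirit of the male/female example: males are 2 units higher than females at each age, but most females are young and most males are old. The raw means mix the sex difference with the age difference (an 8-unit gap), while least squares means give each age group equal weight and recover the 2-unit difference. The cell means are treated as known here; in practice they would come from a fitted model, for example the LSMEANS statement in PROC GLM.

```python
# Hypothetical cell means and counts (not course data).
cell_mean = {("F", "young"): 20.0, ("F", "old"): 30.0,
             ("M", "young"): 22.0, ("M", "old"): 32.0}
count = {("F", "young"): 8, ("F", "old"): 2,
         ("M", "young"): 2, ("M", "old"): 8}
ages = ["young", "old"]

def raw_mean(sex):
    """Unadjusted mean: weights each age group by its sample size."""
    n = sum(count[(sex, a)] for a in ages)
    return sum(count[(sex, a)] * cell_mean[(sex, a)] for a in ages) / n

def ls_mean(sex):
    """Least squares mean: equal weight for each age group."""
    return sum(cell_mean[(sex, a)] for a in ages) / len(ages)

print("raw difference M - F:", raw_mean("M") - raw_mean("F"))     # 8.0
print("LS-mean difference M - F:", ls_mean("M") - ls_mean("F"))  # 2.0
```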

Account for Random Effects

• Don't pretend that measurements on each fish in a tank are separate, independent pieces of information (independent replicates)
• Don't pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the means in each tank of fish
  – But don't do this if data are unbalanced
• Also, estimating sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don't end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
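The cost of pseudo-replication, and the resource question about helicopters, both show up in the variance of an overall mean with two variance components: Var(ȳ) = σ²_between/h + σ²_within/(h·m) for h helicopters flown m times each. The variance components below are hypothetical. With 20 flights in total, treating every flight as independent gives a variance of 0.25 no matter how the flights are split, while the correct variance ranges from 2.05 with two helicopters down to 0.25 with twenty.

```python
def var_of_mean(var_between, var_within, n_units, n_repeats):
    """Variance of the overall mean with n_repeats measurements on each of
    n_units independent units (helicopters, tanks, patients)."""
    return var_between / n_units + var_within / (n_units * n_repeats)

def naive_var_of_mean(var_between, var_within, n_units, n_repeats):
    """What you get by (wrongly) treating every measurement as independent."""
    return (var_between + var_within) / (n_units * n_repeats)

# Hypothetical variance components for paper helicopters; 20 flights in each design.
vb, vw = 4.0, 1.0
for units, repeats in [(2, 10), (5, 4), (10, 2), (20, 1)]:
    print(units, repeats, var_of_mean(vb, vw, units, repeats),
          naive_var_of_mean(vb, vw, units, repeats))
```

In a real design the cost of building a helicopter versus flying one again would also enter the choice of h and m.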

Think about and check whether the results make sense

• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master's project when the right p-values were more like 0.6
  – Plotting the data is the best protection when checking the p-values
  – Don't decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that df in an ANOVA table add to the right value
  – Simulations
  – When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of the slope for pain versus amount of keyboard use comes out negative
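Checking that ANOVA degrees of freedom add up is easy to automate for balanced designs. This hypothetical helper lists the df for an a × b factorial with r observations per cell, optionally treating the replicates as complete blocks, and asserts that the df sum to N − 1. It covers only these two balanced layouts.

```python
def factorial_anova_df(a, b, r, blocks=False):
    """Degrees of freedom for an a x b factorial with r observations per cell.
    If blocks=True, the r replicates are treated as r complete blocks."""
    df = {"A": a - 1, "B": b - 1, "A*B": (a - 1) * (b - 1)}
    if blocks:
        df["Blocks"] = r - 1
        df["Error"] = (a * b - 1) * (r - 1)
    else:
        df["Error"] = a * b * (r - 1)
    total = a * b * r - 1
    assert sum(df.values()) == total, "df do not add to N - 1"
    return df, total

print(factorial_anova_df(2, 3, 4))
print(factorial_anova_df(2, 3, 4, blocks=True))
```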

Cultivate Consideration, Concern and Compassion

“A little love and affection in everything you do

Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You've got to have a dream

If you don't have a dream

How you going to have a dream come true?”

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 2: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Brainy CEO with a Philosophy of Fun

Minneapolis St Paul Business Journal - January 6 2006

by Steve LeBeau Managing Editorhttpwwwbizjournalscomtwincitiesstories20060109focus1html

Robert Senkler knows his business from the bottom up because thats where he started -- as an actuarial trainee 31 years ago

His career path first emerged in second grade when he realized he was a math whiz He attended the University of Minnesota Duluth -- partly because it had a great mathematics department but also because it was close to some family land where he could hunt

bull After Senkler graduated from UMD hendash hired a writing coach ndash joined Toastmasters to work on his speaking

skills

bull Writing and speaking matter

bull Readndash Browse the current books ndash Reading effective writing helps guide your own writingndash John Grisham or Steven king work fine

bull Speakndash Take a speech classndash Act in a play

Career Advice

ldquoBe something you love and understandrdquo

Ronnie Van Zant

Consulting Advice

John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack

The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware

The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey

Understanding the Questions bull For example if your interest is in biological applications

bull Take real biology classes

bull Go to biology and medicine seminars

bull Read published reports

bull Work in a biology lab

bull Computational tasks

bull Web design

bull Volunteer

bullSimilarly if you have different areas of interest

Data Analysis

Plot the data

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 3: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull After Senkler graduated from UMD hendash hired a writing coach ndash joined Toastmasters to work on his speaking

skills

bull Writing and speaking matter

bull Readndash Browse the current books ndash Reading effective writing helps guide your own writingndash John Grisham or Steven king work fine

bull Speakndash Take a speech classndash Act in a play

Career Advice

ldquoBe something you love and understandrdquo

Ronnie Van Zant

Consulting Advice

John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack

The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware

The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey

Understanding the Questions bull For example if your interest is in biological applications

bull Take real biology classes

bull Go to biology and medicine seminars

bull Read published reports

bull Work in a biology lab

bull Computational tasks

bull Web design

bull Volunteer

bullSimilarly if you have different areas of interest

Data Analysis

Plot the data

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effects

• Don't pretend that measurements for each fish in a tank are separate, independent pieces of information (independent replicates)
• Don't pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the means for each tank of fish
  – But don't do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don't end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
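A small simulation, under assumed values (2 treatments, 3 tanks per treatment, 4 fish per tank, tank and fish standard deviations both 1, and no true treatment effect), illustrates why fish in the same tank are not independent replicates. Treating all 24 fish as independent uses 22 error df and ignores the shared tank effect, so it should reject the true null hypothesis far more often than the nominal 5%; analyzing tank means (valid here because the design is balanced) uses 4 df and should stay near 5%. The critical values are standard two-sided 5% t quantiles; the design and names are hypothetical.

```python
import math
import random

# Assumed design (hypothetical): 2 treatments x 3 tanks x 4 fish,
# tank SD = fish SD = 1, and NO true treatment effect.
TANKS, FISH = 3, 4
SD_TANK, SD_FISH = 1.0, 1.0
T_CRIT_FISH = 2.074   # two-sided 5% t critical value, df = 12 + 12 - 2 = 22
T_CRIT_TANK = 2.776   # two-sided 5% t critical value, df = 3 + 3 - 2 = 4


def mean(xs):
    return sum(xs) / len(xs)


def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate_treatment(rng):
    """Fish-level values grouped by tank; every fish in a tank shares its tank effect."""
    tanks = []
    for _ in range(TANKS):
        tank_effect = rng.gauss(0.0, SD_TANK)
        tanks.append([tank_effect + rng.gauss(0.0, SD_FISH) for _ in range(FISH)])
    return tanks


def rejection_rates(nsim=2000, seed=1):
    """Fraction of simulated experiments declared significant by each analysis."""
    rng = random.Random(seed)
    by_fish = by_tank = 0
    for _ in range(nsim):
        g1, g2 = simulate_treatment(rng), simulate_treatment(rng)
        # Pseudo-replicated analysis: every fish counted as an independent replicate.
        fish1 = [y for tank in g1 for y in tank]
        fish2 = [y for tank in g2 for y in tank]
        if abs(pooled_t(fish1, fish2)) > T_CRIT_FISH:
            by_fish += 1
        # Balanced design: analyze one mean per tank instead.
        if abs(pooled_t([mean(t) for t in g1], [mean(t) for t in g2])) > T_CRIT_TANK:
            by_tank += 1
    return by_fish / nsim, by_tank / nsim


fish_rate, tank_rate = rejection_rates()
print(f"fish treated as replicates: {fish_rate:.3f}")
print(f"tank means:                 {tank_rate:.3f}")
```

The exact rates depend on the seed, but under these assumptions the fish-level rate should come out well above 0.05, while the tank-mean rate should stay close to 0.05.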

Think about and check whether the results make sense

• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in a master's project when the right p-values were more like 0.6
  – Plotting the data is the best protection against wrong p-values
• Don't decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right value
  – Simulations
  – When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of slope for pain versus amount of keyboard use comes out negative
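The df and SS checks can be automated. This sketch runs a one-way ANOVA on a small, hypothetical unbalanced data set (group sizes 2, 3, and 4) whose answer was worked out by hand, and asserts that the degrees of freedom and sums of squares add up, in the spirit of testing a new program on data with a known answer. The data and function name are made up for illustration.

```python
# Hypothetical unbalanced one-way data (n = 2, 3, 4) with hand-checked answers:
# grand mean 8, SS_trt = 48, SS_error = 6, SS_total = 54, F = 24 on (2, 6) df.
GROUPS = {
    "A": [3.0, 5.0],
    "B": [7.0, 8.0, 9.0],
    "C": [9.0, 10.0, 11.0, 10.0],
}


def one_way_anova(groups):
    values = [y for ys in groups.values() for y in ys]
    n_total, n_groups = len(values), len(groups)
    grand = sum(values) / n_total
    ss_trt = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups.values())
    ss_err = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in groups.values())
    ss_tot = sum((y - grand) ** 2 for y in values)
    df_trt, df_err, df_tot = n_groups - 1, n_total - n_groups, n_total - 1
    # Consistency checks: both the df and the SS must add up.
    assert df_trt + df_err == df_tot, "df do not add up"
    assert abs(ss_trt + ss_err - ss_tot) < 1e-9 * max(1.0, ss_tot), "SS do not add up"
    f_stat = (ss_trt / df_trt) / (ss_err / df_err)
    return {"df": (df_trt, df_err, df_tot), "ss": (ss_trt, ss_err, ss_tot), "F": f_stat}


result = one_way_anova(GROUPS)
print(result)
```

For these data it returns df (2, 6, 8), sums of squares (48, 6, 54), and F = 24; a program that disagrees with these hand-checked values is being used incorrectly or is computing something else.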

Cultivate Consideration, Concern, and Compassion

“A little love and affection in everything you do

Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You’ve got to have a dream

If you don’t have a dream

How you going to have a dream come true?”

South Pacific

Page 4: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Career Advice

ldquoBe something you love and understandrdquo

Ronnie Van Zant

Consulting Advice

John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack

The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware

The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey

Understanding the Questions bull For example if your interest is in biological applications

bull Take real biology classes

bull Go to biology and medicine seminars

bull Read published reports

bull Work in a biology lab

bull Computational tasks

bull Web design

bull Volunteer

bullSimilarly if you have different areas of interest

Data Analysis

Plot the data

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 5: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Consulting Advice

John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack

The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware

The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey

Understanding the Questions bull For example if your interest is in biological applications

bull Take real biology classes

bull Go to biology and medicine seminars

bull Read published reports

bull Work in a biology lab

bull Computational tasks

bull Web design

bull Volunteer

bullSimilarly if you have different areas of interest

Data Analysis

Plot the data

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 6: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Understanding the Questions bull For example if your interest is in biological applications

bull Take real biology classes

bull Go to biology and medicine seminars

bull Read published reports

bull Work in a biology lab

bull Computational tasks

bull Web design

bull Volunteer

bullSimilarly if you have different areas of interest

Data Analysis

Plot the data

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effects
• Don't pretend that measurements on each fish in a tank are separate, independent pieces of information (independent replicates)
• Don't pretend that testing the same paper helicopter more than once gives separate, independent pieces of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the means for each tank of fish
  – But don't do this if data are unbalanced
• Estimates of the sizes of variances from different sources can also be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than fly this helicopter again?
• See notes on random effects and nesting
• Don't end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
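A minimal sketch of the paper-helicopter points, assuming hypothetical flight times for 3 helicopters flown 4 times each (balanced). It contrasts the pseudo-replicated standard error (all 12 flights treated as independent) with the one based on helicopter means, then estimates the two variance components by the usual method of moments for a balanced one-way random-effects model and uses them to compare two possible next experiments. The helper `projected_var` is illustrative, not from the course notes.

```python
import math
import statistics

# Hypothetical flight times in seconds (NOT real data): 3 helicopters x 4 flights.
flights = {
    "A": [2.1, 2.2, 2.0, 2.1],
    "B": [2.6, 2.5, 2.7, 2.6],
    "C": [1.8, 1.9, 1.7, 1.8],
}
h = len(flights)                          # helicopters = true replicates
f = len(next(iter(flights.values())))     # flights per helicopter (balanced)
heli_means = [statistics.mean(v) for v in flights.values()]
all_times = [t for v in flights.values() for t in v]

# Pseudo-replication: pretends all 12 flights are independent replicates.
naive_se = statistics.stdev(all_times) / math.sqrt(len(all_times))
# Balanced data: analyze the helicopter means, so n = number of helicopters.
correct_se = statistics.stdev(heli_means) / math.sqrt(h)

# Method-of-moments variance components (balanced one-way random effects).
ms_within = statistics.mean([statistics.variance(v) for v in flights.values()])
ms_between = f * statistics.variance(heli_means)
var_within = ms_within                                  # flight to flight
var_between = max(0.0, (ms_between - ms_within) / f)    # helicopter to helicopter

def projected_var(n_heli, n_flights):
    """Variance of the grand mean for a future design with n_heli helicopters,
    each flown n_flights times."""
    return var_between / n_heli + var_within / (n_heli * n_flights)

print(f"naive SE {naive_se:.3f} vs SE from helicopter means {correct_se:.3f}")
print(f"variance components: between {var_between:.4f}, within {var_within:.4f}")
print(f"6 helicopters x 4 flights: {projected_var(6, 4):.4f}")
print(f"3 helicopters x 8 flights: {projected_var(3, 8):.4f}")
```

Here the helicopter-to-helicopter variance dominates, so building more helicopters reduces the variance of the mean far more than flying the same ones more often.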

Think about and check whether the results make sense
• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master's project when the right p-values were more like 0.6
  – Plotting the data is the best protection; use it to check the p-values
  – Don't decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right value
  – Simulations
  – When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
    – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of slope for pain versus amount of keyboard use comes out negative
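A minimal sketch of two of these checks, assuming a small hypothetical unbalanced data set whose answers are easy to get by hand (within-group SS = 2 + 2 + 5 = 9). The function builds a one-way ANOVA table and asserts that the df add up to n − 1 and the sums of squares add up to the total; the same known-answer data could be fed to a new program to confirm you are using it correctly.

```python
import math

# Hypothetical unbalanced groups with hand-checkable answers.
groups = {"A": [1, 2, 3], "B": [4, 6], "C": [7, 8, 9, 10]}

def one_way_anova(groups):
    """Degrees of freedom and sums of squares for a one-way ANOVA."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = 0.0
    ss_between = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        ss_within += sum((v - m) ** 2 for v in g)
        ss_between += len(g) * (m - grand) ** 2
    return {"df_between": k - 1, "df_within": n - k, "df_total": n - 1,
            "ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total}

table = one_way_anova(groups)
# Consistency checks that any correct one-way ANOVA table must pass.
assert table["df_between"] + table["df_within"] == table["df_total"]
assert math.isclose(table["ss_between"] + table["ss_within"], table["ss_total"])
print(table)
```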

Cultivate Consideration, Concern, and Compassion

“A little love and affection in everything you do
Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You've got to have a dream
If you don't have a dream
How you going to have a dream come true”

South Pacific

Data Analysis

Plot the data

[Figure: log-log scatter plot of Fetal Weight and Placental Weight]

Computing
• Take CS or MIS classes
• Take other classes with computing
  – GIS (Geographic Information Systems) class
  – Bioinformatics
• Excel, SAS, and R
• SAS Certification
  – The Little SAS Book
  – SAS Certification Guide
  – Base Programming
  – Advanced Programming

However
• If you do everything, or even do everything as well as you can, you aren't prioritizing
• “Everything in moderation, including moderation” (Lost Horizon)
• Have a few things that you have done thoroughly and well
  – See recommendations ↓
• Get to know at least three people well enough that they can write you good letters of recommendation
  – Recommendations are about more than grades
    – Are you a good team member?
    – Do you have leadership experience?
    – Are you able to solve problems effectively?
  – This may not be in class

Design and Data Analysis
• Plan carefully before collecting data
  – Avoid bias and unexplained variability
    – Randomization
    – Blinding
    – Placebo, sham surgery
    – Blocking
    – Record covariates and other factors that might influence results
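A minimal sketch of randomization combined with blocking: a randomized complete block design in which every block receives every treatment once, in an independently shuffled order. The trap and site names are hypothetical, and the fixed seed is only there so the plan can be reproduced and recorded.

```python
import random

def randomize_blocks(treatments, blocks, seed=None):
    """Randomized complete block design: each block gets every treatment
    exactly once, in an independently shuffled order."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = order
    return plan

# Hypothetical example: 3 insect-trap types at 4 sites (blocks).
TRAPS = ["trap A", "trap B", "trap C"]
SITES = ["site 1", "site 2", "site 3", "site 4"]
plan = randomize_blocks(TRAPS, SITES, seed=2010)
for site, order in plan.items():
    print(site, order)
```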

• Understand the questions
• Know how the experiment was conducted
  – Dependence between values
    – Pairing
    – Blocking
    – Fish in tanks, students in schools, …
    – Repeated measures on the same animal, person, …

• Check and clean the data
  – Plot individual data points
  – Check for out-of-bounds entries, e.g., Age = 240
  – Check consistency of entries, e.g., a male should not have a hysterectomy recorded
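A minimal sketch of automated range and consistency checks, assuming records stored as Python dictionaries. The records, field names, and the age bounds in `AGE_RANGE` are all hypothetical; plausible bounds have to be chosen for each study.

```python
# Hypothetical records (NOT real data) containing the two kinds of errors above.
records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 240, "sex": "F", "hysterectomy": False},  # out of bounds
    {"id": 3, "age": 51, "sex": "M", "hysterectomy": True},    # inconsistent
    {"id": 4, "age": 47, "sex": "M", "hysterectomy": False},
]

AGE_RANGE = (0, 110)  # assumed plausible bounds; set them for your own study

def check_records(records):
    """Return (id, message) pairs for out-of-bounds or inconsistent entries."""
    problems = []
    lo, hi = AGE_RANGE
    for r in records:
        if not lo <= r["age"] <= hi:
            problems.append((r["id"], f"age {r['age']} out of bounds"))
        if r["sex"] == "M" and r["hysterectomy"]:
            problems.append((r["id"], "male with hysterectomy"))
    return problems

for rec_id, msg in check_records(records):
    print(rec_id, msg)
```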

• Check assumptions
  – Independence: how was the experiment conducted?
  – Plot the original data points
  – Normality
    – PDP polymerase: significant after log transformation
    – Insect traps: significant after log transformation
  – Equal variances
• Fixing assumptions
  – Transformations, sometimes
  – Other methods
    – Weibull models
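A minimal sketch of why a log transformation can fix unequal variances, assuming hypothetical multiplicative data where every value in group B is 10 times the matching value in group A. On the original scale the standard deviations differ by a factor of 10; after taking logs the two groups differ only by a constant shift, so their standard deviations are equal.

```python
import math
import statistics

# Hypothetical multiplicative data: group B = 10 x group A.
a = [1.0, 2.0, 4.0]
b = [10.0, 20.0, 40.0]

def sd_ratio(x, y):
    """Ratio of sample standard deviations, sd(y) / sd(x)."""
    return statistics.stdev(y) / statistics.stdev(x)

raw_ratio = sd_ratio(a, b)
log_ratio = sd_ratio([math.log(v) for v in a], [math.log(v) for v in b])
print(f"SD ratio raw: {raw_ratio:.2f}, after log: {log_ratio:.2f}")
```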

• Check assumptions: plot residuals
  – Especially with more complicated data, where it's hard to plot all factors at one time effectively
  – At least:
    – Residuals versus predicted values
    – Normal plot of residuals
    – Cook's distance
  – It is also helpful to plot residuals
    – vs. factors in the model
    – vs. other variables, such as technician
    – in time order of collection, to check for drift over time
• Cook's distance (see answers to Lab 6)
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model

[Figure: scatter plot of Y versus X]

• Cook's distance (see answers to Lab 6)
  – Ross Garberich's master's project: The Economic Utilization of Patients with Refractory Angina
    – Here one patient had a very large influence on the results
    – The patient had 30 lifetime angioplasties
    – When the number of angioplasties was grouped, with a top category of 5 or more, the Cook's distance plot was just fine
  – Cook's distance > 0.5 is often considered big
  – Cook's distance > 1 is often considered a definite problem
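A minimal sketch of Cook's distance for a simple linear regression, using the standard formula D_i = e_i² / (p·MSE) · h_ii / (1 − h_ii)² with p = 2 parameters and leverage h_ii = 1/n + (x_i − x̄)²/Sxx. The data are hypothetical: five points on y = 10x plus one far-out point at x = 15 that lies well below that line. The flag labels use the 0.5 and 1 rules of thumb above.

```python
def cooks_distance(x, y):
    """Residuals, leverages, and Cook's distances for a least squares
    fit of y = b0 + b1*x (p = 2 parameters)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, lev, cooks

# Hypothetical data: one high-leverage point at x = 15.
x = [1, 2, 3, 4, 5, 15]
y = [10, 20, 30, 40, 50, 60]
resid, lev, cooks = cooks_distance(x, y)
for xi, e, h, d in zip(x, resid, lev, cooks):
    flag = "definite problem" if d > 1 else ("big" if d > 0.5 else "")
    print(f"x={xi:>2}  residual={e:7.2f}  leverage={h:.3f}  Cook's D={d:6.2f}  {flag}")
```

In this example the point at x = 15 has a residual of only about −5.8, smaller in magnitude than the residuals at x = 1, 4, and 5, yet its Cook's distance is far above 1 because it pulls the fitted line toward itself.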

• Include factors in models that improve predictions
  – Include temperatures when comparing brands of syrups
  – Include the covariate Use when comparing keyboard pain
  – Include blocks (sites) when comparing insect traps

• Don't overfit
  – Hypothesis tests
  – Information criteria: Akaike Information Criterion (AIC)
    – AICc: corrected for small sample sizes
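A minimal sketch of AIC and AICc for least squares fits with normal errors, written up to an additive constant as AIC = n·ln(SSE/n) + 2k and AICc = AIC + 2k(k + 1)/(n − k − 1). Here k is assumed to count the regression coefficients plus one for the error variance; programs differ on this convention, so compare criteria only within one program. The SSE values are hypothetical. Smaller values are better.

```python
import math

def aic_aicc(sse, n, k):
    """AIC and AICc (up to an additive constant) for a least squares fit.
    k = number of estimated parameters, assumed here to include the error
    variance. AICc is reported as infinite when n - k - 1 <= 0."""
    aic = n * math.log(sse / n) + 2 * k
    if n - k - 1 <= 0:
        return aic, math.inf
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return aic, aicc

# Hypothetical SSEs for two candidate fits to the same n = 10 points.
for name, sse, k in [("linear", 20.0, 3), ("quadratic", 18.0, 4)]:
    aic, aicc = aic_aicc(sse, n=10, k=k)
    print(f"{name:9s}  AIC={aic:6.2f}  AICc={aicc:6.2f}")
```

With only 10 points, the small-sample correction penalizes the extra quadratic term more heavily (a penalty of 8 versus 4 for the linear fit), which widens the gap in favor of the simpler model.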

Overfitting Polynomial

[Figure: Y versus X for x from 0 to 6, with polynomial fits]

• The fourth-order polynomial (in pink)
  – Fits the data exactly (Error SS = 0)
  – But likely would not work well for predicting at x = 0.5 or x = 1.5
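A minimal sketch of the overfitting point, assuming hypothetical data at x = 1, …, 5 with true mean 5 + 20x plus alternating ±1 noise. The quartic (computed in Lagrange form) passes through all five points, so its Error SS is 0, while an ordinary least squares line does not fit them exactly. The printout compares both with the true mean at new x values.

```python
# Hypothetical data: true mean 5 + 20x, plus alternating +1/-1 noise.
x = [1, 2, 3, 4, 5]
y = [26.0, 44.0, 66.0, 84.0, 106.0]

def quartic(x0):
    """Fourth-order polynomial through all five points (Lagrange form)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        weight = 1.0
        for m, xm in enumerate(x):
            if m != j:
                weight *= (x0 - xm) / (xj - xm)
        total += weight * yj
    return total

def linear_fit():
    """Ordinary least squares line; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

b0, b1 = linear_fit()
sse_quartic = sum((quartic(a) - b) ** 2 for a, b in zip(x, y))
print(f"quartic Error SS = {sse_quartic:.6f}")
for x0 in [0.5, 1.5, 6]:
    print(f"x={x0}: quartic {quartic(x0):7.2f}  linear {b0 + b1 * x0:7.2f}  "
          f"true mean {5 + 20 * x0:7.2f}")
```

With these numbers the quartic predicts about 25.4 at x = 0.5 and 156 at x = 6, against true means of 15 and 125, while the straight line stays within 0.2 of the true mean at all three points.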

Using Regression Class Formulas

Using x = 1, 2, 3, 4, 5:

Model        SE(ŷ) at x = 6    Simulations (1000)
Linear       1.0               1.4
Quadratic    2.1               2.1
Cubic        4.9               4.8
Quartic      15.8              15.6

• The formulas are simplest for orthogonal polynomials
  – Inverting a diagonal matrix XᵀX is simple

$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$

$$P_1(x) = x - \bar{x}, \qquad P_2(x) = (x - \bar{x})^2 - 2$$

$$P_3(x) = (x - \bar{x})\left[(x - \bar{x})^2 - 3.4\right], \qquad P_4(x) = (x - \bar{x})^4 - \tfrac{31}{7}(x - \bar{x})^2 + \tfrac{72}{35}$$

where $\bar{x} = 3$ for x = 1, 2, 3, 4, 5.

Using these orthogonal polynomials:
• Type I and Type III SS are the same
• The SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term
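A minimal sketch of the "regression class formula" column, assuming σ = 1 (the value used in the simulation code). It builds monic orthogonal polynomials for x = 1, …, 5 with the standard three-term recurrence. Because XᵀX is diagonal in this basis, Var(ŷ at x₀) = σ² Σₖ Pₖ(x₀)² / Σᵢ Pₖ(xᵢ)², with the k = 0 term equal to 1/n. The printed values, 1.05, 2.14, 4.92, and 15.84, round to the 1.0, 2.1, 4.9, and 15.8 in the table.

```python
import math

def orth_poly_values(xs, x0, degree):
    """Monic orthogonal polynomials P_0..P_degree for design points xs,
    built by the three-term recurrence. Returns a list of
    (values at xs, value at x0) pairs, one per degree."""
    n = len(xs)
    prev_nodes, prev_x0 = [0.0] * n, 0.0
    cur_nodes, cur_x0 = [1.0] * n, 1.0
    result = [(cur_nodes, cur_x0)]
    prev_norm = None
    for _ in range(degree):
        norm = sum(p * p for p in cur_nodes)
        alpha = sum(x * p * p for x, p in zip(xs, cur_nodes)) / norm
        beta = norm / prev_norm if prev_norm else 0.0
        next_nodes = [(x - alpha) * p - beta * q
                      for x, p, q in zip(xs, cur_nodes, prev_nodes)]
        next_x0 = (x0 - alpha) * cur_x0 - beta * prev_x0
        prev_nodes, prev_x0, prev_norm = cur_nodes, cur_x0, norm
        cur_nodes, cur_x0 = next_nodes, next_x0
        result.append((cur_nodes, cur_x0))
    return result

def se_prediction(xs, x0, degree, sigma=1.0):
    """SE of the fitted mean at x0 for a polynomial of the given degree.
    X'X is diagonal in the orthogonal basis, so the variance is a sum."""
    polys = orth_poly_values(xs, x0, degree)
    var = sum(v0 ** 2 / sum(p * p for p in nodes) for nodes, v0 in polys)
    return sigma * math.sqrt(var)

xs = [1, 2, 3, 4, 5]
for degree, name in enumerate(["linear", "quadratic", "cubic", "quartic"], start=1):
    print(f"{name:9s} SE(y-hat at x=6) = {se_prediction(xs, 6, degree):.2f}")
```

The recurrence reproduces the polynomials written out above: at x = 6 they take the values 3, 7, 16.8, and 43.2.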

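    %let b0=5;       /* true intercept */
    %let b1=20;      /* true slope */
    %let std=1;      /* error standard deviation */
    %let nsim=1000;  /* number of simulated data sets */

    data polynomial;
      call streaminit(0);
      do isim = 1 to &nsim;
        do time = 1 to 5 by 1;
          mu = &b0 + &b1*time;
          y = rand('normal', mu, &std);
          output;
        end;
        time = 6; y = .; output;  /* missing y: not used in the fit, but still predicted */
      end;
    run;

    ods listing close;  /* suppress listing output for the 1000 fits */
    proc glm data=polynomial;
      model y = time;   /* add time*time, etc. for higher-order polynomials */
      output out=outp p=pred;
      by isim;
    run;
    ods listing;

    proc means data=outp;  /* the SD of pred at time = 6 estimates SE(y-hat) */
      where time = 6;
      var pred;
    run;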
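For readers without SAS, a minimal Python sketch of the same simulation for the linear model: 1000 data sets at times 1 to 5 with mean 5 + 20·time and σ = 1, a least squares line fitted to each, and the mean and SD of the 1000 predictions at time = 6. It uses a fixed seed (an assumption made for reproducibility; `call streaminit(0)` instead seeds from the clock), so its numbers will not match any particular SAS run.

```python
import random
import statistics

B0, B1, STD, NSIM = 5.0, 20.0, 1.0, 1000   # same settings as the %let statements

def simulate_pred_at_6(seed=0):
    """Mean and SD of the fitted value at time = 6 over NSIM simulated data sets."""
    rng = random.Random(seed)
    xs = [1, 2, 3, 4, 5]
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    preds = []
    for _ in range(NSIM):
        ys = [rng.gauss(B0 + B1 * x, STD) for x in xs]
        ybar = sum(ys) / len(ys)
        b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        b0 = ybar - b1 * xbar
        preds.append(b0 + b1 * 6)   # prediction at time = 6
    return statistics.mean(preds), statistics.stdev(preds)

mean_pred, sd_pred = simulate_pred_at_6()
print(f"mean prediction at 6: {mean_pred:.2f} (true {B0 + B1 * 6:.0f}); SD: {sd_pred:.3f}")
```

The simulated SD should land close to the formula value for the linear model, √1.1 ≈ 1.05.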

For Balanced Factorial Models and/or Blocks
• The complexity of comparing treatment effects is the same whether or not we include another factor or blocks
  – Either way, it is a difference between means with the same number of values in each mean
  – The only downside is fewer df for estimating the variance

Page 8: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

01

1

10

100

1000

01 1 10 100

Fetal Weight

Placental Weight

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 9: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Computingbull Take CS or MIS classesbull Take other classes with computing

ndash GIS Geographic and Information Systems classndash Bioinformatics

bull Excel SAS and Rbull SAS Certification

ndash The Little SAS Book ndash SAS Certification Guide

bull Base Programmingbull Advanced Programming

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 10: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

However

bull If you do everything or even everything as well as you can you arenrsquot prioritizing

bull ldquoEverything in moderation including moderationrdquo Lost Horizon

bull Have a few things that you have done thoroughly and wellbull See recommendations darr

bull Get to know at least three people well enough so that they can give you good letters of recommendation

bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively

bullThis may not be in class

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

• Understand the questions
• Know how the experiment was conducted
  – Dependence between values
  – Pairing
  – Blocking
  – Fish in tanks, students in schools, …
  – Repeated measures for the same animal, person, …

• Check and clean the data
  – Plot individual data points
  – Check for out-of-bounds entries, e.g., Age = 240
  – Check consistency of entries, e.g., a record should not be both male and list a hysterectomy
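A minimal sketch of the two data checks above, in Python. The records, the field names (`age`, `sex`, `hysterectomy`) and the 0 to 120 year age range are hypothetical assumptions for illustration, not part of the original notes.

```python
# Hypothetical records; field names and the age bounds are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 240, "sex": "F", "hysterectomy": False},  # out of bounds
    {"id": 3, "age": 51, "sex": "M", "hysterectomy": True},    # inconsistent
    {"id": 4, "age": 28, "sex": "M", "hysterectomy": False},
]

AGE_MIN, AGE_MAX = 0, 120  # assumed plausible range for age in years


def out_of_bounds(rows, lo=AGE_MIN, hi=AGE_MAX):
    """Return ids whose age falls outside [lo, hi]."""
    return [r["id"] for r in rows if not lo <= r["age"] <= hi]


def inconsistent(rows):
    """Return ids whose entries contradict each other (male with a hysterectomy)."""
    return [r["id"] for r in rows if r["sex"] == "M" and r["hysterectomy"]]


print("Out of bounds:", out_of_bounds(records))  # prints: Out of bounds: [2]
print("Inconsistent:", inconsistent(records))    # prints: Inconsistent: [3]
```

Flagged records should be checked against the original data sheets rather than silently dropped.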

• Check assumptions
  – Independence: how was the experiment conducted?
  – Plot the original data points
  – Normality
    – PDP polymerase: significant after log transformation
    – Insect traps: significant after log transformation
  – Equal variances
• Fixing assumptions
  – Transformations, sometimes
  – Other methods, e.g., Weibull models
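A small illustration of why a log transformation sometimes fixes unequal variances: when the spread grows in proportion to the level (multiplicative errors), logs turn the multiplicative difference into a constant shift. The two groups below are hypothetical; real data rarely behave this cleanly, which is why the notes say "sometimes".

```python
import math
import statistics

# Hypothetical responses whose spread grows with their level.
group_a = [1.0, 2.0, 4.0]
group_b = [10.0, 20.0, 40.0]


def variance_ratio(a, b):
    """Ratio of the larger to the smaller sample variance."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


raw_ratio = variance_ratio(group_a, group_b)
log_ratio = variance_ratio([math.log(v) for v in group_a],
                           [math.log(v) for v in group_b])

print(f"variance ratio, raw scale: {raw_ratio:.1f}")  # 100.0
print(f"variance ratio, log scale: {log_ratio:.1f}")  # 1.0
```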

• Check assumptions: plot residuals
  – Especially with more complicated data, where it’s hard to plot all factors at one time effectively
• At least plot
  – Residuals versus predicted values
  – Normal plot of residuals
  – Cook’s distance
• Additionally, it is helpful to plot residuals
  – vs. factors in the model
  – vs. other variables, such as technician
  – in time order of collection, to check for drift over time
• Cook’s distance (see answers to Lab 6)
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model, because it pulls the fit toward itself

[Figure: plot of Y versus X]
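A sketch of the point above for simple linear regression, using the standard formula D_i = e_i² h_ii / (p · MSE · (1 − h_ii)²) with p = 2 coefficients. The data are hypothetical: the last point is far out in x, so its residual is modest while its Cook's distance is very large. The flags use the rules of thumb of 0.5 and 1 quoted in these notes.

```python
# Hypothetical data: the last point sits far out in x (high leverage).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [3.0, 5.0, 4.0, 6.0, 5.0, 40.0]


def cooks_distance(x, y):
    """Residuals and Cook's distances for a straight-line fit (p = 2)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage for simple linear regression: h_i = 1/n + (x_i - xbar)^2 / Sxx
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, cooks


resid, cooks = cooks_distance(x, y)
for xi, e, d in zip(x, resid, cooks):
    flag = "definite problem" if d > 1 else "big" if d > 0.5 else ""
    print(f"x={xi:5.1f}  residual={e:6.2f}  Cook's D={d:6.2f}  {flag}")
```

Here the largest residual belongs to the point at x = 5, while the point at x = 15 has only a moderate residual but by far the largest Cook's distance.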

• Cook’s distance (see answers to Lab 6)
  – Ross Garberich’s master’s project: “The Economic Utilization of Patients with Refractory Angina”
  – Here one patient had a very large influence on the results
  – The patient had 30 lifetime angioplasties
  – When the angioplasty counts were grouped, with a top category of 5 or more, the Cook’s distance plot was fine
  – A Cook’s distance > 0.5 is often considered big
  – A Cook’s distance > 1 is often considered a definite problem

• Include factors in models that improve predictions
  – Include temperatures when comparing brands of syrups
  – Include the covariate Use (amount of keyboard use) when comparing keyboard pain
  – Include blocks = sites when comparing insect traps
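A sketch of why including blocks can help, using hypothetical insect-trap counts for 2 traps tested once at each of 3 sites (balanced). Adding sites to the model absorbs the site-to-site variation, so the Error SS drops sharply even though the error df also drops. The fitted values for the additive model use the balanced-data shortcut trap mean + site mean − grand mean.

```python
# Hypothetical insect-trap counts: 2 traps, each tested once at each of 3 sites.
counts = {
    "A": [10.0, 20.0, 30.0],  # sites 1, 2, 3
    "B": [12.0, 23.0, 31.0],
}

traps = list(counts)
n_traps, n_sites = len(traps), len(counts["A"])
grand = sum(sum(v) for v in counts.values()) / (n_traps * n_sites)
trap_mean = {t: sum(v) / n_sites for t, v in counts.items()}
site_mean = [sum(counts[t][s] for t in traps) / n_traps for s in range(n_sites)]

# Model 1: trap only. Fitted value = trap mean.
sse_trap_only = sum((y - trap_mean[t]) ** 2 for t in traps for y in counts[t])

# Model 2: trap + site (additive, balanced). Fitted = trap mean + site mean - grand mean.
sse_with_sites = sum(
    (counts[t][s] - (trap_mean[t] + site_mean[s] - grand)) ** 2
    for t in traps for s in range(n_sites)
)

ss_trap = n_sites * sum((m - grand) ** 2 for m in trap_mean.values())
df_trap = n_traps - 1
f_trap_only = (ss_trap / df_trap) / (sse_trap_only / (n_traps * (n_sites - 1)))
f_with_sites = (ss_trap / df_trap) / (sse_with_sites / ((n_traps - 1) * (n_sites - 1)))

print(f"Error SS: trap only = {sse_trap_only:.1f}, with sites = {sse_with_sites:.1f}")
print(f"F for traps: trap only = {f_trap_only:.2f}, with sites = {f_with_sites:.2f}")
```

With these made-up numbers the trap comparison goes from F well below 1 to F = 12 once sites are in the model.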

• Don’t overfit
  – Hypothesis tests
  – Information criteria: Akaike Information Criterion (AIC)
  – AICc: AIC corrected for small sample sizes
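A sketch of AIC and AICc for least-squares fits with normal errors, written as n·ln(SSE/n) + 2k (constants shared by all models dropped) plus the small-sample correction 2k(k + 1)/(n − k − 1). Software packages differ in their constants and in whether the error variance counts toward k, so only compare values computed the same way. The SSE values below are hypothetical.

```python
import math


def aic_ls(sse, n, k):
    """AIC for a least-squares fit with normal errors, dropping shared constants.

    k counts estimated regression coefficients. Conventions differ on whether the
    error variance is also counted; use one convention for every model compared.
    """
    if sse <= 0:
        raise ValueError("SSE must be positive (a perfect fit has no finite AIC)")
    return n * math.log(sse / n) + 2 * k


def aicc_ls(sse, n, k):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1); needs n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic_ls(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical Error SS values for straight-line and quadratic fits to n = 5 points.
n = 5
fits = {"linear": (2.7, 2), "quadratic": (2.63, 3)}
scores = {name: aicc_ls(sse, n, k) for name, (sse, k) in fits.items()}
print(scores)  # smaller is better

try:
    aicc_ls(1.0, n, 5)  # a quartic (5 coefficients) fit to 5 points
except ValueError as err:
    print("quartic:", err)
```

The tiny drop in SSE from the quadratic term is not worth the extra parameter, and the quartic through 5 points cannot be scored at all.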

Overfitting Polynomial

[Figure: overfitting polynomial, Y versus X]

• The fourth-order polynomial (in pink) fits the data exactly: Error SS = 0
• But it likely would not work well for predicting at x = 0.5 or x = 1.5
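A sketch of the overfitting point: with 5 distinct x values, a fourth-order polynomial has 5 coefficients and passes through every point, so its Error SS is 0. The responses below are hypothetical (roughly linear with noise); the quartic is built in Lagrange form and compared with the least-squares line.

```python
# Hypothetical responses at x = 1..5 (roughly linear with noise).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [25.0, 45.0, 66.0, 84.0, 106.0]


def quartic_through_points(xs, ys):
    """Return the unique degree-4 polynomial through 5 points (Lagrange form)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def least_squares_line(xs, ys):
    """Return the fitted straight line as a function of x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return lambda x: ybar + slope * (x - xbar)


quartic = quartic_through_points(xs, ys)
line = least_squares_line(xs, ys)
error_ss = sum((y - quartic(x)) ** 2 for x, y in zip(xs, ys))
print(f"Quartic Error SS = {error_ss}")
for x0 in (0.5, 1.5, 6.0):
    print(f"x = {x0}: quartic {quartic(x0):7.2f}, line {line(x0):7.2f}")
```

For these numbers the quartic predicts 150 at x = 6 while the line predicts 125.5; the quartic is chasing the noise.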

Using Regression Class Formulas

SE(ŷ) at x = 6, using x = 1, 2, 3, 4, 5:

Model       Regression class formulas   Simulations (1000)
Linear      1.0                         1.4
Quadratic   2.1                         2.1
Cubic       4.9                         4.8
Quartic     15.8                        15.6

The formulas are simplest for orthogonal polynomials
• Inverting a diagonal matrix XᵀX is simple

$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$

$$P_1(x) = x - \bar{x}$$

$$P_2(x) = (x - \bar{x})^2 - \frac{n^2 - 1}{12}$$

$$P_3(x) = (x - \bar{x})^3 - \frac{3n^2 - 7}{20}\,(x - \bar{x})$$

$$P_4(x) = (x - \bar{x})^4 - \frac{3n^2 - 13}{14}\,(x - \bar{x})^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560}$$

where n is the number of equally spaced x values, one unit apart.
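A numerical check of the orthogonal polynomial formulas for x = 1, …, 5, assuming σ = 1. It builds the columns 1, P1, …, P4, confirms that XᵀX is diagonal, and then uses Var(ŷ(x₀)) = σ² Σₖ Pₖ(x₀)² / Σᵢ Pₖ(xᵢ)² (with P₀ = 1), which is what x₀ᵀ(XᵀX)⁻¹x₀ reduces to when XᵀX is diagonal. The results reproduce the "regression class formulas" column of the table.

```python
import math

xs = [1, 2, 3, 4, 5]
n = len(xs)
xbar = sum(xs) / n


def orth_polys(x):
    """P1..P4 for n equally spaced x values with spacing 1."""
    u = x - xbar
    return [
        u,
        u ** 2 - (n ** 2 - 1) / 12,
        u ** 3 - (3 * n ** 2 - 7) / 20 * u,
        u ** 4 - (3 * n ** 2 - 13) / 14 * u ** 2 + 3 * (n ** 2 - 1) * (n ** 2 - 9) / 560,
    ]


# Design matrix: intercept column, then P1..P4 evaluated at the design points.
X = [[1.0] + orth_polys(x) for x in xs]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(5)] for i in range(5)]
off_diag = max(abs(XtX[i][j]) for i in range(5) for j in range(5) if i != j)
print(f"largest off-diagonal entry of X'X: {off_diag:.1e}")  # ~0, so X'X is diagonal


def se_fit(x0, degree, sigma=1.0):
    """SE of y-hat at x0 for a polynomial of the given degree (diagonal X'X)."""
    p = [1.0] + orth_polys(x0)
    var = sum(p[k] ** 2 / XtX[k][k] for k in range(degree + 1))
    return sigma * math.sqrt(var)


for degree, name in enumerate(["linear", "quadratic", "cubic", "quartic"], start=1):
    print(f"{name:9s} SE at x = 6: {se_fit(6, degree):.1f}")
```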

Using these orthogonal polynomials
• Type I and Type III SS are the same
• The SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term
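A sketch that checks both bullets on hypothetical data at x = 1, …, 5. Each SS is computed as a difference in Error SS between two least-squares fits (solved with a small Gaussian elimination routine). With orthogonal columns, the sequential (Type I) and partial (Type III) SS for P1 agree, and the single-term SS for P2, (Σ P2 y)² / Σ P2², equals the extra SS from adding x² to a straight-line fit in the raw, non-orthogonal x, x² parameterization.

```python
# Hypothetical responses at x = 1..5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [25.0, 45.0, 66.0, 84.0, 106.0]
n = len(xs)
xbar = sum(xs) / n

one = [1.0] * n
p1 = [x - xbar for x in xs]
p2 = [(x - xbar) ** 2 - (n ** 2 - 1) / 12 for x in xs]


def ss_for_term(p, y):
    """SS for one orthogonal term: (sum p_i y_i)^2 / sum p_i^2."""
    return sum(a * b for a, b in zip(p, y)) ** 2 / sum(a * a for a in p)


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    k = len(b)
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    sol = [0.0] * k
    for r in range(k - 1, -1, -1):
        sol[r] = (m[r][k] - sum(m[r][j] * sol[j] for j in range(r + 1, k))) / m[r][r]
    return sol


def sse_fit(columns, y):
    """Error SS from least squares on the given design columns (normal equations)."""
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    b = [sum(u * yi for u, yi in zip(ci, y)) for ci in columns]
    coef = solve(a, b)
    fitted = [sum(c * col[i] for c, col in zip(coef, columns)) for i in range(len(y))]
    return sum((yi - f) ** 2 for yi, f in zip(y, fitted))


type1_p1 = sse_fit([one], ys) - sse_fit([one, p1], ys)          # P1 entered first
type3_p1 = sse_fit([one, p2], ys) - sse_fit([one, p1, p2], ys)  # P1 adjusted for P2
extra_x2 = sse_fit([one, xs], ys) - sse_fit([one, xs, [x * x for x in xs]], ys)

print(f"P1: Type I SS = {type1_p1:.4f}, Type III SS = {type3_p1:.4f}")
print(f"SS for P2 = {ss_for_term(p2, ys):.4f}, extra SS for x^2 = {extra_x2:.4f}")
```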

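%let b0=5; %let b1=20; %let std=1; %let nsim=1000;  /* true intercept and slope, error SD, number of simulated data sets */

data polynomial;
  call streaminit(0);
  do isim=1 to &nsim;
    do time=1 to 5 by 1;
      mu=&b0+&b1*time;
      y=rand('normal', mu, &std);
      output;
    end;
    time=6; y=.; output;  /* y missing at x = 6: not used in the fit, but still gets a prediction */
  end;
run;

ods listing close;  /* suppress listing output for the 1000 fits */
proc glm data=polynomial;
  model y = time;   /* straight line; e.g., model y = time time*time; for a quadratic */
  output out=outp p=pred;
  by isim;
run;
ods listing;

proc means data=outp;  /* the SD of pred at time = 6 estimates SE(y-hat) at x = 6 */
  where time=6;
  var pred;
run;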
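For readers without SAS, here is a rough Python analogue of the simulation above for the straight-line case. It is a sketch, not a translation of the SAS output: it uses Python's `random.Random.gauss` with a fixed seed, so the simulated value will differ somewhat from the SAS run, but it should land near the formula value √1.1 ≈ 1.05.

```python
import math
import random
import statistics

B0, B1, SIGMA, NSIM = 5.0, 20.0, 1.0, 1000  # same values as b0, b1, std, nsim above
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
X_NEW = 6.0
N = len(XS)
XBAR = sum(XS) / N
SXX = sum((x - XBAR) ** 2 for x in XS)


def simulated_se(x_new=X_NEW, nsim=NSIM, seed=0):
    """SD of the straight-line prediction at x_new over nsim simulated data sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(nsim):
        ys = [rng.gauss(B0 + B1 * x, SIGMA) for x in XS]
        ybar = sum(ys) / N
        slope = sum((x - XBAR) * (y - ybar) for x, y in zip(XS, ys)) / SXX
        preds.append(ybar + slope * (x_new - XBAR))
    return statistics.stdev(preds)


def formula_se(x_new=X_NEW):
    """Regression-class formula: sigma * sqrt(1/n + (x0 - xbar)^2 / Sxx)."""
    return SIGMA * math.sqrt(1 / N + (x_new - XBAR) ** 2 / SXX)


print(f"simulated SE = {simulated_se():.3f}, formula SE = {formula_se():.3f}")
```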

For Balanced Factorial Models and/or Blocks

• The complexity of the comparison of treatment effects is the same whether or not we include another factor or blocks
  – Each difference between means is based on the same number of values either way
  – The only downside is fewer df for estimating the variance
• Pay attention to interactions
  – Main effects need to be interpreted very carefully when there are interactions
  – In the KCT example, K does not have a significant main effect, but K definitely affects yield
    – But differently for low and high temperatures
    – Don’t summarize K effects with means over both temperatures
    – Summarize K effects separately for low and high temperatures
  – In the age at metamorphosis data in Assign 7
    – B and C do not have significant main effects
    – But based on significant interactions, both B and C do affect age at metamorphosis
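A toy illustration of how an interaction can hide a real effect in the main-effect average. The 2 × 2 cell means below are hypothetical, not the KCT data: K raises yield by 10 at low temperature and lowers it by 10 at high temperature, so the K main effect averaged over temperatures is exactly 0.

```python
# Hypothetical 2 x 2 cell means (yield) for K level by temperature.
cell_mean = {
    ("K low", "low temp"): 50.0, ("K high", "low temp"): 60.0,
    ("K low", "high temp"): 60.0, ("K high", "high temp"): 50.0,
}
temps = ["low temp", "high temp"]

# Simple effects: the K effect within each temperature.
simple_effects = {t: cell_mean[("K high", t)] - cell_mean[("K low", t)] for t in temps}
# Main effect: the K effect averaged over temperatures.
main_effect = sum(simple_effects.values()) / len(temps)

print("K effect at each temperature:", simple_effects)
print("K main effect (averaged):", main_effect)
```

Reporting only the averaged effect would suggest K does nothing, which is why the K effects should be summarized separately for each temperature.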

Account for Unbalanced Data

• Use least squares means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
  – For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
• If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn’t cover
• Adjust effects of keyboard on pain according to how much the keyboards were used
• Adjust comparisons of male and female mice to account for different mixes of young and old mice
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
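A sketch of the mouse example with hypothetical numbers: there is no true sex difference, but old mice score higher and the sexes have different mixes of young and old animals. The raw means then show a spurious sex difference, while the least squares means (for a model that includes the sex-by-age interaction, these are the unweighted averages of the cell means) remove it.

```python
# Hypothetical mouse data: no true sex effect, but older mice score higher,
# and the sexes have different mixes of young and old animals.
data = {
    ("male", "young"): [9.0, 11.0, 10.0, 10.0, 9.0, 11.0, 10.0, 10.0],
    ("male", "old"): [19.0, 21.0],
    ("female", "young"): [9.0, 11.0],
    ("female", "old"): [19.0, 21.0, 20.0, 20.0, 19.0, 21.0, 20.0, 20.0],
}
ages = ["young", "old"]


def mean(values):
    return sum(values) / len(values)


results = {}
for sex in ("male", "female"):
    pooled = [y for age in ages for y in data[(sex, age)]]
    raw = mean(pooled)                                   # weights age classes by sample size
    ls = mean([mean(data[(sex, age)]) for age in ages])  # equal weight per age class
    results[sex] = (raw, ls)
    print(f"{sex:6s}: raw mean = {raw:.1f}, LS mean = {ls:.1f}")
```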

Account for Random Effects

• Don’t pretend that measurements for each fish in a tank are separate, independent pieces of information (independent replicates)
• Don’t pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the mean for each tank of fish
  – But don’t do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don’t end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
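A sketch for a balanced one-way random-effects layout (fish nested in tanks) with hypothetical lengths. It estimates the tank and fish variance components by the method of moments, shows that treating every fish as an independent replicate understates the variance of the mean, confirms that the tank-means analysis gives the right answer when the data are balanced, and compares two hypothetical allocations for a next experiment.

```python
import statistics

# Hypothetical fish lengths: 2 fish in each of 3 tanks (the tank is the experimental unit).
tanks = [[10.0, 12.0], [15.0, 17.0], [20.0, 22.0]]
a, m = len(tanks), len(tanks[0])  # balanced: a tanks, m fish per tank

tank_means = [sum(t) / m for t in tanks]
grand = sum(tank_means) / a
ms_tank = m * sum((tm - grand) ** 2 for tm in tank_means) / (a - 1)
ms_fish = sum((y - tm) ** 2 for t, tm in zip(tanks, tank_means) for y in t) / (a * (m - 1))

# Method-of-moments (ANOVA) estimates of the variance components, balanced case only.
var_tank = max((ms_tank - ms_fish) / m, 0.0)
var_fish = ms_fish


def var_of_mean(n_tanks, n_fish):
    """Variance of an overall mean from n_tanks tanks with n_fish fish in each."""
    return var_tank / n_tanks + var_fish / (n_tanks * n_fish)


all_fish = [y for t in tanks for y in t]
naive = statistics.variance(all_fish) / len(all_fish)  # wrongly treats fish as replicates
via_tank_means = statistics.variance(tank_means) / a   # one mean per tank

print(f"variance components: tank = {var_tank}, fish = {var_fish}")
print(f"Var(mean): components = {var_of_mean(a, m):.3f}, "
      f"tank means = {via_tank_means:.3f}, naive = {naive:.3f}")
for n_tanks, n_fish in [(3, 10), (6, 2)]:
    print(f"{n_tanks} tanks x {n_fish} fish: Var(mean) = {var_of_mean(n_tanks, n_fish):.3f}")
```

With these made-up numbers the tank-to-tank variance dominates, so adding tanks helps far more than adding fish per tank; the same reasoning applies to building a new paper helicopter versus reflying the same one.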

Think about and check whether the results make sense

• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 0.6
  – Plotting the data is the best protection when checking the p-values
  – Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right value
  – Simulations
• When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of slope for pain versus amount of keyboard use comes out negative
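Two of these checks are easy to automate. The sketch below uses a hypothetical helper that confirms the df in an ANOVA table add up to N − 1, and then runs a slope routine on data generated exactly from y = 3 + 2x (with unequal spacing and a repeated x value) where the right answer is known.

```python
def check_anova_df(terms, n_obs):
    """True if the model and error df add up to the corrected total df, n_obs - 1."""
    return sum(terms.values()) == n_obs - 1


# Randomized block design: 3 treatments in each of 4 blocks, one observation per cell.
rcbd = {"treatment": 3 - 1, "block": 4 - 1, "error": (3 - 1) * (4 - 1)}
print(check_anova_df(rcbd, n_obs=12))  # True


def slope(xs, ys):
    """Least-squares slope; the routine being checked against a known answer."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sum((x - xbar) ** 2 for x in xs)


# Known answer: data generated exactly from y = 3 + 2x, with unequal spacing in x.
xs = [0.0, 1.0, 1.0, 4.0, 7.0]
ys = [3.0 + 2.0 * x for x in xs]
print(slope(xs, ys))  # about 2.0
```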

Cultivate Consideration, Concern, and Compassion

“A little love and affection in everything you do

Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You’ve got to have a dream

If you don’t have a dream

How you going to have a dream come true?”

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 11: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Design and Data Analysisbull Plan carefully before collecting data

bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might

influence results

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 12: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull Understand the questionsbull Know how the experiment was conducted

bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip

bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries

bull Age = 240bull Check consistency of entries

bull Not male and hysterectomy

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 13: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull Check assumptionsbull Independence How was the experiment

conductedbull Plot the original data pointsbull Normality

bull PDP polymerase significant after log transformation

bull Insect traps significant after log transformation

bull Equal variancesbull Fixing assumptions

bull Transformations sometimesbull Other methods

bull Weibull models

bull Check assumptionsbull Plot residualsbull Especially with more complicated data

where itrsquos hard to plot all factors at one time effectively

bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance

bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection

bull Check for drift over time

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sense
• Plot the data
  – In one case a student was getting p-values of less than 1 in a million in their master's project when the right p-values were more like 0.6
  – Plotting the data is the best protection to check the p-values
• Don't decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right value
  – Simulations
  – When using a new program, run the program on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of the slope for pain versus amount of keyboard use comes out negative
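A small sketch of the "known right answer" check for a one-way ANOVA, using hypothetical unbalanced data small enough to verify by hand: the group means are 2, 5 and 8.5, so SS(error) = 2 + 2 + 5 = 9. The same df and SS identities can be checked against any program's output.

```python
def one_way_anova(groups):
    """df and sums of squares for a one-way ANOVA on a list of groups."""
    all_y = [y for g in groups for y in g]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    ss_total = sum((y - grand) ** 2 for y in all_y)
    ss_trt = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_err = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return {
        "df_trt": k - 1, "df_err": n - k, "df_total": n - 1,
        "ss_trt": ss_trt, "ss_err": ss_err, "ss_total": ss_total,
    }


# Hypothetical unbalanced data (group sizes 3, 2, 4) with a hand-checkable answer.
groups = [[1, 2, 3], [4, 6], [7, 8, 9, 10]]
table = one_way_anova(groups)

# Identities that any correct one-way ANOVA table must satisfy.
assert table["df_trt"] + table["df_err"] == table["df_total"]
assert abs(table["ss_trt"] + table["ss_err"] - table["ss_total"]) < 1e-9
print(table)
```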

Cultivate Consideration, Concern and Compassion

“A little love and affection in everything you do

Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You've got to have a dream

If you don't have a dream

How you going to have a dream come true?”

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37

• Check assumptions
• Plot residuals
  – Especially with more complicated data, where it's hard to plot all factors at one time effectively
• At least:
  – Residuals versus predicted values
  – Normal plot of residuals
  – Cook's distance
• It is also helpful to plot residuals:
  – vs. factors in the model
  – vs. other variables, such as technician
  – in the time order of collection
• Check for drift over time

• Cook's Distance (see answers to Lab 6)
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model, because it pulls the fit toward itself

[Figure: scatter plot of Y (0 to 600) versus X (0 to 16)]

• Cook's Distance (see answers to Lab 6)
• Ross Garberich's master's project: “The Economic Utilization of Patients with Refractory Angina”
  – Here one patient had a very large influence on the results
  – The patient had 30 lifetime angioplasties
  – When the angioplasty counts were grouped, with “5 or more angioplasties” as the top group, the Cook's distance plot was just fine
• Cook's distance > 0.5 is often considered big
• Cook's distance > 1 is often considered a definite problem
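A minimal sketch of Cook's distance for a straight-line fit, D_i = e_i² h_ii / (p · MSE · (1 − h_ii)²), where h_ii is the leverage and p = 2 parameters. The data are hypothetical: the last point lies far out in x and well above the trend of the others, so it has a modest residual but a very large Cook's distance.

```python
def cooks_distance(x, y):
    """Residuals, leverages and Cook's distances for y = b0 + b1*x fitted by least squares."""
    n, p = len(x), 2                      # p = number of regression parameters
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # leverage h_ii
    d = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, lev, d


# Hypothetical data: the point at x = 15 pulls the fitted line toward itself.
x = [1, 2, 3, 4, 5, 15]
y = [10, 22, 29, 41, 50, 300]
resid, lev, d = cooks_distance(x, y)
for xi, e, h, di in zip(x, resid, lev, d):
    print(f"x={xi:>2}  residual={e:7.2f}  leverage={h:.3f}  Cook's D={di:6.2f}")
```

Here the point at x = 15 has a smaller absolute residual than the point at x = 5, yet its Cook's distance is far above the rough cutoff of 1.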

• Include factors in models that improve predictions
  – Include temperatures when comparing brands of syrups
  – Include the covariate Use when comparing keyboard pain
  – Include blocks (sites) when comparing insect traps
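A minimal sketch of how including blocks helps, with hypothetical insect-trap counts (3 traps × 4 sites, one count per cell, not the course data). Without sites in the model, site-to-site variation stays in the error term; with sites as blocks it is removed, at the cost of a few error df.

```python
def mean(v):
    return sum(v) / len(v)


# Hypothetical trap counts: rows = traps, columns = sites (blocks), one count per cell.
counts = {
    "A": [10, 20, 30, 40],
    "B": [12, 23, 31, 42],
    "C": [15, 24, 35, 46],
}
traps = list(counts)
n_trap, n_site = len(traps), len(counts["A"])
all_y = [y for row in counts.values() for y in row]
grand = mean(all_y)

ss_total = sum((y - grand) ** 2 for y in all_y)
ss_trap = n_site * sum((mean(counts[t]) - grand) ** 2 for t in traps)
site_means = [mean([counts[t][j] for t in traps]) for j in range(n_site)]
ss_site = n_trap * sum((m - grand) ** 2 for m in site_means)

# Without blocks: site-to-site variation is left in the error term.
df_err_oneway = n_trap * n_site - n_trap
mse_oneway = (ss_total - ss_trap) / df_err_oneway
# With sites as blocks: that variation is removed from error (fewer error df).
df_err_block = (n_trap - 1) * (n_site - 1)
mse_block = (ss_total - ss_trap - ss_site) / df_err_block

f_oneway = (ss_trap / (n_trap - 1)) / mse_oneway
f_block = (ss_trap / (n_trap - 1)) / mse_block
print(f"without blocks: MSE = {mse_oneway:.2f} on {df_err_oneway} df, F(traps) = {f_oneway:.2f}")
print(f"with blocks:    MSE = {mse_block:.2f} on {df_err_block} df, F(traps) = {f_block:.2f}")
```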

• Don't overfit
  – Hypothesis tests
  – Information criteria: Akaike Information Criterion (AIC)
  – AICc: corrected for small sample sizes
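A minimal sketch of AIC and AICc for least-squares fits, assuming the common form AIC = n ln(SSE/n) + 2k with additive constants dropped and k regression parameters. Software differs on whether the error variance counts as an extra parameter, so absolute values vary between programs; only differences between models fitted to the same data matter. The SSE values below are hypothetical.

```python
import math


def aic(sse, n, k):
    """AIC for a least-squares fit with k regression parameters (constants dropped)."""
    return n * math.log(sse / n) + 2 * k


def aicc(sse, n, k):
    """AICc: AIC plus a small-sample correction that grows as k approaches n - 1."""
    return aic(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison with n = 10 observations:
# a straight line (k = 2, SSE = 12) versus a cubic (k = 4, SSE = 7).
n = 10
aic_lin, aic_cub = aic(12, n, 2), aic(7, n, 4)
aicc_lin, aicc_cub = aicc(12, n, 2), aicc(7, n, 4)
print(f"AIC:  line {aic_lin:.2f}, cubic {aic_cub:.2f}")   # smaller is better
print(f"AICc: line {aicc_lin:.2f}, cubic {aicc_cub:.2f}")
```

In this example AIC slightly favors the cubic, while AICc, with its heavier penalty at n = 10, favors the straight line.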

Overfitting Polynomial

[Figure: polynomial fits of Y versus X, with X from 0 to 6 and Y from 0 to 120]

• The fourth-order polynomial (in pink) fits the data exactly (Error SS = 0)
• But it likely would not work well for predicting at x = 0.5 or x = 1.5
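A minimal sketch of the overfitting point, using hypothetical data (true mean 5 + 20x plus small alternating errors at x = 1, ..., 5). The quartic through all five points is computed by Lagrange interpolation, so its Error SS is 0; its predictions between and beyond the data points are then compared with a least-squares straight line.

```python
def lagrange(xs, ys, t):
    """Evaluate the unique polynomial of degree len(xs) - 1 through (xs, ys) at t."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term *= (t - xm) / (xj - xm)
        total += term
    return total


def linear_fit(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    return lambda t: b0 + b1 * t


# Hypothetical data: 5 + 20x with errors -1, +1, -1, +1, -1.
xs = [1, 2, 3, 4, 5]
ys = [24, 46, 64, 86, 104]
line = linear_fit(xs, ys)

quartic_sse = sum((y - lagrange(xs, ys, x)) ** 2 for x, y in zip(xs, ys))
print(f"quartic Error SS on the data: {quartic_sse:.1e}")
for t in (0.5, 1.5, 6):
    print(f"x={t}: true mean {5 + 20 * t:6.2f}  line {line(t):6.2f}  quartic {lagrange(xs, ys, t):6.2f}")
```

The straight line stays within 0.2 of the true mean at all three x values; the quartic, having fitted the noise exactly, is off by about 10 at x = 0.5 and by 31 at x = 6.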

SE(ŷ) at x = 6, using x = 1, 2, 3, 4, 5

Model        Regression class formulas    Simulations (1000)
Linear       1.0                          1.4
Quadratic    2.1                          2.1
Cubic        4.9                          4.8
Quartic      15.8                         15.6

The formulas are simplest for orthogonal polynomials
• Inverting a diagonal matrix XᵀX is simple

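$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$

$$P_1(x) = x - \bar{x}$$

$$P_2(x) = (x - \bar{x})^2 - \frac{a^2 - 1}{12}$$

$$P_3(x) = (x - \bar{x})^3 - \frac{3a^2 - 7}{20}\,(x - \bar{x})$$

$$P_4(x) = (x - \bar{x})^4 - \frac{3a^2 - 13}{14}\,(x - \bar{x})^2 + \frac{3(a^2 - 1)(a^2 - 9)}{560}$$

where a is the number of equally spaced x values, one unit apart (for x = 1, 2, 3, 4, 5, a = 5).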

Using these orthogonal polynomials:
• Type I and Type III SS are the same
• The SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term
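A short sketch of the "regression class formula" for SE(ŷ) at a new x0 using orthogonal polynomials, assuming one observation at each x = 1, ..., 5 (so a = n) and σ = 1. Because the columns 1, P1, ..., P4 are orthogonal, XᵀX is diagonal and Var(ŷ(x0)) = σ² [1/n + Σ_k P_k(x0)² / Σ_i P_k(x_i)²].

```python
import math


def orth_polys(x, xbar, a):
    """P1..P4 for a equally spaced x values one unit apart."""
    u = x - xbar
    return [
        u,
        u ** 2 - (a ** 2 - 1) / 12,
        u ** 3 - (3 * a ** 2 - 7) / 20 * u,
        u ** 4 - (3 * a ** 2 - 13) / 14 * u ** 2 + 3 * (a ** 2 - 1) * (a ** 2 - 9) / 560,
    ]


def se_fit(xs, x0, degree, sigma=1.0):
    """SE of the fitted mean at x0 for a polynomial fit of the given degree (1 to 4).

    Assumes one observation per x value, so a = n.
    """
    n = len(xs)
    xbar = sum(xs) / n
    design = [orth_polys(x, xbar, n) for x in xs]
    at_x0 = orth_polys(x0, xbar, n)
    var = 1 / n                                   # contribution of the intercept (ybar)
    for k in range(degree):
        var += at_x0[k] ** 2 / sum(row[k] ** 2 for row in design)
    return sigma * math.sqrt(var)


xs = [1, 2, 3, 4, 5]
for name, deg in [("Linear", 1), ("Quadratic", 2), ("Cubic", 3), ("Quartic", 4)]:
    print(f"{name:<9} SE(yhat) at x = 6: {se_fit(xs, 6, deg):.2f}")
```

This gives about 1.05, 2.14, 4.92 and 15.84, in line with the formula column of the table (1.0, 2.1, 4.9, 15.8) after rounding.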

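/* Simulate nsim data sets from y = b0 + b1*time + normal error with SD = std */
%let b0=5;
%let b1=20;
%let std=1;
%let nsim=1000;

data polynomial;
  call streaminit(0);
  do isim=1 to &nsim;
    do time=1 to 5 by 1;
      mu=&b0+&b1*time;
      y=rand('normal', mu, &std);
      output;
    end;
    /* time = 6 with missing y: not used in the fit, but it still gets a predicted value */
    time=6; y=.; output;
  end;
run;

/* Fit a line to each simulated data set; for a quadratic use: model y = time time*time; */
ods listing close;
proc glm data=polynomial;
  by isim;
  model y = time;
  output out=outp p=pred;
run;
quit;
ods listing;

/* The standard deviation of pred at time = 6 estimates SE(y-hat) at x = 6 */
proc means data=outp;
  where time = 6;
  var pred;
run;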
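For readers without SAS, a minimal Python mirror of the simulation in the SAS code, assuming the same settings (b0 = 5, b1 = 20, σ = 1, 1000 data sets, straight-line fit). Python's random number generator differs from SAS's, so the numbers will not match a SAS run exactly; the seed is an arbitrary choice.

```python
import random
import statistics


def simulate_pred_sd(b0=5.0, b1=20.0, std=1.0, nsim=1000, seed=0):
    """Refit a straight line to nsim simulated data sets and return the SD of predictions at time = 6."""
    rng = random.Random(seed)
    times = [1, 2, 3, 4, 5]
    tbar = sum(times) / len(times)
    stt = sum((t - tbar) ** 2 for t in times)
    preds = []
    for _ in range(nsim):
        y = [rng.gauss(b0 + b1 * t, std) for t in times]
        ybar = sum(y) / len(y)
        slope = sum((t - tbar) * (yi - ybar) for t, yi in zip(times, y)) / stt
        preds.append(ybar + slope * (6 - tbar))   # fitted value at time = 6
    return statistics.stdev(preds)


sd = simulate_pred_sd()
print(f"simulated SD of predictions at time = 6: {sd:.3f}")
print(f"formula sigma * sqrt(1/5 + 9/10):        {(1 / 5 + 9 / 10) ** 0.5:.3f}")
```

With 1000 data sets the simulated SD should land within a few hundredths of the formula value √1.1 ≈ 1.05.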

For Balanced Factorial Models and/or Blocks

• The complexity of the comparison of treatment effects is the same whether or not we include another factor or blocks
• Either way, it is a difference between means with the same number of values in each mean
• The only downside is fewer df for estimating the variance

• Pay attention to interactions
  – Main effects need to be interpreted very carefully when there are interactions
• In the KCT example, K does not have a significant main effect, but K definitely affects yield
  – But differently for low and high temperatures
  – Don't summarize K effects with means over both temperatures
  – Summarize K effects separately for low and high temperatures
• In the age at metamorphosis data on Assign 7
  – B and C do not have significant main effects
  – But based on significant interactions, both B and C do affect age at metamorphosis
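A tiny sketch of how a factor can have no main effect yet clearly matter, using hypothetical balanced cell means for yield (not the KCT data): K raises yield at low temperature and lowers it by the same amount at high temperature, so averaging over temperatures hides both effects.

```python
# Hypothetical cell means for yield: (K level, temperature) -> mean.
yield_means = {
    ("low K", "low T"): 50.0, ("high K", "low T"): 60.0,
    ("low K", "high T"): 62.0, ("high K", "high T"): 52.0,
}
temps = ["low T", "high T"]


def k_effect(temp):
    """Simple effect of K (high minus low) at one temperature."""
    return yield_means[("high K", temp)] - yield_means[("low K", temp)]


# Main effect of K: simple effects averaged over temperatures (balanced cells).
main_effect = sum(k_effect(t) for t in temps) / len(temps)
print(f"K main effect (averaged over temperatures): {main_effect:+.1f}")
for t in temps:
    print(f"K effect at {t}: {k_effect(t):+.1f}")
```

Reporting the two simple effects (+10 and -10) describes K correctly; reporting only the averaged main effect (0) does not.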

Account for Unbalanced Data

• Use least squares means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
  – For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
• If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn't cover
• Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 15: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual

bull If it has large influence on the fitted model

0

100

200

300

400

500

600

0 2 4 6 8 10 12 14 16

Y

X

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 16: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project

bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties

the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 17: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps

bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)

bull AICC Corrected for small sample sizes

Overfitting Polynomial

0

20

40

60

80

100

120

0 1 2 3 4 5 6

X

Y

bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS

bull There are also sound arguments for using Type II SS which we didnrsquot cover

bull Adjust effects of keyboard on pain according to how much the keyboards were used

bull Adjust comparisons of male and female mice to account for different mixes of young and old mice

bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks

Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are

separate independent pieces of information independent replicates

bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)

bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different

sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make

the next paper helicopter rather than flying this helicopter again

bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the

wider population of possible patients based on data from just two patients

Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a

million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values

bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different

bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data

where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general

bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of

a keyboard comes out negative

Cultivate Consideration Concern and Compassion

ldquoA little love and affection in everything you

Will make the world a better place with or without yourdquo

Neil Young Greendale

Stop to Smell the Flowers and Dream

ldquoYoursquove got to have a dream

If you donrsquot have a dream

How you going to have a dream come truerdquo

South Pacific

  • Slide Number 1
  • Slide Number 2
  • Slide Number 3
  • Slide Number 4
  • Slide Number 5
  • Slide Number 6
  • Slide Number 7
  • Slide Number 8
  • Slide Number 9
  • Consulting Advice
  • Slide Number 11
  • Slide Number 12
  • Slide Number 13
  • Slide Number 14
  • Computing
  • Slide Number 16
  • Slide Number 17
  • Slide Number 18
  • Slide Number 19
  • Slide Number 20
  • Slide Number 21
  • Slide Number 22
  • Slide Number 23
  • Slide Number 24
  • Slide Number 25
  • Slide Number 26
  • Slide Number 27
  • Slide Number 28
  • Slide Number 29
  • Slide Number 30
  • Slide Number 31
  • Slide Number 32
  • Slide Number 33
  • Slide Number 34
  • Slide Number 35
  • Slide Number 36
  • Slide Number 37
Page 18: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have

Using Regression Class Formulas

Using x = 1 2 3 4 5

Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156

( )ySE ˆAt x = 6 Simulations (1000)

The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple

( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )( )

( ) ( ) ( ) ( )xPxPxPxPyy

aaaxxxxxP

axxxxxP

axxxP

xxxP

44332211

22224

4

23

3

22

2

1

ˆˆˆˆˆ560

01314

13320

7312

1

lowast+lowast+lowast+lowast+=

minusminuslowast+

minuslowastminusminusminus=

minuslowastminusminusminus=

minusminusminus=

minus=

ββββ

Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term

above and beyond the linear term

let b0=5let b1=20let std=1let nsim=1000

data polynomialcall streaminit(0)do isim=1 to ampnsim

do time= 1 to 5 by 1mu=ampb0+ampb1time

y=rand(normal mu ampstd)output

end time = 6 y = output

endrun

ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun

ods listingproc means data=outp

where time = 6var predrun

For Balanced Factorial Models andor Blocks

bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks

bull The difference between means with same number of values inthe means either way

bull The only down side is fewer df for estimating the variance

bull Pay attention to interactionsbull Main effects need to be interpreted very

carefully when there are interactionsbull In the KCT example K does not have a

significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over

both temperaturesbull Summarize K effects separately for low and

high temperaturesbull In the age at metamorphosis data on Assign 7

bull B and C do not have significant main effectsbull But based on significant interactions both B

and C do affect age at metamorphosis

Account for Unbalanced Data

bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted

comparisonsbull For The Rose-Hellekant PCR data we could still be interested

in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment

• If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn’t cover
• Adjust the effects of keyboard on pain according to how much the keyboards were used
• Adjust comparisons of male and female mice to account for different mixes of young and old mice
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
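The mouse example can be made concrete with a Python sketch on made-up data in which females are mostly young and males mostly old, and old mice score about 10 units higher. The data and helper names are hypothetical. The least squares means are computed here as unweighted averages of the cell means. That is what LSMEANS reports for sex in a model with sex, age and sex*age when every cell has data.

```python
from statistics import fmean

# Hypothetical responses for male and female mice, split by age group.
# Counts are deliberately unbalanced: females mostly young, males mostly old.
data = {
    ("female", "young"): [9, 11] * 4,    # 8 mice, cell mean 10
    ("female", "old"):   [19, 21],       # 2 mice, cell mean 20
    ("male", "young"):   [11, 13],       # 2 mice, cell mean 12
    ("male", "old"):     [21, 23] * 4,   # 8 mice, cell mean 22
}
AGES = ["young", "old"]

def raw_mean(sex):
    """Unadjusted mean: every mouse counts once, so the age mix leaks in."""
    return fmean([v for age in AGES for v in data[(sex, age)]])

def ls_mean(sex):
    """Least squares mean: each age group gets equal weight."""
    return fmean([fmean(data[(sex, age)]) for age in AGES])

print("raw difference (male - female):", raw_mean("male") - raw_mean("female"))
print("LS-mean difference (male - female):", ls_mean("male") - ls_mean("female"))
```

The raw means differ by 8 units, mostly because of age, while the LS means differ by 2 units, the male-female difference within each age group.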

Account for Random Effects

• Don’t pretend that measurements on each fish in a tank are separate, independent pieces of information (independent replicates)
• Don’t pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the mean for each tank of fish
  – But don’t do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don’t end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
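Here is a Python simulation of the cost of pseudo-replication. The setup is hypothetical: two treatments with no real difference, 6 tanks per treatment, 10 fish per tank, and tank and fish standard deviations both equal to 1. Each data set is analyzed twice, once treating every fish as a replicate and once using the tank means. The t critical values are hard-coded (2.228 for 10 df, about 1.98 for 118 df) to avoid extra dependencies. All helper names are hypothetical.

```python
import random

T_CRIT_10 = 2.228    # t(0.975) with 10 df: 6 + 6 tank means
T_CRIT_118 = 1.980   # t(0.975) with 118 df (approx.): 60 + 60 fish

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def one_treatment(rng, tanks=6, fish=10, tank_sd=1.0, fish_sd=1.0):
    """Simulated fish measurements for one treatment: a list of tanks, each a list of fish."""
    data = []
    for _ in range(tanks):
        tank_effect = rng.gauss(0, tank_sd)   # shared by every fish in this tank
        data.append([tank_effect + rng.gauss(0, fish_sd) for _ in range(fish)])
    return data

def rejection_rates(nsim=2000, seed=1):
    """Fraction of simulations declaring a treatment difference when none exists."""
    rng = random.Random(seed)
    naive = by_tank = 0
    for _ in range(nsim):
        g1, g2 = one_treatment(rng), one_treatment(rng)
        fish1 = [y for tank in g1 for y in tank]
        fish2 = [y for tank in g2 for y in tank]
        naive += abs(pooled_t(fish1, fish2)) > T_CRIT_118
        tank1 = [mean(tank) for tank in g1]
        tank2 = [mean(tank) for tank in g2]
        by_tank += abs(pooled_t(tank1, tank2)) > T_CRIT_10
    return naive / nsim, by_tank / nsim

naive_rate, tank_rate = rejection_rates()
print(f"each fish as a replicate:      {naive_rate:.3f}")
print(f"each tank mean as a replicate: {tank_rate:.3f}")
```

The tank-mean analysis should reject close to the nominal 5% of the time. The fish-level analysis rejects far more often, because it treats 60 correlated fish per treatment as if they were 60 independent replicates. Averaging within tanks is only this simple because the sketch is balanced.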

Think about and check whether the results make sense

• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 06
  – Plotting the data is the best protection when checking the p-values
  – Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right value
  – Simulations
• When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of the slope for pain versus amount of keyboard use comes out negative
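Here is a Python sketch of "run the program on data where you know the right answer." The hypothetical `one_way_anova` function is checked against a small unbalanced data set built so that its ANOVA table can be worked out by hand. The group means are 2, 5 and 11, the grand mean is 7, SS treatment = 126, SS error = 6 and SS total = 132. The checks also confirm that the df and SS add up.

```python
def one_way_anova(groups):
    """One-way ANOVA table for a dict {label: list of values}; groups may be unbalanced."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_trt = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
    ss_err = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ss_tot = sum((y - grand) ** 2 for y in all_y)
    return {
        "trt": {"df": k - 1, "ss": ss_trt},
        "err": {"df": n - k, "ss": ss_err},
        "total": {"df": n - 1, "ss": ss_tot},
        "F": (ss_trt / (k - 1)) / (ss_err / (n - k)),
    }

# Unbalanced test data (2, 3 and 4 observations) with a hand-computed answer.
known = {"A": [1, 3], "B": [4, 5, 6], "C": [10, 12, 11, 11]}
tab = one_way_anova(known)

# Sanity checks: df and SS must add up, whatever the data.
assert tab["trt"]["df"] + tab["err"]["df"] == tab["total"]["df"]
assert abs(tab["trt"]["ss"] + tab["err"]["ss"] - tab["total"]["ss"]) < 1e-9
print(tab)
```

Unequal group sizes matter here. A mistake such as weighting group means equally when computing the grand mean would go unnoticed with balanced test data, but it changes the answer with these groups.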

Cultivate Consideration, Concern and Compassion

“A little love and affection in everything you do
Will make the world a better place with or without you”

Neil Young, Greendale

Stop to Smell the Flowers and Dream

“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true?”

South Pacific

Page 25: Things you learn on - University of Minnesota Duluthrregal/documents/5411_2010/... · Things you learn on one road can be really helpful on ... • Have a few things that you have
