8. Goodness of fit in regression. - UCLA Statisticsfrederic/10/sum19/day02.pdf · 2019. 8. 7. ·...

1. Two-way tables. 2. Histograms. 3. mean, median, IQR, z score. 4. skew. 5. Boxplots. 6. Scatterplots and correlation. 7. Regression. 8. Goodness of fit in regression. 9. Common regression pitfalls.


Simple data summaries
• For categorical data, two-way tables can be useful.
• For quantitative data, histograms are useful.

• For a relative frequency histogram, the percentage of people in the bin is shown rather than the whole number.
• Here, n = 25. 0.2 = 20% of people in the sample had 3 quarts. The number of people with 3 quarts was 0.2 × 25 = 5.
• The sizes of the bins can be adjusted, and the look of the histogram can be influenced by the bin sizes.

• With histograms, look for symmetry, skew, bimodality, and outliers.
• The range = maximum observed value − minimum.
• For roughly symmetric data, the mean and SD are good summaries of the center and spread.
• When the data are skewed or there are serious outliers, the median and the IQR can be preferable.

3. Mean, median, IQR, z score.
• The median is the middle in the sorted list of values. It is a value M where 50% of the observations are ≤ M. Different software use different conventions, but we will use the convention that, if there is a range of possible medians, you take the middle of that range.
• For example, suppose data are 1, 3, 7, 7, 8, 9, 12, 14.
• M = 7.5.
• Suppose 25% of the observations lie below a certain value x. Then x is called the lower quartile (or 25th percentile).
• Similarly, if 25% of the observations are greater than x, then x is called the upper quartile (or 75th percentile).
• The lower quartile can be calculated by finding the median M, and then determining the median of the values below M. Similarly, the upper quartile is the median of the values greater than M.

IQR and Five-Number Summary
• The difference between the quartiles is called the inter-quartile range (IQR), another measure of variability along with standard deviation.
• The five-number summary for the distribution of a quantitative variable consists of the minimum, lower quartile, median, upper quartile, and maximum.
• Technically the IQR is not the interval (25th percentile, 75th percentile), but the difference 75th percentile − 25th percentile.
• Suppose data are 1, 3, 7, 7, 8, 9, 12, 14.
• M = 7.5, 25th percentile = 5, 75th percentile = 10.5. IQR = 5.5.
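The quartile convention described above (median of the values below M, median of the values above M) can be sketched in a few lines of Python. The function names are illustrative, not from the slides, and the sketch assumes at least one value lies strictly below and strictly above the median.

```python
def median(sorted_vals):
    """Middle value; for an even count, the midpoint of the two middle values."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    vals = sorted(values)
    m = median(vals)
    below = [v for v in vals if v < m]  # values below the median
    above = [v for v in vals if v > m]  # values above the median
    return vals[0], median(below), m, median(above), vals[-1]


data = [1, 3, 7, 7, 8, 9, 12, 14]
lo, q1, m, q3, hi = five_number_summary(data)
iqr = q3 - q1
print(lo, q1, m, q3, hi, iqr)  # 1 5.0 7.5 10.5 14 5.5
```

Other software (for example, percentile functions with interpolation) may report slightly different quartiles for the same data, as the slides note.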

z score.
• Many data sets are roughly symmetric and the histogram somewhat resembles the normal curve.
• For such data, about 2/3 of observations are within 1 SD of the mean, and about 95% are within 2 SDs of the mean.
• It can be useful to convert values to z scores. This simply means taking a value x and standardizing it by subtracting the sample mean and then dividing by s.
• IQ: mean = 100, s = 15. If x = 125, then z = (125 − 100)/15 = 1.67.
• About 95% of z-scores are between −2 and 2, for normal data.
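As a quick check of the standardization step, here is a minimal sketch using the IQ numbers from the slide (mean 100, s = 15, x = 125). The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd


z = z_score(125, 100, 15)
print(round(z, 2))  # 1.67
```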

4. Skew.
• The mean and SD are strongly influenced by outliers and skew. However, the median and IQR are much more resistant to outliers and skew. In many cases the median and IQR will not change at all with the addition of one or two huge outliers.
• For right skewed data, mean > median. For left skewed data, mean < median.

Geyser Eruptions Example 6.1

Old Faithful Inter-Eruption Times
• How do the five-number summary and IQR differ for inter-eruption times between 1978 and 2003?
• 1978 IQR = 81 − 58 = 23
• 2003 IQR = 98 − 87 = 11

5. Boxplots.

[Figure: boxplot labeled Min, Q_lower, Med, Q_upper, Max.]

Outliers on boxplots.
• A data value that is more than 1.5 × IQR above the upper quartile or below the lower quartile is considered an outlier.
• When these occur, the whiskers on a boxplot extend out to the farthest value not considered an outlier, and outliers are represented by a dot or an asterisk.
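The 1.5 × IQR rule can be sketched as follows. The data are the earlier example values plus one hypothetical extra value (40) added here purely to trigger the rule, and the quartiles follow the slides' convention (median of the values below and above the median). The function name is illustrative.

```python
from statistics import median


def boxplot_outliers(values, k=1.5):
    """Return (outliers, whisker_ends) using the k * IQR rule."""
    vals = sorted(values)
    m = median(vals)
    q1 = median([v for v in vals if v < m])  # lower quartile
    q3 = median([v for v in vals if v > m])  # upper quartile
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in vals if v < low_fence or v > high_fence]
    inside = [v for v in vals if low_fence <= v <= high_fence]
    # Whiskers reach the farthest values that are not outliers.
    return outliers, (inside[0], inside[-1])


data = [1, 3, 7, 7, 8, 9, 12, 14, 40]  # 40 is a hypothetical added value
outliers, whiskers = boxplot_outliers(data)
print(outliers, whiskers)  # [40] (1, 14)
```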

Cancer Pamphlet Reading Levels
• Short et al. (1995) compared reading levels of cancer patients and readability levels of cancer pamphlets. What is the:
• Median reading level?
• Mean reading level?
• Are the data skewed one way or the other?
• Skewed a bit to the right
• Mean to the right of median

6. Scatterplots and Correlation

Suppose we collected data on the relationship between the time it takes a student to take a test and the resulting score.

Time    30  41  41  43  47  48  51  54  54  56  56  56  57  58
Score  100  84  94  90  88  99  85  84  94 100  65  64  65  89

Time    58  60  61  61  62  63  64  66  66  69  72  78  79
Score   83  85  86  92  74  73  75  53  91  85  62  68  72

Scatterplot
• Put the explanatory variable on the horizontal axis.
• Put the response variable on the vertical axis.

Describing Scatterplots
• When we describe data in a scatterplot, we describe the
• Direction (positive or negative)
• Form (linear or not)
• Strength (strong-moderate-weak; we will let correlation help us decide)
• Unusual Observations

Correlation
• Correlation measures the strength and direction of a linear association between two quantitative variables.
• Correlation is a number between −1 and 1.
• With positive correlation one variable increases, on average, as the other increases.
• With negative correlation one variable decreases, on average, as the other increases.
• The closer it is to either −1 or 1, the closer the points fit to a line.
• The correlation for the test data is −0.56.
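The reported correlation for the test data can be reproduced with a short sketch. The `correlation` helper below is an illustrative implementation of the usual Pearson formula, not code from the course; the time and score values are the ones tabulated above.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


time = [30, 41, 41, 43, 47, 48, 51, 54, 54, 56, 56, 56, 57, 58,
        58, 60, 61, 61, 62, 63, 64, 66, 66, 69, 72, 78, 79]
score = [100, 84, 94, 90, 88, 99, 85, 84, 94, 100, 65, 64, 65, 89,
         83, 85, 86, 92, 74, 73, 75, 53, 91, 85, 62, 68, 72]

r = correlation(time, score)
print(round(r, 2))  # -0.56
```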

Correlation Guidelines

Correlation value | Strength of association | What this means
0.7 to 1.0 | Strong | The points will appear to be nearly a straight line.
0.3 to 0.7 | Moderate | When looking at the graph, the increasing/decreasing pattern will be clear, but there is considerable scatter.
0.1 to 0.3 | Weak | With some effort you will be able to see a slightly increasing/decreasing pattern.
0 to 0.1 | None | No discernible increasing/decreasing pattern.

Same strength results with negative correlations.

Back to the test data
Actually the last three people to finish the test had scores of 93, 93, and 97.
What does this do to the correlation?

Influential Observations
• The correlation changed from −0.56 (a fairly moderate negative correlation) to −0.12 (a weak negative correlation).
• Points that are far to the left or right and not in the overall direction of the scatterplot can greatly change the correlation (influential observations).

Correlation
• Correlation measures the strength and direction of a linear association between two quantitative variables.
• −1 ≤ r ≤ 1
• Correlation makes no distinction between explanatory and response variables.
• Correlation has no units. Any linear change in units of x or y, meaning adding a constant to every observation or multiplying every observation by a positive constant, does not influence r.
• Correlation is not resistant to outliers. It is sensitive.
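Two of the properties listed above (symmetry in x and y, and invariance under a change of units) can be checked numerically. The small data set below is hypothetical, chosen only for illustration, and `correlation` is an illustrative helper rather than anything from the slides.

```python
from math import isclose, sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical values
ys = [2, 1, 4, 3, 6]  # hypothetical values

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                       # swap the two roles
r_rescaled = correlation([60 * x + 10 for x in xs], ys)  # change units of x

print(isclose(r, r_swapped), isclose(r, r_rescaled))  # True True
```

Multiplying one variable by a negative constant would flip the sign of r while leaving its magnitude unchanged.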

Calculating correlation, r.

ρ = rho = correlation of the population. Suppose there are N people in the population, X = temperature, Y = heart rate, the mean and SD of temperature in the population are $\mu_x$ and $\sigma_x$, and the population mean and SD of heart rate are $\mu_y$ and $\sigma_y$.

$\rho = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \mu_x}{\sigma_x} \right) \left( \frac{y_i - \mu_y}{\sigma_y} \right).$

Given a sample of size n, we estimate ρ using

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right).$

Temperature and Heart Rate

Tmp  98.3  98.2  98.7  98.5  97.0  98.8  98.5  98.7  99.3  97.8
HR     72    69    72    71    80    81    68    82    68    65

Tmp  98.2  99.9  98.6  98.6  97.8  98.4  98.7  97.4  96.7  98.0
HR     71    79    86    82    58    84    73    57    62    89

r = 0.378
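The sample formula for r (average product of z-scores, with n − 1 in the denominator) can be applied directly to the 20 temperature and heart rate pairs above. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from statistics import mean, stdev

temp = [98.3, 98.2, 98.7, 98.5, 97.0, 98.8, 98.5, 98.7, 99.3, 97.8,
        98.2, 99.9, 98.6, 98.6, 97.8, 98.4, 98.7, 97.4, 96.7, 98.0]
hr = [72, 69, 72, 71, 80, 81, 68, 82, 68, 65,
      71, 79, 86, 82, 58, 84, 73, 57, 62, 89]

n = len(temp)
x_bar, y_bar = mean(temp), mean(hr)
s_x, s_y = stdev(temp), stdev(hr)  # sample SDs (n - 1 denominator)

# r = 1/(n-1) * sum of (z-score of x) * (z-score of y)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(temp, hr)) / (n - 1)
print(round(r, 3))  # 0.378
```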

Heart Rate and Body Temp
• Another group studied the relationship between heart rate and body temperature with 130 healthy adults.
• Predicted Heart Rate = −166.3 + 2.44 Temp
• r = 0.257

7. Regression.
• Fitting a line to a scatterplot.
• Unless the points are perfectly linearly aligned, there will not be a single line that goes through every point.
• We want a line that gets as close as possible to all the points.
• We want a line that minimizes the vertical distances between the line and the points.
• These distances are called residuals.
• The line we will find actually minimizes the sum of the squares of the residuals.
• This is called a least-squares regression line.

Are Dinner Plates Getting Larger?

Growing Plates?
• There are many recent articles and TV reports about the obesity problem.
• One reason some have given is that the size of dinner plates is increasing.
• Are these black circles the same size, or is one larger than the other?
• They appear to be the same size to many, but the one on the right is about 20% larger than the one on the left.
• This suggests that people will put more food on larger dinner plates without knowing it.
• There is a name for this phenomenon: the Delboeuf illusion.
• Researchers gathered data to investigate the claim that dinner plates are growing.
• American dinner plates sold on eBay on March 30, 2010 (Van Ittersum and Wansink, 2011).
• Year manufactured and diameter are given.
• Both year (explanatory variable) and diameter in inches (response variable) are quantitative.
• Each dot represents one plate in this scatterplot.
• The association appears to be roughly linear.
• The least squares regression line is added.

Regression Line
The regression equation is ŷ = a + bx:
• a is the y-intercept
• b is the slope
• x is a value of the explanatory variable
• ŷ is the predicted value for the response variable
• For a specific value of x, the corresponding distance y − ŷ (or actual − predicted) is a residual.
• The least squares line for the dinner plate data is ŷ = −14.8 + 0.0128x.
• Or predicted diameter = −14.8 + 0.0128(year).
• This allows us to predict plate diameter for a particular year.

Slope
ŷ = −14.8 + 0.0128x
• What is the predicted diameter for a plate manufactured in 2000?
• −14.8 + 0.0128(2000) = 10.8 in.
• What is the predicted diameter for a plate manufactured in 2001?
• −14.8 + 0.0128(2001) = 10.8128 in.
• How does this compare to our prediction for the year 2000?
• 0.0128 larger
• Slope b = 0.0128 means that, using the regression line, diameters are predicted to increase by 0.0128 inches per year on average.
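The two predictions above, and the fact that their difference equals the slope, can be checked with a short sketch. The function name `predict_diameter` is illustrative; the coefficients are the ones given in the slides.

```python
INTERCEPT = -14.8  # a, in inches
SLOPE = 0.0128     # b, in inches per year


def predict_diameter(year):
    """Predicted plate diameter (inches) from the fitted line."""
    return INTERCEPT + SLOPE * year


d2000 = predict_diameter(2000)
d2001 = predict_diameter(2001)
print(round(d2000, 4), round(d2001, 4), round(d2001 - d2000, 4))
# 10.8 10.8128 0.0128
```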

Slope
• Slope is the predicted change in the response variable for a one-unit change in the explanatory variable.
• Both the slope and the correlation coefficient for this study were positive.
• The slope is 0.0128.
• The correlation is 0.604.
• The slope and correlation coefficient will always have the same sign.

y-intercept
• The y-intercept is where the regression line crosses the y-axis, or the predicted response when the explanatory variable equals 0.
• We had a y-intercept of −14.8 in the dinner plate equation. What does this tell us about our dinner plate example?
• Taken literally, dinner plates in year 0 were −14.8 inches.
• How can it be negative?
• The equation works well within the range of values given for the explanatory variable, but fails outside that range.
• Our equation should only be used to predict the size of dinner plates from about 1950 to 2010.

Extrapolation
• Predicting values for the response variable for values of the explanatory variable that are outside of the range of the original data is called extrapolation.
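One way to guard against accidental extrapolation is to have the prediction function refuse inputs outside the range of the data. The sketch below is only an illustration: the 1950 to 2010 bounds are the approximate range mentioned in the slides, and the function name and the choice to raise an error are assumptions.

```python
def predict_diameter_checked(year, lo=1950, hi=2010):
    """Predict plate diameter (inches), refusing to extrapolate."""
    if not lo <= year <= hi:
        raise ValueError(
            f"year {year} is outside the data range {lo}-{hi}; "
            "prediction would be an extrapolation"
        )
    return -14.8 + 0.0128 * year


print(round(predict_diameter_checked(1980), 3))  # 10.544

try:
    predict_diameter_checked(0)  # the y-intercept case
except ValueError as err:
    print("refused:", err)
```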

r²
• While the intercept and slope have meaning in the context of year and diameter, remember that the correlation does not have units. It is just 0.604.
• The square of the correlation, r², has meaning in terms of variation.
• r² = 0.604² = 0.365 or 36.5%.
• 36.5% of the variation in plate size (the response variable) can be explained by its linear association with the year (the explanatory variable).

Slope of regression line.
• Suppose ŷ = a + bx is the regression line.
• The slope b of the regression line is $b = r \frac{s_y}{s_x}$. This is usually the thing of primary interest to interpret, as the predicted increase in y for every unit increase in x.
• Beware of assuming causation though, especially with observational studies. Be wary of extrapolation too.
• The intercept is $a = \bar{y} - b \bar{x}$.
• The SD of the residuals is $\sqrt{1 - r^2} \, s_y$. This is a good estimate of how much the regression predictions will typically be off by.
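The three formulas above can be checked numerically on the 20-person temperature and heart rate sample from earlier in the slides. This is a minimal sketch with illustrative variable names; note that the fitted line for this small sample need not match the line reported for the separate 130-person study.

```python
from math import sqrt
from statistics import mean, stdev

temp = [98.3, 98.2, 98.7, 98.5, 97.0, 98.8, 98.5, 98.7, 99.3, 97.8,
        98.2, 99.9, 98.6, 98.6, 97.8, 98.4, 98.7, 97.4, 96.7, 98.0]
hr = [72, 69, 72, 71, 80, 81, 68, 82, 68, 65,
      71, 79, 86, 82, 58, 84, 73, 57, 62, 89]

n = len(temp)
x_bar, y_bar = mean(temp), mean(hr)
s_x, s_y = stdev(temp), stdev(hr)
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(temp, hr)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

residuals = [y - (a + b * x) for x, y in zip(temp, hr)]

# SD of the residuals, computed directly and from the formula.
print(round(stdev(residuals), 4), round(sqrt(1 - r ** 2) * s_y, 4))
```

The two printed numbers agree, which is the identity stated in the last bullet (with sample SDs using the n − 1 denominator).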

8. How well does the line fit?
• r² is a measure of fit. It indicates the amount of scatter around the best fitting line.
• Residual plots can indicate curvature, outliers, or heteroskedasticity.
• $\sqrt{1 - r^2} \, s_y$ is a very useful summary. It is a measure of how far off predictions would have been on average.

[Figure: plot of y versus x, and the corresponding plot of residual versus x.]

• Heteroskedasticity: when the variability in y is not constant as x varies.

Note that regression residuals have mean zero, whether the line fits well or poorly.
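To see that residuals average to zero even when a line fits poorly, consider a hypothetical curved data set (y = x², invented here for illustration). The least-squares line is computed from the standard formulas; the residuals sum to zero, yet their U-shaped pattern reveals the curvature that a residual plot would show.

```python
xs = [-2, -1, 0, 1, 2, 3, 4, 5]  # hypothetical x values
ys = [x ** 2 for x in xs]        # hypothetical curved response

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)

b = sxy / sxx          # least-squares slope
a = y_bar - b * x_bar  # least-squares intercept

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)            # 3.0 3.0
print(residuals)       # [7.0, 1.0, -3.0, -5.0, -5.0, -3.0, 1.0, 7.0]
print(sum(residuals))  # 0.0
```

The residuals are positive at both ends and negative in the middle, so a mean of zero says nothing about whether a straight line is an appropriate model.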

9. Common problems with regression.
• a. Correlation is not causation. Especially with observational data.

Holmes and Willett (2004) reviewed all prospective studies on fat consumption and breast cancer with at least 200 cases of breast cancer. "Not one study reported a significant positive association with total fat intake.... Overall, no association was observed between intake of total, saturated, monounsaturated, or polyunsaturated fat and risk for breast cancer."

They also state "The dietary fat hypothesis is largely based on the observation that national per capita fat consumption is highly correlated with breast cancer mortality rates. However, per capita fat consumption is highly correlated with economic development. Also, low parity and late age at first birth, greater body fat, and lower levels of physical activity are more prevalent in Western countries, and would be expected to confound the association with dietary fat."

• b. Extrapolation.
• Often researchers extrapolate from high doses to low. The relationship can be nonlinear though.
• Researchers also often extrapolate from animals to humans.

Zaichkina et al. (2004) on hamsters

• c. Curvature. The best fitting line might fit poorly. Port et al. (2005). Wong et al. (2011).

• d. Statistical significance. Could the observed correlation just be due to chance alone?
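One common way to approach this question is a permutation test: shuffle one variable many times and see how often a shuffled correlation is at least as strong as the observed one. The slides do not specify a method, so the sketch below is an assumption chosen for illustration, applied to the 20-person temperature and heart rate sample from earlier. The function names, the number of shuffles, and the seed are all arbitrary.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Share of shuffles with |r| at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(correlation(xs, shuffled)) >= observed:
            count += 1
    return count / n_perm


temp = [98.3, 98.2, 98.7, 98.5, 97.0, 98.8, 98.5, 98.7, 99.3, 97.8,
        98.2, 99.9, 98.6, 98.6, 97.8, 98.4, 98.7, 97.4, 96.7, 98.0]
hr = [72, 69, 72, 71, 80, 81, 68, 82, 68, 65,
      71, 79, 86, 82, 58, 84, 73, 57, 62, 89]

p = permutation_p_value(temp, hr)
print(p)  # estimated two-sided p-value; depends on the seed
```

A small p-value would suggest the observed correlation is unlikely to arise from chance alone; with only 20 observations and r = 0.378, the evidence is not expected to be strong.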