Chapter 1
Statistics: the science of data -- collecting, classifying, summarizing, organizing, analyzing, presenting, and interpreting.
Cases and Variables
We obtain information about cases or units.
A variable is any characteristic that is recorded for each case.
Generally, each case makes up a row in a dataset, and each variable makes up a column.
Countries of the World
Country LandArea Population Density GDP Rural CO2 PumpPrice Military
Afghanistan 652.86 30.552 46.8 665 74.1 35.3 1.28 8.65
Albania 27.40 2.897 105.7 4460 44.6 12.8 1.81
Algeria 2381.74 39.208 16.5 5361 30.5 24.6 0.29
American Samoa 0.20 0.055 275.0 12.7
Andorra 0.47 0.079 168.1 13.8 9.5 1.67
Angola 1246.70 21.472 17.2 5783 57.5 44.8 0.63 13.81
Antigua and Barbuda 0.4 0.090 204.5 13342 75.4 16.6
Argentina 2736.690 41.446 15.1 14715 8.5 16.9 1.46
Applying the definition
Consider the above dataset:
– What are the cases?
– What are the variables?
– What interesting questions could it help you answer?
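As a rough illustration of the row/column idea, the sketch below stores three of the countries as cases and four of the columns as variables, using values copied from the table. The `variable_kind` helper is a hypothetical name, and its rule (numeric means quantitative) is only a crude heuristic: numeric codes such as ID numbers would be mislabeled.

```python
# Each dict is one case (a country); each key is one variable.
# Values are copied from the first columns of the Countries of the World table.
countries = [
    {"Country": "Afghanistan", "LandArea": 652.86, "Population": 30.552, "Density": 46.8},
    {"Country": "Albania", "LandArea": 27.40, "Population": 2.897, "Density": 105.7},
    {"Country": "Algeria", "LandArea": 2381.74, "Population": 39.208, "Density": 16.5},
]


def variable_kind(values):
    """Crude heuristic (an assumption): all-numeric columns are quantitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


variables = list(countries[0].keys())
kinds = {
    name: variable_kind([case[name] for case in countries]) for name in variables
}

print(len(countries), "cases;", len(variables), "variables")
print(kinds)
```

Here the cases are the countries and the variables are the recorded characteristics, so adding a country adds a row and adding a measurement adds a column.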
Quantitative vs. Qualitative Data
• Quantitative Data: measurements recorded on a naturally occurring numerical scale
• Categorical (or Qualitative) Data: measurements that divide cases into groups (cannot be measured on a numerical scale)
Is there numerical data that is not quantitative?
Data can be used to answer interesting questions!
Using Data to Answer a Question
1. Can eating a yogurt a day cause you to lose weight?
2. Do males find females more attractive if they wear red?
3. Does louder music cause people to drink more beer?
4. Are lions more likely to attack after a full moon?
(The answer to all of these questions is yes!)
Variables
For each of the following situations:
– What are the variables?
– Is each variable categorical or quantitative?
1. Can eating a yogurt a day cause you to lose weight?
2. Do males find females more attractive if they wear red?
3. Does louder music cause people to drink more beer?
4. Are lions more likely to attack after a full moon?
Explanatory and Response
If we are using one variable to help us understand or predict values of another variable, we call the former the explanatory variable and the latter the response variable.
Examples:
• Does meditation help reduce stress?
• Does sugar consumption increase hyperactivity?
Variables
For each of the following situations:
– Which is the explanatory and which is the response variable?
1. Can eating a yogurt a day cause you to lose weight?
2. Do males find females more attractive if they wear red?
3. Does louder music cause people to drink more beer?
4. Are lions more likely to attack after a full moon?
Summary
• Data are everywhere, and pertain to a wide variety of topics
• A dataset is usually made up of variables measured on cases
• Variables are either categorical or quantitative
• Data can be used to provide information about essentially anything we are interested in and want to collect data on!
Sample versus Population
A population includes all individuals or objects of interest.
A sample is all the cases that we have collected data on (a subset of the population).
Statistical inference is the process of using data from a sample to gain information about the population.
Definitions
• Population: all individuals or objects of interest.
• Variable: a characteristic of an individual unit.
• Sample: all the cases that we have collected data on (a subset of the population).
Example: I want to estimate what proportion of UP students are left-handed.
• How could I do that?
• Identify the population, the variable, and the sample.
Sampling Bias
• Sampling bias occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way.
• If sampling bias exists, we cannot trust generalizations from the sample to the population.
• People are TERRIBLE at selecting a good sample, even when explicitly trying to avoid sampling bias!
Random Sampling
• How can we make sure to avoid sampling bias? Take a RANDOM sample!
• Imagine putting the names of all the units of the population into a hat, and drawing out names at random to be in the sample
• More often, we use technology
Simple Random Sample
In a simple random sample, each unit of the population has the same chance of being selected, regardless of the other units chosen for the sample.
*More complicated random sampling schemes exist.
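The "names in a hat" idea can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed procedure: the `simple_random_sample` helper, the 500 ID numbers, and the sample size of 25 are all hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: ID numbers standing in for the names in the hat.
student_ids = list(range(1, 501))
sample = simple_random_sample(student_ids, 25, seed=1)
print(sorted(sample))
```

Sampling is done without replacement, so no unit can appear in the sample twice.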
Realities of Sampling
• While a random sample is ideal, often it isn’t feasible. A list of the entire population may not be available, or it may be impossible or too difficult to contact all members of the population.
• Sometimes, your population of interest has to be altered to something more feasible to sample from. Generalization of results is limited to the population that was actually sampled from.
• In practice, think hard about potential sources of sampling bias, and try your best to avoid them.
Non-Random Samples
Suppose you want to estimate the average number of hours that students spend studying each week. Which of the following is the best method of sampling?
a) Go to the library and ask all the students there how much they study
b) Email all students asking how much they study, and use all the data you get
c) Stand on the quad and ask everyone walking by how much they study
Bad Methods of Sampling
a) Go to the library and ask all the students there how much they study
• Sampling units based on something obviously related to the variable(s) you are studying
b) Email all students asking how much they study, and use all the data you get
• Letting your sample be made up of whoever chooses to participate (volunteer bias)
• People who choose to participate or respond are probably not representative of the entire population
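A small simulation can make the library example concrete. Everything below is made up for illustration: the population of 2,000 students, the uniform spread of study hours, and the simplifying assumption that only students who study at least 25 hours a week are found in the library.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population: weekly study hours for 2,000 students (made-up numbers).
population = [rng.randint(0, 40) for _ in range(2000)]

# Option (a), simplified: only heavy studiers are found in the library.
library_sample = [hours for hours in population if hours >= 25][:100]

# A simple random sample of the same size, for comparison.
random_sample = rng.sample(population, 100)

print("population mean:    ", round(mean(population), 1))
print("library sample mean:", round(mean(library_sample), 1))
print("random sample mean: ", round(mean(random_sample), 1))
```

The library sample overestimates the average because selection depends on the very variable being measured, while the random sample should typically land near the population mean.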
Alcohol, Marijuana, and Driving
• The Federal Office of Road Safety in Australia conducted a study on the effects of alcohol and marijuana on performance
• Volunteers who responded to advertisements for the study on rock radio stations were given a random combination of the two drugs, then their performance was observed
– What is the sample? What is the population?
– Is there sampling bias?
– Will the results be informative and/or do you think the study is worth conducting?
Source: Chesher, G., Dauncey, H., Crawford, J. and Horn, K., “The Interaction between Alcohol and Marijuana: A Dose Dependent Study on the Effects of Human Moods and Performance Skills,” Report No. C40, Federal Office of Road Safety, Federal Department of Transport, Australia, 1986.
Other Forms of Bias
• Even with a random sample, data can still be biased, especially when collected on humans
• Other forms of bias to watch out for in data collection:
– Question wording
– Context
– Inaccurate responses
Many other possibilities: examine the specifics of each study!
Question Wording
• A random sample was asked: “Should there be a tax cut, or should money be used to fund new government programs?”
Tax Cut: 60%  Programs: 40%
• A different random sample was asked: “Should there be a tax cut, or should money be spent on programs for education, the environment, health care, crime-fighting, and military defense?”
Tax Cut: 22%  Programs: 78%
Inaccurate Responses
• In a study on US students, 93% of the sample said they were in the top half of the sample regarding driving skill.
Svenson, O. (February 1981). "Are we all less risky and more skillful than our fellow drivers?" Acta Psychologica 47 (2): 143–148.
• From a random sample of all US college students, 22.7% reported using illicit drugs. Do you think this number is accurate?
Substance Abuse and Mental Health Services Administration (2010). “Results from the 2009 National Survey on Drug Use and Health: Volume 1.” Summary of National Findings (Office of Applied Studies, NSDUH Series H-38A, HHS Publication No. SMA 10-4856 Findings). Rockville, MD, https://nsduhweb.rti.org/
Summary
Always think critically about how the data were collected, and recognize that not all forms of data collection lead to valid inferences.
This is the easiest way to instantly become a more statistically literate individual!
Association and Causation
Two variables are associated if values of one variable tend to be related to values of the other variable.
Two variables are causally associated if changing the value of the explanatory variable influences the value of the response variable.
Explanatory, Response, Causation
For each of the following headlines:
– Identify the explanatory and response variables (if appropriate).
– Does the headline imply a causal association?
1. “Daily Exercise Improves Mental Performance”
2. “Want to lose weight? Eat more fiber!”
3. “Cat owners tend to be more educated than dog owners”
[Figure: scatterplot "TV and Life Expectancy". x-axis: TVs per 1000 People (0 to 1000); y-axis: Life Expectancy (40 to 80). Labeled points: Angola, Australia, Cambodia, Canada, China, Egypt, France, Haiti, Iraq, Japan, Madagascar, Mexico, Morocco, Pakistan, Russia, South Africa, Sri Lanka, Uganda, United Kingdom, United States, Vietnam, Yemen. r = 0.74]
TVs and Life Expectancy
Should you buy more TVs to live longer?
Association does not imply causation!
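Assuming the r reported for TVs and life expectancy is the usual sample correlation coefficient, the sketch below shows one way such a number could be computed. The `pearson_r` helper is a hypothetical name, and the five data points are made up for illustration; they are NOT the values behind the plot.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up values for five hypothetical countries, NOT the data in the plot.
tvs_per_1000 = [10, 100, 300, 500, 800]
life_expectancy = [48, 60, 66, 74, 78]
print(round(pearson_r(tvs_per_1000, life_expectancy), 2))
```

A value near 1 or -1 only says that the two variables move together; it says nothing about whether one causes the other.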
Confounding Variable
A third variable that is associated with both the explanatory variable and the response variable is called a confounding variable.
• A confounding variable can offer a plausible explanation for an association between the explanatory and response variables
• Whenever confounding variables are present (or may be present), a causal association cannot be determined
Confounding Variable
For each of the following relationships, identify a possible confounding variable:
1. More ice cream sales have been linked to more deaths by drowning.
2. The total amount of beef consumed and the total amount of pork consumed worldwide are closely related over the past 100 years.
3. People who own a yacht are more likely to buy a sports car.
4. Air pollution is higher in places with a higher proportion of paved ground relative to grassy ground.
5. People with shorter hair tend to be taller.
Experiment vs. Observational Study
An observational study is a study in which the researcher does not actively control the value of any variable, but simply observes the values as they naturally exist.
An experiment is a study in which the researcher actively controls one or more of the explanatory variables.
Observational Studies
• There are almost always confounding variables in observational studies
• Observational studies can almost never be used to establish causation
Kindergarten and Crime
• Does Kindergarten Lead to Crime?
• Yes, according to research conducted by New Hampshire state legislator Bob Kingsbury
• “Kingsbury (R-Laconia), 86, recently claimed that analyses he’s been carrying out since 1996 show that communities in his state that have kindergarten programs have up to 400% more crime than localities whose classrooms are free of finger-painting 5-year-olds. Pointing to his hometown of Laconia, the largest of 10 communities in Belknap County, the legislator noted that it has the only kindergarten program in the county and the most crime, including most or all of the county’s rapes, robberies, assaults and murders.”
Szalavitz, M. “Does Kindergarten Lead to Crime? Fact-Checking N.H. Legislator’s ‘Research’,” healthland.time.com, 7/6/12.
Texas GOP Platform
• A few days later, the Texas GOP 2012 Platform announced that it opposed early childhood education
• Causation or just association?
Source: Strauss, V. “Texas GOP rejects ‘critical thinking’ skills. Really.” www.washingtonpost.com, 7/9/12.
http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html (data from Facebook and Bloomberg)
http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html (data from US Social Security Administration and National Housing Finance Agency)
It’s a Common Mistake!
“The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.”
- Stephen Jay Gould
http://xkcd.com/552/
Randomization
• How can we make sure to avoid confounding variables?
RANDOMLY assign values of the explanatory variable
Randomized Experiment
In a randomized experiment, the explanatory variable for each unit is determined randomly, before the response variable is measured.
Randomized Experiment
• The different levels of the explanatory variable are known as treatments
• Randomly divide the units into groups, and randomly assign a different treatment to each group
• If the treatments are randomly assigned, the treatment groups should all look similar
Randomized Experiments
• Because the explanatory variable is randomly assigned, it is not systematically associated with any other variable. Confounding variables are eliminated!
[Diagram: RANDOMIZED EXPERIMENT, showing the Explanatory Variable, Response Variable, and Confounding Variable]
Randomized Experiments
• If a randomized experiment yields a significant association between the two variables, we can establish causation from the explanatory to the response variable
Randomized experiments are very powerful! They allow you to infer causality.
Exercise and the Brain
• A study found that elderly people who walked at least a mile a day had significantly higher brain volume (gray matter related to reasoning) and significantly lower rates of Alzheimer’s and dementia compared to those who walked less
• The article states: “Walking about a mile a day can increase the size of your gray matter, and greatly decrease the chances of developing Alzheimer's disease or dementia in older adults, a new study suggests.”
• Is this conclusion valid?
Allen, N. “One way to ward off Alzheimer’s: Take a Hike,” msnbc.com, 10/13/10.
No. Observational study: cannot yield causal conclusions.
Exercise and the Brain
• How would you design an experiment to determine whether exercise actually causes changes in the brain?
Exercise and the Brain
• A sample of mice was divided randomly into two groups. One group was given access to an exercise wheel; the other group was kept sedentary
• “The brains of mice and rats that were allowed to run on wheels pulsed with vigorous, newly born neurons, and those animals then breezed through mazes and other tests of rodent IQ” compared to the sedentary mice
• Is this evidence that exercise causes an increase in brain activity and IQ, at least in mice?
Reynolds, “Phys Ed: Your Brain on Exercise,” NY Times, July 7, 2010.
Yes. Randomized experiment: can yield causal conclusions.
Let’s Try It!
• Is just 5 seconds of exercise enough to increase your pulse rate?
• Treatment groups: exercise versus sedentary
• Randomly divide the class into the two groups
• Give the treatment
• Measure the response (pulse rate)
• We’ll learn how to analyze this later…
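The "randomly divide the class" step could be carried out with a shuffle, as in the sketch below. This is one possible approach under stated assumptions: the `randomly_assign` helper and the class of 20 students labeled S1 to S20 are hypothetical.

```python
import random


def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical class of 20 students, labeled S1..S20.
students = [f"S{i}" for i in range(1, 21)]
groups = randomly_assign(students, ["exercise", "sedentary"], seed=42)
for treatment, members in groups.items():
    print(treatment, members)
```

Because chance alone decides who exercises, the two groups should look similar on every other characteristic, on average.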
Knee Surgery for Arthritis
Researchers conducted a study on the effectiveness of a knee surgery to cure arthritis. It was randomly determined whether people got the knee surgery. Everyone who underwent the surgery reported feeling less pain. Is this evidence that the surgery causes a decrease in pain?
No. Need a control or comparison group. What would happen without surgery?
Control Group
• When determining whether a treatment is effective, it is important to have a comparison group, known as the control group
• It isn’t enough to know that everyone in one group improved; we need to know whether they improved more than they would have improved without the surgery
• All randomized experiments need either a control group, or two different treatments to compare
Knee Surgery for Arthritis
• In the knee surgery study, those in the control group received a fake knee surgery. They were put under and cut open, but the doctor did not actually perform the surgery. All of these patients also reported less pain!
• In fact, the improvement was indistinguishable between those receiving the real surgery and those receiving the fake surgery!
Source: “The Placebo Prescription,” NY Times Magazine, 1/9/00.
Placebo Effect
• Often, people will experience the effect they think they should be experiencing, even if they aren’t actually receiving the treatment
• Example: Eurotrip
• This is known as the placebo effect
• One study estimated that 75% of the effectiveness of anti-depressant medication is due to the placebo effect
• For more information on the placebo effect (it’s pretty amazing!) read The Placebo Prescription
Placebo and Blinding
• Control groups should be given a placebo, a fake treatment that resembles the active treatment as much as possible
• Using a placebo is only helpful if participants do not know whether they are getting the placebo or the active treatment
• If possible, randomized experiments should be double-blinded: neither the participants nor the researchers involved should know which treatment the patients are actually getting
Types of Randomized Experiments
• Randomizing cases into different treatment groups is called a randomized comparative experiment
• We can also give each treatment to each case, and just randomize the order in which treatments are received: a matched pairs experiment
• Both are valid randomized experiments!
Matched Pairs
Example: To see if people read faster on paper or a Kindle, a study was done in which 16 people read two sets of instructions of similar length, one on a Kindle and one on paper. The order in which they read the instructions was randomized. (Reading was faster on paper.)
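In a matched pairs design, every case receives both treatments, so the only thing left to randomize is the order. A minimal sketch follows, assuming each participant's order is decided by an independent shuffle; the `randomize_order` helper and the P1 to P16 labels are hypothetical, and the source does not say how the study itself randomized.

```python
import random


def randomize_order(participants, treatments=("Kindle", "paper"), seed=None):
    """For each participant, pick at random which treatment comes first."""
    rng = random.Random(seed)
    schedule = {}
    for person in participants:
        order = list(treatments)
        rng.shuffle(order)
        schedule[person] = order
    return schedule


# 16 participants, as in the example; the P1..P16 labels are hypothetical.
participants = [f"P{i}" for i in range(1, 17)]
schedule = randomize_order(participants, seed=7)
for person, order in schedule.items():
    print(person, "->", " then ".join(order))
```

Randomizing the order guards against effects such as fatigue or practice always favoring whichever medium is read second.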
Why not always randomize?
• Randomized experiments are ideal, but sometimes not ethical or possible
• Often, you have to do the best you can with data from observational studies
• Example: research for the Supreme Court case as to whether preferences for minorities in university admissions help or hurt the minority students
Randomization in Data Collection
Two Fundamental Questions in Data Collection
1. Was the sample randomly selected? (Population → Sample: random sample?)
– Yes: Possible to generalize to the population
– No: Should not generalize to the population
2. Was the explanatory variable randomly assigned? (Randomized experiment?)
– Yes: Possible to make conclusions about causality
– No: Cannot make conclusions about causality
Randomization
• Doing a randomized experiment on a random sample is ideal, but rarely achievable
• If the focus of the study is using a sample to estimate a quantity for the entire population, you need a random sample, but do not need a randomized experiment (example: election polling)
• If the focus of the study is establishing causality from one variable to another, you need a randomized experiment and can settle for a non-random sample (example: drug testing)
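The two questions about randomization can be written as a tiny lookup, shown here only as a memory aid. The function name `valid_conclusions` and its dictionary keys are hypothetical.

```python
def valid_conclusions(random_sample: bool, random_assignment: bool) -> dict:
    """Map the two data-collection questions to the conclusions they allow."""
    return {
        # A random sample supports generalizing to the population.
        "generalize_to_population": random_sample,
        # Random assignment of the explanatory variable supports causal claims.
        "infer_causality": random_assignment,
    }


# Election polling: random sample, no randomized experiment.
polling = valid_conclusions(random_sample=True, random_assignment=False)
# Drug testing: randomized experiment on a non-random sample.
drug_trial = valid_conclusions(random_sample=False, random_assignment=True)
print(polling)
print(drug_trial)
```

The two answers are independent of each other, which is why a study can support one kind of conclusion without supporting the other.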
Summary (1.3)
• Association does not imply causation!
• In observational studies, confounding variables almost always exist, so causation cannot be established
• Randomized experiments involve randomly determining the level of the explanatory variable
• Randomized experiments prevent confounding variables, so causality can be inferred
• A control or comparison group is necessary
• The placebo effect exists, so a placebo and blinding should be used