Describing Location in a Distribution (2.1)
Transcript of Describing Location in a Distribution (2.1)
Section 2.1 Notes - COMPLETED
DescribingLocationinaDistribution(2.1)
MeasuringPosition:Percentiles
Onewaytodescribethelocationofavalueinadistributionistotellwhatpercentofobservationsarelessthanit.
De#inition:
Thepthpercentileofadistributionisthevaluewithppercentoftheobservationslessthanit.
Section 2.1 Notes - COMPLETED
Thestemplotbelowshowsthenumberofwinsforeachofthe30MajorLeagueBaseballteamsin2009.5962455700455589803456677789123557103
Problem:Findthepercentilesforthefollowingteams:(a)TheColoradoRockies,whowon92games.
(b)TheNewYorkYankees,whowon103games.
Thestemplotbelowshowsthenumberofwinsforeachofthe30MajorLeagueBaseballteamsin2009.5962455700455589803456677789123557103
Problem:Findthefollowing-(c)ThepercentilefortheKansasCityRoyals,whowon65games.
(d)AMLBteamrepresentedthe75thpercentilefornumberofwins.Assumingnoteamhadthesamenumberofwinsthattheydid,approximatelyhowmanyteamshadmorewinsthantheydid?
Section 2.1 Notes - COMPLETED
Acumulativerelativefrequencygraph(orogive)displaysthecumulativerelativefrequencyofeachclassofafrequencydistribution.
Hereisatableshowingthedistributionofmedianhouseholdincomesforthe50statesandtheDistrictofColumbia.
Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia.
Construct a cumulative relative frequency graph (ogive) of this data set.
Section 2.1 Notes - COMPLETEDProblem:Usethecumulativerelativefrequencygraphforthestateincomedatatoanswereachquestion.(a)AtwhatpercentileisCalifornia,withamedianincomeof$57,445?
(b)Estimateandinterpretthe]irstquartileofthissolution.
(c)Arethereanystateswhosemedianincomecanbeconsideredanoutlier?
MeasuringPosition:z-Scores
Az-scoretellsushowmanystandarddeviationsfromthemeananobservationfalls,andinwhat
direction.
De#inition:
Ifxisanobservationfromadistributionthathasknownmeanandstandarddeviation,thestandardizedvalueofxis:
Astandardizedvalueisoftencalledaz-score.
Section 2.1 Notes - COMPLETED
In2009,themeannumberofwinsforteamsintheMLBwas81withastandarddeviationof11.4wins.
Problem:Findandinterpretthez-scoresforthefollowingteams.
(a)TheNewYorkYankees,with103wins.
(b)TheBaltimoreOrioles,with64wins.
Thesingle-seasonhomerunrecordformajorleaguebaseballhasbeensetjustthreetimessinceBabeRuthhit60homerunsin1927.RogerMarishit61in1961,MarkMcGwirehit70in1998andBarryBondshit73in2001.Inanabsolutesense,BarryBondshadthebestperformanceofthesefourplayers,sincehehitthemosthomerunsinasingleseason.However,inarelativesensethismaynotbetrue.Baseballhistorianssuggestthathittingahomerunhasbeeneasierinsomeerasthanothers.Thisisduetomanyfactors,includingqualityofbatters,qualityofpitchers,hardnessofthebaseball,dimensionsofballparks,andpossibleuseofperformance-enhancingdrugs.Tomakeafaircomparison,weshouldseehowtheseperformancesraterelativetoothershittersduringthesameyear.
Section 2.1 Notes - COMPLETED
Problem:Computethestandardizedscoresforeachperformance.Whichplayerhadthemostoutstandingperformancerelativetohispeers?
TransformingData(addingandsubtractingconstants)
Transformingconvertstheoriginalobservationsfromtheoriginalunitsofmeasurementstoanotherscale.Transformationscanaffecttheshape,center,andspreadofadistribution.
EXAMPLE:Recordyourguessforthewidthofthisroom(lengthofbluewall)onapost-itandbringuptothesmartpanel.
Hereareyourclassesguesses:
Createadotplotofthedataanddescribethedistributionofguesses.
Section 2.1 Notes - COMPLETED
Theactualwidthoftheroomis:
Determinetheerrorassociatedwitheachguessbysubtractingtheactualwidthfromeachguessandrecordhere:
Constructadotplotoftheerrorsanddescribetheirdistribution.
Inthisexampleweadded(anegative...)toeachdatapoint/observation.Whatdidthisdotothe
shape?
spread?
center?
Section 2.1 Notes - COMPLETED
Addingthesamenumbera(eitherpositive,zero,ornegative)toeachobservation:
• addsatomeasuresofcenterandlocation/position(mean,median,quartiles,percentiles),BUT
• doesNOTchangetheshapeofthedistributionormeasuresofspread(range,IQR,standarddeviation,variance)
TransformingData(multiplyinganddividingbyconstants)
EXAMPLE(con't):Supposewewantedtoconvertyourguessesforthewidthoftheroomfromfeettoinches.Multiplyallguessesfromthepreviousexampleby12in./ft.toconvertthedatatoinches.
Constructadotplotofthenewdataanddescribethedistributionofguesses(ininches).
Section 2.1 Notes - COMPLETED
Inthisexamplewemultipliedeachdatapoint/observationbythesameconstant.Whatdidthisdotothe
shape?
spread?
center?
Multiplying(ordividing)eachobservationbythesamenumberb(eitherpositive,zero,ornegative):
• multiplies(divides)measuresofcenterandlocation/position(mean,median,quartiles,percentiles)byb
• multiplies(divides)measuresofspreadandlocation/position(range,IQR,standarddeviation)by|b|(andvariancebyb2)
• doesNOTchangetheshapeofthedistribution
Section 2.1 Notes - COMPLETED
EXAMPLE:In2010,TaxiCabsinNewYorkCitychargedaninitialfeeof$2.50plus$2permile.Inequationform,fare=2.50+2(miles).Attheendofamonthabusinessmancollectsallofhistaxicabreceiptsandcalculatessomenumericalsummaries.Themeanfarehepaidwas$15.45withastandarddeviationof$10.20.Whatarethemeanandstandarddeviationofthelengthsofhiscabridesinmiles?
InChapter1,wedevelopedakitofgraphicalandnumericaltoolsfordescribingdistributions.Now,we’lladdonemoresteptothestrategy.
ExploringQuantitativeData
1.Alwaysplotyourdata:makeagraph.
2.Lookfortheoverallpattern(shape,center,andspread)andforstrikingdeparturessuchasoutliers.
3.Calculateanumericalsummary(summarystatistics)tobrie]lydescribecenterandspread.
4.Sometimestheoverallpatternofalargenumberofobservationsissoregularthatwecandescribeitbyasmoothcurve.
Section 2.1 Notes - COMPLETED
De#inition:
Adensitycurveisacurvethatisalwaysonorabovethehorizontalaxis,andhasareaexactly1underneathit.
Adensitycurvedescribestheoverallpatternofadistribution.Theareaunderthecurveandaboveanyintervalofvaluesonthehorizontalaxisistheproportionofallobservationsthatfallinthatinterval.
Thehistogrambelowshowsthedistributionofbattingaverage(proportionofhits)forthe432MajorLeagueBaseballplayerswithatleast100plateappearancesinthe2009season.Thesmoothcurveshowstheoverallshapeofthedistribution.
Inthe]irstgraphbelow,thebarsinredrepresenttheproportionofplayerswhohadbattingaveragesofatleast0.270.Thereare177suchplayersoutofatotalof432,foraproportionof0.410.Inthesecondgraphbelow,theareaunderthecurvetotherightof0.270isshaded.Thisareais0.391,only0.019awayfromtheactualproportionof0.410.