Describing Location in a Distribution (2.1)

Section 2.1 Notes - COMPLETED

Measuring Position: Percentiles

One way to describe the location of a value in a distribution is to tell what percent of observations are less than it.

Definition:

The pth percentile of a distribution is the value with p percent of the observations less than it.

The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009.

 5 | 9
 6 | 2455
 7 | 00455589
 8 | 0345667778
 9 | 123557
10 | 3

Key: 5 | 9 represents a team with 59 wins.

Problem: Find the percentiles for the following teams:

(a) The Colorado Rockies, who won 92 games.

(b) The New York Yankees, who won 103 games.

Problem: Find the following.

(c) The percentile for the Kansas City Royals, who won 65 games.

(d) An MLB team represented the 75th percentile for number of wins. Assuming no team had the same number of wins that they did, approximately how many teams had more wins than they did?
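The percentile definition translates almost word for word into code. The minimal sketch below applies it to problems (a) through (c). It assumes the 30 win totals are read off the 2009 stemplot, with stems as tens digits and leaves as ones digits. The names `WINS` and `percent_below` are hypothetical and used only for illustration.

```python
# 2009 MLB win totals, read off the stemplot (stem = tens digit, leaf = ones digit)
WINS = [59,
        62, 64, 65, 65,
        70, 70, 74, 75, 75, 75, 78, 79,
        80, 83, 84, 85, 86, 86, 87, 87, 87, 88,
        91, 92, 93, 95, 95, 97,
        103]


def percent_below(value, data):
    """Percent of observations strictly less than `value`.

    Follows the notes' definition: the pth percentile is the value
    with p percent of the observations less than it.
    """
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


for team, wins in [("Rockies", 92), ("Yankees", 103), ("Royals", 65)]:
    print(f"{team} ({wins} wins): {percent_below(wins, WINS):.1f} percent of teams won fewer games")
```

For (d), the same definition runs in reverse. 75% of 30 teams is 22.5 teams with fewer wins. Leaving out the team itself, that means roughly 6 or 7 teams had more wins.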

A cumulative relative frequency graph (or ogive) displays the cumulative relative frequency of each class of a frequency distribution.

Here is a table showing the distribution of median household incomes for the 50 states and the District of Columbia.

Construct a cumulative relative frequency graph (ogive) of this data set.
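The income table is not reproduced in these notes. The sketch below therefore illustrates the same computation on the 2009 MLB win totals, grouped one class per stem of the stemplot. Each cumulative relative frequency is a running total of the counts divided by the grand total. It assumes the usual convention of plotting each value above the upper boundary of its class. The names `CLASSES`, `COUNTS`, and `cumulative_relative_frequencies` are hypothetical.

```python
from itertools import accumulate

# Win-total classes and counts from the 2009 MLB stemplot (one class per stem)
CLASSES = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
COUNTS = [1, 4, 8, 10, 6, 1]


def cumulative_relative_frequencies(counts):
    """Running total of the counts divided by the grand total, one value per class."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


for label, crf in zip(CLASSES, cumulative_relative_frequencies(COUNTS)):
    # On the ogive, this height is plotted above the upper boundary of the class
    print(f"{label:>7}: {crf:.3f}")
```

The last value is always 1, because every observation falls at or below the top class.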

Problem: Use the cumulative relative frequency graph for the state income data to answer each question.

(a) At what percentile is California, with a median income of $57,445?

(b) Estimate and interpret the first quartile of this distribution.

(c) Are there any states whose median income can be considered an outlier?
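Reading an ogive by eye amounts to linear interpolation between the plotted points. To estimate a percentile, go up from a value to the curve. To estimate a quartile, go across from a height to the curve. Since the income data are not given here, this sketch again uses the MLB win classes. It assumes the points sit at class boundaries 50, 60, ..., 110 and are joined by straight segments. The helper names `value_at_percentile` and `percentile_of` are hypothetical.

```python
# Ogive points for the 2009 MLB win totals: (class boundary, cumulative relative frequency)
BOUNDARIES = [50, 60, 70, 80, 90, 100, 110]
CUMULATIVE = [0, 1/30, 5/30, 13/30, 23/30, 29/30, 30/30]


def value_at_percentile(p, xs, ys):
    """Read the ogive across: the x where the joined points reach height p (0 to 1)."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (p - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("p is outside the range of the ogive")


def percentile_of(x, xs, ys):
    """Read the ogive up: the height of the joined points above x, as a percent."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return 100 * (y0 + (x - x0) * (y1 - y0) / (x1 - x0))
    raise ValueError("x is outside the ogive")


q1 = value_at_percentile(0.25, BOUNDARIES, CUMULATIVE)
print(f"Estimated Q1: about {q1:.1f} wins")
print(f"92 wins: estimated percentile {percentile_of(92, BOUNDARIES, CUMULATIVE):.1f}")
```

Estimates from an ogive are approximate because they assume the observations are spread evenly within each class. Here the ogive puts Q1 near 73 wins, while the median of the lower 15 raw values is 74. The ogive also places 92 wins near the 81st percentile, while the exact count gives the 80th.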

Measuring Position: z-Scores

A z-score tells us how many standard deviations from the mean an observation falls, and in what direction.

Definition:

If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is:

$$z = \frac{x - \text{mean}}{\text{standard deviation}}$$

A standardized value is often called a z-score.

In 2009, the mean number of wins for teams in the MLB was 81 with a standard deviation of 11.4 wins.

Problem: Find and interpret the z-scores for the following teams.

(a) The New York Yankees, with 103 wins.

(b) The Baltimore Orioles, with 64 wins.
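The z-score formula is a single subtraction and division. The sign of the result gives the direction (above or below the mean). This minimal sketch applies it to the two teams using the 2009 mean and standard deviation stated above. `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Standardized value: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


MEAN_WINS, SD_WINS = 81, 11.4  # 2009 MLB values from the notes

for team, wins in [("Yankees", 103), ("Orioles", 64)]:
    z = z_score(wins, MEAN_WINS, SD_WINS)
    direction = "above" if z > 0 else "below"
    print(f"{team}: z = {z:.2f}, i.e. {abs(z):.2f} standard deviations {direction} the mean")
```

The Yankees' 103 wins come out about 1.93 standard deviations above the mean. The Orioles' 64 wins are about 1.49 standard deviations below it.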

The single-season home run record for Major League Baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998, and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, since he hit the most home runs in a single season. However, in a relative sense this may not be true. Baseball historians suggest that hitting a home run has been easier in some eras than others. This is due to many factors, including quality of batters, quality of pitchers, hardness of the baseball, dimensions of ballparks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to other hitters during the same year.

Problem: Compute the standardized scores for each performance. Which player had the most outstanding performance relative to his peers?

Transforming Data (adding and subtracting constants)

Transforming converts the observations from the original units of measurement to another scale. Transformations can affect the shape, center, and spread of a distribution.

EXAMPLE: Record your guess for the width of this room (length of the blue wall) on a Post-it and bring it up to the smart panel.

Here are your class's guesses:

Create a dotplot of the data and describe the distribution of guesses.

The actual width of the room is:

Determine the error associated with each guess by subtracting the actual width from each guess, and record the errors here:

Construct a dotplot of the errors and describe their distribution.

In this example we added (a negative...) to each data point/observation. What did this do to the

shape?

spread?

center?

Adding the same number a (either positive, zero, or negative) to each observation:

• adds a to measures of center and location/position (mean, median, quartiles, percentiles), BUT

• does NOT change the shape of the distribution or measures of spread (range, IQR, standard deviation, variance)
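A quick numerical check of these two rules uses Python's standard `statistics` module. The class's room guesses are not recorded in the notes, so the sketch below reuses the 2009 MLB win totals from the stemplot. It adds a = -81 to turn each total into "wins above (or below) the 2009 average". `summary` is a hypothetical helper name.

```python
import statistics

# 2009 MLB win totals (from the stemplot earlier in the notes)
WINS = [59, 62, 64, 65, 65, 70, 70, 74, 75, 75, 75, 78, 79, 80, 83,
        84, 85, 86, 86, 87, 87, 87, 88, 91, 92, 93, 95, 95, 97, 103]


def summary(data):
    """A few measures of center and spread."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),  # sample standard deviation
    }


a = -81  # add a negative constant: wins relative to the 2009 mean
shifted = [w + a for w in WINS]

before, after = summary(WINS), summary(shifted)
for name in before:
    print(f"{name:>6}: {before[name]:8.3f} -> {after[name]:8.3f}")
```

The mean and median both drop by exactly 81. The range and standard deviation print the same before and after.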

Transforming Data (multiplying and dividing by constants)

EXAMPLE (cont'd): Suppose we wanted to convert your guesses for the width of the room from feet to inches. Multiply all guesses from the previous example by 12 in./ft. to convert the data to inches.

Construct a dotplot of the new data and describe the distribution of guesses (in inches).

In this example we multiplied each data point/observation by the same constant. What did this do to the

shape?

spread?

center?

Multiplying (or dividing) each observation by the same number b (either positive, zero, or negative; b cannot be zero when dividing):

• multiplies (divides) measures of center and location/position (mean, median, quartiles, percentiles) by b

• multiplies (divides) measures of spread (range, IQR, standard deviation) by |b| (and variance by b²)

• does NOT change the shape of the distribution
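The feet-to-inches conversion can be checked the same way. The sketch below uses a made-up list of room-width guesses in feet, since the class's actual guesses are not in the notes. It shows that multiplying by b = 12 scales the mean and median by 12, the standard deviation by |12|, and the variance by 12² = 144.

```python
import statistics

# HYPOTHETICAL room-width guesses in feet (the class's real guesses are not recorded)
guesses_ft = [28, 30, 32, 35, 35, 38, 40, 45]
b = 12  # inches per foot
guesses_in = [b * g for g in guesses_ft]

measures = [("mean", statistics.mean), ("median", statistics.median),
            ("sd", statistics.stdev), ("variance", statistics.variance)]
for label, measure in measures:
    print(f"{label:>8}: {measure(guesses_ft):9.3f} (feet) -> {measure(guesses_in):9.3f} (inches)")
```

The variance line grows by a factor of 144 rather than 12 because variance is measured in squared units.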

EXAMPLE: In 2010, taxicabs in New York City charged an initial fee of $2.50 plus $2 per mile. In equation form, fare = 2.50 + 2(miles). At the end of a month, a businessman collects all of his taxicab receipts and calculates some numerical summaries. The mean fare he paid was $15.45 with a standard deviation of $10.20. What are the mean and standard deviation of the lengths of his cab rides in miles?
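Solving the fare equation for miles gives miles = (fare - 2.50) / 2. That is a subtraction followed by a division, so the two transformation rules apply in turn. Subtracting 2.50 moves the mean but leaves the standard deviation alone. Dividing by 2 then halves both. A minimal sketch of the arithmetic, with hypothetical variable names:

```python
# fare = 2.50 + 2 * miles  =>  miles = (fare - 2.50) / 2
INITIAL_FEE, RATE_PER_MILE = 2.50, 2.00
mean_fare, sd_fare = 15.45, 10.20

# Center: undo the added fee, then undo the multiplication by the rate
mean_miles = (mean_fare - INITIAL_FEE) / RATE_PER_MILE
# Spread: the fee has no effect; only the division by the rate matters
sd_miles = sd_fare / RATE_PER_MILE

print(f"mean = {mean_miles:.3f} miles, sd = {sd_miles:.2f} miles")
```

This works out to a mean of about 6.48 miles and a standard deviation of 5.10 miles.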

In Chapter 1, we developed a kit of graphical and numerical tools for describing distributions. Now, we'll add one more step to the strategy.

Exploring Quantitative Data

1. Always plot your data: make a graph.

2. Look for the overall pattern (shape, center, and spread) and for striking departures such as outliers.

3. Calculate a numerical summary (summary statistics) to briefly describe center and spread.

4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

Definition:

A density curve is a curve that is always on or above the horizontal axis, and has area exactly 1 underneath it.

A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.
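Both properties of a density curve can be checked numerically. The sketch below uses a made-up density, f(x) = 2x on [0, 1] and zero elsewhere, chosen only because it is never negative and encloses area exactly 1. It approximates areas with the trapezoid rule, a standard numerical-integration technique not covered in these notes. The names `trapezoid_area` and `density` are hypothetical.

```python
def trapezoid_area(f, lo, hi, n=10_000):
    """Approximate the area under f between lo and hi using n trapezoids."""
    width = (hi - lo) / n
    total = (f(lo) + f(hi)) / 2 + sum(f(lo + i * width) for i in range(1, n))
    return total * width


def density(x):
    """HYPOTHETICAL density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


total_area = trapezoid_area(density, 0, 1)      # should be (very close to) 1
upper_half = trapezoid_area(density, 0.5, 1)    # proportion of observations in [0.5, 1]
print(f"total area = {total_area:.6f}, area over [0.5, 1] = {upper_half:.6f}")
```

Under this curve, about 75% of observations would fall between 0.5 and 1, even though that interval is only half the horizontal range.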

The histogram below shows the distribution of batting average (proportion of hits) for the 432 Major League Baseball players with at least 100 plate appearances in the 2009 season. The smooth curve shows the overall shape of the distribution.

In the first graph below, the bars in red represent the proportion of players who had batting averages of at least 0.270. There are 177 such players out of a total of 432, for a proportion of 0.410. In the second graph below, the area under the curve to the right of 0.270 is shaded. This area is 0.391, only 0.019 away from the actual proportion of 0.410.
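The comparison above comes down to two lines of arithmetic. This sketch reproduces them from the counts and curve area given in the notes; the variable names are illustrative only.

```python
players_total = 432
players_at_least_270 = 177        # red bars in the first graph
curve_area_right_of_270 = 0.391   # shaded area in the second graph (from the notes)

actual = round(players_at_least_270 / players_total, 3)
gap = actual - curve_area_right_of_270
print(f"actual proportion = {actual:.3f}, density curve misses by {gap:.3f}")
```

The small gap is expected. A density curve describes the overall pattern, not the exact count in each bar.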