Mathematics and Statistics in...

42
Using BioInteractive Resources to Teach Mathematics and Statistics in Biology Paul Strode, PhD Fairview High School Boulder, Colorado Ann Brokaw Rocky River High School Rocky River, Ohio Version: October 2015 2015

Transcript of Mathematics and Statistics in...

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.1

UsingBioInteractive

ResourcestoTeach

Mathematicsand

Statisticsin

Biology

PaulStrode,PhDFairviewHighSchoolBoulder,Colorado

AnnBrokawRockyRiverHighSchool

RockyRiver,Ohio

Version:October2015

2015

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.2

UsingBioInteractiveResourcestoTeach

MathematicsandStatisticsinBiologyAboutThisGuide 3 StatisticalSymbolsandEquations 4

Part1:DescriptiveStatisticsUsedinBiology 5MeasuresofAverage:Mean,Median,andMode 7

Mean 7 Median 8 Mode 9 WhentoUseWhichOne 9 MeasuresofVariability:Range,StandardDeviation,andVariance 9 Range 9 StandardDeviationandVariance 10 UnderstandingDegreesofFreedom 12 MeasuresofConfidence:StandardErroroftheMeanand95%ConfidenceInterval 13Part2:InferentialStatisticsUsedinBiology 17 SignificanceTesting:Theα(Alpha)Level 17 ComparingAverages:TheStudent’st-TestforIndependentSamples 18

AnalyzingFrequencies:TheChi-SquareTest 21 MeasuringCorrelationsandAnalyzingLinearRegression 25Part3:CommonlyUsedCalculationsinBiology 31 RelativeFrequency 31

Probability 31 RateCalculations 33 Hardy-WeinbergFrequencyCalculations 34 StandardCurves 36Part4:BioInteractive’sMathematicsandStatisticsClassroomResources 39 BioInteractiveResourceName(LinkstoClassroom-ReadyResources) 39

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.3

AboutThisGuideManystatesciencestandardsencouragetheuseofmathematicsandstatisticsinbiologyeducation,includingthenewlydesignedAPBiologycourse,IBBiology,NextGenerationScienceStandards,andtheCommonCore.SeveralresourcesontheBioInteractivewebsite(www.biointeractive.org),whicharelistedinthetableattheendofthisdocument,makeuseofmathandstatisticstoanalyzeresearchdata.Thisguideismeanttohelp

educatorsusetheseBioInteractiveresourcesintheclassroombyprovidingfurtherbackgroundonthe

statisticaltestsusedandstep-by-stepinstructionsfordoingthecalculations.Althoughmostoftheexampledatasetsincludedinthisguidearenotrealandaresimplyprovidedtoillustratehowthecalculationsaredone,thedatasetsonwhichtheBioInteractiveresourcesarebasedrepresentactualresearchdata.

Thisguideisnotmeanttobeatextbookonstatistics;itonlycoverstopicsmostrelevanttohighschoolbiology,focusingonmethodsandexamplesratherthantheory.Itisorganizedinfourparts:

• Part1coversdescriptivestatistics,methodsusedtoorganize,summarize,anddescribequantifiable

data.Themethodsincludewaystodescribethetypicaloraveragevalueofthedataandthespreadofthedata.

• Part2coversstatisticalmethodsusedtodrawinferencesaboutpopulationsonthebasisof

observationsmadeonsmallersamplesorgroupsofthepopulation—abranchofstatisticsknownasinferentialstatistics.

• Part3describesothermathematicalmethodscommonlytaughtinhighschoolbiology,including

frequencyandratecalculations,Hardy-Weinbergcalculations,probability,andstandardcurves.

• Part4providesachartofactivitiesontheBioInteractivewebsitethatusemathandstatisticsmethods.AfirstdraftoftheguidewaspublishedinJuly2014.Ithasbeenrevisedbasedonuserfeedbackandexpertreview,andthisversionwaspublishedinOctober2015.Theguidewillcontinuetobeupdatedwithnewcontentandbasedonongoingfeedbackandreview.

Foramorecomprehensivediscussionofstatisticalmethodsandadditionalclassroomexamples,refertoJohnMcDonald’sHandbookofBiologicalStatistics,http://www.biostathandbook.com,andtheCollegeBoard’sAPBiologyQuantitativeSkills:AGuideforTeachers,http://apcentral.collegeboard.com/apc/public/repository/AP_Bio_Quantitative_Skills_Guide-2012.pdf.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.4

StatisticalSymbolsandEquations

Listedbelowaretheuniversalstatisticalsymbolsandequationsusedinthisguide.Thecalculationscanallbedoneusingscientificcalculatorsortheformulafunctioninspreadsheetprograms.

!: Totalnumberofindividualsinapopulation(i.e.,thetotalnumberofbutterfliesofaparticularspecies)

#: Totalnumberofindividualsinasampleofapopulation(i.e.,thenumberofbutterfliesinanet)

df: Thenumberofmeasurementsinasamplethatarefreetovaryoncethesamplemeanhasbeencalculated;inasinglesample,df=#–1

&': Asinglemeasurement

(: The(thobservationinasample

S: Summation

&: Samplemean &=  *+,

-.: Samplevariance -.=∑  *+0*1

,02

-: Samplestandarddeviation -= -.

SEx: Samplestandarderror,orstandarderrorofthemean(SEM) SE= 5,

95%CI:95%confidenceinterval 95%CI=2.785,

t-test: tobs=|*:0*1|

;:1<:=;1

1<1

Chi-squaretest(>.): >.= ?0@ 1

@

Linearregressiontest: A=B+CB;B

<+D:

E+CE;E

,02

Hardy-Weinbergprinciple: p2+2pq+q2=1.0

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.5

Part1:DescriptiveStatisticsUsedinBiologyScientiststypicallycollectdataonasampleofapopulationandusethesedatatodrawconclusions,ormakeinferences,abouttheentirepopulation.AnexampleofsuchadatasetisshowninTable1.ItshowsbeakmeasurementstakenfromtwogroupsofmediumgroundfinchesthatlivedontheislandofDaphneMajor,oneoftheGalápagosIslands,duringamajordroughtin1977.Onegroupoffinchesdiedduringthedrought,andonegroupsurvived.(ThesedatawereprovidedbyscientistsPeterandRosemaryGrant,andthe

completedataareavailableontheBioInteractivewebsiteat

http://www.hhmi.org/biointeractive/evolution-action-data-analysis.)

Table1.BeakDepthMeasurementsinaSampleofMediumGroundFinchesfromDaphneMajor

Note:“Band”referstoanindividual’sidentity—morespecifically,thenumberonametallegbanditwasgiven.Fiftyindividualsdiedin1977(nonsurvivors)and50survivedbeyond1977(survivors),theyearofthedrought.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.6

HowwouldyoudescribethedatainTable1,andwhatdoesittellyouaboutthepopulationsofmediumgroundfinchesofDaphneMajor?Thesearedifficultquestionstoanswerbylookingatatableofnumbers.

OneofthefirststepsinanalyzingasmalldatasetliketheoneshowninTable1istographthedataand

examinethedistribution.Figure1showstwographsofbeakmeasurements.Thegraphonthetopshowsbeakmeasurementsoffinchesthatdiedduringthedrought.Thegraphonthebottomshowsbeakmeasurementsoffinchesthatsurvivedthedrought.

Beak Depths of 50 Medium Ground Finches That Did Not Survive the Drought

Beak Depths of 50 Medium Ground Finches That Survived the Drought

Figure1.DistributionsofBeakDepthMeasurementsinTwoGroupsofMediumGroundFinches

Noticethatthemeasurementstendtobemoreorlesssymmetricallydistributedacrossarange,withmostmeasurementsaroundthecenterofthedistribution.Thisisacharacteristicofanormaldistribution.Moststatisticalmethodscoveredinthisguideapplytodatathatarenormallydistributed,likethebeakmeasurementsabove;othertypesofdistributionsrequireeitherdifferentkindsofstatisticsortransformingdatatomakethemnormallydistributed.

Howwouldyoudescribethesetwographs?Howaretheythesameordifferent?Descriptivestatisticsallowsyoutodescribeandquantifythesedifferences.TherestofPart1ofthisguideprovidesstep-by-stepinstructionsforcalculatingmean,standarddeviation,standarderror,andotherdescriptivestatistics.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.7

MeasuresofAverage:Mean,Median,andModeInthetwographsinFigure1,thecenterandspreadofeachdistributionisdifferent.Thecenterofthedistributioncanbedescribedbythemean,median,ormode.Thesearereferredtoasmeasuresofcentral

tendency.

MeanYoucalculatethesamplemean(alsoreferredtoastheaverageorarithmeticmean)bysummingallthedatapointsinadataset(�X)andthendividingthisnumberbythetotalnumberofdatapoints(N):

Whatwewanttounderstandisthemeanoftheentirepopulation,whichisrepresentedbyµ.Theyusethesamplemean,representedby&,asanestimateofµ.

ApplicationinBiology

Studentsinabiologyclassplantedeightbeanseedsinseparateplasticcupsandplacedthemunderabankoffluorescentlights.Fourteendayslater,thestudentsmeasuredtheheightofthebeanplantsthatgrewfromthoseseedsandrecordedtheirresultsinTable2.

Table2.BeanPlantHeights

PlantNo. 1 2 3 4 5 6 7 8

Height(cm) 7.5 10.1 8.3 9.8 5.7 10.3 9.2 8.7Todeterminethemeanofthebeanplants,followthesesteps:

I. Findthesumoftheheights:

7.5+10.1+8.3+9.8+5.7+10.3+9.2+8.7=69.6centimeters

II. Countthenumberofheightmeasurements:Thereare8heightmeasurements.

III. Dividethesumoftheheightsbythenumberofmeasurementstocomputethemean:

mean=69.6cm/8=8.7centimeters

Themeanforthissampleofeightplantsis8.7centimetersandservesasanestimateforthetruemeanofthepopulationofbeanplantsgrowingundertheseconditions.Inotherwords,ifthestudentscollecteddatafromhundredsofplantsandgraphedthedata,thecenterofthedistributionshouldbearound8.7centimeters.

MedianWhenthedataareorderedfromthelargesttothesmallest,themedianisthemidpointofthedata.Itisnotdistortedbyextremevalues,orevenwhenthedistributionisnotnormal.Forthisreason,itmaybemoreusefulforyoutousethemedianasthemaindescriptivestatisticforasampleofdatainwhichsomeofthemeasurementsareextremelylargeorextremelysmall.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.8

Todeterminethemedianofasetofvalues,youfirstarrangetheminnumericalorderfromlowesttohighest.Themiddlevalueinthelististhemedian.Ifthereisanevennumberofvaluesinthelist,thenthemedianisthemeanofthemiddletwovalues.

ApplicationinBiology

AresearcherstudyingmousebehaviorrecordedinTable3thetime(inseconds)ittook13differentmicetolocatefoodinamaze.

Table3.LengthofTimeforMicetoLocateFoodinaMaze

MouseNo. 1 2 3 4 5 6 7 8 9 10 11 12 13

Time(sec.) 31 33 163 33 28 29 33 27 27 34 35 28 32Todeterminethemediantimethatthemicespentsearchingforfood,followthesesteps:

I. Arrangethetimevaluesinnumericalorderfromlowesttohighest:

27,27,28,28,29,31,32,33,33,33,34,35,163

II. Findthemiddlevalue.Thisvalueisthemedian:

median=32secondsInthiscase,themedianis32seconds,butthemeanis41seconds,whichislongerthanallbutoneofthemicetooktosearchforfood.Inthiscase,themeanwouldnotbeagoodmeasureofcentraltendencyunlessthereallyslowmouseisexcludedfromthedataset.

ModeThemodeisanothermeasureoftheaverage.Itisthevaluethatappearsmostofteninasampleofdata.IntheexampleshowninTable3,themodeis33seconds.

Themodeisnottypicallyusedasameasureofcentraltendencyinbiologicalresearch,butitcanbeusefulindescribingsomedistributions.Forexample,Figure2showsadistributionofbodylengthswithtwopeaks,ormodes—calledabimodaldistribution.Describingthesedatawithameasureofcentraltendencylikethemeanormedianwouldobscurethisfact.

Figure2.GraphofBodyLengthsofWeaverAntWorkers(Reproducedfromhttp://en.wikipedia.org/wiki/File:BimodalAnts.png.)

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.9

WhentoUseWhichOneThepurposeofthesestatisticsistocharacterize“typical”datafromadataset.Youusethemeanmostoftenforthispurpose,butitbecomeslessusefulifthedatainthedatasetarenotnormallydistributed.Whenthedataarenotnormallydistributed,thenotherdescriptivestatisticscangiveabetterideaaboutthetypicalvalueofthedataset.Themedian,forexample,isausefulnumberifthedistributionisheavilyskewed.Forexample,youmightusethemediantodescribeadatasetoftoprunningspeedsoffour-leggedanimals,mostofwhicharerelativelyslowandafew,likecheetahs,areveryfast.Themodeisnotusedveryfrequentlyinbiology,butitmaybeusefulindescribingsometypesofdistributions—forexample,oneswithmorethanonepeak.

MeasuresofVariability:Range,StandardDeviation,andVarianceVariabilitydescribestheextenttowhichnumbersinadatasetdivergefromthecentraltendency.Itisameasureofhow“spreadout”thedataare.Themostcommonmeasuresofvariabilityarerange,standard

deviation,andvariance.

RangeThesimplestmeasureofvariabilityinasampleofnormallydistributeddataistherange,whichisthedifferencebetweenthelargestandsmallestvaluesinasetofdata.

ApplicationinBiology

StudentsinabiologyclassmeasuredthewidthincentimetersofeightleavesfromeightdifferentmapletreesandrecordedtheirresultsinTable4.

Table4.WidthofMapleTreeLeaves

PlantNo. 1 2 3 4 5 6 7 8

Width(cm) 7.5 10.1 8.3 9.8 5.7 10.3 9.2 8.7Todeterminetherangeofleafwidths,followthesesteps:

I. Identifythelargestandsmallestvaluesinthedataset:

largest=10.3centimeters,smallest=5.7centimeters

II. Todeterminetherange,subtractthesmallestvaluefromthelargestvalue:

range=10.3centimeters–5.7centimeters=4.6centimeters

Alargerrangevalueindicatesagreaterspreadofthedata—inotherwords,thelargertherange,thegreaterthevariability.However,anextremelylargeorsmallvalueinthedatasetwillmakethevariabilityappearhigh.Forexample,ifthemapleleafsamplehadnotincludedtheverysmallleafnumber5,therangewouldhavebeenonly2.8centimeters.Thestandarddeviationprovidesamorereliablemeasureofthe“true”spreadofthedata.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.10

StandardDeviationandVarianceThestandarddeviationisthemostwidelyusedmeasureofvariability.Thesamplestandarddeviation(s)isessentiallytheaverageofthedeviationbetweeneachmeasurementinthesampleandthesamplemean(&).Thesamplestandarddeviationestimatesthestandarddeviationinthelargerpopulation.

Theformulaforcalculatingthesamplestandarddeviationfollows:

s= ∑ (*+0*)1(,02)

CalculationSteps1. Calculatethemean(&)ofthesample.

2. Findthedifferencebetweeneachmeasurement(&i)inthedatasetandthemean(&)oftheentireset:

(&i−&)3. Squareeachdifferencetoremoveanynegativevalues:

(&i−&)24. Addup(sum,S)allthesquareddifferences:

S (&i−&)25. Dividebythedegreesoffreedom(df),whichis1lessthanthesamplesize(n–1):

∑ (&' − &).(# − 1)

Notethatthenumbercalculatedatthisstepprovidesastatisticcalledvariance(s2).Varianceisameasureofvariabilitythatisusedinmanystatisticalmethods.Itisthesquareofthestandarddeviation.

6. Takethesquareroottocalculatethestandarddeviation(s)forthesample.

ApplicationinBiology

Youareinterestedinknowinghowtallbeanplants(Phaseolusvulgaris)growintwoweeksafterplanting.Youplantasampleof20seeds(n=20)inseparatepotsandgivethemequalamountsofwaterandlight.Aftertwoweeks,17oftheseedshavegerminatedandhavegrownintosmallseedlings(nown=17).Youmeasureeachplantfromthetipsoftherootstothetopofthetalleststem.YourecordthemeasurementsinTable5,alongwiththestepsforcalculatingthestandarddeviation.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.11

Table5.PlantMeasurementsandStepsforCalculatingtheStandardDeviation

PlantNo. PlantHeight(mm) Step2:(&i−&)(mm)

Step3:(&i−&)2(mm

2)

1 112 (112–103)=9 92=812 102 (102–103)=(−1) (−1)2=1

3 106 (106–103)=3 32=9

4 120 (120–103)=17 172=289

5 98 (98–103)=(−5) (−5)2=25

6 106 (106–103)=3 32=9

7 80 (80–103)=(−23) (−23)2=529

8 105 (105–103)=2 22=4

9 106 (106–103)=3 32=9

10 110 (110–103)=7 72=49

11 95 (95–103)=(−8) (−8)2=64

12 98 (98–103)=(−5) (−5)2=25

13 74 (74–103)=(−29) (−29)2=841

14 112 (112–103)=9 92=81

15 115 (115–103)=12 122=144

16 109 (109–103)=6 62=36

17 100 (100–103)=(−3) (−3)2=9Step1:Calculatemean. &=103mm Step4:∑ (&i−&)2

=2,205

Variance,s2 KLMN5:∑ (&i−&)2/(n–1)=2,205/16=138

StandardDeviation,s Step6: -.= 138=11.7mm

Note:Theunitsforvariancearesquaredunits,whichmakevariancelessusefulasameasureofdispersion.

Themeanheightofthebeanplantsinthissampleis103millimeters±11.7millimeters.Whatdoesthistellus?Inadatasetwithalargenumberofmeasurementsthatarenormallydistributed,68.3%(orroughlytwo-

thirds)ofthemeasurementsareexpectedtofallwithin1standarddeviationofthemeanand95.4%ofall

datapointsliewithin2standarddeviationsofthemeanoneitherside(Figure3).Thus,inthisexample,ifyouassumethatthissampleof17observationsisdrawnfromapopulationofmeasurementsthatarenormallydistributed,68.3%ofthemeasurementsinthepopulationshouldfallbetween91.3and114.7millimetersand95.4%ofthemeasurementsshouldfallbetween79.6millimetersand126.4millimeters.(Note:youmightget

slightlydifferentvaluesifyouuseaspreadsheettomakethecalculationsbecauseinthisexamplethemeanandstandarddeviationwererounded.)

Figure3.TheoreticalDistributionofPlantHeights.

Fornormallydistributeddata,68.3%ofdatapointsliebetween±1standarddeviationofthemeanand95.4%ofdatapointsliebetween±2standarddeviationsofthemean.

Num

ber o

f Pla

nts

Plant Height (mm)

34.1%13.6% 34.1% 13.6%

Mean

–2 SD +2 SD

–1 SD +1 SD

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.12

Wecangraphthemeanandthestandarddeviationofthissampleofbeanplantsusingabargraphwitherrorbars(Figure4).Standarddeviationbarssummarizethevariationinthedata—themorespreadoutthe

individualmeasurementsare,thelargerthestandarddeviation.Ontheotherhand,errorbarsbasedonthestandarderrorofthemeanorthe95%confidenceintervalrevealtheuncertaintyinthesamplemean.Theydependonhowspreadoutthemeasurementsareandonthesamplesize.(Thesestatisticsarediscussedfurtherin“MeasuresofConfidence:StandardErroroftheMeanand95%ConfidenceInterval”.)

Figure4.MeanPlantHeightofaSampleofBeanPlantsandan

ErrorBarRepresenting±1StandardDeviation.Roughlytwo-thirdsofthemeasurementsinthispopulationwouldbeexpectedtofallintherangeindicatedbythebar.

Acommonmisconceptionisthatstandarddeviationdecreaseswithincreasingsamplesize.Asyouincreasethesamplesize,standarddeviationcaneitherincreaseordecreasedependingonthemeasurementsinthe

sample.However,withalargersamplesize,standarddeviationwillbecomeamoreaccurateestimateofthestandarddeviationofthepopulation.

UnderstandingDegreesofFreedomCalculationsofsampleestimates,suchasthestandarddeviationandvariance,usedegreesoffreedominsteadofsamplesize.Thewayyoucalculatedegreesoffreedomdependsonthestatisticalmethodyouareusing,butforcalculatingthestandarddeviation,itisdefinedas1lessthanthesamplesize(n−1).

Toillustratewhatthisnumbermeans,considerthefollowingexample.Biologistsareinterestedinthevariationinlegsizesamonggrasshoppers.Theycatchfivegrasshoppers(#=5)inanetandpreparetomeasuretheleftlegs.Asthescientistspullgrasshoppersoneatatimefromthenet,theyhavenowayofknowingtheleglengthsuntiltheymeasurethemall.Inotherwords,allfiveleglengthsare“free”tovarywithinsomegeneralrangeforthisparticularspecies.Thescientistsmeasureallfiveleglengthsandthencalculatethemeantobex=10millimeters.Theythenplacethegrasshoppersbackinthenetanddecidetopullthemoutoneatatimetomeasurethemagain.Thistime,sincethebiologistsalreadyknowthemeantobe10,onlythefirstfourmeasurementsarefreetovarywithinagivenrange.Ifthefirstfourmeasurementsare8,9,10,and12millimeters,thenthereisnofreedomforthefifthmeasurementtovary;ithastobe11.Thus,oncetheyknowthesamplemean,thenumberofdegreesoffreedomis1lessthanthesamplesize,df = 4.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.13

MeasuresofConfidence:StandardErroroftheMeanand95%ConfidenceIntervalThestandarddeviationprovidesameasureofthespreadofthedatafromthemean.Adifferenttypeofstatisticrevealstheuncertaintyinthecalculationofthemean.

Thesamplemeanisnotnecessarilyidenticaltothemeanoftheentirepopulation.Infact,everytimeyoutakeasampleandcalculateasamplemean,youwouldexpectaslightlydifferentvalue.Inotherwords,thesamplemeansthemselveshavevariability.Thisvariabilitycanbeexpressedbycalculatingthestandarderrorofthemean(abbreviatedasSE*orSEM).

Toillustratethispoint,assumethatthereisapopulationofaspeciesofanolelizardslivingonanislandoftheCaribbean.Ifyouwereabletomeasurethelengthofthehindlimbsofeveryindividualinthispopulationandthencalculatethemean,youwouldknowthevalueofthepopulationmean.However,therearethousandsofindividuals,soyoutakeasampleof10anolesandcalculatethemeanhindlimblengthforthatsample.Anotherresearcherworkingonthatislandmightcatchanothersampleof10anolesandcalculatethemeanhindlimblengthforthissample,andsoon.Thesamplemeansofmanydifferentsampleswouldbenormallydistributed.Thestandarderrorofthemeanrepresentsthestandarddeviationofsuchadistributionandestimateshowclosethesamplemeanistothepopulationmean.

Thegreatereachsamplesize(i.e.,50ratherthan10anoles),themorecloselythesamplemeanwillestimatethepopulationmean,andthereforethestandarderrorofthemeanbecomessmaller.

TocalculateSE*orSEMdividethestandarddeviationbythesquarerootofthesamplesize:

-= ∑(*+0*)1(,02)

SE*= 5,

Whatthestandarderrorofthemeantellsyouisthatabouttwo-thirds(68.3%)ofthesamplemeanswould

bewithin±1standarderrorofthepopulationmeanand95.4%wouldbewithin±2standarderrors.

Anothermoreprecisemeasureoftheuncertaintyinthemeanisthe95%confidenceinterval(95%CI).Forlargesamplesizes,95%CIcanbecalculatedusingthisformula:2.785, ,whichistypicallyroundedto.5,foreaseofcalculation.Inotherwords,95%CIisabouttwicethestandarderrorofthemean.Theactualformulaforcalculating95%CIusesastatisticcalledthet-valueforasignificancelevelof0.05,whichisexplainedinTable8inPart2.Forlargesamplesizes,thist-valueis1.96.Sincet-valuesarenottypicallycoveredinhighschoolbiology,inthisguideweestimatethe95%CIbyusing2xSEM,butnotethatthisisjustanapproximation.NoteaboutErrorBars:Manybargraphsincludeerrorbars,whichmayrepresentstandarddeviation,SEM,or95%CI.WhenthebarsrepresentSEM,youknowthatifyoutookmanysamplesonlyabouttwo-thirdsoftheerrorbarswouldincludethepopulationmean.Thisisverydifferentfromstandarddeviationbars,whichshowhowmuchvariationthereisamongindividualobservationsinasample.Whentheerrorbarsrepresent95%confidenceintervalsinagraph,youknowthatinabout95%ofcasestheerrorbarsincludethepopulation

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.14

mean.IfagraphshowserrorbarsthatrepresentSEM,youcanestimatethe95%confidenceintervalbymakingthebarstwiceasbig—thisisafairlyaccurateapproximationforlargesamplesizes,butforsmallsamplesthe95%confidenceintervalsareactuallymorethantwiceasbigastheSEMs.

ApplicationinBiology—Example1

Seedsofmanyweedspeciesgerminatebestinrecentlydisturbedsoilthatlacksalight-blockingcanopyofvegetation.Studentsinabiologyclasshypothesizedthatweedseedsgerminatebestwhenexposedtolight.Totestthishypothesis,thestudentsplacedaseedfromcroftonweed(Ageratinaadenophora,aninvasivespeciesonseveralcontinents)ineachof20petridishesandcoveredtheseedswithdistilledwater.Theyplacedhalfthepetridishesinthedarkandhalfinthelight.Afteroneweek,thestudentsmeasuredthecombinedlengthsinmillimetersoftheradiclesandshootsextendingfromtheseedsineachdish.Table6showsthedataandcalculationsofvariance,standarddeviation,standarderrorofthemean,and95%confidenceinterval.Thestudentsplottedthedataastwobargraphsshowingthestandarderrorofthemeanand95%confidenceinterval(Figure5).

Table6.CombinedLengthsofCroftonWeedRadiclesandShootsafterOneWeekintheDarkandtheLight

PetriDishes Dark(Z[)(mm)

Light(Z\)(mm)

Dark(Z]−Z[)2(mm

2)

Light(Z]−Z\)2(mm

2)

1and2 12 18 (12–9.6)2=5.8 (18–18.4)2=0.16

3and4 8 22 (8–9.6)2=2.6 (22–18.4)2=12.96

5and6 15 17 (15–9.6)2=29.1 (17–18.4)2=1.96

7and8 13 23 (13–9.6)2=11.5 (23–18.4)2=21.16

9and10 6 16 (6–9.6)2=13.0 (16–18.4)2=5.76

11and12 4 18 (4–9.6)2=31.4 (18–18.4)2=0.16

13and14 13 22 (13–9.6)2=11.6 (22–18.4)2=12.96

15and16 14 12 (14–9.6)2=19.3 (12–18.4)2=40.96

17and18 5 19 (5–9.6)2=21.1 (19–18.4)2=0.36

19and20 6 17 (6–9.6)2=13.0 (17–18.4)2=1.96

∑(&'−&2)2=158.4 ∑(&'−&.)2=98.4

Mean(&) &2=9.6(10)mm

&.=18.4(18)mm

(*+0*)1(,02) =2^_.`7 ∑ (*+0*)1

(,02) =7_.`7

Variance(-.) -2.=17.6 -..=10.93

StandardDeviation,-= ∑ (*+0*)1(,02) - =4.20mm - =3.31mm

StandardErroroftheMean,SE*= 5, SE*=`..a2a=1.33 SE*=b.b22a=1.05

95%CI=\cd 95%CI=.(`..a)2a =2.7 95%CI=.(`.e`)2a =2.1

Note:Thenumberofreplicates(i.e.,samplesize,n)=10.Meansinparentheses,thatis,(10)and(18),aretothenearestmillimeter.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.15

Figure5.MeanLengthofCroftonSeedlingsafterOneWeekintheDarkorintheLight.ThestandarderrorofthemeangraphshowsthefgZaserrorbars,andthe95%confidenceintervalgraphshowsthehi%jkaserrorbars.(Notethatinthesecalculationsweapproximated95%CIasabouttwicetheSEM.)ThecalculationsinTable6showthatalthoughthestudentsdon’tknowtheactualmeancombinedradicleandshootlengthoftheentirepopulationofcroftonplantsinthedark,itislikelytobeanumberaroundthesamplemeanof9.6millimeters±1.3millimeters.Forthelighttreatmentitislikelytobe18.4millimeters±1millimeter.Thestudentscanbeevenmorecertainthatthepopulationmeanwouldbe9.6millimeters±2.6millimetersforthedarktreatmentand18.4millimeters±2.1millimetersforthelighttreatment.

Note:Bylookingatthebargraphs,youcanseethatthemeansforthelightanddarktreatmentsaredifferent.Becausethe95%confidenceintervalerrorbarsdonotoverlap,thissuggeststhatthetruepopulationmeansarealsodifferent.However,inordertodeterminewhetherthisdifferenceissignificant,youwillneedtoconductanotherstatisticaltest,theStudent’st-test,whichiscoveredin”ComparingAverages”inPart2ofthisguide.

ApplicationinBiology—Example2

Ateacherhadfivestudentswritetheirnamesontheboard,firstwiththeirdominanthandsandthenwiththeirnondominanthands.Therestoftheclassobservedthatthestudentswrotemoreslowlyandwithlessprecisionwiththenondominanthandthanwiththedominanthand.Theteacherthenaskedtheclasstoexplaintheirobservationsbydevelopingtestablehypotheses.Theyhypothesizedthatthedominanthandwasbetteratperformingfinemotormovementsthanthenondominanthand.Theclasstestedthishypothesisbytiming(inseconds)howlongittookeachstudenttobreak20toothpickswitheachhand.Theresultsoftheexperimentandthecalculationsofvariance,standarddeviation,standarderrorofthemean,and95%confidenceintervalarepresentedinTable7.Thestudentsthenillustratedthedataanduncertaintywithtwobargraphs,oneshowingthestandarderrorofthemeanandtheothershowingthe95%confidenceinterval(Figure6).

StandardErroroftheMean 95%ConfidenceInterval

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.16

Table7.NumberofSecondsItTookforStudentstoBreak20ToothpickswithTheirNondominant(ND)and

Dominant(D)Hands(numberofreplicates[n]=14)Students ND(Z[)

(sec.)

D(Z\)(sec.)

ND(Z]−Z[)2 D(Z]−Z\)2

Josh 33 37 (33–33)2=0 (37–35)2=4

Bobby 24 22 (24–33)2=81 (22–35)2=169

Qing 35 37 (35–33)2=4 (37–35)2=4

Julie 33 28 (33–33)2=0 (28–35)2=49

Lisa 42 50 (42–33)2=81 (50–35)2=225

Akash 36 36 (36–33)2=9 (36–35)2=1

Hector 31 36 (31–33)2=4 (36–35)2=1

Viviana 40 46 (40–33)2=49 (46–35)2=121

Brenda 28 26 (28–33)2=25 (26–35)2=81

Jane 24 28 (24–33)2=81 (28–35)2=49

Asa 23 22 (23–33)2=100 (22–35)2=169

Eli 44 52 (44–33)2=121 (52–35)2=289

Adee 35 29 (35–33)2=4 (29–35)2=36

Jenny 36 37 (36–33)2=9 (37–35)2=4

∑(&'−&2)2=568 ∑(&'−&.)2=1,200Mean(&) &2=33 &.=35 ∑(*+0*:).

,02 =^8_2b ∑(*+0*1).

,02 =2,.aa2b

Variance,-. -2.=44 -..=92

StandardDeviation,-= ∑ (*+0*)1(,02) - =6.6sec. - =9.6sec.

StandardErroroftheMean,SE*= cd SE*= 8.82`=1.8sec. SE*= 7.82`=2.6sec.

95%CI=\cd 95%CI=.(8._`)2` =3.5 95%CI=.(7.8)2` =5.1

Figure6.MeanNumberofSecondsforStudentstoBreak20ToothpickswiththeirNondominantHands(ND)and

DominantHands(D).ThestandarderrorofthemeangraphshowsthefgZaserrorbars,andthe95%confidenceintervalgraphshowsthehi%jkaserrorbars.

StandardErroroftheMean 95%ConfidenceInterval

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.17

Thecalculationsindicatethatittakesabout31.2seconds(33−1.8)to34.8seconds(33+1.8)forthenondominanthandtobreaktoothpicksandabout32.4to37.6secondsforthedominanthand.Youcanbemorecertainthattheaverageforthenondominanthandwouldfallsomewherebetween29.5seconds(33–3.5)and36.5seconds(33+3.5)andforthedominanthandsfallssomewherebetween29.9seconds(35–5.1)and40.1seconds(35+5.1).

Thisendsthepartondescriptivestatistics.GoingbacktothefinchdatasetinTable1andFigure1ofPart1,

howwouldyoucalculatethesamplemeansforbeaksizesofthesurvivorsandnonsurvivors?Istheremore

variabilityamongsurvivorsornonsurvivors?Whatistheuncertaintyinyoursamplemeanestimates?To

findtheanswerstothesequestions,seethe“EvolutioninAction:DataAnalysis”activitiesat

http://www.hhmi.org/biointeractive/evolution-action-data-analysis.

Part2:InferentialStatisticsUsedinBiologyInferentialstatisticstestsstatisticalhypotheses,whicharedifferentfromexperimentalhypotheses.Tounderstandwhatthismeans,assumethatyoudoanexperimenttotestwhether“nitrogenpromotesplantgrowth.”Thisisanexperimentalhypothesisbecauseittellsyousomethingaboutthebiologyofplantgrowth.Totestthishypothesis,yougrow10beanplantsindirtwithaddednitrogenand10beanplantsindirtwithoutaddednitrogen.Youfindoutthatthemeansofthesetwosamplesare13.2centimetersand11.9centimeters,respectively.Doesthisresultindicatethatthereisadifferencebetweenthetwopopulationsandthatnitrogenmightpromoteplantgrowth?Oristhedifferenceinthetwomeansmerelyduetochance?Astatisticaltestisrequiredtodiscriminatebetweenthesepossibilities.

Statisticaltestsevaluatestatisticalhypotheses.Thestatisticalnullhypothesis(symbolizedbyH0andpronouncedH-naught)isastatementthatyouwanttotest.Inthiscase,ifyougrow10plantswithnitrogenand10without,thenullhypothesisisthatthereisnodifferenceinthemeanheightsofthetwogroupsandanyobserveddifferencebetweenthetwogroupswouldhaveoccurredpurelybychance.ThealternativehypothesistoH0issymbolizedbyH1andusuallysimplystatesthatthereisadifferencebetweenthepopulations.

Thestatisticalnullandalternativehypothesesarestatementsaboutthedatathatshouldfollowfromthe

experimentalhypothesis.

SignificanceTesting:Thea(Alpha)LevelBeforeyouperformastatisticaltestontheplantgrowthdata,youshoulddetermineanacceptablesignificancelevelofthenullstatisticalhypothesis.Thatis,ask,whendoIthinkmyresultsandthusmyteststatisticaresounusualthatInolongerthinkthedifferencesobservedinmydataaresimplyduetochance?Thissignificancelevelisalsoknownas“alpha”andissymbolizedbya.

Thesignificancelevelistheprobabilityofgettingateststatisticrareenoughthatyouarecomfortable

rejectingthenullhypothesis(H0).(Seethe“Probability”sectionofPart3forfurtherdiscussionofprobability.)Thewidelyacceptedsignificancelevelinbiologyis0.05.Iftheprobability(p)valueislessthan0.05,yourejectthenullhypothesis;ifpisgreaterthanorequalto0.05,youdon’trejectthenullhypothesis.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.18

ComparingAverages:TheStudent’st-TestforIndependentSamplesTheStudent’st-testisusedtocomparethemeansoftwosamplestodeterminewhethertheyare

statisticallydifferent.Forexample,youcalculatedthesamplemeansofsurvivorandnonsurvivorfinchesfromTable1andyougotdifferentnumbers.Whatistheprobabilityofgettingthisdifferenceinmeans,ifthepopulationmeansarereallythesame?

Thet-testassessestheprobabilityofgettingaresultmoredifferentthantheobservedresult(i.e.,thevaluesyoucalculatedforthemeansshowninFigure1)ifthenullstatisticalhypothesis(H0)istrue.Typically,thenullstatisticalhypothesisinat-testisthatthemeanofthepopulationfromwhichsample1came(i.e.,themeanbeaksizeofsurvivors)isequaltothemeanofthepopulationfromwhichsample2came(i.e.,themeanbeaksizeofthenonsurvivors),orm1=m2.RejectingH0supportsthealternativehypothesis,H1,thatthemeansaresignificantlydifferent(m1¹m2).Inthefinchexample,thet-testdetermineswhetheranyobserveddifferencesbetweenthemeansofthetwogroupsoffinches(9.67millimetersversus9.11millimeters)arestatisticallysignificantorhavelikelyoccurredsimplybychance.

At-testcalculatesasinglestatistic,t,ortobs,whichiscomparedtoacriticalt-statistic(tcrit):

tobs=|*:0*1|no

Tocalculatethestandarderror(SE)specificforthet-test,wecalculatethesamplemeansandthevariance(s2)forthetwosamplesbeingcompared—thesamplesize(n)foreachsamplemustbeknown:

SE= 5:1,:+ 51

1

,1

Thus,thecompleteequationforthet-testis

tobs=|*:0*1|

;:1<:=;1

1<1

CalculationSteps

1. Calculatethemeanofeachsamplepopulationandsubtractonefromtheother.Taketheabsolutevalueofthisdifference.

2. Calculatethestandarderror,SE.Tocomputeit,calculatethevarianceofeachsample(s2),anddivideit

bythenumberofmeasuredvaluesinthatsample(n,thesamplesize).Addthesetwovaluesandthentakethesquareroot.

3. Dividethedifferencebetweenthemeansbythestandarderrortogetavaluefort.Comparethe

calculatedvaluetotheappropriatecriticalt-valueinTable8.Table8showstcritfordifferentdegreesoffreedomforasignificancevalueof0.05.Thedegreesoffreedomiscalculatedbyaddingthenumber

ofdatapointsinthetwogroupscombined,minus2.Notethatyoudonothavetohavethesamenumberofdatapointsineachgroup.

4. Ifthecalculatedt-valueisgreaterthantheappropriatecriticalt-value,thisindicatesthatyouhave

enoughevidencetosupportthehypothesisthatthemeansofthetwosamplesaresignificantlydifferentattheprobabilityvaluelisted(inthiscase,0.05).Ifthecalculatedtissmaller,thenyoucannotrejectthenullhypothesisthatthereisnosignificantdifference.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.19

Table8.Criticalt-ValuesforaSignificanceLevela=0.05DegreesofFreedom(df) tcrit(a=0.05)1 12.712 4.303 3.184 2.785 2.576 2.457 2.368 2.319 2.2610 2.2311 2.2012 2.1813 2.1614 2.1415 2.1316 2.1217 2.1118 2.1019 2.0920 2.0921 2.0822 2.0723 2.0724 2.0625 2.0626 2.0627 2.0528 2.0529 2.0430 2.0440 2.0260 2.00120 1.98Infinity 1.96

Note:Therearetwobasicversionsofthet-test.Theversionpresentedhereassumesthateachsamplewastakenfromadifferentpopulation,andsothesamplesarethereforeindependentofoneanother.Forexample,thesurvivorandnonsurvivorfinchesaredifferentindividuals,independentofoneanother,andthereforeconsideredunpaired.Ifwewerecomparingthelengthsofrightandleftwingsonallthefinches,thesampleswouldbeclassifiedaspaired.Pairedsamplesrequireadifferentversionofthet-testknownasapairedt-test,aversiontowhichmanystatisticalprogramsdefault.Thepairedt-testisnotdiscussedinthisguide.

ApplicationinBiology

Afterasmallpopulationofcrayfishwasaccidentallyreleasedintoashallowpond,biologistsnoticedthatthecrayfishhadconsumednearlyalloftheunderwaterplantpopulation;aquaticinvertebrates,suchasthewaterflea(Daphniasp.),hadalsodeclined.ThebiologistsknewthatthemainpredatorofDaphniaisthegoldfish,andtheyhypothesizedthattheunderwaterplantsprotectedtheDaphniafromthegoldfishbyprovidinghidingplaces.TheDaphnialosttheirprotectionastheunderwaterplantsdisappeared.Thebiologistsdesignedan

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.20

experimenttotesttheirhypothesis.TheyplacedgoldfishandDaphniatogetherinatankwithunderwaterplants,andanequalnumberofgoldfishandDaphniainanothertankwithoutunderwaterplants.TheythencountedthenumberofDaphniaeatenbythegoldfishin30minutes.Theyreplicatedthisexperimentinnineadditionalpairsoftanks(i.e.,samplesize=10,orn=10,pergroup).Theresultsoftheirexperimentandtheircalculationsofexperimentalerror(variance,s2)areinTable9.

Experimentalhypothesis:TheunderwaterplantsprotectDaphniafromgoldfishbyprovidinghidingplaces.Experimentalprediction:ByplacingDaphniaandgoldfishintankswithandwithoutplants,youshouldseeadifferenceinthesurvivalofDaphniainthetwotanks.Statisticalnullhypothesis:ThereisnodifferenceinthenumberofDaphniaintankswithplantscomparedtotankswithoutplants:anydifferencebetweenthetwogroupsoccurssimplybychance.Statisticalalternativehypothesis:ThereisadifferenceinthenumberofDaphniaintankswithplantscomparedtotankswithoutplants.

Table9.NumberofDaphniaEatenbyGoldfishin30MinutesinTankswithorwithoutUnderwaterPlants

Tanks Plants

(sample1)

NoPlants

(sample2)

Plants

(Z]−Z[)2NoPlants

(Z]−Z\)21and2 13 14 (9.6–13)2=11.56 (14.4–14)2=0.16

3and4 9 12 (9.6–9)2=0.36 (14.4–12)2=5.876

5and6 10 15 (9.6–10)2=0.16 (14.4–15)2=0.436

7and8 10 14 (9.6–10)2=0.16 (14.4–14)2=0.16

9and10 7 17 (9.6–7)2=6.76 (14.4–17)2=6.76

11and12 5 10 (9.6–5)2=21.16 (14.4–10)2=19.37

13and14 10 15 (9.6–10)2=0.16 (14.4–15)2=0.36

15and16 14 15 (9.6–14)2=19.34 (14.4–15)2=0.36

17and18 9 18 (9.6–9)2=0.36 (14.4–18)2=12.96

19and20 9 14 (9.6–9)2=0.36 (14.4–14)2=0.16

∑(&'−&2)2=60.4 ∑(&'−&.)2=46.4

Mean,& &2=9.6 &.=14.4 ∑(*+0*:)1,02 =8a.`7 ∑(*+0*:)1

,02 =`8.`7

Variance,-. -2.=6.71 -..=5.16

Todeterminewhetherthedifferencebetweenthetwogroupswassignificant,thebiologistscalculatedat-teststatistic,asshownbelow:

SE= 5:1,:+ 51

1

,1= 8.e2

2a + ^.282a =1.089

Themeandifference(absolutevalue)=|&2 − &.|=|9.6−14.4|=4.8

t=|*:0*1|qr = `._2.a_7=4.41Thereare(10+10−2)=18degreesoffreedom,sothecriticalvalueforp=0.05is2.10fromTable8.Thecalculatedt-valueof4.41isgreaterthan2.10,sothestudentscanrejectthenullhypothesisthatthedifferencesinthenumbersofDaphniaeateninthepresenceorabsenceofunderwaterplantswereaccidental.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.21

Sowhatcantheyconclude?ItispossiblethatthegoldfishatesignificantlymoreDaphniaintheabsenceofunderwaterplantsthaninthepresenceoftheplants.

AnalyzingFrequencies:TheChi-SquareTestThet-testisusedtocomparethesamplemeansoftwosetsofdata.Thechi-squaretestisusedtodeterminehowtheobservedresultscomparetoanexpectedortheoreticalresult.

Forexample,youdecidetoflipacoin50times.Youexpectaproportionof50%headsand50%tails.Basedona50:50probability,youpredict25headsand25tails.Thesearetheexpectedvalues.Youwouldrarelygetexactly25and25,buthowfaroffcanthesenumbersbewithouttheresultsbeingsignificantlydifferentfromwhatyouexpected?Afteryouconductyourexperiment,youget21headsand29tails(theobservedvalues).Isthedifferencebetweenobservedandexpectedresultspurelyduetochance?Orcoulditbeduetosomethingelse,suchassomethingmightbewrongwiththecoin?Thechi-squaretestcanhelpyouanswerthisquestion.Thestatisticalnullhypothesisisthattheobservedcountswillbeequaltothatexpected,andthealternativehypothesisisthattheobservednumbersaredifferentfromtheexpected.

Notethatthistestmustbeusedonrawcategoricaldata.Valuesneedtobesimplecounts,notpercentagesorproportions.Thesizeofthesampleisanimportantaspectofthechi-squaretest—itismoredifficulttodetectastatisticallysignificantdifferencebetweenexperimentalandobservedresultsinasmallsamplethaninalargesample.Twocommonapplicationsofthistestinbiologyareinanalyzingtheoutcomesofageneticcrossandthedistributionoforganismsinresponsetoanenvironmentalfactorofinterest.

Tocalculatethechi-squareteststatistic(χ2),youusetheequation

s.= ?0@ 1

@

o=observedvalues e=expectedvalues

χ2=chi-squarevalue S=summation

CalculationSteps

1. Calculatethechi-squarevalue.ThecolumnsinTable10outlinethestepsrequiredtocalculatethechi-squarevalueandtestthenullhypothesis,usingthecoin-flippingexamplediscussedabove.Theequationsforcalculatingachi-squarevalueareprovidedineachcolumnheading.

Table10.Coin-TossChi-SquareValueCalculations

SideofCoin Observed(o) Expected(e) (o−e) (o−e)2 (o−e)2/eHeads 21 25 (−4) 16 0.64Tails 29 25 4 16 0.64

χ2=∑ (o−e)2/e→χ2=1.28

2. Determinethedegreesoffreedomvalueasfollows:

df=numberofcategories−1

Intheexampleabove,therearetwocategories(headsandtails):

df=(2−1)=1

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.22

3. Usethecriticalvaluestable(Table11)todeterminetheprobability(p)value.Ap-valueof0.05(whichisshowninredinTable11)meansthereisonlya5%probabilityofgettingtheobserveddifferencebetweenobservedandexpectedvaluesbychance,ifthenullhypothesisistrue(i.e.,thereisnorealdifference).

Forexample,fordf=1,thereisa5%probability(p-value=0.05)ofobtainingaχ2-valueof3.841orlargerbychance.Iftheχ2-valueobtainedwas4.5,thenyoucanrejectthenullhypothesisthatthereisnorealdifferencebetweenobservedandexpecteddata.Thedifferencebetweenobservedandexpecteddataislikelyrealandisconsideredstatisticallysignificant.

Iftheχ2-valuewas3.1,thenyoucannotrejectthenullhypothesis.Thedifferencebetweenobservedandexpecteddatamaybeaccidentalandisnotstatisticallysignificant.

Significancetestinginbiologytypicallyusesap-valueof0.05,whichisalsoreferredtoasthealphavalue(see“SignificanceTesting:Theα(Alpha)Level”inPart2).Aresultwiththep-valueof0.05orlowerisdeemeda

statisticallysignificantresult.

Tousethecriticalvaluestable(Table11),locatethecalculatedχ2-valueintherowcorrespondingtotheappropriatenumberofdegreesoffreedom.Forthecoin-flippingexample,locatethecalculatedχ2-valueinthedf=1row.Theχ2-valueobtainedwas1.28,whichfallsbetween0.455and2.706andissmallerthan3.841(theχ2-valueatthep=0.05cutoff);inotherwords,theresultwaslikelytohappenbetween10%and50%oftime.Therefore,youcannotrejectthenullhypothesisthattheresultshavelikelyoccurredsimplybychance,atanacceptablesignificancelevel.

Table11.CriticalValuesTableforDifferentSignificanceLevelsandDegreesofFreedom

ApplicationinBiology—Example1

Studentsjustlearnedintheirbiologyclassthatpillbugsusegill-likestructurestobreatheoxygen.Thestudentshypothesizedthatthepillbugs’gillsrequirethemtoliveinwetenvironmentsfortheirsurvival.Totestthehypothesis,theywantedtodeterminewhetherpillbugsshowapreferenceforlivinginwetordryenvironments.

Thestudentsplaced15pillbugsonthedrysideofatwo-sidedchoicechamber,and15pillbugsonthewetsideofthechamber.Fifteenminuteslater,26pillbugswereonthewetsideand4onthedryside.ThedataareshowninTable12.

Table12.PillBugLocationsonTwo-SidedChamber

ElapsedTime(min.) PillBugsonWetSide(no.) PillBugsonDrySide(no.)

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.23

0 15 15

15 26 4

Experimentalhypothesis:Pillbugs’gillsrequirethemtoliveinwetenvironmentsforsurvival.Experimentalprediction:Ifyouplacepillbugsinatwo-sidedchamberthatisdryononesideandwetontheother,theywillshowapreferenceforthewetside.Statisticalnullhypothesis:Thereisnodifferenceinthenumbersofpillbugsonthedryandwetsidesofthechamber;anydifferencebetweenthetwosidesoccurredpurelybychance.Statisticalalternativehypothesis:Thereisadifferenceinthenumbersofpillbugsonthedryandwetsidesofthechamber.

Table13.PillBugLocationChi-SquareValueCalculations

SideofChamber Observed(o) Expected(e) (o−e) (o−e)2 (o−e)2/eWet 26 15 11 121 8.07Dry 4 15 −11 121 8.07

χ2=∑ (o−e)2/e→χ2=16.14

InTable13,thedegreesoffreedom(df)=(2−1)=1.Theχ2-valueis16.14,whichismuchgreaterthanthecriticalvalueof3.841(fromthecriticalvaluestable[Table11]forap-valueof0.05).Thismeansthatthereisastatisticallysignificantdifferencebetweenexpectedandobserveddata,anditmayindicatethatthepillbugspreferonesideofthechambertotheother.

Notethatanalternativehypothesisisneverproventruewithanystatisticaltestlikethechi-squareTest.Thisstatisticaltestonlytellsyouwhetherthenullhypothesiscanorcannotberejected.Thereisalwaysachance,howeversmall,thattheobserveddifferencecouldhaveoccurredbychanceevenifthenullhypothesisistrue.Likewise,failingtorejectthenullhypothesisdoesnotnecessarilymeanthatitistrue.Theremightbeadifferencebetweentheobservedandexpecteddatathatwastoosmalltodetectwiththesamplesizeoftheexperiment.

ApplicationtoBiology—Example2

Onecommonapplicationforthechi-squaretestisageneticcross.Inthiscase,thestatisticalnullhypothesisisthattheobservedresultsfromthecrossarethesameasthoseexpected,forexample,the3:1ratioor1:2:1ratioforaMendeliantrait.

Dr.WilliamCresko,aresearcherattheUniversityofOregon,conductedseveralcrossesbetweenmarinesticklebackfishandfreshwatersticklebackfish.Allmarinesticklebackfishhavespinesthatprotrudefromthepelvis,whichpresumablyserveasprotectionfromlargerpredatoryfish.Manyfreshwatersticklebackpopulationslackpelvicspines.Dr.CreskowantedtofindoutwhetherthepresenceorabsenceofpelvicspinesbehaveslikeaMendeliantrait,meaningthatitislikelytobecontrolledmainlybyasinglegene.

Inonecross,marinesticklebackwithspineswerecrossedwithsticklebackfromBearPawLake,whichdon’thavepelvicspines.Alltheprogenyfishfromthiscross,theso-calledF1generation,hadpelvicspines.Dr.CreskothentooktheF1offspringandconductedseveralcrossesbetweenthemtoproducetheF2generation.TheresultsoftheF2crossesareshowninTable14.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.24

Table14.F2Generation:CrossofF1GenerationIndividuals

Cross Number Total Number of F2 Fish F2 Fish with Spines F2 Fish without Spines 1 98 71 27 2 79 62 17 3 62 49 13 4 34 28 6 5 29 24 5 6 23 17 6 7 21 17 4 8 19 18 1 9 15 11 4 10 12 10 2 11 12 10 2 12 4 3 1 Total 408 320 88

Source:Cresko,WilliamA.,A.Amores,C.Wilson,J.Murphy,M.Currey,P.Phillips,M.Bell,C.Kimmel,andJ.Postlethwait.“ParallelGeneticBasisforRepeatedEvolutionofArmorLossinAlaskanThreespineSticklebackPopulations.”ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica101(2004):6050–6055.

IfthepresenceofpelvicspinesiscontrolledbyasinglegeneandthepresenceofpelvicspinesisthedominanttraitassuggestedbytheF1results,youwouldexpectaratioof3:1forfishwithpelvicspinestofishwithoutpelvicspinesintheF2generation.Foratotalof408fish,theexpectedresultswouldbe306:102.TheresultsfromDr.Cresko’scrossesare320:88.

Thenullhypothesisisthatthereisnorealdifferencebetweentheexpectedresultsandtheobservedresults,andthatthedifferencethatweseeoccurredpurelybychance.Thestatisticalalternativehypothesisisthatthereisarealdifferencebetweenobservedandexpectedresults.

Table15.SticklebackSpineChi-SquareValueCalculations

Phenotype Observed(o) Expected(e) (o−e) (o−e)2 (o−e)2/eSpinespresent 320 306 14 196 0.64Spinesabsent 88 102 −14 196 0.64

χ2=∑ (o−e)2/e→χ2=1.28

Theχ2-valueis1.28,whichislessthanthecriticalvalueof3.841(fromthecriticalvaluestable[Table11]forap-valueof0.05andadfof1).Thismeansthatthedifferencebetweenexpectedandobserveddataisnotstatisticallysignificant.Basedonthiscalculation,wecannotrejectthenullhypothesisandconcludethatanydifferencebetweenobservedandexpectedresultsmayhaveoccurredsimplybychance.

Thechi-squareexampleaboveisprovidedintheBioInteractiveactivity“UsingGeneticCrossestoAnalyzea

SticklebackTrait,”http://www.hhmi.org/biointeractive/using-genetic-crosses-analyze-stickleback-trait.

Anotherapplicationofchi-squaretogeneticsisavailableintheactivity“MappingGenestoTraitsinDogs

UsingSNPs,”http://www.hhmi.org/biointeractive/mapping-traits-in-dogs.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.25

MeasuringCorrelationsandAnalyzingLinearRegressionCorrelationscansuggestrelationshipsbetweensetsofdata.Thecorrelationcoefficient(A)providesameasureofhowrelatedtwovariablesare,anditisexpressedasavaluebetween+1and−1.Thecloserthevalueisto0,theweakerthecorrelation.

Forexample,ifyouplotthewidthofanoak(Quercussp.)leaf(Y)onanxyscatterplotasafunctionoftheleaf’slength(X),thecorrelationcoefficient(A)indicateshowmuchwidthdependsonlength.AnA-valueequalto+1wouldindicateaperfectpositivecorrelationbetweenwidthandlength.Inotherwords,thelongeranoakleaf,thewideritis.AnA-valueof−1wouldindicateaperfectnegativecorrelation—thelongeranoakleaf,thenarroweritis.Ifthereisnocorrelationbetweentwovariables,theA-valueequals0,whichwouldmeanthatthereisnorelationshipbetweenoakleaflengthandwidth.Thenullhypothesis(H0)foracorrelationisthatthereisnocorrelationandA=0.

CalculatingAinvolvesdeterminingthesamplemeanofthepredictorvariable(&)anditsstandarddeviation(-*),thesamplemeanoftheresponsevariable(t)anditsstandarddeviation(-u),andthenumberofpairs(X,Y)ofindividualsinthesample(#):

A=B+CB;B

<+D:

E+CE;E

,02 Anotherstatistic,calledthecoefficientofdetermination,usesthesquareofA.TheA.-valuetellsusthestrengthoftherelationshipbetweenXandY.

Whencalculatingcorrelations,itisimportantforyoutorememberthatcorrelationdoesnotimplycausation.Forexample,Figure7showsthatthereisastrongnegativecorrelationbetweenthemeantemperatureofEarthoverthelast190yearsandthenumberofpiratesintheCaribbean.Clearly,adecreaseinthenumberofpiratesisnotthecauseofglobalwarming.

Figure7.MeanGlobalTemperature(°C)asaFunctionoftheApproximateNumberofPirates

intheCaribbean,1820–2000.Thelineisthelinearregression.Statisticsarecorrelationcoefficient(v)andthecoefficientofdetermination(v\).

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.26

CalculationSteps

Studentswantedtoseeifthereisanassociationbetweenthemassofthetomatoesonatomatoplantandthenumberofseedsinthetomatoes.Theypicked10tomatoesfromtheplant,cuteachoneopen,andcountedtheseeds.ThedataandcalculationsfordeterminingtheA-valueareshowninTable16.

Table16.NumberofSeedsandMass(g)ofTomatoes

TomatoNo. Mass(Z) No.ofSeeds

(w)Z] − ZcZ

w] − wcw

Z] − ZcZ

w] − wcw

1 77.5 20 −0.241 −0.530 0.1282 81.2 26 1.439 1.061 1.5273 75.9 22 −0.967 0 04 78.1 22 0.032 0 05 77.1 18 −0.422 −1.061 0.4486 76.3 20 −0.785 −0.530 0.4167 75.7 18 −1.058 −1.061 1.1238 82.3 30 1.938 2.121 4.1109 78.9 24 0.395 0.530 0.20910 77.3 20 −0.331 −0.530 0.175Mean &=78.03g t=22seeds &' − &

-*t' − t-u

=8.136StandardDeviation -*=2.203 -u=3.771

A=  B+CB;B

E+CE;E

,02 = _.2b82a02=_.2b87 =0.904

1. Calculate&,-*,t,and-uasshowninTable16.

2. Determine*+0*5B

andu+0u5E,multiplythetwoforeachtomatosample,andthensumtheresultsas

showninTable16:

*+0*5B

u+0u5E

=8.136

3. CalculateA=8.136/9=0.904.

4. ComparethecalculatedA-valuewiththecriticalvalueofAata(theH0rejectionlevel)=0.05.See

Table17forcriticalA-values.Thedegreesoffreedomiscalculatedbyaddingthetotalnumberofdatapointsminus2.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.27

Table17.Criticalv-Valuesatα=0.05DegreesofFreedom(npairs–2) ±vxv]y(0.05)1 0.9972 0.9503 0.8784 0.8115 0.7546 0.7077 0.6668 0.6329 0.60210 0.57611 0.55312 0.53213 0.51414 0.49715 0.482

NotethatA-valuescanindicateeitherpositiveornegativecorrelationsandcanthereforevarybetween+1and−1.Therefore,criticalA-valuesarelistedas+/−Ainthetable.

InTable17,thevalueforAz{'|is±0.632for8degreesoffreedom(10pairsofobservations–2).ThecalculatedA-valueis0.904,whichismuchcloserto+1(perfectpositivecorrelation)thanto0.632.Thismeansthattheprobabilityofgettingavalueasextremeas0.904purelybychanceifH0istrue(A=0)islessthan0.05.Therefore,studentscanrejectH0andconcludethatthereisastatisticallysignificantassociationbetweenthemassofthestudents’tomatoesandthenumberofseedseachtomatohas.

Togoonestepfurther,itispossibletocalculatethecoefficientofdetermination,A.:

A.=(0.904)2=0.817TheA.-valueof0.817denotesthestrengthofthelinearassociationbetweenthemassofthetomatoes(x)andthenumberofseeds(y).Thecloserthenumberisto1,thestrongertheassociation.

Finally,ascatterplotgraphcanbeusedtoillustratetherelationship(Figure8).Itispossibletodrawastraightlinethatgoesthroughthepoints—calledalineofbestfit.Thelineofbestfitmaypassthroughsomeofthepoints,noneofthepoints,orallofthepoints.

Figure8.NumberofSeedsCountedinTomatoes

asaFunctionofTomatoMass.Statisticsarecorrelationcoefficient(v)andthecoefficientofdetermination(v\).

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.28

Thecoefficientofdeterminationisameasureofhowwelltheregressionlinerepresentsthedata.Iftheregressionlinepassesexactlythrougheverypointonthescatterplot,itwouldbeabletoexplainallofthevariation.Thefurtherthelineisawayfromthepoints,thelessitisabletoexplain.InFigure8,81.7%ofthevariationinycanbeexplainedbytherelationshipbetweenxandy.

Note:Manybiologicaltraits(i.e.,animalbehaviororphysicalappearance)varygreatlyamongindividualsinapopulation.Thus,acoefficientofdeterminationof0.817betweentwobiologicalvariablesisconsideredveryhigh.However,forastandardcurve,asdescribedin“StandardCurves”inPart3,itwouldbeconsideredlow.

ApplicationinBiology—Example1

Studentswerecurioustolearnwhetherthereisanassociationbetweenamountsofalgaeinpondwaterandthewater’sclarity.Theycollectedwatersamplesfromsevenlocalpondsthatseemedtodifferinwaterclarity.Toquantifytheclarityofthewater,theycutoutasmalldiskfromwhiteposterboard,dividedthediskintofourequalparts,andcoloredtwooftheoppositepartsblack;theythenplacedthediskinthebottomofa100-millilitergraduatedcylinder.Foreachsample,thestudentsslowlypouredpondwaterintothecylinderuntilthediskwasnolongervisiblefromabove.InTable18theyrecordedthevolumeofwaternecessarytoobscurethedisk—themorewaternecessarytoobscurethedisk,theclearerthewater.Asaproxyforalgaeconcentration,theyextractedchlorophyllfromthewatersamplesandusedaspectrophotometertodeterminechlorophyllconcentration(Table18).

Table18.ChlorophyllConcentration(μg/L)andClarityofPondWater

Pond Chlorophyll

Concentration(Z)(μg/L)

WaterClarity(w)(mL)

Z] − ZcZ

w] − wcw

Z] − ZcZ

w] − wcw

Sandy’s 14 28 0.672 −0.656 −0.441Herron 5 68 −0.956 1.077 −1.029Tommy’s 10 32 −0.052 −0.482 −0.025Rocky 7 54 −0.594 0.470 −0.280Fishing 17 18 1.214 −1.089 −1.323Lost 16 25 1.033 −0.786 −0.812Sunset 3 77 −1.318 1.467 −1.933Mean &=10.29μg/L t=43.14mL &' − &

-*t' − t-u

,

'}2

=−5.792StandardDeviation

-*=5.529 -u=23.083

A=B+CB;B

E+CE;E

<+D:

,02 =0^.e7.e02 =0^.e7.

8 =−0.965

Note:Waterclarityisgivenasthevolumeofwaterinmilliliters(mL)requiredtoobscureablack-and-whitediskatthebottomofa100-millilitergraduatedcylinder.Agreatervolumeindicatesclearerwater.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.29

Figure9.ClarityofPondwaterasaFunctionof

ChlorophyllConcentration.Waterclaritywasmeasuredasthevolumeofwaternecessarytoobscureablack-and-whitediskinthebottomofa100-millilitergraduatedcylinder.Statisticsarecorrelationcoefficient(v)andthecoefficientofdetermination(v\).

JustbygraphingthedatapointsandlookingatthegraphinFigure9,itisclearthatthereisanassociationbetweenwaterclarityandchlorophyllconcentration.Thelineslopesdown,sothestudentsknowtherelationshipisnegative:aswaterclaritydecreases,chlorophyllconcentrationincreases.

WhenstudentscalculateA,theygetavalueof−0.965,whichisastrongcorrelation(with−1beingaperfectnegativecorrelation).TheycanconfirmthisbycheckingthecriticalA-valueonTable17,whichis±0.754for5degreesoffreedom(7–2)witha0.05confidencelevel.SincetheAof−0.965iscloserto−1thantheAz{'|of−0.754,theycanconcludethattheprobabilityofgettingavalueasextremeas−0.965purelybychanceislessthan0.05.Therefore,theycanrejectH0andconcludethatchlorophyllconcentrationandwaterclarityaresignificantlyassociated.

Moreover,thecoefficientofdetermination(A.)of0.932iscloseto1.Asyoucanseefromthegraph,mostofthepointsareclosetotheline.

ApplicationinBiology—Example2

Studentsnoticedthatsomeponderosapinetrees(Pinusponderosa)onastreethadmoreovulatecones(femalepinecones)thanotherponderosapinetrees.Theyhypothesizedthatthenumberofpineconeswasafunctionoftheageofthetreeandpredictedthattallertreeswouldhavemoreconesthanyounger,shortertrees.Todeterminetheheightofatree,theyusedthe“oldlogger”method.Astudentheldastickthesamelengthasthestudent’sarmata90°angletothearmandbackedupuntilthetipofthestick“touched”thetopofthetree.Thedistancethestudentwasfromthetreeequaledtheheightofthetree.Usingthismethod,thestudentsmeasuredtheheightsof10trees.Then,usingbinoculars,theycountedthenumberofovulateconesoneachtreeandrecordedthedatainTable19.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.30

Table19.NumberofOvulateConesonPonderosaPineTreesofDifferentHeights

TreeNo. TreeHeight(m) No.ofCones Z] − ZcZ

w] − wcw

Z] − ZcZ

w] − wcw

1 10.5 75 1.226 1.034 1.2672 7.2 68 0.133 0.790 0.1053 4.3 59 −0.828 0.475 −0.3934 7.9 46 0.364 0.021 0.0085 3.8 8 −0.994 −1.307 1.2986 8.3 56 0.497 0.370 0.1847 3.4 25 −1.126 −0.713 0.8028 4.1 13 −0.894 −1.132 1.0129 12.3 15 1.822 −1.062 −1.93410 6.2 89 −0.199 1.523 −0.303

Mean &=6.8m t=45.4cones

StandardDeviation

-*=3.019 -u=28.625

A=B+CB;B

E+CE;E

<+D:

,02 = ..a`82a02=..a`87 =0.227

Figure10.NumberofOvulateConesonPonderosaPineTreesasaFunctionofTreeHeight(m).Statisticsarecorrelationcoefficient(v)andthecoefficientofdetermination(v\).JustlookingatthedatapointsinFigure10,itishardtoknowwhetherthereisacorrelationornot.Ifthereisacorrelation,itisnotverystrong.Drawingthelineofbestfitsuggestsapositivecorrelation.ThisisclearlyacaseinwhichcalculatingAwillhelpdeterminewhetherthecorrelationisstatisticallysignificant.

InTable17,Az{'|is±0.632for8degreesoffreedom(10–2).Thecalculatedr-valueis0.227,whichisfurtherawayfrom+1thanAz{'|0.632,sotheprobabilityofgettingavalueof0.227purelybychanceisgreaterthan0.05(p>0.05).Therefore,studentscannotrejectH0andcanconcludethatthereisnotastatisticallysignificantassociationbetweenthenumbersofovulateconesonponderosapinetreesandtheheightsofthetrees.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.31

Part3:CommonlyUsedCalculationsinBiology

RelativeFrequencyRelativefrequencyistheratioofthenumberoftimesaneventoccursoutofatotalnumberofevents.Thiscalculatedfractioncanthenbeconvertedintoapercentageofthetotalnumberofeventsmeasured.

relativefrequency=(numberoftimesaspecificeventoccurs)/(totalnumberofevents)

Theresultcanbemultipliedby100togivethepercentage.

ApplicationinBiology

Alleleandgenotypefrequenciesarecommonlycalculatedbypopulationgeneticists.Forinstance,inapopulationof350peaplants,suppose112arehomozygousforthedominantyellowpeaseedallele(YY),139areheterozygous(Yy),and99arehomozygousfortherecessivegreenpeaseedallele(yy).

Todeterminetherelativefrequency(andpercentage)ofplantsinthispopulationthatarehomozygousforthedominantyellowpeaseedallele,youshoulddividethenumberofplantsthatarehomozygousfortheyellowpeaseedallelebythetotalnumberofplants:

relativefrequencyofthehomozygousdominant(YY)genotype=112/350=0.32

Toexpressfrequencyasapercentage,multiplythefrequencyby100%:percentage=0.32×100=32%ofthepopulationhasthehomozygousdominantgenotype

Todeterminetherelativefrequency(andpercentage)oftherecessivegreenseedallele,dividethetotalnumberofgreenseedallelesinthegenepoolbythetotalnumberofallelesinthepopulation:

relativefrequencyoftherecessiveallele(y)=[(139×1)+(99×2)]/(350×2)=337/700=0.48percentage=0.48×100=48%ofthegenepoolistherecessivegreenpeaseedallele

ProbabilityYoulearnedinthe“SignificanceTesting”sectionofPart2thataprobabilityof0.05meansthatthereisa5%chanceforaneventtohappen—forexample,a5%chanceofobtainingaparticularteststatisticbychance.Thissectionprovidesmoreinformationaboutprobabilityandhowtocalculateitfordifferentscenarios.

Probabilityallowsscientiststopredictthelikelihoodoftheoutcomeofrandomevents.Probability(p)valuesliebetween1(theeventiscertaintohappen)and0(theeventcertainlywillnothappen).Theprobabilitiesforallothereventshavefractionalvalues.Forexample,theprobabilityofthrowinga2onasix-sideddieis1outof6(p=1/6),sincethenumber2appearsononlyoneofthesixsides.Bycontrast,theprobabilityofthrowinga7onanormalsix-sideddieis0.

RuleofAddition

Theprobabilityofeitheroftwomutuallyexclusiveeventsoccurringisequaltothesumoftheirindividualprobabilities.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.32

Example

Givenanormalsix-sideddie,whatistheprobabilityofyourollingeithera2ora4onthedie?Theseeventsaremutuallyexclusivebecausetheycannothappenatthesametime—thatis,inasinglerollofthedieyoucannotrollbotha2anda4.Thereisa1/6chanceofrollinga2.Thereisa1/6chanceofrollinga4.Theprobabilityofeithereventoccurringisequaltothesumoftheprobabilityofeachevent:

p=1/6+1/6=2/6=1/3

Thereis1chancein3ofyourollingeithera2ora4onthedie.

RuleofMultiplication

Theprobabilityoftwoindependenteventsbothoccurringistheproductoftheirindividualprobabilities.

Example

Givenanormalsix-sideddie,whatistheprobabilityofyourollinga2andthena4ontwoconsecutiverolls?Theseeventsareindependentofoneanotherbecausetheyhavenoeffectoneachother’soccurrence—thatis,ifyourollasix-sideddietwice,rollinga2onthefirstrollhasnoeffectonwhetheryouwillrolla4onthesecondroll.Onthefirstroll,thereisa1/6chanceofrollinga2.Onthesecondroll,thereisa1/6chanceofrollinga4.Theprobabilityofrollinga2firstanda4secondfollows: p=1/6×1/6=1/36Thereis1chancein36ofrollinga2andthena4ontwoconsecutiverollsofthedie.

ApplicationinBiology—Example1

Whatistheprobabilitythattwoparentswhoareheterozygousforthesicklecellallelewouldhavethreechildreninarowwhoarehomozygousforthesicklecellalleleandhavesicklecellanemia?

Theprobabilityoftwoparentswhoareheterozygousforanalleletohaveachildwhoishomozygousforthatalleleis1in4:

p=1/4×1/4×1/4=1/64Thereis1chancein64thattheseparentswillhavethreechildreninarowwithsicklecelldisease.

ApplicationinBiology—Example2

Twopeaplantsthatareheterozygousfortheround(R)andyellow(Y)alleles(RrYy)arecrossedandproduceonlyasingleseed.WhatistheprobabilityofaseedfromthiscrosshavingthegenotypeRRYyorRRYY?

TheprobabilityofgettingaseedwiththeRRYygenotypeis1/4×1/2=1/8=2/16.TheprobabilityofgettingaseedwiththeRRYYgenotypeis1/4×1/4=1/16.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.33

ToobtaintheprobabilityofgettingaseedwiththeRRYygenotypeortheRRYYgenotype,weusetheruleofaddition:

p=2/16+1/16=3/16Thereare3chancesin16oftheseplantsproducingaseedwitheithertheRRYyorRRYYgenotype.

RateCalculationsRateisusedtoexpressonemeasuredquantity(y)inrelationtoanothermeasuredquantity(x).

Inbiology,ratesareoftencalculatedtoindicatethechangeinapropertyofasystemovertime.Forexample,therateofanenzyme-catalyzedreactionisfrequentlyexpressedastheamountofproductproducedbytheenzymeinagivenamountoftime.Whenyouusedataplottedonagraph,youcalculatetherateinthesamewayasyoucalculatetheslope:

rate=Δy/Δx

(Thedeltasymbol,Δ,representschange.)ApplicationinBiology

Studentsinanadvancedbiologyclassstudiedthereactioncatalyzedbythecatalaseenzyme.Catalasedegradeshydrogenperoxide(H2O2)towater(H2O)andoxygengas(O2).ThestudentssetupanexperimenttomeasuretheamountofO2producedbycatalaseover5minuteswhenitisaddedtoH2O2.Table20containsthedatacollectedbyagroupofstudents,andFigure11showsthecorrespondinggraph.Fromthesedata,ratesofcatalaseactivitycanbecalculatedovervariousintervalsoftime.

Table20.VolumeofOxygenProducedfromtheCatalysisofHydrogenPeroxidebytheEnzymeCatalase

Time(min.) VolumeofOxygenProduced(mL)

0 01 122 253 334 395 42

Figure11.OxygenProducedinaCatalase-CatalyzedReactionasaFunctionofTime

0

10

20

30

40

50

0 1 2 3 4 5 6

VolumeofOxygen

Produced(mL)

Time(min.)

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.34

Tocalculatetheinitialrateofreactioninthefirst2minutesoftheexperiment,studentsfirstsubtractedthevolumeofoxygenproducedat2minutesfromthevolumeofoxygenproducedat0minutes:

Δy=25milliliters–0milliliters=25milliliters

Next,theysubtractedthenumberofminutesat2minutesfromthenumberofminutesat0minutes:

Δx=2minutes–0minutes=2minutes

Finally,theydividedthechangeinoxygenvolume(Δy)bythechangeintime(Δx);thisistherate:

rate=25milliliters/2minutes=12.5millilitersofO2produced/minute

Similarly,theycancalculatetherateofreactionbetweenthethirdandfifthminutesoftheexperiment:

rate=(42milliliters–33milliliters)/(5minutes–3minutes)=9/2=4.5millilitersofO2produced/minute

Hardy-WeinbergFrequencyCalculationsTheHardy-Weinbergequationsareusedinpopulationgeneticstodescribethebasicprinciplethatallelefrequenciesdonotchangeinalarge,freelyinterbreedingpopulationfromonegenerationtothenext.

Allelefrequenciesinapopulationareinequilibrium(donotchange)whenallthefollowingconditionsaremet:

1. Thepopulationisverylargeandwellmixed.2. Thereisnomigrationinoroutofthepopulation.3. Mutationsarenotoccurring.4. Matingisrandom.5. Thereisnonaturalselection.

UnderHardy-Weinberg,ifthefrequenciesoftwoallelesarepandq,thefrequenciesofhomozygotesarep2

andq2,andofheterozygotes2pq.Ifyouaregiventhefrequenciesoftheallelesyoucancalculatedthegenotypefrequenciesbysquaringandmultiplying;ifyouaregivenahomozygousgenotypefrequency,you

canestimatetheallelefrequenciesbytakingthesquareroot.

Hardy-Weinbergpredictsthattheallelefrequenciesinapopulationareatequilibrium,wherebyp+q=1.0

IftheobservedallelefrequenciesinapopulationdifferfromthefrequenciespredictedbytheHardy-Weinbergprinciple,thenthepopulationisnotatequilibriumandevolutionmaybeoccurring.

ApplicationinBiology

Inahypotheticalpopulationof100rockpocketmice(Chaetodipusintermedius),81individualshavelight,sandy-coloredfurandaddgenotype.Theremaining19individualsaredarkcoloredandthereforehaveeithertheDDgenotypeortheDdgenotype.Scientistsassumedthatthispopulationisatequilibrium;theyusedtheHardy-Weinbergequationstofindpandqforthispopulationandcalculatedthefrequencyofheterozygousgenotypes.

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.35

Scientistsknewthat81micehavetheddgenotype:q2=81/100=0.81,or81%

Next,theycalculatedq:

q= 0.81=0.9

Then,theycalculatedpusingtheequationp+q=1:

p+(0.9)=1p=0.1

Tocalculatethefrequencyoftheheterozygousgenotype,theycalculated2pq:

2pq=2(0.1)(0.9)=2(0.09)2pq=0.18

Basedonthecalculations,theestimatedfrequencyoftherecessivealleleis0.9andthefrequencyofthedominantalleleis0.1.

Ifthescientistshadawaytodistinguishmicethatareheterozygotesfromthosethatarehomozygousdominantforthedark-coloredfur,thentheywouldhaveawayofdeterminingwhetherthepopulationisorisnotatequilibriumandcouldapplyastatisticaltestlikethechi-squaretesttoseeifthereisadifference.

StandardCurvesAstandardcurveisamethodofquantitativedataanalysisinwhichmeasurementsofsampleswithknownpropertiesareplottedonagraphandthenthegraphisanalyzedtodeterminethepropertiesofunknownsamples.Analysisofthegraphisperformedbydrawingalineofbestfitthroughtheplottedpointsoftheknownsamplesandthendeterminingtheequationofthisline(intheformy=mx+b)orbyinterpretingthevaluesofunknownsamplesdirectlyfromthedrawnline.Thesampleswithknownpropertiesarethestandards,andthegraphisthestandardcurve.TwocommonusesofstandardcurvesinbiologyaretodetermineproteinconcentrationsandtoanalyzeDNAfragmentlength.

ApplicationinBiology—Example1:TheBradfordProteinAssay

TheBradfordproteinassayisacolorimetricassaythatdeterminestheproteinconcentrationofasolutionbymeasuringhowmuchlightofacertainwavelengthitabsorbs.Thelightabsorbanceofseveralsampleswithknownproteinconcentrationsismeasuredusingaspectrophotometerandthenplottedonagraphasafunctionofproteinconcentration.Usingthisgraph,orlinearregressionanalysis,scientistsdeterminedtheproteinconcentrationofanunknownsampleonceitsabsorbancewasmeasured.

Table21.AbsorbanceMeasuredat595NanometersofVariousKnownProteinConcentrations

KnownProtein

Concentration(µg/mL)

MeasuredAbsorbance(at

595nm)

1 0.4335 0.74210 1.03615 1.46320 1.750

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.36

Figure12.ProteinConcentrationasaFunctionofAbsorbance

InFigure12,absorbanceinTable21wasplottedasafunctionofproteinconcentrationfortheknownsamples(standards).Thecalculatedcoefficientofdetermination(A.)andequationoftheregressionlineareincludedonthegraph.TheclosertheA.valueisto1,thebetterthedatafitthecurve—orthemorelikelythatthedatapointsxandyareactualsolutionstotheequationy=mx+b.

Inthiscase,theequationofthelineis

y=0.0699x+0.3722

Theabsorbanceoftheunknownproteinsolutionwasmeasuredwithaspectrophotometeras0.921(y=0.921),sothescientistsusedtheequationofthebest-fitlinetodeterminetheproteinconcentration(x):

0.921=0.0699x+0.3722

Proteinconcentrationofunknown:x=7.85microgramspermilliliter

TheredlinesdrawnonthegraphinFigure12showhowthescientistsestimatedthevalueoftheunknownproteinconcentrationfromtheregressionline.Todeterminethisestimate,theylocatedtheabsorbanceof0.921onthey-axisandtracedithorizontallytoitsintersectionwiththeregressionline.Averticallinefromtheintersectionwillcrossthex-axisatthecorrespondingproteinconcentration.

ApplicationinBiology—Example2:DNAFragmentSizeAnalysis

InRFLP(restrictionfragmentlengthpolymorphism)analysis,thefragmentsizesofunknownDNAsamplescanbedeterminedfromthestandardcurveofDNAmarkersofknownfragmentlengths.First,scientistsmeasuredthedistancetraveledbyeachofthemarkerfragmentsinagelplateandplotteditasafunctionofsize.Thisprovidesastandardforcomparisontointerpolatethesizeoftheunknownfragments(Table22).

y =0.0699x +0.3722r²=0.9965

0

0.5

1

1.5

2

0 5 10 15 20 25

Absorbance(at595nm)

ProteinConcentration(μg/mL)

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.37

Table22.DistanceDNAFragmentsofDifferentKnownLengthsMigrateafterGelElectrophoresis

DistanceMigrated(mm) FragmentLength(no.of

basepairs)

11 23,13013 9,41615 6,55718 4,36123 2,32224 2,027

Figure13.MarkerFragmentLengthasaFunctionofDistanceMigrated

Note:ThestandardcurveinFigure13wasplottedonalogarithmicy-axisscale,becausetherelationshipbetweenfragmentlengthanddistancemigratedisexponential,ornonlinear.

ScientistsestimatedthelengthofanunknownDNAfragmentthatmigrated20millimetersbyusingthegraphinFigure13,asillustratedbytheredlines.Toestimatetheunknownlength,theylocatedthedistanceof20millimetersonthex-axisandtracedaverticallinetothelineofbestfit.Ahorizontallinefromthepointofintersectionwillcrossthey-axisatthecorrespondingfragmentlength.Inthiscase,theyestimatedthefragmentlengthtobe3,100basepairs.

Authors:

WrittenbyPaulStrode,PhD,FairviewHighSchool,CO,andAnnBrokaw,RockyRiverHighSchool,OHEditedbyLauraBonetta,PhD,HHMIReviewedbyBradWilliamson,TheUniversityofKansas;JohnMcDonald,PhD,UniversityofDelaware;SandraBlumenrath,PhD,andSatoshiAmagai,PhD,HHMICopyeditedbyBarbaraReschGraphicsbyHeatherMcDonald,PhD,andBillPietsch

100

1,000

10,000

100,000

5 10 15 20 25 30

DNAFragmentLength(bp)

MigrationDistance(mm)

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.38

Part4:HHMIBioInteractiveMathematicsandStatisticsClassroomResources

BioInteractiveResourceName

(HyperlinkedTitles)

Mean

Median

Mode

Range

Frequency

Rate

VarianceandStandard

Deviation

StandardErrorand95%

ConfidenceIntervals

Chi-SquareAnalysis

Studentt-Test

LinearRegressionand

Pearson’sCorrelation

Hardy-W

einberg

Probability

StandardCurves

DietandtheEvolutionofSalivaryAmylaseStudentsanalyzedataobtainedfromtwodifferentresearchstudiesinordertodrawconclusionsbetweenAMY1genecopynumberandamylaseproduction;andalsobetweenAMY1genecopynumberanddietarystarchconsumption.Theactivityinvolvesgraphing,analyzingresearchdatautilizingstatistics,makingclaims,andsupportingtheclaimswithscientificreasoning.

X X X X X

EvolutioninAction:GraphingandStatisticsStudentsanalyzefrequencydistributionsofbeakdepthdatafromPeterandRosemaryGrant’sGalapagosfinchstudyandsuggesthypothesestoexplainthetrendsillustratedinthegraphs.Studentstheninvestigatetheeffectofsamplesizeondescriptivestatisticsandnoticethatthemeansandstandarddeviationsvaryforeachsubsample.Finally,studentsusewinglengthandbodymassdatatoconstructbargraphsandareaskedtoproposeexplanationsforhowandwhysomecharacteristicsaremoreadaptivethanothersingivenenvironments.

X X

EvolutioninAction:StatisticalAnalysisStudentscalculatedescriptivestatistics(mean,standarddeviation,and95%confidenceintervals)foreightsetsofdatafromPeterandRosemaryGrant’sGalapagosfinchstudy.Studentsconstructbargraphswith95%confidenceintervalsandanalyzethemeansoffinchbodymeasurementswitht-Tests.Studentsalsographtwoofthefinchmeasurementsagainsteachothertoinvestigateapossibleassociation.

X X X X

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.39

BioInteractiveResourceName

(HyperlinkedTitles)

Mean

Median

Mode

Range

Frequency

Rate

VarianceandStandard

Deviation

StandardErrorand95%

ConfidenceIntervals

Chi-SquareAnalysis

Studentt-Test

LinearRegressionand

Pearson’sCorrelation

Hardy-W

einberg

Probability

StandardCurves

LizardEvolutionVirtualLabThevirtuallabincludesfourmodulesthatinvestigatedifferentconceptsinevolutionarybiology,includingadaptation,convergentevolution,phylogeneticanalysis,reproductiveisolation,andspeciation.Eachmoduleinvolvesdatacollection,calculations,analysisandansweringquestions.The“Educators”tabincludeslistsofkeyconceptsandlearningobjectivesanddetailedsuggestionsforincorporatingthelabinyourinstruction.

X X X X

TheVirtualSticklebackEvolutionLabThisvirtuallabisappropriateforthehighschoolbiologyclassroomasanexcellentcompaniontoanevolutionunit.Becausethetraitunderstudyisfishpelvicmorphology,thelabcanbeusedforlessonsonvertebrateformandfunction.Inanecologyunit,thelabcanbeusedtoillustratepredator-preyrelationshipsandenvironmentalselectionpressures.Thesectionsongraphing,dataanalysis,andstatisticalsignificancemakethelabagoodfitforaddressingthe"scienceasaprocess"or"natureofscience"aspectsofthecurriculum.

X X

BattlingBeetlesThisseriesofactivitiescomplementstheHHMIDVDEvolution:ConstantChangeandCommonThreads,andrequiressimplematerialssuchasM&Ms,foodstoragebags,coloredpencils,andpapercups.AnextensionofthisactivityallowsstudentstomodelHardy-WeinbergandselectionusinganExcelspreadsheet.TheoverallgoalofBattlingBeetlesistoengagestudentsinthinkingaboutthemechanismofnaturalselectionthroughdatacollection,analysisandpatternrecognition.

X X

AlleleandPhenotypeFrequenciesinRockPocketMousePopulations

-AlessonthatusesrealrockpocketmousedatacollectedbyDr.MichaelNachmanandhiscolleaguestoillustratetheHardy-Weinbergprinciple.

X X

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.40

BioInteractiveResourceName

(HyperlinkedTitles)

Mean

Median

Mode

Range

Frequency

Rate

VarianceandStandard

Deviation

StandardErrorand95%

ConfidenceIntervals

Chi-SquareAnalysis

Studentt-Test

LinearRegressionand

Pearson’sCorrelation

Hardy-W

einberg

Probability

StandardCurves

PopulationGenetics,Selection,andEvolutionThishands-onactivity,usedinconjunctionwiththefilmTheMakingoftheFittest:NaturalSelectioninHumans,teachesstudentsaboutpopulationgenetics,theHardy-Weinbergprinciple,andhownaturalselectionaltersthefrequencydistributionofheritabletraits.Itusessimplesimulationstoillustratethesecomplexconceptsandincludesexercisessuchascalculatingalleleandgenotypefrequencies,graphingandinterpretationofdata,anddesigningexperimentstoreinforcekeyconceptsinpopulationgenetics.

X X

MendelianGenetics,Pedigrees,andChi-SquareStatisticsThislessonrequiresstudentstoworkthroughaseriesofquestionspertainingtothegeneticsofsicklecelldiseaseanditsrelationshiptomalaria.Thesequestionswillprobestudents'understandingofMendeliangenetics,probability,pedigreeanalysis,andchi-squarestatistics.

X

UsingGeneticCrossestoAnalyzeaSticklebackTraitThishands-onactivityinvolvesstudentsapplyingtheprinciplesofMendeliangeneticstoanalyzetheresultsofgeneticcrossesbetweensticklebackfishwithdifferenttraits.Studentsusephotosofactualresearchspecimens(theF1andF2cards)toobtaintheirdata;theythenanalyzethedatatheycollectedalongwithadditionaldatafromthescientificliterature.Intheextensionactivity,studentsusechi-squareanalysistodeterminethesignificanceofgeneticdata.

X X

PedigreesandtheInheritanceofLactoseIntoleranceInthisclassroomactivity,studentsanalyzethesameFinnishfamilypedigreesthatresearchersstudiedtounderstandthepatternofinheritanceoflactosetolerance/intolerance.TheyalsoexamineportionsofDNAsequencenearthelactasegenetoidentifyspecificmutationsassociatedwithlactosetolerance.

X

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.41

BioInteractiveResourceName

(HyperlinkedTitles)

Mean

Median

Mode

Range

Frequency

Rate

VarianceandStandard

Deviation

StandardErrorand95%

ConfidenceIntervals

Chi-SquareAnalysis

Studentt-Test

LinearRegressionand

Pearson’sCorrelation

Hardy-W

einberg

Probability

StandardCurves

BeaksasTools:SelectiveAdvantageinChangingEnvironmentsIntheirstudyofthemediumgroundfinches,evolutionarybiologistsPeterandRosemaryGrantwereabletotracktheevolutionofbeaksizetwiceinanamazinglyshortperiodoftimeduetotwomajordroughtsthatoccurredinthe1970sand1980s.Thisactivitysimulatesthefoodavailabilityduringthesedroughtsanddemonstrateshowrapidlynaturalselectioncanactwhentheenvironmentchanges.Studentscollectandanalyzedataanddrawconclusionsabouttraitsthatofferaselectiveadvantageunderdifferentenvironmentalconditions.TheyhavetheoptionofusinganExcelspreadsheettocalculatedifferentdescriptivestatisticsandinterpretgraphs.

X X

LookWho’sComingforDinner:SelectionbyPredationThishands-onclassroomactivityisbasedonrealmeasurementsfromayear-longfieldstudyonpredation,inwhichDr.JonathanLososandcolleaguesintroducedalargepredatorlizardtosmallislandsthatwereinhabitedbyAnolissagrei.Theactivityillustratestheroleofpredationasanagentofnaturalselection.Studentsareaskedtoformulateahypothesisandanalyzeasetofsampleresearchdatafromactualfieldexperiments.Theythenusedrawingsofislandhabitatstocollectdataonanolesurvivalandhabitatuse.Thequantitativeanalysisincludescalculatingandinterpretingsimpledescriptivestatisticsandplottingtheresultsaslinegraphs.

X X X

MappingGenestoTraitsinDogsUsingSNPsInthishands-ongeneticmappingactivitystudentsidentifysinglenucleotidepolymorphisms(SNPs)correlatedwithdifferenttraitsindogs.Thequantitativeanalysissectionincludeschi-squareanalysis.

X

UsingBioInteractiveResourcestoTeachMathematicsandStatisticsinBiology Pg.42

BioInteractiveResourceName

(HyperlinkedTitles)

Mean

Median

Mode

Range

Frequency

Rate

VarianceandStandard

Deviation

StandardErrorand95%

ConfidenceIntervals

Chi-SquareAnalysis

Studentt-Test

LinearRegressionand

Pearson’sCorrelation

Hardy-W

einberg

Probability

StandardCurves

SpreadsheetDataAnalysisTutorialsThesetutorialsaredesignedtoshowessentialdataanalysistechniquesusingaspreadsheetprogramsuchasExcel.Followthetutorialsinsequencetolearnthefundamentalsofusingaspreadsheetprogramtoorganizedata;takingadvantageofformulaeandfunctionstocalculatestatisticalvaluesincludingmean,standarddeviation,standarderrorofthemean,and95%confidenceintervals;andplottinggraphswitherrorbars.

X X X X X

SchoolingBehaviorofSticklebackFishfromDifferentHabitats(DataPoint)Ateamofscientistsstudiedtheschoolingbehaviorofthreespinesticklebackfishbyexperimentallytestinghowindividualfishrespondedtoanartificialfishschoolmodel.

X X

EffectsofNaturalSelectiononFinchBeakSize(DataPoint)RosemaryandPeterGrantstudiedthechangeinbeakdepthsoffinchesontheislandofDaphneMajorintheGalápagosIslandsafteradrought.

X X X