Topic 1 (1.1.1-1.1.6)

Statistical Analysis


Transcript of Topic 1 (1.1.1-1.1.6)

Page 1: Topic 1 (1.1.1-1.1.6)

Statistical Analysis

Page 2: Topic 1 (1.1.1-1.1.6)

State that error bars are a graphical representation of the variability of data.

To answer an IB question involving 1.1.1, simply state that error bars are a graphical representation of the variability of data.

The variability of data refers to how close or far away most data values are from the mean. A high standard deviation indicates a high variability of data and a low standard deviation indicates a low variability of data.

Page 3: Topic 1 (1.1.1-1.1.6)

The following is an example of error bars.

Page 4: Topic 1 (1.1.1-1.1.6)
Page 5: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Error bars
2. Variability
3. Data
4. Graphical

Page 6: Topic 1 (1.1.1-1.1.6)

Calculate the mean and standard deviation of a set of values.

(Students should specify the standard deviation (s), not the population standard deviation. Students will not be expected to know the formulas for calculating these statistics. They will be expected to use the standard deviation function of a graphic display or scientific calculator.

Aim 7: Students could also be taught how to calculate standard deviation using a spreadsheet computer program.)

The mean is the average data value: the sum of the values divided by the number of values.

The sample standard deviation, which is noted as “s”, measures how far the data values typically lie from the mean. It is the statistic to report when the data are a sample drawn from a larger population rather than the whole population.

A statistic is a characteristic or measure obtained by using the data values from a sample, as opposed to a parameter, which is a characteristic or measure obtained by using all the data values from a specific population.

A set of values consists of data from multiple subjects; it is either a sample or a population.

Page 7: Topic 1 (1.1.1-1.1.6)

The following is an example of how to calculate the mean.
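As a minimal sketch of the calculation, assuming a made-up sample of five seedling heights (the values are illustrative and do not come from the presentation), the mean is the sum of the data values divided by how many values there are:

```python
# Hypothetical sample: heights of five seedlings in cm (illustrative values only).
heights_cm = [12.0, 15.0, 11.0, 14.0, 13.0]

# Mean = sum of the data values divided by the number of data values.
mean_height = sum(heights_cm) / len(heights_cm)

print(mean_height)  # 13.0
```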

Page 8: Topic 1 (1.1.1-1.1.6)

The following is an example of how to calculate the sample standard deviation, which is used when the data are a sample rather than the whole population. The sample size (n) is the number of data values in the data set for which the standard deviation is calculated.

The formula for the sample standard deviation is s = sqrt( Σ(x − x̄)² / (n − 1) ), where x̄ is the sample mean and n is the sample size.
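A minimal sketch of the sample standard deviation calculation, assuming a small made-up data set: square each value's difference from the mean, add the squares, divide by n − 1, and take the square root. The variable names are illustrative, and the result is cross-checked against the `stdev` function in Python's standard `statistics` module, which also divides by n − 1.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n  # 40 / 8 = 5.0

# Sum of squared differences from the mean: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
squared_diffs = sum((x - mean) ** 2 for x in values)

# Sample standard deviation: divide by n - 1, not n.
s = math.sqrt(squared_diffs / (n - 1))

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, statistics.stdev(values))

print(round(s, 2))  # 2.14
```

Dividing by n instead of n − 1 would give the population standard deviation, which is not the value the syllabus asks students to report.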

Page 9: Topic 1 (1.1.1-1.1.6)

To find the sample standard deviation (s) and the sample mean on your TI calculator:

1. Press the “STAT” button.
2. Press “1:Edit” under the “EDIT” menu.
3. Input the data into L1 (list 1), pressing “ENTER” each time you type a data value.
4. After all of the data values have been typed in, press “STAT” again.
5. Press the right arrow button to go to the “CALC” menu.
6. Select “1:1-Var Stats” by using the “ENTER” button, which takes you to the home screen.
7. Press “ENTER” one last time. Data will appear on the home screen.

The sample mean is represented by an x with a line over it (x̄). The sample standard deviation is represented by the symbol Sx. Find the two values that correspond to those symbols on the home screen and you will have found the sample standard deviation and the sample mean.

Page 10: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Mean
2. Standard deviation
3. Values
4. Sample mean
5. Sample standard deviation
6. Population mean
7. Statistic
8. Sample size

Page 11: Topic 1 (1.1.1-1.1.6)

State that the term standard deviation is used to summarize the spread of the values around the mean and that 68% of the values fall within one standard deviation of the mean.

(For normally distributed data, about 68% of all values lie within ±1 standard deviation (s or σ) of the mean. This rises to about 95% for ±2 standard deviations.)

To answer an IB question involving this, simply write that the term standard deviation is used to summarize the spread of the values around the mean and that 68% of the values fall within one standard deviation of the mean.

Page 12: Topic 1 (1.1.1-1.1.6)

1.1.3 refers to the empirical rule, which states that for normally distributed data about 68% of all data values lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations of the mean, and about 99.7% lie within 3 standard deviations of the mean.

Normally distributed data form a symmetric, unimodal, bell-shaped distribution in which the mean, median, and mode are practically all the same.

The empirical rule does not apply to non-normally distributed data. To figure out how many data values lie within ±1, ±2, and ±3 standard deviations of the mean for such data, one must use methods that require calculations. Such methods will not be discussed because IBO will not ask you to do anything that involves them.

A standard normal distribution has a mean of 0 and a standard deviation of 1.
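The percentages in the empirical rule can be checked with a short sketch. It assumes Python 3.8 or later, where the standard `statistics` module provides `NormalDist`; the helper name `fraction_within` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints the percentage of values within +-k standard deviations.
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Because any normal distribution can be rescaled to the standard normal one, the same fractions hold for every normally distributed data set, whatever its mean and standard deviation.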

The spread of values about the mean refers to the typical numerical amount by which the data values in a set differ from the value of the mean.

Page 13: Topic 1 (1.1.1-1.1.6)
Page 14: Topic 1 (1.1.1-1.1.6)
Page 15: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Mean
2. Standard deviation
3. Spread
4. Normally distributed
5. Normal distribution curve (bell curve)

Page 16: Topic 1 (1.1.1-1.1.6)

Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples.

(A small standard deviation indicates that the data is clustered closely around the mean value. Conversely, a large standard deviation indicates a wider spread around the mean.)

Page 17: Topic 1 (1.1.1-1.1.6)

If one sample of data has a large standard deviation and another sample of data has a small standard deviation, then it is clear that the sample with the larger standard deviation is much more variable than the sample with the smaller standard deviation.

The standard deviation is the average spread about the mean.

The variance is the standard deviation squared (raised to the 2nd power).
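A minimal sketch of this comparison, assuming two made-up samples that share the same mean but differ in spread (the values and variable names are illustrative only). It also confirms that the variance equals the standard deviation squared.

```python
import math
import statistics

# Two hypothetical samples with the same mean but different spread.
sample_a = [9.0, 10.0, 10.0, 11.0]   # tightly clustered around the mean
sample_b = [2.0, 8.0, 12.0, 18.0]    # widely spread around the mean

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

# Sample standard deviations (n - 1 in the denominator).
s_a = statistics.stdev(sample_a)
s_b = statistics.stdev(sample_b)

# Sample variances; each equals the matching standard deviation squared.
var_a = statistics.variance(sample_a)
var_b = statistics.variance(sample_b)
assert math.isclose(var_a, s_a ** 2)
assert math.isclose(var_b, s_b ** 2)

print(mean_a, round(s_a, 2))  # 10.0 0.82
print(mean_b, round(s_b, 2))  # 10.0 6.73
```

The means alone would suggest the two samples are alike; the standard deviations show that sample B is far more variable.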

Page 18: Topic 1 (1.1.1-1.1.6)
Page 19: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Standard deviation
2. Spread
3. Sample
4. Clustered
5. Around

Page 20: Topic 1 (1.1.1-1.1.6)

Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables.

(For a t-test to be applied, the data must have a normal distribution and a sample size of at least 10. The t-test can be used to compare two sets of data and measure the amount of overlap. Students will not be expected to calculate values of t. Only a two-tailed, unpaired t-test is expected. Aim 7: While students are not expected to calculate a value for the t-test, students could be shown how to calculate such values using a spreadsheet program or the graphic display calculator. TOK: The scientific community defines an objective standard by which claims about data can be made.)

Page 21: Topic 1 (1.1.1-1.1.6)

If knowledge of degrees of freedom is needed to answer an IB biology SL or HL test question, all that one needs to know is that degrees of freedom is represented by d.f. and that it equals the sample size minus 1 when constructing a two-tailed confidence interval. For an unpaired t-test that compares two samples, the degrees of freedom equal the sum of the two sample sizes minus 2.

Two-tailed means that the variable that the test involves is thought to be either greater than or less than it is presumed to be. The presumption is referred to as the null hypothesis, which is written H0. When a t confidence interval is two-tailed, the level of significance, which is denoted by the Greek letter “alpha”, is divided by 2. For one-tailed confidence intervals the level of significance is unchanged.

The sample size must be at least 10 and less than 30, and the population standard deviation must be unknown, in order for a t-test to be used. If those conditions are not met a z-test must be used, but IBO will not ask you to do a z-test.

To deduce the significance, calculations involving t-values need to be done, which involves the use of a formula. Any IB question involving 1.1.5 should therefore just involve the use of a formula and several calculations, so it is all numerical and not verbal.
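The presentation does not reproduce the formula, so the following is only a sketch under stated assumptions: it uses the pooled-variance (Student's) form of the unpaired t-test, for which the degrees of freedom are the two sample sizes added together minus 2. The function name `unpaired_t` and both data sets are made up for illustration, and the critical value 2.101 is the entry a standard t-table gives for 18 degrees of freedom at p = 0.05, two-tailed.

```python
import math
import statistics


def unpaired_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for a pooled-variance unpaired t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)

    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, df


# Hypothetical measurements from two groups of 10 (illustrative values only).
group_1 = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 13.0, 11.0]
group_2 = [13.0, 15.0, 14.0, 16.0, 15.0, 14.0, 13.0, 15.0, 16.0, 14.0]

t_value, df = unpaired_t(group_1, group_2)

# Critical t for 18 d.f. at p = 0.05 (two-tailed), read from a standard t-table.
CRITICAL_T = 2.101
significant = abs(t_value) > CRITICAL_T

print(round(t_value, 2), df, significant)
```

If the absolute value of the calculated t is larger than the critical value from the table, the difference between the two means is considered significant at that probability level; otherwise it is not.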

Page 22: Topic 1 (1.1.1-1.1.6)
Page 23: Topic 1 (1.1.1-1.1.6)
Page 24: Topic 1 (1.1.1-1.1.6)
Page 25: Topic 1 (1.1.1-1.1.6)

This may need to be used on an IB test to answer a question that involves 1.1.5. The table can be used to find t-values using the degrees of freedom and the probability that chance alone could produce the difference, which is 1 minus the percent of confidence in decimal form.

Page 26: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Confidence interval
2. Level of significance (alpha)
3. Degrees of freedom
4. T-distribution
5. Sample standard deviation
6. Sample mean
7. Probability
8. Population mean
9. Two-tailed
10. Unpaired (in reference to t-tests)
11. Normal distribution

Page 27: Topic 1 (1.1.1-1.1.6)

Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

(Aim 7: While calculations of such values are not expected, students who want to use r and r² values in their practical work could be shown how to determine such values using a spreadsheet program.)

Page 28: Topic 1 (1.1.1-1.1.6)

When a mathematical correlation test is used, the values of r range from -1 to 1. An r-value of 1 implies that there is a completely positive correlation. An r-value of -1 implies that there is a completely negative correlation. An r-value of 0 implies that there is no correlation.
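A minimal sketch of how an r-value can be computed, assuming the Pearson correlation coefficient is the measure meant here (the presentation does not name one). The function name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]

r_positive = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])  # y rises in step with x
r_negative = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # y falls in step with x
r_none = pearson_r(x, [4.0, 1.0, 0.0, 1.0, 4.0])       # no linear trend

print(r_positive, r_negative, r_none)
```

Note that r only measures how well the points follow a straight line. In the third data set y clearly depends on x, yet r is 0, and even an r of exactly 1 would say nothing about whether one variable causes the other.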

If the r-values show that there is a correlation between the two variables, an experiment needs to be performed in order to know if there is a causal relationship between the two variables.

A variable is a characteristic or attribute that can assume different values.

In a question that involves 1.1.6, an example needs to be mentioned to support the points made by the IB biology SL or HL student. On the next slide is an excellent example that could be used.

Page 29: Topic 1 (1.1.1-1.1.6)

Africanized honey bees

“The story of Africanized honey bees (AHBs) invading the USA includes an interesting correlation. In 1990, a honey bee swarm was found outside a small town in southern Texas. They were identified as AHBs. These bees were brought from Africa to Brazil in the 1950s, in the hope of breeding a bee adapted to the South American tropical climate. But by 1990, they had spread to the southern US. Scientists predicted that AHBs would invade all the southern states of the US, but this hasn’t happened. Look at Figure 1.5: the bees have remained in the southwest states (area shaded in yellow) and have not travelled to the south-eastern states. The edge of the areas shaded in yellow coincides with the point at which there is an annual rainfall of 137.5cm (55 inches) spread evenly throughout the year. This level of year-round wetness seems to be a barrier to the movement of the bees and they do not move into such areas.”

The experiment shows that the existence of the presumed correlation did not prove that there was a causal relationship between the two variables, which is the most important aspect of 1.1.6.

Page 30: Topic 1 (1.1.1-1.1.6)

Key vocabulary list

1. Correlation
2. Causal relationship
3. Completely positive correlation
4. Completely negative correlation
5. Experiment
6. Africanized honey bees
7. Variable

Page 31: Topic 1 (1.1.1-1.1.6)
Page 32: Topic 1 (1.1.1-1.1.6)
Page 33: Topic 1 (1.1.1-1.1.6)
Page 34: Topic 1 (1.1.1-1.1.6)
Page 35: Topic 1 (1.1.1-1.1.6)
Page 36: Topic 1 (1.1.1-1.1.6)
Page 37: Topic 1 (1.1.1-1.1.6)
Page 38: Topic 1 (1.1.1-1.1.6)

IBO makes reference to the use of spreadsheet programs in the topic 1 detailed syllabus. Good spreadsheet programs are Microsoft Excel and Minitab. Minitab is a statistical program that can be downloaded online. There is a free 30-day trial for it. It is much better than Microsoft Excel. If you ever require use of Minitab as a spreadsheet program go to http://www.minitab.com/Downloads/

If you are unsure about how to use Minitab you can use its help feature, which is very detailed. One of the examples in this PowerPoint presentation was created by using Minitab, which is the SHOW confidence interval data example for 1.1.5. By using the help feature for Minitab you should be able to do anything that you need to do for IB Biology SL or HL that involves the use of a spreadsheet. The help feature for Microsoft Excel can also be utilized, but Minitab is much better software than Microsoft Excel.

Microsoft Excel free trials can be downloaded at http://us1.trymicrosoftoffice.com/default.aspx?WT.srch=1&WT.mc_id=78C4B07A-6906-484D-B4DD-47E2084740A6