PROBLEM This is where you decide what you would like more information on. PLAN You need to know what...

Statistics Achievement Standard 1.10


The Statistical Enquiry Cycle

PROBLEM
This is where you decide what you would like more information on.

PLAN
You need to know what you will measure and how you will do it.
• What data do you need?
• How will you collect it?
• What will you record?
• How will you record it?

DATA
This is where your data is collected, managed and organised.

ANALYSIS
This is where you look at the data to see what it tells you about your problem.
• Graph data and calculate statistics

CONCLUSION
What did you learn about your investigation?
• What do the graphs say?
• What differences are there in the statistics?
• Can you infer that the difference in the sample is also in the population?
• Are there new problems to investigate?


The Statistical Problem

Decide what you would like information on. Information is often more interesting when one group is different to another.

Remember that when comparing groups we should look to use numeric data or qualitative groups.

Once you have decided what you would like information on you need to write a question fully describing what you are interested in investigating.

There are two ways in which you can write a question. Two example questions investigating bag weights could be:

e.g. ‘I wonder if there is a difference between the bag weights of Year 11 boys and Year 11 girls at CHS’

e.g. ‘Do Year 11 boys at CHS tend to have heavier bag weights than Year 11 girls at CHS?’


The Plan

Now that you know what you are going to investigate, you will need to find variables that will illustrate the differences.

Sometimes there are restrictions on what data can be collected or the ease with which data can be collected.

You may need to consider how you will collect and record the data. Data is often best recorded in a table.

Remember: This Achievement Standard is an investigation into multivariate data, which implies that you will initially look at a number of variables before deciding which ones you will look for differences between.

There are web sites from which you can extract data and a good starting point is asking yourself what differences you will expect.


The Data

Use the largest sample possible. The recommended minimum is 30, as conclusions drawn from small samples are suspect.

Census At School is an excellent resource that contains a huge amount of interesting statistical information. It is likely this data will be used.

When you collect data from a website, you may need to ‘clean’ it. This refers to the process of removing invalid data points from your sample.

e.g. When using data regarding bag weights, if an individual has no bag, do not record it as a 0.

e.g. Check that units are consistent, check all 0 entries, and always be suspicious of any data that seems out of place.
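The cleaning checks above can be sketched in Python. This is only an illustration: the sample values, the helper name `clean_bag_weights` and the 20 kg cut-off are hypothetical, and it assumes weights are recorded in kilograms with missing answers stored as `None`.

```python
def clean_bag_weights(raw_weights, max_plausible_kg=20):
    """Return only the entries that look like valid bag weights in kg.

    Assumptions (not from the notes): missing values are stored as None,
    and anything above max_plausible_kg is treated as suspect.
    """
    cleaned = []
    for weight in raw_weights:
        if weight is None:
            continue  # no answer recorded
        if weight <= 0:
            continue  # no bag: leave it out rather than recording 0
        if weight > max_plausible_kg:
            continue  # out of place, e.g. grams entered instead of kg
        cleaned.append(weight)
    return cleaned


raw = [4.5, 0, 6.2, None, 5.1, 4500, 3.8]  # made-up sample
cleaned = clean_bag_weights(raw)
print(cleaned)
```

In a real investigation it is worth looking at each suspect value before removing it, and noting how many values were removed.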


The Analysis – Statistics and Graphs

MEASURES OF CENTRAL TENDENCY

1. Mean

- easy to calculate but is affected by extreme values

- to calculate use: Mean = (Sum of all values) / (Total number of values)

e.g. Calculate the mean of 6, 11, 3, 14, 8

Mean = (6 + 11 + 3 + 14 + 8) / 5 = 42 / 5 = 8.4

Push equals on calculator BEFORE dividing

e.g. Calculate the mean of 6, 11, 3, 14, 8, 100

Mean = (6 + 11 + 3 + 14 + 8 + 100) / 6 = 142 / 6 = 23.7 (1 d.p.)
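The two mean calculations can be checked with a short Python sketch. The function name `mean` is just an illustrative choice; the data are the two examples from this slide.

```python
def mean(values):
    """Sum of all values divided by the total number of values."""
    return sum(values) / len(values)


without_extreme = [6, 11, 3, 14, 8]
with_extreme = [6, 11, 3, 14, 8, 100]

print(mean(without_extreme))          # 42 / 5
print(round(mean(with_extreme), 1))   # 142 / 6, rounded to 1 d.p.
```

Adding the single extreme value 100 almost triples the mean, which is why the mean is described as being affected by extreme values.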


2. Median

- middle number when all are PLACED IN ORDER (two ways)

- harder to calculate but is not affected by extreme values

a) for an odd number of values, median is the middle value

e.g. Find the median of 39, 44, 38, 37, 42, 40, 42, 39, 32

In order: 32, 37, 38, 39, 39, 40, 42, 42, 44

To find placement of median use: (n + 1) / 2, where n = amount of data

(9 + 1) / 2 = 10 / 2 = 5, so the median is the 5th value

OR

Cross off data, one at a time from each end, until you reach the middle value.

Median = 39

b) for an even number of values, median is average of the two middle values

e.g. Find the median of 69, 71, 68, 85, 73, 73, 64, 75

In order: 64, 68, 69, 71, 73, 73, 75, 85

(n + 1) / 2 = (8 + 1) / 2 = 4.5, so the median is halfway between the 4th and 5th values

OR

Cross off data, one at a time from each end, until you reach the two middle values.

Median = (71 + 73) / 2 = 144 / 2 = 72
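Both cases of the median rule can be written as one small Python function. This is a sketch for checking answers, using the two data sets from this slide; the function name is an illustrative choice.

```python
def median(values):
    ordered = sorted(values)          # place the values in order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd n: the middle value
    # even n: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([39, 44, 38, 37, 42, 40, 42, 39, 32]))
print(median([69, 71, 68, 85, 73, 73, 64, 75]))
```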


3. Mode

- only useful to find most popular item

- is the most common value (can be none, one or more)

e.g. Find the mode of 188, 93, 4, 93, 15, 0, 100, 15

Mode = 15 and 93
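A short Python sketch of the same idea, covering the "none, one or more" cases. The helper name `modes` is hypothetical, and treating "every value occurs once" as "no mode" is an assumption that matches the wording above.

```python
from collections import Counter


def modes(values):
    """Return every most-common value, or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([188, 93, 4, 93, 15, 0, 100, 15]))
```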


MEASURES OF SPREAD

Range

- can show how spread out the data is

- is the difference between the largest and smallest values

e.g. Find the range of 4, 2, 6, 9, 8

Range = highest value – lowest value

Range = 9 – 2 = 7 (2 – 9)

Note: It's a good idea to write in brackets the values that make up the range.


Quartiles

– are measures of spread which, with the median, split the data into quarters
– method used is similar to finding the median

When the data is in order:
– the lower quartile (LQ) has 25% or ¼ of the data below it.
– the upper quartile (UQ) has 75% or ¾ of the data below it.
– the Interquartile Range (IQR) = UQ – LQ and describes the middle 50%

Standard Deviation

– is a measure of the average spread of the numbers from the mean.
– for Year 11, your only concern is that the bigger the value, the more spread out the data is.
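To see "the bigger the value, the more spread out the data" in practice, the sketch below compares the range and standard deviation of the two data sets used earlier for the mean. It uses Python's built-in `statistics.stdev` (the sample standard deviation); the helper name `data_range` is an illustrative choice.

```python
import statistics


def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


close_together = [6, 11, 3, 14, 8]
spread_out = [6, 11, 3, 14, 8, 100]

for data in (close_together, spread_out):
    sd = statistics.stdev(data)       # sample standard deviation
    print(data, "range:", data_range(data), "standard deviation:", round(sd, 1))
```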


e.g. Find the LQ, UQ and the interquartile range of the following data

6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29

Note: always find the median first

(20 + 1) / 2 = 21 / 2 = 10.5 (or cross off data), so the median lies between the 10th and 11th values

Each half contains 10 values: (10 + 1) / 2 = 11 / 2 = 5.5 (or cross off data)

LQ = (8 + 9) / 2 = 17 / 2 = 8.5

UQ = (20 + 20) / 2 = 40 / 2 = 20

IQR = UQ – LQ
IQR = 20 – 8.5 = 11.5

e.g. Find the LQ, UQ and the interquartile range of the following data

5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30

Remember, always find the median first

(13 + 1) / 2 = 14 / 2 = 7 (or cross off data), so the median is the 7th value

As the median is an actual piece of data, it is ignored when finding the LQ and UQ

Each half contains 6 values: (6 + 1) / 2 = 7 / 2 = 3.5 (or cross off data)

LQ = (8 + 10) / 2 = 18 / 2 = 9

UQ = (22 + 23) / 2 = 45 / 2 = 22.5

IQR = UQ – LQ
IQR = 22.5 – 9 = 13.5
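The quartile method used in these two examples (find the median, then find the median of each half, leaving the median itself out when it is an actual piece of data) can be sketched in Python as follows. The function names are illustrative. Calculators and software may use slightly different quartile rules, so their answers will not always match this method.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (LQ, median, UQ), ignoring the median itself when n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]            # values below the median
    upper = ordered[-half:]           # values above the median
    return median(lower), median(ordered), median(upper)


first = [6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29]
second = [5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30]

for data in (first, second):
    lq, med, uq = quartiles(data)
    print("LQ:", lq, "Median:", med, "UQ:", uq, "IQR:", uq - lq)
```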


Dot Plots

– are like a bar graph
– each dot represents one item

e.g. Plot these 15 golf scores on a dot plot

70, 72, 68, 74, 74, 78, 77, 70, 72, 72, 76, 72, 76, 75, 78

[Dot plot: Golf Scores. Horizontal axis from 68 to 78.]

Note: The scale on the plot ranges between the lowest and highest values.
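A rough text version of this dot plot can be produced in Python by counting how often each score occurs. This is only a sketch: it prints one row per score rather than drawing dots above a horizontal axis.

```python
from collections import Counter

scores = [70, 72, 68, 74, 74, 78, 77, 70, 72, 72, 76, 72, 76, 75, 78]
counts = Counter(scores)

# One row per value on the axis, from the lowest score to the highest.
for value in range(min(scores), max(scores) + 1):
    print(f"{value} | " + "o " * counts[value])
```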


Stem and Leaf Plots

– records and organises data
– most significant figures form the stem and the final digits the leaves
– can be in back to back form in order to compare two sets of data

e.g. Place the following heights (in m) onto a back to back stem and leaf plot

BOYS = 1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68, 1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66

GIRLS = 1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45, 1.57, 1.59, 1.42

Look at the highest and lowest data values to decide the range of the stem

Place the final digits of the data on the graph on the correct side

Unordered Graph of Heights

         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
1, 3, 6, 3, 0 | 1.7  |
   6, 2, 8, 9 | 1.6  | 3
      4, 0, 9 | 1.5  | 7, 1, 7, 9
      5, 3, 7 | 1.4  | 4, 6, 8, 2, 5, 2
            5 | 1.3  | 4
              | 1.2  | 9

Ordered Graph of Heights

         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
6, 3, 3, 1, 0 | 1.7  |
   9, 8, 6, 2 | 1.6  | 3
      9, 4, 0 | 1.5  | 1, 7, 7, 9
      7, 5, 3 | 1.4  | 2, 2, 4, 5, 6, 8
            5 | 1.3  | 4
              | 1.2  | 9
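The ordered back to back plot for the heights can be built in Python by splitting each height into a stem and a final digit. This is a sketch only; the function name `stem_and_leaf` and the choice to work in whole centimetres (to avoid rounding problems with decimals) are assumptions made for illustration.

```python
from collections import defaultdict

boys = [1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
        1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66]
girls = [1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
         1.57, 1.59, 1.42]


def stem_and_leaf(heights_m):
    """Group heights by stem (18 means 1.8) with sorted final digits as leaves."""
    plot = defaultdict(list)
    for height in heights_m:
        cm = round(height * 100)      # work in whole centimetres
        plot[cm // 10].append(cm % 10)
    return {stem: sorted(leaves) for stem, leaves in plot.items()}


boys_plot = stem_and_leaf(boys)
girls_plot = stem_and_leaf(girls)

for stem in range(18, 11, -1):        # stems 1.8 down to 1.2
    # Boys' leaves are reversed so they increase outwards from the stem.
    left = ", ".join(str(d) for d in reversed(boys_plot.get(stem, [])))
    right = ", ".join(str(d) for d in girls_plot.get(stem, []))
    print(f"{left:>13} | {stem / 10:.1f} | {right}")
```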


Calculating Statistics from Stem and Leaf Plots

Graph of Heights

         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
6, 3, 3, 1, 0 | 1.7  |
   9, 8, 6, 2 | 1.6  | 3
      9, 4, 0 | 1.5  | 1, 7, 7, 9
      7, 5, 3 | 1.4  | 2, 2, 4, 5, 6, 8
            5 | 1.3  | 4
              | 1.2  | 9

e.g. From the ordered plot state the minimum, maximum, LQ, median, UQ, IQR and range statistics for each side

For each statistic, make sure to write down the whole number, not just the ‘leaf’!

When finding median, LQ and UQ, make sure you count/cross in the right direction!

Remember: If you find it hard to calculate stats off graph, write out data in a line first!

Working for the girls (13 values):
Median placement = (13 + 1) / 2 = 7
LQ/UQ placement = (6 + 1) / 2 = 3.5

            BOYS                      GIRLS
Minimum:    1.35 m                    1.29 m
Maximum:    1.84 m                    1.63 m
LQ:         1.50 m                    1.42 m
Median:     1.68 m                    1.46 m
UQ:         1.73 m                    1.57 m
IQR:        1.73 – 1.50 = 0.23 m      1.57 – 1.42 = 0.15 m
Range:      1.84 – 1.35 = 0.49 m      1.63 – 1.29 = 0.34 m
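The statistics for each side can be checked by writing the data out as a list, as suggested above. The sketch below uses Python's built-in `statistics.median` and the same quartile method as earlier in these notes (the median is left out of each half when it is an actual piece of data). The function name `summary` is illustrative.

```python
import statistics

boys = [1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
        1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66]
girls = [1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
         1.57, 1.59, 1.42]


def summary(heights_m):
    """Minimum, LQ, median, UQ, maximum, IQR and range, rounded to 2 d.p."""
    ordered = sorted(heights_m)
    half = len(ordered) // 2          # the median itself is left out when n is odd
    lq = statistics.median(ordered[:half])
    uq = statistics.median(ordered[-half:])
    return {
        "Minimum": ordered[0],
        "LQ": round(lq, 2),
        "Median": round(statistics.median(ordered), 2),
        "UQ": round(uq, 2),
        "Maximum": ordered[-1],
        "IQR": round(uq - lq, 2),
        "Range": round(ordered[-1] - ordered[0], 2),
    }


print("Boys: ", summary(boys))
print("Girls:", summary(girls))
```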


Box and Whisker Plots

– shows the minimum, maximum, LQ, median and UQ
– ideal for comparing two sets of data

e.g. Using the height data from the Stem and Leaf diagrams, draw two box and whisker plots (Boys and Girls)

[Box and Whisker Plot of Boys and Girls Heights. Horizontal axis: Height (m), from 1.20 to 1.90. One plot for Males and one for Females, each marking the Minimum, LQ, Median, UQ and Maximum.]

Note: Use the minimum and maximum values to determine length of scale

Question: What is the comparison between the boy and girl heights?

ANSWER? EVIDENCE?


Histograms

– display grouped data
– frequency is along vertical axis, group intervals are along horizontal axis
– there are NO gaps between bars

e.g. Graph the grouped frequency table data about heights onto a histogram

[Histogram: Student Heights. Horizontal axis: Height (cm), from 140 to 190. Vertical axis: Frequency (no. of students), from 0 to 12.]

Note: The groups from the table form the intervals along the horizontal axis and the highest frequency determines the height of the vertical axis.
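The frequency table for this slide is not reproduced in these notes, so the sketch below shows the grouping step on different data: the boys' heights from the stem and leaf example, converted to centimetres. The group width of 10 cm is an assumption chosen to match the axis intervals.

```python
from collections import Counter

# Boys' heights from the stem and leaf example, in cm.
heights_cm = [159, 169, 147, 143, 182, 170, 173, 135, 176, 168,
              162, 184, 145, 150, 154, 173, 184, 171, 166]
width = 10

# Each height goes in the group that starts at the multiple of 10 below it.
table = Counter((h // width) * width for h in heights_cm)

# Print one row per group, with no gaps between the groups.
for start in range(min(table), max(table) + width, width):
    print(f"{start} to under {start + width}: {'#' * table[start]} ({table[start]})")
```

The highest frequency in `table` is what sets the height of the vertical axis.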


– Side by side histograms can also be used to compare data

[Two histograms side by side: Female Heights and Male Heights. Each horizontal axis runs from 140 to 200 and each vertical axis is marked from 2 to 8.]

Question: What is the comparison between the female and male heights?

ANSWER? EVIDENCE?


Scatter Plots

– looks for a relationship between two measured variables
– points are plotted like co-ordinates

e.g. Below are the heights and weights of Year 7 boys. Place on a scatter plot.

Height (cm)    Weight (kg)
144            48
152            52
161            50
148            49
155            53
140            47
158            58
139            45
147            50
150            51
152            49
138            46
137            44
145            49

[Scatter Diagram for boys' heights and weights. Horizontal axis: Height (cm), marked from 135 to 160. Vertical axis: Weight (kg), marked from 45 to 60. A line of best fit is drawn through the points.]

Use the data to determine scale to use on both axes

If points form a line (or close to) we can say there is a relationship between the two variables.

Outliers can generally be ignored

What is the relationship between the boys' heights and weights?

ANSWER? EVIDENCE?
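At this level the line of best fit is normally drawn by eye. As an optional check, the sketch below calculates a least-squares line for the height and weight data; least squares is a technique swapped in for illustration and is not part of these notes. A positive slope means weight tends to increase as height increases.

```python
heights = [144, 152, 161, 148, 155, 140, 158, 139, 147, 150, 152, 138, 137, 145]
weights = [48, 52, 50, 49, 53, 47, 58, 45, 50, 51, 49, 46, 44, 49]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Least-squares slope and intercept for: weight = slope * height + intercept
sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
sxx = sum((h - mean_h) ** 2 for h in heights)
slope = sxy / sxx
intercept = mean_w - slope * mean_h

print(f"weight is roughly {slope:.2f} x height + ({intercept:.1f})")
```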


Time Series

– a collection of measurements recorded at specific intervals where the quantity changes with time.

Features of Time Series
a) Order is important with all measurements retained to examine trends
b) Long term trends where measurements definitely tend to increase or decrease
c) Seasonal trends resulting in up and down patterns

e.g. Draw a time series graph for the following data:

Season      Quarterly sales ($)
Sept.       9040
Dec.        8650
Mar. 96     8370
June        9250
Sept.       9033
Dec.        8578
Mar. 97     8495
June        9407
Sept.       9209
Dec.        8740
Mar. 98     8618
June        9504
Sept.       9246
Dec.        8929
Mar. 99     8670

[Time series graph: Quarterly Sales for Elliot's Fish and Chips Shop. Vertical axis: Sales ($), from 8300 to 9900. Horizontal axis: Quarter Years, from Sept. to Mar. 99.]

Join up each of the points

What are the short and long term trends?

ANSWER? EVIDENCE?
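One way to look for a long term trend in the quarterly sales is to average each run of four quarters, which smooths out the seasonal ups and downs. Moving means are not covered in these notes, so treat this Python sketch as an optional extra rather than a required method.

```python
sales = [9040, 8650, 8370, 9250, 9033, 8578, 8495, 9407,
         9209, 8740, 8618, 9504, 9246, 8929, 8670]

window = 4  # four quarters = one full year

# Mean of every run of four consecutive quarters.
moving_means = [
    sum(sales[i:i + window]) / window
    for i in range(len(sales) - window + 1)
]

for value in moving_means:
    print(value)
```

If the moving means rise over time, that is evidence of an increasing long term trend; the repeating pattern within each year is the seasonal trend.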


The Conclusion (and Making an Inference)

When you have finished your analysis, it is important to see if you can make an inference. An inference is when you make a generalised statement about the whole population (generally if there is a difference), by using the findings from your sample.

It is easiest if you write your statement in the following manner: “From my sample data I can make an inference about the population that… Year 11 boys tend to be taller than Year 11 girls (example). This is because…”

You will need to justify this inference by using your findings from both your statistics and/or graphs. Ways to justify your inference will be shown on an extra handout.

Once you have made your inference with justifications, you should end off your investigation by making a conclusion. Your conclusion MUST answer your original question and can be written in the format:

“Therefore I can conclude that… Year 11 boys at CHS are typically taller than Year 11 girls at CHS (example).”


Choosing a Sample

A sample should:
1) Be large enough to be representative of whole population
2) Have people/items in it that are representative of the population

It is best to choose samples that are large and random but size may be affected by time, money, personnel, equipment etc.

Some Sampling Methods

Simple random sampling:
1- obtain a population list
2- number each member
3- use random table or random number on calculator

Systematic sampling:
1- obtain a population list
2- randomly select a starting point on the list
3- select every nth member until desired sample size is reached

Note: every nth member is found by: n = (Population/group size) / (Size of sample needed)
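Both sampling methods can be sketched in Python using the built-in `random` module in place of a random number table. The population of 200 numbered students and the sample size of 40 are made up for illustration, and choosing the starting point from the first n members is one common way to carry out step 2 of systematic sampling.

```python
import random

population = [f"Student {i}" for i in range(1, 201)]   # hypothetical list of 200
sample_size = 40

# Simple random sampling: every member has the same chance of being chosen.
simple_sample = random.sample(population, sample_size)

# Systematic sampling: random starting point, then every nth member.
n = len(population) // sample_size          # 200 / 40 = every 5th member
start = random.randrange(n)                 # random start within the first n
systematic_sample = population[start::n][:sample_size]

print(simple_sample[:5])
print(systematic_sample[:5])
```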


Limitations and Improvements

1. In terms of Data Collection

Typical Limitations                        Improvements
- Sample too small                         - Obtain a bigger sample
- Not random or representative             - Get a representative sample
- Outliers distort data                    - Ignore extreme outliers
- Taken over too short a time period       - Take data over a longer time period

2. In terms of Your Process

Typical Limitations                        Improvements
- Not enough statistics calculated         - Calculate more statistics
- Not enough graphs used,                  - Use other graphs (i.e. comparative histograms)
  data could be compared better
- Scales on graphs too large               - Change scales on graph (smaller)
- Way graphs are drawn                     - Alter the way the graphs may be drawn