Revisionguide - Stats


  • 8/20/2019 Revisionguide - Stats

    1/89

     

    Ultimate GCSE Statistics Revision Guide

    Updated for the 2010 exam


    Contents Page

    Page

    Exam questions by topic 2004 - 2009 4

    Types of  data 10

    Sampling 11

    Scatter graphs 15

    Averages & standard deviation from a table 19

    Interquartile range & Outliers & Box plots 24

    Cumulative frequency curves 26

    Histograms 28

    Time series 32

    Index numbers 35

    Spearman’s Rank correlation coefficient 41

    Misleading graphs 45

    Reading and interpreting from a table of  Statistics  47

    Questionnaires 49

    Odds 53

    Venn diagrams 54

    Simulation 56

    Experimental probability 58

    Probability tree diagrams 59

    Conditional probability 63

    Binomial distribution 65

    Standardised scores 75

    Normal distribution  78

    Choropleth graphs 83

    Comparative Pie Charts/Diagrams 86

     


    Examination Questions

    2004 Marks

    Section A

    Q 1(a) Pie charts 1

    1(b) Misleading graphs (pie charts) 1

    2(a) Advantages & Disadvantages of a random sample 2

    2(b) Description of systematic sample 2

    3 Reading / interpreting data 4

    4(a) Rounding errors 1

    4(b) Composite bar charts 3

    5(a) Spearman’s rank 3

    5(b) Interpretation of rank 2

    6 Index numbers 4

    7 Standardised scores 6

    8 Choropleth graphs 6

    TOTAL 35

    Section B 9 Scatter graphs 11

    10 Stem and leaf diagrams & outliers 9

    11  Venn diagrams & conditional probability 6

    12(a) Census 1

    12(b) Sample over census 2

    12(c) Selecting a sample (stratified) 2

    12(d) Pilot survey 2

    12(e) Biased questions 1

    12(f) Advantages & Disadvantages of interviewing 2

    13 Normal distribution & s.d. limits 8

    14(ai) Mean from a grouped frequency table 2

    14(aii) Standard deviation from a grouped frequency table 3

    14(b) Histograms 3

    14(c/d) Suitable distribution 6

    15 Binomial distribution 7

    TOTAL 65

    TOTAL FOR PAPER 100


     

    2005

    Section A

    Q 1(a) Types of data (Quantitative, continuous..) 2

    1(b) Taking a stratified sample 2

    2(a) Sampling frames 1

    2(b) Suitable sampling method (convenience, cluster, systematic) 1

    3(ai) Two way table probability 1

    3(aii) Two way table conditional probability 1

    3(b) Interpreting probabilities 2

    4(a) Composite bar charts 3

    4(b) Interpreting a composite bar chart 2

    5 Misleading graphs 3

    6 Weighted index numbers 7

    7 Simulation & random numbers 4

    8(a) Census 1

    8(b) Determining what the population is 1

    8(c ) Advantage of closed questions 1

    8(d) Questionnaires 3

    TOTAL 35

    Section B 1 Spearman's rank 6

    2 Interpreting graphs 5

    3 Time series - moving averages 8

    4(a) IQR 2

    4(b) Outliers 2

    4(c ) Box plots 2

    4(d) Suitable distribution 2

    4(e) Interpreting data 1

    5(a) Normal distribution & s.d. limits 1

    5(b) Normal distribution & s.d. limits 1

    5(c-f) Quality assurance charts 4

    6 Drawing pie charts 6

    7 Probability tree diagrams & conditional probability 8

    8 Binomial distribution 8

    9(a) Histogram 3

    9(b) Mean from a grouped frequency table 3

    9(c ) Standard deviation from a grouped frequency table 3

    TOTAL 65

    TOTAL FOR PAPER 100


    2006

    Section A

    Q 1(a-b) Grouped frequency tables & finding mean 3

    1(c) Suitable average 1

    2 Two way table probability 6

    3(a-b) Odds - probability 2

    3(c ) Estimation - probability 1

    4(a) Random sample description 1

    4(b-c) Random numbers & simulation 4

    5 Interpreting data 4

    6 Comparative pie charts 3

    7 Index numbers 7

    8(a) Normal distribution & s.d. limits 1

    8(b) Quality assurance 2

    TOTAL 35

    Section B 1 Spearman's rank 6

    2(a) Sample over census 2

    2(b) Suitable sampling method 1

    2(c ) Closed questions 2

    2(d) Pilot survey 2

    2(e) Questionnaires 2

    3(a) Mean & S.D. from a table 3

    3(b) Interpreting data 2

    4 Probability tree diagrams & conditional probability 8

    5 Scatter diagrams 11

    6(a-b) Stem and leaf diagrams 3

    6(c ) Outliers 3

    6(d-e) Box plots 5

    6(f) Improving a study - accuracy 1

    7 Binomial distribution 7

    8 Time series & moving averages 7

    TOTAL 65

    TOTAL FOR PAPER 100


     

    2007

    Section A

    Q 1 Interpreting data 4

    2(a) Comparative pie charts 2

    2(b) Stratified sample - reasons 1

    2(c ) Stratified sample 1

    3 Probability estimation 3

    4(a) Advantage of census 1

    4(b) Advantage of closed questionnaires 1

    4( c ) Pilot surveys 1

    4(d) Questionnaires 2

    5 Spearman’s rank 5

    6 Types of data (Quantitative, continuous..) 4

    7 Standardised scores 5

    8(a) Suitable sampling method 1

    8(b) Normal distribution & s.d. limits 1

    8(c ) Normal distribution & s.d. limits 3

    TOTAL 35

    Section B 1 Time series 6

    2 Box plots, skewness, outliers 13

    3 Scatter diagrams 14

    4(a-c) Probability tree diagrams 8

    4(d-f) Binomial distribution 5

    5(a) Mean from a frequency table 3

    5(b) Histogram 3

    5(c ) Median from a histogram (interpolation) 2

    5(d) Suitable distribution 2

    6 Index numbers & geometric mean 9

    TOTAL 65

    TOTAL FOR PAPER 100


    2008

    Section A

    Q 1 Misleading graphs - pie charts 2

    2 Interpreting data 5

    3 Population pyramids 3

    4 Averages from a table 6

    5(a) Advantages of a pilot study 2

    5(b) Questionnaires 1

    5(c) Random sample - description 2

    5(d) Types of data (Quantitative, continuous..) 2

    6 Index numbers 3

    7 Mean & S.D. from a table 5

    8 Normal distribution & s.d. limits 4

    TOTAL 35

    Section B 1 Box plots & skewness 9

    2  Venn diagrams & conditional probability 7

    3 Scatter diagrams 7

    4(a) Disadvantages of a census 2

    4(b) Suitable sample 2

    4(c) Closed questionnaires 2

    5 Binomial distribution 7

    6 Standardised scores 7

    7 Spearman’s rank 6

    8(a-b) Histograms 4

    8(c) Median from histogram (interpolation) 3

    8(d) Suitable distribution 1

    9 Time series 8

    TOTAL 65

    TOTAL FOR PAPER 100


    2009

    Section A

    Q 1 Misleading graphs - pie charts & calculating an angle 3

    2 Interpreting data 5

    3 Random numbers & simulation 4

    4 Interpreting data 5

    5 Time series 5

    6 Odds - probability & estimation 3

    7 Standardised scores 5

    8 Mean & S.D. from a table 5

    TOTAL 35

    Section B 1(a) Advantage of sample over census 2

    1(b) Suitable sample 1

    1(c ) Questionnaires 3

    1(d) Advantages of a pilot study 2

    2 Scatter diagrams 9

    3 Spearman’s rank 7

    4(a)(b) Composite bar charts 5

    4(c)-(e) Chain index numbers & geometric mean 7

    5 Stem and leaf, box plots, skewness & outliers 12

    6 Normal distribution & s.d. limits 9

    7 Tree diagrams & Binomial distribution 8

    TOTAL 65

    TOTAL FOR PAPER 100

    2010 likely topics

    • Venn diagrams
    • Spearman’s rank
    • Normal distribution
    • Time series (inc. average seasonal variation)
    • Choropleth graphs
    • Standardised scores
    • Binomial distribution
    • Outliers
    • Scatter diagrams
    • Weighted index numbers (see Q6, 2005 Section A)


    Sampling

    When organisations require data they either use data collected by somebody else (secondary data), or collect it themselves (primary data). This is usually done by SAMPLING, that is, collecting data from a representative SAMPLE of the population they are interested in.

    A POPULATION need not be human. In statistics we define a population as the collection of ALL the items about which we want to know some characteristics. Examples of populations are hospital patients, road accidents, pet owners, unoccupied property or bridges. It is usually far too expensive and too time consuming to collect information from every member of the population (known as taking a census), exceptions being the General Election and The Census, so instead we collect it from a sample.

    If it is to be of any use the sample must represent the whole of the population we are interested in, and not be biased in any way. This is where the skill in sampling lies: in choosing a sample that will be as representative as possible.

    The basis for selecting any sample is the list of all the subjects from which the sample is to be chosen - this is the SAMPLING FRAME. Examples are the Postcode Address File, the Electoral register, telephone directories, membership lists, lists created by credit rating agencies and others, and maps. A problem, of course, is that the list may not be up to date. In some cases a list may not even exist.

    Simple random sampling

    A simple random sample gives each member of the population an equal chance of being chosen. This can be achieved using random number tables.

    Systematic sampling

    This is random sampling with a system! From the sampling frame, a starting point is chosen at random, and thereafter items are chosen at regular intervals. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.
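The house example can be checked with a short Python sketch. The function name `systematic_sample` is made up for illustration, and the sketch assumes the population size divides exactly by the sample size, as it does for 120 houses and a sample of 8.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the positions picked by systematic sampling (1-based)."""
    interval = population_size // sample_size  # e.g. 120 // 8 = 15
    if start is None:
        # Random starting point between 1 and the interval, inclusive
        start = random.randint(1, interval)
    return [start + k * interval for k in range(sample_size)]


# Reproduce the street of 120 houses with a random start of 11
print(systematic_sample(120, 8, start=11))
```

Leaving out `start` gives a different sample each time, which is what you would want in a real survey.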

    Cluster sampling

    In cluster sampling the units sampled are chosen in clusters, close to each other. Examples are households in the same street, or successive items off a production line. The population is divided into clusters, and some of these are then chosen at random. Within each cluster units are then chosen by simple random sampling or some other method. Ideally the clusters chosen should be dissimilar so that the sample is as representative of the population as possible.

    Quota sampling

    In quota sampling the selection of the sample is made by the interviewer, who has been given quotas to fill from specified sub-groups of the population. For example, an interviewer may be told to sample 50 females between the ages of 45 and 60.


    Stratified sampling

    A stratified sample gives a sample in which each stratum is represented in proportion to its size. We use the formula

    $$\text{number sampled from stratum} = \frac{\text{no. in stratum}}{\text{total no. in population}} \times \text{sample size}$$
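As a rough illustration of the formula, the sketch below shares out a sample across strata. The year-group sizes and the function name `stratified_allocation` are hypothetical, not taken from an exam question.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size / total * sample_size)  # (no. in stratum / total) x sample size
        for name, size in strata_sizes.items()
    }


# Hypothetical school of 500 pupils, sample of 50
year_groups = {"Year 7": 200, "Year 8": 150, "Year 9": 150}
print(stratified_allocation(year_groups, 50))
```

When the answers are not whole numbers they have to be rounded, so always check that the rounded values still add up to the required sample size.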


    Scatter graphs

    A typical GCSE Statistics question on scatter graphs will have the following structure:

    • Plot some missing points on a scatter graph
    • Describe the relationship between the variables
    • Draw a line of best fit (through $(\bar{x}, \bar{y})$)
    • Use the line of best fit to estimate one variable if given the other. If inside the data range this is known as interpolation; if outside the data range, this is known as extrapolation and may not be suitable (as trends may not continue)
    • Find the equation of the line of best fit in the form y = ax + b
    • State what a and b represent in the context of the question

    Finding a and b

    $(\bar{x}, \bar{y})$ is the average x value and the average y value, so add all the x values together and divide by how many you have, and do the same for y.

    To find a, the gradient of the line, pick two points that lie on your line of best fit (LOBF), call them $(x_1, y_1)$ and $(x_2, y_2)$, then find the difference between the y's over the difference between the x's, i.e.

    $$a = \frac{y_2 - y_1}{x_2 - x_1}$$

    To find b, look at the y value where your LOBF crosses the y axis.
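The arithmetic for the mean point and the gradient can be sketched in Python. The points used are hypothetical, and working out b as `y1 - a * x1` is an assumed alternative to reading it off the graph, useful as a check when the y axis is drawn to scale from zero.

```python
def mean_point(xs, ys):
    """The point (x-bar, y-bar) that the line of best fit should pass through."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def line_through(x1, y1, x2, y2):
    """Gradient a and intercept b of the line y = ax + b through two points."""
    a = (y2 - y1) / (x2 - x1)  # difference in y over difference in x
    b = y1 - a * x1            # rearranged from y1 = a * x1 + b
    return a, b


# Two hypothetical points read from a line of best fit
a, b = line_through(2, 7, 6, 15)
print(a, b)
print(mean_point([1, 2, 3], [4, 6, 11]))
```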


    When a linear model (straight line of best fit) is not appropriate, another model may be suitable.

    Suitable models could be;


     Averages & standard deviation from a table

    You must be able to find the mean, median, modal class interval, range and standard deviation from a frequency distribution and a grouped frequency distribution.

    The median occurs at the $\frac{n+1}{2}$ position for a set of n numbers.

    The modal class interval is the interval with the largest frequency.

    The range is the largest value in the distribution minus the smallest.

    How to find the mean and standard deviation using the calculator for a list of numbers.

    Press mode

    Select option 2 : STAT

    Select option 1 : 1-VAR

    Enter your data in the X column

    Press SHIFT then 1 and then 5 : VAR

    Option 2 will give you the mean $\bar{x}$ and option 3 will give you the standard deviation $x\sigma_n$

    You can verify this method for finding the mean and standard deviation using the exam question on the following page.
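If you want to check your calculator answers away from the exam, the same two values can be found with Python's standard `statistics` module. The data below is hypothetical. `pstdev` is the population standard deviation, which is assumed here to be what the calculator's $x\sigma_n$ option gives.

```python
import statistics

# Hypothetical list of numbers
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # same as the calculator's x-bar
std_dev = statistics.pstdev(data)     # population standard deviation (divide by n)

print(mean, std_dev)
```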


    How to find the mean and standard deviation using the calculator for a frequency distribution

    You must first switch the FREQ mode on in your calculator. To do this:

    Press SHIFT, then Setup

    Press the down arrow

    Select option 3 : STAT

    Select option 1 : ON

    You can now follow the same steps as you did for finding the mean and standard deviation of a list of numbers, the only difference being that you can now also enter the frequency.


    How to find the mean and standard deviation using the calculator for a grouped frequency distribution

    Check – Make sure FREQ is switched on in your calculator (see the steps for a frequency distribution above). When you are faced with this screen, you must enter the midpoints of the intervals in place of X. You must also write these down on the exam paper to gain full marks.
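The midpoint method can be sketched in Python as follows. The class intervals and frequencies are hypothetical, and the standard deviation uses the usual "mean of the squares minus the square of the mean" form, which is assumed to match what the calculator does.

```python
from math import sqrt


def grouped_mean_and_sd(intervals, frequencies):
    """Estimate the mean and standard deviation from a grouped frequency table."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    total_f = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    mean = sum_fx / total_f
    sd = sqrt(sum_fx2 / total_f - mean ** 2)
    return mean, sd


# Hypothetical table: classes 0-10, 10-20, 20-30 with frequencies 2, 6, 2
mean, sd = grouped_mean_and_sd([(0, 10), (10, 20), (20, 30)], [2, 6, 2])
print(mean, round(sd, 2))
```

Because midpoints stand in for the real values, both answers are estimates.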


    Interquartile range & Outliers & Box plots

    The IQR is calculated as follows: IQR = UQ - LQ.

    The UQ is found ¾ of the way through the data, i.e. at position $\frac{3}{4}(n+1)$.

    The LQ is found ¼ of the way through the data, i.e. at position $\frac{1}{4}(n+1)$.

    To find an outlier we work out 1.5 times the IQR, then subtract it from the LQ and add it to the UQ. If an item is outside this range, it is considered an outlier.

    This data can also be shown on a box plot.
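The quartile positions and the outlier limits can be sketched in Python. The data set is hypothetical, and the handling of a fractional position (interpolating between the two neighbouring values) is an assumption; check which convention your mark scheme expects.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data.

    Assumption: a fractional position is handled by interpolating
    between the two neighbouring values.
    """
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction > 0:
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value


def quartiles_and_limits(data):
    """Return LQ, UQ and the lower and upper outlier limits."""
    ordered = sorted(data)
    n = len(ordered)
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    iqr = uq - lq
    return lq, uq, lq - 1.5 * iqr, uq + 1.5 * iqr


data = [2, 4, 5, 7, 8, 9, 10, 11, 12, 14, 30]  # hypothetical values, n = 11
lq, uq, low, high = quartiles_and_limits(data)
outliers = [x for x in data if x < low or x > high]
print(lq, uq, low, high, outliers)
```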


    Cumulative frequency curves

    Cumulative frequency is a running total. It is calculated by adding up the frequencies up to that point. Note that the first point that is plotted is the lower boundary of the first class interval, which has a cumulative frequency of 0. Notice also the characteristic S-shape of the cumulative frequency curve. Draw lines up to the c.f. curve where necessary.
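A running total is easy to sketch in Python with `itertools.accumulate`. The class boundaries and frequencies below are hypothetical; the function name `cumulative_points` is made up for illustration.

```python
from itertools import accumulate


def cumulative_points(boundaries, frequencies):
    """Points to plot: (upper class boundary, cumulative frequency).

    The first point is the lower boundary of the first class with
    a cumulative frequency of 0.
    """
    running_totals = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, running_totals))


# Hypothetical classes 0-10, 10-20, 20-30, 30-40
print(cumulative_points([0, 10, 20, 30, 40], [5, 12, 20, 3]))
```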


    Histograms

    With a histogram, it is the area of the bar that represents the frequency. Along the y axis, frequency density is plotted. The formula is shown below.

    $$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$$

    You may need to rearrange this formula to get Frequency as the subject.

    $$\text{Frequency} = \text{Frequency Density} \times \text{Class Width}$$

    Usually an examination question will have part of the table filled in and part of the histogram drawn. If you look at the information for a bar that is shown on the histogram and where the frequency is given in the table, you can work out the frequency density and hence the scale on the y axis.

    In the question shown on the following page, the interval for h from 10 to 15 had the frequency given in the table as well as the bar drawn, so the frequency density was worked out. The scale was then easy to figure out and the rest straightforward to complete.
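Both forms of the formula can be sketched in Python. The frequency of 20 used for the class from 10 to 15 is a hypothetical value, not the one from the exam question.

```python
def frequency_density(frequency, lower, upper):
    """Height of the histogram bar for one class."""
    return frequency / (upper - lower)


def frequency_from_bar(density, lower, upper):
    """Frequency represented by a bar: its area (height x width)."""
    return density * (upper - lower)


# Hypothetical: a class from 10 to 15 with frequency 20
print(frequency_density(20, 10, 15))
# Hypothetical: a bar of height 2.5 over a class from 15 to 25
print(frequency_from_bar(2.5, 15, 25))
```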


    Finding the median from a histogram (interpolation)

    Consider this example.

    Previously (in Unit 1) you were asked to find the class interval the median lies in.

    The total frequency in this case is 200. Using $\frac{n+1}{2}$, we find the median occurs at position $\frac{200+1}{2} = 100.5$. If we work out the cumulative frequencies, we find that by the end of the interval for t from 5 to 6 our running total is 94, so 100.5 must be in the next interval, which runs from 6 to 8.

    If we were drawing a cumulative frequency curve, the points we would plot for these two intervals would be (6, 94) and (8, 154). We want to find the time (t) when the cumulative value is 100.5.


    Consider this,

    Time (t)    Cumulative frequency    How far through the interval
    6           94
                95                      1/60
                96                      2/60
                97                      3/60
                98                      4/60
                99                      5/60
                100                     6/60
                100.5                   6.5/60
                ...
    8           154

    So to review, 100.5 occurs $\frac{6.5}{60}$ of the way through the interval.

    We need to find what value of t is $\frac{6.5}{60}$ of the way through the time interval, i.e. $\frac{6.5}{60}$ of the way between 6 and 8.

    $\frac{6.5}{60} \times 2 = 0.216\ldots$, add this to 6 to find our value of t,

    $6 + 0.216\ldots = 6.216\ldots$ i.e. about 6.2

    Verify the median area for this question is 64.1 (3 s.f.)
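The interpolation steps above can be collected into one small Python function. The function name `interpolate_in_class` is made up for illustration; the numbers are the ones from the worked example.

```python
def interpolate_in_class(lower, upper, cf_lower, cf_upper, target):
    """Estimate the value at a target cumulative frequency inside one class."""
    fraction = (target - cf_lower) / (cf_upper - cf_lower)  # e.g. 6.5 / 60
    return lower + fraction * (upper - lower)               # e.g. 6 + (6.5 / 60) x 2


# Median position 100.5 in the class from 6 to 8, where the
# cumulative frequency rises from 94 to 154
median = interpolate_in_class(6, 8, 94, 154, 100.5)
print(round(median, 1))  # 6.2
```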


    Time series

    You will be required in a GCSE Statistics exam to:

    • Calculate an n-point moving average
    • Plot the moving averages on a time series graph
    • Draw a trend line (possibly find the equation of it)
    • Describe the trend
    • Calculate the mean seasonal variation for a particular quarter
    • Use the mean seasonal variation and your trend line to calculate an estimate for that quarter in the following year

    A trend line should go through as many of the moving averages as possible and only go within the data range (you may have to extend it in a later part of the question).

    Trend should be described as: increasing, decreasing, fluctuating or no real trend.

    Once you have calculated the mean seasonal variation for a given quarter, you can use it to predict the sales for that quarter in the next year. Your trend line will give an estimate of what the sales should be; you then add the mean seasonal variation and you have your answer.
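A minimal Python sketch of these calculations is given below. All the sales figures and trend values are hypothetical, and the sketch assumes a seasonal variation is the actual value minus the trend-line value for that quarter.

```python
def moving_averages(values, n):
    """n-point moving averages of a list of values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


def mean_seasonal_variation(actuals, trend_values):
    """Mean of (actual - trend) for the same quarter in different years."""
    variations = [a - t for a, t in zip(actuals, trend_values)]
    return sum(variations) / len(variations)


# Hypothetical quarterly sales over two years
sales = [10, 14, 18, 22, 14, 18, 22, 26]
print(moving_averages(sales, 4))

# Hypothetical Quarter 1 figures: actual sales and trend-line readings
q1_variation = mean_seasonal_variation([14, 15], [18, 21])

# Prediction = trend-line reading for next year's Quarter 1 + mean seasonal variation
trend_next_q1 = 25
print(trend_next_q1 + q1_variation)
```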


    Index numbers

    What are Index Numbers? An index number is a statistical measure designed to follow or track changes over a period of time in the price, quantity or value of an item or group of items.

    Types of Index Numbers

    1. PRICE RELATIVE

    The Price Relative is the ratio of the price of a commodity at a given time to its price at a different time - either before or after the given time.

    E.g. In January 1980 the price of a bar of soap was 40p, whilst in January 1985 its price was 60p. If we take January 1980 as the base year, the index for January 1985 is calculated as follows:

    $$\text{Index number} = \frac{\text{quantity}}{\text{quantity in base year}} \times 100$$

    $$\text{Index number} = \frac{60}{40} \times 100 = 150$$

    The percentage sign is usually omitted, and we say that the index is 150 based on January 1980, which is 100. This indicates that the price of the soap has increased by 50% over the five year period.

    If January 1985 was taken as the base period, then the price relative index is now:

    $$\text{Index number} = \frac{40}{60} \times 100 = 66.6\ldots$$

    i.e. the index is 66.6…% based on January 1985 (which is 100). This indicates that the price of soap in January 1980 was 66.6…% of the price in January 1985, or alternatively, the price of soap was 33.3…% less in January 1980.

    If information for a series of years is given, then any year can be used as the base period but it is usually specified in the examination paper.
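The soap example can be reproduced with a one-line function. The name `index_number` is made up for illustration.

```python
def index_number(quantity, base_quantity):
    """Price relative: quantity as a percentage of the base-year quantity."""
    return quantity / base_quantity * 100


print(index_number(60, 40))            # January 1985, base January 1980
print(round(index_number(40, 60), 1))  # January 1980, base January 1985
```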


    2. CHAIN BASE INDEX

    This is where index numbers are calculated by using the preceding year as the base for calculating the present year's index.

    e.g. The prices of a commodity in the years 1994 - 1999 are given below:

    Year                   1994    1995    1996    1997    1998    1999
    Price (pence per kg)   150     170     160     180     225     260

    For each year the index number can be calculated, starting from 1994 = 100.

    Year    Calculation        Index Number
    1994                       100
    1995    170/150 x 100      113.3
    1996    160/170 x 100      94.1
    1997    180/160 x 100      112.5
    1998    225/180 x 100      125
    1999    260/225 x 100      115.6
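The chain base calculations in the table can be checked with a short Python sketch, using the prices given above. The function name `chain_base_indices` is made up for illustration.

```python
def chain_base_indices(prices):
    """Index for each year, using the previous year's price as the base."""
    return [
        round(current / previous * 100, 1)
        for previous, current in zip(prices, prices[1:])
    ]


prices = [150, 170, 160, 180, 225, 260]  # 1994 to 1999, pence per kg
print(chain_base_indices(prices))        # indices for 1995 to 1999
```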


    3. WEIGHTED INDEX NUMBER

    If you have a product which is made of different materials, each in a different proportion, then we can calculate an accurate index number for the product based on the weightings of the materials.

    We can use the formula

    $$\text{Weighted Index Number} = \frac{\sum (\text{weighting} \times \text{index})}{\sum \text{weighting}}$$

    E.g.

                 Weighting    Index in 2008    Index in 2009
    Product A    71%          100              102
    Product B    29%          100              109

    Find the weighted index number in 2009.
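One way to work through this example is sketched below in Python, applying the formula to the 2009 indices with the weightings 71 and 29. The function name `weighted_index` is made up for illustration.

```python
def weighted_index(weightings, indices):
    """Sum of (weighting x index) divided by the sum of the weightings."""
    total = sum(w * i for w, i in zip(weightings, indices))
    return total / sum(weightings)


# Product A: weighting 71, index 102; Product B: weighting 29, index 109
print(round(weighted_index([71, 29], [102, 109]), 2))
```

The same function with the 2008 indices (both 100) gives 100, as you would expect for the base year.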


    Spearman’s Rank correlation coefficient

    This is used when a comparison needs to be made between two sets of data to see if there is any connection or relationship between the data.

    e.g. You may wish to see if two different groups of people - boys/girls, Year 7/Year 11, children/adults - have the same “preferences” or “likes/dislikes”, whether they are completely different, or whether there is no connection between the two groups at all.

    e.g. You may wish to see if two people judging at an event award marks consistently or not, or whether people mark work consistently or not.

    Each set of data is ranked in order, giving the largest value rank 1, then the next largest rank 2 and so on.

    e.g. Two judges rank the eight photographs in a competition as follows:

    Photograph    Rank (Judge A)    Rank (Judge B)    Difference d (A - B)    d²
    A             2                 4                 -2                      (-2)² = 4
    B             5                 3                 2                       (2)² = 4
    C             3                 2                 1                       (1)² = 1
    D             6                 6                 0                       (0)² = 0
    E             1                 1                 0                       (0)² = 0
    F             4                 8                 -4                      (-4)² = 16
    G             7                 5                 2                       (2)² = 4
    H             8                 7                 1                       (1)² = 1

    Σd² = 30

    To work out the correlation between the two judges, the following formula is used:

    $$\text{SRCC} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

    In this example, “n” is the number of photographs, which equals 8.

    $$\text{SRCC} = 1 - \frac{6 \times 30}{8(8^2 - 1)} = 0.64 \text{ (2 d.p.)}$$
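The judges example can be checked with a short Python sketch that follows the same formula. The function name `spearman_rank` is made up for illustration, and it assumes there are no tied ranks.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's rank correlation coefficient from two lists of ranks."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Ranks for photographs A to H
judge_a = [2, 5, 3, 6, 1, 4, 7, 8]
judge_b = [4, 3, 2, 6, 1, 8, 5, 7]
print(round(spearman_rank(judge_a, judge_b), 2))  # 0.64
```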

    Interpretation

    The coefficient will lie between -1 and 1. The closer the value is to 1, the stronger the positive correlation. The closer the value is to -1, the stronger the negative correlation. If the value is around 0, then there is no correlation. A rough guide can be found on the following page.


    Misleading graphs

    Always check the scales to see if the graph is misleading. For pie charts, a 3D effect distorts segment size as well as shading.


    Reading and interpreting from a table of Statistics

    Be careful when reading from a table: make sure you look at the right column and use a ruler to make sure you are reading the right line. If the total percentages don’t add up to 100%, this is due to rounding errors.


    Questionnaires

    Designing a questionnaire – The question you ask must have a timeframe specified, for instance, “How many hours of T.V. do you watch per week?” The response boxes you provide for the question must cater for every single person. Some of these boxes may be appropriate:

    0    More than …    Other    Don’t Know


    Open questions – Have no suggested answers and give people the chance to reply as they wish

    Advantage – Allows for a range of answers

    Disadvantage – Range of responses too broad, so hard to analyse

    Closed questions – Give a set of answers for the person to choose from

    Advantage – Restricts responses, making them easy to analyse

    Disadvantage – Will not necessarily cover all responses


    Pilot survey (pre-test) – A preliminary test to see if there is a line of enquiry to investigate further. It is a small scale replica of the survey / study. It can identify any problems with the wording of questions, likely responses etc.

    Reasons to do a pilot study

    • Shows if questions are understandable / clear
    • Indicates likely answers
    • Gives an indication of how long it takes to complete
    • Finds errors
    • Gives feedback so alterations can perhaps be made


    Leading questions – Avoid questions that imply an opinion such as “Smoking is bad for you. Do you agree?”

    Interviews – One on one conversation which allows any ambiguities the interviewee may have with the questions to be rectified.

    Advantage – All questions are answered

    Disadvantage – Time consuming and expensive


    Odds

    Odds are another way of expressing probability. Odds are given as a ratio between the estimated number of failures and the estimated number of successful outcomes.

    The ratio failures : successes is the odds against an event happening.

    The ratio successes : failures is the odds for/on an event happening.

    Odds may be changed into probabilities. For example, if there are 2 chances of failure to every 1 of success, then for every (2 + 1) = 3 attempts there will be 1 success, so the probability of success is $\frac{1}{3}$.
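Changing odds into a probability can be sketched in Python using exact fractions. The function name `probability_from_odds_against` is made up for illustration.

```python
from fractions import Fraction


def probability_from_odds_against(failures, successes):
    """Probability of success when the odds against are failures : successes."""
    return Fraction(successes, failures + successes)


# Odds of 2 : 1 against, i.e. 2 failures to every 1 success
print(probability_from_odds_against(2, 1))
```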


     Venn diagrams

    A Venn diagram may be used to calculate probabilities. Each region of a Venn diagram represents a different set of data.

    Goes outside the set for Radio and outside the set for Television


    Simulation

    It may not be possible to carry out an experiment in order to estimate the probability of an event happening. This may be because it is too complex or just undesirable.

    In such cases you can imitate or simulate the problem.

    Simulation is quick and cheap, easily altered and repeatable. There are several ways of introducing randomness to a simulation. You could use coins, dice or random numbers (from your calculator or published tables).

    Usually you have 100 numbers available to you: 00 – 99 inclusive. If the probability of an event happening is ½, then you can use half of the numbers to simulate it, i.e. 00 - 49.

    If the probability of an event happening was $\frac{1}{10}$ then you could use a tenth of the 100 numbers, i.e. 00 - 09.

    To improve the results of a simulation, just do more simulations.
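The idea of allocating two-digit random numbers to an event can be sketched in Python. The function names and the seed value are made up for illustration, and Python's `random` module stands in for the calculator or random number table.

```python
import random


def is_success(two_digit_number, probability):
    """True if the number falls in the block allocated to the event.

    With probability 1/2 the block is 00 - 49; with 1/10 it is 00 - 09.
    """
    cutoff = round(probability * 100)
    return two_digit_number < cutoff


def simulate(trials, probability, rng):
    """Count how many simulated trials are successes."""
    return sum(is_success(rng.randint(0, 99), probability) for _ in range(trials))


rng = random.Random(2010)  # fixed seed so the run can be repeated
print(simulate(20, 0.5, rng))
```

Increasing `trials` corresponds to doing more simulations, which should bring the proportion of successes closer to the true probability.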


    Random number tables can also be used to aid simulations.

    Random numbers can also be generated on your calculator or you could put numbers 00 – 99 in a bag and select. Type 100 then SHIFT RAN# (above the decimal point).

    Ignore all numbers > 79


    Experimental probability

    When estimating the number of times an event might occur, multiply the number of trials by the probability of it occurring.

    $$\text{Estimate for no. of times an event may occur} = \text{no. of trials} \times \text{probability of it occurring}$$

     


    Probability tree diagrams

    Recall for probability tree diagrams: the probabilities on each set of branches MUST add to 1; when moving along branches you multiply, but you add when selecting the outcomes in the final column that are relevant to a given event.

    A tip for the exam: do NOT simplify your fractions. The calculator may do this for you, so work out the fractions manually; there is then less chance of you making a mistake.

    The probability of choosing a red card is 0.4. I pick two cards. Fill in the probability tree diagram below.

    The various outcomes are listed on the right hand side. I can see the probability of getting a red and another red is 0.16. Getting two cards of different colours can happen in two ways, black and red or red and black, so the probability of getting different colours is 0.24 + 0.24 = 0.48.
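The multiply-along, add-down rule for this example can be sketched in Python with exact fractions. The sketch assumes the two picks are independent with the same probabilities each time, which is what the values 0.16 and 0.24 imply.

```python
from fractions import Fraction
from itertools import product

p = {"red": Fraction(2, 5), "black": Fraction(3, 5)}  # 0.4 and 0.6

# Multiply along the branches to get each final outcome
outcomes = {
    (first, second): p[first] * p[second]
    for first, second in product(p, repeat=2)
}

# Add the outcomes that are relevant to the event
p_red_red = outcomes[("red", "red")]
p_different = outcomes[("red", "black")] + outcomes[("black", "red")]

print(float(p_red_red), float(p_different))
```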

    In a GCSE Statistics exam, the second event will always depend on the first.


    Conditional probability

    Conditional probability is the probability of some event A, given the occurrence of some other event B.

    The formula to be used is

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

    and is said, “the probability of A given B is equal to the probability of A AND B over the probability of B”.

    You should always put the “given that…” probability in the denominator. Some questions from exam papers are shown below to show how to answer a question on conditional probability.
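As a quick illustration of the formula, here is a Python sketch using exact fractions. The two probabilities are hypothetical values, not taken from the exam questions.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """P(A given B): the 'given that' probability goes in the denominator."""
    return p_a_and_b / p_b


# Hypothetical values: P(A and B) = 3/20 and P(B) = 2/5
print(conditional_probability(Fraction(3, 20), Fraction(2, 5)))
```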

    In this example, the previous part of the question asked, what is the probability Joan was late for work. The answer was 0.24.

    In the previous part of this question, from a tree diagram you could read off that the probability of a person having tooth decay was (0.02 + 0.09 =) 0.11.


    In the previous part of this question, it was worked out that the probability of going to France was $\frac{131}{200}$.


    Binomial distribution

    Consider these examples.1. In a series of 5 Test Matches the England cricket captain only won the toss once.2. I bought 8 pens from a shop and 1 of them did not work.3. In a random sample of 100 people, 8 said they would vote for the Green party.4. Over several years a woman gives birth to 6 children, of which 5 were girls.

    Each situation involves an unpredictable event with two possible outcomes. It istraditional to label one outcome as "success" and the other as "failure". The captainmay guess correctly (success) or incorrectly; the pen may work (success) or may notwork; the person may vote Green (success) or for some other party; the child may bea girl (success) or a boy.

    In each situation there are a given number of trials of this event. We call this number n. Thus there were n = 5 matches, n = 8 pens, n = 100 voters and n = 6 children.

    Each trial of the event is independent of the others (the outcome of one doesn’t affect the outcome of another). The fact that the captain guessed wrongly in the first three matches does not make it more (or less) likely that he will guess correctly in the fourth match. Provided the pens were all the same brand, why should the fact that one works have any effect on another? The voters were selected at random and could not therefore influence each other. As several English kings discovered, having several princesses does not make it more likely that the next child will be a prince!

    Since the trials are independent the probability of each outcome remains constant. We call the probability of a success p and the probability of failure q. There is a 50% chance that the captain wins any toss so p = 0.5 and q = 0.5. There is perhaps a 1% chance that any pen does not work so p = 0.99 and q = 0.01. Maybe 5% of people vote Green so p = 0.05 and q = 0.95. Approximately 50% of children born are girls so p = 0.5 and q = 0.5. Notice that q = 1 − p.

    It is important to realise that the number of successes we get in our n trials depends on chance. The fact that 50% of children born are girls does not mean that in every 6 children 3 will be girls. It is possible to get no girls, or all girls, or indeed any number in between. Common sense tells us that 3 girls are more likely than 5 girls, but the question is how we can calculate the probabilities of getting 0, 1, 2 … 6 girls in 6 births. This is where the Binomial Distribution comes in.

    Suppose that we have a situation where:

    there are n repetitions or "trials" of a random event
    each trial has two possible outcomes, "success" or "failure" (with probabilities p and q)
    trials are independent (one doesn’t affect the next)
    the probability of a "success", p, remains constant from trial to trial


    What is the probability of obtaining r  successes in n  trials?

    The case n = 1

    When n = 1 there is only one trial. There is a probability p that the trial results in success (S) and probability q = 1 − p that the trial results in failure (F). We can show this in two ways: by a tree diagram and by a table.

    [Tree diagram for one trial: branch S with probability p, branch F with probability q]

    Successes     0    1    Total
    Probability   q    p    p + q = 1

    The case n = 2

    There are four possible results from two trials: SS, SF, FS and FF. Because the trials are independent the probabilities obey the multiplication rule:

    P(S first and S second) = P(S first) × P(S second) = p × p = p²

    [Tree diagram for two trials: SS with probability p², SF with probability pq, FS with probability pq, FF with probability q²]

    Successes     0     1      2     Total
    Probability   q²    2pq    p²    p² + 2pq + q²

    In fact, for n binomial trials, the probabilities of the possible numbers of successes are the terms of the expansion of $(p + q)^n$. At GCSE Statistics level this is all you are required to know; beyond this, a new formula using combinations will be introduced.
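The table for n = 2 can be checked by listing every possible outcome and applying the multiplication rule to each one. The Python sketch below does this for any small n. The function name and the value p = 0.5 are assumptions chosen for illustration, not part of any exam question.

```python
from itertools import product


def success_distribution(n, p):
    """Map each number of successes to its probability by listing all outcomes."""
    q = 1 - p
    distribution = {r: 0.0 for r in range(n + 1)}
    # Every outcome is a sequence of n results, each "S" (success) or "F" (failure).
    for outcome in product("SF", repeat=n):
        probability = 1.0
        for result in outcome:
            # Multiplication rule for independent trials.
            probability *= p if result == "S" else q
        distribution[outcome.count("S")] += probability
    return distribution


p = 0.5  # assumed probability of success
dist = success_distribution(2, p)
for successes, probability in dist.items():
    print(successes, probability)
```

With p = 0.5 the three probabilities are 0.25, 0.5 and 0.25, matching q², 2pq and p². The middle value is doubled because SF and FS both give exactly one success.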


    In a GCSE Statistics exam, the expansion of $(p + q)^n$ for the relevant value of n in the question will be given to you.

    Whenever you start a binomial distribution question, always write down what p and q are equal to. Remember that p is the probability of success and q is 1 − p.

    Let's look at an example. Each term in the expansion is explained. Notice how the power of p matches the number of successes in the explanation.

    $$(p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4$$

    $p^4$: exactly 4 successes
    $4p^3q$: exactly 3 successes
    $6p^2q^2$: exactly 2 successes
    $4pq^3$: exactly 1 success
    $q^4$: no (0) successes

    So if you were asked in a question to work out the probability of exactly 3 successes, you would use the term $4p^3q$ (p would be given in the question and q = 1 − p).

    If you were asked to find the probability of fewer than 2 successes, you would interpret this as 1 success or 0 successes, so you would work out $4pq^3$ and $q^4$ and add the results together.
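As a rough check on this way of reading the expansion, the terms of $(p + q)^4$ can be evaluated in Python. This is a sketch only: the exam supplies the expansion, so `math.comb` is used here just to reproduce the coefficients 1, 4, 6, 4, 1, and p = 0.5 is an assumed value.

```python
from math import comb


def binomial_terms(n, p):
    """Return [P(0 successes), ..., P(n successes)], the terms of (p + q)^n."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]


p = 0.5  # assumed probability of success
terms = binomial_terms(4, p)

exactly_three = terms[3]              # the term 4p^3q
fewer_than_two = terms[0] + terms[1]  # q^4 + 4pq^3

print(exactly_three)   # 0.25
print(fewer_than_two)  # 0.3125
```

The list is indexed by the number of successes, so `terms[3]` is the term containing p³.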


    Standardised scores

    To compare values from different data sets you usually need to set up standardised scores. For this you will need to know the mean and standard deviation.

    The formula to work out these scores is given as follows:

    $$\text{Standardised score} = \frac{\text{score} - \text{mean}}{\text{standard deviation}}$$

    The mean and standard deviation will be given in the examination question.

    The standardised score indicates how many standard deviations a score is above or below the mean. This is very useful when comparing two sets of data.
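A short Python sketch of the standardised score formula is given below. The subjects, marks, means and standard deviations are all made up for illustration and are not taken from an exam paper.

```python
def standardised_score(score, mean, standard_deviation):
    """Return how many standard deviations the score lies above the mean."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / standard_deviation


# Hypothetical marks for one student in two tests.
maths = standardised_score(65, mean=55, standard_deviation=10)
english = standardised_score(70, mean=62, standard_deviation=16)

print(maths)    # 1.0
print(english)  # 0.5
```

In this made-up case the raw English mark is higher, but the Maths mark is better relative to the rest of the group because it is further above the mean in standard deviation terms. A negative standardised score would mean the mark is below the mean.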


    Normal distribution

    The normal distribution is used with data where the mean = median = mode. The normal distribution is known as a continuous probability distribution. It takes the shape of a bell and is symmetrical about the mean.

    The width of the curve shows how spread out the data is.

    In the picture above, the two distributions have the same mean but the blue curve is less spread out.

    Properties you need to learn for the exam.

    Mean ± 1 standard deviation will contain 68% of the data.

    [Figure: normal curve with Mean − 1 s.d., Mean and Mean + 1 s.d. marked on the horizontal axis]


    Mean ± 2 standard deviations will contain 95% of the data.

    Mean ± 3 standard deviations will contain 99.8% of the data.

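    [Figure: normal curve with Mean − 2 s.d., Mean and Mean + 2 s.d. marked on the horizontal axis]

    [Figure: normal curve with Mean − 3 s.d., Mean and Mean + 3 s.d. marked on the horizontal axis, labelled 99.8%]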
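The three properties above can be compared with values computed from the standard normal curve, using `NormalDist` from Python's standard `statistics` module. The helper names, and the mean of 50 and standard deviation of 5 in the last line, are assumptions for illustration. The computed values are roughly 68.3%, 95.4% and 99.7%, so the percentages in this guide are the rounded figures to learn for the exam.

```python
from statistics import NormalDist


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def interval(mean, standard_deviation, k):
    """Return the end points of mean ± k standard deviations."""
    return (mean - k * standard_deviation, mean + k * standard_deviation)


for k in (1, 2, 3):
    print(k, round(proportion_within(k) * 100, 1))

# Hypothetical data with mean 50 and standard deviation 5.
print(interval(50, 5, 2))  # (40, 60)
```

In an exam question the usual step is the second one: work out the end points from the given mean and standard deviation, then quote the matching percentage.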


    Choropleth graphs

    A choropleth map shows information as a series of graduated shadings. These can be either shades of grey or colours.

    They are designed to show statistical data in a series of "multi-coloured" values, moving from smallest to largest, with the lightest colours for the smaller values and, as the values get larger, the colours become darker.

    These maps are useful for showing the following types of information:

    Average rainfall in inches over a county in a state.
    Average numbers of cattle per farm across a county.
    Population density across a constituency.

    There are advantages and disadvantages in these maps.

    Advantages
    They take statistical information and change it into averages that can be understood graphically.

    In this example, the shading changes progressively from 0% black to 100% black to reflect the amount of rainfall across a country.

    Shading   Rainfall
    0%        Up to 5 cm
    20%       Up to 10 cm
    40%       Up to 15 cm
    60%       Up to 20 cm
    80%       Up to 25 cm
    100%      Over 25 cm
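The key above works like a lookup: each rainfall value falls into one class and takes that class's shading. A minimal Python sketch is shown below. The names are made up, and it assumes that "up to" includes the boundary value, so exactly 5 cm is shaded 0% and exactly 25 cm is shaded 80%.

```python
from bisect import bisect_left

# Upper class boundaries in cm and the shading (% black) for each class.
UPPER_BOUNDS_CM = [5, 10, 15, 20, 25]
SHADES_PERCENT = [0, 20, 40, 60, 80, 100]


def shade_for_rainfall(rainfall_cm):
    """Return the percentage shading for a rainfall value, using the key above."""
    if rainfall_cm < 0:
        raise ValueError("rainfall cannot be negative")
    # bisect_left counts how many boundaries lie strictly below the value.
    return SHADES_PERCENT[bisect_left(UPPER_BOUNDS_CM, rainfall_cm)]


for rainfall in (3, 5, 7.5, 25, 30):
    print(rainfall, shade_for_rainfall(rainfall))
```

Different values in the same class get the same shade, which is one reason a choropleth map shows only grouped information.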

    Disadvantages
    A person's attention can be focussed on the size of the area, rather than the data. Large areas tend to dominate a map, but they are usually the least densely populated areas.

    The second disadvantage lies with us: we have difficulty in distinguishing between shades of grey or colour.


    Comparative Pie Charts/Diagrams

    These are used when two sets of data with differing totals are to be compared. Examples could include comparing the costs from one year with the preceding year, or sales in consecutive years. As in a normal pie chart, the angle of each sector is determined by the fraction of the total, where the total is represented by 360°. The different totals are represented by differently sized circles. The areas of the circles are in proportion to the totals, so the ratio of the radii is the square root of the ratio of the totals (the square root of the area factor).

    e.g. The following agricultural statistics refer to land use, in hectares, of three parishes. Draw three pie diagrams to compare this data.

    Parish      Barley   Wheat   Woodland   Total land (hectares)
    Appleford   1830     1640    550        4020
    Burnford    645      435     120        1200
    Carnford    320      160     150        630

    Let Carnford be represented by a circle of radius 4 cm.

    To Calculate The Radius of the Circle for Burnford

    The area of the circle for Burnford has to be enlarged by an area factor, which is found as follows:

    $$\text{Area factor} = \frac{\text{Area of Burnford}}{\text{Area of Carnford}} = \frac{1200}{630} = 1.9047\ldots$$

    $$\text{Scale factor} = \sqrt{\text{Area factor}} = 1.380\ldots$$

    $$\text{Radius of new circle} = 4 \times 1.380\ldots = 5.52\ldots = 5.5 \text{ cm}$$

    To Calculate The Radius of the Circle for Appleford

    By a similar method, the area factor is first found, and then the scale factor.

    $$\text{Area factor} = \frac{\text{Area of Appleford}}{\text{Area of Carnford}} = \frac{4020}{630} = 6.3809\ldots$$

    $$\text{Scale factor} = \sqrt{\text{Area factor}} = 2.526\ldots$$

    $$\text{Radius of new circle} = 4 \times 2.526\ldots = 10.104\ldots = 10.1 \text{ cm}$$

    In this question any parish can be used as a starting point, and the resulting radii will be based on this parish.
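The radius calculation for the parishes can be sketched in Python as below. The function name is made up for illustration, and Carnford with a 4 cm radius is taken as the starting circle, as in the worked example.

```python
from math import sqrt


def comparative_radius(base_radius, base_total, other_total):
    """Radius of a circle whose area is in proportion to other_total."""
    area_factor = other_total / base_total
    scale_factor = sqrt(area_factor)  # lengths scale by the square root of the area factor
    return base_radius * scale_factor


totals = {"Appleford": 4020, "Burnford": 1200, "Carnford": 630}

for parish, total in totals.items():
    radius = comparative_radius(4, totals["Carnford"], total)
    print(parish, round(radius, 1))
```

With these totals the radii come out at about 10.1 cm, 5.5 cm and 4 cm. Choosing a different parish as the starting circle changes every radius but keeps the areas in the same ratio.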


    Firstly find the ratio of the areas, 196400/107000. Then, to find the scale factor by which to multiply the radius (3 cm), find the square root of the ratio of the areas, and multiply it by the radius (3 cm).

    $$3 \times \sqrt{\frac{196400}{107000}} = 4.06\ldots \text{ cm}$$


     A pie chart shows proportions but not frequencies.

    Comparative pie charts can be used to compare two sets of data of different sizes. The areas of the two circles should be in the same ratio as the two frequencies.
