Item code: PSTA15

The Actuarial Education Company © IFE: 2015 Examinations

ActEd Study Materials: 2015 Examinations

Stats Pack

Contents

Introduction
12 Chapters

If you think that any pages are missing from this pack, please contact our administration team by email at [email protected] or by phone on 01235 550005.

Important: Copyright Agreement

This study material is copyright and is sold for the exclusive use of the purchaser. You may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of it. You must take care of your material to ensure that it is not used or copied by anybody else. By opening this pack you agree to these conditions.


All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Faculty and Institute of Actuaries.

You may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.

You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.

These conditions remain in force after you have finished using the course.

Stats Pack-00: Introduction Page 1


Stats Pack

Introduction

Background

Stats Pack was originally developed in response to requests from students who have studied very little statistics before in their schooling and for whom Subject CT3 is a significant jump. It now forms part of the syllabus for the Actuarial Common Entrance Test (ACET).

How to use the Stats Pack

Don’t be put off by the size of this course! The style of this pack is deliberately chatty – its purpose being to ensure that you understand the concepts, as doing so will make remembering and applying the results far easier.

The earlier parts of some chapters are pitched deliberately low so that those with a non-mathematical background (or those who haven’t studied maths for a while) can quickly get into it. If you find it too easy, skip through it and try the questions!

Do however take the time to try the questions. This will make a real difference to your understanding (especially if you try them before looking at the solutions). You will find extra practice questions at the end of each chapter to enable you to consolidate what you have learnt. Many of these questions are from the CT3 exam.



Stats Pack Online Classroom

Please note that this Stats Pack comes with complimentary access to the Stats Pack online classroom. This is a series of pre-recorded tutorials covering the main points from the course with examples as well as a dedicated forum for queries staffed by tutors. To access the online classroom please visit:

https://learn.bpp.com

You should have received an email with your access details. If you have lost this email, enter your username (which is your email address used by ActEd) and click the “Forgotten your password?” link to have a new password emailed to you. Should you have any problems with accessing the online classroom then please do email our admin team at [email protected].

Queries and feedback

We have worked hard to ensure the Stats Pack is clear and accessible and we honestly believe that the Stats Pack will be an invaluable aid in helping you to get to grips with the fundamentals of statistics. However, if you find that anything is still unclear please post your queries in the forum in the Stats Pack Online Classroom or alternatively, you can post your query in the “FAC and StatsPack” forum at www.ActEd.co.uk/forums (or use the link from our homepage at www.ActEd.co.uk).

If you have any feedback on this course then please do email [email protected]. Thanks.

ACET Mock Exam

A practice exam containing questions of the same standard as the ACET exam can be found in the FAC online classroom.

Stats Pack: Index Page 1


Stats Pack Index

Addition rule for mutually exclusive events ... Ch4 p6
Addition rule for non-mutually exclusive events ... Ch4 p9
Attribute data ... Ch1 p4
Bar chart ... Ch1 p9
Bernoulli distribution ... Ch8 p7
Binomial distribution ... Ch8 p11
Bivariate data ... Ch12 p2
Boxplot ... Ch1 p23, Ch3 p18
Categorical data ... Ch1 p2
Central moment ... Ch3 p33, Ch7 p38
Coefficient of skewness ... Ch7 p35, Ch9 p33
Combinations ... Ch6 p8
Combinations to calculate probabilities ... Ch6 p11
Comparison of data ... Ch3 p39
Complementary events ... Ch4 p4
Conditional probability ... Ch4 p15, Ch5 p10
Continuous uniform distribution ... Ch10 p2
Continuous random variables ... Ch9 p3
Correlation ... Ch12 p4
Correlation coefficient ... Ch12 p10
Covariance ... Ch12 p7
Cumulative distribution function ... Ch7 p11, Ch9 p11
Cumulative frequency curve ... Ch1 p20
Cumulative frequency table ... Ch1 p8
Dichotomous data ... Ch1 p4
Discrete data ... Ch1 p3
Discrete random variables ... Ch7 p3
Discrete uniform distribution ... Ch8 p2
Dotplot ... Ch1 p19



Expectation
  Of a continuous random variable ... Ch9 p17
  Of a discrete random variable ... Ch7 p17
  Of a function of a continuous random variable ... Ch9 p22
  Of a function of a discrete random variable ... Ch7 p20
  Of linear functions of random variables ... Ch7 p22, Ch9 p24
Explanatory variable ... Ch12 p3
Exponential distribution ... Ch10 p10
Frequency density ... Ch1 p12
Frequency distribution ... Ch1 p5
Grouped frequency distribution ... Ch1 p6
Histogram ... Ch1 p10
Independent events ... Ch4 p11
Interpolation ... Ch2 p31
Interquartile range
  From a frequency distribution ... Ch3 p10
  From a grouped frequency distribution ... Ch3 p13
  From a list ... Ch3 p5
  Using cumulative frequency ... Ch3 p16
Line of best fit ... Ch12 p15
Lineplot ... Ch1 p19
Location ... Ch2 p1
Lower quartile ... Ch3 p5
Mean
  From a frequency distribution ... Ch2 p9
  From a grouped frequency distribution ... Ch2 p11
  From a list ... Ch2 p7
  Of a discrete random variable ... Ch7 p17
  Of a continuous random variable ... Ch9 p16
Median
  From a frequency distribution ... Ch2 p18
  From a grouped frequency distribution ... Ch2 p20
  From a list ... Ch2 p15
  Of a continuous random variable ... Ch9 p18
  Of a discrete random variable ... Ch7 p18
  Using cumulative frequency ... Ch2 p21



Mode
  From a frequency distribution ... Ch2 p4
  From a grouped frequency distribution ... Ch2 p5
  From a list ... Ch2 p3
  Of a continuous random variable ... Ch9 p20
  Of a discrete random variable ... Ch7 p19
Moment ... Ch2 p25, Ch3 p33, Ch7 p37, Ch9 p34
Multiplication rule for independent events ... Ch4 p11
Mutually exclusive events ... Ch4 p5
Negative correlation ... Ch12 p4
Nominal data ... Ch1 p4
Normal distribution
  General probability ... Ch11 p22
  Moments ... Ch11 p7
  PDF ... Ch11 p3
  Probabilities for any normal distribution ... Ch11 p26
  Standard normal ... Ch11 p8
  Standard normal probabilities ... Ch11 p9
  Standardising ... Ch11 p24
Numerical data ... Ch1 p2
Ordinal data ... Ch1 p4
Permutations of all objects ... Ch6 p4
Permutations of some objects ... Ch6 p5
Poisson distribution ... Ch8 p21
Positive correlation ... Ch12 p4
Probability ... Ch4 p2
Probability density function ... Ch9 p6
Probability distributions ... Ch7 p4
Probability functions ... Ch7 p5
Probability tree diagrams ... Ch5 p5
Qualitative data ... Ch1 p2
Quantitative data ... Ch1 p2



Random variable ... Ch7 p3
Range
  From a frequency distribution ... Ch3 p3
  From a grouped frequency distribution ... Ch3 p4
  From a list ... Ch3 p2
Regression line ... Ch12 p20
Residual ... Ch12 p21
Response variable ... Ch12 p3
Sample space ... Ch4 p2
Scatterplot ... Ch12 p2
Skewness ... Ch1 p26, Ch1 p34, Ch3 p34, Ch7 p32, Ch9 p30
  Negative skew ... Ch2 p26
  Positive skew ... Ch2 p26
Standard deviation
  From a frequency distribution ... Ch3 p26
  From a grouped frequency distribution ... Ch3 p29
  From a list ... Ch3 p20
  Of a continuous random variable ... Ch9 p26
  Of a discrete random variable ... Ch7 p26
Standard normal distribution ... Ch11 p8
  Probabilities ... Ch11 p9
Stem and leaf diagram ... Ch1 p17
Transformation of data ... Ch2 p28, Ch3 p37
Tree diagrams ... Ch5 p5
Upper quartile ... Ch3 p5
Uniform distribution (continuous) ... Ch10 p2
Uniform distribution (discrete) ... Ch8 p2
Variance
  From a list ... Ch3 p24
  Of a continuous random variable ... Ch9 p26
  Of a discrete random variable ... Ch7 p26
  Of a linear function of random variables ... Ch7 p30, Ch9 p28
Waiting time for a Poisson distribution ... Ch10 p21

Stats Pack-01: Statistical diagrams Page 1


Chapter 1

Statistical diagrams

Links to CT3: Chapter 1 Sections 1.1 – 1.6, 4.1

Syllabus objectives: (i)1. Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a bar chart, histogram, stem and leaf plot, or other elementary device.

0 Introduction

The whole basis of this course is that we will be dealing with data (that is information or facts) such as claim types and amounts, number and age of deaths and so on. We will then summarise these data† using diagrams (Chapter 1) and analyse them using averages and measures of spread (Chapters 2 and 3). We can then take it a step further: we use these figures to construct statistical models that fit the data we observe. An insurance company can then make predictions about future claims using these models.

† Technically speaking the word data is in fact plural (datum is the singular) and so we use ‘these data’ rather than ‘this data’.



1 Types of data

Before we start summarising data using diagrams we will briefly describe the various types of data that we might meet and give them their mathematical names. First we need to consider whether we are dealing with numbers or not.

[Diagram: data splits into numerical (ie numbers) and categorical (ie not numbers)]

Numerical data consists of numbers (eg 1, 0.8, 3.7, 3/4) or quantities (eg 1.2 kg, 378 m). This is why, in some textbooks, you will see numerical data being referred to as quantitative data.

Categorical data consists of non-numerical information (eg sex, eye colour, preferred payment method) and can therefore only take various categories (eg male/female, blue/green/brown/…, cheque/cash/visa/…). In some textbooks, you will see categorical data being referred to as qualitative data.

Next we can subdivide the numerical data into two types:

[Diagram: numerical data splits into discrete and continuous]



Discrete data is numerical data that can only take particular values. For example, the number of claims can only be a whole number (0, 1, 2, 3, …). We certainly can’t have a fractional number of claims such as 3.8 claims! Typically we get discrete data from counting, eg number of actuaries, number of claims, number of deaths.

Continuous data is numerical data that can take any value within a specified range. For example, the length of time between claims can take any positive value. It doesn’t have to be a whole number such as 85 minutes; it could equally be 84.6914 minutes. Typically we get continuous data from measuring, eg height (cm) or time (secs). Since continuous data can take an infinite number of different values it is usually rounded off when written down, eg to the nearest second.

Question 1.1

(i) For each of the following state whether the data is numerical or categorical:
    (a) weight
    (b) place of birth
    (c) number of claims to be processed
    (d) nature of car insurance claim
    (e) age
    (f) amount of claim.
(ii) For the numerical data in part (i), state whether it is discrete or continuous.



We now subdivide the categorical data into 3 types:

[Diagram: data splits into numerical (discrete or continuous) and categorical (attribute (dichotomous), nominal or ordinal)]

Attribute (or dichotomous) data is categorical (ie non-numerical) data that has only two categories. For example, claim/no claim, dead/alive or male/female. It is called attribute data as we are simply saying whether the item has this attribute (ie characteristic) or not.

Nominal data is categorical (ie non-numerical) data that cannot be ordered in any way. For example, hair colour (blonde, brunette, ginger or black), type of policy (whole life assurance, term assurance or endowment assurance) and nature of claim (fire, theft, accident, earthquake, etc).

Ordinal data is categorical (ie non-numerical) data that can be ordered. For example, tidiness (messy, fairly tidy or very neat), build (fat, medium sized or thin), agreement (strongly agree, agree, neither agree nor disagree, disagree, strongly disagree).

Question 1.2

State what type of data is required by each of these questions:
(i) Which area do you work in (life, pensions, general, health or investment)?
(ii) Did you study mathematics at university?
(iii) How would you rate your revision technique (1 excellent, 5 poor)?

Since we will be concentrating on numerical data in the Subject CT3 course, it is the distinction between discrete and continuous data that will be the most important.



2 Summarising data in tables

The first step to summarising a list of data values is to put the values in a table.

2.1 Frequency distributions

Below we have the number of claims reported each day to a small general insurance company over the last 28 working days:

4 2 0 3 2 1 1 4 2 5 0 3 2 1
3 4 3 5 1 2 4 2 3 1 4 2 3 2

This list of data is not very helpful in telling us exactly what is going on. So to help make things clearer we’re going to count how many there are of each number (called the frequency) and then put this into a table. In our list we have two days where 0 claims were reported, five days where only 1 claim was reported, eight days where 2 claims were reported and so on.

Claims reported each day    Frequency
0                           2
1                           5
2                           8
3                           6
4                           5
5                           2

This table is called a frequency distribution as it shows the distribution of the frequencies between the data values (ie how the frequencies are shared out amongst the data values). A frequency distribution is suitable for categorical or discrete numerical data.
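The counting step described above can be sketched in Python. This is only an illustration, not part of the course method; the names `claims` and `frequency_distribution` are made up for the example.

```python
from collections import Counter

# Number of claims reported on each of the 28 working days (from the list above)
claims = [4, 2, 0, 3, 2, 1, 1, 4, 2, 5, 0, 3, 2, 1,
          3, 4, 3, 5, 1, 2, 4, 2, 3, 1, 4, 2, 3, 2]


def frequency_distribution(values):
    """Return {data value: frequency}, ordered by data value."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


freq = frequency_distribution(claims)
for value, frequency in freq.items():
    print(value, frequency)
```

Each printed row is one row of the frequency distribution: a data value followed by the number of days on which it occurred.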

Question 1.3

For the frequency distribution above, explain how we can obtain:
(i) the number of results obtained
(ii) the total number of claims reported.



2.2 Grouped frequency distributions

Below is a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

57 68 75 66 72 86 80 81 70 78
76 72 88 84 69 77 83 90 48 63
74 81 94 51 73 96 81 66 77 101

The frequency distribution for this data is:

Age at death    Frequency
48              1
…               …
51              1
…               …
57              1
…               …
63              1
…               …
66              2
67              0
68              1
69              1
70              1
71              0
72              2
73              1
74              1
75              1
76              1
77              2
78              1
79              0
80              1
81              3
82              0
83              1
84              1
85              0
86              1
87              0
88              1
89              0
90              1
…               …
94              1
…               …
96              1
…               …
101             1



As you can see this is particularly unhelpful! Why? Because the data values are too spread out. To counter this we can put the data into groups (called classes). We have 1 result (48) that is between 40 and 49, 2 results (51, 57) that are between 50 and 59, 5 results (63, 66, 66, 68, 69) that are between 60 and 69 and so on.

Age at death    Frequency
40–49           1
50–59           2
60–69           5
70–79           10
80–89           8
90–99           3
100–109         1

This table is called a grouped frequency distribution as it shows the distribution of the frequencies between the groups (classes).

Continuous data is unlikely to produce any repeats (as the values could be anything) and so we would expect the data values to be spread out. Hence, a grouped frequency distribution is how we should tabulate continuous data.
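As a rough sketch of how the grouping could be done by computer, the Python below puts the 30 ages at death into ten-year classes. The function name `grouped_frequency` and the choice to label each class by its lowest and highest whole-number age are assumptions made for this illustration.

```python
from collections import Counter

# Age last birthday at death of the 30 policyholders (from the list above)
ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]


def grouped_frequency(values, width=10):
    """Count whole-number values in classes of equal width.

    Each class is labelled (lowest value, highest value), eg (40, 49).
    """
    # Round each value down to the start of its class, eg 48 -> 40
    counts = Counter((value // width) * width for value in values)
    first, last = min(counts), max(counts)
    return {(start, start + width - 1): counts.get(start, 0)
            for start in range(first, last + width, width)}


grouped = grouped_frequency(ages)
for (low, high), frequency in grouped.items():
    print(f"{low}-{high}: {frequency}")
```

Classes with no data values are still listed (with a frequency of 0), so that gaps in the data remain visible in the table.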

Question 1.4

A consumer watchdog measures the length of time (to the nearest 1/100th of a minute) for which 30 phone calls to a helpline were put on hold. The results are:

1.45 0.32 1.81 0.90 1.02 2.00 1.63 0.86 8.56 0.78
0.16 3.36 2.70 0.64 1.46 4.29 0.50 3.18 4.64 1.70
2.69 4.20 1.50 3.90 6.20 3.15 4.99 2.05 7.90 9.10

Complete this grouped frequency distribution:

Time (t)        Frequency
0 ≤ t < 0.5
0.5 ≤ t < 1
1 ≤ t < 2
2 ≤ t < 5
5 ≤ t < 10



2.3 Cumulative frequency tables

A cumulative frequency table is one where we accumulate (ie add up) the frequencies as we go through each of the data values. For example, using the ages of death given in the previous section we get:

Age at death    Frequency    Cumulative Frequency
40–49           1            1
50–59           2            3
60–69           5            8
70–79           10           18
80–89           8            26
90–99           3            29
100–109         1            30

What does each of the cumulative frequencies represent? Well the 1 is all the deaths from 40–49 (ie up to age 49), the 3 is all the deaths from 40–59 (ie up to age 59), the 8 is all the deaths from 40–69 (ie up to age 69) and so on. Therefore it would make sense to label the cumulative frequency table as follows:

Age at death    Cumulative Frequency
up to 49        1
up to 59        3
up to 69        8
up to 79        18
up to 89        26
up to 99        29
up to 109       30
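The accumulation can be sketched in a few lines of Python using the standard library's running-total helper. The variable names are illustrative only.

```python
from itertools import accumulate

# Grouped frequencies for ages at death: 40-49, 50-59, ..., 100-109
upper_ages = [49, 59, 69, 79, 89, 99, 109]
frequencies = [1, 2, 5, 10, 8, 3, 1]

# Each cumulative frequency is the previous one plus the next frequency
cumulative = list(accumulate(frequencies))

for age, total in zip(upper_ages, cumulative):
    print(f"up to {age}: {total}")
```

The last cumulative frequency should always equal the total number of data values, which is a useful check on the arithmetic.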

Question 1.5

Draw up a cumulative frequency table for the length of time for which 30 phone calls to a helpline were put on hold, using the data from Question 1.4.

A cumulative frequency table is helpful in finding the positions of data values, such as the middle value (the median). This will be covered in Chapter 2.

[Diagram: running totals in the cumulative frequency table: 1 + 2 = 3, 3 + 5 = 8, 8 + 10 = 18]



3 Summarising data in diagrams

Whilst putting data into frequency tables is helpful, a diagram can often make the patterns in the data much clearer. We now look at six types of diagram.

3.1 Bar chart

A bar chart can be drawn for discrete or categorical data. For each data item, we simply draw a ‘bar’ showing its frequency (ie how often that value occurs). A general insurance company has analysed the types of claims it received over the last month. The results are as follows:

Claim type      Frequency
House theft     57
House fire      48
Car theft       156
Car accident    245

The bar chart for these data is:

[Bar chart: frequency (vertical axis, 0 to 300) against types of claims (House theft, House fire, Car theft, Car accident)]

Generally, the x-axis is used to show the data items and the y-axis is used to show frequency. However, they can be drawn the other way round. If we are given a list of data values, it is usually easier to put them into a frequency table first and then draw the bar chart.
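For a quick look at the shape of such data without graph paper, a bar chart can be roughed out in text. The Python below is a sketch only: it draws the bars ‘the other way round’ (horizontally), and the function name `text_bar_chart` and the scale of one symbol per 5 claims are arbitrary choices for the illustration.

```python
# Claim types and frequencies from the table above
claim_counts = {
    "House theft": 57,
    "House fire": 48,
    "Car theft": 156,
    "Car accident": 245,
}


def text_bar_chart(frequencies, per_symbol=5):
    """Return one line per data item, with a bar of '#' symbols.

    Each '#' stands for roughly per_symbol observations.
    """
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, frequency in frequencies.items():
        bar = "#" * round(frequency / per_symbol)
        lines.append(f"{label:<{label_width}} | {bar} {frequency}")
    return lines


chart = text_bar_chart(claim_counts)
print("\n".join(chart))
```

The length of each bar is proportional to the frequency, so the longest bar belongs to the most common claim type.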



Question 1.6

ActEd carried out a study into the mock exam results of students who passed their Subject CT3 exam. The results of a randomly selected group of 20 students (who subsequently passed) in their Subject CT3 mock exam are as follows:

72, 70, 71, 74, 68, 69, 71, 72, 70, 75, 71, 72, 71, 71, 72, 69, 74, 70, 72, 71

Draw a bar chart to represent these data.

Bar charts show the shape of the distribution clearly and simply, but are not suitable for continuous data. This is because continuous data can take any value and so we would need a bar for every number! In Subject CT3 we shall be dealing with numerical data (eg number of claims, age of death, amount of claims) and so all the remaining diagrams in this chapter are only suitable for numerical data.

3.2 Histogram

In the last section we used a bar chart to display discrete data. A histogram is similar to a bar chart but is used to display continuous data. Therefore we will use a continuous scale with no ‘gaps’ between the bars. A general insurance company recorded the claim amounts that it received over the last week. The results are as follows:

Claim amount (x)       Frequency
0 ≤ x < 500            6
500 ≤ x < 1,000        10
1,000 ≤ x < 1,500      9
1,500 ≤ x < 2,000      8
2,000 ≤ x < 2,500      3
2,500 ≤ x < 3,000      2
3,000 ≤ x < 3,500      1
3,500 ≤ x < 4,000      1



The histogram for these data would be:

[Histogram: frequency (vertical axis, 0 to 10) against claim amount (£) (horizontal axis, 0 to 4,000)]

In this case the groups (called classes) all have the same width (called the class width) of £500. However, in practice we may have groups with different widths:


Claim amount (x)       Frequency
0 ≤ x < 500            6
500 ≤ x < 1,000        10
1,000 ≤ x < 1,500      9
1,500 ≤ x < 2,000      8
2,000 ≤ x < 4,000      7

This would mean our diagram would look like this:

[Histogram: frequency (vertical axis, 0 to 10) against claim amount (£) (horizontal axis, 0 to 4,000), with a single wide bar for the 2,000 to 4,000 group]



The problem with this diagram is that most people would think that the 2,000 ≤ x < 4,000 group has the most claims. Why? Because of its huge area! How can we draw a ‘fairer’ diagram? Well since the bar for the last group is four times wider than the other bars, we should reduce the height by a factor of four:

[Diagram: the same bars against claim amount (£) (horizontal axis, 0 to 4,000), with the height of the 2,000 to 4,000 bar reduced by a factor of four]

Essentially what we are doing is working with the area of the bars instead of the heights (as we did in a bar chart). So for a histogram:

area = frequency †

But wait a second! We can no longer use ‘frequency’ on the vertical axis, since there are clearly more than 7 ÷ 4 = 1.75 claims in the 2,000 ≤ x < 4,000 group!

Since we are dealing with rectangular bars the area is height × width. Therefore, if we are given the frequency and the class width of the group we can calculate the height by:

height = frequency ÷ class width

Definition

The frequency density (height) of each bar on a histogram is given by:

frequency density = frequency ÷ class width



So for our data we get:

Claim amount (x)       Frequency    Frequency density
0 ≤ x < 500            6            6 ÷ 500 = 0.012
500 ≤ x < 1,000        10           10 ÷ 500 = 0.02
1,000 ≤ x < 1,500      9            9 ÷ 500 = 0.018
1,500 ≤ x < 2,000      8            8 ÷ 500 = 0.016
2,000 ≤ x < 4,000      7            7 ÷ 2,000 = 0.0035

Hence, our histogram is given by:

[Histogram: frequency density (vertical axis, 0 to 0.020) against claim amount (£) (horizontal axis, 0 to 4,000)]

All that has changed from our previous ‘fair’ diagram is the scale on the vertical axis so that the area of each bar is now the frequency (eg for the 0 ≤ x < 500 group, the area is 0.012 × 500 = 6, which is the frequency).

Note that in general, a histogram is drawn with vertical bars and a continuous scale on the x-axis. However, it can be drawn with horizontal bars instead. In an exam, it is expected that you would use graph paper to draw a histogram.

In summary, to draw a histogram we first have to calculate the frequency densities (by dividing the frequencies by the class widths). We then draw the histogram using the frequency densities for the heights.

† Technically the area is proportional to the frequency. Thus A = k × f, however for simplicity we have assumed that k = 1 for this section.
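The frequency density calculation lends itself to a short check in Python. The sketch below assumes each class is given as (lower boundary, upper boundary, frequency) and takes k = 1 as in this section; the function name `frequency_densities` is made up for the example.

```python
# Claim amount classes from the table above: (lower, upper, frequency)
claim_classes = [
    (0, 500, 6),
    (500, 1000, 10),
    (1000, 1500, 9),
    (1500, 2000, 8),
    (2000, 4000, 7),
]


def frequency_densities(classes):
    """Return (lower, upper, frequency, frequency density) for each class."""
    table = []
    for lower, upper, frequency in classes:
        class_width = upper - lower
        table.append((lower, upper, frequency, frequency / class_width))
    return table


densities = frequency_densities(claim_classes)
for lower, upper, frequency, density in densities:
    # Working backwards: area = height x width should recover the frequency
    area = density * (upper - lower)
    print(f"{lower} to {upper}: density {density:.4f}, area {area:.0f}")
```

The same relationship, frequency = frequency density × class width, is what lets us read a frequency table back off a finished histogram.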



Question 1.7

Another general insurance company recorded the claim amounts that it received during the previous month:


Claim amount (x)       Frequency
250 ≤ x < 500          75
500 ≤ x < 1,000        50
1,000 ≤ x < 2,000      40
2,000 ≤ x < 5,000      30

(i) Calculate the frequency densities for each of the groups.
(ii) Hence draw a histogram to represent these data.

Once we know the width of the group it is fairly straightforward to then calculate the height of the bar (ie the frequency density) and thus draw the histogram. We are now going to look at how we can calculate the widths for two other ways of grouping continuous data.

Continuous data could be rounded, eg time to the nearest minute, in which case we could get a group of:

10–19 mins

Since the times are rounded to the nearest minute, the smallest value that could be included in this group is 9.5 mins (as this will round up to 10 mins). Similarly, the largest value that could be included in this group is (just below) 19.5 mins (as this will round down to 19 mins). Therefore we get:

class width = 19.5 − 9.5 = 10 mins

When we construct our histogram we would actually draw the 10–19 mins bar from 9.5 to 19.5.

The only other type of group that we could meet is one that involves ages, in which case we could get a group of:

11–20 years



The problem with this group is that most people give their age last birthday (eg someone who is actually 24 years 9 months would say that they were 24 years old). The lowest age that could be included in this group is 11 years (as it could be the person’s 11th birthday). However, up until the day before your 21st birthday you would still say that you were 20 years old. Therefore the largest age that could be included in this group is (just below) 21 years. So we get:

class width = 21 − 11 = 10 years

When we construct our histogram we would actually draw the 11–20 years bar from 11 to 21.
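The three ways of working out a class width met so far can be summarised in a short Python sketch. The function names are invented for this illustration, and the rounding unit is assumed to be 1 (eg nearest minute) unless stated otherwise.

```python
def width_exact(lower, upper):
    """Group given by exact boundaries, eg 500 <= x < 1,000."""
    return upper - lower


def width_rounded(lowest, highest, unit=1):
    """Group of values rounded to the nearest unit, eg 10-19 mins.

    The true boundaries lie half a unit either side of the stated limits.
    """
    return (highest + unit / 2) - (lowest - unit / 2)


def width_age_last_birthday(lowest, highest):
    """Group of ages last birthday, eg 11-20 years.

    The group runs from the lowest birthday up to (just below) the
    birthday after the highest age.
    """
    return (highest + 1) - lowest


print(width_exact(500, 1000))            # claim amounts 500 <= x < 1,000
print(width_rounded(10, 19))             # times to the nearest minute
print(width_age_last_birthday(11, 20))   # ages last birthday
```

In each case the class width is the distance between the true boundaries of the group, which is also where the bar is drawn on the histogram.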

Question 1.8

Write down the class width for each of these groups:
(i) £150 ≤ x < £170 where x represents a claim amount
(ii) £150–£169 for claim amounts recorded to the nearest £
(iii) £0–£149 for claim amounts recorded to the nearest £
(iv) 30–35 years for age last birthday before the death of an individual.

Question 1.9

A life assurance company has analysed the ages of its current policyholders. All ages are recorded as age last birthday. The results are as follows:

Age       Frequency
24–29     72
30–34     80
35–39     100
40–49     80
50–64     75

Draw a histogram of these data.



We can also work backwards from a histogram to get the frequency table. For a bar chart, we just needed to read the frequencies off the vertical axis. Recall that for a histogram the frequency of each group is the area of its bar.

This histogram shows the journey times (in minutes) of employees to their offices:

[Histogram: frequency density (vertical axis, 0 to 7) against journey time (mins) (horizontal axis, 0 to 120)]

The first group (0 to 10 mins) has a frequency of:

5 × 10 = 50

Question 1.10

Complete the frequency table for the journey times histogram:

Time            Frequency
0 ≤ t < 10      50
10 ≤ t < 20
20 ≤ t < 40



3.3 Stem and leaf diagram

A stem and leaf diagram is an alternative to a histogram. Here are the ages of 9 individuals in a company:

17 19 19 24 25 27 28 30 31

A stem and leaf diagram splits each data value up into 2 parts as follows:

1 | 7 9 9
2 | 4 5 7 8
3 | 0 1

The single number on the left-hand side is called the stem and the numbers on the right-hand side are the leaves associated with the stems. For the first row, 1 | 7 9 9, the stem is 1 and the leaves are 7, 9 and 9. This row represents the numbers 17, 19 and 19. In this case each number has been split up into tens (stem) and units (leaves). Each of the numbers on the right-hand side represents a data value. To make clear what value each leaf actually represents we need a key:

Key: 2|4 represents 24
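For whole-number data like these ages, the split into tens (stem) and units (leaf) can be sketched in Python as follows. This is only an illustration; the function name `stem_and_leaf` is made up, and the sketch handles two-digit style data only.

```python
from collections import defaultdict

# Ages of the 9 individuals from the example above
ages = [17, 19, 19, 24, 25, 27, 28, 30, 31]


def stem_and_leaf(values):
    """Return {stem: [leaves in numerical order]} using tens and units."""
    rows = defaultdict(list)
    for value in sorted(values):      # sorting first keeps the leaves in order
        rows[value // 10].append(value % 10)
    return dict(rows)


diagram = stem_and_leaf(ages)
for stem in range(min(diagram), max(diagram) + 1):
    # Stems with no data are still printed so the shape is not distorted
    leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem} | {leaves}")
print("Key: 2|4 represents 24")
```

Printing every stem between the smallest and largest, even when it has no leaves, mirrors what we would do by hand so that gaps in the data show up.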

Question 1.11

Write down the data represented by this stem and leaf diagram:

1 | 7 9 9
2 | 4 5 7 8
3 | 0 1

Key: 2|4 represents 2.4

Note how we have arranged the leaves in numerical order. This will allow us to use the diagram to find the middle value (the median) and the values that are a quarter and three-quarters of the way through the data (the lower and upper quartiles). This will be covered in Chapters 2 and 3.




In the previous example each of the numbers had only two digits, eg 24. In cases where we have more digits we can either place more digits on the stem or use rounding on the numbers. For example, a company averages their students’ mock examination results and gets the following data:

56.2, 61.0, 62.8, 63.9, 64.5, 61.8, 59.4, 58.6, 65.1, 62.1, 60.3, 57.9, 62.3, 62.1, 60.7, 59.4, 61.4, 58.7, 63.0, 70.5, 68.3, 61.9, 60.5, 63.2, 64.8

Using a key of 61|4 represents 61.4 we get:

56 | 2

57 | 9

58 | 6 7

59 | 4 4

60 | 3 5 7

61 | 0 4 8 9

62 | 1 1 3 8

63 | 0 2 9

64 | 5 8

65 | 1

66 |

67 |

68 | 3

69 |

70 | 5

Alternatively, rounding each of the data values to the nearest whole number and using a key of 5|8 represents 58 gives:

5 | 6 8 9 9 9 9

6 | 0 1 1 1 1 2 2 2 2 2 3 3 3 4 5 5 5 8

7 | 1



Question 1.12

Represent the following claim amounts on a stem and leaf diagram:

1730, 2480, 3010, 2820, 5390, 6360, 8340, 3710, 2270, 2500, 3450, 4830, 2360, 4340, 7510, 6270, 1750, 2720, 9340, 7550, 11920, 4840, 5670, 930, 2750, 220, 2340, 3510, 4890, 1040, 3410, 5580, 3760

Comment on the shape of the diagram.

Stem and leaf diagrams show the shape of the distribution (like bar charts) but have the advantage of not losing the detail of the original data.

3.4 Dotplot/Lineplot

A dotplot (also called a lineplot) is another alternative to the histogram. Here are the starting salaries (in £000’s) of 10 new students joining a company:

21 23 24 24 25 25 25 27 27 28

We just plot each data value against a number line using a cross or a dot:

[Dotplot: salary (£000's) on a number line from 20 to 29]

If there are two or more pieces of data to be plotted against the same number then you use the appropriate number of crosses (or dots) on top of each other.
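The stacking of crosses can be sketched in Python as a rough text version of a dotplot. The function name text_dotplot is made up for this illustration, and the layout assumes two-digit whole-number data such as the salaries listed above.

```python
from collections import Counter

def text_dotplot(values):
    """Return the lines of a text dotplot for whole-number data values."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    lines = []
    # Work down from the tallest stack of crosses to the row just above the axis.
    for level in range(max(counts.values()), 0, -1):
        row = ["x" if counts[v] >= level else " " for v in axis]
        lines.append("  ".join(row).rstrip())
    # For two-digit values, labels one space apart line up under the crosses.
    lines.append(" ".join(str(v) for v in axis))
    return lines

salaries = [21, 23, 24, 24, 25, 25, 25, 27, 27, 28]
dotplot_lines = text_dotplot(salaries)
print("\n".join(dotplot_lines))
```

Each cross sits above the first digit of its salary, so the stack of three crosses appears over 25.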

Question 1.13

Plot the CT3 mock exam results from Question 1.6 on a line plot:

72, 70, 71, 74, 68, 69, 71, 72, 70, 75, 71, 72, 71, 71, 72, 69, 74, 70, 72, 71

Like histograms, dot plots show the shape of the distribution clearly. They also have the advantage of being quick to draw.



A dot plot (or line plot) is often used in the Subject CT3 course to compare the spread (variance) of two or more data sets. Dot plots are also commonly used in exam questions as a quick way to check whether the data set looks like it has come from a normal distribution. This is covered in Chapter 10.

3.5 Cumulative frequency curves

In Section 2.3 we constructed cumulative frequency tables from frequency tables:

Claim amount (x)      Frequency      Claim amount (x)   Cumulative frequency
0 ≤ x < 500           6              x < 500            6
500 ≤ x < 1,000       10             x < 1,000          16
1,000 ≤ x < 1,500     9              x < 1,500          25
1,500 ≤ x < 2,000     8              x < 2,000          33
2,000 ≤ x < 4,000     7              x < 4,000          40

To obtain a cumulative frequency curve of these data all we do is plot a graph of the cumulative frequencies against the largest claim amount in each group (ie plot 6 against 500).
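As a small sketch, the points to plot can be produced in Python by keeping a running total of the frequencies. The variable names here are our own; the figures are those in the claim amount table above, and the starting point (0, 0) is included because claims cannot be negative.

```python
from itertools import accumulate

upper_bounds = [500, 1000, 1500, 2000, 4000]   # largest claim amount in each group
frequencies = [6, 10, 9, 8, 7]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Each cumulative frequency is plotted against the top of its group.
points = [(0, 0)] + list(zip(upper_bounds, cumulative))
print(points)
```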

[Cumulative frequency curve: cumulative frequency (0 to 40) against claim size (0 to 4,000)]

In this case we can start at zero as this is the lowest possible value the claims can be. In an exam you would be expected to use graph paper to draw this diagram.



Typically we get an S-shaped graph as there tend to be lots of values in the middle (so the cumulative frequency rises quickly here) and few extreme values (so the cumulative frequency rises slowly at the ends).

Recall from Section 3.2 that there were various ways of grouping continuous data. We are now going to look at how we plot points for each of these other ways.

Continuous data could be rounded, eg time to the nearest minute, in which case we could get a group of:

10 – 19 mins

Since the times are rounded to the nearest minute, the largest value that could be included in this group is (just below) 19.5 mins (as this will round down to 19 mins). Therefore we would plot the cumulative frequency against 19.5 mins.

For groups involving ages, such as age last birthday:

11 – 20 years

The largest age that could be included in this group is (just below) 21 years. This is because up until the day before your 21st birthday you would still say that you were 20 years old. So we would plot the cumulative frequency against 21 years.

Question 1.14

A life assurance company has analysed the ages of its current policyholders. All ages are recorded as age last birthday. The results are as follows:

Age        Frequency
24 – 29    70
30 – 34    80
35 – 39    100
40 – 49    80
50 – 64    70

Construct a cumulative frequency graph for these data.



We will now use the cumulative frequency curve to make some ‘guesstimates’ about the data. For example, how many of our claims were for less than £750? Reading £750 off our graph we see that about 11 claims are less than this amount.

[Cumulative frequency curve with a reading taken at a claim size of £750]

Similarly, we could find the amount that 50% of the claims were under by reading off (50% of 40 which is) the 20th value. We see that this is about £1,225.
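If no graph paper is to hand, both readings can be approximated in Python. Note the assumption: this sketch joins the plotted points with straight lines (linear interpolation) rather than a smooth curve, so its answers can differ slightly from values read off a hand-drawn graph. The function names count_below and value_at are our own.

```python
# Points of the cumulative frequency graph: (claim size, cumulative frequency).
points = [(0, 0), (500, 6), (1000, 16), (1500, 25), (2000, 33), (4000, 40)]

def count_below(x, points):
    """Estimate how many values are below x, using straight lines between points."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range of the graph")

def value_at(count, points):
    """Estimate the value below which the given number of observations lie."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= count <= c1:
            return x0 + (x1 - x0) * (count - c0) / (c1 - c0)
    raise ValueError("count lies outside the range of the graph")

claims_below_750 = count_below(750, points)
middle_claim = value_at(0.5 * 40, points)   # 50% of the 40 claims
print(claims_below_750, round(middle_claim))
```

The straight-line estimates are 11 claims and roughly £1,222, which sit close to the values of 11 and £1,225 read from the curve.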

[Cumulative frequency curve with a reading taken at a cumulative frequency of 20]



Question 1.15

Use your cumulative frequency curve from Question 1.14 to estimate:

(i) how many policyholders are aged 32 or less

(ii) the age under which 75% of the policyholders lie.

3.6 Boxplot

A boxplot (also called a box and whisker plot) is another way of showing data:

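[Boxplot diagram: the whiskers run from the lowest value to the lower quartile (Q1) and from the upper quartile (Q3) to the highest value; the box runs from Q1 to Q3 with the median (M) marked inside it; each of the four sections contains 25% of the data]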

The rectangle (box) in the middle represents the middle 50% of the data (between the values that are a ¼ and ¾ of the way through the data). The lines (whiskers) extend from the box to the smallest and largest values. The diagram also shows the middle value (called the median).

A boxplot is particularly effective when comparing two sets of data. However, to draw the diagram we need to calculate the median and the quartiles. Since the median will be covered in Chapter 2 and the quartiles will be covered in Chapter 3, we will deal with this type of diagram at the end of Chapter 3.

In the exam it is expected that you would draw a boxplot accurately on graph paper.



4 Using diagrams to compare data

Once we have drawn our diagrams we can use them to interpret the patterns in the data or compare two or more data sets. In Subject CT3 we will be looking at three features of any data set: the location, the spread and the skewness.

4.1 Location

The location of a data set is simply where the data is located – ie where is the centre of the data or about what values is it grouped. In everyday language you may use ‘average’ to describe the location. The stem and leaf diagrams below show the claim amounts (in $’s) under two different types of insurance:

Type A                 Type B

0 | 2 7                0 | 8

1 | 1 1 3 6 8 9        1 | 0 2 3

2 | 3 4 4 4 7          2 | 1 4 6 8

3 | 0 5                3 | 2 3 3 6 9 9

4 | 1                  4 | 0 1 5

5 | 2                  5 | 4

Key: 2|5 represents $250

Type A claims are mostly located between $100 and $200 whereas type B claims are located between $200 and $300. So we could say the type B claims are greater on average than type A claims. In Chapter 2, we will use the mean, median and mode to measure the location of a set of data.



4.2 Spread

The spread of a set of data is simply how spread out (ie how variable) the values are. Are the values bunched together or are they very diverse? The dotplots below show the number of telephone calls received in the last six hours in two different departments of the same company:

[Dotplots: number of phone calls per hour on a number line from 0 to 10, one plot for Dept A and one for Dept B]

For Department A, the numbers of phone calls are all bunched together at about 5 per hour, whereas for Department B they are very diverse, ranging from zero to ten. So we would say that the number of phone calls per hour is more spread out in Department B than in Department A. In Chapter 3, we will use the interquartile range and standard deviation to measure the spread of a set of data.



4.3 Skewness (shape)

The skewness describes the shape of the distribution – is it symmetrical or not? The more skew the data, the more asymmetrical the distribution is. The histograms below show the ages of the population in two different towns:

[Histograms: frequency density against age (years), one for the Town A population and one for the Town B population]

We can see that the population in Town A is skewed (ie not symmetrical) as the ‘hump’ is on the left. However, it is called positively skew as most of the people in the town are to the right of the hump (ie on the positive side). The population in Town B is also skewed (ie not symmetrical) as the ‘hump’ is on the right. We call this negatively skew as most of the people in the town are to the left of the hump (ie on the negative side). Smoother sketches are shown below:

[Sketches of three smooth curves, labelled: positively skewed, symmetrical, negatively skewed]

In Chapter 3, we will use the third central moment to measure the skewness of a data set.



Question 1.16

The diagrams below show the boxplots for two different distributions:

[Boxplots for Group A and Group B drawn against a common scale from 0 to 20]

Compare the location, spread and skewness of these two distributions using the middle lines (the median), the boxes and the whole boxplot, respectively.



Extra practice questions

Section 3: Summarising data in diagrams

P1.1 The lengths travelled by snails in 5 mins were measured to the nearest cm. The results are shown in the table below:

Length (cm)    Frequency
0 – 4          4
5 – 6          7
7 – 8          15
9 – 12         23
13 – 18        11

Calculate the frequency densities that you would need to plot on a histogram for these data.

P1.2 The mortality of males before retirement is being investigated. The age last birthday at death of 500 males was as follows:

Age          5 – 19   20 – 29   30 – 39   40 – 49   50 – 54   55 – 59   60 – 64
Frequency    3        20        27        63        67        116       204

(i) Draw a histogram to represent these data.

Below is a histogram showing the deaths of 500 females in the same age range:

[Histogram: frequency density against age at death. Bar heights: 0.333 (5 – 19), 0.7 (20 – 29), 1.5 (30 – 39), 3.8 (40 – 49), 15.6 (50 – 54), 26.2 (55 – 59), 45.2 (60 – 64)]

(ii) Use the two histograms to compare the male and female mortality.

(iii) Construct a grouped frequency distribution for the females.



P1.3 The following data shows the times taken (in days) to completely process some simple claims:

8.02 5.11 5.04 3.88 4.76 3.25 4.41 5.19 4.48 6.28

9.12 6.53 5.14 2.57 6.80 7.31 5.71 6.16 7.51 8.58

(i) Display these data in a stem and leaf diagram by rounding to 1 decimal place.

(ii) Comment on the shape of the distribution.

P1.4 The length of time (in minutes) for which calls to a helpline were put on hold are given in the following table:

Time (t)         Frequency
0 ≤ t < 0.5      2
0.5 ≤ t < 1      5
1 ≤ t < 2        7
2 ≤ t < 5        12
5 ≤ t < 10       4

(i) Construct a cumulative frequency curve for these data.

(ii) Use this graph to estimate:

(a) how many calls were held for less than 3 minutes

(b) the time for which 50% of the calls were on hold for longer.

P1.5 Subject C1, September 1996, Q8 (part)

The following table gives the ages of 100 men (in years) in the form of a grouped frequency distribution, where the ages are in groups of width five years, with the exception of the final group.

Age last birthday:   20-24   25-29   30-34   35-39   40-44   45-49   50-54   55-64
Number of men:       1       2       10      16      22      20      15      14

Draw a histogram of the data. [2]



P1.6 Subject 101, September 2002, Q6 (part) As part of an investigation an insurance company collected data for the year 2000 on claims sizes for all claims on a certain type of motor insurance policy. The resulting data are given below in the form of a grouped frequency distribution.

Claim size (£)        Frequency
≤ 100                 862
> 100 and ≤ 200       608
> 200 and ≤ 300       1,253
> 300 and ≤ 400       1,066
> 400 and ≤ 500       558
> 500                 1,290
Total                 5,637

(i) Calculate the cumulative frequencies and draw a graph of the claim size distribution function (ie the cumulative frequencies against claim size). [3]

(ii) Determine the proportion of claim sizes which are less than £250. [2]

[Total 5]

Section 4: Using diagrams to compare data

P1.7 The ages of employees in two departments are given below:

Marketing    24 25 27 27 28 28 28 29 29 32
Personnel    27 31 35 38 44 44 47 47 47 51

Draw dotplots for each of these departments and hence compare the two departments.



P1.8 Subject 101, April 2002, Q7

The following information on white blood cell count (WBCC) was collected from subjects one week after the start of chemotherapy treatment. One group of subjects (A) received steroids in addition to the chemotherapy treatment and the other group (B) received a placebo in addition to the chemotherapy. The subjects were assigned to the groups at random.

Group A: Steroid WBCC (millions of cells per ml)

12.4 15.2 12.7 15.9 12.2 14.2 12.9 14.2 12.4 14.6 12.7 13.6 12.5 13.3 12.1 13.9 17.1 13.6 17.2 13.1

Group B: Placebo WBCC (millions of cells per ml)

17.0 13.5 15.4 14.1 15.4 14.8 12.9 14.4 13.2 13.1 12.9 13.9 13.0 13.6 13.0 13.4 12.9 13.1 14.4 13.8

(i) Construct stem and leaf diagrams for Group A and Group B separately. [2]

(ii) Comment on the results in the context of investigating an association between WBCC and the treatment with or without steroids. [2]

[Total 4]






Chapter 1 Summary

Data

Data (ie information or facts) can be subdivided as follows:

data
  numerical: discrete, continuous
  categorical: nominal, ordinal, attribute (dichotomous)

Discrete data is numerical data that can only take particular values (eg 0, 1, 2, 3, …).

Continuous data is numerical data that can take any value.

We can summarise data using tables (frequency distributions) or diagrams.

Histograms

A histogram is similar to a bar chart but is drawn for continuous data. Therefore it has no gaps between the bars. However, for a histogram the area of the bar gives the frequency of the group (class). The frequency density (the height of the bars) is found from:

frequency density = frequency ÷ class width

where the class width is the difference between the largest and smallest values allowed in the class.

Line plots

We just plot each data value against a number line using a cross or a dot.



Stem and leaf diagrams

A stem and leaf diagram splits each data value up into 2 parts as follows:

1 | 7 9 9

2 | 4 5 7 8

3 | 0 1

Key: 2|4 represents 24

This diagram represents the values: 17, 19, 19, 24, 25, 27, 28, 30, 31.

Cumulative frequency diagrams

Cumulative frequency is the running total of the frequencies. A cumulative frequency diagram plots the cumulative frequency against the largest possible value in each group.

Boxplots

[Boxplot diagram: the whiskers run from the lowest value to the lower quartile (Q1) and from the upper quartile (Q3) to the highest value; the box runs from Q1 to Q3 with the median (M) marked inside it; each of the four sections contains 25% of the data]

Comparing data sets

When comparing data sets we look at the location, spread and skewness (shape) of each distribution. The types of skewness are positively skewed, symmetrical and negatively skewed.



Chapter 1 Solutions

Solution 1.1

(i) (a) Numerical (eg 75kg, 200g, 3 tons)
(b) Categorical (eg London, Glasgow, Bognor Regis)
(c) Numerical (eg 12 claims, 193 claims)
(d) Categorical (eg theft, fire, accident, hurricane)
(e) Numerical (eg 23 years, 65 years)
(f) Numerical (eg £180, €2m, $740.99)

(ii) (a) Continuous, as items can weigh absolutely any positive value.

(c) Discrete, as there can only be a whole number of claims (ie 0, 1, 2, …).

(e) Depends! When we give our age we usually give our age last birthday (eg 23 years) which is discrete rather than our exact age (eg 23 years, 3 months, 2 days, 14 hours, …) which is continuous.

(f) Well, technically discrete, since you can only have a whole number of pence (eg £450.62). However, in Subject CT3 we shall treat it as continuous as the numbers involved are often so large (eg £4,267,593.81) that it can take (as good as) any value.

The advantage of treating it as continuous is that we can use continuous functions (eg $y = x^2 + 2x + 1$) to calculate amounts (and then just round them to the nearest penny afterwards). This is far preferable to awkward functions that would only give whole numbers…

Solution 1.2

(i) Nominal data as we cannot put them in any order.

(ii) Attribute data as the answer is yes or no.

(iii) Ordinal data as the categories are ordered from poor to excellent.



Solution 1.3

(i) We just total up the frequencies: 2 + 5 + 8 + 6 + 5 + 2 = 28.

(ii) From the table, we have 2 days with 0 claims (total 2 × 0 = 0 claims), 5 days with 1 claim (total 5 × 1 = 5 claims), 8 days with 2 claims (total 8 × 2 = 16 claims), 6 days with 3 claims (total 6 × 3 = 18 claims), 5 days with 4 claims (total 5 × 4 = 20 claims) and 2 days with 5 claims (total 2 × 5 = 10 claims). This gives us a grand total of 0 + 5 + 16 + 18 + 20 + 10 = 69 claims.

What we are doing is multiplying the frequencies by each data value and then totalling all of these up. Later we shall write this in shorthand as $\sum fx$.
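The ‘multiply each value by its frequency, then total’ idea is easy to check in Python. This is just a sketch using the figures from this solution; the variable names are our own.

```python
claims_per_day = [0, 1, 2, 3, 4, 5]   # the data values, x
days = [2, 5, 8, 6, 5, 2]             # the frequencies, f

total_days = sum(days)                                           # sum of f
total_claims = sum(f * x for f, x in zip(days, claims_per_day))  # sum of f times x

print(total_days, total_claims)
```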

Solution 1.4

The completed frequency table is as follows:

Time (t)         Frequency
0 ≤ t < 0.5      2
0.5 ≤ t < 1      5
1 ≤ t < 2        7
2 ≤ t < 5        12
5 ≤ t < 10       4

The only problems that might occur are placing 0.5 in the 0 ≤ t < 0.5 group rather than the 0.5 ≤ t < 1 group and not including some of the data values in the table. Crossing off the data values as you put them in the table is a useful way to ensure we don’t miss any values. We could also check that we have the correct total number of results by adding up the frequencies.



Solution 1.5

The cumulative frequency table is:

Time (t)     Cumulative frequency
t < 0.5      2
t < 1        7
t < 2        14
t < 5        26
t < 10       30

Or we could use “up to 0.5 mins”, “up to 1 min”, etc as the groups.

Solution 1.6

Putting this data into a frequency table:

Mock result    Frequency
68             1
69             2
70             3
71             6
72             5
73             0
74             2
75             1

It is now easy to draw the bar chart:

[Bar chart: frequency (0 to 7) against mock result (68 to 75)]



Solution 1.7

(i) Using frequency density = frequency ÷ class width we get:

Claim amount (x)       Frequency    Frequency density
0 ≤ x < 250            60           60 ÷ 250 = 0.24
250 ≤ x < 500          75           75 ÷ 250 = 0.3
500 ≤ x < 1,000        50           50 ÷ 500 = 0.1
1,000 ≤ x < 2,000      40           40 ÷ 1,000 = 0.04
2,000 ≤ x < 5,000      30           30 ÷ 3,000 = 0.01

(ii) The histogram is:

[Histogram: frequency density (0 to 0.3) against claim amount (£0 to £5,000)]



Solution 1.8

(i) The group ranges from exactly £150 to (just below) £170. Hence:

class width = £170 − £150 = £20

(ii) Since the amounts are rounded to the nearest £, the smallest value that could be included in this group is £149.50 (as this would round up to £150). Similarly (treating the amounts as continuous) the largest value that could be included in this group is (just below) £169.50. Hence:

class width = £169.50 − £149.50 = £20

(iii) This is very similar to part (ii) except that we can’t get claims smaller than £0. Therefore the smallest value that could be included is £0. Hence:

class width = £149.50 − £0 = £149.50

(iv) The smallest age that could be included in this group is 30 years (as the person could have died on their 30th birthday). However, if a person dies up until the day before their 36th birthday we would still say they were age 35. Hence the largest age that could be included is (just below) 36 years. This gives:

class width = 36 − 30 = 6 years



Solution 1.9

First we need to calculate the frequency densities. The first class goes from age 24 to (just below) age 30 so the class width is 30 − 24 = 6. Similarly, the second class goes from age 30 to (just below) 35 so the class width is 35 − 30 = 5 and so on.

Age        Frequency    Frequency density
24 – 29    72           72 ÷ 6 = 12
30 – 34    80           80 ÷ 5 = 16
35 – 39    100          100 ÷ 5 = 20
40 – 49    80           80 ÷ 10 = 8
50 – 64    75           75 ÷ 15 = 5

Now we can draw the histogram, remembering to draw start and end points of the bars at the correct values (eg the first bar should be drawn from ages 24 to 30):

[Histogram: frequency density (0 to 20) against age (years), with bars running from age 24 to age 65]



Solution 1.10

Using the fact that the frequency is given by the area:

frequency = area = height (frequency density) × class width

We get:

Time             Frequency
0 ≤ t < 10       50
10 ≤ t < 20      7 × 10 = 70
20 ≤ t < 40      4 × 20 = 80
40 ≤ t < 70      2.5 × 30 = 75
70 ≤ t < 120     0.5 × 50 = 25
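The same ‘area of each bar’ calculation can be sketched in Python. Each bar is written as (start, end, frequency density), using the heights quoted in this solution; the list name bars is our own choice.

```python
# (start of group, end of group, frequency density) for each bar of the histogram
bars = [(0, 10, 5), (10, 20, 7), (20, 40, 4), (40, 70, 2.5), (70, 120, 0.5)]

# frequency = area of the bar = frequency density x class width
frequencies = [density * (end - start) for start, end, density in bars]

print(frequencies)
print(sum(frequencies))   # total number of employees in the histogram
```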

Solution 1.11

The data represented by the stem and leaf diagram is: 1.7 1.9 1.9 2.4 2.5 2.7 2.8 3.0 3.1



Solution 1.12

Rounding each of the claims to the nearest 100 we get a stem and leaf diagram of:

0 | 2 9

1 | 0 7 8

2 | 3 3 4 5 5 7 8 8

3 | 0 4 5 5 7 8

4 | 3 8 8 9

5 | 4 6 7

6 | 3 4

7 | 5 6

8 | 3

9 | 3

10 |

11 | 9

Key: 4|8 is 4,800.

The data is concentrated at the lower end, ie there are many claims for small amounts and few claims for high amounts. This is known as positively skewed. We will meet this in Section 4 of this chapter and also in Chapter 3.

Solution 1.13

The dot plot (or line plot) is as follows:

[Dotplot: mock exam mark on a number line from 68 to 75]



Solution 1.14

We first need to calculate the cumulative frequencies:

Age     Cumulative frequency
≤ 29    70
≤ 34    150
≤ 39    250
≤ 49    330
≤ 64    400

Since the first group started at age 24 the graph can start from this value. However don’t get caught out! Age 29 goes all the way up to (just before) age 30. Therefore the first cumulative frequency should be plotted against 30, the second against 35 and so on.

[Cumulative frequency curve: cumulative frequency (0 to 400) against age (20 to 65 years)]



Solution 1.15

(i) We use our graph from Solution 1.14 to read off 32 years:

[Cumulative frequency curve with a reading taken at age 32]

We can see that roughly 100 policyholders are younger than this.

(ii) 75% of 400 policyholders is 300. So reading off the 300th value:

[Cumulative frequency curve with a reading taken at a cumulative frequency of 300]

We can see that this is roughly 45½ years.



Solution 1.16

Using the middle line (the median) on each boxplot to compare the locations, we see that Group A is located at 8 and Group B is located at 7. Therefore on average the values in Group A are higher than Group B. Using the boxes to measure the spread, we see that Group A has a smaller spread than Group B. Looking at the whole boxplot, we see that Group A is roughly symmetrical whereas Group B is positively skew (as most of the data values are to the right of the middle value).



Solutions to extra practice questions

P1.1 Since the lengths are rounded to the nearest cm, the first group ranges from 0 cm to 4.5 cm. Similarly the second group ranges from 4.5 cm to 6.5 cm and so on. This gives:

Length (cm) Frequency 0 4 4 4.5 0.89 5 6 7 2 3.5 7 8 15 2 7.5 9 12 23 4 5.75

13 18 11 6 1.83 Note that if we were constructing the histogram we would draw the first bar from 0 to 4.5, the second bar from 4.5 to 6.5 and so on.



P1.2 (i) Using frequency density = frequency ÷ class width we get:

Age (years)    Frequency    Frequency density
5 – 19         3            3 ÷ 15 = 0.2
20 – 29        20           20 ÷ 10 = 2
30 – 39        27           27 ÷ 10 = 2.7
40 – 49        63           63 ÷ 10 = 6.3
50 – 54        67           67 ÷ 5 = 13.4
55 – 59        116          116 ÷ 5 = 23.2
60 – 64        204          204 ÷ 5 = 40.8

The first bar is drawn from 5 to 20, the second bar from 20 to 30 and so on:

[Histogram: frequency density against age at death. Bar heights: 0.2, 2, 2.7, 6.3, 13.4, 23.2, 40.8]

(ii) The mortality for this group of males is much higher in the 20 – 49 age range and lower in the 50 – 64 age range than the mortality for this group of females. So it appears that on average males die at a younger age. Both male and female ages at death have negatively skewed distributions.

(iii) Using frequency = area = height (frequency density) × class width we get:

Age (years)    Frequency
5 – 19         0.333 × 15 = 5
20 – 29        0.7 × 10 = 7
30 – 39        1.5 × 10 = 15
40 – 49        3.8 × 10 = 38
50 – 54        15.6 × 5 = 78
55 – 59        26.2 × 5 = 131
60 – 64        45.2 × 5 = 226



P1.3 (i) Rounding each of the values to 1 decimal place, we get:

2 | 6

3 | 3 9

4 | 4 5 8

5 | 0 1 1 2 7

6 | 2 3 5 8

7 | 3 5

8 | 0 6

9 | 1

Key: 3|9 represents 3.9

(ii) The data appears to be symmetrical about roughly 5 days.



P1.4 (i) The cumulative frequency table for these data is:


Time (t)     Cumulative frequency
t < 0.5      2
t < 1        7
t < 2        14
t < 5        26
t < 10       30

Since the data starts at 0 our cumulative frequency curve will start from there. The next point would be at (0.5, 2) and so on.

[Cumulative frequency curve: cumulative frequency (0 to 30) against time (0 to 10 mins)]

(ii) (a) Reading 3 mins off the graph gives about 19 phone calls.

(b) Reading off 15 (50% of 30) gives about 2¼ minutes.

[Cumulative frequency curve with readings taken at 3 mins and at a cumulative frequency of 15]



P1.5 First we need to calculate the frequency densities:

Age        Frequency    Frequency density
20 – 24    1            1 ÷ 5 = 0.2
25 – 29    2            2 ÷ 5 = 0.4
30 – 34    10           10 ÷ 5 = 2
35 – 39    16           16 ÷ 5 = 3.2
40 – 44    22           22 ÷ 5 = 4.4
45 – 49    20           20 ÷ 5 = 4
50 – 54    15           15 ÷ 5 = 3
55 – 64    14           14 ÷ 10 = 1.4

Now we can draw the histogram, remembering to draw the bars the correct widths as well (eg the first bar should be drawn from ages 20 to 25).
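The frequency densities above can be reproduced with a short Python sketch. The key assumption, taken from the ‘age last birthday’ convention, is that a group such as 20 – 24 runs from exact age 20 to just below exact age 25, so its class width is 24 + 1 − 20 = 5. The variable names are our own.

```python
# (youngest age last birthday, oldest age last birthday, frequency)
groups = [(20, 24, 1), (25, 29, 2), (30, 34, 10), (35, 39, 16),
          (40, 44, 22), (45, 49, 20), (50, 54, 15), (55, 64, 14)]

densities = []
for first, last, frequency in groups:
    class_width = (last + 1) - first    # eg 20 - 24 covers exact ages 20 up to 25
    densities.append(frequency / class_width)

for (first, last, _), density in zip(groups, densities):
    print(f"{first} - {last}: bar from {first} to {last + 1}, height {density:.1f}")
```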

[Histogram: frequency density (0 to 5) against age (years), with bars running from age 20 to age 65]



P1.6 (i) The cumulative frequencies are shown in the following table:

Claim size (£)    Cumulative frequency
≤ £100            862
≤ £200            1,470
≤ £300            2,723
≤ £400            3,789
≤ £500            4,347
< £∞              5,637

Claims start from £0 hence our cumulative frequency curve will start from there. Once again note that the values are plotted at the end of each group:

[Cumulative frequency curve: cumulative frequency (0 to 6,000) against claim size (£0 to £500)]

Since we don’t know what the largest claim is we simply draw a line to indicate the maximum cumulative frequency that can be attained.



(ii) Reading £250 off the cumulative frequency curve to see how many values are less than this we get:

[Cumulative frequency curve with a reading taken at a claim size of £250]

From the graph we can see that about 2,075 claims are less than £250 (well it would be if it was drawn on graph paper). This would give a proportion of 2,075 ÷ 5,637 ≈ 37%.

Alternatively, using the original frequency table, £250 is halfway through the ‘> 200 and ≤ 300’ group. So half of the 1,253 values in this group will be less than £250. In addition, the 862 and 608 values in the first two groups are also less than £250. Hence 862 + 608 + 626.5 = 2,096.5 values are less than £250.

So the proportion of claim sizes less than £250 is 2,096.5 ÷ 5,637 = 37.2%. This method is called interpolation and will be met in more detail in the next chapter.



P1.7 The dot plot (or line plot) for each department is:

[Dotplots: age on a number line from 20 to 55, one plot for Marketing and one for Personnel]

We can see that the ages of those working in Personnel are higher on average than those working in Marketing. The spread of the ages of those working in Personnel is wider than Marketing. Finally, the ages appear to be fairly symmetrical in Marketing whereas they are negatively skewed for Personnel.

P1.8 (i) For Group A:        For Group B:

12 | 1 2 4 4 5 7 7 9         12 | 9 9 9

13 | 1 3 6 6 9               13 | 0 0 1 1 2 4 5 6 8 9

14 | 2 2 6                   14 | 1 4 4 8

15 | 2 9                     15 | 4 4

16 |                         16 |

17 | 1 2                     17 | 0

Key: 13|1 is 13.1

(ii) Group A seems to have slightly more data at lower values, so the results for

group A are slightly lower on average than group B. However, group A is slightly more spread out than group B (from 12.1 to 17.2 whereas group B ranges from 12.9 to 17.0). Finally, both distributions are positively skewed. So overall it seems that the treatment with or without steroids is pretty much the same.








Stats Pack-02: Sample calculations 1 Page 1


Chapter 2

Sample calculations 1

Links to CT3: Chapter 1 Sections 2.1-2.3

Syllabus objectives: (i)2. Describe the level/location of a set of data using the mean, median, mode, as appropriate.

0 Introduction

Below is a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

57 68 75 66 72 86 80 81 70 78
76 72 88 84 69 77 83 90 48 63
74 81 94 51 73 96 81 66 77 101

This list is not very helpful in telling us exactly what is going on. In Chapter 1 we used diagrams to make sense of the data, such as the simple dot plot below:

[Dotplot: age at death (yrs) on a number line from 40 to 110]

We could also look at the location of the distribution, the spread of the distribution and the shape of the distribution (skewness). Recall that the location gives the ‘centre’ or ‘average’ of a set of data.



In this chapter we will find a single numerical value to summarise the location of the entire data set. That is, a single figure that will tell us whereabouts the data is grouped (ie a ‘typical’ value to represent the data). We will cover the three measures of location of a sample data set: the mode, the median and the mean.



1 Sample mode

1.1 Sample mode from a list

Here are the salaries of 7 individuals in a company (in £000’s):

18 21 25 25 25 25 30

If I asked you to give one salary that summarised these results, it’s quite likely that you would say £25,000. Why? Because most of the employees earn £25,000. This summary figure is called the mode of the data – it is simply the data value that appears most often (ie the most frequent value). You may also see the mode referred to as the modal value.
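Counting how often each value appears is easy to sketch in Python. The function name modes is our own; it returns a list rather than a single number because more than one value can share the highest frequency. The second data set below is made up purely to illustrate that case.

```python
from collections import Counter

def modes(values):
    """Return every value that appears the greatest number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

salaries = [18, 21, 25, 25, 25, 25, 30]    # in £000's, from the list above
salary_modes = modes(salaries)
print(salary_modes)

made_up = [1, 1, 2, 2, 3]                  # hypothetical data with a tie
print(modes(made_up))
```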

Question 2.1

Below are the numbers of new actuarial students taken on in 2003 by six pension companies:

8 5 19 3 6 5

Find the modal number of new actuarial students employed.

The mode is very easy to obtain and is not affected by extreme values (eg 19 in the above question), however, there are a couple of problems that limit its usefulness. These are illustrated in the next question.

Question 2.2

Find the mode of each of the following data sets:

(i) 6 4 7 5 4 6

(ii) 1 2 3 4 5

Since the mode may not exist or may not be unique, we will not be making much use of the mode as a measure of location.



1.2 Sample mode from a frequency distribution

We will now look at how we can calculate the mode from a table of results (a frequency distribution) that we used in Chapter 1 to summarise a large data set. Recall that the mode was the data value that occurred most often. Take this set of 20 values:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

It is clear to see that the mode is 3 as there are more of this number than any other. Putting this set of data into a frequency table:

Value, x        1   2   3   4   5
Frequency, f    3   4   6   5   2

We can see that the number 3 has the highest frequency (since it occurs most). This gives us the method of finding the mode from a frequency table: find the value with the highest frequency.

Question 2.3

The number of personal pension reviews completed by a student each day over the last four weeks are given below:

Reviews completed in a day    4   5   6   7   8
Frequency                     5   7   4   3   1

(i) James thinks that the modal number of reviews completed in a day is 8 as it is the highest number. What has he done wrong?

(ii) What is the correct mode?



1.3 Sample mode from a grouped frequency distribution

At the beginning of this chapter we had a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

57 68 75 66 72 86 80 81 70 78
76 72 88 84 69 77 83 90 48 63
74 81 94 51 73 96 81 66 77 101

This data is best suited to a grouped frequency table (as there are 25 different values with hardly any repeats).

Age          40 – 49   50 – 59   60 – 69   70 – 79   80 – 89   90 – 99   100 – 109
Frequency    1         2         5         10        8         3         1

If we do not have the original list of data we will not be able to tell which value occurs most. For example, looking at the frequency table above, you would not be able to tell that 81 is the mode! All we can do in this situation is to state the modal group, which in this case is the 70 – 79 group.

Question 2.4

A general insurance company records the amount claimed on the last 100 claims on a particular type of car insurance. The results were:

Claim amount, c           No. of claims
£0 ≤ c < £500             6
£500 ≤ c < £1,000         11
£1,000 ≤ c < £1,500       49
£1,500 ≤ c < £2,000       26
£2,000 ≤ c < £5,000       8

State the modal group.



1.4 Sample mode summary

In summary:

Advantages

Easy to calculate

Unaffected by extreme values (see Question 2.1)

Disadvantages

May not be unique (see Question 2.2 (i))

May not exist (see Question 2.2 (ii))

Does not use all the data values

Cannot be used in further calculations

May only be able to obtain a modal group



2 Sample mean

2.1 Sample mean from a list

Looking again at the salaries of 7 individuals in a company (in £000’s):

18 21 25 25 25 25 30

We could share out the salaries equally between the 7 individuals as a way of finding the ‘average’ or ‘centre’ salary. This gives:

(18 + 21 + 25 + 25 + 25 + 25 + 30) ÷ 7 = 169 ÷ 7 = 24.143

So this gives a salary of £24,143 each. This method has the advantage of using all the data values and we can see that this gives a value slightly less than the mode of £25,000 because there were two people who earned less than this compared to one who earned more. This summary figure is called the mean of the data and this is what most people would call the ‘average’.
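If you like to check your arithmetic on a computer, the ‘share out equally’ idea is a one-liner. Here is a minimal Python sketch using the seven salaries above (the variable names are just for illustration):

```python
# Salaries of the 7 individuals, in £000's.
salaries = [18, 21, 25, 25, 25, 25, 30]

total = sum(salaries)   # add up all the data values
n = len(salaries)       # how many data values there are
mean = total / n        # share the total out equally

print(total, n, round(mean, 3))  # 169 7 24.143
```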

Question 2.5

The sizes of ten car claims received by an insurance company were:

£1,500 £1,820 £840 £260 £2,100 £790 £530 £1,360 £1,780 £1,650

Find the mean car insurance claim amount.

The formula

Suppose we have a sample of n values $x_1, x_2, \ldots, x_n$.

We add these numbers up and divide by how many data values there are (ie n):

$$\frac{x_1 + x_2 + \cdots + x_n}{n}$$

Using the sigma notation for summation, this becomes:

$$\frac{\sum_{i=1}^{n} x_i}{n}$$

Although it is usually abbreviated to:

$$\frac{\sum x_i}{n} \quad \text{or} \quad \frac{\sum x}{n}$$

Definition

The sample mean, $\bar{x}$, is given by:

$$\bar{x} = \frac{\sum x}{n}$$

This formula is given on page 22 of the Tables.

Whilst the mean is a vast improvement over the mode as a measure of the ‘centre’ of the data, it can still give some dodgy answers. The next question illustrates this:

Question 2.6

(i) Below are the numbers of new actuarial students taken on in 2003 by six pension companies:

8 5 19 3 6 5

Find the mean number of new actuarial students employed.

(ii) Below are the salaries (in £000’s) of eight individuals in a small company:

12 12 12 12 12 12 12 50

Find the mean salary.

(iii) What are the problems with the values obtained in (i) and (ii)?



Despite these problems, the mean is still used as the main measure of ‘location’ throughout the actuarial exams. This is mainly due to the fact that the sample mean has a number of properties that make it useful in further calculations. These will be covered in Chapters 8 and 9 of the Subject CT3 course.

2.2 Sample mean from a frequency distribution

We will now look at how we can calculate the mean from a frequency distribution (ie a table of results) that we used in Chapter 1 to summarise a large data set. Recall that we calculated the sample mean, $\bar{x}$, by first adding up all the data values and then dividing the total by how many values there were. Take this set of 20 values:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

We would find the mean by:

$$\bar{x} = \frac{(1+1+1) + (2+2+2+2) + (3+3+3+3+3+3) + (4+4+4+4+4) + (5+5)}{20} = \frac{59}{20} = 2.95$$

Surely there must be a quicker way? There is! How about we say we have three 1’s and four 2’s and so on? The calculation then becomes:

$$\bar{x} = \frac{(3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5)}{20} = \frac{59}{20} = 2.95$$

Notice that we are multiplying each value by its frequency. Notice also that the total number of values is given by the total of the frequencies, 3 + 4 + 6 + 5 + 2 = 20.

Question 2.7

Use the ‘shortcut method’ to calculate the mean of this set of data: 2, 2, 2, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 10, 10, 10, 10, 12, 12, 14



So, how does this relate to a frequency table? Well, we are given the values and their frequencies in the table so we can work out the total in the same way as above (by multiplying each of the values by their respective frequencies) and then divide by how many values there are (the total of the frequencies):

$$\bar{x} = \frac{(3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5)}{3 + 4 + 6 + 5 + 2} = \frac{59}{20} = 2.95$$

Common Error: Students often confuse dividing by the total number of values (which is obtained by totalling the frequencies) with dividing by the number of groups.

Question 2.8

The frequency table shows the number of claims made on 100 car insurance policies in the last year. Calculate the mean number of claims per policy:

Number of claims per policy    0     1     2    3
Frequency                      74    19    5    2

Formula

In our table we have, say, m different values $x_1, x_2, \ldots, x_m$ with frequencies $f_1, f_2, \ldots, f_m$. To find the mean we multiplied the frequencies by the corresponding data values and divided by the total of the frequencies:

$$\bar{x} = \frac{f_1 x_1 + \cdots + f_m x_m}{f_1 + \cdots + f_m}$$

Writing this using the sigma notation for summation we get:

$$\bar{x} = \frac{\sum fx}{\sum f}$$
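The formula above translates almost word for word into code. The following Python sketch is just one way of writing it; the function name frequency_mean is an assumption made for this illustration. It uses the frequency table for the 20 values from earlier in this section.

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: sum of f*x divided by sum of f."""
    total = sum(f * x for x, f in zip(values, frequencies))
    n = sum(frequencies)  # total number of data values, NOT the number of rows
    return total / n


values = [1, 2, 3, 4, 5]
frequencies = [3, 4, 6, 5, 2]

mean = frequency_mean(values, frequencies)
print(mean)  # 59 / 20 = 2.95
```

Dividing by len(values) instead of sum(frequencies) would reproduce the Common Error described above (dividing by the number of groups).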



2.3 Sample mean from a grouped frequency distribution

Now suppose we want to find the mean from this grouped frequency distribution:

Claim amount, c          Frequency
0 ≤ c < £500             6
£500 ≤ c < £1,000        11
£1,000 ≤ c < £1,500      49
£1,500 ≤ c < £2,000      26
£2,000 ≤ c < £5,000      8

Before, when we calculated the mean from a frequency table, we multiplied the values by the frequency. The question now is which value in each group do we multiply the frequency by? Well, the natural choice would be the middle of each group, so we will use the midpoint of each group. We find the midpoint by averaging the largest and smallest possible values in each group. So the midpoint for the 0 ≤ c < £500 group is:

$$\frac{0 + 500}{2} = \text{£}250$$

Similarly, the midpoints for the other groups are £750, £1,250, £1,750 and £3,500. The mean claim amount is then:

$$\bar{x} = \frac{(6 \times 250) + (11 \times 750) + (49 \times 1{,}250) + (26 \times 1{,}750) + (8 \times 3{,}500)}{6 + 11 + 49 + 26 + 8} = \frac{144{,}500}{100} = \text{£}1{,}445$$
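The midpoint method above can be sketched in Python as follows. Each group is stored as a (lowest value, highest value, frequency) triple; this layout is simply a convenient assumption for the sketch.

```python
# Claim amount groups as (lowest value, highest value, frequency).
groups = [
    (0, 500, 6),
    (500, 1000, 11),
    (1000, 1500, 49),
    (1500, 2000, 26),
    (2000, 5000, 8),
]

# Midpoint of each group: average of its smallest and largest possible values.
midpoints = [(low + high) / 2 for low, high, _ in groups]

# Multiply each midpoint by its frequency, then divide by the total frequency.
total = sum(f * m for (_, _, f), m in zip(groups, midpoints))
n = sum(f for _, _, f in groups)
estimated_mean = total / n

print(midpoints)       # [250.0, 750.0, 1250.0, 1750.0, 3500.0]
print(estimated_mean)  # 144,500 / 100 = 1445.0
```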

Question 2.9

The heights, in cm, of thirty actuaries are recorded below. Find their mean height.

Heights, h         Frequency
150 ≤ h < 160      4
160 ≤ h < 170      6
170 ≤ h < 175      11
175 ≤ h < 180      7
180 ≤ h < 195      2



When calculating the midpoint of groups with rounded data and ages we need to take care to use the correct largest and smallest possible values for that group. For example, when values are rounded to the nearest cm, the 10 – 19 cm group ranges from 9.5 cm to (just below) 19.5 cm. Hence, the midpoint would be:

$$\frac{9.5 + 19.5}{2} = 14.5 \text{ cm}$$

Similarly, when age last birthday is used, the 10 – 19 years group ranges from 10 years to (just below) 20 years old. Hence, the midpoint would be:

$$\frac{10 + 20}{2} = 15 \text{ years}$$

Question 2.10

The table below contains the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

Age          Frequency
40 – 49      1
50 – 59      2
60 – 69      5
70 – 79      10
80 – 89      8
90 – 99      3
100 – 109    1

Find the mean age of the policyholders.

Note that when we use the midpoint we assume that the values are evenly spread through the group. This is not necessarily the case. For example, the actual data values for Question 2.10 were:

57 68 75 66 72 86 80 81 70 78
76 72 88 84 69 77 83 90 48 63
74 81 94 51 73 96 81 66 77 101

The true mean of these values is:

$$\bar{x} = \frac{2{,}277}{30} = 75.9 \text{ years}$$

This is slightly different to the mean value obtained in Question 2.10. Hence, the mean using midpoints is just an estimate: it is the best we can do without having the original list of data.
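To see the size of the gap between the true mean and the midpoint estimate, here is a rough Python sketch. It assumes (as explained above for age last birthday) that each group runs from its lower limit up to just below the next lower limit, so every midpoint is the lower limit plus 5.

```python
# The 30 actual ages (age last birthday at death).
ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

true_mean = sum(ages) / len(ages)  # 2,277 / 30

# Grouped table: lower limit of each 10-year group and its frequency.
lower_limits = [40, 50, 60, 70, 80, 90, 100]
frequencies = [1, 2, 5, 10, 8, 3, 1]

# Age last birthday: the 40-49 group runs from 40 to (just below) 50,
# so its midpoint is 45, and similarly for the other groups.
midpoints = [low + 5 for low in lower_limits]

estimate = sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)

print(round(true_mean, 1))  # 75.9
print(round(estimate, 1))   # 76.7
```

The two figures differ by less than a year here, but the estimate can be further out when the values bunch up at one end of a group.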



2.4 Other questions involving the mean

There are a couple of other questions that could be asked about the mean. For example, we could be given the mean and be asked to calculate a single value or the sample total. The following question covers each of these possibilities:

Question 2.11

(i) The mean age of death of 12 assurance policyholders was 72. What was the total age of the 12 policyholders?

(ii) The mean of the following list of investment returns is 4.2%:

5%   4.75%   3.6%   x%   3.25%

Find the value of x.

(iii) A small department employs ten actuaries; their mean salary is £48,000. When an eleventh actuary joins the department the mean salary of all the actuaries drops to £45,800. Find the salary of the new employee.

(iv) The mean sum assured on 12 term assurances was £50,000 whereas the mean sum assured on 8 endowment assurances was £30,000. Calculate the mean sum assured on all 20 policies.



2.5 Sample mean summary

In summary:

Advantages

Uses all the data values

Has properties that make it useful in further calculations (see Subject CT3 Chapter 6)

Disadvantages

Can give impossible figures for discrete data (see Question 2.6 (i))

Affected by extreme values (see Question 2.6 (ii))

Can only be estimated when using grouped data



3 Sample median

3.1 Sample median from a list

Whilst the mean is the preferred measure of location it is affected by extreme values. So what we need is a measure that is unaffected by extreme values (unlike the mean) and that always exists and is unique (unlike the mode). Consider the heights of the five individuals below:

[Figure: five individuals of different heights: Billy, Bertie, Barry, Boris and Bart]

If I asked you to give me the person with the ‘typical’ height of these individuals you would probably choose Bart (despite his name). Why? Because he has the middle height. This gives us our third and final measure of location, the median: the middle value.

So how do we calculate the median in practice? Here is a list of 5 numbers:

9 7 2 9 4

So the median is the middle value which is 2!?! Clearly not! We need to put the numbers in increasing order first (like the heights above), otherwise the number in the middle of the list is not necessarily the middle value numerically! This gives:

2 4 7 9 9

So all we have to do now is locate the middle value. Simply “counting in” from both ends we arrive at the number 7, so the median is 7.



Question 2.12

Find the median of these sums assured (£000’s) by a certain life assurance company:

125 75 25 20 50 25 50 15 30

What happens if we have an even number of data values? Consider the following list:

5 9 11 3 6 12

Firstly, rearranging them in order gives:

3 5 6 9 11 12

“Counting in” to the middle, we see that the middle lies between 6 and 9. All we need is the value that is halfway between 6 and 9. This is 7½, so the median is 7½. If you have trouble finding the value halfway between the two middle numbers, just find the average of them, ie:

$$\frac{6 + 9}{2} = 7\tfrac{1}{2}$$

Question 2.13

Find the median of the following data set:

9 1 4 10 15 5 3 9

Now if we have a long list of numbers the last thing we would want to do is ‘count in’ to find the middle. Hence, we need to find a shortcut to locate the middle value.



Suppose we have 6 numbers. Surely the median would just be the 6 ÷ 2 = 3rd number (ie the midpoint)? But we can see the median is actually the 3½th number:

3 6 8 9 11 15

Similarly the middle value for 5 numbers is not the 5 ÷ 2 = 2½th value but the 3rd number:

2 4 7 9 9

Each time we divide the number of values by 2 we then add an extra ½:

6 numbers: median = 6 ÷ 2 + ½ = 3½th value
5 numbers: median = 5 ÷ 2 + ½ = 3rd value

So in general for n numbers, we have:

Definition

The sample median of a set of ungrouped data is the (½n + ½)th value*.
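The (½n + ½)th value rule can be sketched in Python as below. The function name sample_median is made up for this illustration; it sorts the data first and, when the position falls halfway between two values, averages the two neighbours.

```python
def sample_median(data):
    """Median as the (n/2 + 1/2)th value of the ordered data."""
    ordered = sorted(data)           # the data must be in increasing order first
    n = len(ordered)
    # Position n/2 + 1/2 counts from 1; Python lists count from 0.
    lower_index = (n - 1) // 2       # value just at or below the position
    upper_index = n // 2             # value just at or above the position
    return (ordered[lower_index] + ordered[upper_index]) / 2


print(sample_median([9, 7, 2, 9, 4]))        # 3rd value of 2 4 7 9 9 -> 7.0
print(sample_median([5, 9, 11, 3, 6, 12]))   # 3.5th value of 3 5 6 9 11 12 -> 7.5
```

For an odd number of values the two indices coincide, so the ‘average’ is just the middle value itself.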

Question 2.14

At the beginning of this chapter we had a list of the age last birthday at death of 30 male policyholders who held life assurance policies with a particular company:

57 68 75 66 72 86 80 81 70 78
76 72 88 84 69 77 83 90 48 63
74 81 94 51 73 96 81 66 77 101

Find the median age last birthday at death of these policyholders.

Since the median is unaffected by extreme values it can be a useful alternative to the mean, but its lack of useful mathematical properties means it cannot be used further. So the use of the mean as a measure of location is more widespread.

* There is actually a lot of mathematical disagreement over how the median is “best” defined, but this is the version that is used in this course and in the Indian entrance exam (ACET), and it is one of the accepted versions used in Subject CT3.



3.2 Sample median from frequency distribution

The median of this set of 20 values would be the ½ × 20 + ½ = 10½th value:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

The 10th and 11th values are both 3, so the median for this list of numbers is 3. Well, how does this work if these values were displayed in a frequency table?

Firstly, the numbers are already in numerical order, so we don’t need to worry about that. Next we need to find n (how many numbers there are) so we can use it in our handy formula. For data given in a frequency table, the total number of values, n, can be found by totalling up the frequencies:

$$n = \sum f = 3 + 4 + 6 + 5 + 2 = 20$$

So, as before, we’re going to find the ½ × 20 + ½ = 10½th value. Since the frequencies tell us how many of each number there are, we need to count through the frequencies until we get to the 10½th value and then we need to see what number it is:

Value, x    Frequency, f    Running total
1           3               3 values so far
2           4               3 + 4 = 7 values now in total
3           6               3 + 4 + 6 = 13 values now in total, so the 10½th value is in here!
4           5
5           2

So the 10½th value is one of the 3’s (as they are the 8th to the 13th values in the list). So the median is 3, which is what we found earlier.
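Counting through the frequencies is easy to mimic in code. The sketch below is one possible Python version; the function names value_at and frequency_median are assumptions for this illustration, and the values are assumed to be listed in increasing order.

```python
def value_at(position, values, frequencies):
    """Data value at a whole-number position (counting from 1) in the table."""
    running_total = 0
    for x, f in zip(values, frequencies):
        running_total += f              # count through the frequencies
        if position <= running_total:
            return x
    raise ValueError("position is beyond the end of the data")


def frequency_median(values, frequencies):
    """Median as the (n/2 + 1/2)th value of a frequency distribution."""
    n = sum(frequencies)                # total number of values, not rows
    lower_position = (n + 1) // 2       # whole position at or below n/2 + 1/2
    upper_position = n // 2 + 1         # whole position at or above n/2 + 1/2
    lower = value_at(lower_position, values, frequencies)
    upper = value_at(upper_position, values, frequencies)
    return (lower + upper) / 2


values = [1, 2, 3, 4, 5]
frequencies = [3, 4, 6, 5, 2]
print(frequency_median(values, frequencies))  # 10th and 11th values are both 3 -> 3.0
```

The last step (looking up the value at the position) is the one that is easy to forget when working by hand.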



Question 2.15

The frequency table shows the number of claims (of a particular type) made each week in the last year.

Number of claims per week    0    1    2     3     4    5
Frequency                    5    7    15    12    9    4

(i) Calculate the median number of claims per week.

(ii) In the first two weeks of the following year, 3 and 5 claims were made. Add these values to the frequency distribution and find the new median of these 54 results.

Common Error: Some students confuse the total number of values, n, with the number of groups. Also students find the position using the ½n + ½ rule but then forget to look up the value!

See Appendix A for the difference between the median and the midpoint.



3.3 Sample median from a grouped frequency distribution

When we have a grouped frequency distribution, we are unable to calculate the median exactly as we don’t know the true value of the middle observation. So we will use a similar method to how we calculated the mean from a grouped frequency table. We will assume that the values are evenly spread over the ranges and so the median will be the ½n th “value”, ie the value which splits the distribution into two equal halves.

Below is a table with 100 claims. So the median is the 50th claim amount. So just like before we need to count through the frequencies until we find the 50th claim amount:

Claim amount, c          Frequency
0 ≤ c < £500             6
£500 ≤ c < £1,000        11
£1,000 ≤ c < £1,500      49
£1,500 ≤ c < £2,000      26
£2,000 ≤ c < £5,000      8

We can see that the 50th claim amount is somewhere in the £1,000 ≤ c < £1,500 group. We are assuming that all of the 49 values in the £1,000 ≤ c < £1,500 group are spread out evenly.

We counted 17 values before we got to the £1,000 ≤ c < £1,500 group. So the median is the 50 − 17 = 33rd value in this group.

Since there are 49 values altogether in this group, the median is 33/49 of the way through this group. Our group ranges over £500 so if the values are spread out evenly over this group we’d expect the median to be:

$$\frac{33}{49} \times 500 = \text{£}336.73$$

greater than the lowest value in the group. Well, since this group started at £1,000, the median is:

£1,000 + £336.73 = £1,336.73

This method that we have used is called linear interpolation and basically assumes that the values are spread out linearly (ie uniformly) and so we can just use the proportions to find the median value. An alternative method is given in Appendix B.

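[Annotations to the claims table above: the running totals of the frequencies are 6, then 6 + 11 = 17, then 6 + 11 + 49 = 66.]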



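Definition

For grouped data the sample median is the ½n th value.

Question 2.16

The heights, in cm, of thirty actuaries are recorded below. Find their median height.

Heights, h         Frequency
150 ≤ h < 160      4
160 ≤ h < 170      6
170 ≤ h < 175      11
175 ≤ h < 180      7
180 ≤ h < 195      2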


In general, we find the fraction representing the position of the median in the group, then:

median = (lowest value in group) + (fraction × class width)
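The general rule above can be sketched in Python like this. The function name grouped_median and the (lowest value, highest value, frequency) layout are assumptions made for this illustration; the groups are assumed to be listed in increasing order, and the data is the claims table from this section.

```python
def grouped_median(groups):
    """Estimate the median of grouped data by linear interpolation.

    groups: list of (lowest value, highest value, frequency), in increasing order.
    """
    n = sum(f for _, _, f in groups)
    target = n / 2                 # the (n/2)th "value" for grouped data
    counted = 0                    # values counted before the current group
    for low, high, f in groups:
        if f > 0 and counted + f >= target:
            fraction = (target - counted) / f      # how far through the group
            return low + fraction * (high - low)   # lowest value + fraction x width
        counted += f
    raise ValueError("the table contains no data")


claims = [
    (0, 500, 6),
    (500, 1000, 11),
    (1000, 1500, 49),
    (1500, 2000, 26),
    (2000, 5000, 8),
]

print(round(grouped_median(claims), 2))  # 1000 + (33/49) x 500 = 1336.73
```

For rounded data or ages, the lowest and highest values passed in should be the true group limits (eg 40 and 50 for the 40 – 49 age last birthday group), not the printed labels.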

When calculating the width of groups with rounded data and ages we need to take care to use the correct largest and smallest possible values for that group. This is exactly what we did with the mean in Section 2.3.

Question 2.17

The table below shows the age last birthday at death of the 30 male policyholders from Question 2.10:

Age          Frequency
40 – 49      1
50 – 59      2
60 – 69      5
70 – 79      10
80 – 89      8
90 – 99      3
100 – 109    1

Find the median age of the policyholders.

Since we assume that the values are evenly spread through the group when using linear interpolation our median is an estimate. For example, in Question 2.14 we found the median of the actual data values used in the table in Question 2.17 to be 76.5 years. Notice that the medians are slightly different.



3.4 Calculating the sample median using cumulative frequencies

Recall from Chapter 1 that a cumulative frequency table is one where we accumulate (ie add up) the frequencies as we go through each of the data values. For example, using the table from Question 2.17 we get:

Age          Frequency    Cumulative frequency
40 – 49      1            1
50 – 59      2            3
60 – 69      5            8
70 – 79      10           18
80 – 89      8            26
90 – 99      3            29
100 – 109    1            30

This is grouped data so the median is the ½n th value. From this table it is then quick to see that there are 30 values in total (the last cumulative frequency figure) and that the median (the 15th value) is in the 70 – 79 group. However we will need the original frequency of the 70 – 79 group to estimate where the median is using linear interpolation.

Question 2.18

A consumer watchdog measures the length of time (in minutes) for which 40 phone calls to a helpline were put on hold. The cumulative frequency is given below:

Time on hold, t (minutes)    Cumulative frequency
t < 0.5                      4
t < 1                        11
t < 2                        20
t < 5                        34
t < 10                       40

Estimate the median length of time that calls were placed on hold.



We can also use the cumulative frequency curve to find the median. Below is the cumulative frequency curve for Question 2.18.

The median is the 20th value – so we simply read off the 20th value to see what the time was. This gives us a median of about 2 minutes.
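Building a cumulative frequency column and locating the group that contains the median can be sketched in Python as follows. This uses the age table from Question 2.17 and the standard library function itertools.accumulate; the variable names are just for illustration.

```python
from itertools import accumulate

groups = ["40-49", "50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [1, 2, 5, 10, 8, 3, 1]

# Accumulate (ie add up) the frequencies as we go through the groups.
cumulative = list(accumulate(frequencies))

n = cumulative[-1]      # the last cumulative frequency is the total
target = n / 2          # grouped data: the median is the (n/2)th value

# The first group whose cumulative frequency reaches the target holds the median.
index = next(i for i, c in enumerate(cumulative) if c >= target)

print(cumulative)                         # [1, 3, 8, 18, 26, 29, 30]
print(groups[index], frequencies[index])  # 70-79 10
```

As noted above, the original frequency of that group (here 10) is still needed for the linear interpolation step.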



3.5 Sample median summary

In summary:

Advantages

Unaffected by extreme values (see Question 2.12)

Disadvantages

Does not use all the data values





4 Sample moments

Recall the definition of the sample mean was:

$$\frac{1}{n} \sum x_i$$

This is actually a member of a more general group of summary statistics called sample moments.

Definition

The kth order sample moment is given by:

$$\frac{1}{n} \sum x_i^k$$

So the first order sample moment, $\frac{1}{n} \sum x_i$, is the sample mean.

Question 2.19

Here are the salaries of 7 individuals in a company (in £000’s):

18 21 25 25 25 25 30

Find the second order sample moment of this data set.

We would find moments from frequency or grouped frequency distributions in exactly the same way as we did when calculating the mean. Sometimes these are called moments about zero (or non-central moments) to distinguish them from central moments, which we will meet in Chapter 3.
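The definition of the kth order sample moment can be sketched in Python as below. The function name sample_moment is made up for this illustration, and the data set 1 2 5 6 8 8 is borrowed from Section 6 so as not to give away the answer to Question 2.19.

```python
def sample_moment(data, k):
    """kth order sample moment: (1/n) times the sum of each value to the power k."""
    return sum(x ** k for x in data) / len(data)


data = [1, 2, 5, 6, 8, 8]

first = sample_moment(data, 1)    # the sample mean: 30 / 6
second = sample_moment(data, 2)   # (1 + 4 + 25 + 36 + 64 + 64) / 6 = 194 / 6

print(first)             # 5.0
print(round(second, 3))  # 32.333
```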



5 Mode, mean, median and skewness

Recall that in Chapter 1, we defined the skewness to be the shape of the distribution – or more accurately how asymmetrical it is. There are two types of skewness, which are shown in the diagram below:


What we are interested in here is where the mean, mode and median would be located on each of these distributions. Starting with a symmetrical distribution:

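[Figure: dotplot of a symmetrical distribution, horizontal axis from 1 to 5]

The data values for this dotplot are:

1 2 2 3 3 3 3 4 4 5

The mode is 3, the mean is 30/10 = 3 and the median is the 5½th value, which is also 3. So for a symmetrical distribution the mode, mean and median are all in the centre.

[Figure: symmetrical distribution with the mode, mean and median all marked at the centre]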



Now considering a positively skew distribution:

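[Figure: dotplot of a positively skew distribution, horizontal axis from 1 to 5]

The mode is 2, the mean is 27/10 = 2.7 and the median is the 5½th value, which is 2.5. So for a positively skew distribution we have:

[Figure: positively skew distribution with, from left to right, the mode, the median and the mean marked]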

Why should this be? Well the mode is where the ‘peak’ is (as most values are at this point), the median is in the middle of all the values and the mean is ‘pulled’ to the right by the few large values.
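The ordering of the three measures can be checked with Python’s standard statistics module. The symmetrical data below is the dotplot data given above. The positively skew list is NOT the course’s data (the dotplot values are not listed in the text); it is a hypothetical set of ten values chosen for this sketch so that it reproduces the stated mode of 2, mean of 2.7 and median of 2.5.

```python
from statistics import mean, median, mode

# Data values from the symmetrical dotplot.
symmetrical = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]

# Hypothetical positively skew data (an assumption for this sketch),
# chosen to match the stated mode 2, mean 2.7 and median 2.5.
positively_skew = [1, 2, 2, 2, 2, 3, 3, 3, 4, 5]

for name, data in [("symmetrical", symmetrical), ("positively skew", positively_skew)]:
    print(name, mode(data), median(data), mean(data))

# Symmetrical: all three measures coincide at 3.
# Positively skew: mode (2) < median (2.5) < mean (2.7).
```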

Question 2.20

What would the position of the mean, mode and median be for a negatively skew distribution?



6 Transforming data sets

Suppose we have the following data set:

1 2 5 6 8 8

The mean of this data set is:

$$\bar{x} = \frac{1 + 2 + 5 + 6 + 8 + 8}{6} = \frac{30}{6} = 5$$

Now if we add, say, 3 to each of the numbers in the data set what happens to the mean? Well the new data set is:

4 5 8 9 11 11

So the new mean is:

$$\bar{x} = \frac{4 + 5 + 8 + 9 + 11 + 11}{6} = \frac{48}{6} = 8$$

The mean has also gone up by 3 (we have a mean of 8 instead of 5)!

Now what happens to the mean if we multiply each of the values by, say, 5?

5 10 25 30 40 40

So our new mean is:

$$\bar{x} = \frac{5 + 10 + 25 + 30 + 40 + 40}{6} = \frac{150}{6} = 25$$

The mean has also been multiplied by 5 (we have a mean of 25 instead of 5)!
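The two experiments above are quick to repeat in Python. This is just a sketch of the same arithmetic; the helper name mean_of is made up for this illustration.

```python
def mean_of(data):
    """Sample mean: total of the values divided by how many there are."""
    return sum(data) / len(data)


data = [1, 2, 5, 6, 8, 8]

shifted = [x + 3 for x in data]   # add 3 to each value
scaled = [5 * x for x in data]    # multiply each value by 5

print(mean_of(data))      # 5.0
print(mean_of(shifted))   # 8.0, ie the original mean plus 3
print(mean_of(scaled))    # 25.0, ie the original mean times 5
```

Try changing the 3 and the 5 to other numbers before attempting the next question.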

Question 2.21

The mean of n values is $\bar{x}$. What would the new mean be if we multiplied each of the values by a and then added b to them?

A proof of this general result is found in Appendix C.



7 Appendix A – medians and midpoints

When calculating the mean from a grouped frequency distribution we used the midpoint as the ‘middle value’ of each group rather than the median. Why? Consider a particular group, eg 10 ≤ x < 20. If we think the values in this group are:

10 11 12 13 14 15 16 17 18 19

We would get a median of 14.5. This is different to the midpoint of:

$$\frac{10 + 20}{2} = 15$$

The reason for this is that the data is continuous: we are not limited to just the integer values. Now we assumed that the values in the group are spread uniformly over all the possible values. In which case we should be looking at the ‘spaces’ between the integer values:

[Figure: number line from 10 to 20 showing the integer values and the ‘spaces’ between them, with the midpoint marked]

Counting the ‘spaces’ inwards we get to the midpoint of 15. So the midpoint is effectively the median for continuous data (assuming that the data is spread evenly over the whole range). This is also the reason why we calculate the median differently for data from a grouped frequency distribution, as we treat the data as continuously spread out.



8 Appendix B – linear interpolation

In Section 3.3, we used linear interpolation to calculate the approximate position for the median within a group. This assumed that the values in the group were spread out linearly and so we just used proportions to find the median value. Consider the example in Section 3.3 where the median is the 33rd “value” (out of 49 values) into the £1,000 ≤ c < £1,500 group.

An alternative way of thinking about the position of the median is to consider a number line:

[Figure: number line with positions 0, 33 and 49 marked against claim amounts £1,000, the median and £1,500]

The proportion of the distance along the line needed to find the median, in terms of position, is:

$$\frac{33 - 0}{49 - 0} = \frac{33}{49}$$

The proportion of the distance along the line in terms of the median’s value, m, is:

$$\frac{m - 1{,}000}{1{,}500 - 1{,}000} = \frac{m - 1{,}000}{500}$$

For linear interpolation, we assume that the values are spread out linearly over the group. This would mean that the proportions would be equal. Hence:

$$\frac{m - 1{,}000}{500} = \frac{33}{49}$$

Rearranging this we get:

$$m = 1{,}000 + \frac{33}{49} \times 500 = \text{£}1{,}336.73$$

This results in the same answer as before but approaches the problem in a slightly different way. Use the method that makes most sense to you.



9 Appendix C – proof of the transforming data result

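Consider a set of n values:

$x_1, x_2, \ldots, x_n$

The mean of these values is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Suppose we multiply each of the values by a and then add b:

$ax_1 + b, \; ax_2 + b, \; \ldots, \; ax_n + b$

The mean, $\bar{y}$, of this new data set will be:

$$\begin{aligned}
\bar{y} &= \frac{(ax_1 + b) + (ax_2 + b) + \cdots + (ax_n + b)}{n} \\
&= \frac{a(x_1 + x_2 + \cdots + x_n) + nb}{n} \\
&= \frac{a \sum x_i + nb}{n} \\
&= a \frac{\sum x_i}{n} + b \\
&= a\bar{x} + b
\end{aligned}$$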

We will meet this result again in Chapter 7 when we look at the expected mean of a theoretical population using random variables.



Extra practice questions

Section 2: Sample mean

P2.1 Subject 101, April 2003, Q1 (part)

Sickness and absence records were kept on 30 employees in a company over a 91-day period. These data are tabulated below:

Number of employees absent    0     1     2     3    4    5
Number of days                44    19    10    8    7    3

Calculate the sample mean of the number of employees absent per day. [1]

P2.2 The journey times (in minutes) of 300 employees to a company’s office are below:

Time (mins)      Frequency
0 ≤ t < 10       50
10 ≤ t < 20      70
20 ≤ t < 40      80
40 ≤ t < 70      75
70 ≤ t < 120     25

Calculate the mean journey time.

P2.3 Subject C1, April 1996, Q9 (part)

Shortly before close of trading on a particular day an insurance office has sold 8 new policies. The sample mean of the sums assured has been calculated, in units of £1,000, as 31.5. Another policy for £60,000 is then sold just before the close. Calculate the sample mean of the full set of 9 sums assured. [1]

P2.4 Subject 101, September 2000, Q2 (part)

Consider a random sample of 47 white-collar workers and a random sample of 24 blue-collar workers from the workforce of a large company. The mean salary for the sample of white-collar workers is £28,470; whereas the mean salary for the sample of blue-collar workers is £21,420. Calculate the mean of the salaries in the combined sample of 71 employees. [1]

Section 3: Sample median

P2.5 Subject C1, September 1995, Q1 (adjusted)

A random sample of 15 motor windscreen claim amounts (in £) is given by:

121 107 139 72 123 114 215 156 100 136 169 89 115 153 111

What is the median claim amount? [2]

P2.6 Subject 101, September 2001, Q1 (part)

Data were collected on 100 consecutive days for the number of claims, x, arising from a group of policies. This resulted in the following frequency distribution:

x    0     1     2     3     4     ≥5
f    14    25    26    18    12    5

Calculate the median for these data. [1]



P2.7 Subject C1, September 1997, Q9 (part)

The table below shows a grouped frequency distribution for 100 claim amounts on a certain class of insurance policy.

Claim amount       Frequency
under £100         4
£100 – 149.99      10
£150 – 199.99      25
£200 – 249.99      30
£250 – 299.99      15
£300 – 349.99      12
£350 – 399.99      4
£400 or over       0

Determine an approximate value for the median of these claim amounts. [4]

Section 5: Location and skewness

P2.8 Subject C1, Specimen 1993, Q2

For a particular class of insurance policy the distribution of claim amounts is positively skewed. Which of the following statements about the claim amount distribution is true?

A mode > median > mean

B mean > median > mode

C median > mode > mean

D mean > mode > median [2]



Section 6: Miscellaneous past exam questions

P2.9 Subject 101, September 2000, Q1 A random sample of fifty claim amounts (£) arising in a particular section of an insurance company’s business are displayed below in a stem and leaf plot:

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

14678 0233368889 0000001233457888 3456779 0257 0 3 07 3 3 8 2

Stem unit = 100 Leaf unit = 10 The sum of the fifty amounts (before rounding) is £92,780. Calculate the mean and median claim amounts. [2]



Chapter 2 Summary

Location

The location of a data set is a single value that is representative of that data set.

Mode

The mode is the data value that appears most often. To obtain it from a frequency distribution we find the value or group with the greatest frequency.

Mean

The sample mean, $\bar{x}$, is given by:

$$\bar{x} = \frac{\sum x}{n}$$

To calculate the mean from a frequency distribution we use:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

For a grouped frequency distribution, we use the midpoint for the x values.

Median

The median is the middle value of a data set arranged in order. It is the (½n + ½)th data value.

For a grouped frequency distribution the median is the ½n th “value”. We can then use interpolation to estimate the median within a group.

Moments

The kth order sample moment is given by:

$$\frac{1}{n} \sum x_i^k$$

The sample mean is the first order moment.

Location and skewness

For a symmetrical distribution the mode, mean and median coincide. For skewed distributions their locations are:

positive skew: mode < median < mean
negative skew: mean < median < mode

Transforming data

Given a sample mean of $\bar{x}$, if we multiply each of the sample values by a and then add b the mean of these new values is $a\bar{x} + b$.




Solution 2.1

The mode is 5 as this value appears twice whereas all the other values occur only once.

Solution 2.2

(i) Both 4 and 6 appear the most frequently. So we have two modes. A data set with two modes is called bimodal.

(ii) All numbers occur once, so no value occurs more often than the rest. Hence, we say that there is no mode.

Solution 2.3

(i) It is not the highest number that is the mode, but the number with the highest frequency.

(ii) The mode is 5 reviews as it has the highest frequency (7).

Solution 2.4

The modal group is £1,000 ≤ c < £1,500 as it has the highest frequency (49).

Solution 2.5

$$\text{sample mean} = \frac{1{,}500 + 1{,}820 + 840 + 260 + 2{,}100 + 790 + 530 + 1{,}360 + 1{,}780 + 1{,}650}{10} = \frac{12{,}630}{10} = \text{£}1{,}263$$



Solution 2.6

(i) $$\bar{x} = \frac{8 + 5 + 19 + 3 + 6 + 5}{6} = \frac{46}{6} = 7\tfrac{2}{3}$$

(ii) $$\bar{x} = \frac{12 + \cdots + 12 + 50}{8} = \frac{134}{8} = 16\tfrac{3}{4}$$

(iii) In part (i) we have an average of 7⅔ students employed. Since we can’t employ ⅔ of a student (unless you want to be picky and talk about part-time) this figure is impossible to obtain in real life. This is rather similar to the average of about 2.2 children in a family.

In part (ii) it seems strange to say that the average salary is £16,750 when nearly everyone earns £12,000. This is because the mean is affected by the extreme value of £50,000.

Solution 2.7

$$\bar{x} = \frac{(3 \times 2) + (5 \times 4) + (6 \times 6) + (3 \times 8) + (4 \times 10) + (2 \times 12) + 14}{24} = \frac{164}{24} = 6\tfrac{5}{6}$$

Solution 2.8

$$\bar{x} = \frac{(74 \times 0) + (19 \times 1) + (5 \times 2) + (2 \times 3)}{74 + 19 + 5 + 2} = \frac{35}{100} = 0.35$$



Solution 2.9

The midpoints of the groups are:

155 165 172.5 177.5 187.5

Therefore, the mean is:

$$\bar{x} = \frac{(4 \times 155) + (6 \times 165) + (11 \times 172.5) + (7 \times 177.5) + (2 \times 187.5)}{4 + 6 + 11 + 7 + 2} = \frac{5{,}125}{30} = 170.8 \text{ cm}$$

Solution 2.10

The 40 – 49 group ranges from 40 to (just below) 50, so the midpoint is 45 years. Similarly, the midpoints for the remaining groups are:

55 65 75 85 95 105

Hence, the mean is:

$$\bar{x} = \frac{(1 \times 45) + (2 \times 55) + (5 \times 65) + (10 \times 75) + (8 \times 85) + (3 \times 95) + (1 \times 105)}{1 + 2 + 5 + 10 + 8 + 3 + 1} = \frac{2{,}300}{30} = 76.7 \text{ years}$$

Solution 2.11

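(i) $$\bar{x} = \frac{\sum x}{n} \;\Rightarrow\; 72 = \frac{\sum x}{12} \;\Rightarrow\; \sum x = 72 \times 12 = 864 \text{ years}$$

(ii) $$\bar{x} = \frac{\sum x}{n} \;\Rightarrow\; 4.2 = \frac{5 + 4.75 + 3.6 + x + 3.25}{5} = \frac{16.6 + x}{5}$$

$$\Rightarrow\; 21 = 16.6 + x \;\Rightarrow\; x = 4.4$$

So the missing investment return is 4.4%.

(iii) Before:

$$\bar{x} = \frac{\sum x}{n} \;\Rightarrow\; 48{,}000 = \frac{\sum x}{10} \;\Rightarrow\; \sum x = 480{,}000$$

After:

$$\bar{x} = \frac{\sum x}{n} \;\Rightarrow\; 45{,}800 = \frac{\sum x}{11} \;\Rightarrow\; \sum x = 503{,}800$$

The difference between these totals is the 11th value, which is £23,800.

(iv) The key to solving this problem is to find the total sums assured for the term and the endowment policies. We can then get the grand total of all the policies to find the overall mean.

Let x be the term assurances and y the endowment assurances. This gives:

$$\sum x = 12 \times 50{,}000 = \text{£}600{,}000$$

$$\sum y = 8 \times 30{,}000 = \text{£}240{,}000$$

Hence, the grand total is £600,000 + £240,000 = £840,000. This gives a mean of all 20 policies to be:

$$\frac{840{,}000}{20} = \text{£}42{,}000$$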

Solution 2.12

Putting the values in order gives:

15 20 25 25 30 50 50 75 125

Counting in from both ends, we arrive at the 5th value, so the median is 30 (ie a sum assured of £30,000).



Solution 2.13

Placing the values in order:

1 3 4 5 9 9 10 15

We see that the median lies between the 4th and 5th values:

$$\text{median} = \frac{5 + 9}{2} = 7$$

Solution 2.14

There are 30 ages, so the median is the ½ × 30 + ½ = 15½th value. Putting the ages in order:

48 51 57 63 66 66 68 69 70 72
72 73 74 75 76 77 77 78 80 81
81 81 83 84 86 88 90 94 96 101

The 15th value is 76 and the 16th value is 77, so we can see that the median is 76½ years.

Solution 2.15

(i) There are 52 weeks, so the median is the ½ × 52 + ½ = 26½th value.

Value, x    Frequency, f    Running total
0           5               5 values so far
1           7               5 + 7 = 12
2           15              5 + 7 + 15 = 27
3           12
4           9
5           4

The 26th and 27th values are both 2. Hence, the median is 2 claims per week.

(ii) We now have 54 values, so the median is the ½ × 54 + ½ = 27½th value.

Value, x    Frequency, f    Running total
0           5               5
1           7               12
2           15              27
3           13              40
4           9
5           5

Now the 27th value is a 2 and the 28th value is a 3. Therefore the 27½th value (the median) is 2½ claims per week.

Note that this is the only time that we would get an answer between two data values in a frequency distribution: when the median is exactly between the two ‘groups’.

Solution 2.16

There are 30 actuaries, so the median is the ½ × 30 = 15th value.

Heights, h         Frequency    Running total
150 ≤ h < 160      4            4
160 ≤ h < 170      6            4 + 6 = 10
170 ≤ h < 175      11           4 + 6 + 11 = 21
175 ≤ h < 180      7
180 ≤ h < 195      2

So the 15th value is the 5th value into the 170 ≤ h < 175 group. Since there are 11 values in this group and the width is 5 cm, the median is:

$$\frac{5}{11} \times 5 = 2.27 \text{ cm}$$

into this group. Hence, the median is:

170 + 2.27 = 172.27 cm



Solution 2.17

There are 30 ages, so the median is the ½ × 30 = 15th value.

Age          Frequency    Running total
40 – 49      1            1
50 – 59      2            3
60 – 69      5            8
70 – 79      10           18
80 – 89      8
90 – 99      3
100 – 109    1

So the 15th value is the 7th value into the 70 – 79 group. Since there are 10 values in this group and the width is 10 years (not 9), the median is:

$$\frac{7}{10} \times 10 = 7 \text{ years}$$

into this group. Hence, the median is:

$$70 + \frac{7}{10} \times 10 = 77 \text{ years}$$

Solution 2.18

There are 40 phone calls, so the median is the ½ × 40 = 20th value.

Time on hold, t (minutes)    Cumulative frequency
t < 0.5                      4
t < 1                        11
t < 2                        20
t < 5                        34
t < 10                       40

We can see that the median (20th value) is exactly at the end of the t < 2 group. So we would say the median is 2 minutes.



Solution 2.19

The second order sample moment is $\frac{1}{n}\sum x_i^2$, which is:

$$\frac{1}{7}\left[18^2 + 21^2 + 25^2 + 25^2 + 25^2 + 25^2 + 30^2\right] = \frac{4{,}165}{7} = 595$$

Do be careful about the units – it is not £ since we have squared the values – so it is £². Also it was measured in 1,000’s so now it will be 1,000².

Solution 2.20

The positions are reversed for a negatively skew distribution. For example:

[Figure: a negatively skew distribution on a scale from 1 to 5]

The mode is 4, the mean is 33/10 = 3.3 and the median is 3.5. So we have:

mean (3.3) < median (3.5) < mode (4)

Solution 2.21

The new mean would be $a\bar{x} + b$. See Appendix C for a proof of this result.




P2.1 The sample mean is given by:

$$\bar{x} = \frac{(44 \times 0) + (19 \times 1) + (10 \times 2) + (8 \times 3) + (7 \times 4) + (3 \times 5)}{44 + 19 + 10 + 8 + 7 + 3} = \frac{106}{91} = 1.165$$

P2.2 Using the midpoints of 5, 15, 30, 55, 95 we get the mean of:

$$\bar{x} = \frac{(50 \times 5) + (70 \times 15) + (80 \times 30) + (75 \times 55) + (25 \times 95)}{50 + 70 + 80 + 75 + 25} = \frac{10{,}200}{300} = 34$$

P2.3 For the eight policies, we have:

$$\sum x = 8 \times 31{,}500 = 252{,}000$$

When we add the ninth policy of £60,000 we get:

$$\sum x = 252{,}000 + 60{,}000 = 312{,}000$$

Therefore the sample mean of the nine policies is:

$$\bar{x} = \frac{\sum x}{n} = \frac{312{,}000}{9} = \text{£}34{,}666.67$$

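P2.4 Using w for the salary of a white-collar worker and b for the salary of a blue-collar worker, we get:

$$\bar{w} = \frac{\sum w}{47} = 28{,}470 \;\Rightarrow\; \sum w = 28{,}470 \times 47 = 1{,}338{,}090$$

$$\bar{b} = \frac{\sum b}{24} = 21{,}420 \;\Rightarrow\; \sum b = 21{,}420 \times 24 = 514{,}080$$

So the overall mean is:

$$\bar{x} = \frac{\sum x}{n} = \frac{1{,}338{,}090 + 514{,}080}{47 + 24} = \frac{1{,}852{,}170}{71} = \text{£}26{,}086.90$$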



P2.5 First of all we need to put the claim amounts in order:

72 89 100 107 111 114 115 121 123 136 139 153 156 169 215

The median is the ½ × 15 + ½ = 8th value, which is £121.

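P2.6 The median is the ½ × 100 + ½ = 50½th value. So counting through the frequencies:

x     0   1   2   3   4   ≥5
f    14  25  26  18  12    5

There are 14 values up to x = 0, 14 + 25 = 39 values up to x = 1 and 39 + 26 = 65 values up to x = 2.

So the 50½th value is 2 claims.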

P2.7 The median is the ½ × 100 = 50th value. So counting through the frequencies:

£100 – 149.99     10
£150 – 199.99     25
£200 – 249.99     30
£250 – 299.99     15
£300 – 349.99     12
£350 – 399.99      4
£400 or over       0

There are 39 values below £200, so the 50th value is the 50 − 39 = 11th value in the £200 – £249.99 group. Using interpolation, we get the median to be:

$$200 + \frac{11}{30} \times 49.99 = \text{£}218.33$$




P2.8 In Section 5 we had the following diagram:

[Figure: distribution curve with the positions of the mode, median and mean marked]

Hence, we can see that answer B is correct.

P2.9 Using the sum of the fifty amounts given in the question:

$$\bar{x} = \frac{\text{£}92{,}780}{50} = \text{£}1{,}855.60$$

The median is the ½ × 50 + ½ = 25½th value. Counting through the leaves, we see that this lies between the 3 and the 4 on the 17 stem (ie between 1,730 and 1,740). Hence, the median is £1,735.



Chapter 3

Sample calculations 2

Links to CT3: Chapter 1 Sections 2.1-2.4, 3.1, 3.3, 3.4, 4

Syllabus objectives: (i)3. Describe the spread/variability of a set of data using the standard deviation, range, interquartile range, as appropriate.

0 Introduction

We now know how to find the ‘average’ of a set of data, so why on earth would we want to know anything else about the data? Well here are two sets of data:

[Figure: dot plots of Group A and Group B, each on a scale from 0 to 10]

These are clearly very different distributions – but they both have a mean of 5! Group B is very spread out whereas Group A is bunched together. So we need some way to tell us how spread out the numbers are from the mean. In this chapter we will look at three measures of spread: the range, the interquartile range and the standard deviation.



1 Range

1.1 Range from a list

The first way to measure the spread of the data is to find the values over which the data range. For example, the data for group A range from 3 to 7, so they range over 7 − 3 = 4 numbers. Unsurprisingly we call this the range.

[Figure: dot plot of Group A on a scale from 0 to 10, with the range marked]

Question 3.1

Calculate the range for Group B.

If we have data values $x_1, \ldots, x_n$ the formula for the range is:

$$\text{Range} = \text{biggest} - \text{smallest} = \max_i\{x_i\} - \min_i\{x_i\}$$

The bigger the range, the greater the difference there is between the biggest and smallest numbers and so the more spread out the data must be. However, this is a pretty crude measure as it only uses two of the data values to decide how spread out they all are. The next question shows the problem with this measure.
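As a quick check of the formula, here is a minimal Python sketch using the Group A values (3, 4, 5, 5, 6, 7). The function name `data_range` is just an illustrative choice.

```python
def data_range(values):
    """Range = biggest value minus smallest value."""
    return max(values) - min(values)


group_a = [3, 4, 5, 5, 6, 7]
print(data_range(group_a))  # 4

# Only the two extreme values matter: changing the middle values
# leaves the range unchanged, which is why it is a crude measure.
print(data_range([3, 3, 3, 3, 3, 7]))  # 4
```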

Question 3.2

Find the range of these salaries from a small company: £12k, £12k, £12k, £12k, £12k, £12k, £12k, £50k

The problem with the last question is that it uses the £50k figure, which is not representative of the whole of the staff (as it is probably the manager’s salary). This sort of problem is to be expected as the range uses the extreme values (the maximum and minimum) and in many data sets the extreme results are not the norm!



So we need a better measure of the spread that doesn’t use the extreme values. We will consider two alternatives later.

1.2 Range from a frequency distribution

We will now look at how we can calculate the range from a frequency distribution (a table of results) that we used in Chapter 1 to summarise a large data set. For this set of 20 values:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

the range is given by 5 − 1 = 4. How does this work if the values are displayed in a frequency table?

value, x        1   2   3   4   5
frequency, f    3   4   6   5   2

We can easily see that the biggest value of x is 5 and the smallest value is 1. Hence, the range is 5 − 1 = 4 as before.

Question 3.3

The number of personal pension reviews completed by a student each day over the last four weeks is given below:

reviews completed in a day    4   5   6   7   8
frequency                     5   7   4   3   1

(i) Janet thinks that the range of reviews completed in a day is 7 − 1 = 6. What has she done wrong?

(ii) What is the correct range?



1.3 Range from a grouped frequency distribution

The ages last birthday at death of 30 male policyholders who took out life assurance with a particular company at the same time are as follows:

57 68 75 66 72 86 80 81 70 78 76 72 88 84 69 77 83 90 48 63 74 81 94 51 73 96 81 66 77 101

These data are best suited to a grouped frequency table (as there are 25 different values with hardly any repeats).

Age          40–49  50–59  60–69  70–79  80–89  90–99  100–109
Frequency      1      2      5     10      8      3       1

If we do not have the original list of data we will not be able to tell the biggest and smallest values. For example, looking at the frequency table above, you would not be able to tell that the largest value is 101 and the smallest value is 48! Therefore we are unable to find the range in this situation. However, if we wanted to have some idea of the range, we could say that the maximum possible range is the upper bound of the highest group less the lower bound of the lowest group: in this case, 110 − 40 = 70.

1.4 Range summary

In summary:

Advantages
• Easy to calculate

Disadvantages
• Affected by extreme values (see Question 3.2)
• Does not use all the data values
• Cannot be found exactly from a grouped frequency distribution (see above)



2 Interquartile Range

2.1 Interquartile range from a list

The problem with the range was that it only used the end values. What can we use instead? How about the values that are ¼ and ¾ of the way through the data (called the lower quartile and upper quartile respectively)? So how do we work out the values that are ¼ and ¾ of the way through the data? We use a similar method to how we worked out the value that was ½ way through the data (the median) in Chapter 2.

Recall that the median, M, was the (½n + ½)th value. This gave us the number that was half-way through all the results. So, to get the number that is a quarter of the way through the results (the lower quartile) we should just halve the formula:

$$\text{lower quartile} = Q_1 = \left(\tfrac{1}{4}n + \tfrac{1}{4}\right)\text{th value}\;^{*}$$

Similarly, to get the value that is ¾ of the way through the results (the upper quartile), the formula should be three times the lower quartile formula:

$$\text{upper quartile} = Q_3 = \left(\tfrac{3}{4}n + \tfrac{3}{4}\right)\text{th value}$$

Let’s put this into practice and find the upper and lower quartiles of this data set:

£50 £120 £40 £30 £15 £50 £20

First we need to put the amounts in order:

£15 £20 £30 £40 £50 £50 £120

Now, there are seven values, so n = 7.

The lower quartile is the (¼ × 7 + ¼) = 2nd value, which is £20.

The upper quartile is the (¾ × 7 + ¾) = 6th value, which is £50.

* There is actually a lot of mathematical disagreement over how the quartiles are “best” defined (five definitions are given at http://mathworld.wolfram.com/Quartile.html). This is the version that is used in this course and in the Indian entrance exam (ACET), and it is one of the accepted versions used in Subject CT3.



We can see how the median and quartiles split up the data into 4 equal parts:

£15   £20   £30   £40   £50   £50   £120
      Q1          M           Q3

Question 3.4

Find the upper and lower quartiles of these sums assured (£000’s) by a certain life assurance company: 125 100 120 25 20 50 25 50 15 30 100 150 85 60 75

Now what happens if the upper and lower quartiles don’t turn out to be such nice positions? For example:

£3 £5 £6 £9 £11 £16

Now, there are six values, so n = 6.

The lower quartile is the (¼ × 6 + ¼) = 1¾th value.

Eek! This is ¾ of the way between the 1st value (£3) and the 2nd value (£5):

[Figure: number line from the 1st value (£3) to the 2nd value (£5), with the 1¾th position marked]



We can tackle this using linear interpolation. £3 and £5 are £5 − £3 = £2 apart. Three-quarters of this is ¾ × £2 = £1.50.

[Figure: the same number line, showing the gap £5 − £3 = £2 and the distance ¾ × £2 = £1.50 from the 1st value]

Hence, the lower quartile is £3 + £1.50 = £4.50.

Similarly, the upper quartile is the (¾ × 6 + ¾) = 5¼th value.

This is ¼ of the way between the 5th value (£11) and the 6th value (£16).

Now £11 and £16 are £16 − £11 = £5 apart. A quarter of this is ¼ × £5 = £1.25.

[Figure: number line from the 5th value (£11) to the 6th value (£16), showing the gap £16 − £11 = £5 and the distance ¼ × £5 = £1.25 from the 5th value]

Hence, the upper quartile is £11 + £1.25 = £12.25. You may recall that we also used linear interpolation in Chapter 2 to find the position of the median within a group from a grouped frequency table.



In general, we find the two values that the quartile lies between, then we use:

$$\text{quartile} = (\text{lower value}) + \left(\text{fraction between values}\right) \times \left(\text{distance the two values are apart}\right)$$
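The rule above can be sketched in Python as follows. This is a minimal illustration of the (¼n + ¼) and (¾n + ¾) positions with linear interpolation as used in this course; the function names are hypothetical, and library routines may use a different quartile definition.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)          # e.g. 1 for the 1¾th value
    fraction = position - whole    # e.g. 0.75
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    # lower value + fraction between values x distance apart
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Lower and upper quartiles using the (n + 1)/4 and 3(n + 1)/4 positions."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, (n + 1) / 4)
    q3 = value_at_position(data, 3 * (n + 1) / 4)
    return q1, q3


print(quartiles([3, 5, 6, 9, 11, 16]))           # (4.5, 12.25)
print(quartiles([50, 120, 40, 30, 15, 50, 20]))  # (20, 50)
```

The two calls reproduce the worked examples above: £4.50 and £12.25 for the six values, and £20 and £50 for the seven values.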

Question 3.5

Calculate the lower and upper quartiles for each of these data sets:

(i) 0, 3, 3, 5, 6, 6, 7, 9, 10

(ii) 5, 14, 1, 15, 8, 1, 4, 8

Common Error:

Remember that these rules, eg the (½n + ½)th value, only give the position of the number. We then need to look at the data to find out what number the lower or upper quartile actually is.

So we can now find the numbers that are one quarter and three quarters of the way through the data values. We are now going to use these to measure the spread of the data. The interquartile range is the range between the lower quartile and the upper quartile. ‘Inter’ means between, hence ‘interquartile range’ means the range between the quartiles.

$$\text{Interquartile Range} = \text{upper quartile} - \text{lower quartile}$$

$$IQR = Q_3 - Q_1$$

In our earlier example we had a lower quartile of £20 and an upper quartile of £50:

£15   £20   £30   £40   £50   £50   £120
      Q1                      Q3

We can see that the interquartile range is simply:

$$IQR = Q_3 - Q_1 = \text{£}50 - \text{£}20 = \text{£}30$$

Just like the range, the bigger the IQR, the greater the difference there is between the quartile numbers and so the more spread out the data must be.

[Figure: dot plots of Group A and Group B on a scale from 0 to 10, with the IQR marked on each]

The IQR for Group B is greater than the IQR for Group A. So the values are more spread out in group B.

Question 3.6

Calculate the interquartile range for each of these data sets:

(i) 0, 2, 4, 9, 10

(ii) £20, £20, £50, £60, £70, £30, £90, £110, £125, £150.



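Definition: The interquartile range is given by:

$$\text{Interquartile Range} = \text{upper quartile} - \text{lower quartile}$$

$$IQR = Q_3 - Q_1$$

where for ungrouped data:

$$\text{lower quartile} = Q_1 = \left(\tfrac{1}{4}n + \tfrac{1}{4}\right)\text{th value}$$

$$\text{upper quartile} = Q_3 = \left(\tfrac{3}{4}n + \tfrac{3}{4}\right)\text{th value}$$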

2.2 Interquartile range for a frequency distribution

Here is a set of 20 values:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

The lower quartile is the (¼ × 20 + ¼) = 5¼th value, which is 2.

The upper quartile is the (¾ × 20 + ¾) = 15¾th value, which is 4.

Therefore the interquartile range is 4 − 2 = 2. How does this work if these values were displayed in a frequency table?

value, x        1   2   3   4   5
frequency, f    3   4   6   5   2

Firstly, the numbers are already in numerical order – so we don’t need to worry about that. Now there are:

$$n = \sum f = 3 + 4 + 6 + 5 + 2 = 20 \text{ numbers in the table}$$



First, we want the (¼ × 20 + ¼) = 5¼th value. Counting through the frequencies:

value 1:  3 values so far
value 2:  3 + 4 = 7 values in total, so the 5¼th value is in here!

So the 5¼th value (the lower quartile) is 2.

Next we want the (¾ × 20 + ¾) = 15¾th value. Counting through the frequencies:

value 1:  3 values so far
value 2:  3 + 4 = 7 values in total
value 3:  3 + 4 + 6 = 13 values in total
value 4:  3 + 4 + 6 + 5 = 18 values in total, so the 15¾th value is in here!

So the 15¾th value (the upper quartile) is 4. Now we have both of the quartiles from the frequency distribution, we can find the interquartile range by subtracting:

$$IQR = 4 - 2 = 2$$
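The counting-through procedure above can be sketched in Python. This is an illustrative sketch only: the names are hypothetical, and it assumes every frequency in the table is positive and that the table holds enough values for both quartile positions to exist.

```python
def value_at_position(freq_table, position):
    """Find the value at a 1-based (possibly fractional) position.

    freq_table: list of (value, frequency) in ascending order of value.
    """
    running = 0  # number of values counted so far
    for i, (value, f) in enumerate(freq_table):
        running += f
        if position <= running:
            return value  # the position falls inside this value's block
        if position < running + 1:
            # The position lies between the last of this value and the
            # first of the next one, so interpolate between the two.
            next_value = freq_table[i + 1][0]
            return value + (position - running) * (next_value - value)
    raise ValueError("position is beyond the end of the data")


def quartiles_from_table(freq_table):
    n = sum(f for _, f in freq_table)
    q1 = value_at_position(freq_table, (n + 1) / 4)
    q3 = value_at_position(freq_table, 3 * (n + 1) / 4)
    return q1, q3


table = [(1, 3), (2, 4), (3, 6), (4, 5), (5, 2)]
q1, q3 = quartiles_from_table(table)
print(q1, q3, q3 - q1)  # 2 4 2
```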



Question 3.7

The frequency table shows the number of claims (of a particular type) made each week in the last year.

Number of claims per week    0   1   2   3   4   5
Frequency                    5   7  15  12   9   4

(i) Calculate the upper and lower quartile number of claims per week.

(ii) Hence, calculate the interquartile range.

Common Error: Some students confuse the total number of values, n, with the number of groups.



2.3 Interquartile range from a grouped frequency distribution

We can use a similar idea to find the quartiles (and the interquartile range) from a grouped frequency distribution:


Claim amount, c           Frequency
0 ≤ c < £500                  6
£500 ≤ c < £1,000            11
£1,000 ≤ c < £1,500          49
£1,500 ≤ c < £2,000          26
£2,000 ≤ c < £5,000           8

There are $n = \sum f = 6 + 11 + 49 + 26 + 8 = 100$ claims in the table.

Since we don’t have the individual values of the observations we are unable to calculate the quartiles exactly. We will assume that the continuous distribution is evenly spread out, so that the ¼n th, ½n th and ¾n th values split the distribution into four equal quarters.

Definition: For grouped data the lower quartile and upper quartile are the ¼n th and ¾n th values.

The lower quartile is the ¼ × 100 = 25th value. So counting through the frequencies until the 25th value (the lower quartile) we get:


Claim amount, c           Frequency    Values so far
0 ≤ c < £500                  6        6
£500 ≤ c < £1,000            11        6 + 11 = 17
£1,000 ≤ c < £1,500          49        6 + 11 + 49 = 66
£1,500 ≤ c < £2,000          26
£2,000 ≤ c < £5,000           8

So we can see that the 25th value is somewhere in the £1,000 ≤ c < £1,500 group.

We can use linear interpolation to estimate the lower quartile as we have assumed that all of the 49 values in the £1,000 ≤ c < £1,500 group are spread out evenly.

We counted 17 values before we got to the £1,000 ≤ c < £1,500 group. So the lower quartile is the 25 − 17 = 8th value in this group.



Since there are 49 values altogether in this group – the lower quartile is 8/49 of the way through this group. Our group ranges over £500 so if the values are spread out evenly over this group we’d expect the lower quartile to be:

$$\frac{8}{49} \times 500 = \text{£}81.63 \text{ from the lowest value in the group}$$

Since this group started at £1,000 the lower quartile is:

$$\text{£}1{,}000 + \text{£}81.63 = \text{£}1{,}081.63$$

Now, the upper quartile is the ¾ × 100 = 75th value. Counting through the frequencies until the 75th value (the upper quartile) we get:

Claim amount, c           Frequency    Values so far
0 ≤ c < £500                  6        6
£500 ≤ c < £1,000            11        6 + 11 = 17
£1,000 ≤ c < £1,500          49        6 + 11 + 49 = 66
£1,500 ≤ c < £2,000          26        6 + 11 + 49 + 26 = 92
£2,000 ≤ c < £5,000           8

So we can see that the 75th value is somewhere in the £1,500 ≤ c < £2,000 group.

Using linear interpolation again, we counted 66 values before we got to the £1,500 ≤ c < £2,000 group. So the upper quartile is the 75 − 66 = 9th value in this group.

Since there are 26 values altogether in this group – the upper quartile is 9/26 of the way through this group. Our group ranges over £500 so we’d expect the upper quartile to be:

$$\frac{9}{26} \times 500 = \text{£}173.08 \text{ from the lowest value in the group}$$

Since this group started at £1,500 the upper quartile is:

$$\text{£}1{,}500 + \text{£}173.08 = \text{£}1{,}673.08$$

Hence, the interquartile range is given by:

$$IQR = \text{£}1{,}673.08 - \text{£}1{,}081.63 = \text{£}591.44 \text{ (using the unrounded quartiles)}$$
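The grouped calculation above can be reproduced with a short Python sketch. The function name `grouped_value` and the `(lower, upper, frequency)` layout are illustrative assumptions, and the code relies on the same even-spread assumption as the text.

```python
def grouped_value(groups, position):
    """Estimate the value at a given position in grouped data.

    groups: list of (lower_bound, upper_bound, frequency) in ascending order.
    Assumes values are spread evenly within each group.
    """
    seen = 0  # values counted before the current group
    for lower, upper, f in groups:
        if f > 0 and position <= seen + f:
            fraction = (position - seen) / f          # fraction into the group
            return lower + fraction * (upper - lower)  # lowest value + fraction x class width
        seen += f
    raise ValueError("position is beyond the end of the data")


claims = [(0, 500, 6), (500, 1000, 11), (1000, 1500, 49),
          (1500, 2000, 26), (2000, 5000, 8)]
n = sum(f for _, _, f in claims)

q1 = grouped_value(claims, n / 4)      # the 25th value
q3 = grouped_value(claims, 3 * n / 4)  # the 75th value
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # 1081.63 1673.08 591.44
```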



Question 3.8

The heights, in cm, of some student actuaries are recorded below.


Find the interquartile range of these heights.

In general, we find which group the quartile is in and how far it is into that group, then we use:

$$\text{quartile} = (\text{lowest value in group}) + \left(\text{fraction into the group}\right) \times \text{class width}$$

Notice the similarity between this and the rule we used earlier:

$$\text{quartile} = (\text{lower value}) + \left(\text{fraction between values}\right) \times \left(\text{distance the two values are apart}\right)$$

We now find the fraction through the group, rather than between the values. The class width is the distance the highest and lowest values in the group are apart.

When calculating the width of groups with rounded data and ages we will need to take care that we do use the correct largest and smallest possible values for that group (like we did in Chapter 2).

Question 3.9

The table below shows the weights (to the nearest kg) of female actuarial students:

Weight 53 – 56 57 – 60 61 – 64 65 – 68 69 – 72 73 – 76 77 – 88

Frequency 1 2 5 10 6 4 2

Find the interquartile range for the weights of the students.



2.4 Calculating the IQR using cumulative frequencies

Recall from Chapter 1 that a cumulative frequency table is one where we accumulate (ie add up) the frequencies as we go through each of the data values. For example, using the table from Section 1.3 we get:


Age         Frequency    Cumulative frequency
40–49           1               1
50–59           2               3
60–69           5               8
70–79          10              18
80–89           8              26
90–99           3              29
100–109         1              30

It is then quick to see that there are 30 values in total (the last cumulative frequency figure). Since this is grouped data, the lower quartile (the ¼ × 30 = 7½th value) is in the 60–69 group and the upper quartile (the ¾ × 30 = 22½th value) is in the 80–89 group. However, we will need the original frequencies of these two groups to estimate the quartiles using linear interpolation.

Question 3.10

A consumer watchdog measures the length of time (in minutes) for which 40 phone calls to a helpline were put on hold. The cumulative frequency is given below:


Time, t (mins)     Cumulative frequency
t < 0.5                  4
t < 1                   11
t < 2                   20
t < 5                   34
t < 10                  40

Estimate the interquartile range for the length of time that calls were placed on hold.



We can also use the cumulative frequency curve to find the interquartile range. Below is the cumulative frequency curve for Question 3.10.

[Figure: cumulative frequency curve. Horizontal axis: time (mins), from 0 to 10. Vertical axis: cumulative frequency, from 0 to 40.]

Now the lower quartile was the:

¼ × 40 = 10th value

So we read off the 10th value to see what the time was. This gives us a lower quartile of roughly 0.9 mins.

Now the upper quartile was the:

¾ × 40 = 30th value

So we read off the 30th value to give us an upper quartile of about 3.8 mins. Hence, the interquartile range is:

$$IQR = 3.8 - 0.9 = 2.9 \text{ mins}$$

Since we are reading from a graph, our IQR will be an approximation.



2.5 Boxplots

Recall from Chapter 1 that a boxplot (also called a box and whisker plot) was a convenient way of showing how the data are distributed:

[Figure: boxplot with the lowest value, lower quartile (Q1), median (M), upper quartile (Q3) and highest value labelled; 25% of the data lies in each of the four sections]

The rectangle (box) in the middle represents the middle 50% of the data (between the lower and upper quartiles). The lines (whiskers) extend from the box to the smallest and largest values. The diagram also shows the median. So for our earlier example:

£15   £20   £30   £40   £50   £50   £120
      Q1          M           Q3

We would display it as follows:

[Figure: boxplot with lowest value 15, Q1 = 20, M = 40, Q3 = 50 and highest value 120]
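The five numbers needed to draw a boxplot can be collected with a small Python sketch. The function names are hypothetical, and the quartile positions follow the (n + 1)/4 convention used in this course rather than any particular plotting library.

```python
def value_at_position(data, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    lower = data[whole - 1]
    if fraction == 0:
        return lower
    return lower + fraction * (data[whole] - lower)


def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest)."""
    data = sorted(values)
    n = len(data)
    return (
        data[0],
        value_at_position(data, (n + 1) / 4),
        value_at_position(data, (n + 1) / 2),
        value_at_position(data, 3 * (n + 1) / 4),
        data[-1],
    )


print(five_number_summary([50, 120, 40, 30, 15, 50, 20]))  # (15, 20, 40, 50, 120)
```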

Question 3.11

Sketch a boxplot for the sums assured (£000’s) given in Question 3.4: 125 100 120 25 20 50 25 50 15 30 100 150 85 60 75



2.6 Interquartile range summary

In summary:

Advantages
• Unaffected by extreme values

Disadvantages
• Does not use all the data values





3 Sample standard deviation and variance

3.1 Sample standard deviation from a list

The interquartile range is clearly an improvement over the range – but it still only uses two values to determine the spread. What we want is some measure of spread that uses all of the data values. This is what the standard deviation does – it calculates the ‘average’ (ie standard) distance (ie deviation) of each number from the mean. For example, returning to the data from the beginning of the chapter:

[Figure: dot plots on a scale from 0 to 10. Group A: small deviations from the mean. Group B: large deviations from the mean.]

We are now going to go step-by-step through how we calculate the standard deviation. This will enable you to understand where the grotty formula comes from. In exams, however, all you will be expected to do is put the numbers into the formula (which is given on page 22 of the Tables) and turn the handle.



Finding the deviations

We are going to calculate the standard deviation for group A. The numbers are:

3 4 5 5 6 7

and the mean is:

$$\bar{x} = \frac{\sum x}{n} = \frac{3 + 4 + 5 + 5 + 6 + 7}{6} = 5$$

First we find the deviation of each number from the mean – ie how far each number is from the mean:

$$\text{deviation} = \text{number} - \text{mean} = x - \bar{x}$$

So for our results from group A:

−2 −1 0 0 1 2

Finding the ‘average’ deviation

To find the ‘average’ deviation, surely we just add up all the deviations and divide by n? However, there is a problem with this:

$$\frac{-2 - 1 + 0 + 0 + 1 + 2}{6} = 0$$

The average deviation is zero! In fact, it will be zero for all data sets (if you’re not convinced try it out for Group B now). So what went wrong? Well the problem is that the mean is always in the ‘middle’ so the positive and negative deviations always cancel out. A formal proof of this can be found in Appendix A.

So how can we get round this problem? There are two schools of thought:

• Ignore the signs (ie find the absolute value). This gives us what is called the ‘mean deviation’.

• Square the values to get rid of the signs and then square root later to ‘undo’ the squaring. This gives us the ‘standard deviation’.



We will concentrate on the second method ie the ‘standard deviation’. So we will square each of the deviations to get rid of the signs, then we will find the average of these squared deviations and finally we will square root the answer to ‘undo’ the squaring.

Squaring the deviations gives:

4 1 0 0 1 4

To find the average of these squared deviations we add them up and divide by n (which in this case is 6):

$$\frac{4 + 1 + 0 + 0 + 1 + 4}{6} = 1\tfrac{2}{3}$$

Then we square root this value to ‘undo’ the squaring and find the standard deviation:

$$\sqrt{1\tfrac{2}{3}} = 1.291$$
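The steps just described can be followed line by line in Python. This is a plain sketch of the dividing-by-n calculation for Group A; the variable names are illustrative.

```python
from math import sqrt

group_a = [3, 4, 5, 5, 6, 7]
n = len(group_a)

mean = sum(group_a) / n                       # 5.0
deviations = [x - mean for x in group_a]      # [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
print(sum(deviations))                        # 0.0, the deviations cancel out

squared = [d ** 2 for d in deviations]        # [4.0, 1.0, 0.0, 0.0, 1.0, 4.0]
mean_squared_deviation = sum(squared) / n     # 1.666...
standard_deviation = sqrt(mean_squared_deviation)
print(round(standard_deviation, 3))           # 1.291
```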

The formula

So what is the formula for the standard deviation? Repeating the steps above but using symbols gives:

The deviations are given by:

$$x - \bar{x}$$

Squaring the deviations gives:

$$(x - \bar{x})^2$$

Adding these up and dividing by n to get the average squared deviation gives:

$$\frac{\sum (x - \bar{x})^2}{n}$$

Square rooting to ‘undo’ the squaring gives:

$$\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Hence, the standard deviation is given by:

$$\sqrt{\frac{1}{n}\left\{\sum (x - \bar{x})^2\right\}}$$

Question 3.12

Use this formula to find the standard deviation of group B, which has data values: 0 1 3 8 8 10

“Hurrah! We’ve finished!” you cry – well not quite. You see this formula calculates the standard deviation for the population – whereas we will typically be working with samples and not the whole population. So what? Will it make any difference? Well unfortunately the answer is yes.

In a population you normally have lots of people around the ‘middle’ values and one or two people who have ‘extreme’ values. Think of people’s heights, IQ’s or weights. When we take a sample it is unlikely that we will get any of the ‘extreme’ people, as they are quite rare. This will mean our sample is more likely to have a smaller standard deviation than the population.

To fix this, we divide by n − 1 instead of n. Dividing by a smaller number will make our sample standard deviation bigger and so make it closer to the real standard deviation of the population. This may seem a little arbitrary at the moment but later, in the Subject CT3 course, you will see that this is mathematically sound. Dividing by n − 1 makes the sample variance (which is the square of the sample standard deviation) an unbiased estimator of the population variance. This basically means that on average our sample variance will give the population variance (which is a good thing).

The sample standard deviation, s, is given by:

$$s = \sqrt{\frac{1}{n-1}\left\{\sum (x - \bar{x})^2\right\}}$$

This is the formula we will always use when calculating the standard deviation of any set of figures given in an exam or in your work. Because we will always use the sample standard deviation – we quite often just say ‘standard deviation’.
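To see the effect of dividing by n − 1 rather than n, here is a small Python sketch using the Group A values. The function `sample_sd` is a hypothetical helper written for this illustration; `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n − 1) are from Python's standard library and are used only as a cross-check.

```python
import statistics
from math import sqrt


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


group_a = [3, 4, 5, 5, 6, 7]

print(round(statistics.pstdev(group_a), 3))  # 1.291 (dividing by n)
print(round(sample_sd(group_a), 3))          # 1.414 (dividing by n - 1)
print(round(statistics.stdev(group_a), 3))   # 1.414 (library sample version)
```

The sample figure is the larger of the two, as the text explains.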



Question 3.13

Calculate the sample standard deviation of these claim amounts: £124 £56 £78 £92 £230

The standard deviation measures the spread – so similarly to the range and the IQR, a bigger standard deviation means that the data are more spread out and vice versa.

3.2 Sample variance from a list

The variance is simply defined to be the square of the standard deviation. This will, fairly obviously, measure the spread squared of the data. As such it will be measured in square units – for example if we are calculating the variance of claims (which are measured in £) then the variance will be measured in £².

The sample variance, $s^2$, is given by:

$$s^2 = \frac{1}{n-1}\left\{\sum (x - \bar{x})^2\right\}$$

The sample variance has some nice statistical properties, which means it is used more often than the standard deviation – but do remember that the variance measures the spread squared.

Alternative formula

Now the formula we are using is fine for a small list of numbers, but will be extremely tedious for a long list of numbers, as we have to calculate the deviation for each number. It will also be a pain to use if the mean is a grotty number. Therefore, we are going to rearrange the formula into a nicer format for dealing with such situations.

Firstly, we expand the brackets and split up the sum:

$$s^2 = \frac{1}{n-1}\left\{\sum (x^2 - 2x\bar{x} + \bar{x}^2)\right\} = \frac{1}{n-1}\left\{\sum x^2 - \sum 2x\bar{x} + \sum \bar{x}^2\right\}$$

We can take the $\bar{x}$ term out of the sum, as it is a constant (as well as any other constants):

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - 2\bar{x}\sum x + \bar{x}^2\sum 1\right\} = \frac{1}{n-1}\left\{\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2\right\}$$

Now we use the fact that $\bar{x} = \frac{\sum x}{n} \Rightarrow \sum x = n\bar{x}$:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - 2n\bar{x}^2 + n\bar{x}^2\right\} = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$$

An alternative formula for the sample variance, $s^2$, is:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$$

Just adding up the squares of the numbers should be quicker than calculating the squares of the deviations and adding them up. This is the formula that is given on page 22 of the Tables and is the only formula you need to use. Do make sure that you can do this rearrangement on your own, as the method will be used later on in the course.
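A quick numerical check that the two versions of the formula agree, sketched in Python with the Group A values. The function names are illustrative only.

```python
def variance_from_deviations(values):
    """s^2 = (1 / (n - 1)) * sum of (x - mean)^2."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_alternative(values):
    """s^2 = (1 / (n - 1)) * (sum of x^2 - n * mean^2)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


group_a = [3, 4, 5, 5, 6, 7]
print(variance_from_deviations(group_a))  # 2.0
print(variance_alternative(group_a))      # 2.0
```

Here the sum of squares is 160 and n times the squared mean is 6 × 25 = 150, so both routes give (160 − 150)/5 = 2.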

Question 3.14

The sample standard deviation of these claim amounts from Question 3.13 was £68.34:

£124 £56 £78 £92 £230

Using the alternative formula, show that you get the same answer for the sample standard deviation.



3.3 Sample standard deviation from a frequency distribution

We will now look at how we can calculate the standard deviation from a frequency distribution rather than from a list. Take this set of 20 values:

1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5

To calculate the mean, we need to evaluate the sum:

$$\sum x = (1 + 1 + 1) + (2 + 2 + 2 + 2) + (3 + 3 + 3 + 3 + 3 + 3) + (4 + 4 + 4 + 4 + 4) + (5 + 5) = 59$$

Recall from Chapter 2 that the ‘shortcut’ method of doing this was to say we have three 1’s, four 2’s and so on:

$$\sum x = (3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5) = 59$$

Notice that we are multiplying each value by its frequency.

To calculate the standard deviation or the variance we need to evaluate the sum of squares:

$$\sum x^2 = (1^2 + 1^2 + 1^2) + (2^2 + 2^2 + 2^2 + 2^2) + (3^2 + 3^2 + 3^2 + 3^2 + 3^2 + 3^2) + (4^2 + 4^2 + 4^2 + 4^2 + 4^2) + (5^2 + 5^2) = 203$$

We can use this ‘shortcut’ method to calculate the sum of squares:

$$\sum x^2 = (3 \times 1^2) + (4 \times 2^2) + (6 \times 3^2) + (5 \times 4^2) + (2 \times 5^2) = 203$$

Notice that we are multiplying each square value by its frequency.

Question 3.15

Use the ‘shortcut method’ to calculate the sum of squares for this set of data: 2, 2, 2, 4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8, 10, 10, 10, 10, 12, 12, 14



So, how does this relate to a frequency table? We are given the values and their frequencies in the table so we can work out the sum and the sum of squares using the ‘shortcut’ method above. That is, we multiply each of the values by its frequency to get the sum, and multiply each of the squared values by its frequency to get the sum of squares. We can then use these in our formula to calculate the standard deviation.


First, we need to calculate the mean, $\bar{x} = \frac{\sum x}{n}$. The sum is given by:

$$\sum x = (3 \times 1) + (4 \times 2) + (6 \times 3) + (5 \times 4) + (2 \times 5) = 59$$

The total number of values, n, is given by the sum of the frequencies:

$$n = \sum f = 3 + 4 + 6 + 5 + 2 = 20$$

Hence:

$$\bar{x} = \frac{\sum x}{n} = \frac{59}{20} = 2.95$$

Now to calculate the variance, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$, we need the sum of squares:

$$\sum x^2 = (3 \times 1^2) + (4 \times 2^2) + (6 \times 3^2) + (5 \times 4^2) + (2 \times 5^2) = 203$$

Substituting:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} = \frac{1}{19}\left\{203 - 20 \times 2.95^2\right\} = 1.5237$$

Therefore the standard deviation is given by:

$$s = \sqrt{1.5237} = 1.234$$
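The same working can be sketched in Python, with the frequency table held as (value, frequency) pairs. The layout and variable names are illustrative choices.

```python
from math import sqrt

# (value, frequency) pairs from the frequency table
table = [(1, 3), (2, 4), (3, 6), (4, 5), (5, 2)]

n = sum(f for _, f in table)               # 20
sum_x = sum(f * x for x, f in table)       # 59
sum_x2 = sum(f * x ** 2 for x, f in table)  # 203

mean = sum_x / n                            # 2.95
variance = (sum_x2 - n * mean ** 2) / (n - 1)
sd = sqrt(variance)

print(mean, round(variance, 4), round(sd, 3))  # 2.95 1.5237 1.234
```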



Question 3.16

This frequency table shows the number of claims per policy made to a car insurance company in the last year. Calculate the mean and the standard deviation of the number of claims per policy:

Number of claims per policy     0    1    2    3
Frequency                      74   19    5    2

Formulae

In our table we had, say, m different values $x_1, x_2, \ldots, x_m$ with frequencies $f_1, f_2, \ldots, f_m$. To find the sum, $\sum x$, we multiplied the frequencies by the corresponding data values:

$$\text{sum} = f_1 x_1 + \cdots + f_m x_m = \sum f_i x_i$$

The total number of values, n, was found by adding up the frequencies:

$$n = f_1 + \cdots + f_m = \sum f_i$$

The sum of squares, $\sum x^2$, was found by multiplying the frequencies by the corresponding squares of the data values:

$$\text{sum of squares} = f_1 x_1^2 + \cdots + f_m x_m^2 = \sum f_i x_i^2$$

This gives:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

and:

$$s^2 = \frac{1}{n-1}\left\{\sum fx^2 - n\bar{x}^2\right\}$$



3.4 Standard deviation from a grouped frequency distribution

Now suppose we want to find the standard deviation or variance from this grouped frequency distribution:


Claim amount, c           Frequency
0 ≤ c < £500                  6
£500 ≤ c < £1,000            11
£1,000 ≤ c < £1,500          49
£1,500 ≤ c < £2,000          26
£2,000 ≤ c < £5,000           8

Before, when we calculated the sum from a frequency table, we multiplied the values by the frequency. The question now is: “which value in each group do we multiply the frequency by?” Well the natural choice would be the middle of each group. We will use the midpoint of each group. We find the midpoint by averaging the largest and smallest possible values in each group. So the midpoint for the 0 ≤ c < £500 group is $\frac{0 + 500}{2} = \text{£}250$. Similarly, the midpoints for the other groups are £750, £1,250, £1,750 and £3,500. The sum is then:

$$\sum x = (6 \times 250) + (11 \times 750) + (49 \times 1{,}250) + (26 \times 1{,}750) + (8 \times 3{,}500) = 144{,}500$$

The total number of values, n, is given by the sum of the frequencies:

$$n = \sum f = 6 + 11 + 49 + 26 + 8 = 100$$

Hence:

$$\bar{x} = \frac{\sum x}{n} = \frac{144{,}500}{100} = \text{£}1{,}445$$

Now to calculate the variance, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$, we need the sum of squares:

$$\sum x^2 = (6 \times 250^2) + (11 \times 750^2) + (49 \times 1{,}250^2) + (26 \times 1{,}750^2) + (8 \times 3{,}500^2) = 260{,}750{,}000$$

Substituting:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} = \frac{1}{99}\left\{260{,}750{,}000 - 100 \times 1{,}445^2\right\} = 524{,}722$$

Therefore the standard deviation is given by:

$$s = \sqrt{524{,}722} = \text{£}724.38$$
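The midpoint method can be sketched in Python as follows. The `(lower, upper, frequency)` layout and the variable names are assumptions made for this illustration; the result is an estimate, since it treats every value in a group as if it sat at the midpoint.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each group of claim amounts
groups = [(0, 500, 6), (500, 1000, 11), (1000, 1500, 49),
          (1500, 2000, 26), (2000, 5000, 8)]

# Represent each group by its midpoint
midpoints = [((lower + upper) / 2, f) for lower, upper, f in groups]

n = sum(f for _, f in midpoints)                # 100
sum_x = sum(f * m for m, f in midpoints)        # 144,500
sum_x2 = sum(f * m ** 2 for m, f in midpoints)  # 260,750,000

mean = sum_x / n
variance = (sum_x2 - n * mean ** 2) / (n - 1)
sd = sqrt(variance)

print(mean, round(variance), round(sd, 2))  # 1445.0 524722 724.38
```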

Question 3.17

The heights, in cm, of thirty actuaries are recorded below. Find the mean and the standard deviation of their heights.


When calculating the midpoint of groups with rounded data and ages we will need to take care that we do use the correct largest and smallest possible values for that group. For example, when values are rounded to the nearest cm, a 10–19 cm group ranges from 9.5 cm to (just below) 19.5 cm. Hence, the midpoint would be:

$$\frac{9.5 + 19.5}{2} = 14.5 \text{ cm}$$

Similarly, when age last birthday is used, a 10–19 years group ranges from 10 years to (just below) 20 years old. Hence, the midpoint would be:

$$\frac{10 + 20}{2} = 15 \text{ years}$$



Question 3.18


Age          40–49  50–59  60–69  70–79  80–89  90–99  100–109
Frequency      1      2      5     10      8      3       1

Find the mean and the standard deviation of the policyholders’ ages.

Note that when we use the midpoint we assume that the values are evenly spread through the group. This is not necessarily the case. For example, the actual data values for Question 3.18 were:

57 68 75 66 72 86 80 81 70 78 76 72 88 84 69 77 83 90 48 63 74 81 94 51 73 96 81 66 77 101

The true mean of these values is $\bar{x} = \frac{2{,}277}{30} = 75.9$ years and the true standard deviation is $s = \sqrt{\frac{1}{29}\left\{177{,}157 - 30 \times 75.9^2\right\}} = 12.22$ years.

These are both slightly different from the answers obtained in Question 3.18. Hence, the mean and standard deviation using midpoints is just an estimate – it is the best we can do without having the original list of data.
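To see the size of the approximation, the sketch below computes both sets of figures from the raw ages listed above. It uses Python's `statistics` module, whose `stdev` divides by n − 1; the helper `midpoint` is an illustrative name and assumes ten-year age-last-birthday groups.

```python
from statistics import mean, stdev

ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78,
        76, 72, 88, 84, 69, 77, 83, 90, 48, 63,
        74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

def midpoint(age):
    """Midpoint of the ten-year group containing `age` (eg 40-49 gives 45)."""
    return (age // 10) * 10 + 5

grouped = [midpoint(a) for a in ages]   # what the frequency table 'sees'

true_mean, true_sd = mean(ages), stdev(ages)
est_mean, est_sd = mean(grouped), stdev(grouped)
print(round(true_mean, 2), round(true_sd, 2))
print(round(est_mean, 2), round(est_sd, 2))
```

The first line of output uses the original data; the second uses only the group midpoints, so it is an estimate.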

3.5 Other problems involving the sample standard deviation

Two other types of question can be asked about the standard deviation. In both, we are given the standard deviation of a data set and have to calculate the new standard deviation after a value (or another data set) is combined with the original data set.

The key to solving these problems is to use the information given to calculate $\sum x^2$ for the data set(s) by rearranging:

$$s^2 = \frac{1}{n-1}\left\{\sum x_i^2 - n\bar{x}^2\right\} \quad\Rightarrow\quad \sum x_i^2 = (n-1)s^2 + n\bar{x}^2$$

We can then calculate the new $\sum x^2$ and the new $\bar{x}$ (like we did in Chapter 2). Putting these new values back into the formula gives our new variance (or standard deviation).

Question 3.19

(i) A group of 12 actuaries are weighed. Their weights have a mean of 78 kg and a standard deviation of 4 kg. Find the sum of squares (ie $\sum x^2$) of their weights.

(ii) The temperature over the previous 6 days had a mean of 19ºC and a standard deviation of 5ºC. Today’s temperature was 16ºC. Calculate the mean and standard deviation of the temperature over all 7 days.

(iii) The ages at which a group of 10 male policyholders died had a mean of 72 years and a standard deviation of 7 years. The ages at which a group of 8 female policyholders died had a mean of 78 years and a standard deviation of 9 years.

Calculate the mean and standard deviation of the ages at death of all 18 policyholders.

3.6 Sample standard deviation summary

In summary:

Advantages
Uses all the data values
Has properties that make it useful in further calculations (see Subject CT3, Chapter 9)

Disadvantages
Can only be estimated when given grouped data



4 Central moments

Recall that the definition of the sample variance was:

$$\frac{1}{n-1}\sum (x_i - \bar{x})^2$$

This expression (but with n in the denominator) is a member of a more general group of summary statistics called central sample moments:

Definition

The kth order central sample moment is given by:

$$\frac{1}{n}\sum (x_i - \bar{x})^k$$

So the second order central sample moment is almost the variance.

Question 3.20

Here are the salaries of 7 individuals in a company (in £000’s):

18 21 24 25 25 25 30

Find the third order central sample moment of this data set.

We would find moments from frequency or grouped frequency distributions in exactly the same way as we did when calculating the mean or variance. The kth order central sample moment is sometimes called the kth order sample moment about the mean to distinguish it from the more general form of:

Definition

The kth order sample moment about the value a is given by:

$$\frac{1}{n}\sum (x_i - a)^k$$



5 Skewness

Recall that, in Chapter 1, we defined the skewness to be the shape of the distribution – or more accurately how asymmetrical the distribution is: the more skew the data, the more asymmetrical the distribution. The types of skewness are shown in the frequency diagrams below:


What we would like is a single numerical value to summarise the skewness of the data set. It would be nice if the value was positive for a positively skewed data set, negative for a negatively skewed data set and zero for a symmetrical (ie not skewed) data set. To cut a very long story short, it turns out that the third central sample moment does the job:

$$\frac{1}{n}\sum (x_i - \bar{x})^3$$

The cube ensures that the signs of the data values come into play. If the majority of the values are to the right (ie on the positive side) of the mean we will get a positive value for the skewness. Similarly, if the majority of the values are to the left (ie on the negative side) of the mean we will get a negative value for the skewness.

But why cube it at all? Why not keep the original values? We touched on this briefly when we looked at standard deviation. We found that $\sum (x_i - \bar{x})$ always gave a value of zero, which is not really helpful. See Appendix A for the proof of this result.

To show how skewness works, we will consider three data sets.



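Starting with a symmetrical distribution:

[Dotplot: symmetrical data set on the values 1 to 5]

The mean is $\bar{x} = \frac{30}{10} = 3$ and the third central moment is:

$$\begin{aligned}
&\tfrac{1}{10}\left[(1-3)^3 + 2 \times (2-3)^3 + 4 \times (3-3)^3 + 2 \times (4-3)^3 + (5-3)^3\right] \\
&= \tfrac{1}{10}\left[(-2)^3 + 2 \times (-1)^3 + 4 \times 0^3 + 2 \times 1^3 + 2^3\right] \\
&= \tfrac{1}{10}\left[-8 - 2 + 0 + 2 + 8\right] \\
&= 0
\end{aligned}$$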

The skewness of a symmetrical data set is zero, as desired. Now considering a positively skew distribution:

[Dotplot: positively skew data set on the values 1 to 5]

The mean is $\bar{x} = \frac{27}{10} = 2.7$ and the third central moment is:

$$\begin{aligned}
&\tfrac{1}{10}\left[(1-2.7)^3 + 4 \times (2-2.7)^3 + 3 \times (3-2.7)^3 + (4-2.7)^3 + (5-2.7)^3\right] \\
&= \tfrac{1}{10}\left[(-1.7)^3 + 4 \times (-0.7)^3 + 3 \times 0.3^3 + 1.3^3 + 2.3^3\right] \\
&= \tfrac{1}{10}\left[-4.913 - 1.372 + 0.081 + 2.197 + 12.167\right] \\
&= 0.816
\end{aligned}$$

This gives a positive value for a positively skewed distribution, as desired. It is quite a small value because the distribution tails off quickly on the positive side, ie it is only slightly positively skewed.

[Diagrams: a slightly positively skewed distribution and a very positively skewed distribution]

Question 3.21

Find the third central moment of this data set:

0 2 3 3 4 4 4 5

Hence, comment on the skewness.

Note that it is unusual to calculate the third central sample moment to measure the skewness of a data set – simply using a diagram of the data to look at the shape of the data set is normally all that is required.
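For readers who want to check the two worked examples in this section, here is a minimal sketch of the kth central sample moment. The function name `central_moment` is illustrative, and the symmetrical data set is assumed to have frequencies 1, 2, 4, 2, 1 on the values 1 to 5 (which is what a mean of 30/10 implies).

```python
def central_moment(values, k):
    """kth order central sample moment: (1/n) * sum((x - mean)^k)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

# Data sets rebuilt from the frequencies used in the worked examples
symmetric = [1] + [2] * 2 + [3] * 4 + [4] * 2 + [5]
positive = [1] + [2] * 4 + [3] * 3 + [4] + [5]

print(central_moment(symmetric, 3))            # symmetrical data set
print(round(central_moment(positive, 3), 3))   # positively skew data set
```

The first value should be zero and the second should match the 0.816 found above.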



6 Transforming data sets

Suppose we have the following data set:

2 4 9

The mean is $\bar{x} = \frac{15}{3} = 5$, the range is $9 - 2 = 7$ and the standard deviation is:

$$s = \sqrt{\frac{(2-5)^2 + (4-5)^2 + (9-5)^2}{2}} = \sqrt{\frac{(-3)^2 + (-1)^2 + 4^2}{2}} = \sqrt{13}$$

Now if we add, say, 3 to each of the numbers in the data set what happens to the spread? Well the new data set is:

5 7 12

So the new range is $12 - 5 = 7$. The range is unchanged! Why should this be?

[Number line diagram: the original values 2, 4, 9 and the shifted values 5, 7, 12]

We can see that all the values have just shifted up the number line by the same amount and so the spread is unchanged.

This works for the standard deviation as well. The new mean is $\bar{x} = \frac{24}{3} = 8$ (it has shifted up by 3), so the new standard deviation is:

$$s = \sqrt{\frac{(5-8)^2 + (7-8)^2 + (12-8)^2}{2}} = \sqrt{\frac{(-3)^2 + (-1)^2 + 4^2}{2}} = \sqrt{13}$$

This is the same value as we had before, so the standard deviation is unchanged. The spread of the new values about the new mean is the same as before.



Now, what happens to the spread if we multiply each of the values by, say, 5? The new values are:

10 20 45

So the new range is $45 - 10 = 35$. The range has been multiplied by 5 (we have a range of 35 instead of 7). Why should this be?

[Number line diagram: the original values 2, 4, 9 and the scaled values 10, 20, 45]

We can see that the values have become 5 times as spread out as before.

This works for the standard deviation as well. The new mean is $\bar{x} = \frac{75}{3} = 25$ (it has been multiplied by 5), so the new standard deviation is:

$$s = \sqrt{\frac{(10-25)^2 + (20-25)^2 + (45-25)^2}{2}} = \sqrt{\frac{(-15)^2 + (-5)^2 + 20^2}{2}} = \sqrt{325} = 5\sqrt{13}$$

We can see that the standard deviation has been multiplied by 5 (from $\sqrt{13}$ to $5\sqrt{13}$).
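The two numerical examples in this section can be reproduced with Python's `statistics` module (whose `stdev` is the sample standard deviation with divisor n − 1). This is just a check of the particular numbers used above, not a proof.

```python
from statistics import mean, stdev

data = [2, 4, 9]
shifted = [x + 3 for x in data]   # add 3 to every value
scaled = [5 * x for x in data]    # multiply every value by 5

print(mean(data), stdev(data))        # mean 5, sd sqrt(13)
print(mean(shifted), stdev(shifted))  # mean moves up by 3, sd unchanged
print(mean(scaled), stdev(scaled))    # mean and sd both multiplied by 5
```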

Question 3.22

The standard deviation of n values is s. What would the new standard deviation be if we multiplied each of the values by a and then added b to them?

Now we have a rule for the standard deviation we can get a rule for the variance:

Question 3.23

Using your solution to Question 3.22 write down the general rule for the variance.

A proof of these general results can be found in Appendix B.



7 Comparing data sets

When comparing data sets it is usual to compare the location (using either the median or the mean) and the spread (using either the IQR or the standard deviation). Since calculating the skewness is rather longwinded, we only tend to comment on this if we have drawn a suitable diagram (such as a boxplot).

Let’s try an example. The Subject CT3 mock examination marks for students from two actuarial companies are as follows:

Company A   65 82 55 53 74 80 77 74 77 61
Company B   79 38 87 55 70 63 88 82 63 24

First let’s compare the location of the two companies using the mean:

Company A mean = 69.8
Company B mean = 64.9

So we can see that the students in Company A have performed better on average than those in Company B.

Next we’ll look at the spread of the results in each company using the standard deviation:

Company A sd = 10.5
Company B sd = 21.2

We can see that the results for students in Company B are much more spread out than those in Company A. This indicates that there is a greater diversity of students in Company B than Company A (ie those students in Company A perform fairly similarly whereas those in Company B range from great to poor).
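The summary figures for the two companies can be reproduced as follows. This is a small sketch using Python's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, stdev

company_a = [65, 82, 55, 53, 74, 80, 77, 74, 77, 61]
company_b = [79, 38, 87, 55, 70, 63, 88, 82, 63, 24]

for name, marks in (("Company A", company_a), ("Company B", company_b)):
    # location (mean) and spread (sample standard deviation), to 1 dp
    print(name, round(mean(marks), 1), round(stdev(marks), 1))
```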



Question 3.24

The diagrams below show the boxplots for two different distributions:

[Boxplots for Group A and Group B, drawn against a common axis running from 0 to 20]

Compare the location, spread and skewness of these two distributions.



8 Appendix A – the ‘average deviation’ is always zero

In Section 3.1 when we were trying to find the standard deviation, we were finding the average ‘deviation’ from the mean, but we got a result of zero. Why was the deviation zero?

Let’s assume that we have a sample of n values: $x_1, x_2, \ldots, x_n$

The mean of these n values is given by:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

We find the deviations using: $x_i - \bar{x}$

So to find the ‘average deviation’, we totalled up each of the deviations and divided by how many results we had:

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n}$$

Splitting up the summation gives:

$$\frac{\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}}{n}$$

Now, $\bar{x}$ doesn’t depend on i, so we get $\sum_{i=1}^{n} \bar{x} = \bar{x} + \bar{x} + \cdots + \bar{x} = n\bar{x}$:

$$\frac{\sum_{i=1}^{n} x_i - n\bar{x}}{n}$$

But wait! The definition of the sample mean, $\bar{x}$, is given by $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Rearranging this gives:

$$\sum_{i=1}^{n} x_i = n\bar{x}$$

So our average deviation formula becomes:

$$\frac{n\bar{x} - n\bar{x}}{n} = 0$$

So it is always zero. We must therefore find another way to find the ‘average deviation’ which stops the numbers above and below the mean from cancelling out. For the standard deviation we will square the deviations to remove the signs and then square root at the end to ‘undo’ it.
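A quick numerical illustration of this result is sketched below. It uses exact fractions so that rounding cannot hide a small non-zero total, and the data values are the ones from Question 3.21 (chosen only because their mean, 25/8, is not a whole number).

```python
from fractions import Fraction

# Exact arithmetic avoids floating-point rounding in the check
values = [Fraction(v) for v in (0, 2, 3, 3, 4, 4, 4, 5)]

mean = sum(values) / len(values)
total_deviation = sum(x - mean for x in values)

print(mean, total_deviation)   # the total of the deviations is exactly 0
```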



9 Appendix B – proof of the transforming data result

Consider a set of n values $x_1, x_2, \ldots, x_n$.

The mean and variance of these values are:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

$$s^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1} = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

Suppose we multiply each of the values by a and then add b: $ax_1 + b, ax_2 + b, \ldots, ax_n + b$

The mean, $\bar{y}$, and variance, $s_y^2$, of this new data set will be:

$$\begin{aligned}
\bar{y} &= \frac{(ax_1 + b) + (ax_2 + b) + \cdots + (ax_n + b)}{n} \\
&= \frac{a(x_1 + x_2 + \cdots + x_n) + nb}{n} \\
&= a\,\frac{\sum x_i}{n} + b = a\bar{x} + b
\end{aligned}$$

$$\begin{aligned}
s_y^2 &= \frac{[(ax_1 + b) - (a\bar{x} + b)]^2 + \cdots + [(ax_n + b) - (a\bar{x} + b)]^2}{n-1} \\
&= \frac{[a(x_1 - \bar{x})]^2 + \cdots + [a(x_n - \bar{x})]^2}{n-1} \\
&= \frac{a^2(x_1 - \bar{x})^2 + \cdots + a^2(x_n - \bar{x})^2}{n-1} \\
&= \frac{a^2 \sum (x_i - \bar{x})^2}{n-1} \\
&= a^2 s^2
\end{aligned}$$

So, the new standard deviation will be $s_y = as$.



We will meet this result again in Chapter 7 when we look at the expected variance of a theoretical population using random variables.



Extra Practice Questions

Section 2: Interquartile range

P3.1 Subject 101, April 2001, Q1

The following amounts are the sizes of claims (£) on house insurance policies for a certain type of repair.

198 221 215 209 224 210 223 215 203 210 220 200 208 212 216

Determine the lower quartile, median, upper quartile and interquartile range of these claim amounts. [2]

P3.2 Subject 101, September 2001, Q1 (part)

Data were collected on 100 consecutive days for the number of claims, x, arising from a group of policies. This resulted in the following frequency distribution:

x     0    1    2    3    4    ≥5
f    14   25   26   18   12     5

Calculate the interquartile range for these data. [1]



P3.3 Subject C1, September 1997, Q9 (part)

The table below shows a grouped frequency distribution for 100 claim amounts on a certain class of insurance policy.

Claim amount        Frequency
Below £100               4
£100 – 149.99           10
£150 – 199.99           25
£200 – 249.99           30
£250 – 299.99           15
£300 – 349.99           12
£350 – 399.99            4
£400 or over             0

Determine an approximate value for the interquartile range of these claim amounts. [3]

P3.4 Subject C1, April 1996, Q1 (adapted)

In a transport survey 100 passengers using a particular bus stop reported their waiting times to the nearest minute. The cumulative distribution of the resulting waiting times is given below:

Waiting time (mins)   Cumulative frequency
– 4                         7
– 5                        15
– 6                        27
– 7                        45
– 8                        70
– 9                        88
– 10                       97
– 15                      100

What is the interquartile range of these waiting times? [2]



P3.5 Subject 101, April 2003, Q2

A set of claim amounts (£) is given below:

192 136 253 138 87 112 221 176 336 203 159 55 308 165 254

Present these data graphically using a boxplot. [3]

Section 3: Standard deviation

P3.6 The number of yawns made by 5 students during a tutorial were:

3 8 0 2 4

Find the sample variance of the number of yawns.

P3.7 Subject 101, April 2003, Q1 (part)

Sickness and absence records were kept on 30 employees in a company over a 91-day period. These data are tabulated below:

Number of employees absent    0    1    2    3    4    5
Number of days               44   19   10    8    7    3

Calculate the sample standard deviation of the number of employees absent per day. [3]

P3.8 The table below shows the journey times (in minutes) of students to reach their place of work:

Time (mins), t     Number of students
0 ≤ t < 10                4
10 ≤ t < 20              13
20 ≤ t < 40              21
40 ≤ t < 60               9
60 ≤ t < 120              3

Calculate the standard deviation of these times.



P3.9 Subject C1, April 1996, Q9 Shortly before close of trading on a particular day an insurance office has sold 8 new policies. The sample mean and standard deviation of the sums assured have been calculated, in units of £1,000, as 31.5 and 37.2367, respectively. Another policy for £60,000 is then sold just before the close. Calculate the sample mean and standard deviation of the full set of 9 sums assured. [4]

P3.10 Subject 101, September 2000, Q2

Consider a random sample of 47 white-collar workers and a random sample of 24 blue-collar workers from the workforce of a large company. The mean salary for the sample of white-collar workers is £28,470 and the standard deviation is £4,270, whereas the mean salary for the sample of blue-collar workers is £21,420 and the standard deviation is £3,020. Calculate the mean and the standard deviation of the salaries in the combined sample of 71 employees. [4]

Section 6: Transforming data sets

P3.11 Subject C1, Specimen 1993, Q3 (adjusted) The marks (%) of a sample of 20 students from a large class in a recent examination had a sample mean 43 and a sample standard deviation 6. The marks were subsequently adjusted – each mark was multiplied by 1.3 and the result was then increased by 10. Calculate the sample standard deviation of the adjusted mark. [2]



Section 7: Comparing data sets

P3.12 Subject C1, September 1998, Q8

The following data are the rental rates per foot for boat storage at the various marinas in two different regions.

Region A   6.37 6.60 6.27 6.49 6.64 6.82 7.16 6.45 5.60 5.95 4.50 6.60 6.00 6.82 7.04 5.50 7.05 7.05 6.96
Region B   4.60 4.75 4.70 8.75 4.50 5.40 6.00 6.00 6.50 6.00 5.00 5.00 5.50 4.35 4.50 5.20 4.95

(i) Draw stem-and-leaf displays for these data. [2]

(ii) Determine the median rental rate and the interquartile range for each region and comment on the data display. [4]



Chapter 3 Summary

Spread

The spread (or dispersion) of a data set can be measured by the range, IQR or the standard deviation.

Range

The range of a set of data values $x_1, \ldots, x_n$ is defined as:

$$\text{Range} = \max_i\{x_i\} - \min_i\{x_i\}$$

IQR

The interquartile range is given by: $IQR = Q_3 - Q_1$

For ungrouped data:

$$Q_1 = \text{lower quartile} = \left(\tfrac{1}{4}n + \tfrac{1}{4}\right)\text{th value}$$

$$Q_3 = \text{upper quartile} = \left(\tfrac{3}{4}n + \tfrac{3}{4}\right)\text{th value}$$

For grouped data we use $\tfrac{1}{4}n$ and $\tfrac{3}{4}n$ to give the positions of the quartiles.

If the quartile lies between two values we use interpolation to estimate its value. Similarly, we use interpolation when estimating the quartile within a group.

Standard deviation

The sample variance, $s^2$, measures the spread squared and is given by:

$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$$

The sample standard deviation, s, is the square root of the sample variance and measures the spread of the set of data.

To calculate the variance from a frequency distribution we use:

$$s^2 = \frac{1}{n-1}\sum f(x - \bar{x})^2 = \frac{1}{n-1}\left\{\sum fx^2 - n\bar{x}^2\right\}$$

For a grouped frequency distribution, we use the midpoint for the x values.

Central moments

The kth order central sample moment is given by:

$$\frac{1}{n}\sum (x_i - \bar{x})^k$$

The sample variance is the second order central moment (except we divide by $n - 1$).

Skewness

The third order central moment is used to measure the skewness.

A positive value indicates that the data is positively skewed. Similarly a negative value indicates that the data is negatively skewed, whereas a value of zero indicates that the data is symmetrical.

Transforming data

Given a sample standard deviation of s, if we multiply each of the sample values by a and then add b, the standard deviation of these new values is:

$$as$$

The new sample variance would be:

$$a^2 s^2$$




Solution 3.1

range = 10 − 0 = 10

Solution 3.2

range = £50,000 − £12,000 = £38,000

Solution 3.3

(i) She has confused the data values with the frequencies. The range is the difference between the biggest and smallest values, not the biggest and smallest frequencies.

(ii) range = 8 − 4 = 4 reviews

Solution 3.4

First, we put the sums assured in order:

15 20 25 25 30 50 50 60 75 85 100 100 120 125 150

There are 15 values, so $n = 15$.

The lower quartile is the $\left(\tfrac{1}{4} \times 15 + \tfrac{1}{4}\right) = 4$th value, which is £25,000.

The upper quartile is the $\left(\tfrac{3}{4} \times 15 + \tfrac{3}{4}\right) = 12$th value, which is £100,000.

Note how the median and the quartiles split the data up into 4 equal parts:

15 20 25 25 30 50 50 60 75 85 100 100 120 125 150

Here $Q_1 = 25$ is the 4th value, $M = 60$ is the 8th value and $Q_3 = 100$ is the 12th value.



Solution 3.5

(i) Since the numbers are already in order we can just apply the rule straightaway.

There are nine values, so $n = 9$.

The lower quartile is the $\left(\tfrac{1}{4} \times 9 + \tfrac{1}{4}\right) = 2\tfrac{1}{2}$th value. This is halfway between the 2nd value (3) and the 3rd value (3), which is 3. Alternatively, we could have used linear interpolation:

$$Q_1 = 3 + \tfrac{1}{2} \times (3 - 3) = 3$$

The upper quartile is the $\left(\tfrac{3}{4} \times 9 + \tfrac{3}{4}\right) = 7\tfrac{1}{2}$th value. This is halfway between the 7th value (7) and the 8th value (9), which is 8. If this is not clear, we could have used linear interpolation:

$$Q_3 = 7 + \tfrac{1}{2} \times (9 - 7) = 8$$

(ii) First, we need to rearrange these numbers into order:

1, 1, 4, 5, 8, 8, 14, 15

We have $n = 8$.

The lower quartile is the $\left(\tfrac{1}{4} \times 8 + \tfrac{1}{4}\right) = 2\tfrac{1}{4}$th value. This is $\tfrac{1}{4}$ of the way between the 2nd value (1) and the 3rd value (4). Using linear interpolation:

$$Q_1 = 1 + \tfrac{1}{4} \times (4 - 1) = 1\tfrac{3}{4}$$

The upper quartile is the $\left(\tfrac{3}{4} \times 8 + \tfrac{3}{4}\right) = 6\tfrac{3}{4}$th value. This is $\tfrac{3}{4}$ of the way between the 6th value (8) and the 7th value (14). Using linear interpolation:

$$Q_3 = 8 + \tfrac{3}{4} \times (14 - 8) = 12\tfrac{1}{2}$$



Solution 3.6

(i) Since the numbers are already in order we can just apply the rule straightaway.

We have $n = 5$.

The lower quartile is the $\left(\tfrac{1}{4} \times 5 + \tfrac{1}{4}\right) = 1\tfrac{1}{2}$th value. This is halfway between the 1st value (0) and the 2nd value (2), which is 1. If this is not clear, we could have used linear interpolation:

$$Q_1 = 0 + \tfrac{1}{2} \times (2 - 0) = 1$$

The upper quartile is the $\left(\tfrac{3}{4} \times 5 + \tfrac{3}{4}\right) = 4\tfrac{1}{2}$th value. This is halfway between the 4th value (9) and the 5th value (10), which is 9½. If this is not clear, we could have used linear interpolation:

$$Q_3 = 9 + \tfrac{1}{2} \times (10 - 9) = 9\tfrac{1}{2}$$

Hence, the interquartile range is:

$$IQR = Q_3 - Q_1 = 9\tfrac{1}{2} - 1 = 8\tfrac{1}{2}$$

(ii) Don’t get caught out! First, we need to rearrange these numbers into order:

£20, £20, £30, £50, £60, £70, £90, £110, £125, £150

We have $n = 10$.

The lower quartile is the $\left(\tfrac{1}{4} \times 10 + \tfrac{1}{4}\right) = 2\tfrac{3}{4}$th value. This is $\tfrac{3}{4}$ of the way between the 2nd value (£20) and the 3rd value (£30). Using linear interpolation:

$$Q_1 = 20 + \tfrac{3}{4} \times (30 - 20) = \pounds 27.50$$

The upper quartile is the $\left(\tfrac{3}{4} \times 10 + \tfrac{3}{4}\right) = 8\tfrac{1}{4}$th value. This is $\tfrac{1}{4}$ of the way between the 8th value (£110) and the 9th value (£125). Using linear interpolation:

$$Q_3 = 110 + \tfrac{1}{4} \times (125 - 110) = \pounds 113.75$$

Hence, the interquartile range is given by:

$$IQR = Q_3 - Q_1 = \pounds 113.75 - \pounds 27.50 = \pounds 86.25$$
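The positional rule used in Solutions 3.4 to 3.6 can be sketched as a small function. The name `quartile` and its arguments are illustrative only. Note that statistical software often uses slightly different quartile conventions, so its answers may not match this rule exactly.

```python
def quartile(values, q):
    """Quartile q (1 or 3) using the (q/4)(n + 1)th value with interpolation."""
    xs = sorted(values)
    n = len(xs)
    pos = q * (n + 1) / 4      # 1-based position, eg 2.75 means the 2 3/4 th value
    whole = int(pos)           # the value just below the position
    frac = pos - whole         # how far to move towards the next value
    if whole < 1:              # position falls before the first value
        return xs[0]
    if whole >= n:             # position falls at or beyond the last value
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

amounts = [20, 20, 30, 50, 60, 70, 90, 110, 125, 150]   # Solution 3.6(ii)
q1, q3 = quartile(amounts, 1), quartile(amounts, 3)
print(q1, q3, q3 - q1)
```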



Solution 3.7

(i) There are:

$$n = \sum f = 5 + 7 + 15 + 12 + 9 + 4 = 52 \text{ weeks}$$

The lower quartile is the $\left(\tfrac{1}{4} \times 52 + \tfrac{1}{4}\right) = 13\tfrac{1}{4}$th value. Counting through the frequencies:

5 values so far
5 + 7 = 12 values in total
5 + 7 + 15 = 27 values now in total, so the 13¼th value is in here!

We can see that the 13¼th value (the lower quartile) is a 2.

The upper quartile is the $\left(\tfrac{3}{4} \times 52 + \tfrac{3}{4}\right) = 39\tfrac{3}{4}$th value. Counting through:

5 values so far
5 + 7 = 12 values in total
5 + 7 + 15 = 27 values in total
5 + 7 + 15 + 12 = 39 values in total

We can see that the 39th value is a 3 and the 40th value is a 4. Therefore the 39¾th value (the upper quartile) is:

$$Q_3 = 3 + \tfrac{3}{4} \times (4 - 3) = 3\tfrac{3}{4}$$

(ii) $IQR = Q_3 - Q_1 = 3\tfrac{3}{4} - 2 = 1\tfrac{3}{4}$



Solution 3.8

Now there are $n = \sum f = 4 + 6 + 11 + 7 + 2 = 30$ actuaries. The lower quartile is the $\tfrac{1}{4} \times 30 = 7\tfrac{1}{2}$th value. Counting through the frequencies:

Height, h          Frequency   Running total
150 ≤ h < 160          4        4
160 ≤ h < 170          6        4 + 6 = 10
170 ≤ h < 175         11
175 ≤ h < 180          7
180 ≤ h < 195          2

We can see that the 7½th value is somewhere in the 160 ≤ h < 170 group. We counted 4 values before we got to the 160 ≤ h < 170 group. So the lower quartile is the $7\tfrac{1}{2} - 4 = 3\tfrac{1}{2}$th value in this group out of the group’s 6 values altogether.

$$Q_1 = 160 + \frac{3\tfrac{1}{2}}{6} \times (170 - 160) = 165.83 \text{ cm}$$

Now the upper quartile is the $\tfrac{3}{4} \times 30 = 22\tfrac{1}{2}$th value. So counting through the frequencies:

Height, h          Frequency   Running total
150 ≤ h < 160          4        4
160 ≤ h < 170          6        4 + 6 = 10
170 ≤ h < 175         11        4 + 6 + 11 = 21
175 ≤ h < 180          7        4 + 6 + 11 + 7 = 28
180 ≤ h < 195          2

We can see that the 22½th value is somewhere in the 175 ≤ h < 180 group. We counted 21 values before we got to the 175 ≤ h < 180 group. So the upper quartile is the $22\tfrac{1}{2} - 21 = 1\tfrac{1}{2}$th value in this group out of the group’s 7 values altogether.

$$Q_3 = 175 + \frac{1\tfrac{1}{2}}{7} \times (180 - 175) = 176.07 \text{ cm}$$

$$IQR = Q_3 - Q_1 = 176.07 - 165.83 = 10.24 \text{ cm}$$



Solution 3.9

Since the weights are rounded to the nearest kg, the 53–56 group ranges from 52.5 kg to (just below) 56.5 kg.

Now there are $n = \sum f = 1 + 2 + 5 + 10 + 6 + 4 + 2 = 30$ actuarial students. The lower quartile is the $\tfrac{1}{4} \times 30 = 7\tfrac{1}{2}$th value. Counting through the frequencies:

Weight      Frequency   Running total
53 – 56         1
57 – 60         2        1 + 2 = 3
61 – 64         5        1 + 2 + 5 = 8
65 – 68        10
69 – 72         6
73 – 76         4
77 – 88         2

We can see that the 7½th value is somewhere in the 61–64 group (ie 60.5–64.5). We counted 3 values before we got to the 61–64 group. The lower quartile is the $7\tfrac{1}{2} - 3 = 4\tfrac{1}{2}$th value in this group out of the group’s 5 values altogether. So:

$$Q_1 = 60.5 + \frac{4\tfrac{1}{2}}{5} \times (64.5 - 60.5) = 64.1 \text{ kg}$$

Now the upper quartile is the $\tfrac{3}{4} \times 30 = 22\tfrac{1}{2}$th value. So counting through the frequencies:

Weight      Frequency   Running total
53 – 56         1
57 – 60         2        1 + 2 = 3
61 – 64         5        1 + 2 + 5 = 8
65 – 68        10        1 + 2 + 5 + 10 = 18
69 – 72         6        1 + 2 + 5 + 10 + 6 = 24, so the 22½th value is in here!
73 – 76         4
77 – 88         2

We can see that the 22½th value is somewhere in the 69–72 group (ie 68.5–72.5). We counted 18 values before we got to the 69–72 group. So the upper quartile is the $22\tfrac{1}{2} - 18 = 4\tfrac{1}{2}$th value in this group out of the group’s 6 values altogether.

$$Q_3 = 68.5 + \frac{4\tfrac{1}{2}}{6} \times (72.5 - 68.5) = 71.5 \text{ kg}$$

$$\Rightarrow IQR = Q_3 - Q_1 = 71.5 - 64.1 = 7.4 \text{ kg}$$



Solution 3.10

Now there are 40 phone calls, so the lower quartile is the $\tfrac{1}{4} \times 40 = 10$th value.

Time, t     Cumulative frequency
t < 0.5            4
t < 1             11      (5th – 11th values in here)
t < 2             20
t < 5             34      (21st – 34th values in here)
t < 10            40

We can see that the 10th value is in the t < 1 group. To estimate its value we need to think about the original frequency distribution. The t < 1 group relates to the 0.5 ≤ t < 1 group, which has 11 − 4 = 7 values in it. The lower quartile is the 10 − 4 = 6th value in this group out of its 7 values altogether. Using linear interpolation:

$$Q_1 = 0.5 + \tfrac{6}{7} \times (1 - 0.5) = 0.929 \text{ mins}$$

The upper quartile is the $\tfrac{3}{4} \times 40 = 30$th value. We can see that this is in the t < 5 group. Now the t < 5 group relates to the 2 ≤ t < 5 group in the original frequency distribution. It has 34 − 20 = 14 values in it. The upper quartile is the 30 − 20 = 10th value in this group out of its 14 values altogether. Using linear interpolation:

$$Q_3 = 2 + \tfrac{10}{14} \times (5 - 2) = 4.143 \text{ mins}$$

Hence, the interquartile range is:

$$IQR = Q_3 - Q_1 = 4.143 - 0.929 = 3.21 \text{ mins}$$
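The interpolation within a group used in Solution 3.10 can be sketched in code. The function `grouped_quantile` and its arguments are illustrative names, and the lowest group is assumed to start at 0 minutes (the cumulative table only gives "t < 0.5").

```python
def grouped_quantile(bounds, cum_freq, position):
    """Estimate the value at a given position by interpolating within its group.

    bounds   -- group boundaries, lowest first (one more entry than cum_freq)
    cum_freq -- cumulative frequency at each upper boundary
    position -- which value we want, eg n/4 for the lower quartile
    """
    previous = 0
    for lower, upper, cumulative in zip(bounds, bounds[1:], cum_freq):
        if position <= cumulative:
            in_group = cumulative - previous          # values in this group
            share = (position - previous) / in_group  # how far through the group
            return lower + share * (upper - lower)
        previous = cumulative
    raise ValueError("position is beyond the total frequency")

bounds = [0, 0.5, 1, 2, 5, 10]     # 0 is an assumed lowest boundary
cum_freq = [4, 11, 20, 34, 40]
n = cum_freq[-1]

q1 = grouped_quantile(bounds, cum_freq, n / 4)
q3 = grouped_quantile(bounds, cum_freq, 3 * n / 4)
print(round(q1, 3), round(q3, 3), round(q3 - q1, 2))
```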



Solution 3.11

Recall from Question 3.4 that:

15 20 25 25 30 50 50 60 75 85 100 100 120 125 150

with $Q_1 = 25$, $M = 60$ and $Q_3 = 100$.

The boxplot for this set of data would be:

[Boxplot: whiskers from 15 to 150, box from 25 to 100, median line at 60]

In an exam, you would be expected to draw the boxplot accurately on graph paper.

Solution 3.12

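First, we need to find the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{0 + 1 + 3 + 8 + 8 + 10}{6} = 5$$

Using the formula we get:

$$\begin{aligned}
s &= \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{(0-5)^2 + (1-5)^2 + (3-5)^2 + (8-5)^2 + (8-5)^2 + (10-5)^2}{6}} \\
&= \sqrt{\frac{(-5)^2 + (-4)^2 + (-2)^2 + 3^2 + 3^2 + 5^2}{6}} \\
&= \sqrt{\frac{88}{6}} \\
&= 3.8297
\end{aligned}$$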



Solution 3.13

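First we find the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{124 + 56 + 78 + 92 + 230}{5} = \pounds 116$$

Using the formula for the sample standard deviation, we get:

$$\begin{aligned}
s &= \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \\
&= \sqrt{\frac{(124-116)^2 + (56-116)^2 + (78-116)^2 + (92-116)^2 + (230-116)^2}{4}} \\
&= \sqrt{\frac{8^2 + (-60)^2 + (-38)^2 + (-24)^2 + 114^2}{4}} \\
&= \sqrt{\frac{18{,}680}{4}} \\
&= \pounds 68.34
\end{aligned}$$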



Solution 3.14

The mean is:

$$\bar{x} = \frac{\sum x}{n} = \frac{124 + 56 + 78 + 92 + 230}{5} = \pounds 116$$

We also require the sum of squares:

$$\sum x^2 = 124^2 + 56^2 + 78^2 + 92^2 + 230^2 = 85{,}960$$

Substituting this into the alternative formula:

$$s^2 = \frac{85{,}960 - 5 \times 116^2}{4} = 4{,}670$$

Hence, the sample standard deviation is:

$$s = \sqrt{4{,}670} = \pounds 68.34$$

Solution 3.15

Using the ‘shortcut method’, the sum of squares is:

$$\sum x^2 = (3 \times 2^2) + (5 \times 4^2) + (6 \times 6^2) + (3 \times 8^2) + (4 \times 10^2) + (2 \times 12^2) + 14^2 = 1{,}384$$



Solution 3.16

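To calculate the mean, $\bar{x} = \frac{\sum x}{n}$, we need the sum:

$$\sum x = (74 \times 0) + (19 \times 1) + (5 \times 2) + (2 \times 3) = 35$$

We also need the total number of values, n. This is:

$$n = \sum f = 74 + 19 + 5 + 2 = 100$$

Hence:

$$\bar{x} = \frac{\sum x}{n} = \frac{35}{100} = 0.35$$

Now, to calculate the variance, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$, we need the sum of squares:

$$\sum x^2 = (74 \times 0^2) + (19 \times 1^2) + (5 \times 2^2) + (2 \times 3^2) = 57$$

Substituting:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} = \frac{1}{99}\left\{57 - 100 \times 0.35^2\right\} = 0.45202$$

Therefore, the standard deviation is given by:

$$s = \sqrt{0.45202} = 0.67232$$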



Solution 3.17

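The midpoints of the groups are:

155 165 172.5 177.5 187.5

To calculate the mean, $\bar{x} = \frac{\sum x}{n}$, we need the sum:

$$\sum x = (4 \times 155) + (6 \times 165) + (11 \times 172.5) + (7 \times 177.5) + (2 \times 187.5) = 5{,}125 \text{ cm}$$

We also need the total number of values, n. This is:

$$n = \sum f = 4 + 6 + 11 + 7 + 2 = 30$$

Hence:

$$\bar{x} = \frac{\sum x}{n} = \frac{5{,}125}{30} = 170.83 \text{ cm}$$

Since this is a grotty number it is worth storing this value on your calculator.

Now, to calculate the variance, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$, we need the sum of squares:

$$\sum x^2 = (4 \times 155^2) + (6 \times 165^2) + (11 \times 172.5^2) + (7 \times 177.5^2) + (2 \times 187.5^2) = 877{,}625$$

Substituting:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} = \frac{1}{29}\left\{877{,}625 - 30 \times 170.83^2\right\} = 72.557$$

Hence:

$$s = \sqrt{72.557} = 8.518 \text{ cm}$$

Had we not stored the full value of the mean (and used 170.83), we would have got a final answer of 8.587.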



Solution 3.18

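The 40–49 group ranges from 40 to (just below) 50, so the midpoint is 45 years. Similarly, the midpoints for the remaining groups are:

55 65 75 85 95 105

To calculate the mean, $\bar{x} = \frac{\sum x}{n}$, we need the sum:

$$\sum x = 45 + (2 \times 55) + (5 \times 65) + (10 \times 75) + (8 \times 85) + (3 \times 95) + 105 = 2{,}300$$

We are told that $n = 30$. Hence:

$$\bar{x} = \frac{\sum x}{n} = \frac{2{,}300}{30} = 76.\dot{6} \text{ years}$$

Since this is a grotty number it is worth storing this value on your calculator.

Now, to calculate the variance, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$, we need the sum of squares:

$$\sum x^2 = 45^2 + (2 \times 55^2) + (5 \times 65^2) + (10 \times 75^2) + (8 \times 85^2) + (3 \times 95^2) + 105^2 = 181{,}350$$

Substituting:

$$s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} = \frac{1}{29}\left\{181{,}350 - 30 \times 76.\dot{6}^2\right\} = 172.99$$

Hence:

$$s = \sqrt{172.99} = 13.15 \text{ years}$$

Had we not stored the mean, we would have got a final answer of 12.95 (if we used a mean of 76.7) and 13.13 (if we used a mean of 76.67).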



Solution 3.19

(i) We are given $n = 12$, $\bar{x} = 78$ and $s = 4$. Substituting these values into $s^2 = \frac{1}{n-1}\left\{\sum x_i^2 - n\bar{x}^2\right\}$ gives:

$$4^2 = \frac{1}{11}\left\{\sum x^2 - 12 \times 78^2\right\} \;\Rightarrow\; \sum x^2 = 11 \times 4^2 + 12 \times 78^2 = 73{,}184$$

(ii) We are given $n = 6$, $\bar{x} = 19$ and $s = 5$. This gives:

$$5^2 = \frac{1}{5}\left\{\sum x^2 - 6 \times 19^2\right\} \;\Rightarrow\; \sum x^2 = 5 \times 5^2 + 6 \times 19^2 = 2{,}291$$

Now we need to include the new value of 16ºC. This gives a new sum of squares of:

$$\sum x^2 = 2{,}291 + 16^2 = 2{,}547$$

We also need to calculate the new mean of the 7 days like we did in Chapter 2. For the 6 days we had $\sum x = 6 \times 19 = 114$, so for the 7 days we have:

$$\sum x = 114 + 16 = 130 \;\Rightarrow\; \bar{x} = \frac{130}{7}$$

The variance for all 7 days is:

$$s^2 = \frac{1}{6}\left\{2{,}547 - 7 \times \left(\frac{130}{7}\right)^2\right\} = 22.119$$

Therefore the standard deviation for all 7 days is:

$$s = \sqrt{22.119} = 4.70$$

It is important to store the full value of the mean on your calculator (rather than rounding it to 18.57), otherwise your final answer will be 4.71.

(iii) Let x be the men and y be the women. This gives:

$$\sum x = 10 \times 72 = 720$$

$$\sum y = 8 \times 78 = 624$$

So the grand total of their ages is $720 + 624 = 1{,}344$. This gives a mean of all 18 policyholders to be:

$$\frac{1{,}344}{18} = 74\tfrac{2}{3} \text{ years}$$

Similarly:

$$s_x^2 = 7^2 = \frac{1}{9}\left\{\sum x^2 - 10 \times 72^2\right\} \;\Rightarrow\; \sum x^2 = 9 \times 7^2 + 10 \times 72^2 = 52{,}281$$

$$s_y^2 = 9^2 = \frac{1}{7}\left\{\sum y^2 - 8 \times 78^2\right\} \;\Rightarrow\; \sum y^2 = 7 \times 9^2 + 8 \times 78^2 = 49{,}239$$

So the grand total sum of squares is $52{,}281 + 49{,}239 = 101{,}520$. Using this and our new mean of $74\tfrac{2}{3}$ years we get the variance of all 18 policyholders to be:

$$\frac{1}{17}\left\{101{,}520 - 18 \times \left(74\tfrac{2}{3}\right)^2\right\} = 68.706$$

Therefore the standard deviation of all 18 policyholders is:

$$\sqrt{68.706} = 8.29 \text{ years}$$

Note that the standard deviation of the two combined groups need not be a value between their individual standard deviations. For example, both groups could have very small standard deviations but have very different means. The combined group would therefore be very spread out.
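The method used in parts (ii) and (iii) follows the same pattern each time: recover $\sum x$ and $\sum x^2$ from each summary, add them up, then reapply the variance formula. A minimal sketch of that pattern is given below; the function names `sums_from_summary` and `combine` are made up for illustration.

```python
from math import sqrt

def sums_from_summary(n, mean, sd):
    """Recover (sum of x, sum of x^2) from n, the mean and the sample sd."""
    return n * mean, (n - 1) * sd ** 2 + n * mean ** 2

def combine(*summaries):
    """Pool data sets given as (n, sum_x, sum_x2); return (n, mean, sd)."""
    n = sum(s[0] for s in summaries)
    sum_x = sum(s[1] for s in summaries)
    sum_x2 = sum(s[2] for s in summaries)
    mean = sum_x / n
    return n, mean, sqrt((sum_x2 - n * mean ** 2) / (n - 1))

# Part (ii): six days summarised, plus today's single value of 16
six_days = (6, *sums_from_summary(6, 19, 5))
today = (1, 16, 16 ** 2)
n7, mean7, sd7 = combine(six_days, today)

# Part (iii): 10 men and 8 women
men = (10, *sums_from_summary(10, 72, 7))
women = (8, *sums_from_summary(8, 78, 9))
n18, mean18, sd18 = combine(men, women)

print(round(mean7, 2), round(sd7, 2))
print(round(mean18, 2), round(sd18, 2))
```

Because the full value of the mean is kept throughout, the rounding issue mentioned in part (ii) does not arise.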



Solution 3.20

The third order central moment is given by:

$$\frac{1}{n}\sum (x_i - \bar{x})^3$$

So first we need to find the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{18 + 21 + 24 + (3 \times 25) + 30}{7} = \frac{168}{7} = 24$$

So the mean salary is £24,000. Substituting this into our formula gives:

$$\begin{aligned}
&\tfrac{1}{7}\left\{(18-24)^3 + (21-24)^3 + (24-24)^3 + 3 \times (25-24)^3 + (30-24)^3\right\} \\
&= -3\tfrac{3}{7}
\end{aligned}$$

Since we were working in £1,000 units, our third central moment will be measured in $(\pounds 1{,}000)^3$ units. Hence, the third central moment is:

$$-3\tfrac{3}{7} \times (\pounds 1{,}000)^3 = -3.428571 \times 10^9$$

Given the strangeness of swapping units, it may be easier to work with the ‘correct’ figures throughout, ie we would use £18,000, £21,000, etc.



Solution 3.21

The dotplot for these data values is:

[Dotplot: negatively skew data set on the values 0 to 5]

We can see that it is negatively skewed, so we would expect a negative answer.

The mean is $\bar{x} = \frac{25}{8} = 3.125$ and the third central moment is:

$$\begin{aligned}
&\tfrac{1}{8}\left[(0-3.125)^3 + (2-3.125)^3 + 2 \times (3-3.125)^3 + 3 \times (4-3.125)^3 + (5-3.125)^3\right] \\
&= \tfrac{1}{8}\left[(-3.125)^3 + (-1.125)^3 + 2 \times (-0.125)^3 + 3 \times 0.875^3 + 1.875^3\right] \\
&= -2.92
\end{aligned}$$

This gives a negative value for this negatively skewed distribution, as required.

Note that we are unable to compare directly the ‘amount of skewness’ between this and our positively skewed example. In Chapter 7 we will meet a measure which will allow us to do such comparisons.

Solution 3.22

The new standard deviation would be $as$. See Appendix B for a proof of this result.

Solution 3.23

If $\text{var}(X) = s^2$, then $\text{var}(aX + b) = a^2 s^2$. Adding b shifts every value by the same amount, so it has no effect on the variance.



Solution 3.24

The median of Group A is 8 whereas the median of Group B is 7. Therefore on average the values in Group A are higher than those in Group B.

The IQR of Group A is smaller than that of Group B, so the spread of Group A is smaller than that of Group B.

Looking at the whole boxplot, we see that Group A is roughly symmetrical whereas Group B is positively skewed (as most of the data values are to the right of the median).




P3.1 First, we place the claims in order: 198 200 203 208 209 210 210 212 215 215 216 220 221 223 224 There are 15 claims, so the median is the ½ 15 ½ 8th¥ + = value, which is £212. The lower quartile is the ¼ 15 ¼ 4th¥ + = value, which is £208. The upper quartile is the ¾ 15 ¾ 12th¥ + = value, which is £220. Hence, the interquartile range is: £220 £208 £12IQR = - =

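P3.2 There are 100 values, so the lower quartile is the $\tfrac{1}{4} \times 100 + \tfrac{1}{4} = 25\tfrac{1}{4}$th value. Counting through the frequencies:

x                       0    1    2    3    4   ≥5
f                      14   25   26   18   12    5
cumulative frequency   14   39   65   83   95  100

The first 14 values are 0 and the first 39 values are 0 or 1, so the $25\tfrac{1}{4}$th value lies among the 1s. We can see that the lower quartile is 1.

The upper quartile is the $\tfrac{3}{4} \times 100 + \tfrac{3}{4} = 75\tfrac{3}{4}$th value. The first 65 values are 2 or less and the first 83 values are 3 or less, so the $75\tfrac{3}{4}$th value lies among the 3s. We can see that the upper quartile is 3. Hence, the interquartile range is:

$$IQR = 3 - 1 = 2$$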
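If you would like to check this sort of counting on a computer, the following is a minimal Python sketch. It assumes that the final class (5 or more) can be treated as the single value 5, which makes no difference to the quartiles here. The helper name `value_at_position` is our own invention for illustration.

```python
from itertools import accumulate

# Frequency table from P3.2 (assumption: the "5 or more" class is treated as 5).
values = [0, 1, 2, 3, 4, 5]
freqs = [14, 25, 26, 18, 12, 5]

def value_at_position(pos, values, freqs):
    """Value at a (possibly fractional) 1-based position in the ordered data."""
    data = [v for v, f in zip(values, freqs) for _ in range(f)]
    lower = int(pos)                 # whole part of the position
    frac = pos - lower               # fractional part, used to interpolate
    below = data[lower - 1]
    above = data[min(lower, len(data) - 1)]
    return below + frac * (above - below)

n = sum(freqs)
cumulative = list(accumulate(freqs))                      # running totals
q1 = value_at_position(0.25 * n + 0.25, values, freqs)    # 25.25th value
q3 = value_at_position(0.75 * n + 0.75, values, freqs)    # 75.75th value
print(cumulative, q1, q3, q3 - q1)
```

Expanding the table into a full list is wasteful for large data sets, but it keeps the link with "counting through the frequencies" easy to see.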



P3.3 There are 100 values, so the lower quartile is the $\tfrac{1}{4} \times 100 = 25$th value. So counting through the frequencies:

Group              Frequency   Cumulative frequency
Below £100                 4                      4
£100 – 149.99             10                     14
£150 – 199.99             25                     39
£200 – 249.99             30                     69
£250 – 299.99             15                     84
£300 – 349.99             12                     96
£350 – 399.99              4                    100
£400 or over               0                    100

The first 14 values lie below £150, so the 25th value is the 11th value in the £150 – £199.99 group. Using interpolation, we get the lower quartile to be:

$$150 + \tfrac{11}{25} \times 49.99 = \pounds 172.00$$

The upper quartile is the $\tfrac{3}{4} \times 100 = 75$th value. The first 69 values lie below £250, so the 75th value is the 6th value in the £250 – £299.99 group. Using interpolation, we get the upper quartile to be:

$$250 + \tfrac{6}{15} \times 49.99 = \pounds 270.00$$

Hence, the interquartile range is:

$$IQR = \pounds 270 - \pounds 172 = \pounds 98$$
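The interpolation used in P3.3 can be written as a one-line function. This is only a sketch: the function name `interpolate_in_group` and its argument names are hypothetical, and it assumes (as the solution does) that values are spread evenly across each group.

```python
def interpolate_in_group(lower_bound, width, position_in_group, group_frequency):
    """Linear interpolation: lower bound + (position / frequency) x group width."""
    return lower_bound + position_in_group / group_frequency * width

# Lower quartile: the 25th value is the (25 - 14) = 11th of the 25 values
# in the group starting at 150.
q1 = interpolate_in_group(150, 49.99, 25 - 14, 25)

# Upper quartile: the 75th value is the (75 - 69) = 6th of the 15 values
# in the group starting at 250.
q3 = interpolate_in_group(250, 49.99, 75 - 69, 15)

print(round(q1, 2), round(q3, 2), round(q3 - q1))
```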



P3.4 Now there are 100 passengers, so the lower quartile is the $\tfrac{1}{4} \times 100 = 25$th value.

Waiting time (mins)   Cumulative frequency
– 4                     7
– 5                    15
– 6                    27   (16th – 27th values in here)
– 7                    45
– 8                    70
– 9                    88   (71st – 88th values in here)
– 10                   97
– 15                  100

We can see that the 25th value is in the – 6 group. To estimate its value we need to think about the original frequency distribution. Since the waiting times are rounded to the nearest minute, the – 6 group relates to the $5.5 \le t < 6.5$ group, which has $27 - 15 = 12$ values in it. The lower quartile is the $25 - 15 = 10$th value in this group out of its 12 values altogether. Using linear interpolation:

$$Q_1 = 5.5 + \tfrac{10}{12} \times (6.5 - 5.5) = 6.33 \text{ mins}$$

The upper quartile is the $\tfrac{3}{4} \times 100 = 75$th value. We can see that this is in the – 9 group.

Since the waiting times are rounded to the nearest minute, the – 9 group relates to the $8.5 \le t < 9.5$ group in the original frequency distribution. It has $88 - 70 = 18$ values in it. The upper quartile is the $75 - 70 = 5$th value in this group out of its 18 values altogether. Using linear interpolation:

$$Q_3 = 8.5 + \tfrac{5}{18} \times (9.5 - 8.5) = 8.78 \text{ mins}$$

Hence, working with the unrounded quartiles, the interquartile range is:

$$IQR = Q_3 - Q_1 = 2.44 \text{ mins}$$



P3.5 First, we place the claims in order:

55 87 112 136 138 159 165 176 192 203 221 253 254 308 336

Now to draw a boxplot we need to calculate the median and the quartiles. There are 15 claims, so the median is the $\tfrac{1}{2} \times 15 + \tfrac{1}{2} = 8$th value, which is £176. The lower quartile is the $\tfrac{1}{4} \times 15 + \tfrac{1}{4} = 4$th value, which is £136. The upper quartile is the $\tfrac{3}{4} \times 15 + \tfrac{3}{4} = 12$th value, which is £253. Therefore the boxplot is:

[Boxplot: minimum 55, lower quartile 136, median 176, upper quartile 253, maximum 336]

In an exam you would have been expected to draw the boxplot accurately using graph paper.

P3.6 First we calculate the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{3 + 8 + 0 + 2 + 4}{5} = 3.4$$

The formula for the sample variance is:

$$s^2 = \frac{1}{n-1}\left[\sum x^2 - n\bar{x}^2\right]$$

So we need $\sum x^2$:

$$\sum x^2 = 3^2 + 8^2 + 0^2 + 2^2 + 4^2 = 93$$

This gives:

$$s^2 = \tfrac{1}{4}\left[93 - 5 \times 3.4^2\right] = 8.8$$
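As a quick check of the P3.6 arithmetic, here is a short Python sketch. It applies the sample variance formula by hand and compares the answer with `statistics.variance` from the Python standard library, which also divides by n − 1.

```python
import statistics

data = [3, 8, 0, 2, 4]
n = len(data)

mean = sum(data) / n                       # x-bar
sum_sq = sum(x * x for x in data)          # sum of x squared
s2 = (sum_sq - n * mean ** 2) / (n - 1)    # sample variance formula

print(mean, sum_sq, round(s2, 4))
print(statistics.variance(data))           # library result, for comparison
```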



P3.7 The sample mean is given by:

$$\bar{x} = \frac{\sum x}{n} = \frac{(44\times0)+(19\times1)+(10\times2)+(8\times3)+(7\times4)+(3\times5)}{44+19+10+8+7+3} = \frac{106}{91} = 1.1648$$

The formula for the sample variance is:

$$s^2 = \frac{1}{n-1}\left[\sum x^2 - n\bar{x}^2\right]$$

So we need $\sum x^2$:

$$\sum x^2 = (44\times0^2)+(19\times1^2)+(10\times2^2)+(8\times3^2)+(7\times4^2)+(3\times5^2) = 318$$

This gives:

$$s^2 = \tfrac{1}{90}\left[318 - 91 \times 1.1648^2\right] = 2.1614$$

Now don’t get caught out! The question asks for the sample standard deviation:

$$s = \sqrt{2.1614} = 1.47$$



P3.8 Using the midpoints of 5, 15, 30, 50, 90 we get the mean of:

$$\bar{x} = \frac{\sum x}{n} = \frac{(4\times5)+(13\times15)+(21\times30)+(9\times50)+(3\times90)}{4+13+21+9+3} = \frac{1{,}565}{50} = 31.3$$

The formula for the sample variance is:

$$s^2 = \frac{1}{n-1}\left[\sum x^2 - n\bar{x}^2\right]$$

So we need $\sum x^2$:

$$\sum x^2 = (4\times5^2)+(13\times15^2)+(21\times30^2)+(9\times50^2)+(3\times90^2) = 68{,}725$$

This gives:

$$s^2 = \tfrac{1}{49}\left[68{,}725 - 50 \times 31.3^2\right] = 402.87$$

Now don’t get caught out! The question asks for the sample standard deviation:

$$s = \sqrt{402.87} = 20.07$$
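The grouped calculation in P3.8 follows the same pattern every time: weight each midpoint (and each squared midpoint) by its frequency. A minimal Python sketch of that pattern, using the midpoints and frequencies quoted in the solution:

```python
from math import sqrt

midpoints = [5, 15, 30, 50, 90]
freqs = [4, 13, 21, 9, 3]

n = sum(freqs)
sum_x = sum(f * m for f, m in zip(freqs, midpoints))        # sum of f * x
sum_x2 = sum(f * m * m for f, m in zip(freqs, midpoints))   # sum of f * x^2

mean = sum_x / n
s2 = (sum_x2 - n * mean ** 2) / (n - 1)   # sample variance
s = sqrt(s2)                              # sample standard deviation

print(n, sum_x, sum_x2, round(mean, 1), round(s2, 2), round(s, 2))
```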



P3.9 For the eight policies, we are given $n = 8$, $\bar{x} = 31{,}500$ and $s = 37{,}236.70$. This gives:

$$37{,}236.70^2 = \tfrac{1}{7}\left\{\sum x^2 - 8 \times 31{,}500^2\right\}$$

$$\Rightarrow \sum x^2 = 7 \times 37{,}236.70^2 + 8 \times 31{,}500^2 = 1.7644 \times 10^{10}$$

We now need to include the ninth policy of £60,000. This gives a new sum of squares of:

$$\sum x^2 = 1.7644 \times 10^{10} + 60{,}000^2 = 2.1244 \times 10^{10}$$

We also need to calculate the new mean for the nine policies. For the eight policies:

$$\sum x = 8 \times 31{,}500 = 252{,}000$$

When we add the ninth policy of £60,000 we get:

$$\sum x = 252{,}000 + 60{,}000 = 312{,}000$$

Therefore the sample mean of the nine policies is:

$$\bar{x} = \frac{\sum x}{n} = \frac{312{,}000}{9} = \pounds 34{,}666.67$$

Substituting the new mean and new sum of squares into the variance formula, we get:

$$s^2 = \tfrac{1}{8}\left\{2.1244 \times 10^{10} - 9 \times \left(\frac{312{,}000}{9}\right)^2\right\} = 1.3035 \times 10^9$$

But the question requires the standard deviation:

$$s = \sqrt{1.3035 \times 10^9} = 36{,}104$$
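The steps in P3.9 (recover the totals, add the new value, recalculate) can be sketched in Python as follows. The variable names are our own, chosen for illustration.

```python
from math import sqrt

# Summary statistics for the original eight policies.
n_old, mean_old, s_old = 8, 31_500, 37_236.70
new_value = 60_000

# Step 1: recover the sum and the sum of squares from the summary statistics.
sum_x = n_old * mean_old
sum_x2 = (n_old - 1) * s_old ** 2 + n_old * mean_old ** 2

# Step 2: include the ninth policy.
n_new = n_old + 1
sum_x += new_value
sum_x2 += new_value ** 2

# Step 3: recalculate the mean and the sample standard deviation.
mean_new = sum_x / n_new
s_new = sqrt((sum_x2 - n_new * mean_new ** 2) / (n_new - 1))

print(round(mean_new, 2), round(s_new))
```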



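P3.10 Using w for the salary of a white-collar worker and b for the salary of a blue-collar worker, we get:

$$\bar{w} = \frac{\sum w}{47} = 28{,}470 \;\Rightarrow\; \sum w = 28{,}470 \times 47 = 1{,}338{,}090$$

$$\bar{b} = \frac{\sum b}{24} = 21{,}420 \;\Rightarrow\; \sum b = 21{,}420 \times 24 = 514{,}080$$

So the overall mean is:

$$\bar{x} = \frac{\sum x}{n} = \frac{1{,}338{,}090 + 514{,}080}{47 + 24} = \frac{1{,}852{,}170}{71} = \pounds 26{,}086.90$$

Similarly:

$$s_w^2 = \tfrac{1}{46}\left\{\sum w^2 - 47 \times 28{,}470^2\right\} = 4{,}270^2$$

$$\Rightarrow \sum w^2 = 46 \times 4{,}270^2 + 47 \times 28{,}470^2 = 3.8934 \times 10^{10}$$

$$s_b^2 = \tfrac{1}{23}\left\{\sum b^2 - 24 \times 21{,}420^2\right\} = 3{,}020^2$$

$$\Rightarrow \sum b^2 = 23 \times 3{,}020^2 + 24 \times 21{,}420^2 = 1.1221 \times 10^{10}$$

Our new sum of squares for all 71 salaries is:

$$\sum x^2 = 3.8934 \times 10^{10} + 1.1221 \times 10^{10} = 5.0155 \times 10^{10}$$

This gives the new variance for all 71 salaries of:

$$s^2 = \tfrac{1}{70}\left[5.0155 \times 10^{10} - 71 \times 26{,}086.90^2\right] = 2.6259 \times 10^7$$

So the standard deviation is:

$$s = \sqrt{2.6259 \times 10^7} = \pounds 5{,}124$$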
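Combining two groups, as in P3.10, uses the same trick twice. The sketch below wraps it in a small helper; `sums_from_summary` is a hypothetical name of our own, not a library function.

```python
from math import sqrt

def sums_from_summary(n, mean, s):
    """Recover (sum of x, sum of x squared) from n, the sample mean and sample sd."""
    sum_x = n * mean
    sum_x2 = (n - 1) * s ** 2 + n * mean ** 2
    return sum_x, sum_x2

white = sums_from_summary(47, 28_470, 4_270)   # white-collar workers
blue = sums_from_summary(24, 21_420, 3_020)    # blue-collar workers

n_all = 47 + 24
sum_x = white[0] + blue[0]
sum_x2 = white[1] + blue[1]

mean_all = sum_x / n_all
s_all = sqrt((sum_x2 - n_all * mean_all ** 2) / (n_all - 1))

print(round(mean_all, 2), round(s_all))
```

Working with totals rather than the rounded intermediate figures avoids small rounding differences creeping into the final answer.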



P3.11 All we have to do is use our rule that if the data is transformed from $x$ to $ax + b$ then the standard deviation changes from $s$ to $as$. In this question the marks have been transformed from $x$ to $1.3x + 10$. So the new sample standard deviation of the adjusted mark is:

$$1.3 \times 6 = 7.8$$
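If you want to convince yourself of this rule numerically, the sketch below applies the transformation $1.3x + 10$ to a small set of marks. The marks themselves are made up purely for illustration (the question only gives the standard deviation), so only the ratio between the two standard deviations matters.

```python
import statistics

# Hypothetical marks, invented for illustration only.
marks = [42, 50, 55, 61, 47]
a, b = 1.3, 10
adjusted = [a * x + b for x in marks]

s_old = statistics.stdev(marks)
s_new = statistics.stdev(adjusted)

# Adding b shifts every mark equally, so only the factor a affects the spread.
print(round(s_new / s_old, 6))
print(round(statistics.variance(adjusted) / statistics.variance(marks), 6))
```

The first ratio is $a$ and the second is $a^2$, in line with the results for the standard deviation and the variance.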

P3.12 (i) The stem-and-leaf diagram for Region A is:

4 50

5 50 60 95

6 00 27 37 45 49 60 60 64 82 82 96

7 04 05 05 16

The stem-and-leaf diagram for Region B is:

4 35 50 50 60 70 75 95

5 00 00 20 40 50

6 00 00 00 50

7

8 75

(ii) Region A (19 values)

The median is the $\tfrac{1}{2} \times 19 + \tfrac{1}{2} = 10$th value, which is 6.60.

The lower quartile is the $\tfrac{1}{4} \times 19 + \tfrac{1}{4} = 5$th value, which is 6.00. The upper quartile is the $\tfrac{3}{4} \times 19 + \tfrac{3}{4} = 15$th value, which is 6.96. Hence the interquartile range is $6.96 - 6.00 = 0.96$.

Region B (17 values)

The median is the $\tfrac{1}{2} \times 17 + \tfrac{1}{2} = 9$th value, which is 5.00.

The lower quartile is the $\tfrac{1}{4} \times 17 + \tfrac{1}{4} = 4\tfrac{1}{2}$th value, which is 4.65. The upper quartile is the $\tfrac{3}{4} \times 17 + \tfrac{3}{4} = 13\tfrac{1}{2}$th value, which is 6.00. Hence the interquartile range is $6.00 - 4.65 = 1.35$.

The rates in Region A are higher than Region B on average. There is slightly more variation in the rates for Region B than for Region A.

Stats Pack-04: Probability Page 1


Chapter 4

Probability

Links to CT3: Chapter 2

Syllabus objectives:

(ii)2. Define probability as a set function on a collection of events, stating basic axioms.

(ii)3. Derive basic properties satisfied by the probability of occurrence of an event, and calculate probabilities of events in simple situations.

(ii)4. Derive the addition rule for the probability of the union of two events, and use the rule to calculate probabilities.

(ii)5. Define the conditional probability of one event given the occurrence of another event, and calculate such probabilities.

(ii)7. Define independence for two events, and calculate probabilities in situations involving independence.

0 Introduction

In this chapter we look at calculating simple probabilities using the addition and multiplication rules of probability.



1 Basic probability

The whole notion of probability is built upon the fact that nothing in life is certain (apart from death and taxes). As actuaries, we are particularly concerned with the uncertainties involved in finance and insurance.

1.1 Terminology

Suppose we roll a die. An outcome is simply a result that we could get, eg 3. The sample space is the complete set of outcomes that we could get, ie 1, 2, 3, 4, 5 and 6.

Question 4.1

List all the possible outcomes (ie give the sample space) when a 10p and a 50p coin are both tossed.

An event is a group or set of possible outcomes that we are interested in. For example, when rolling a die we might be interested in any of the following events:

“rolling a 2”, “rolling an odd number”, “rolling a number greater than 3”

The probability of an event is simply a measure of how likely it is that the event happens (ie the chance of the event occurring). All probabilities must lie between 0 and 1 inclusive, where 0 is the probability of an impossible event and 1 is the probability of a certain event. We use the notation $P(A)$ to stand for the probability of event A occurring.

In summary:

$$0 \le P(A) \le 1$$

$P(A) = 0$ if A is an impossible event

$P(A) = 1$ if A is a certain event



1.2 Calculating probabilities

Suppose we wish to calculate the probability of rolling a 3 on a die. We use the fact that each of the outcomes is equally likely. Of the six outcomes (1, 2, 3, 4, 5 and 6) only one of them is a 3. Therefore only $\tfrac{1}{6}$ of the total outcomes is a 3, hence the probability is $\tfrac{1}{6}$.

Question 4.2

A die is rolled. What is the probability that a number greater than 4 is rolled?

Definition

The probability that event A occurs is given by:

$$P(A) = \frac{\text{number of ways event } A \text{ can happen}}{\text{total number of all possible outcomes}}$$

We shall now look at an example where several of the outcomes are the same. Suppose that in an ActEd tutorial there are 8 male and 4 female students. What is the probability that a student picked at random is male? Let M be the event “a male student is picked”. Since there are 8 (different) male students there are 8 ways that event M could happen. There are 12 (different) students altogether so there are 12 possible outcomes in total. So:

$$P(M) = \frac{8}{12} = \frac{2}{3}$$
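The definition above amounts to counting, which is easy to mimic in code. Here is a minimal Python sketch using exact fractions. The helper `probability` is our own illustrative function, and it assumes that every outcome in the list is equally likely.

```python
from fractions import Fraction

# The tutorial group from the example: 8 male and 4 female students.
students = ["M"] * 8 + ["F"] * 4

def probability(event, outcomes):
    """P(event) = (number of outcomes in the event) / (total number of outcomes)."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_male = probability(lambda s: s == "M", students)
p_three = probability(lambda face: face == 3, [1, 2, 3, 4, 5, 6])

print(p_male, p_three)   # prints: 2/3 1/6
```

`Fraction` cancels down automatically, so 8/12 is reported as 2/3.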

Question 4.3

A pile of 15 scripts contains two CT3’s, three CT4’s, four CA1’s and six ST5’s. A marker picks a script from the pile at random. What is the probability that the marker picks a CT Series script?



Question 4.4

In a CT3 tutorial there are 11 students of which 6 are female. Three of the women and 2 of the men are also taking CT4. What is the probability that a student picked at random:
(i) is a male who is also taking CT4
(ii) is not taking CT4?

1.3 Complementary events

If we have an event A then the complementary event is the event of A not happening. For example, when rolling a die:

$$P(4) = \tfrac{1}{6} \quad \text{and} \quad P(\text{not } 4) = \tfrac{5}{6}$$

Note that these probabilities add up to 1. This is because together the events cover all of the possible outcomes so it is certain that one or the other occurs. This gives us an easy way to find the probability of a complementary event:

$$P(\text{not } 4) = 1 - P(4)$$

Question 4.5

The probability that a car claim to a certain company is in excess of £1,000 is 0.6. What is the probability that a claim is not in excess of £1,000?

Sometimes the notation $A'$ is used to denote the complement of event A. In which case:

$$P(A) + P(A') = 1 \quad \text{or} \quad P(A') = 1 - P(A)$$

You may also see the notation $\bar{A}$ instead.



2 The addition rule

We are now going to calculate the probability of one or another event happening.

2.1 Mutually exclusive events

Two events are said to be mutually exclusive if only one or the other can occur. Mutually means together, so mutually exclusive means together they exclude each other (ie we cannot have both of them happening). For example, when we roll a die we can roll a 3 or a 4, but we cannot roll a number which is both 3 and 4! So the event “roll a 3” is mutually exclusive to the event “roll a 4”. Similarly, an insurance policy has either given rise to a claim or it hasn’t, it can’t have done both! So the event “a policy has given rise to a claim” is mutually exclusive to the event “a policy has not given rise to a claim”.

Question 4.6

State which of the following pairs of events are mutually exclusive: (i) win a football match or lose a football match (ii) wear a red tie or wear black shoes (iii) pick a diamond from a pack of cards or pick an ace from a pack of cards (iv) roll an even number on a die or an odd number on a die (v) roll an even number on a die or a prime number on a die.

This idea can be extended to more than two events. For example, for a pension scheme during the next year, the events of a particular active member dying, withdrawing or retiring are all mutually exclusive.



2.2 The addition rule for mutually exclusive events

If I asked you to calculate the probability of rolling a 3 or a 4 on a die you would probably (excuse the pun) say $\tfrac{2}{6}$ without thinking. Breaking this down:

Event               Outcomes   Probability
“roll a 3”          3          1/6
“roll a 4”          4          1/6
“roll a 3 or a 4”   3, 4       2/6

We can see that:

$$P(3 \text{ or } 4) = P(3) + P(4) = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{2}{6}$$

Trying this for another example: a pile of 15 scripts contains two CT3’s, three CT4’s, four CA1’s and six ST5’s. A marker picks a script from the pile at random. What is the probability that the marker picks a CT Series or a CA Series script?


Event                               Outcomes                     Probability
“pick a CT Series script”           2×CT3’s, 3×CT4’s             5/15
“pick a CA Series script”           4×CA1’s                      4/15
“pick a CT or a CA Series script”   2×CT3’s, 3×CT4’s, 4×CA1’s    9/15

We can see that:

$$P(\text{CT or CA Series}) = P(\text{CT Series}) + P(\text{CA Series}) = \tfrac{5}{15} + \tfrac{4}{15} = \tfrac{9}{15}$$

Addition Rule

For any two mutually exclusive events A and B:

$$P(A \text{ or } B) = P(A) + P(B)$$

Sometimes the notation $A \cup B$ is used to denote the event A or B. Using this notation for mutually exclusive events A and B we have:

$$P(A \cup B) = P(A) + P(B)$$

This notation comes from set theory, which you will look at in Subject CT3.

Question 4.7

In a portfolio of 50 car insurance policyholders, 6 have “4 years no claims bonus”, 15 have “3 years no claims bonus”, 18 have “2 years no claims bonus”, 7 have just “one year no claims bonus” and the rest have none. A policyholder is picked at random, what is the probability that they have: (i) 3 or 4 years no claims bonus (ii) 0 or 1 years no claims bonus (iii) neither 2 nor 3 years no claims bonus?



We can easily extend the addition rule to more than two mutually exclusive events. For example, if events A, B and C are mutually exclusive then:

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C)$$

Question 4.8

In a traffic survey the probabilities of observing various types of cars are given in the table below:

Feature    Probability
blue       0.25
white      0.3
silver     0.15
Ford       0.3
Renault    0.2

(i) What is the probability that the next car:
    (a) is a Ford or a Renault
    (b) is blue, white or silver?
(ii) Why is the probability that the next car is a Ford or blue not $0.3 + 0.25$?



2.3 The addition rule for non-mutually exclusive events

In part (ii) of Question 4.8 our addition rule broke down. This was because the events were not mutually exclusive, ie they could both happen at the same time. We are now going to extend our addition rule to cover non-mutually exclusive events. Since these events can both happen at the same time, when we talk about the probability of events A or B occurring we mean the probability that either A or B or both occur. Suppose we want to calculate the probability of rolling an odd number or a number greater than 4 on a die. Looking at the outcomes:


Event                                              Outcomes     Probability
“roll an odd number”                               1, 3, 5      3/6
“roll a number greater than 4”                     5, 6         2/6
“roll an odd number or a number greater than 4”    1, 3, 5, 6   4/6

We can see that the addition rule does not work:

$$P(\text{odd or greater than 4}) \ne P(\text{odd}) + P(\text{greater than 4}) = \tfrac{3}{6} + \tfrac{2}{6} = \tfrac{5}{6}$$

Why is this? Well if we just ‘added up the outcomes’ we would have 1, 3, 5, 5, 6. We have counted the 5 twice as it is in both groups! We can ‘fix’ our addition rule by taking off the probability of getting a 5 – as we counted it twice.

$$P(\text{odd or greater than 4}) = P(\text{odd}) + P(\text{greater than 4}) - P(5) = \tfrac{3}{6} + \tfrac{2}{6} - \tfrac{1}{6} = \tfrac{4}{6}$$

But wait! The number 5 is in both groups – it is an odd number and greater than 4. So we could write:

$$P(\text{odd or greater than 4}) = P(\text{odd}) + P(\text{greater than 4}) - P(\text{odd and greater than 4})$$

Addition Rule

For any two events A and B:

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Sometimes the notation $A \cap B$ is used to denote the event A and B. Recall that $A \cup B$ is used to denote the event A or B (or both). Hence, we have:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Again this notation comes from set theory, which you will look at in Subject CT3. The addition rule for mutually exclusive events is a special version of this more general rule. This is because if events A and B are mutually exclusive then:

$$P(A \text{ and } B) = 0$$
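Python's built-in sets use exactly this union and intersection idea, so the die example can be checked directly. A minimal sketch (the helper `prob` is our own and assumes equally likely outcomes):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
odd = {x for x in sample_space if x % 2 == 1}        # {1, 3, 5}
greater_than_4 = {x for x in sample_space if x > 4}  # {5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Left-hand side: P(A or B), using set union.
lhs = prob(odd | greater_than_4)

# Right-hand side: P(A) + P(B) - P(A and B), using set intersection.
rhs = prob(odd) + prob(greater_than_4) - prob(odd & greater_than_4)

print(lhs, rhs)   # prints: 2/3 2/3
```

Both sides equal 4/6, which `Fraction` reports in its lowest terms as 2/3.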

Question 4.9

In a group of students it is known that 45% watch Alias and 50% watch West Wing and 30% watch Alias and West Wing on television. Calculate the probability that a student watches Alias or West Wing.



3 The multiplication rule

We are now going to calculate the probability of two events both happening, ie the probability of event A and event B occurring.

3.1 Independent events

Two events are said to be independent if they do not affect each other’s probability of occurring. If events A and B are independent then the probability of B happening or not happening does not depend in any way on event A, ie event A has no influence whatsoever on event B and vice versa. For example, rolling a die and flipping a coin will not affect each other in any way. The results of each are therefore independent of each other. However getting an exemption from Subject CT3 (you wish!) and taking the Subject CT3 exam are not independent as if you have got an exemption you won’t take the exam (unless you’re a glutton for punishment).

Question 4.10

State which of the following pairs of events are independent: (i) catching the early bus and getting to work on time (ii) one policyholder dying and a second policyholder dying (iii) passing an exam on the first sitting and passing it on the second sitting (iv) rolling a 6 on one die and rolling a 6 on a second die (v) Team A winning a football match and Team B winning the same football match.

This idea can be extended to more than two events. For example, getting a tail on a coin, getting a tail on a second coin and getting a tail on a third coin are all independent events.



3.2 The multiplication rule for independent events

Consider rolling a die and flipping a coin. We are going to calculate the probability of rolling a number greater than 4 and getting a tail. We can calculate the probability of each of these events happening on their own using our basic rules:

$$P(\text{roll greater than 4}) = \tfrac{2}{6} = \tfrac{1}{3}$$

$$P(\text{get a tail}) = \tfrac{1}{2}$$

Now to calculate the probability of both happening at the same time we need to look at all the possible outcomes from rolling a die and flipping a coin. Using H to stand for head and T for tail we get:

H1 H2 H3 H4 H5 H6
T1 T2 T3 T4 T5 T6

We can see that there are 12 possible outcomes altogether, of which two (T5 and T6) have a tail and a number greater than 4. So:

$$P(\text{greater than 4 and tail}) = \tfrac{2}{12} = \tfrac{1}{6}$$

But we can see that:

$$P(\text{greater than 4 and tail}) = P(\text{greater than 4}) \times P(\text{tail}) = \tfrac{1}{3} \times \tfrac{1}{2} = \tfrac{1}{6}$$

Similarly, if the probability that an employee is right handed is 0.9 and the probability that they wear glasses is 0.4, and these two events are independent, then the probability that they are right handed and wear glasses is:

$$P(\text{right handed and wear glasses}) = P(\text{right handed}) \times P(\text{wear glasses}) = 0.9 \times 0.4 = 0.36$$

Question 4.11

What is the probability that an employee is left handed and wears glasses?



Multiplication Rule

For any two independent events A and B:

$$P(A \text{ and } B) = P(A) \times P(B)$$

Recall that the notation $A \cap B$ is used to denote the event A and B. In which case for independent events A and B we have:

$$P(A \cap B) = P(A)\,P(B)$$
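The die and coin example can be checked by listing all 12 outcomes, just as we did by hand. A minimal Python sketch, assuming a fair die and a fair coin (the helper `prob` is our own):

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]
outcomes = list(product(coin, die))   # 12 equally likely pairs such as ("T", 5)

def prob(event):
    """Probability of an event over the 12 equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_greater = prob(lambda o: o[1] > 4)                # 4 outcomes out of 12
p_tail = prob(lambda o: o[0] == "T")                # 6 outcomes out of 12
p_both = prob(lambda o: o[1] > 4 and o[0] == "T")   # T5 and T6 only

print(p_both, p_greater * p_tail)   # prints: 1/6 1/6
```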


Question 4.12

A fruit machine has two wheels. The wheels spin and then stop to reveal an object. The probability of getting a cherry on the first wheel is $\tfrac{3}{10}$, whereas the probability of getting a cherry on the second wheel is $\tfrac{2}{5}$. Find the probability of getting:
(i) a cherry on the first wheel and on the second wheel
(ii) a cherry on the second wheel but not on the first
(iii) no cherries on either of the two wheels.

We can easily extend the multiplication rule to more than two independent events. For example, if events A, B and C are independent then:

$$P(A \text{ and } B \text{ and } C) = P(A) \times P(B) \times P(C)$$



Question 4.13

The number 9 bus is never on time – it’s always either early or late! The probability that the bus is late on any day is 0.6, independent of which day it is. What is the probability that the number 9 bus is: (i) late on Monday, Tuesday and Wednesday (ii) early on Monday, Tuesday, Wednesday, Thursday and Friday (iii) early on Monday, Wednesday and Friday and late on Tuesday and Thursday.

We can also work backwards – given the answer, calculate the probabilities or the number of events we need:

Question 4.14

The probability that a missile misses a target is 3%. How many missiles would have to be fired at a target to ensure that the probability the target is missed by all the missiles is less than 1 in a million?

3.3 The multiplication rule for dependent events

We will now extend our multiplication rule to cover dependent events (ie those that are not independent). First of all we need to define some new notation. For example, when going outside the probability of taking an umbrella is unlikely to be independent of it raining. So we might have the following:

$$P(\text{taking umbrella if it's raining}) = 0.8$$

$$P(\text{taking umbrella if it's not raining}) = 0.1$$

Since the probability of taking an umbrella is conditional (ie dependent) on whether or not it is raining we call it (unsurprisingly) a conditional probability.



We use the notation $P(B \mid A)$ to stand for the probability of B happening conditional (ie dependent) on A having happened. We read this as “the probability of B happening given that A has happened”. So for our example we have:

$$P(\text{taking umbrella} \mid \text{raining}) = 0.8$$

$$P(\text{taking umbrella} \mid \text{not raining}) = 0.1$$

The first of these is read “the probability of taking an umbrella given that it is raining”.

Question 4.15

If an application for an ActEd tutorial is received early, the probability that the student gets their first choice tutorial is 0.8. If the application is received late, the probability that the student gets their first choice tutorial is 0.3. Write down:
(i) $P(\text{gets 1st choice} \mid \text{early application})$
(ii) $P(\text{gets 1st choice} \mid \text{late application})$
(iii) $P(\text{does not get 1st choice} \mid \text{early application})$.

In the previous question we were given the conditional probabilities, however in some questions we can calculate them. For example, suppose I have a packet of 2 lemon and 3 strawberry sweets. I choose a sweet randomly and eat it and then I choose another sweet randomly and eat it. If the 1st sweet I eat is lemon, then there are now only 4 sweets left in the packet (1 lemon and 3 strawberry) so:

$$P(\text{2nd sweet is lemon} \mid \text{1st sweet is lemon}) = \tfrac{1}{4}$$

$$P(\text{2nd sweet is strawberry} \mid \text{1st sweet is lemon}) = \tfrac{3}{4}$$

Similarly, if the 1st sweet I eat is strawberry, then there are only 4 sweets left (2 lemon and 2 strawberry) so:

$$P(\text{2nd sweet is lemon} \mid \text{1st sweet is strawberry}) = \tfrac{2}{4} = \tfrac{1}{2}$$

$$P(\text{2nd sweet is strawberry} \mid \text{1st sweet is strawberry}) = \tfrac{2}{4} = \tfrac{1}{2}$$



Question 4.16

There are 3 jam doughnuts and 4 iced doughnuts on a table. I eat one of these doughnuts and then eat a second doughnut at random. Find:
(i) $P(\text{2nd doughnut is iced} \mid \text{1st doughnut is jam})$
(ii) $P(\text{2nd doughnut is jam} \mid \text{1st doughnut is jam})$
(iii) $P(\text{2nd doughnut is iced} \mid \text{1st doughnut is iced})$.

We now look at how we can calculate probabilities with dependent events. Returning to our umbrella example:

The probability that it rains on any day is 0.2. The probability that I take an umbrella when it’s raining is 0.8. Find the probability that it rains and I take an umbrella.

All we do is multiply the probabilities together:

$$\begin{aligned}
P(\text{raining and take umbrella}) &= P(\text{raining}) \times P(\text{take umbrella} \mid \text{raining}) \\
&= 0.2 \times 0.8 \\
&= 0.16
\end{aligned}$$

Similarly, consider a packet of 2 lemon and 3 strawberry sweets. I choose a sweet randomly and eat it and then I choose another sweet randomly and eat it. What is the probability that I eat a lemon sweet and then a strawberry sweet?

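$$\begin{aligned}
P(\text{lemon and then strawberry}) &= P(\text{1st sweet is lemon}) \times P(\text{2nd sweet is strawberry} \mid \text{1st sweet is lemon}) \\
&= \tfrac{2}{5} \times \tfrac{3}{4} \\
&= \tfrac{6}{20} = \tfrac{3}{10}
\end{aligned}$$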
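One way to check the sweets answer without using the rule at all is to list every possible ordered pair of sweets and count. The sketch below does this; the labels L1, L2, S1, S2 and S3 are our own invention, used only so that the individual sweets can be told apart.

```python
from fractions import Fraction
from itertools import permutations

# Two lemon sweets and three strawberry sweets, labelled individually.
sweets = ["L1", "L2", "S1", "S2", "S3"]

# Every ordered pair (1st sweet, 2nd sweet) without replacement:
# 5 x 4 = 20 equally likely pairs.
pairs = list(permutations(sweets, 2))

favourable = [p for p in pairs if p[0].startswith("L") and p[1].startswith("S")]
p_by_counting = Fraction(len(favourable), len(pairs))

# The same answer from the multiplication rule for dependent events.
p_by_rule = Fraction(2, 5) * Fraction(3, 4)

print(p_by_counting, p_by_rule)   # prints: 3/10 3/10
```

Counting like this quickly becomes impractical for bigger problems, which is exactly why the multiplication rule is so useful.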



Question 4.17

I catch the bus to work. The probability that the bus is late on any day is 0.6. If the bus is late, the probability that I am late to work is 0.8, otherwise it is 0.3. Find: (i) the probability that the bus is late and I am late to work (ii) the probability that the bus is early and I am late to work (iii) the probability that I am late to work?

Question 4.18

My sock drawer contains 4 blue socks and 4 black socks.
(i) In the morning I randomly choose two socks from the drawer. What is the probability that:
    (a) I pick a blue sock followed by a black sock
    (b) I pick a pair of black socks?
(ii) To be sure that I get a matching pair of socks I decide to randomly choose three socks from the drawer. What is the probability that:
    (a) I pick 3 blue socks
    (b) I pick a blue sock followed by 2 black socks?

The rule that we have been using is:

Multiplication Rule

For any two events A and B:

$$P(A \text{ and then } B) = P(A) \times P(B \mid A)$$

Recall that $A \cap B$ is used to denote the event A and B. In which case for events A and B we have:

$$P(A \cap B) = P(A)\,P(B \mid A)$$


Question 4.19

Write down the formula for: $P(B \text{ and then } A)$

The multiplication rule for independent events is a special version of this more general rule. This is because, if events A and B are independent, then B does not depend on A, so the probability of B happening given that A has happened is unchanged:

$$P(B \mid A) = P(B)$$

Hence, if A and B are independent, we get $P(A \text{ and } B) = P(A)\,P(B \mid A) = P(A)\,P(B)$.

We have now covered all of the types of events and the probability rules. A helpful way to summarise the four kinds of events met in this chapter is as follows:

Addition rule (A or B): mutually exclusive events, non-mutually exclusive events

Multiplication rule (A and B): independent events, dependent events



Extra practice questions

Section 1: Basic probability and complementary events

P4.1 A bag of jelly babies has 4 blackcurrant, 3 orange and 2 strawberry jelly babies left. A jelly baby is picked at random. What is the probability that: (i) a blackcurrant jelly baby is picked (ii) an orange jelly baby is picked?

P4.2 When Allstars play their next match they could win, lose or draw. The probability of them winning is 0.5 and the probability of them losing is 0.3. What is the probability that Allstars: (i) draw in the next match (ii) do not win?

Section 2: The addition rule

P4.3 The probability of a student obtaining each grade on a particular Subject CT3 exam is shown below:

Grade         Pass   FA    FB     FC    FD
Probability   0.4    0.3   0.15   0.1   0.05

What is the probability that a student receives: (i) a Pass, FA, or an FB grade (ii) a fail grade (ie FA, FB, FC or FD)?

P4.4 A box of 20 chocolates contains 15 milk and 5 plain chocolates. Three chocolates in the box have a caramel filling. Explain why the probability of a milk chocolate or a caramel filled chocolate is not $\tfrac{15}{20} + \tfrac{3}{20} = \tfrac{18}{20}$.



P4.5 In a group of home insurance claims, 30% of them arise from burglaries and 60% are more than £2,000. Given that 80% of the claims are for either burglaries or over £2,000, calculate the probability that a claim is for both.

Section 3: The multiplication rule

P4.6 In a typical day at work the probability that I get an email is 0.9 and the probability that I get a phone call is 0.4. What is the probability that on a typical day: (i) I get both an email and a phone call (ii) I get neither?

P4.7 The probability that a car claim received by a company is for reversing into something or someone is 0.35. What is the probability that: (i) the next three claims are all due to reversing (ii) none of the next four claims are due to reversing (iii) of the next 5 claims, the middle three are for reversing and the rest aren’t?

P4.8 The probability that a student revises for the CA3 exam is 0.6. If a student revises, the probability that they pass is 0.7, otherwise it is only 0.1. A student sitting CA3 is selected at random, find the probability that: (i) they revised and passed the CA3 exam (ii) they didn’t revise and failed the CA3 exam.

P4.9 Two male and four female candidates are waiting in a room to be called for interview. They are to be called randomly one after the other. Calculate the probability that: (i) a male and then a female candidate are called (ii) the next three candidates called are all female.



P4.10 Subject C1, Specimen 1993, Q1 (adapted) The portfolio of a private investor includes investments in 12 unit trusts, 8 of which are UK trusts and 4 of which are overseas trusts. Suppose the investor decides to check the prices of units in 3 of the 12 trusts selected at random. What is the probability that the 3 selected trusts are all UK trusts? [2]

P4.11 Subject 101, September 2002, Q2 The probability that a component in a rocket motor will fail when the motor is fired is 0.02. To achieve a greater reliability several similar components are to be fitted in parallel; the motor will then fail only if all the individual components fail simultaneously. Determine the minimum number of components required to ensure that the probability the motor fails is less than one in a billion (ie less than $10^{-9}$), assuming that components fail independently. [2]



Chapter 4 Summary

Basic probability

An outcome is a result we could get, the sample space is the complete set of outcomes we could get and an event is any group of possible outcomes that we are interested in. If all outcomes are equally likely, then the probability of event A happening is:

$$P(A) = \frac{\text{number of ways event } A \text{ can happen}}{\text{total number of outcomes that can happen}}$$

where $0 \le P(A) \le 1$.

Addition Rule

For any two events A and B:

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Mutually exclusive events A and B cannot both happen together, hence $P(A \text{ and } B) = 0$, which gives:

$$P(A \text{ or } B) = P(A) + P(B)$$

Multiplication Rule

For any two events A and B:

$$P(A \text{ and } B) = P(A)\,P(B \mid A)$$

where $P(B \mid A)$ is the probability of B happening given that A has happened.

Independent events A and B do not affect each other’s probability of happening, hence $P(B \mid A) = P(B)$, which gives:

$$P(A \text{ and } B) = P(A)\,P(B)$$




Solution 4.1

The outcomes that we could get (ie the sample space) are:

10p coin   50p coin
head       head
head       tail
tail       head
tail       tail

Solution 4.2

Of the six outcomes (1, 2, 3, 4, 5 and 6) two of them (5 and 6) are greater than 4. So:

$$P(\text{roll more than 4}) = \tfrac{2}{6} = \tfrac{1}{3}$$

Solution 4.3

There are 5 (different) CT Series scripts (two CT3’s and three CT4’s) out of 15 scripts altogether. Hence:

$$P(\text{choose CT Series script}) = \tfrac{5}{15} = \tfrac{1}{3}$$

Solution 4.4

(i) There are 2 males who are taking CT4 out of 11 students altogether. Hence:

$$P(\text{male taking CT4}) = \tfrac{2}{11}$$

(ii) 3 women and 3 men are not taking CT4. Hence:

$$P(\text{not taking CT4}) = \tfrac{6}{11}$$

Solution 4.5

$$P(\text{not in excess of £1,000}) = 1 - P(\text{in excess of £1,000}) = 1 - 0.6 = 0.4$$



Solution 4.6

(i) mutually exclusive (we can’t win and lose a football match) (ii) not mutually exclusive (since we can wear both a red tie and black shoes) (iii) not mutually exclusive (since we could pick the ace of diamonds) (iv) mutually exclusive (we can’t have an even and an odd number) (v) not mutually exclusive (since 2 is both an even number and a prime number) Solution 4.7

(i) 15 6 2150 50 50(3 or 4 years no claims)P = + =

(ii) 74 1150 50 50(0 or 1 years no claims)P = + =

(iii) 7 6 17450 50 50 50(neither 2 nor 3 years) (0,1 or 4 years)P P= = + + =

Solution 4.8

(i) (a) $P(\text{Ford or Renault}) = 0.3 + 0.2 = 0.5$

(b) $P(\text{blue, white or silver}) = 0.25 + 0.3 + 0.15 = 0.7$

(ii) This is not the correct probability because the events “Ford” and “blue” are not mutually exclusive (since we could have a blue Ford). Hence we cannot use the simple addition rule.

Solution 4.9

Watching Alias and West Wing are not mutually exclusive (since 30% of students watch both), therefore using the new rule we get:

$P(\text{Alias or West Wing}) = P(\text{Alias}) + P(\text{West Wing}) - P(\text{Alias and West Wing})$

$= 0.45 + 0.50 - 0.3$

$= 0.65$



Solution 4.10

(i) not independent (if I catch the early bus I am more likely to be on time)
(ii) independent (unless they’re married or involved in some disaster together)
(iii) not independent (if I pass on the first sitting I won’t have to sit it again!)
(iv) independent (the dice do not affect each other!)
(v) not independent (if one wins the other can’t!)

Solution 4.11

People are either left or right handed so:

$P(\text{left handed}) = 1 - P(\text{right handed}) = 1 - 0.9 = 0.1$

Hence, we have:

$P(\text{left handed and wear glasses}) = P(\text{left handed}) \times P(\text{wear glasses}) = 0.1 \times 0.4 = 0.04$

Solution 4.12

Since the wheels are independent this gives:

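(i) $P(\text{cherry on 1st and cherry on 2nd}) = \frac{3}{10} \times \frac{2}{5} = \frac{6}{50} = \frac{3}{25}$

(ii) $P(\text{no cherry on 1st and cherry on 2nd}) = \frac{7}{10} \times \frac{2}{5} = \frac{14}{50} = \frac{7}{25}$

(iii) $P(\text{no cherry on 1st and no cherry on 2nd}) = \frac{7}{10} \times \frac{3}{5} = \frac{21}{50}$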



Solution 4.13

Since the bus being late on any day is independent of the other days, we get:

(i) $0.6 \times 0.6 \times 0.6 = 0.216$
(ii) $0.4 \times 0.4 \times 0.4 \times 0.4 \times 0.4 = 0.01024$
(iii) $0.4 \times 0.6 \times 0.4 \times 0.6 \times 0.4 = 0.02304$

Solution 4.14

Now:

$P(\text{all missiles miss}) = P(\text{1st missile misses}) \times P(\text{2nd missile misses}) \times \cdots = 0.03 \times 0.03 \times \cdots$

We want $P(\text{all missiles miss}) < \frac{1}{1{,}000{,}000}$, so for n missiles we want:

$0.03^n < \frac{1}{1{,}000{,}000} = 0.000001$

Now since $0.03^3 = 0.000027$ and $0.03^4 = 0.00000081$, we need to fire 4 missiles.

Solution 4.15

(i) $P(\text{get 1st choice} \mid \text{early application}) = 0.8$

(ii) $P(\text{get 1st choice} \mid \text{late application}) = 0.3$

(iii) Since the probability of a student getting their 1st choice given that they submit it early is 0.8, we have:

$P(\text{not get 1st choice} \mid \text{early application}) = 0.2$



Solution 4.16

If the first doughnut I eat is a jam one then there are 2 jam and 4 iced doughnuts left. Hence:

(i) $P(\text{2nd doughnut is iced} \mid \text{1st doughnut is jam}) = \frac{4}{6} = \frac{2}{3}$

(ii) $P(\text{2nd doughnut is jam} \mid \text{1st doughnut is jam}) = \frac{2}{6} = \frac{1}{3}$

If the first doughnut I eat is an iced one then there are 3 jam and 3 iced doughnuts left. Hence:

(iii) $P(\text{2nd doughnut is iced} \mid \text{1st doughnut is iced}) = \frac{3}{6} = \frac{1}{2}$

Solution 4.17

(i) $P(\text{bus late and late to work}) = P(\text{bus late}) \times P(\text{late to work} \mid \text{bus late}) = 0.6 \times 0.8 = 0.48$

(ii) $P(\text{bus early and late to work}) = P(\text{bus early}) \times P(\text{late to work} \mid \text{bus early}) = 0.4 \times 0.3 = 0.12$

(iii) Either the bus is early and I’m late to work or the bus is late and I’m late to work. These are mutually exclusive events, so we can use the addition rule $P(A \text{ or } B) = P(A) + P(B)$. Using the results from parts (i) and (ii) as events A and B gives:

$P(\text{late to work}) = 0.48 + 0.12 = 0.6$



Solution 4.18

(i) (a) If I pick a blue sock then there are 3 blue and 4 black socks remaining.

$P(\text{1st sock blue and 2nd sock black}) = P(\text{1st sock blue}) \times P(\text{2nd sock black} \mid \text{1st sock blue})$

$= \frac{4}{8} \times \frac{4}{7} = \frac{16}{56} = \frac{2}{7}$

(b) If I pick a black sock then there are 4 blue and 3 black socks remaining.

$P(\text{1st sock black and 2nd sock black}) = P(\text{1st sock black}) \times P(\text{2nd sock black} \mid \text{1st sock black})$

$= \frac{4}{8} \times \frac{3}{7} = \frac{12}{56} = \frac{3}{14}$

(ii) (a) Similarly, we get:

$P(\text{1st sock blue and 2nd sock blue and 3rd sock blue}) = \frac{4}{8} \times \frac{3}{7} \times \frac{2}{6} = \frac{1}{14}$

(b) And:

$P(\text{1st sock blue and 2nd sock black and 3rd sock black}) = \frac{4}{8} \times \frac{4}{7} \times \frac{3}{6} = \frac{1}{7}$

Solution 4.19

$P(B \text{ and then } A) = P(B) \times P(A \mid B)$




P4.1 (i) There are 9 jelly babies of which 4 are blackcurrant. Hence:

$P(\text{blackcurrant}) = \frac{4}{9}$

(ii) There are 9 jelly babies of which 3 are orange. Hence:

$P(\text{orange}) = \frac{3}{9} = \frac{1}{3}$

P4.2 (i) $P(\text{draw}) = 1 - P(\text{win}) - P(\text{lose}) = 1 - 0.5 - 0.3 = 0.2$

(ii) $P(\text{not win}) = 1 - P(\text{win}) = 1 - 0.5 = 0.5$

P4.3 (i) Since you can only get one grade on an exam, the events are mutually exclusive:

$P(\text{pass, FA or FB}) = P(\text{pass}) + P(\text{FA}) + P(\text{FB}) = 0.4 + 0.3 + 0.15 = 0.85$

(ii) We can either use the addition rule:

$P(\text{FA, FB, FC or FD}) = 0.3 + 0.15 + 0.1 + 0.05 = 0.6$

or we could use the fact that it is a complementary event:

$P(\text{fail grade}) = 1 - P(\text{pass grade}) = 1 - 0.4 = 0.6$

P4.4 Because the events “pick milk chocolate” and “pick caramel filled chocolate” are not mutually exclusive (since we could have a caramel filled milk chocolate). Hence we cannot use the simple addition rule.



P4.5 We are given:

$P(\text{burglary claim}) = 0.3$

$P(\text{claim over £2,000}) = 0.6$

$P(\text{burglary claim or claim over £2,000}) = 0.8$

Since these are not mutually exclusive events (as we could have a burglary claim for over £2,000) we use:

$P(\text{burglary claim or over £2,000}) = P(\text{burglary claim}) + P(\text{claim over £2,000}) - P(\text{burglary claim and over £2,000})$

This gives:

$0.8 = 0.3 + 0.6 - P(\text{burglary claim and over £2,000})$

$\Rightarrow P(\text{burglary claim and over £2,000}) = 0.1$

P4.6 Assuming that receiving an email and a phone call are independent we get:

(i) $P(\text{get email and phone call}) = P(\text{get email}) \times P(\text{get phone call}) = 0.9 \times 0.4 = 0.36$

(ii) $P(\text{no emails and no phone calls}) = P(\text{no emails}) \times P(\text{no phone calls}) = 0.1 \times 0.6 = 0.06$

P4.7 Since the claims are independent we get:

(i) $0.35 \times 0.35 \times 0.35 = 0.042875$

(ii) $0.65 \times 0.65 \times 0.65 \times 0.65 = 0.179$ (3 SF)

(iii) $0.65 \times 0.35 \times 0.35 \times 0.35 \times 0.65 = 0.0181$ (3 SF)



P4.8 (i) $P(\text{revise and pass exam}) = P(\text{revise}) \times P(\text{pass exam} \mid \text{revise}) = 0.6 \times 0.7 = 0.42$

(ii) $P(\text{not revise and fail exam}) = P(\text{not revise}) \times P(\text{fail exam} \mid \text{not revise}) = 0.4 \times 0.9 = 0.36$

P4.9 (i) If a male candidate is picked first then there are 1 man and 4 women remaining. Hence:

$P(\text{male and then female}) = P(\text{male}) \times P(\text{female} \mid \text{male}) = \frac{2}{6} \times \frac{4}{5} = \frac{4}{15}$

(ii) $P(\text{female and then female and then female}) = \frac{4}{6} \times \frac{3}{5} \times \frac{2}{4} = \frac{1}{5}$

P4.10 Be careful! The probability is not $\frac{8}{12} \times \frac{8}{12} \times \frac{8}{12}$ since once a trust is selected we are not going to select it again. Hence once a trust is selected it is removed from the ‘pile’.

$P(\text{UK and UK and UK}) = \frac{8}{12} \times \frac{7}{11} \times \frac{6}{10} = \frac{14}{55}$

After we have removed the first UK trust from the ‘pile’ there are 11 trusts left, of which 7 are UK trusts. Then when we have removed the second UK trust there are 10 trusts left, of which 6 are UK trusts.



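P4.11 Now:

$P(\text{motor fails}) = P(\text{component 1 fails}) \times P(\text{component 2 fails}) \times \cdots = 0.02 \times 0.02 \times \cdots$

We want $P(\text{motor fails}) < 10^{-9}$, so for n components we want:

$0.02^n < 10^{-9}$

Using trial and improvement gives:

5 components: $0.02^5 = 3.2 \times 10^{-9}$

6 components: $0.02^6 = 6.4 \times 10^{-11}$

Or using logs gives:

$n \ln 0.02 < -9 \ln 10 \;\Rightarrow\; n > \dfrac{-9 \ln 10}{\ln 0.02} = 5.30$

(The inequality reverses because $\ln 0.02$ is negative.)

Hence we require 6 components.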
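Both routes in P4.11 can be checked numerically. The following is a minimal Python sketch, not part of the course material; the function names `min_components_trial` and `min_components_logs` are made up for illustration, and it assumes (as the solution does) that the components fail independently.

```python
import math


def min_components_trial(p_fail: float, target: float) -> int:
    """Trial and improvement: increase n until p_fail**n drops below target."""
    n = 1
    while p_fail ** n >= target:
        n += 1
    return n


def min_components_logs(p_fail: float, target: float) -> int:
    """Logs: n > ln(target) / ln(p_fail), so take the next whole number above the bound."""
    bound = math.log(target) / math.log(p_fail)  # about 5.30 for this problem
    return math.floor(bound) + 1


print(min_components_trial(0.02, 1e-9))  # 6
print(min_components_logs(0.02, 1e-9))   # 6
```

The division by `math.log(p_fail)` is where the inequality flips, since the log of a probability below 1 is negative.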

Stats Pack-05: Advanced probability Page 1


Chapter 5

Advanced probability

Links to CT3: Chapter 2

Syllabus objectives:

(ii)2. Define probability as a set function on a collection of events, stating basic axioms.
(ii)3. Derive basic probabilities satisfied by the probability of occurrence of an event, and calculate probabilities of events in simple situations.
(ii)4. Derive the addition rule for the probability of the union of two events, and use the rule to calculate probabilities.
(ii)5. Define the conditional probability of one event given the occurrence of another event, and calculate such probabilities.
(ii)7. Define independence for two events, and calculate probabilities in situations involving independence.

0 Introduction

In the previous chapter we met mutually exclusive events A and B, for which:

$P(A \text{ or } B) = P(A) + P(B)$

We also met independent events A and B, for which:

$P(A \text{ and } B) = P(A) \times P(B)$



Finally, we met dependent events A and B, for which:

$P(A \text{ and } B) = P(A) \times P(B \mid A)$

where $P(B \mid A)$ is the probability of event B happening given that event A has already happened.

We are now going to apply these rules to more complicated problems using tree diagrams to help display the outcomes more clearly. We will also look at conditional probabilities, which often result in short 3-mark exam questions.



1 Probability diagrams

1.1 Listing outcomes

The probability that a train is late on any day is 0.15 independently of what day it is. What is the probability that the train is late on exactly one of the next two days?

Since the train being late on any day is independent of any other day we can say:

$P(\text{late and not late}) = P(\text{late}) \times P(\text{not late}) = 0.15 \times 0.85 = 0.1275$

However this is not the answer! Why? Because we need to consider the order in which the events occur (since we did not specify which of the two days the train was late on). It is helpful to list all the outcomes to ensure that we do not miss any possibilities. Using L to stand for “late” and N to stand for “not late” we get:

1st day    2nd day    Probability
L          L          0.15 × 0.15 = 0.0225
L          N          0.15 × 0.85 = 0.1275
N          L          0.85 × 0.15 = 0.1275
N          N          0.85 × 0.85 = 0.7225
                      Total = 1

Note that since we have listed all of the (mutually exclusive) possibilities their probabilities sum to 1. The options where the train is late on just one of the 2 days are:

1st day    2nd day    Probability
L          N          0.15 × 0.85 = 0.1275
N          L          0.85 × 0.15 = 0.1275

Hence, since these are mutually exclusive options (the train is either late or not late on the first day) we get:

$P(\text{train late on just one day}) = P(\text{LN or NL}) = P(\text{LN}) + P(\text{NL}) = 0.1275 + 0.1275 = 0.255$
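Listing outcomes by hand becomes tedious as the number of days grows, so it can help to see the same listing generated mechanically. The sketch below is an illustration only (the names `day_probs` and `outcome_table` are invented here) and it relies on the independence of the days stated in the example.

```python
from itertools import product

# Probability of each result on a single day (L = late, N = not late).
day_probs = {"L": 0.15, "N": 0.85}


def outcome_table(days: int) -> dict:
    """Map every sequence such as 'LN' to its probability, multiplying day by day."""
    table = {}
    for combo in product(day_probs, repeat=days):
        prob = 1.0
        for result in combo:
            prob *= day_probs[result]  # independence lets us multiply
        table["".join(combo)] = prob
    return table


table = outcome_table(2)
# Pick out the mutually exclusive outcomes with exactly one late day and add them.
exactly_one_late = sum(p for outcome, p in table.items() if outcome.count("L") == 1)
print(round(exactly_one_late, 4))  # 0.255
```

Summing `table.values()` gives 1 (up to floating-point rounding), which is a quick check that no outcome has been missed.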



Question 5.1

A ‘Lucky Wheel’ at a summer fair stops on a WIN with probability 0.3. The wheel is spun twice.

(i) List all the possible outcomes together with their probabilities.
(ii) Hence, calculate the probability that on two spins we get:
(a) no wins
(b) exactly one win
(c) at least one win.

Note that in Question 5.1 parts (ii)(a) and (ii)(c) satisfy:

$P(\text{at least 1 win}) = 1 - P(\text{no wins})$

Why? Well there are 3 possible (mutually exclusive) outcomes: no wins, 1 win and 2 wins. So $P(\text{no wins}) + P(\text{1 win}) + P(\text{2 wins}) = 1$. Hence:

$P(\text{at least 1 win}) = P(\text{1 win}) + P(\text{2 wins}) = 1 - P(\text{no wins})$

This is a useful shortcut for calculating ‘at least’ problems.
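The shortcut is easy to express in code. The sketch below applies it to the train example from Section 1.1 rather than to Question 5.1; the function name `at_least_one` is hypothetical and the formula assumes independent trials with the same probability each time.

```python
def at_least_one(p_event: float, trials: int) -> float:
    """P(event happens at least once) = 1 - P(event never happens)."""
    p_never = (1 - p_event) ** trials  # independent trials, so multiply
    return 1 - p_never


# Train late with probability 0.15 on each of two days.
shortcut = at_least_one(0.15, 2)

# Long way round: add the LL, LN and NL outcomes from the listing.
direct = 0.0225 + 0.1275 + 0.1275

print(round(shortcut, 4), round(direct, 4))  # 0.2775 0.2775
```

The saving grows with the number of trials: the direct sum needs every outcome containing at least one late day, whereas the shortcut only ever needs the single "never late" outcome.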

Question 5.2

In a darts tournament, the probability that a player hits a triple twenty with a single dart is $\frac{5}{8}$. The player throws three darts on his go.

(i) List all the possible outcomes together with their probabilities.
(ii) Hence, calculate the probability that on three throws he gets:
(a) exactly two triple twenties
(b) at least one triple twenty.



1.2 Tree diagrams

We now look at using tree diagrams as an alternative (and often quicker) way of listing the outcomes and finding probabilities. Consider the train example again:

The probability that a train is late on any day is 0.15 independently of what day it is. What is the probability that the train is late on exactly one of the next two days?

We draw a set of branches for each of the choices on the 1st event (Late or Not on the 1st day). Then from each of these choices we draw a further set of branches for the choices on the 2nd event. The final outcomes are obtained by following the various routes through the tree.

1st day    2nd day    Outcome
L          L          LL
L          N          LN
N          L          NL
N          N          NN

The probabilities are written along the branches. The probabilities for the final outcomes can be obtained by multiplying the probabilities along the routes through the tree.

1st day     2nd day     Outcome    Probability
L (0.15)    L (0.15)    LL         0.15 × 0.15 = 0.0225
L (0.15)    N (0.85)    LN         0.15 × 0.85 = 0.1275
N (0.85)    L (0.15)    NL         0.85 × 0.15 = 0.1275
N (0.85)    N (0.85)    NN         0.85 × 0.85 = 0.7225
                                   Total = 1

We can answer the question by again choosing the relevant outcomes of LN and NL. These give a total probability of 0.1275 + 0.1275 = 0.255 as before.



Question 5.3

In a penalty shoot out, the probability that a particular player scores a goal is 0.8 independent of which penalty he is taking. This player takes two penalties.

(i) Using G for goal and N for no goal, complete this probability tree:

1st penalty    2nd penalty    Outcome    Probability
G (0.8)        G (0.8)        GG         0.8 × 0.8 = 0.64
G (0.8)        N (   )
N (   )        G (   )
N (   )        N (   )

(ii) Hence, calculate the probability that on two shots the player scores:
(a) exactly one goal
(b) at least one goal.

Question 5.4

The probability that a car insurance claim involves a 3rd party is 0.6. If a claim involves a 3rd party, the probability that it is in excess of £2,000 is 0.95, otherwise it is only 0.25.

(i) Draw a tree diagram showing all the possible outcomes together with their probabilities.
(ii) Hence, calculate the probability that:
(a) a claim does not involve a 3rd party and is under £2,000
(b) a claim is in excess of £2,000.



Some problems do not give a ‘proper’ tree. For example, a triple bypass operation has a success rate of 70%. If the operation is unsuccessful it can be repeated once, but this time the probability of success is reduced to only 45%. Using S to represent a successful operation and F to represent a failure, the first branch is:

1st operation
S (0.7)
F (0.3)

But we only need a second operation if the first one is a failure. Hence, we will continue the tree for this branch only.

1st operation    2nd operation    Outcome    Probability
S (0.7)                           S          0.7
F (0.3)          S (0.45)         FS         0.3 × 0.45 = 0.135
F (0.3)          F (0.55)         FF         0.3 × 0.55 = 0.165

So if we required the probability that one of the operations is a success, this would be the S and FS outcomes with total probability of 0.7 + 0.135 = 0.835.
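An incomplete tree of this kind is just a dictionary of routes in code. This is a small illustrative sketch (the variable names are chosen here, not taken from the course) that reproduces the three outcomes and the 0.835 total.

```python
p_success_first = 0.7    # first operation succeeds
p_success_second = 0.45  # second operation succeeds, only attempted after a failure

# Each route through the tree, with probabilities multiplied along the branches.
outcomes = {
    "S": p_success_first,
    "FS": (1 - p_success_first) * p_success_second,
    "FF": (1 - p_success_first) * (1 - p_success_second),
}

p_eventual_success = outcomes["S"] + outcomes["FS"]
print(round(p_eventual_success, 3))       # 0.835
print(round(sum(outcomes.values()), 3))   # 1.0
```

Even though the S branch stops early, the three routes still cover every possibility, so their probabilities sum to 1.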

Question 5.5

A man aged exactly 70 has a policy with a life assurance company, which will pay his wife £50,000 if he dies in the next two years. The probability that he dies in the next year is 0.025, whereas the probability that he dies in the year after that is 0.028. Calculate the probability that he dies within the next two years.



We can also use probability trees to calculate probabilities involving messy dependent events. For example, in a group of 12 actuarial students, 8 of them are members of the Institute and 4 are members of the Faculty. Two different students are chosen at random one after the other. Find the probability that they both are members of the same organisation.

Using obvious notation, the first branch on the tree presents no problem:

1st student
I (8/12)
F (4/12)

Now, suppose the first student we had chosen was a member of the Institute. Then there would be 11 students left (7 Institute and 4 Faculty). So the branches would look like:

1st student    2nd student    Outcome    Probability
I (8/12)       I (7/11)       II         8/12 × 7/11 = 14/33
I (8/12)       F (4/11)       IF         8/12 × 4/11 = 8/33
F (4/12)

However, if the first student selected was a member of the Faculty, then there would be 11 students left (8 Institute and 3 Faculty). So the remaining branches are:

1st student    2nd student    Outcome    Probability
I (8/12)       I (7/11)       II         8/12 × 7/11 = 14/33
I (8/12)       F (4/11)       IF         8/12 × 4/11 = 8/33
F (4/12)       I (8/11)       FI         4/12 × 8/11 = 8/33
F (4/12)       F (3/11)       FF         4/12 × 3/11 = 3/33

Hence to calculate the probability that both students belong to the same organisation we require the outcomes II and FF. This gives a probability of:

$\frac{14}{33} + \frac{3}{33} = \frac{17}{33}$
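The same tree can be built with exact fractions, reducing the pool by one after the first pick. The helper `two_draws` below is a hypothetical name used only for this sketch; it assumes each remaining student is equally likely to be chosen.

```python
from fractions import Fraction


def two_draws(counts: dict) -> dict:
    """Probabilities of each ordered pair when two items are drawn without replacement."""
    total = sum(counts.values())
    table = {}
    for first, n_first in counts.items():
        remaining = dict(counts)
        remaining[first] -= 1  # the first student cannot be chosen again
        for second, n_second in remaining.items():
            table[first + second] = Fraction(n_first, total) * Fraction(n_second, total - 1)
    return table


table = two_draws({"I": 8, "F": 4})
same_organisation = table["II"] + table["FF"]
print(table["II"], table["IF"], table["FI"], table["FF"])  # 14/33 8/33 8/33 1/11
print(same_organisation)  # 17/33
```

`Fraction` reduces automatically, which is why FF prints as 1/11 rather than the 3/33 used in the tree.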

Question 5.6

An actuary has 5 “Section 32” pensions and 10 “personal” pensions to review.

(i) She selects two pensions at random. Calculate the probability that she selects:
(a) two different types of pensions
(b) at least one “Section 32” pension.

(ii) Suppose the actuary selects three pensions at random. Draw a new probability tree (or extend your old one) and calculate the probability that she selects:
(a) two “personal” pensions
(b) at least one “personal” pension.



2 Conditional Probability

We have already met conditional probabilities when we had dependent events. In Question 5.4, we were given the conditional probabilities in the question: “If a claim involves a 3rd party the probability that it is in excess of £2,000 is 0.95 otherwise it is only 0.25.”, ie:

$P(\text{claim exceeds £2,000} \mid \text{involves 3rd party}) = 0.95$

$P(\text{claim exceeds £2,000} \mid \text{does not involve 3rd party}) = 0.25$

When the actuary was choosing pensions in Question 5.6, on the other hand, we could calculate simple conditional probabilities by counting the number and types of pensions left in the pile after choosing the first one. For example, we had 15 pensions altogether, of which 10 were “personal” pensions, so:

$P(\text{2nd pension is "personal"} \mid \text{1st pension is "personal"}) = \frac{9}{14}$

In this section we will now calculate more complex conditional probabilities. Recall that our multiplication rule for dependent events A and B was:

$P(A \text{ and } B) = P(A) \times P(B \mid A)$

Rearranging this we get:

$P(B \mid A) = \dfrac{P(A \text{ and } B)}{P(A)}$

This is the rule we will use, but it is more usual to have the events A and B the other way around:

Conditional probabilities

For any two events A and B, the probability of event A happening given that event B has already happened is:

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$



To see how this works we shall consider the following example:

The probability that a student revises for the CA3 exam is 0.4. If a student revises, the probability that they pass is 0.7, otherwise it is only 0.1. Given that a student passes the CA3 exam, find the probability that the student revised.

Using R for revise, N for not revise, P for pass and F for fail we get the following tree diagram:

revise?    pass CA3?    Outcome    Probability
R (0.4)    P (0.7)      RP         0.4 × 0.7 = 0.28
R (0.4)    F (0.3)      RF         0.4 × 0.3 = 0.12
N (0.6)    P (0.1)      NP         0.6 × 0.1 = 0.06
N (0.6)    F (0.9)      NF         0.6 × 0.9 = 0.54

OK, we want the probability that the student revised, which would just be 0.4. But wait! We are told that the student passed, so we have a conditional probability: the probability the student revised given that they passed. The formula is:

$P(\text{revised} \mid \text{passed}) = \dfrac{P(\text{revised and passed})}{P(\text{passed})}$

The numerator is simply the RP outcome:

$P(\text{revised and passed}) = P(RP) = 0.28$

The probability that they passed is slightly trickier as there are two outcomes where the student passes (depending on whether they revised or not):

$P(\text{passed}) = P(RP \text{ or } NP) = 0.28 + 0.06 = 0.34$

Hence:

$P(\text{revised} \mid \text{passed}) = \dfrac{0.28}{0.34} = 0.824$ (3 SF)
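The calculation follows a fixed pattern: build the joint probabilities from the tree, add up the routes that match the condition, then divide. The sketch below walks through that pattern for the CA3 example; the variable names are chosen for illustration only.

```python
p_revise = 0.4
p_pass_given_revise = 0.7
p_pass_given_not_revise = 0.1

# Joint probabilities: multiply along each route of the tree.
joint = {
    "RP": p_revise * p_pass_given_revise,
    "RF": p_revise * (1 - p_pass_given_revise),
    "NP": (1 - p_revise) * p_pass_given_not_revise,
    "NF": (1 - p_revise) * (1 - p_pass_given_not_revise),
}

# Denominator: every route on which the student passes.
p_pass = joint["RP"] + joint["NP"]

# Conditional probability: P(revised | passed) = P(revised and passed) / P(passed).
p_revise_given_pass = joint["RP"] / p_pass
print(round(p_revise_given_pass, 3))  # 0.824
```

Knowing that the student passed roughly doubles the probability that they revised, from 0.4 to about 0.82.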



Question 5.7

On a Friday night the probability that a driver has been drinking is 0.2. If a driver has been drinking the probability that they have an accident is 0.05; otherwise it is 0.0001.

(i) Calculate the probability that a driver chosen at random on a Friday night has an accident.
(ii) At an accident on a Friday night the police carry out a breath test. What is the probability that the driver has been drinking?

A more intuitive way of thinking about conditional probabilities, $P(A \mid B)$, is to consider our original definition of the probability of event A happening from Chapter 4:

$P(A) = \dfrac{\text{number of ways event } A \text{ can happen}}{\text{total number of all possible outcomes}}$

We know that the event B has happened, so the numerator becomes the number of ways event A and B can happen. Similarly, since B has happened we no longer total over all of the possible outcomes but just over all the possible outcomes where B has happened.
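This counting view can be tried directly on a small sample space. As an illustration (an example chosen here, not one from the course), the sketch reuses the 12 students from Section 1.2 (8 Institute, 4 Faculty), lists every equally likely ordered pair, and counts only within the pairs where the second student is from the Institute.

```python
from fractions import Fraction
from itertools import permutations

students = ["I"] * 8 + ["F"] * 4

# Every ordered pair of two different students is equally likely (12 x 11 = 132 pairs).
pairs = list(permutations(range(len(students)), 2))

# Event B: the 2nd student is from the Institute.
pairs_b = [(i, j) for i, j in pairs if students[j] == "I"]

# Event A and B: the 1st student is from the Institute as well.
pairs_a_and_b = [(i, j) for i, j in pairs_b if students[i] == "I"]

# P(A | B) = (ways A and B can happen) / (ways B can happen)
p_first_given_second = Fraction(len(pairs_a_and_b), len(pairs_b))
print(len(pairs_a_and_b), len(pairs_b), p_first_given_second)  # 56 88 7/11
```

The formula gives the same answer: P(II) / P(2nd is I) = (14/33) / (22/33) = 7/11.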

Question 5.8

A blood test for a particular type of cancer is 95% accurate for a patient with the cancer and 98% accurate for a healthy patient. If only 6% of those actually tested have the cancer, calculate the probability that:

(i) a patient tests positive
(ii) the wrong result is given
(iii) a patient who gets a positive result actually has the cancer.

Common Error: Students often confuse $P(A \text{ and } B)$ with $P(A \mid B)$. Check to see if event B has happened already or not.



Extra practice questions

Section 1: Probability diagrams

P5.1 A new motorist starts on a “0% no claims bonus” with an insurance company. If they make no claims in the next year they will then advance to “20% no claims bonus”. Once at “20% no claims bonus”, if they make a claim they will return to “0% no claims bonus”, otherwise they will advance to “30% no claims bonus”. If the probability that the motorist makes a claim in any year is always 0.2, find the probability that after two years they have:
(i) “30% no claims bonus”
(ii) “20% no claims bonus”
(iii) “0% no claims bonus”.

P5.2 65% of the policyholders of a general insurance company are male. Of these male policyholders, 85% are aged over 55. The percentage of female policyholders that are aged over 55 is 68%. Calculate the probability that a randomly selected policyholder is:
(i) male and aged under 55
(ii) aged over 55.

P5.3 The probability of a student passing his Subject CT3 exam first time is 50%. If he fails his first attempt, the probability that he passes on his second attempt is 60%. What is the probability that:
(i) he fails on both attempts
(ii) he passes on either his first or second attempt?

P5.4 Two male and four female candidates are waiting in a room to be called for interview. Before lunch two candidates are called randomly one after the other. Calculate the probability that:
(i) two candidates of the same sex are called before lunch
(ii) at least one male candidate is called before lunch.

Section 2: Conditional probabilities

P5.5 The probability that a tutorial application is received early by ActEd is 0.45. If an application is received early, the probability that the student gets their first choice tutorial is 0.8. If the application is received late, the probability that the student gets their first choice tutorial is 0.3. An application is randomly selected. Calculate the probability that:
(i) a student doesn’t receive their first choice tutorial
(ii) a student who gets their first choice applied early.

P5.6 On the way to work I pass one set of traffic lights. The probability that these lights are red is $\frac{3}{5}$. If the lights are red, the probability that I will be late to work is $\frac{3}{8}$, otherwise the probability that I will be late to work is $\frac{1}{5}$. What is the probability that the lights were red if I arrive to work early?

P5.7 Subject C1, April 1994, Q1 (adapted)
In a certain constituency, 30% of the voters are “blue collar” workers, of whom 46% voted Conservative at the last United Kingdom election. Of the remaining voters, 36% voted Conservative. Consider a voter selected at random from those who voted Conservative in this constituency. What is the probability that this voter is a “blue collar” worker? [2]

P5.8 Subject C1, April 1998, Q2 (adapted)
Two students are selected at random, one after the other and without replacement, from a group of ten students of whom six are men and four are women. What is the probability that the first student is a man, given that the second selected is a man? [2]

P5.9 Subject 101, April 2003, Q3
The probability that a car accident is due to faulty brakes is 0.02, the probability that a car accident is correctly attributed to faulty brakes is 0.95, and the probability that a car accident is incorrectly attributed to faulty brakes is 0.01. Calculate the probability that a car accident, which is attributed to faulty brakes, was due to faulty brakes. [3]

P5.10 Subject 101, April 2002, Q1
Let ${}_t p_x$ denote the probability that a life aged x survives for at least a further t years, and consider three independent lives aged 40, 50 and 60 years such that:

${}_{10}p_{40} = 0.95, \quad {}_{10}p_{50} = 0.85, \quad {}_{10}p_{60} = 0.70$

(i) Determine the probability that exactly one of these three lives survives ten years. [2]
(ii) Determine the probability that it is the youngest life that survives, given that exactly one life survives ten years. [1]
[Total 3]



Chapter 5 Summary

Probability diagrams

You may find it helpful to list outcomes or draw a probability tree when you have several probabilities in a question. For a train that is late on any day with probability 0.15, we get the following for any two days considered:

1st day    2nd day    Probability
L          L          0.15 × 0.15 = 0.0225
L          N          0.15 × 0.85 = 0.1275
N          L          0.85 × 0.15 = 0.1275
N          N          0.85 × 0.85 = 0.7225
                      Total = 1

1st day     2nd day     Outcome    Probability
L (0.15)    L (0.15)    LL         0.15 × 0.15 = 0.0225
L (0.15)    N (0.85)    LN         0.15 × 0.85 = 0.1275
N (0.85)    L (0.15)    NL         0.85 × 0.15 = 0.1275
N (0.85)    N (0.85)    NN         0.85 × 0.85 = 0.7225

We can solve ‘at least’ problems by using complementary events.

Conditional Probability

For any two events A and B, the probability of event A happening given that event B has already happened is:

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$




Solution 5.1

(i) Using W for ‘win’ and N for ‘not win’, the outcomes are:

1st spin    2nd spin    Probability
W           W           0.3 × 0.3 = 0.09
W           N           0.3 × 0.7 = 0.21
N           W           0.7 × 0.3 = 0.21
N           N           0.7 × 0.7 = 0.49
                        Total = 1

(ii) (a) $P(\text{no wins}) = P(NN) = 0.49$

(b) $P(\text{exactly 1 win}) = P(WN \text{ or } NW) = P(WN) + P(NW) = 0.21 + 0.21 = 0.42$

(c) $P(\text{at least 1 win}) = P(WW) + P(NW) + P(WN) = 0.09 + 0.21 + 0.21 = 0.51$

Solution 5.2

(i) Using T to represent ‘triple twenty’ and N to represent ‘not triple twenty’:

1st dart    2nd dart    3rd dart    Probability
T           T           T           5/8 × 5/8 × 5/8 = 125/512
T           T           N           5/8 × 5/8 × 3/8 = 75/512
T           N           T           5/8 × 3/8 × 5/8 = 75/512
N           T           T           3/8 × 5/8 × 5/8 = 75/512
T           N           N           5/8 × 3/8 × 3/8 = 45/512
N           T           N           3/8 × 5/8 × 3/8 = 45/512
N           N           T           3/8 × 3/8 × 5/8 = 45/512
N           N           N           3/8 × 3/8 × 3/8 = 27/512
                                    Total = 1

(ii) (a) $P(\text{exactly 2 triple twenties}) = P(TTN \text{ or } TNT \text{ or } NTT) = P(TTN) + P(TNT) + P(NTT)$

$= \frac{75}{512} + \frac{75}{512} + \frac{75}{512} = \frac{225}{512}$

(b) Using the shortcut rule, we get:

$P(\text{at least 1 triple twenty}) = 1 - P(\text{no triple twenties}) = 1 - P(NNN) = 1 - \frac{27}{512} = \frac{485}{512}$

Solution 5.3

(i) The completed tree diagram is:

1st shot    2nd shot    Outcome    Probability
G (0.8)     G (0.8)     GG         0.8 × 0.8 = 0.64
G (0.8)     N (0.2)     GN         0.8 × 0.2 = 0.16
N (0.2)     G (0.8)     NG         0.2 × 0.8 = 0.16
N (0.2)     N (0.2)     NN         0.2 × 0.2 = 0.04

(ii) (a) $P(\text{exactly 1 goal}) = P(GN \text{ or } NG) = P(GN) + P(NG) = 0.16 + 0.16 = 0.32$

(b) $P(\text{at least 1 goal}) = 1 - P(\text{no goals}) = 1 - P(NN) = 1 - 0.04 = 0.96$



Solution 5.4

(i) Using T and N to represent ‘involves a 3rd party’ and ‘does not involve a 3rd party’, respectively, and using > and < to represent ‘more than £2,000’ and ‘less than £2,000’, respectively, gives:

3rd party?    exceeds £2,000?    Outcome    Probability
T (0.6)       > (0.95)           T >        0.6 × 0.95 = 0.57
T (0.6)       < (0.05)           T <        0.6 × 0.05 = 0.03
N (0.4)       > (0.25)           N >        0.4 × 0.25 = 0.1
N (0.4)       < (0.75)           N <        0.4 × 0.75 = 0.3

(ii) (a) $P(N <) = 0.3$

(b) $P(\text{exceeds £2,000}) = P(T > \text{ or } N >) = P(T >) + P(N >) = 0.57 + 0.1 = 0.67$



Solution 5.5

Be careful! It’s tempting to give the probability as:

$P(\text{die in 1st year}) + P(\text{die in 2nd year}) = 0.025 + 0.028 = 0.053$

But this ignores the fact that for the man to die in the 2nd year he must survive the 1st year! Using D to represent ‘dies’ and N for ‘does not die’ we get:

1st year     2nd year     Outcome    Probability
D (0.025)                 D          0.025
N (0.975)    D (0.028)    ND         0.975 × 0.028 = 0.0273
N (0.975)    N (0.972)    NN         0.975 × 0.972 = 0.9477

So the probability that the man dies within the next 2 years is given by:

$P(\text{die in 1st year}) + P(\text{die in 2nd year}) = P(D) + P(ND) = 0.025 + 0.0273 = 0.0523$



Solution 5.6

(i) Using P for ‘personal pension’ and S for ‘Section 32 pension’, the tree diagram is:

1st pension    2nd pension    Outcome    Probability
P (10/15)      P (9/14)       PP         10/15 × 9/14 = 3/7
P (10/15)      S (5/14)       PS         10/15 × 5/14 = 5/21
S (5/15)       P (10/14)      SP         5/15 × 10/14 = 5/21
S (5/15)       S (4/14)       SS         5/15 × 4/14 = 2/21

(a) For two different types of pensions we require the PS and SP outcomes:

$P(\text{2 different pensions}) = P(PS \text{ or } SP) = P(PS) + P(SP) = \frac{5}{21} + \frac{5}{21} = \frac{10}{21}$

(b) Using the ‘at least’ shortcut, we get:

$P(\text{at least 1 Section 32 pension}) = 1 - P(\text{no Section 32 pensions}) = 1 - P(PP) = 1 - \frac{3}{7} = \frac{4}{7}$



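(ii) (a) The probability tree is:

1st pension    2nd pension    3rd pension    Outcome    Probability
P (10/15)      P (9/14)       P (8/13)       PPP        10/15 × 9/14 × 8/13 = 24/91
P (10/15)      P (9/14)       S (5/13)       PPS        10/15 × 9/14 × 5/13 = 15/91
P (10/15)      S (5/14)       P (9/13)       PSP        10/15 × 5/14 × 9/13 = 15/91
P (10/15)      S (5/14)       S (4/13)       PSS        10/15 × 5/14 × 4/13 = 20/273
S (5/15)       P (10/14)      P (9/13)       SPP        5/15 × 10/14 × 9/13 = 15/91
S (5/15)       P (10/14)      S (4/13)       SPS        5/15 × 10/14 × 4/13 = 20/273
S (5/15)       S (4/14)       P (10/13)      SSP        5/15 × 4/14 × 10/13 = 20/273
S (5/15)       S (4/14)       S (3/13)       SSS        5/15 × 4/14 × 3/13 = 2/91

$P(\text{2 personal pensions}) = P(PPS \text{ or } PSP \text{ or } SPP) = P(PPS) + P(PSP) + P(SPP)$

$= \frac{15}{91} + \frac{15}{91} + \frac{15}{91} = \frac{45}{91}$

(b) $P(\text{at least 1 personal pension}) = 1 - P(\text{no personal pensions}) = 1 - P(SSS) = 1 - \frac{2}{91} = \frac{89}{91}$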
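A three-level tree has eight routes, so it is worth checking mechanically. The sketch below is one possible way to do so: the recursive helper `draw_without_replacement` is a hypothetical name introduced here, and it assumes every pension left in the pile is equally likely to be picked.

```python
from fractions import Fraction


def draw_without_replacement(counts: dict, draws: int) -> dict:
    """Map each ordered outcome (eg 'PSP') to its exact probability."""
    if draws == 0:
        return {"": Fraction(1)}
    total = sum(counts.values())
    table = {}
    for kind, n in counts.items():
        if n == 0:
            continue  # nothing of this type left to pick
        remaining = dict(counts)
        remaining[kind] -= 1  # remove the chosen pension from the pile
        for tail, p_tail in draw_without_replacement(remaining, draws - 1).items():
            table[kind + tail] = Fraction(n, total) * p_tail
    return table


table = draw_without_replacement({"P": 10, "S": 5}, 3)
two_personal = sum(p for outcome, p in table.items() if outcome.count("P") == 2)
at_least_one_personal = 1 - table["SSS"]
print(two_personal, at_least_one_personal)  # 45/91 89/91
```

Calling the same function with `draws=2` reproduces the four outcomes of part (i).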



Solution 5.7

Using D for ‘drinking’, S for ‘sober’, A for ‘accident’ and N for ‘no accident’ we get:

drinking?    accident?     Outcome    Probability
D (0.2)      A (0.05)      DA         0.2 × 0.05 = 0.01
D (0.2)      N (0.95)      DN         0.2 × 0.95 = 0.19
S (0.8)      A (0.0001)    SA         0.8 × 0.0001 = 0.00008
S (0.8)      N (0.9999)    SN         0.8 × 0.9999 = 0.79992

(i) We just require a ‘normal’ probability:

$P(\text{accident}) = P(DA \text{ or } SA) = P(DA) + P(SA) = 0.01 + 0.00008 = 0.01008$

(ii) We are told that an accident has occurred so we require:

$P(\text{drinking} \mid \text{accident}) = \dfrac{P(\text{drinking and accident})}{P(\text{accident})} = \dfrac{0.01}{0.01008} = 0.992$



Solution 5.8

It can be easy to get confused about which way round the branches go in the tree diagram as the information is given back-to-front. The key is to notice that we need to know whether the patient has the cancer or not before we are able to calculate whether their test result is positive or not. Using C for ‘cancer’, H for ‘healthy’, P for ‘positive result’ and N for ‘negative result’ we get the following:

cancer?     test result    Outcome    Probability
C (0.06)    P (0.95)       CP         0.06 × 0.95 = 0.057
C (0.06)    N (0.05)       CN         0.06 × 0.05 = 0.003
H (0.94)    P (0.02)       HP         0.94 × 0.02 = 0.0188
H (0.94)    N (0.98)       HN         0.94 × 0.98 = 0.9212

(i) $P(\text{positive result}) = P(CP \text{ or } HP) = P(CP) + P(HP) = 0.057 + 0.0188 = 0.0758$

(ii) $P(\text{wrong result}) = P(CN \text{ or } HP) = P(CN) + P(HP) = 0.003 + 0.0188 = 0.0218$

(iii) $P(\text{cancer} \mid \text{positive}) = \dfrac{P(\text{cancer and positive})}{P(\text{positive})} = \dfrac{0.057}{0.0758} = 0.752$

A rather worrying result!
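To see why the answer to part (iii) is so much lower than the 95% accuracy figure, it can help to recompute the tree in code and then vary the inputs. This is an illustrative sketch with variable names chosen here; the three probabilities are those given in Question 5.8.

```python
p_cancer = 0.06                 # proportion of tested patients with the cancer
p_pos_given_cancer = 0.95       # test is 95% accurate for a patient with the cancer
p_neg_given_healthy = 0.98      # test is 98% accurate for a healthy patient

# Joint probabilities for the four routes through the tree.
joint = {
    "CP": p_cancer * p_pos_given_cancer,
    "CN": p_cancer * (1 - p_pos_given_cancer),
    "HP": (1 - p_cancer) * (1 - p_neg_given_healthy),
    "HN": (1 - p_cancer) * p_neg_given_healthy,
}

p_positive = joint["CP"] + joint["HP"]             # part (i)
p_wrong = joint["CN"] + joint["HP"]                # part (ii)
p_cancer_given_positive = joint["CP"] / p_positive # part (iii)

print(round(p_positive, 4), round(p_wrong, 4), round(p_cancer_given_positive, 3))
# 0.0758 0.0218 0.752
```

Because healthy patients make up 94% of those tested, even a 2% false positive rate contributes a sizeable share (0.0188 out of 0.0758) of all positive results.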




P5.1 The tree diagram is:

After 1st year    After 2nd year    Probability
0% (0.2)          0% (0.2)          0.2 × 0.2 = 0.04
0% (0.2)          20% (0.8)         0.2 × 0.8 = 0.16
20% (0.8)         0% (0.2)          0.8 × 0.2 = 0.16
20% (0.8)         30% (0.8)         0.8 × 0.8 = 0.64

Notice how we are just interested in the last state rather than the journey there!

(i) $P(30\%) = 0.64$

(ii) $P(20\%) = 0.16$

(iii) $P(0\%) = 0.04 + 0.16 = 0.2$

P5.2 Using M for ‘male’, F for ‘female’, O for ‘over 55’ and N for ‘not over 55’ we get:

sex         over 55?    Outcome    Probability
M (0.65)    O (0.85)    MO         0.65 × 0.85 = 0.5525
M (0.65)    N (0.15)    MN         0.65 × 0.15 = 0.0975
F (0.35)    O (0.68)    FO         0.35 × 0.68 = 0.238
F (0.35)    N (0.32)    FN         0.35 × 0.32 = 0.112

(i) $P(\text{male and under 55}) = P(MN) = 0.0975$

(ii) $P(\text{over 55}) = P(MO \text{ or } FO) = P(MO) + P(FO) = 0.5525 + 0.238 = 0.7905$



P5.3 Since the student will not want to take the exam again if he’s passed it (unless he’s a masochist) we will have an ‘incomplete’ tree diagram. Using P for ‘pass’ and F for ‘fail’ we get:

1st attempt    2nd attempt    Outcome    Probability
P (0.5)                       P          0.5
F (0.5)        P (0.6)        FP         0.5 × 0.6 = 0.3
F (0.5)        F (0.4)        FF         0.5 × 0.4 = 0.2

(i) $P(\text{fail both times}) = P(FF) = 0.2$

(ii) $P(\text{pass on 1st or 2nd time}) = P(P \text{ or } FP) = P(P) + P(FP) = 0.5 + 0.3 = 0.8$

Alternatively, we could just calculate $1 - P(FF) = 1 - 0.2 = 0.8$.

P5.4 Using M for ‘male’ and F for ‘female’, we get:

1st candidate    2nd candidate    Outcome    Probability
M (2/6)          M (1/5)          MM         2/6 × 1/5 = 1/15
M (2/6)          F (4/5)          MF         2/6 × 4/5 = 4/15
F (4/6)          M (2/5)          FM         4/6 × 2/5 = 4/15
F (4/6)          F (3/5)          FF         4/6 × 3/5 = 2/5

(i) $P(\text{same sex}) = P(MM \text{ or } FF) = P(MM) + P(FF) = \frac{1}{15} + \frac{2}{5} = \frac{7}{15}$

(ii) $P(\text{at least 1 male}) = 1 - P(\text{no males}) = 1 - P(FF) = 1 - \frac{2}{5} = \frac{3}{5}$



P5.5 Using E for ‘early application’, L for ‘late application’, F for ‘get first tutorial’ and N for ‘does not get first tutorial’ we get:

application    first choice?    Outcome    Probability
E (0.45)       F (0.8)          EF         0.45 × 0.8 = 0.36
E (0.45)       N (0.2)          EN         0.45 × 0.2 = 0.09
L (0.55)       F (0.3)          LF         0.55 × 0.3 = 0.165
L (0.55)       N (0.7)          LN         0.55 × 0.7 = 0.385

(i) $P(\text{not get 1st choice}) = P(EN \text{ or } LN) = P(EN) + P(LN) = 0.09 + 0.385 = 0.475$

(ii) We are told that the student gets their first choice, so we have a conditional probability:

$P(\text{applied early} \mid \text{gets 1st choice}) = \dfrac{P(\text{applied early and gets 1st choice})}{P(\text{gets 1st choice})}$

So we require:

$P(\text{gets 1st choice}) = P(EF \text{ or } LF) = P(EF) + P(LF) = 0.36 + 0.165 = 0.525$

or we could have used our result from part (i) and:

$P(\text{gets 1st choice}) = 1 - P(\text{not get 1st choice}) = 1 - 0.475 = 0.525$

Hence:

$P(\text{applied early} \mid \text{gets 1st choice}) = \dfrac{0.36}{0.525} = 0.686$



P5.6 Using R for ‘red’, N for ‘not red’, E for ‘early’ and L for ‘late’ we get:

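[Tree diagram: the first stage is R (3/5) or N (2/5). After R, the second stage is E (5/8) or L (3/8); after N, the second stage is E (4/5) or L (1/5).]

RE: $\frac{3}{5} \times \frac{5}{8} = \frac{3}{8}$
RL: $\frac{3}{5} \times \frac{3}{8} = \frac{9}{40}$
NE: $\frac{2}{5} \times \frac{4}{5} = \frac{8}{25}$
NL: $\frac{2}{5} \times \frac{1}{5} = \frac{2}{25}$

We are told that we arrive at work early, so we have a conditional probability:

$$P(\text{red} \mid \text{arrive early}) = \frac{P(\text{red and arrive early})}{P(\text{arrive early})}$$

So we require:

$$P(\text{arrive early}) = P(RE \text{ or } NE) = P(RE) + P(NE) = \frac{3}{8} + \frac{8}{25} = \frac{139}{200}$$

Hence:

$$P(\text{red} \mid \text{arrive early}) = \frac{3/8}{139/200} = \frac{75}{139}$$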



P5.7 Using B for ‘blue collar’, N for ‘not blue collar’, C for ‘vote Conservative’ and V for ‘vote something else’ we get:

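[Tree diagram: the voter is B (0.3) or N (0.7). After B, the vote is C (0.46) or V (0.54); after N, the vote is C (0.36) or V (0.64).]

BC: $0.3 \times 0.46 = 0.138$
BV: $0.3 \times 0.54 = 0.162$
NC: $0.7 \times 0.36 = 0.252$
NV: $0.7 \times 0.64 = 0.448$

We want:

$$P(\text{blue collar} \mid \text{vote conservative}) = \frac{P(\text{blue collar and vote conservative})}{P(\text{vote conservative})}$$

So we need to calculate:

$$P(\text{vote conservative}) = P(BC \text{ or } NC) = P(BC) + P(NC) = 0.138 + 0.252 = 0.39$$

Hence:

$$P(\text{blue collar} \mid \text{vote conservative}) = \frac{0.138}{0.39} = 0.354$$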



P5.8 Using M for ‘male’ and F for ‘female’ we get:

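[Tree diagram: the first student is M (6/10) or F (4/10). After M, the second student is M (5/9) or F (4/9); after F, the second student is M (6/9) or F (3/9).]

MM: $\frac{6}{10} \times \frac{5}{9} = \frac{1}{3}$
MF: $\frac{6}{10} \times \frac{4}{9} = \frac{4}{15}$
FM: $\frac{4}{10} \times \frac{6}{9} = \frac{4}{15}$
FF: $\frac{4}{10} \times \frac{3}{9} = \frac{2}{15}$

We want:

$$P(\text{1st student male} \mid \text{2nd student male}) = \frac{P(MM)}{P(\text{2nd student male})}$$

So we need to calculate:

$$P(\text{2nd student male}) = P(MM \text{ or } FM) = P(MM) + P(FM) = \frac{1}{3} + \frac{4}{15} = \frac{3}{5}$$

Hence:

$$P(\text{1st student male} \mid \text{2nd student male}) = \frac{1/3}{3/5} = \frac{5}{9}$$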



P5.9 Using B to denote “due to faulty brakes”, F to denote “attributed to faulty brakes” and N to denote “not faulty brakes”, we get:

[Tree diagram: the accident is due to B (0.02) or N (0.98). After B, the accident is attributed to F (0.95) or N (0.05); after N, the accident is attributed to F (0.01) or N (0.99).]

BF: $0.02 \times 0.95 = 0.019$
BN: $0.02 \times 0.05 = 0.001$
NF: $0.98 \times 0.01 = 0.0098$
NN: $0.98 \times 0.99 = 0.9702$

We require:

$$P(\text{due to faulty brakes} \mid \text{attributed to faulty brakes}) = \frac{P(\text{due to faulty brakes and attributed to faulty brakes})}{P(\text{attributed to faulty brakes})}$$

So we require:

$$P(\text{attributed to faulty brakes}) = P(BF \text{ or } NF) = P(BF) + P(NF) = 0.019 + 0.0098 = 0.0288$$

Hence:

$$P(\text{due to faulty brakes} \mid \text{attributed to faulty brakes}) = \frac{0.019}{0.0288} = 0.660$$



P5.10 (i) We can draw a 3 part tree diagram for the three lives but it’s messy. It’s easier to just consider the options we need (we’ll look at this in greater depth in the next chapter).

Using L for ‘live’ and D for ‘die’ and considering the 40, 50 and 60 year olds in order, the options for exactly one life surviving are:

LDD: the 40-year-old surviving and the other two not surviving
or DLD: the 50-year-old surviving and the other two not surviving
or DDL: the 60-year-old surviving and the other two not surviving

So:

$$\begin{aligned}
P(\text{exactly 1 life survives}) &= P(LDD \text{ or } DLD \text{ or } DDL) \\
&= P(LDD) + P(DLD) + P(DDL) \\
&= (0.95 \times 0.15 \times 0.3) + (0.05 \times 0.85 \times 0.3) + (0.05 \times 0.15 \times 0.7) \\
&= 0.04275 + 0.01275 + 0.00525 \\
&= 0.06075
\end{aligned}$$

(ii) The youngest life is the 40 year old, so we want:

$$P(\text{40 year old survives} \mid \text{exactly 1 life survives}) = \frac{P(LDD)}{P(\text{exactly 1 life survives})} = \frac{0.04275}{0.06075} = 0.704$$
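As a cross-check, the eight possible outcomes for the three lives can be listed by computer rather than by drawing the full tree. The following is a minimal Python sketch, not part of the course material; it uses the survival probabilities from the working above and, like that working, multiplies them as if the three lives were independent. The names `p_live` and `outcome_probability` are our own.

```python
from itertools import product

# Survival probabilities for the 40-, 50- and 60-year-old (from the working above)
p_live = [0.95, 0.85, 0.7]


def outcome_probability(outcome):
    """Probability of one pattern such as ('L', 'D', 'D'), multiplying life by life."""
    prob = 1.0
    for state, p in zip(outcome, p_live):
        prob *= p if state == "L" else 1 - p
    return prob


# All 2 x 2 x 2 = 8 outcomes, keeping those with exactly one survivor
exactly_one = [o for o in product("LD", repeat=3) if o.count("L") == 1]
p_exactly_one = sum(outcome_probability(o) for o in exactly_one)

# Conditional probability that the survivor is the 40-year-old
p_youngest = outcome_probability(("L", "D", "D")) / p_exactly_one

print(exactly_one)
print(round(p_exactly_one, 5), round(p_youngest, 3))
```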

Stats Pack-06: Permutations and combinations Page 1


Chapter 6

Permutations and combinations

Links to CT3: The mathematical skills that are assumed knowledge for the CT Series Subjects are detailed in the Student Handbook. This chapter covers the following aspect of that assumed knowledge: permutations and combinations

0 Introduction

For certain questions a list or a probability tree might be too cumbersome. In these circumstances we can use combinations to quickly identify the number of relevant results. Combinations are also used in the binomial, hypergeometric and negative binomial distributions. An introduction to the first two of these distributions is included in this chapter. The binomial and negative binomial distributions will be met in more detail in Chapter 9.



1 Simple Choosing

Consider two spinners:

[Figure: two spinners. The first is labelled A and B; the second is labelled 1, 2 and 3.]

If we spin each of them once we can get the following results:

1st spinner   2nd spinner
A             1
A             2
A             3
B             1
B             2
B             3

There are 2 possible choices on the first spinner and each of these can go with the 3 possible choices on the second spinner. Hence, there are:

$2 \times 3 = 6$ results

Question 6.1

How many different results can we get with one spin of each of these:

[Figure: two spinners. The first is labelled A, B and C; the second is labelled 1, 2, 3 and 4.]



We can extend this to more than 2 spinners:

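[Figure: three spinners. The first is labelled A and B, the second is labelled 1, 2 and 3, and the third is labelled a and b.]

For each of the 6 results we got previously we can get an a or a b on the end. Hence there are:

$2 \times 3 \times 2 = 12$ results

All we are doing is multiplying together the number of choices for each option.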
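The ‘multiply the number of choices’ rule can be checked by listing every result. The following is a minimal Python sketch, not part of the course material; the variable names are our own, and the standard-library function `itertools.product` does the listing.

```python
from itertools import product

spinner_1 = ["A", "B"]
spinner_2 = [1, 2, 3]
spinner_3 = ["a", "b"]

# Every possible result of one spin of each spinner
results = list(product(spinner_1, spinner_2, spinner_3))

print(len(results))  # 12
print(results[:3])   # [('A', 1, 'a'), ('A', 1, 'b'), ('A', 2, 'a')]
```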

Question 6.2

At a restaurant the menu has:

2 starters (soup or prawn cocktail)
4 main courses (beef, veal, lamb or nut risotto)
3 desserts (chocolate cake, fruit pie or ice-cream)

How many different meals can be ordered?



2 Permutations

2.1 Permutations of all objects

We will now extend our method to where choosing an option reduces the number of choices left. Consider a race with 3 runners: Alfie, Belinda and Charlie. We wish to calculate the number of ways the 1st, 2nd and 3rd prizes can be awarded. We could list all of the different arrangements:

1st   2nd   3rd
A     B     C
A     C     B
B     A     C
B     C     A
C     A     B
C     B     A

However, it is quicker to use the ‘multiplying method’. Considering how many possibilities there are for each position in turn:

There are 3 choices for the first position (Alfie, Belinda or Charlie).

Suppose Belinda is in first place; there are now only two people left. So there are 2 choices left for the second position (Alfie and Charlie).

Suppose Charlie is in second place; there is now only one person left. Hence there is only one choice left for the third position (Alfie).

Hence the total number of arrangements is:

$3 \times 2 \times 1 = 6$

Question 6.3

Four people (Alfie, Belinda, Charlie and Delilah) run a race. How many different ways are there for the four of them to cross the finish line?

Each different arrangement (or order) is called a permutation. For 3 people there were $3 \times 2 \times 1$ permutations. We use the notation $3!$ (pronounced ‘3 factorial’) to represent this calculation. Similarly for 4 people there were $4 \times 3 \times 2 \times 1$ permutations, so we would write this as $4!$.



Definition

$n!$ (pronounced ‘n factorial’) is defined by:

$$n! = n \times (n-1) \times \cdots \times 3 \times 2 \times 1$$

In general, for $n$ objects there are $n!$ different permutations. You should find a factorial button on your calculator.
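To see that the factorial really does count the arrangements, the orders for the three-runner race can be listed and counted. The following is a minimal Python sketch, not part of the course material; it uses the standard-library functions `itertools.permutations` and `math.factorial`.

```python
from itertools import permutations
from math import factorial

runners = ["Alfie", "Belinda", "Charlie"]

# Every possible finishing order (1st, 2nd, 3rd)
orders = list(permutations(runners))
for order in orders:
    print(order)

print(len(orders), factorial(3))  # 6 6
```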

Question 6.4

For a phone call from a telephone box I have the following coins in my pocket: £2 £1 50p 20p 10p How many different orders are there of putting the coins into the slot if I use them all during my phone call?

2.2 Permutations of some of the objects

We now look at cases where we are only interested in the order of some of the objects or people we are looking at. Consider the following example:

A race has 8 competitors. There are gold, silver and bronze medals for the first three places. How many different ways are there of allocating the medals to the competitors?

The gold medal can go to any of the eight competitors. However, once the gold is awarded, the silver medal can only go to one of the remaining seven competitors. Finally the bronze can then only be awarded to one of the six remaining competitors. Hence, the total number of ways the medals can be awarded is:

$8 \times 7 \times 6 = 336$

Question 6.5

In another race there are 10 competitors and there are prizes given for the first four places. How many different ways are there of allocating the prizes amongst the competitors?



Now we can’t write $8 \times 7 \times 6$ as $8!$ since we are missing the $5 \times 4 \times 3 \times 2 \times 1$. However, we can write it as:

$$8 \times 7 \times 6 = \frac{8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{5 \times 4 \times 3 \times 2 \times 1} = \frac{8!}{5!}$$

Note where the 8 and the 5 come from in our formula: we have 8 competitors altogether, we choose 3 and $8 - 3 = 5$.

Question 6.6

Write your result for Question 6.5 in terms of factorials.

So in general if we have $n$ objects of which we find arrangements (permutations) for just $r$ of them, we get:

$$\frac{n!}{(n-r)!}$$

We use the notation ${}^{n}P_{r}$ to stand for the number of permutations. You will find this button on your calculator.

Permutations

The number of permutations (arrangements) of $r$ objects chosen from $n$ objects altogether is:

$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$
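The permutation formula can be checked against the medal example from earlier in this section. The following is a minimal Python sketch, not part of the course material; `n_p_r` is our own illustrative helper, while `math.perm` is a standard-library function (available from Python 3.8) that plays the role of the calculator button.

```python
from math import factorial, perm


def n_p_r(n, r):
    """Number of permutations of r objects chosen from n, using n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


# 8 competitors, 3 medals
print(n_p_r(8, 3))  # 336
print(perm(8, 3))   # 336
```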

Question 6.7

Calculate:

(i) ${}^{9}P_{5}$

(ii) ${}^{6}P_{0}$



Question 6.8

Thirty people buy one raffle ticket each to win one of four prizes. The prizes are a car, a holiday, a bicycle and a hamper. In how many different ways can the prizes be allocated?

Earlier we found that the number of permutations of all $n$ objects was $n!$. However, using our formula with $r = n$ we get:

$${}^{n}P_{n} = \frac{n!}{0!}$$

This means that we must define $0!$ to be:

Definition

$$0! = 1$$

This is just a definition so that our general formula for ${}^{n}P_{r}$ works for all values of $r$. There is no reason why this definition should make sense!

Before we go on to the next section, do note that when we look at the different permutations (arrangements) the order is important. For example in the race, Alfie 1st, Belinda 2nd and Charlie 3rd is different from Alfie 1st, Charlie 2nd and Belinda 3rd.



3 Combinations

So far we have considered problems where the order is important. We now look at the number of ways of picking (or selecting) objects where the order in which they are selected doesn’t matter.

Hannah has 2 spare tickets to see Munchester Utd play. She has 5 friends to choose from: Alicia, Belinda, Charli, Delilah and Emily. How many different ways of picking 2 of her friends are there?

Here the order in which she picks the friends does not matter: choosing Alicia and Belinda is just the same as picking Belinda and Alicia, since the same two friends go. From first principles the combinations are:

AB AC AD AE BC BD BE CD CE DE

10 ways

Now if we had considered the order (ie the permutations) there would have been:

${}^{5}P_{2} = 5 \times 4 = 20$ ways

However, we have only 10 ways so we have divided by 2. Why? Because each pair of friends chosen can be picked in 2 different orders that give the same pair, eg AB and BA.

Question 6.9

Hannah now has 3 spare tickets (after finding one down the side of the sofa).

(i) List all the different combinations of her friends that could go with her.

(ii) How can we get this figure from the number of permutations?

Each different selection of friends is called a combination.



When we had 5 friends and were choosing 2, we took the number of permutations, ${}^{5}P_{2}$, and divided it by $2 = 2!$, which was the number of different orders of the 2 friends chosen.

In Question 6.9 we had 5 friends and were choosing 3. We took the number of permutations, ${}^{5}P_{3}$, and divided it by $6 = 3!$, which was the number of different orders of the 3 friends chosen.

In general, if we have $n$ objects from which we are choosing $r$ of them (when the order does not matter), we have:

$$\frac{{}^{n}P_{r}}{r!} = \frac{n!}{(n-r)!\,r!} \text{ combinations}$$

We use the notation ${}^{n}C_{r}$ to stand for the number of combinations. You will find this button on your calculator. The Core Reading for Subject CT3 uses the alternative notation of $\binom{n}{r}$.

Combinations

The number of ways of choosing $r$ objects from $n$ when order doesn’t matter (ie the number of combinations) is:

$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$
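The link between permutations and combinations can be checked on the example of Hannah choosing 2 of her 5 friends. The following is a minimal Python sketch, not part of the course material; it uses the standard-library functions `itertools.combinations`, `math.comb` and `math.perm` (the last two need Python 3.8 or later), and the single-letter labels are the initials used in the text.

```python
from itertools import combinations
from math import comb, factorial, perm

friends = ["A", "B", "C", "D", "E"]

# List every selection of 2 friends, ignoring order
pairs = ["".join(pair) for pair in combinations(friends, 2)]
print(pairs)
# ['AB', 'AC', 'AD', 'AE', 'BC', 'BD', 'BE', 'CD', 'CE', 'DE']

# Listing, dividing the permutations by 2!, and the nCr formula all agree
print(len(pairs), perm(5, 2) // factorial(2), comb(5, 2))  # 10 10 10
```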

Question 6.10

A football team manager has a squad of 20 players and wishes to pick a team of 11 for Saturday’s game. How many different teams could she pick?

Question 6.11

In the National Lottery the machine chooses 6 winning balls from 49.

(i) How many different combinations are there of choosing the 6 winning balls?

(ii) What is the probability of winning the jackpot (ie choosing all 6 balls)?



We now look at problems where we are choosing from more than one group.

A committee has to choose 3 women and 2 men from a group of 5 women and 8 men. How many different committees can be chosen?

There are ${}^{5}C_{3} = 10$ ways of picking the 3 women from the 5 women.

There are ${}^{8}C_{2} = 28$ ways of picking the 2 men from the 8 men.

Hence there are ${}^{5}C_{3} \times {}^{8}C_{2} = 10 \times 28 = 280$ ways of picking this committee altogether.

Question 6.12

A student research group of 6 people must contain equal numbers of each sex. If it is to be chosen from a group of 10 male and 8 female students, how many different ways of picking this group are there?

Question 6.13

In the National Lottery the machine chooses 6 winning balls from 49. Choosing 3 of the winning balls wins the £10 prize.

(i) How many ways are there of choosing 3 correct balls?

Hint: Be careful, it’s not just ${}^{6}C_{3}$.

(ii) What is the probability of winning the £10 prize?

Hint: You will need to use your answer from Question 6.11 (i).



4 Using combinations to calculate probabilities

We now look at how we can use combinations to calculate probabilities that would be cumbersome to work out using lists or probability trees.

4.1 Combinations of independent events

For example, on the way to work I pass through 4 sets of traffic lights. The probability that each set is green when I reach them is 0.6. What is the probability that exactly 2 sets of traffic lights are green? Using G to stand for “green light” and N to stand for “not green light”, the probability tree for this question is:

[Tree diagram: four successive sets of traffic lights, each branching into G (0.6) and N (0.4), giving the 16 outcomes from GGGG to NNNN.]

The options I require are:

GGNN: $0.6 \times 0.6 \times 0.4 \times 0.4 = 0.0576$
GNGN: $0.6 \times 0.4 \times 0.6 \times 0.4 = 0.0576$
GNNG: $0.6 \times 0.4 \times 0.4 \times 0.6 = 0.0576$
NGGN: $0.4 \times 0.6 \times 0.6 \times 0.4 = 0.0576$
NGNG: $0.4 \times 0.6 \times 0.4 \times 0.6 = 0.0576$
NNGG: $0.4 \times 0.4 \times 0.6 \times 0.6 = 0.0576$

Hence the total probability is $6 \times 0.0576 = 0.3456$.



A much quicker way of calculating this is to first note that each option has the same probability since each has 2 green and 2 not green lights:

$0.6^{2} \times 0.4^{2} = 0.0576$

Then we need to calculate how many different ways there are of choosing 2 green sets from the 4 sets of traffic lights. This is:

${}^{4}C_{2} = 6$

Hence the probability is given by:

${}^{4}C_{2} \times 0.6^{2} \times 0.4^{2} = 6 \times 0.0576 = 0.3456$
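The agreement between the long method (adding up the relevant branches of the tree) and the shortcut can be confirmed by enumerating all 16 outcomes. The following is a minimal Python sketch, not part of the course material; the names `p_green` and `path_probability` are our own.

```python
from itertools import product
from math import comb

p_green = 0.6


def path_probability(lights):
    """Probability of one sequence such as ('G', 'N', 'G', 'N')."""
    prob = 1.0
    for light in lights:
        prob *= p_green if light == "G" else 1 - p_green
    return prob


# Long method: add up every branch of the tree with exactly 2 greens
wanted = [lights for lights in product("GN", repeat=4) if lights.count("G") == 2]
long_method = sum(path_probability(lights) for lights in wanted)

# Shortcut: number of arrangements times the probability of any one of them
shortcut = comb(4, 2) * p_green**2 * (1 - p_green) ** 2

print(len(wanted), round(long_method, 4), round(shortcut, 4))  # 6 0.3456 0.3456
```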

Question 6.14

On the way to work I pass through 10 sets of traffic lights. The probability that each set is green when I reach them is 0.7. What is the probability that exactly 4 sets of traffic lights are green on my journey to work?

In general, if we have $n$ ‘trials’, each of which has a probability of success of $p$, then the probability of $x$ successes is:

$${}^{n}C_{x} \times p^{x} \times (1-p)^{n-x}$$

where:

${}^{n}C_{x}$ is the number of ways of choosing $x$ successes from $n$ ‘trials’
$p^{x}$ is the probability of $x$ successes
$(1-p)^{n-x}$ is the probability of $n - x$ failures

This kind of calculation comes from the binomial distribution, which is covered in Chapter 8. This formula is given in the Tables on page 6.

Question 6.15

In a portfolio of 30 life assurance policies, the probability that any one of them leads to a claim in the next year is 0.02. Calculate the probability that:

(i) there is exactly 1 claim in the next year

(ii) there are at least 2 claims in the next year.



4.2 Combinations of dependent events

We now look at questions which involve dependent events. For example, a box of chocolates contains 6 milk and 4 plain chocolates. I eat 3 chocolates. What is the probability that 2 of them are milk chocolates? The probability tree for this question is:

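[Tree diagram: three successive chocolates, each M or P. First chocolate: M (6/10) or P (4/10). Second chocolate: after M, M (5/9) or P (4/9); after P, M (6/9) or P (3/9). Third chocolate: after MM, M (4/8) or P (4/8); after MP or PM, M (5/8) or P (3/8); after PP, M (6/8) or P (2/8). The eight outcomes are MMM, MMP, MPM, MPP, PMM, PMP, PPM and PPP.]

The probabilities are changing between each set of branches as each time I eat a chocolate there are fewer left! The options I require are:

MMP: $\frac{6}{10} \times \frac{5}{9} \times \frac{4}{8} = \frac{1}{6}$
MPM: $\frac{6}{10} \times \frac{4}{9} \times \frac{5}{8} = \frac{1}{6}$
PMM: $\frac{4}{10} \times \frac{6}{9} \times \frac{5}{8} = \frac{1}{6}$

Hence the total probability is $3 \times \frac{1}{6} = \frac{1}{2}$.

Again, we note that each option has the same probability since each has 2 milk and 1 plain chocolate. So we work out the probability of just one of these, say MMP. Then all we need to calculate is how many different ways there are of choosing 2 milk chocolates from the 3 chocolates eaten. This is:

${}^{3}C_{2} = 3$

Multiplying these together gives $3 \times \frac{1}{6} = \frac{1}{2}$, as before.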



So the method is to calculate the number of combinations (just like we did for independent events), then work out the probability for just one of the combinations, and multiply these together.
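The shortcut for dependent events can be checked with exact fractions on the box of 6 milk and 4 plain chocolates. The following is a minimal Python sketch, not part of the course material; the helper `sequence_probability` and its arguments are our own illustrative choices.

```python
from fractions import Fraction
from math import comb


def sequence_probability(sequence, counts):
    """Probability of eating chocolates in exactly this order, without replacement."""
    remaining = dict(counts)          # copy so the original box is not changed
    total = sum(remaining.values())
    prob = Fraction(1)
    for kind in sequence:
        prob *= Fraction(remaining[kind], total)
        remaining[kind] -= 1          # one fewer of this kind left
        total -= 1                    # one fewer chocolate left overall
    return prob


box = {"M": 6, "P": 4}

one_order = sequence_probability("MMP", box)
answer = comb(3, 2) * one_order       # number of orders x probability of one order

print(one_order, answer)  # 1/6 1/2
```

Calling `sequence_probability` with "MPM" or "PMM" gives the same value as "MMP", which is why one order is enough.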

Question 6.16

Another box of chocolates contains 15 milk and 5 plain chocolates. I eat 4 chocolates. What is the probability that 3 of them are plain chocolates?

Question 6.17

I own 9 ordinary shares and 6 preference shares.

(i) If I pick 3 shares at random, what is the probability that I pick 2 ordinary shares?

(ii) If I pick 5 shares at random, what is the probability that I pick 3 preference shares?

This kind of calculation comes from the hypergeometric distribution, which is covered in Chapter 4 of Subject CT3. The general formula is rather messy and is not included here.



Extra Practice Questions

Section 1: Simple Choosing

P6.1 A drinks machine contains:

coffee, tea, water, cola, orange, soup

(i) How many different ways are there of two people each ordering a drink?

(ii) What is the probability that they both choose the same?

Section 2: Permutations

P6.2 A competition on a Fruity Malt Loaf packet involves ranking 10 features of a car from the most to the least important. If your ranking agrees with the judges you win a car!

economical, reliable, central locking, power steering, 5 speed gearbox, electric windows, air bag, boot space, value for money, alarm

(i) How many different ways are there of ranking these features?

(ii) What is the probability of winning this competition?

P6.3 Ten employees (Alan, Bob and eight others) apply for 2 jobs (assistant manager, floor supervisor). The jobs are allocated randomly between two of the employees.

(i) How many different ways of allocating the jobs are there?

(ii) What is the probability that Bob is the manager and Alan is the floor supervisor?

(iii) What is the probability that Alan is the manager?



Section 3: Combinations

P6.4 A committee of fourteen people needs to select a sub-committee of three people to represent them. In how many different ways can the three people be selected?

P6.5 A football manager has a squad of 20 players which includes 2 goalkeepers, 5 defenders, 6 midfielders and 7 strikers. She wishes to pick a team of 11 players for Saturday’s game consisting of 1 goalkeeper, 3 defenders, 4 midfielders and 3 strikers. How many different teams could she pick?

Section 4: Using combinations to calculate probabilities

P6.6 The probability that a component produced by a factory is faulty is 0.01. In a box of 500 components, what is the probability that exactly 3 are faulty?

P6.7 In a chess match Alicia plays 5 games against Baldeep and 5 games against Charlotte. The probability that Alicia wins a game against Baldeep is $\frac{3}{5}$ and the probability that she wins a game against Charlotte is $\frac{2}{3}$. What is the probability that Alicia wins exactly 2 of the games against Baldeep and 3 of the games against Charlotte?

P6.8 A pack of 30 jelly babies contains 6 blackcurrant jelly babies. If I eat 5 of the jelly babies, what is the probability that 2 are blackcurrant?

P6.9 Subject C1, April 1996, Q2 (adapted)

In a simple lottery the organiser chooses three numbers at random without replacement† from the numbers 1 to 5. Players also choose three numbers without replacement from the numbers 1 to 5. What is the probability that a player matches two or three of the organiser’s numbers? [3]

† ‘without replacement’ means that the number picked is not then included in the choices left (ie it is not then replaced back with the other numbers).



P6.10 Subject C1, September 1998, Q2 (adapted) The probability of suffering a side effect from a certain flu vaccine is 0.005. If 1,000 persons are inoculated, calculate the probability that at most 1 person suffers a side effect. [2]

P6.11 Subject C1, September 1999, Q2 (adapted) Five students are selected at random, one after the other and without replacement, from a group of twenty students of whom twelve are men and eight are women. What is the probability that the fifth student selected is a man? [2]



Chapter 6 Summary

Permutations

The number of ways of picking $r$ objects from $n$ when order matters (ie the number of permutations) is:

$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$

where $0! = 1$.

Combinations

The number of ways of picking $r$ objects from $n$ when order doesn’t matter (ie the number of combinations) is:

$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$




Solution 6.1

There are 3 possible choices on the first spinner and each of these can go with the 4 possible choices on the second spinner. Hence there are:

$3 \times 4 = 12$ results

Solution 6.2

Multiplying the choices together for the starters, main courses and desserts, we get:

$2 \times 4 \times 3 = 24$ different meals

Solution 6.3

There are 4 possible choices for first place. Once this person is chosen, say Belinda, then there are only 3 possible choices for the second place (Alfie, Charlie and Delilah). Suppose Delilah is in second place, then there are only 2 possible choices for third place (Alfie and Charlie). Finally, suppose Alfie is in third place, then there is only 1 possible choice left for fourth place (Charlie). Hence the total number of different ways for them to cross the finish line is:

$4 \times 3 \times 2 \times 1 = 24$

Solution 6.4

There are five coins so there are:

$5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$ different ways

This is because there are 5 choices for the 1st coin, then 4 choices for the 2nd coin and so on.



Solution 6.5

There are 10 choices for the prize for 1st place. Once that prize has been awarded there are only 9 possible choices left for the 2nd place prize. Then once that prize is awarded there are only 8 possible choices left for the 3rd place prize. Finally after this has been awarded there will only be 7 possible choices left for the 4th place prize. Hence, the number of different ways the prizes can be awarded is:

$10 \times 9 \times 8 \times 7 = 5{,}040$

Solution 6.6

We can write it as:

$$10 \times 9 \times 8 \times 7 = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{10!}{6!}$$

Again, note where the 10 and the 6 come from in our formula: we have 10 competitors altogether, we choose 4 and $10 - 4 = 6$.

Solution 6.7

Using the formula ${}^{n}P_{r} = \frac{n!}{(n-r)!}$ we get:

(i) ${}^{9}P_{5} = \frac{9!}{(9-5)!} = \frac{9!}{4!} = 15{,}120$

(ii) ${}^{6}P_{0} = \frac{6!}{(6-0)!} = \frac{6!}{6!} = 1$

It is fine to just use the ${}^{n}P_{r}$ button on your calculator, and this is what you would be expected to do in an exam.



Solution 6.8

There are 30 people altogether, so $n = 30$, and we are looking at the orders of 4 of them, so $r = 4$. Hence there will be ${}^{30}P_{4}$ different ways that the prizes can be allocated.

${}^{30}P_{4} = 657{,}720$ ways

Solution 6.9

(i) From first principles the combinations are:

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

10 ways

(ii) Now if we had considered the order (ie the permutations) there would have been:

${}^{5}P_{3} = 5 \times 4 \times 3 = 60$ ways

However, we have only 10 ways so we have divided by 6. Why? Because each group of 3 friends chosen can be picked in 6 different orders that give the same group, eg ABC, ACB, BAC, BCA, CAB and CBA.

Solution 6.10

There are 20 players altogether from which we are choosing 11, so there are:

${}^{20}C_{11} = \frac{20!}{(20-11)!\,11!} = 167{,}960$ ways

It is totally acceptable to use the ${}^{n}C_{r}$ button on your calculator, and this is what you would be expected to do in an exam.



Solution 6.11

(i) There are 49 balls altogether from which we are choosing 6 of them. So there are:

${}^{49}C_{6} = 13{,}983{,}816$ ways of choosing the winning balls

(ii) Hence, the probability is:

$$\frac{1}{13{,}983{,}816}$$

Which is smaller than the probability of getting run over by a car!

Solution 6.12

We require 3 men from the 10 available and 3 women from the 8 available.

There are ${}^{10}C_{3} = 120$ ways of picking the 3 men from the 10 men.

There are ${}^{8}C_{3} = 56$ ways of picking the 3 women from the 8 women.

Hence there are:

${}^{10}C_{3} \times {}^{8}C_{3} = 120 \times 56 = 6{,}720$ ways of picking this group altogether



Solution 6.13

(i) We need to choose 3 of the 6 winning balls and 3 of the remaining 43 balls. Hence there are:

${}^{6}C_{3} \times {}^{43}C_{3} = 20 \times 12{,}341 = 246{,}820$ ways of choosing 3 winning balls

(ii) Now from Question 6.11 we had:

${}^{49}C_{6} = 13{,}983{,}816$ ways of choosing the 6 winning balls

Therefore the probability of winning the £10 prize (ie picking 3 correct and 3 incorrect balls) is:

$$\frac{246{,}820}{13{,}983{,}816} = 1.765\%$$

This is about 2% so you’d expect to win about once a year if you enter one draw per week.
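The same ‘choose from the winners and choose from the losers’ idea gives the probability of matching any number of balls. The following is a minimal Python sketch, not part of the course material; the function `p_match` and its default arguments are our own illustrative choices based on the 6-from-49 draw described above.

```python
from math import comb


def p_match(k, picked=6, total=49):
    """P(exactly k of the player's numbers are among the winning balls)."""
    # Choose k of the winning balls and the rest from the non-winning balls
    ways = comb(picked, k) * comb(total - picked, picked - k)
    return ways / comb(total, picked)


print(comb(6, 3) * comb(43, 3))      # 246820
print(round(100 * p_match(3), 3))    # 1.765 (per cent)
```

As a sanity check, the probabilities of matching 0, 1, ..., 6 balls add up to 1.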

Solution 6.14

The number of ways of choosing 4 green lights from 10 traffic lights altogether is:

${}^{10}C_{4} = 210$

The probability of each of the 4 green and 6 not green light options is:

$0.7^{4} \times 0.3^{6} = 0.000175$ (3 SF)

So the total probability is given by:

${}^{10}C_{4} \times 0.7^{4} \times 0.3^{6} = 210 \times 0.000175 = 0.0368$ (3 SF)



Solution 6.15

(i) Since each policy can make at most one claim (as you can’t die more than once!) we require the number of ways of choosing exactly 1 policy from 30 policies altogether.

For one policy having a claim (and 29 policies having no claims) the probability is given by:

${}^{30}C_{1} \times 0.02 \times 0.98^{29} = 0.334$ (3 SF)

(ii) We require the probability that at least 2 of the 30 policies lead to a claim. First we note that:

$P(\text{at least 2}) = 1 - P(\text{less than 2}) = 1 - P(0) - P(1)$

For 0 policies having claims (and 30 policies having no claims) the probability is given by:

$P(0) = {}^{30}C_{0} \times 0.02^{0} \times 0.98^{30} = 0.545$ (3 SF)

From part (i) we know that $P(1) = 0.334$ (3 SF).

Hence the probability of at least two claims is given by:

$1 - 0.545 - 0.334 = 0.121$ (3 SF)
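The figures in this solution can be reproduced with a short function for the formula ${}^{n}C_{x} \times p^{x} \times (1-p)^{n-x}$. The following is a minimal Python sketch, not part of the course material; the function name `binomial_probability` is our own.

```python
from math import comb


def binomial_probability(n, x, p):
    """P(exactly x successes in n trials, each with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p0 = binomial_probability(30, 0, 0.02)   # no claims
p1 = binomial_probability(30, 1, 0.02)   # exactly 1 claim
at_least_2 = 1 - p0 - p1

print(round(p0, 3), round(p1, 3), round(at_least_2, 3))  # 0.545 0.334 0.121
```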



Solution 6.16

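Calculating the problem from first principles:

The options we require are:

PPPM: $\frac{5}{20} \times \frac{4}{19} \times \frac{3}{18} \times \frac{15}{17} = \frac{5}{646}$
PPMP: $\frac{5}{20} \times \frac{4}{19} \times \frac{15}{18} \times \frac{3}{17} = \frac{5}{646}$
PMPP: $\frac{5}{20} \times \frac{15}{19} \times \frac{4}{18} \times \frac{3}{17} = \frac{5}{646}$
MPPP: $\frac{15}{20} \times \frac{5}{19} \times \frac{4}{18} \times \frac{3}{17} = \frac{5}{646}$

Hence the total probability is $4 \times \frac{5}{646} = \frac{10}{323} = 0.0310$ (3 SF).

Using the shortcut method:

Each option will have 3 plain chocolates and 1 milk chocolate. The probability of each one of these will be the same as:

PPPM: $\frac{5}{20} \times \frac{4}{19} \times \frac{3}{18} \times \frac{15}{17} = \frac{5}{646}$

The number of ways of choosing 3 plain chocolates from the 4 chocolates eaten is:

${}^{4}C_{3} = 4$

Multiplying these together gives $4 \times \frac{5}{646} = \frac{10}{323} = 0.0310$ (3 SF), as before.

From now on in the solutions we shall just use the shortcut method.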



Solution 6.17

(i) Using the shortcut method:

Each option will have 2 ordinary shares and 1 preference share. The probability of each one of these will be the same as:

OOP: $\frac{9}{15} \times \frac{8}{14} \times \frac{6}{13} = \frac{72}{455}$

The number of ways of choosing 2 ordinary shares from the 3 shares picked is:

${}^{3}C_{2} = 3$

Hence the total probability is $3 \times \frac{72}{455} = \frac{216}{455} = 0.475$ (3 SF).

(ii) Using the shortcut method:

Each option will have 3 preference shares and 2 ordinary shares. The probability of each one of these will be the same as:

PPPOO: $\frac{6}{15} \times \frac{5}{14} \times \frac{4}{13} \times \frac{9}{12} \times \frac{8}{11} = \frac{24}{1{,}001}$

The number of ways of choosing 3 preference shares from the 5 shares picked is:

${}^{5}C_{3} = 10$

Hence the total probability is $10 \times \frac{24}{1{,}001} = \frac{240}{1{,}001} = 0.240$ (3 SF).



Solutions to the extra practice questions

P6.1 (i) There are 6 possible choices for the first person and 6 possible choices for the second person. Hence there are:

$6 \times 6 = 36$ different ways

(ii) There are 6 ways that both people could choose the same (both coffee, both tea, etc). Hence the probability is:

$$\frac{6}{36} = \frac{1}{6}$$

P6.2 (i) There are 10 features so there are:

$10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 3{,}628{,}800$ different ways

This is because there are 10 choices for the 1st feature, then 9 choices for the 2nd feature and so on.

(ii) Since the judges choose just one way, the probability of winning is:

$$\frac{1}{3{,}628{,}800}$$

P6.3 (i) There are 10 possible choices for the first job and once this job has been filled there are then only 9 possible choices for the second job. Hence there are:

${}^{10}P_{2} = 10 \times 9 = 90$ ways

(ii) There is only one way that Bob is the manager and Alan is the supervisor (ie BA). Hence the probability is:

$$\frac{1}{90}$$

(iii) We could have Alan as the manager and any of the remaining 9 employees for the supervisor job. Hence there are 9 ways that this could happen, so the probability is:

$$\frac{9}{90} = \frac{1}{10}$$



P6.4 There are 14 people from which we are choosing 3, so there are:

${}^{14}C_{3} = 364$ ways

Note that we are just considering the people and not the order that they come in, hence we are considering combinations.

P6.5 We require 1 goalkeeper from 2, 3 defenders from 5, 4 midfielders from 6 and 3 strikers from 7 altogether.

There are ${}^{2}C_{1} = 2$ ways of picking the 1 goalkeeper from the 2 on the squad.

There are ${}^{5}C_{3} = 10$ ways of picking the 3 defenders from the 5 on the squad.

There are ${}^{6}C_{4} = 15$ ways of picking the 4 midfielders from the 6 on the squad.

There are ${}^{7}C_{3} = 35$ ways of picking the 3 strikers from the 7 on the squad.

The total number of teams that can be picked is given by:

${}^{2}C_{1} \times {}^{5}C_{3} \times {}^{6}C_{4} \times {}^{7}C_{3} = 2 \times 10 \times 15 \times 35 = 10{,}500$

P6.6 The number of ways of choosing 3 components from 500 components altogether is:

${}^{500}C_{3} = 20{,}708{,}500$

The probability of each of the 3 faulty and 497 working options is:

$0.01^{3} \times 0.99^{497} = 6.77 \times 10^{-9}$ (3 SF)

So the total probability is given by:

${}^{500}C_{3} \times 0.01^{3} \times 0.99^{497} = 0.140$ (3 SF)



P6.7 Considering Alicia’s games against Baldeep first, the probability that she wins 2 games out of the 5 games is:

${}^{5}C_{2} \times \left(\frac{3}{5}\right)^{2} \times \left(\frac{2}{5}\right)^{3} = 0.2304$

Now considering Alicia’s games against Charlotte, the probability that she wins 3 games out of the 5 games is:

${}^{5}C_{3} \times \left(\frac{2}{3}\right)^{3} \times \left(\frac{1}{3}\right)^{2} = 0.329$ (3 SF)

Hence the probability of both of these events happening is:

$0.2304 \times 0.329 = 0.0759$ (3 SF)

P6.8 We require 2 blackcurrant from 5 jelly babies eaten. Using the shortcut method:

Each option will have 2 blackcurrant and 3 non-blackcurrant jelly babies. The probability of each one of these will be the same as:

BBNNN: $\frac{6}{30} \times \frac{5}{29} \times \frac{24}{28} \times \frac{23}{27} \times \frac{22}{26} = 0.0213$ (3 SF)

The number of ways of choosing 2 blackcurrant from the 5 jelly babies eaten is:

${}^{5}C_{2} = 10$

Hence the total probability is $10 \times 0.0213 = 0.213$ (3 SF).



P6.9 The number of ways of choosing 3 numbers from 5 is:

${}^{5}C_{3} = 10$

To match 3 numbers we must choose all 3 of the 3 correct numbers. The number of ways of doing this is:

${}^{3}C_{3} = 1$

To match 2 numbers we must choose 2 from the 3 correct numbers and 1 from the 2 incorrect numbers. The number of ways of doing this is:

${}^{3}C_{2} \times {}^{2}C_{1} = 3 \times 2 = 6$

So there are 7 ways of matching 2 or 3 of the organiser’s numbers. Hence the probability is:

$$\frac{7}{10}$$
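With only 10 possible tickets, this answer is easy to confirm by listing them all. The following is a minimal Python sketch, not part of the course material; fixing the organiser’s numbers as 1, 2 and 3 is our own assumption, made purely for illustration (by symmetry any fixed set of three numbers gives the same count).

```python
from fractions import Fraction
from itertools import combinations

# Assumed for illustration: the organiser's three numbers
organiser = {1, 2, 3}

# Every ticket a player could choose: 3 numbers from 1 to 5
tickets = list(combinations(range(1, 6), 3))

# Tickets that match two or three of the organiser's numbers
good = [t for t in tickets if len(organiser & set(t)) >= 2]

print(len(good), len(tickets), Fraction(len(good), len(tickets)))  # 7 10 7/10
```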

P6.10 To find the probability that at most 1 person suffers a side effect, we note that:

$P(\text{at most 1}) = P(\text{1 or less}) = P(0) + P(1)$

For 1 person suffering a side effect (and 999 not suffering a side effect) the probability is given by:

${}^{1{,}000}C_{1} \times 0.005^{1} \times 0.995^{999} = 0.0334$ (3 SF)

For 0 people suffering a side effect (and 1,000 not suffering a side effect) the probability is given by:

${}^{1{,}000}C_{0} \times 0.995^{1{,}000} = 0.00665$ (3 SF)

Therefore the total probability for at most 1 side effect is:

$0.0334 + 0.00665 = 0.0401$ (3 SF)



P6.11 This question involves many of the branches from a probability tree. We will have to work through each of the possibilities in turn.

The 5th person selected must be a man, so we are going to look at the different combinations for the 4 people chosen before that man. Then we will multiply the number of combinations by the probability of one of these options (just like we did in Section 4.2).

4 females out of the 4 places before the man (ie FFFFM):

${}^{4}C_{4} \times \frac{8}{20} \times \frac{7}{19} \times \frac{6}{18} \times \frac{5}{17} \times \frac{12}{16} = \frac{7}{646}$

3 females out of the 4 places before the man (eg FFFMM):

${}^{4}C_{3} \times \frac{8}{20} \times \frac{7}{19} \times \frac{6}{18} \times \frac{12}{17} \times \frac{11}{16} = \frac{154}{1{,}615}$

2 females out of the 4 places before the man (eg FFMMM):

${}^{4}C_{2} \times \frac{8}{20} \times \frac{7}{19} \times \frac{12}{18} \times \frac{11}{17} \times \frac{10}{16} = \frac{77}{323}$

1 female out of the 4 places before the man (eg FMMMM):

${}^{4}C_{1} \times \frac{8}{20} \times \frac{12}{19} \times \frac{11}{18} \times \frac{10}{17} \times \frac{9}{16} = \frac{66}{323}$

0 females out of the 4 places before the man (ie MMMMM):

${}^{4}C_{0} \times \frac{12}{20} \times \frac{11}{19} \times \frac{10}{18} \times \frac{9}{17} \times \frac{8}{16} = \frac{33}{646}$

Hence the total probability is the sum of these:

$$\frac{7}{646} + \frac{154}{1{,}615} + \frac{77}{323} + \frac{66}{323} + \frac{33}{646} = \frac{3}{5}$$
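The five cases can be added up exactly by computer. The following is a minimal Python sketch, not part of the course material; the variable names are our own, and `math.perm` (Python 3.8 or later) is used for the products of descending numbers such as $8 \times 7 \times 6$. The result, $\frac{3}{5}$, equals $\frac{12}{20}$, the proportion of men in the group, which is what we would expect since each student is equally likely to be the fifth one selected.

```python
from fractions import Fraction
from math import comb, perm

men, women, picks = 12, 8, 5

total = Fraction(0)
for k in range(5):                      # k = number of women among the first 4 picks
    men_needed = (4 - k) + 1            # men among the first 4, plus the 5th pick
    # Probability of one particular order with k women and men_needed men
    one_order = Fraction(perm(women, k) * perm(men, men_needed),
                         perm(men + women, picks))
    # Multiply by the number of ways of placing the k women in the first 4 places
    total += comb(4, k) * one_order

print(total)  # 3/5
```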

Stats Pack-07: Discrete random variables Page 1


Chapter 7

Discrete random variables

Links to CT3: Chapter 3

Syllabus objectives:

(ii)1. Explain what is meant by a discrete random variable, define the distribution function and the probability function of such a variable, and use these functions to calculate probabilities.

(ii)3. Define the expected value of a function of a random variable, the mean, the variance, the standard deviation, the coefficient of skewness and the moments of a random variable, and calculate such quantities.

0 Introduction

So far we have looked at samples and probabilities. In this chapter we will combine these concepts by looking at the first kind of random variables we can meet. In Chapters 2 and 3 we found the mean, spread and skewness of a sample that we had collected. However, actuaries deal with predicting what could happen in the future (eg the number of deaths or claims over the next year). In this case we don’t have a sample, as the event of interest hasn’t happened yet! What we do is model what we think will happen using probabilities (like those we met in Chapters 4 and 5). This theoretical model is called the population.



For example, when rolling a die, instead of doing it 60 times and saying here’s our sample:

Number rolled on a die    1     2     3     4     5     6
Frequency                 11    11    13    9     8     8

We now say that we would expect each number to come up $\frac{1}{6}$ of the time:

Number rolled on a die    1     2     3     4     5     6
Probability               1/6   1/6   1/6   1/6   1/6   1/6

This is our theoretical model of the future die rolls (ie our model of the die roll population). So how are the sample and population related? For our 60 die rolls tabulated above, we would expect 10 of each number under the theoretical model. We can see that our sample results are fairly close to this. The more times we throw the die (and hence the bigger the sample), the closer we would expect the proportion of each number to be to $\frac{1}{6}$.

This graph shows the proportion of 4’s obtained when a die was thrown 1,000 times:

[Figure: proportion of 4’s (vertical axis, 0 to 0.4) plotted against the number of die rolls (horizontal axis, 0 to 1,000), with the theoretical proportion of $\frac{1}{6}$ marked.]

We can see that as the number of die rolls increases, the proportion gets closer to our theoretical result of $\frac{1}{6}$.
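The same experiment can be tried on a computer. The sketch below is only an illustration: the function name `running_proportion`, the seed and the choice of 100,000 rolls are our own assumptions, not part of the course material. It rolls a simulated fair die and records the running proportion of 4’s.

```python
import random

def running_proportion(target, n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the running
    proportion of rolls equal to `target` after each roll."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    hits = 0
    proportions = []
    for roll_number in range(1, n_rolls + 1):
        if rng.randint(1, 6) == target:
            hits += 1
        proportions.append(hits / roll_number)
    return proportions

proportions = running_proportion(target=4, n_rolls=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # the proportion should settle down near 1/6 = 0.1667 as n grows
    print(n, round(proportions[n - 1], 4))
```

The early proportions can be some way from $\frac{1}{6}$, but the later ones should settle close to it.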



1 Random variables

You will have already met examples of a variable in your studies – it is simply a number that varies. For example, if we have $x + y = 10$ then x and y are variables – their numerical values are not fixed and can vary (as long as their total is ten).

A random variable is a variable that uses probabilities to decide its value. For example, if we roll a fair die we know that each of the values 1, 2, 3, 4, 5 or 6 occurs with a probability of $\frac{1}{6}$.

1.1 Discrete random variables

Recall from Chapter 1 that discrete data could only take certain values. For example, the number of claims can only be whole numbers (0, 1, 2, 3, ...). We certainly can’t have $-2$ claims or 3.8 claims!

A discrete random variable is a random variable that can only take certain numerical values (ie discrete values).

For example, if we roll a die we can only get the discrete values 1, 2, 3, 4, 5 or 6. Each of these values occurs randomly with a probability of $\frac{1}{6}$.

Question 7.1 Subject C1, April 1994, Q2 (adapted)

Consider the following list of items:

I    the number of months in a year
II   the number of objectives in the Subject CT3 syllabus
III  the number of candidates who pass Subject CT3 at the next sitting
IV   the number of fountains in Trafalgar Square in London
V    the number of claims received by an insurance company tomorrow

Which pair of items are discrete random variables?

We use a capital letter, eg X, to stand for the random variable and its lower case equivalent, eg x, to stand for a value that it takes.



1.2 Probability distributions

Now recall in Chapter 1 how we recorded data in a table:

Claims reported each day    0    1    2    3    4    5
Frequency                   2    5    8    6    5    2

This table was called a frequency distribution as it shows the distribution of the frequencies between the data values (ie how the frequencies are shared out amongst the data values). We can now record our random variables in a similar way, but instead of using the frequencies we use the probabilities of each value occurring in the future:

Number rolled on a die    1     2     3     4     5     6
Probability               1/6   1/6   1/6   1/6   1/6   1/6

This table is called a probability distribution as it shows the distribution of the probabilities between the values (ie how the total probability is shared out amongst the values).

Question 7.2

Write down the probability distribution for the number of heads obtained when flipping two fair coins.

Also, in the same way that we drew simple bar charts for discrete data, we can now draw simple bar charts for our discrete random variables. Taking our die roll once more:

[Figure: bar chart of Probability against x for x = 1, 2, ..., 6, with the vertical axis marked at $\frac{1}{6}$.]



2 Probability functions

2.1 Definition

A probability function, $P(X = x)$, of a random variable X is the function that assigns probabilities to each of the values that X can take. For example, when rolling a die our probability function is:

$$P(X = x) = \tfrac{1}{6} \quad \text{where } x = 1, 2, 3, 4, 5 \text{ or } 6$$

Question 7.3

Write down the probability function for the number of heads obtained when flipping two coins.

This probability function (often abbreviated to PF) can either be a statement of probabilities given as a probability distribution or an actual mathematical formula. For example:

$$P(X = 1) = 0.6$$

$$P(X = 2) = 0.4$$

or:

$$P(X = x) = \frac{1}{12}(8 - x), \quad x = 2, 4 \text{ or } 6$$

Recall from Chapter 4 that probabilities are between 0 (impossible) and 1 (certain). This means that our probability function must give values between 0 and 1:

$$0 \le P(X = x) \le 1 \quad \text{for all } x$$

Recall also that the probabilities of all possible outcomes add up to 1 (since it is certain that at least one of them must occur). This means that the sum of our probabilities from the probability function is 1:

$$\sum_x P(X = x) = 1$$



For example, adding up all the probabilities for rolling a die:

$$\begin{aligned}
\sum_x P(X = x) &= P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) \\
&= \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} \\
&= 1
\end{aligned}$$

Since the total of all the probabilities is 1, this means that any $P(X = x)$ must be less than or equal to 1. So if we were feeling lazy (which we are) we could simplify our first condition to:

$$0 \le P(X = x)$$

These two conditions give us the formal definition:

Definition

$P(X = x)$ is a probability function if:

(i) $P(X = x) \ge 0$

(ii) $\sum_x P(X = x) = 1$

Question 7.4

Show that the following function is a probability function:

$$P(X = x) = \frac{1}{12}(8 - x), \quad x = 2, 4 \text{ or } 6$$

ie show that $P(X = x) \ge 0$ and that $\sum P(X = x) = 1$.

Technically, the probability function is defined for all values of x, with $P(X = x) = 0$ for the values that we can’t get. However, because mathematicians are lazy we tend to just assume this is the case without explicitly writing it down.
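The two conditions in the definition are easy to check mechanically. The following is a minimal sketch, assuming the probability function is stored as a Python dictionary mapping each value x to $P(X = x)$; the name `is_probability_function` and the example `not_a_pf` are hypothetical and chosen just for illustration. Exact fractions are used so that rounding cannot spoil the "sums to 1" check.

```python
from fractions import Fraction

def is_probability_function(pf):
    """Check conditions (i) and (ii) for a dict mapping x to P(X = x)."""
    non_negative = all(p >= 0 for p in pf.values())   # condition (i)
    sums_to_one = sum(pf.values()) == 1               # condition (ii)
    return non_negative and sums_to_one

die = {x: Fraction(1, 6) for x in range(1, 7)}
two_point = {1: Fraction(6, 10), 2: Fraction(4, 10)}
not_a_pf = {1: Fraction(1, 2), 2: Fraction(1, 4)}     # only sums to 3/4

print(is_probability_function(die))        # True
print(is_probability_function(two_point))  # True
print(is_probability_function(not_a_pf))   # False
```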



2.2 Solving probability function problems

We can use the probability function to solve problems where a value is missing. For example, suppose we have the following probability function for a random variable Y:

$$P(Y = y) = ky, \quad y = 1, 2, 3 \text{ or } 4$$

What is k? First, let’s write out the probability distribution of Y to make things a little clearer:

y           1    2    3    4
P(Y = y)    k    2k   3k   4k

We can find the value of k by using the fact that the probabilities must sum to 1:

$$\sum P(Y = y) = k + 2k + 3k + 4k = 1 \;\Rightarrow\; 10k = 1 \;\Rightarrow\; k = \frac{1}{10}$$
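As a quick check of this working, here is a small sketch (the helper name `normalising_constant` is our own, not standard terminology from the course). It assumes the probabilities are of the form k × weight with whole-number weights, and finds k from the fact that the probabilities must sum to 1.

```python
from fractions import Fraction

def normalising_constant(weights):
    """Return k such that k * weight sums to 1 over all the values.
    Assumes the weights are whole numbers."""
    return Fraction(1, sum(weights))

values = [1, 2, 3, 4]
k = normalising_constant(values)   # P(Y = y) = k*y, so the weights are the y values
pf = {y: k * y for y in values}

print(k)                  # 1/10
print(sum(pf.values()))   # 1
```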

Question 7.5

The probability function of a random variable X is given by:

$$P(X = x) = cx^2, \quad x = 1, 2, 3 \text{ or } 4$$

Find the value of c.

Question 7.6

The number of claims in a year on a particular policy arises according to the following probability distribution:

Claims         0          1          2
Probability    5/8 − q    1/4 − q    1/8 + 2q

Give the range of possible values that q could take.



2.3 Using probability functions to find probabilities

Taking our example from the previous page:

y           1      2      3      4
P(Y = y)    1/10   2/10   3/10   4/10

We can read off simple probabilities, such as $P(Y = 2)$, from the probability distribution. But what if we want to find $P(Y = 1 \text{ or } 3)$?

Well since Y can only take one value at a time this means that the values that Y can take are mutually exclusive. Recall from Chapter 4 that for any two mutually exclusive events A and B:

$$P(A \text{ or } B) = P(A) + P(B)$$

ie we simply add up the probabilities. Hence:

$$P(Y = 1 \text{ or } 3) = P(Y = 1) + P(Y = 3) = \tfrac{1}{10} + \tfrac{3}{10} = \tfrac{4}{10}$$

Question 7.7

The probability function of a random variable X is given by:

$$P(X = x) = \frac{1}{4}\left(\frac{3}{4}\right)^x, \quad x = 0, 1, 2, 3, \ldots$$

Find:

(i) $P(X = 2)$

(ii) $P(X \le 2)$.



3 The cumulative distribution function

3.1 The cumulative distribution

Recall how in Chapter 1 we found the cumulative frequencies by accumulating (ie adding up) the frequencies as we went through each of the data values.

Claims reported each day    Frequency    Cumulative frequency
0                           2            2
1                           5            7
2                           8            15
3                           6            21
4                           5            26
5                           2            28

In the same way we can add up the probabilities (from the probability function) to obtain the cumulative probabilities:

Die roll    Probability    Cumulative probability
1           1/6            1/6
2           1/6            2/6
3           1/6            3/6
4           1/6            4/6
5           1/6            5/6
6           1/6            6/6 = 1

What does each of the cumulative probabilities represent? Well the $\frac{1}{6}$ is the probability of rolling a 1, the $\frac{2}{6}$ is the probability of rolling a 1 or a 2 (ie up to and including 2), the $\frac{3}{6}$ is the probability of rolling a 1, 2 or a 3 (ie up to and including 3) and so on.

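(Each cumulative value is the previous one plus the next entry: 2 + 5 = 7, 7 + 8 = 15 and 15 + 6 = 21 for the frequencies, and 1/6 + 1/6 = 2/6, 2/6 + 1/6 = 3/6 and 3/6 + 1/6 = 4/6 for the probabilities.)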



Therefore it would make sense to label the cumulative probability as follows:

Die roll                 Cumulative probability
up to and including 1    $P(X \le 1) = \frac{1}{6}$
...                      ...
up to and including 6    $P(X \le 6) = 1$

The table showing the distribution of the probabilities was called the probability distribution (ie how the total probability is shared out amongst the values). So this table showing the distribution of the cumulative probabilities is called the cumulative (probability) distribution. Now we could draw a bar chart for this as follows:

[Figure: bar chart of $P(X \le x)$ against x for x = 1, 2, ..., 6, with the vertical axis marked at 0, 2/6, 4/6 and 1.]

However, rather than just calculating our cumulative probabilities at the achievable values (eg x = 1, 2, 3, 4, 5 or 6 for our die), we will generalise and calculate the cumulative probabilities at any value of x.



3.2 The cumulative distribution function

The probability function, $P(X = x)$, of a random variable X is the function that assigns probabilities to each of the values that X can take (and gives zero for all the values it can’t take). The cumulative distribution function, $F_X(x)$, gives the cumulative probability for all values of x (ie the total probability so far up to x):

Definition

The cumulative distribution function (CDF) of a random variable X is:

$$F_X(x) = P(X \le x) \quad \text{for all } x$$

So how does this work? Looking at rolling a die again, we had:

x             1     2     3     4     5     6
P(X ≤ x)      1/6   2/6   3/6   4/6   5/6   1

This table shows where the cumulative probability ‘jumps’ up. But the cumulative distribution function is defined for all values of x. So we need to work out what it is for values below 1, in between the values listed and above 6.

If $x < 1$, we haven’t yet reached any values that the die can take! So our total probability must be zero.

eg $F_X(0.3) = 0$

If $x > 6$, we have already met all of the possible values that the die can take and so have reached the cumulative probability of 1. So the CDF must stay at 1.

eg $F_X(7) = 1$

What about if x is between the values the die can take? Well if $1 < x < 2$ nothing interesting is happening as there is no value that the die can take. So the total probability remains the same as it was at $x = 1$ until we get to the next value ($x = 2$) the die can take.

eg $F_X(1.6) = \frac{1}{6}$



So the graph showing how the cumulative probability increases for our die is:

[Figure: step graph of $F_X(x)$ against x, with x from 0 to 6 and the vertical axis marked at 0, 1/6, 2/6, 3/6, 4/6, 5/6 and 1.]

The graph starts at zero and ‘jumps up’ only at the values the die can take (ie only for the values of x where $P(X = x) \ne 0$). Once all the possible values the die can take have been included the graph stays at 1.

Functions that produce this kind of ‘stepped’ graph are called step functions. So technically we should write the CDF as:

$$F_X(x) = \begin{cases}
0 & x < 1 \\
\frac{1}{6} & 1 \le x < 2 \\
\frac{2}{6} & 2 \le x < 3 \\
\;\vdots & \\
1 & x \ge 6
\end{cases}$$
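A step function like this is straightforward to build from the probability function. The sketch below is one possible way of doing it, not the only one: `make_cdf` is a hypothetical helper name, and it assumes the probability function is held as a dictionary of exact fractions. It looks up how many attainable values are less than or equal to x and returns the probability accumulated so far.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_cdf(pf):
    """Build F(x) = P(X <= x) from a dict mapping each value to its probability."""
    values = sorted(pf)
    cumulative = list(accumulate(pf[v] for v in values))

    def cdf(x):
        # number of attainable values that are <= x
        count = bisect_right(values, x)
        return cumulative[count - 1] if count else Fraction(0)

    return cdf

die_cdf = make_cdf({x: Fraction(1, 6) for x in range(1, 7)})
print(die_cdf(0.3))  # 0
print(die_cdf(1.6))  # 1/6
print(die_cdf(7))    # 1
```

These three values agree with the examples $F_X(0.3)$, $F_X(1.6)$ and $F_X(7)$ given earlier in this section.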

Question 7.8

The random variable W has a probability function:

w           2     4     5
P(W = w)    0.2   0.5   0.3

(i) Sketch the graph of the cumulative distribution function.

(ii) Give the cumulative distribution function for W.

(iii) Find $F_W(1)$, $F_W(4.5)$ and $F_W(10)$.



3.3 Using cumulative distribution functions to find probabilities

We can use the cumulative distribution function to find the probabilities for a random variable, either by directly using $F_X(x)$ or by working backwards to find the original probability function $P(X = x)$ and then using this. For example:

$$F_X(x) = \begin{cases}
0 & x < 0 \\
0.15 & 0 \le x < 1 \\
0.5 & 1 \le x < 2 \\
0.8 & 2 \le x < 3 \\
1 & x \ge 3
\end{cases}$$

Now suppose we want to find:

(i) $P(X \le 1)$

(ii) $P(X > 2)$

(iii) $P(X = 3)$

Using $F_X(x)$ to find the probability function

We could just find the original probability function for X by subtracting the cumulative probabilities (since these were found by adding up the probabilities). For example, to obtain $F_X(2) = P(X \le 2) = 0.8$ we would have used:

$$\begin{aligned}
F_X(2) = P(X \le 2) &= P(X = 0) + P(X = 1) + P(X = 2) \\
&= P(X \le 1) + P(X = 2) = F_X(1) + P(X = 2)
\end{aligned}$$

Hence, to get back to $P(X = 2)$ we just subtract the cumulative probabilities:

$$P(X = 2) = P(X \le 2) - P(X \le 1) = F_X(2) - F_X(1) = 0.8 - 0.5 = 0.3$$

Using the same idea for $P(X = 0)$, $P(X = 1)$ and $P(X = 3)$ we get:

x           0      1      2     3
P(X = x)    0.15   0.35   0.3   0.2

It is now easy to get the probabilities:

(i) $P(X \le 1) = 0.5$

(ii) $P(X > 2) = 0.2$

(iii) $P(X = 3) = 0.2$
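The subtraction step can be written as a short routine. This is a sketch under our own assumptions: the helper name `pf_from_cdf` is hypothetical, and the CDF is supplied as a dictionary giving $F_X(x)$ at each value where it jumps (stored as exact fractions to avoid rounding).

```python
from fractions import Fraction

def pf_from_cdf(jumps):
    """Recover P(X = x) from a dict mapping each attainable x to F(x)."""
    pf = {}
    previous = Fraction(0)
    for x in sorted(jumps):
        pf[x] = jumps[x] - previous   # P(X = x) = F(x) - F(previous value)
        previous = jumps[x]
    return pf

cdf_values = {0: Fraction(15, 100), 1: Fraction(50, 100),
              2: Fraction(80, 100), 3: Fraction(1)}
pf = pf_from_cdf(cdf_values)
print({x: float(p) for x, p in pf.items()})   # {0: 0.15, 1: 0.35, 2: 0.3, 3: 0.2}
print(float(1 - cdf_values[2]))               # P(X > 2) = 0.2
```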



Using $F_X(x)$ to find probabilities directly

Alternatively, we could just work with the CDF:

(i) Since the CDF already gives less than or equal probabilities, we get:

$$P(X \le 1) = F_X(1) = 0.5$$

(ii) Since probabilities of all possible values add up to 1, we have:

$$P(X \le 2) + P(X > 2) = 1$$

Hence:

$$\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) \\
&= 1 - F_X(2) \\
&= 1 - 0.8 = 0.2
\end{aligned}$$

(iii) Using the idea that subtracting cumulative probabilities gives an ‘equals’ probability, we get:

$$\begin{aligned}
P(X = 3) &= P(X \le 3) - P(X \le 2) \\
&= F_X(3) - F_X(2) \\
&= 1 - 0.8 = 0.2
\end{aligned}$$



Question 7.9

The cumulative distribution function of a random variable V is given by:

$$F_V(v) = \begin{cases}
0 & v < 1 \\
0.216 & 1 \le v < 2 \\
0.648 & 2 \le v < 3 \\
0.936 & 3 \le v < 4 \\
1 & v \ge 4
\end{cases}$$

Find:

(i) $P(V = 2)$

(ii) $P(V > 1)$

(iii) $P(V < 3)$



4 Measures of location

In Chapter 2 we summarised sample data using the mean, mode and median. We will now look at how we can summarise a population modelled by a random variable. In this case we will be giving a summary of the values that the distribution could take based on their probabilities.

4.1 Mean

Recall from Chapter 2 that we found the sample mean for a frequency distribution using the formula:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

The following frequency distribution shows the number of tea-breaks had by a sample of 25 students reading this chapter:

tea-breaks (x)    frequency (f)
0                 3
1                 5
2                 11
3                 6

This gives a sample mean of:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{(3 \times 0) + (5 \times 1) + (11 \times 2) + (6 \times 3)}{3 + 5 + 11 + 6} = \frac{45}{25} = 1.8 \text{ tea-breaks}$$

We are now going to do the same thing to calculate the mean for a probability distribution, but instead of using the frequencies, $f$, of each value we will use probabilities, $P(X = x)$:

$$\text{mean} = \frac{\sum x P(X = x)}{\sum P(X = x)}$$

But wait! We know that for a probability function the sum of probabilities is 1. So:

$$\text{mean} = \frac{\sum x P(X = x)}{1} = \sum x P(X = x)$$



Let’s look at how this works for a random variable X with probability distribution:

x    P(X = x)
1    0.1
2    0.2
3    0.4
4    0.3

$$\text{mean} = \sum x P(X = x) = (1 \times 0.1) + (2 \times 0.2) + (3 \times 0.4) + (4 \times 0.3) = 2.9$$

Since this result gives us the mean of the values we would expect to get (ie the theoretical mean), it is called the expectation (or expected value) of X and is denoted $E(X)$.

We also use $\mu$ to stand for the mean of a population modelled by a random variable. This distinguishes it from the mean of a sample, $\bar{x}$.

Definition

The mean (or expectation) of a random variable X is:

$$\mu = E(X) = \sum x P(X = x)$$

Throughout statistics we use Greek letters for population results (eg $\mu$, $\sigma^2$, $\rho$) and English letters for sample results (eg $\bar{x}$, $s^2$, $r$).

The bigger our sample the closer our sample results will be to the population results.

Question 7.10

The random variable, X, is the number rolled on an ordinary fair die:

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

Calculate the mean, $E(X)$.



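[Figure: graph of the cumulative distribution function F(x) against x, with x from 0 to 5 and the vertical axis marked at 0.2, 0.4, 0.6, 0.8 and 1.]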

4.2 Median

In Chapter 2, we obtained the median by counting through the frequencies until we found the value in the middle. Now we have probabilities instead of frequencies. So we will count through the probabilities to find the value in the middle. Since the probabilities vary continuously between 0 and 1, if we count through to a probability of ½, we will have reached the median. Below is the probability distribution of a random variable, X. To find the median value we count through the probabilities until we get to ½.

x           0     1     2     3
P(X = x)    0.1   0.2   0.4   0.3

Hence, the median of the random variable X is 2. Although counting through the probabilities seems to work fine, there are cases where it will fail us. A better way is to use the graph of the cumulative distribution function. For our random variable, we have:

$$F(x) = \begin{cases}
0 & x < 0 \\
0.1 & 0 \le x < 1 \\
0.3 & 1 \le x < 2 \\
0.7 & 2 \le x < 3 \\
1 & x \ge 3
\end{cases}$$

Reading the probability of 0.5 from the graph, we can see that the median is 2.

(Counting through the probabilities: 0.1, then 0.1 + 0.2 = 0.3, then 0.1 + 0.2 + 0.4 = 0.7, so 0.5 is in here, at x = 2!)



The next question shows how using the graph of the CDF is superior:

Question 7.11

The random variable, Z, is the number rolled on a fair four-sided die:

z           1     2     3     4
P(Z = z)    1/4   1/4   1/4   1/4

(i) Find the median by counting through the probabilities.

(ii) (a) Sketch the graph of the cumulative distribution function.

     (b) Use this graph to find the median.

4.3 Mode

In Chapter 2, the mode was the value with the highest frequency (ie the value which occurs most). Now we have probabilities instead of frequencies. Hence, the mode will be the value with the highest probability (ie the value which is most likely to occur). So for our earlier example:

x           0     1     2     3
P(X = x)    0.1   0.2   0.4   0.3

The mode of this random variable is 2 as this value has the highest probability.

Question 7.12

The random variable, X, is the number rolled on an ordinary fair die:

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

What is the mode of X?



5 Expectation of functions

5.1 General rule for expectation of any function

Earlier we defined the expectation (mean) of a random variable, X, to be:

$$E(X) = \sum x P(X = x)$$

So if X is the number rolled on an ordinary die:

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

$$E(X) = \sum x P(X = x) = 1 \times \tfrac{1}{6} + 2 \times \tfrac{1}{6} + \cdots + 6 \times \tfrac{1}{6} = 3\tfrac{1}{2}$$

What happens if we wanted the expectation (mean) of the square of the number we rolled? Our probability distribution would be:

$x^2$       $1^2$   $2^2$   $3^2$   $4^2$   $5^2$   $6^2$
P(X = x)    1/6     1/6     1/6     1/6     1/6     1/6

Notice that the probabilities are unchanged. It’s still the same probability of getting each number on the die, all we are doing is finding the expectation (mean) of the numbers squared. This gives:

$$1^2 \times \tfrac{1}{6} + 2^2 \times \tfrac{1}{6} + \cdots + 6^2 \times \tfrac{1}{6} = 15\tfrac{1}{6}$$

What we have calculated is:

$$\sum x^2 P(X = x)$$

Since we have found the expectation (mean) of the square of each of the numbers that the random variable, X, can take, we write this as $E(X^2)$. Hence:

$$E(X^2) = \sum x^2 P(X = x)$$

Similarly, if we wanted the expectation (mean) of the cube of each of the numbers that the random variable, X, can take:

$$E(X^3) = \sum x^3 P(X = x)$$

We can repeat this for any function we like. For example:

$$\begin{aligned}
E(2X) &= \sum 2x \, P(X = x) \\
E(2X - 1) &= \sum (2x - 1) P(X = x) \\
E(X^2 - 3X + 2) &= \sum (x^2 - 3x + 2) P(X = x) \\
E\left[(X - 5)^2\right] &= \sum (x - 5)^2 P(X = x)
\end{aligned}$$

In general, we get the following rule:

Important result

If X is a random variable and $g(X)$ is a function of that random variable, then:

$$E[g(X)] = \sum g(x) P(X = x)$$
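This general rule translates directly into a few lines of code. The sketch below is illustrative only: the function name `expectation` and the dictionary representation of the probability function are our own choices. The function g is applied to each value x, while the probabilities are left untouched.

```python
from fractions import Fraction

def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x); with the default g this is E(X)."""
    return sum(g(x) * p for x, p in pf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))                     # 7/2
print(expectation(die, lambda x: x ** 2))   # 91/6
```

The two printed values are the $3\frac{1}{2}$ and $15\frac{1}{6}$ found for the die earlier in this section.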

Question 7.13

The random variable W has a probability function:

w           2     4     5
P(W = w)    0.2   0.5   0.3

Calculate:

(i) $E(W^2)$

(ii) $E(5W - 2)$

(iii) $E\left(\frac{1}{W}\right)$



Common Error

When calculating $E(X^2)$, students often square the probabilities by mistake. It is only the x values that are squared.

5.2 Expectation of linear functions of a random variable

We can find the expectation of a linear function of X, say $2X - 1$, using our rule:

$$E(2X - 1) = \sum (2x - 1) P(X = x)$$

However, there is a cunning shortcut we can use to work this out if we know $E(X)$.

Recall from Chapter 2 how the sample mean, $\bar{x}$, changed when we transformed a set of data. Given a sample mean of $\bar{x}$, if we multiply each sample value by a and then add b the new mean is $a\bar{x} + b$.

Does this work for the expectation (mean) of a population modelled by a random variable? If we multiply the distribution (ie each of the values the distribution can take) by a and then add b, is the new expectation (mean) just the old one multiplied by a and with b added?

Let’s try it! If the random variable X has probability distribution:

x           1     3     5
P(X = x)    0.1   0.6   0.3

$$E(X) = \sum x P(X = x) = (1 \times 0.1) + (3 \times 0.6) + (5 \times 0.3) = 3.4$$

$$E(2X - 1) = \sum (2x - 1) P(X = x) = (1 \times 0.1) + (5 \times 0.6) + (9 \times 0.3) = 5.8$$

Does $E(2X - 1) = 2E(X) - 1$?

$$2E(X) - 1 = 2 \times 3.4 - 1 = 5.8$$

Yes!



We can show that this works for any linear function of X, eg:

$$\begin{aligned}
E(3X) &= 3E(X) \\
E(5X + 1) &= 5E(X) + 1 \\
E\left(\tfrac{1}{3}X - 2\right) &= \tfrac{1}{3}E(X) - 2
\end{aligned}$$

In general:

Important result

If X is a random variable and a and b are constants, then:

$$E(aX + b) = aE(X) + b$$

The proof of this result can be found in Appendix A.
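Before turning to the proof, the result can be checked numerically. The following is a minimal sketch using the distribution from the worked example above (values 1, 3, 5 with probabilities 0.1, 0.6, 0.3); the helper `expectation` is our own name and the probabilities are entered as exact fractions. It compares the long way with the shortcut for a = 2 and b = −1.

```python
from fractions import Fraction

def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pf.items())

pf = {1: Fraction(1, 10), 3: Fraction(6, 10), 5: Fraction(3, 10)}
a, b = 2, -1

direct = expectation(pf, lambda x: a * x + b)   # sum of (ax + b) P(X = x)
shortcut = a * expectation(pf) + b              # aE(X) + b
print(direct, shortcut)   # 29/5 29/5
```

Both routes give 29/5 = 5.8, as in the worked example.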

Question 7.14

A random variable, Z, has a mean of 6. Find:

(i) $E(5Z + 7)$

(ii) $E(9 - 4Z)$

(iii) $E\left(\frac{3Z + 1}{8}\right)$

(iv) $E(Z^2)$

An interesting consequence of our rule can be found when we substitute $a = 0$:

$$E(b) = b$$

That is to say the expectation of a constant is just the constant. Since the constant doesn’t change with what the random variable is, we would expect the mean to be unchanged. Alternatively, we could see that it is equal to:

$$E(b) = \sum b \, P(X = x) = b \sum P(X = x) = b$$



5.3 Expectation of linear combinations of a random variable

We’ve now got a handy shortcut rule for finding expectations of linear functions of a random variable, X. What about linear functions of, say $X^2$?

$$E(3X^2 - 1)$$

It can be shown that the shortcut rule applies here too (see Appendix A). Hence:

$$E(3X^2 - 1) = 3E(X^2) - 1$$

However, we can’t express $E(X^2)$ in terms of $E(X)$. We will just have to work it out using $E(X^2) = \sum x^2 P(X = x)$.

OK, so what about even messier linear combinations such as:

$$E(X^2 - 2X)?$$

By considering the definition of the expectation we can see that:

$$E(X^2 - 2X) = \sum (x^2 - 2x) P(X = x)$$

Expanding the bracket and splitting up the summation gives:

$$\begin{aligned}
E(X^2 - 2X) &= \sum \left\{ x^2 P(X = x) - 2x P(X = x) \right\} \\
&= \sum x^2 P(X = x) - \sum 2x P(X = x) \\
&= E(X^2) - E(2X)
\end{aligned}$$

We can split it up into lots of little expectations! We can then simplify this further, by taking the constants out:

$$E(X^2 - 2X) = E(X^2) - 2E(X)$$

This ‘splitting up’ idea can simplify grotty expectations, such as:

$$E(3X^2 + 5X - 1) = E(3X^2) + E(5X) - E(1)$$

Once split up, we can then deal with the constants using our $E(aX + b)$ shortcut:

$$E(3X^2 + 5X - 1) = 3E(X^2) + 5E(X) - 1$$

Question 7.15

Simplify:

(i) $E(7X^2 + 1)$

(ii) $E(9X^2 - 4X + 2)$

(iii) $E\left[(X + 1)^2\right]$

(iv) $E\left(\frac{3}{X} + 2\right)$

Students are often tempted to think that $E(X^2) = [E(X)]^2$. To show that this is not generally the case, consider the random variable, X, with the following distribution:

x           1     3     5
P(X = x)    0.1   0.6   0.3

$$E(X) = \sum x P(X = x) = (1 \times 0.1) + (3 \times 0.6) + (5 \times 0.3) = 3.4$$

$$\Rightarrow [E(X)]^2 = 3.4^2 = 11.56$$

$$E(X^2) = \sum x^2 P(X = x) = (1^2 \times 0.1) + (3^2 \times 0.6) + (5^2 \times 0.3) = 13$$

We can see that they are clearly not equal!

Common Error

We cannot get $E(X^2)$ from $E(X)$:

$$E(X^2) \ne [E(X)]^2$$



6 Measures of Spread

In Chapter 3, we looked at how we could measure the spread of a set of sample data using the range, the IQR and the standard deviation. Recall that the variance was the standard deviation squared. We will now look at how we can measure the spread of a population modelled by a random variable (ie the spread of the results we would expect to occur). However, for Subject CT3 we only use the standard deviation and the variance of a random variable.

6.1 Variance and standard deviation

Recall from Chapter 3 that the sample variance was:

$$s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2$$

Now recall that $(n - 1)$ was used instead of n to adjust for the fact that the spread of a sample would be less than the spread of the population. So for a population, we should really use:

$$\frac{1}{n} \sum (x - \bar{x})^2$$

Suppose we wanted the variance of these numbers (which have mean 1.9):

1 1 1 2 2 2 2 2 3 3

Rather than calculating the variance as follows:

$$\frac{1}{10}\left[ (1 - 1.9)^2 + (1 - 1.9)^2 + (1 - 1.9)^2 + (2 - 1.9)^2 + \cdots + (3 - 1.9)^2 + (3 - 1.9)^2 \right]$$

we would multiply each $(x - \bar{x})^2$ by its corresponding frequency:

$$\frac{1}{10}\left[ 3 \times (1 - 1.9)^2 + 5 \times (2 - 1.9)^2 + 2 \times (3 - 1.9)^2 \right]$$



ie we are using:

$$\frac{1}{n} \sum f (x - \bar{x})^2$$

For a random variable, we now have probabilities, $P(X = x)$, for each x instead of their frequencies, $f$. So our variance becomes:

$$\frac{1}{n} \sum (x - \bar{x})^2 P(X = x)$$

Recall that n was the total of the frequencies, so now n will be the total of the probabilities, which is 1. Hence, our variance is:

$$\sum (x - \bar{x})^2 P(X = x)$$

Wait a minute! $\bar{x}$ was the sample mean. We now want the population mean (the mean of the random variable). This is $\mu$ or $E(X)$:

$$\sum (x - \mu)^2 P(X = x)$$

Hold on! Recall that $E[g(X)] = \sum g(x) P(X = x)$. So we have:

$$E\left[(X - \mu)^2\right] = \sum (x - \mu)^2 P(X = x)$$

This is our definition of the population variance (the variance of a random variable).

We use $\sigma^2$ or $\text{var}(X)$ to stand for the variance of a population modelled by the random variable X. The standard deviation, $\sigma$, is just the square root of this.

Definition

The variance of a random variable X is:

$$\text{var}(X) = \sigma^2 = E\left[(X - \mu)^2\right]$$

The standard deviation is given by:

$$\sigma = \sqrt{\text{var}(X)}$$



So how do we use this formula? Let’s look at rolling an ordinary fair die, where the random variable, X, is the number rolled:

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

Now:

$$\mu = E(X) = \sum x P(X = x) = 1 \times \tfrac{1}{6} + \cdots + 6 \times \tfrac{1}{6} = 3\tfrac{1}{2}$$

So:

$$\begin{aligned}
\text{var}(X) = \sigma^2 &= \sum (x - \mu)^2 P(X = x) \\
&= (1 - 3\tfrac{1}{2})^2 \times \tfrac{1}{6} + (2 - 3\tfrac{1}{2})^2 \times \tfrac{1}{6} + \cdots + (6 - 3\tfrac{1}{2})^2 \times \tfrac{1}{6} \\
&= 2.91\dot{6}
\end{aligned}$$

Hence, the standard deviation is:

$$\sigma = \sqrt{2.91\dot{6}} = 1.7078$$

This value tells us how spread out we would expect the values to be when they occur.

Question 7.16

The random variable, X, has the following probability distribution:

x           0     1     2     3
P(X = x)    0.1   0.2   0.4   0.3

Calculate $\text{var}(X)$.



Whilst this formula gives us the right answer, it is a bit long winded to use, just like we found $s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2$ to be. The way we got round this in Chapter 3 was by rearranging the formula to get:

$$s^2 = \frac{1}{n - 1} \left\{ \sum x^2 - n\bar{x}^2 \right\}$$

Similarly, we can rearrange our formula for the variance of a population modelled by a random variable. If we do, we find that:

$$\sigma^2 = E(X^2) - \mu^2$$

Since $\mu = E(X)$, we have $\mu^2 = [E(X)]^2$. In the same way that we wrote $(\sin x)^2$ as $\sin^2 x$ at school, we write $[E(X)]^2$ as $E^2(X)$:

Definition

The variance of a random variable X is also given by:

$$\text{var}(X) = \sigma^2 = E(X^2) - E^2(X)$$

The proof of this result can be found in Appendix B.

Taking our die roll random variable again:

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

$$E(X) = \sum x P(X = x) = 1 \times \tfrac{1}{6} + \cdots + 6 \times \tfrac{1}{6} = 3\tfrac{1}{2}$$

$$E(X^2) = \sum x^2 P(X = x) = 1^2 \times \tfrac{1}{6} + \cdots + 6^2 \times \tfrac{1}{6} = 15\tfrac{1}{6}$$

$$\Rightarrow \text{var}(X) = E(X^2) - E^2(X) = 15\tfrac{1}{6} - (3\tfrac{1}{2})^2 = 2.91\dot{6}$$

We can see that this method is much quicker (and gives the right answer)!
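The agreement between the two formulae can also be confirmed by computer. The sketch below is illustrative only (the function names are our own, and the die’s probability function is held as a dictionary of exact fractions); it computes the variance of the die roll both from the definition and from the rearranged formula.

```python
from fractions import Fraction

def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pf.items())

def variance_by_definition(pf):
    mu = expectation(pf)
    return expectation(pf, lambda x: (x - mu) ** 2)   # E[(X - mu)^2]

def variance_by_shortcut(pf):
    return expectation(pf, lambda x: x ** 2) - expectation(pf) ** 2   # E(X^2) - E^2(X)

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance_by_definition(die), variance_by_shortcut(die))   # 35/12 35/12
print(float(variance_by_shortcut(die)) ** 0.5)                  # standard deviation, about 1.7078
```

Both give 35/12, which is the recurring decimal 2.91666... found above.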



Question 7.17

A company invests a lump sum in Maximin™ shares. The value of the company’s investment (in £m) after one year is given by the random variable Y :

y           0.5    0.8   1.3   2.5
P(Y = y)    0.15   0.3   0.5   0.05

Find:

(i) $E(Y)$

(ii) the standard deviation of Y.

6.2 Variance of linear functions of a random variable

In Section 5.2, we used a cunning shortcut to work out the mean of a linear function of X, if we knew $E(X)$:

$$E(aX + b) = aE(X) + b$$

We are now going to find a similar shortcut rule for $\text{var}(aX + b)$.

Recall from Chapter 3 how the sample variance, $s^2$, changed when we transformed a set of data. Given a sample variance of $s^2$, if we multiply each sample value by a and then add b the new variance is $a^2 s^2$. Does this work for variances of populations modelled by random variables? The short answer is yes!


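$$\text{var}(aX + b) = a^2 \, \text{var}(X)$$

and

$$\text{standard deviation}(aX + b) = |a| \times \text{standard deviation}(X)$$

(We need $|a|$ rather than a here, since a standard deviation can never be negative.)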

The proof of this result can be found in Appendix C.
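As a numerical illustration of this rule (not a proof), the sketch below transforms every value of a die roll by a = −4 and b = 9 and compares the variance of the result with $a^2 \, \text{var}(X)$. The helper names and the choice of a and b are our own assumptions for the example.

```python
from fractions import Fraction

def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pf.items())

def variance(pf):
    return expectation(pf, lambda x: x ** 2) - expectation(pf) ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
a, b = -4, 9
transformed = {a * x + b: p for x, p in die.items()}   # distribution of aX + b

print(variance(transformed), a ** 2 * variance(die))   # 140/3 140/3
```

Adding 9 makes no difference to the variance, while multiplying by −4 multiplies it by 16.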



The reasoning behind this result is that adding a constant to each of the values does not alter the spread of the distribution:

[Figure: two bar charts of P(X = x) against x (vertical axis 0 to 0.5), one for X and one for X + 1, each with its spread marked.]

Whereas multiplying each of the values by a constant does alter the spread:

[Figure: two bar charts of P(X = x) against x (vertical axis 0 to 0.5), one for X and one for 2X, each with its spread marked.]

Hence, since variance measures spread squared – we multiply by the constant squared.

Question 7.18

A random variable, Z, has a mean of 6 and a standard deviation of 2. Find:

(i) $\text{var}(5Z + 7)$

(ii) $\text{var}(9 - 4Z)$

(iii) $\text{var}\left(\frac{3Z + 1}{8}\right)$

(iv) $E(Z^2)$



7 Skewness

Recall that the skewness reflects the shape of the distribution (or more accurately it is a measure of how asymmetrical the distribution is). The three types were:


In Chapter 3, we measured the skewness of a sample using:

$$\frac{1}{n} \sum (x_i - \bar{x})^3$$

The cube ensures that we get a negative answer for a negatively skewed sample, a positive answer for a positively skewed sample and zero for a symmetrical sample.

7.1 Skewness of a population

Using a similar method to the one we used in Section 6.1, we can develop an equivalent measure of skewness for a population modelled by a random variable, X :

Definition The skewness of a random variable X is given by:

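$$\text{Skew}(X) = \mu_3 = E\left[(X - \mu)^3\right]$$

where:

$$\begin{aligned}
\text{Skew}(X) > 0 &\;\Rightarrow\; X \text{ is positively skew} \\
\text{Skew}(X) = 0 &\;\Rightarrow\; X \text{ is symmetrical} \\
\text{Skew}(X) < 0 &\;\Rightarrow\; X \text{ is negatively skew}
\end{aligned}$$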



So how do we use this formula? Suppose X has probability distribution:

x           1      2     3     4
P(X = x)    0.25   0.5   0.2   0.05

A look at the graph of the PF shows that X is positively skew:

[Figure: bar chart of P(X = x) against x for x = 1, 2, 3, 4, with the vertical axis from 0 to 0.6.]

Now:

$$\mu = E(X) = \sum x P(X = x) = (1 \times 0.25) + (2 \times 0.5) + (3 \times 0.2) + (4 \times 0.05) = 2.05$$

So:

$$\begin{aligned}
\text{Skew}(X) &= E\left[(X - \mu)^3\right] \\
&= \sum (x - \mu)^3 P(X = x) \\
&= (1 - 2.05)^3 \times 0.25 + (2 - 2.05)^3 \times 0.5 + (3 - 2.05)^3 \times 0.2 + (4 - 2.05)^3 \times 0.05 \\
&= 0.25275
\end{aligned}$$

We have a positive value, so X is positively skew.

Question 7.19

The random variable, X, has this symmetrical probability distribution:

x           1     2     3
P(X = x)    0.2   0.6   0.2

Show that $\text{Skew}(X) = E(X^3) - E(3\mu X^2) + E(3\mu^2 X) - E(\mu^3)$.



Earlier, we rearranged the variance into an alternative (and slightly easier) form:

$$\text{var}(X) = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2$$

In the same way we can rearrange our formula for the skewness:

Definition

The skewness of a random variable X is also given by:

$$\text{Skew}(X) = \mu_3 = E(X^3) - 3\mu E(X^2) + 2\mu^3$$

The proof of this result can be found in Appendix D.

Taking our random variable, X, with probability distribution:

x           1      2     3     4
P(X = x)    0.25   0.5   0.2   0.05

$$\begin{aligned}
\mu = E(X) &= \sum x P(X = x) = (1 \times 0.25) + (2 \times 0.5) + (3 \times 0.2) + (4 \times 0.05) = 2.05 \\
E(X^2) &= \sum x^2 P(X = x) = (1^2 \times 0.25) + (2^2 \times 0.5) + (3^2 \times 0.2) + (4^2 \times 0.05) = 4.85 \\
E(X^3) &= \sum x^3 P(X = x) = (1^3 \times 0.25) + (2^3 \times 0.5) + (3^3 \times 0.2) + (4^3 \times 0.05) = 12.85
\end{aligned}$$

$$\begin{aligned}
\Rightarrow \text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= 12.85 - (3 \times 2.05 \times 4.85) + (2 \times 2.05^3) \\
&= 0.25275
\end{aligned}$$

This gives us the same answer as before. This alternative formula tends to be quicker if there are a lot of values. However, since neither formula is given in the Tables – this one might be harder to remember! Learning its derivation given in Appendix D might help you find the formula in an exam.
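Both skewness formulae can be checked against each other by computer. The sketch below is a minimal illustration using the distribution above; the function names are our own, and the probabilities are entered as exact fractions so that the two answers can be compared without rounding error.

```python
from fractions import Fraction

def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pf.items())

def skew_by_definition(pf):
    mu = expectation(pf)
    return expectation(pf, lambda x: (x - mu) ** 3)   # E[(X - mu)^3]

def skew_by_shortcut(pf):
    mu = expectation(pf)
    ex2 = expectation(pf, lambda x: x ** 2)
    ex3 = expectation(pf, lambda x: x ** 3)
    return ex3 - 3 * mu * ex2 + 2 * mu ** 3           # E(X^3) - 3 mu E(X^2) + 2 mu^3

pf = {1: Fraction(25, 100), 2: Fraction(50, 100),
      3: Fraction(20, 100), 4: Fraction(5, 100)}
print(float(skew_by_definition(pf)), float(skew_by_shortcut(pf)))   # 0.25275 0.25275
```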



7.2 Coefficient of skewness

In a small arboretum there are two types of leaves: short or long. The probability distribution for the lengths of these leaves is:

length, x   5 cm   8 cm
P(X = x)    0.6    0.4

\[ \mu = E(X) = (5 \times 0.6) + (8 \times 0.4) = 6.2 \text{ cm} \]

\[ \text{Skew}(X) = E[(X - \mu)^3] = (5 - 6.2)^3 \times 0.6 + (8 - 6.2)^3 \times 0.4 = 1.296 \text{ cm}^3 \]

Now suppose that we measured the length of the leaves in millimetres instead:

length, x   50 mm   80 mm
P(X = x)    0.6     0.4

\[ \mu = E(X) = (50 \times 0.6) + (80 \times 0.4) = 62 \text{ mm} \]

\[ \text{Skew}(X) = E[(X - \mu)^3] = (50 - 62)^3 \times 0.6 + (80 - 62)^3 \times 0.4 = 1{,}296 \text{ mm}^3 \]

Eek! We can see that the units used affect the skewness. This makes it hard to compare the skewness of different distributions: does the bigger number mean that it is more skewed or that it has different units?

To get round this problem we will standardise the skewness by getting rid of the units so we just have a number (ie a coefficient). Since the skewness is measured in cubic units, eg cm³, we need to divide by something measured in cm³ to get rid of the units. To cut a long story short, we divide by the standard deviation cubed:

Definition

The coefficient of skewness of a random variable X is given by:

\[ \frac{\text{Skew}(X)}{\sigma^3} \]

The reasoning behind why dividing by the standard deviation ‘standardises’ a measure is explored more fully in Chapters 11 and 12.



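We now find that for our leaves measured in cm:

\[
\begin{aligned}
\sigma^2 = \text{var}(X) &= E(X^2) - E^2(X) = 40.6 - 6.2^2 = 2.16 \\
\Rightarrow\; \text{coefficient of skewness} &= \frac{1.296}{2.16^{3/2}} = 0.408
\end{aligned}
\]

And for our leaves measured in mm:

\[
\begin{aligned}
\sigma^2 = \text{var}(X) &= E(X^2) - E^2(X) = 4{,}060 - 62^2 = 216 \\
\Rightarrow\; \text{coefficient of skewness} &= \frac{1{,}296}{216^{3/2}} = 0.408
\end{aligned}
\]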

Both give the same answer! So this is the measure of skewness we will use.

Question 7.20

The random variable, X, has this negatively skewed probability distribution:

x           1     2     3
P(X = x)    0.2   0.3   0.5

Show that the coefficient of skewness is $-0.579$.

This coefficient can take any value depending on the skewness of the distribution. For example, the exponential distribution (which we will meet in Chapter 10) has a coefficient of skewness of 2.
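The unit-invariance seen with the leaf lengths can also be checked numerically. The sketch below is only an illustration: the function name `coefficient_of_skewness` and the dictionaries `leaves_cm` and `leaves_mm` are names made up for this example, with the probabilities taken from the arboretum distribution.

```python
def coefficient_of_skewness(pf):
    """Skew(X) / sigma^3 for a probability function given as {x: P(X = x)}."""
    mu = sum(x * p for x, p in pf.items())
    variance = sum((x - mu) ** 2 * p for x, p in pf.items())
    skew = sum((x - mu) ** 3 * p for x, p in pf.items())
    return skew / variance ** 1.5


# Leaf lengths in centimetres, and the same leaves measured in millimetres
leaves_cm = {5: 0.6, 8: 0.4}
leaves_mm = {10 * x: p for x, p in leaves_cm.items()}

coef_cm = coefficient_of_skewness(leaves_cm)
coef_mm = coefficient_of_skewness(leaves_mm)

print(round(coef_cm, 3), round(coef_mm, 3))
```

Multiplying every length by 10 multiplies the skewness by 10³ and the standard deviation cubed by 10³, so the ratio is unchanged.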



8 Population moments

8.1 Moments

In Chapter 2, we defined the kth order sample moment to be:

\[ \frac{1}{n} \sum x_i^k \]

We are now going to develop the kth order moments for a population modelled by a random variable, say X.

Suppose we wanted the 3rd order sample moment, $\frac{1}{n}\sum x^3$, for this set of data:

1 1 1 2 2 2 2 2 3 3

To calculate $\sum x^3$ quickly for this list, we would multiply each $x^3$ by its corresponding frequency:

\[ (3 \times 1^3) + (5 \times 2^3) + (2 \times 3^3) \]

In shorthand, we are using $\sum f x^3$. For a random variable, we have probabilities, $P(X = x)$, for each x instead of their frequencies, f. This gives $\sum x^3 P(X = x)$. But this is the definition of $E(X^3)$! So we have:

\[ \frac{1}{n} E(X^3) \]

Recall that n was the total of the frequencies, so now n will be the total of the probabilities, which is 1. So the 3rd order moment of a random variable is:

\[ E(X^3) \]



In general:

Definition

The kth order moment of a random variable X is:

\[ E(X^k) \]

The first order moment, $E(X)$, is the mean of the random variable.

Sometimes these are called moments about zero (or moments about the origin) to distinguish them from central moments (which we will now define).

8.2 Central moments

In Chapter 3, we defined the kth order central sample moment to be:

\[ \frac{1}{n} \sum (x_i - \bar{x})^k \]

Using a similar method to the one we used in Section 6.1, we can develop the kth order central moment for a population modelled by a random variable, X.

Definition

The kth order central moment of a random variable X is:

\[ \mu_k = E[(X - \mu)^k] \]

The second order central moment, $E[(X - \mu)^2]$, is $\text{var}(X)$.

The third order central moment, $E[(X - \mu)^3]$, is $\text{Skew}(X)$.

Sometimes central moments are called moments about the mean to distinguish them from the most general case:



Definition

The kth order moment about c of a random variable X is:

\[ E[(X - c)^k] \]
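The three kinds of moment defined above differ only in the point they are measured from, so a single helper can cover all of them. This is a sketch under that reading: the function name `moment_about` and the example distribution (the one used in the skewness section) are choices made purely for illustration.

```python
def moment_about(pf, k, c=0.0):
    """k-th order moment about c: E[(X - c)^k], with pf given as {x: P(X = x)}."""
    return sum((x - c) ** k * p for x, p in pf.items())


pf = {1: 0.25, 2: 0.5, 3: 0.2, 4: 0.05}

mean = moment_about(pf, 1)             # first order moment about zero
variance = moment_about(pf, 2, mean)   # second order central moment
skew = moment_about(pf, 3, mean)       # third order central moment
about_one = moment_about(pf, 2, 1)     # second order moment about c = 1

print(round(mean, 5), round(variance, 5), round(skew, 5), round(about_one, 5))
```

Setting c = 0 gives the moments about the origin, and setting c equal to the mean gives the central moments.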

Question 7.21

The random variable, V, has the following probability distribution:

v           −1    2
P(V = v)    0.4   0.6

Calculate:
(i) the second moment of V
(ii) the third order central moment of V
(iii) the second order moment of V about 1.



9 Miscellaneous questions

We have now covered all the questions you can be asked on generic discrete random variables. However, exam questions tend to:
• require us to calculate the probability distribution first
• lump several areas of this chapter together.

This next question gives practice of these important skills:

Question 7.22

An actuary has three pensions to review, one of which is known to be under-funded. The actuary goes through each pension in turn until she reaches the under-funded one. After reviewing this one she stops.

(i) Obtain the probability distribution for the number of pensions the actuary reviews before she stops. (Hint: Use a probability tree.)
(ii) What is the expected number of pensions she reviews?
(iii) The actuary charges £50 administration expenses plus £300 per pension. Calculate the mean and standard deviation of the amount she charges in total for the review.



10 Appendix A – proof of E(aX + b) = aE(X) + b

Using the fact that the expectation of any function of X , say ( )g X , is given by:

\[ E[g(X)] = \sum g(x) P(X = x) \]

We have:

\[ E(aX + b) = \sum (ax + b) P(X = x) \]

Expanding the brackets gives:

\[ E(aX + b) = \sum \{ ax P(X = x) + b P(X = x) \} \]

We can split up the summation:

\[ E(aX + b) = \sum ax P(X = x) + \sum b P(X = x) \]

Since a and b are constants we can take them out of the summations:

\[ E(aX + b) = a \sum x P(X = x) + b \sum P(X = x) \]

But we know that the sum of all probabilities, $\sum P(X = x)$, is 1:

\[ E(aX + b) = a \sum x P(X = x) + b \]

Recalling the definition of the mean, $E(X) = \sum x P(X = x)$, we get:

\[ E(aX + b) = aE(X) + b \]

By replacing all the x's in the proof with $g(x)$'s, this result can be generalised to show that:

\[ E[a\,g(X) + b] = aE[g(X)] + b \]
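The result can also be spot-checked on a concrete distribution. The following sketch uses exact fractions so that both sides can be compared without rounding; the distribution, the constants a = 5 and b = 7, and the helper name `expectation` are all arbitrary choices for illustration, and a numerical check on one example is of course not a proof.

```python
from fractions import Fraction

# An example probability function {x: P(X = x)} held as exact fractions
pf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 5), 4: Fraction(1, 20)}


def expectation(pf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pf.items())


a, b = 5, 7  # arbitrary constants chosen for illustration

lhs = expectation(pf, lambda x: a * x + b)  # E(aX + b) from first principles
rhs = a * expectation(pf) + b               # aE(X) + b

print(lhs, rhs, lhs == rhs)
```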



11 Appendix B – proof that var(X) = E(X²) − E²(X)

We can obtain this result either by rearranging $\text{var}(X) = E[(X - \mu)^2]$ or by working from the rearranged sample variance result, $s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\}$.

Rearranging

We have:

\[ \text{var}(X) = E[(X - \mu)^2] \]

Expanding the bracket gives:

\[ \text{var}(X) = E[X^2 - 2\mu X + \mu^2] \]

Splitting up the expectation gives:

\[ \text{var}(X) = E(X^2) - E(2\mu X) + E(\mu^2) \]

Since $\mu$ is a constant, we can use $E(aX + b) = aE(X) + b$:

\[
\begin{aligned}
E(2\mu X) &= 2\mu E(X) \\
E(\mu^2) &= \mu^2
\end{aligned}
\]

Hence:

\[ \text{var}(X) = E(X^2) - 2\mu E(X) + \mu^2 \]

But $\mu = E(X)$! So we get:

\[
\begin{aligned}
\text{var}(X) &= E(X^2) - 2\mu^2 + \mu^2 \\
&= E(X^2) - \mu^2 \\
&= E(X^2) - E^2(X)
\end{aligned}
\]



Using the sample variance

In Chapter 3, we rearranged the sample variance to get:

\[ s^2 = \frac{1}{n-1}\left\{\sum x^2 - n\bar{x}^2\right\} \]

Now recall that $n - 1$ was used instead of n to adjust for the fact that the spread of a sample would be less than the population. So for a population, we should use:

\[ \frac{1}{n}\left\{\sum x^2 - n\bar{x}^2\right\} \]

Recall that to calculate the sum of squares, $\sum x^2$, quickly for this list:

1 1 1 2 2 2 2 2 3 3

We would multiply each $x^2$ by its corresponding frequency:

\[ (3 \times 1^2) + (5 \times 2^2) + (2 \times 3^2) \]

In shorthand, we are using $\sum f x^2$. For a random variable, we now have probabilities, $P(X = x)$, for each x instead of their frequencies, f. So our sum of squares is $\sum x^2 P(X = x)$. This is the definition of $E(X^2)$! So we have:

\[ \frac{1}{n}\left\{E(X^2) - n\bar{x}^2\right\} \]

Now n was the total of the frequencies, so now n will be the total of the probabilities, which is 1. This gives:

\[ E(X^2) - \bar{x}^2 \]

Hold on! We should be using the population mean, $\mu$, instead of the sample mean, $\bar{x}$.

This gives:

\[ E(X^2) - \mu^2 \quad \text{or} \quad E(X^2) - E^2(X) \]



12 Appendix C – proof of var(aX + b) = a² var(X)

This result can be proved by considering either definition of the variance. However, we will prove it using the simpler form of:

\[ \text{var}(X) = E(X^2) - E^2(X) \]

Replacing X with $aX + b$ in our variance formula gives:

\[ \text{var}(aX + b) = E[(aX + b)^2] - [E(aX + b)]^2 \]

Expanding the brackets in the first term gives:

\[ \text{var}(aX + b) = E[a^2X^2 + 2abX + b^2] - [E(aX + b)]^2 \]

Splitting up the first expectation:

\[ \text{var}(aX + b) = E(a^2X^2) + E(2abX) + E(b^2) - [E(aX + b)]^2 \]

Using $E(aX + b) = aE(X) + b$ gives:

\[ \text{var}(aX + b) = a^2E(X^2) + 2abE(X) + b^2 - [aE(X) + b]^2 \]

Expanding the brackets in the last term and simplifying:

\[
\begin{aligned}
\text{var}(aX + b) &= a^2E(X^2) + 2abE(X) + b^2 - [a^2E^2(X) + 2abE(X) + b^2] \\
&= a^2E(X^2) - a^2E^2(X) \\
&= a^2\{E(X^2) - E^2(X)\} \\
&= a^2\,\text{var}(X)
\end{aligned}
\]
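A concrete check of this result can be made by building the distribution of aX + b explicitly and computing its variance from scratch. The sketch below uses exact fractions; the example distribution, the constants a = −4 and b = 9, and the helper name `variance` are illustrative assumptions only.

```python
from fractions import Fraction


def variance(pf):
    """var(X) = E(X^2) - E(X)^2 for a probability function {x: P(X = x)}."""
    mean = sum(x * p for x, p in pf.items())
    return sum(x ** 2 * p for x, p in pf.items()) - mean ** 2


pf_x = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 5), 4: Fraction(1, 20)}

a, b = -4, 9  # arbitrary constants chosen for illustration

# Distribution of Y = aX + b: each value is transformed, probabilities are unchanged
pf_y = {a * x + b: p for x, p in pf_x.items()}

var_x = variance(pf_x)
var_y = variance(pf_y)

print(var_x, var_y, var_y == a ** 2 * var_x)
```

The added constant b shifts every value by the same amount and so drops out, while the multiplier a is squared, which is why a negative a still gives a positive variance.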



13 Appendix D – proof that Skew(X) = E(X³) − 3μE(X²) + 2μ³

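We have:

\[ \text{Skew}(X) = E[(X - \mu)^3] \]

Expanding the bracket gives:

\[
\begin{aligned}
\text{Skew}(X) &= E[(X - \mu)(X - \mu)(X - \mu)] \\
&= E[(X - \mu)(X^2 - 2\mu X + \mu^2)] \\
&= E[X^3 - 3\mu X^2 + 3\mu^2 X - \mu^3]
\end{aligned}
\]

Splitting up the expectation gives:

\[ \text{Skew}(X) = E(X^3) - E(3\mu X^2) + E(3\mu^2 X) - E(\mu^3) \]

Since $\mu$ is a constant, we have:

\[
\begin{aligned}
E(3\mu X^2) &= 3\mu E(X^2) \\
E(3\mu^2 X) &= 3\mu^2 E(X) \\
E(\mu^3) &= \mu^3
\end{aligned}
\]

Hence:

\[ \text{Skew}(X) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 \]

But $\mu = E(X)$, so we get:

\[
\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 3\mu^3 - \mu^3 \\
&= E(X^3) - 3\mu E(X^2) + 2\mu^3
\end{aligned}
\]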



Extra practice questions

Section 2: Probability functions

P7.1 Subject 101, April 2000, Q9 (part)
The discrete random variable X has the following probability function:

\[ P(X = i) = 0.2 + ai, \quad i = -2, -1, 0, 1, 2 \]

State the possible values that a can take. [1]

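P7.2 Subject C1, April 1998, Q4
The discrete variable X takes 4 distinct values with probabilities:

\[ \frac{1 + 3\theta}{4}, \quad \frac{1 - \theta}{4}, \quad \frac{1 + 2\theta}{4}, \quad \frac{1 - 4\theta}{4} \]

This defines a probability distribution if:

A  $-\frac{1}{2} \le \theta \le \frac{3}{2}$
B  $-\frac{1}{2} \le \theta \le \frac{1}{8}$
C  $-\frac{3}{4} \le \theta \le \frac{1}{4}$
D  $-\frac{1}{3} \le \theta \le \frac{1}{4}$ [3]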

P7.3 A discrete random variable, W, has the following probability function:

\[ P(W = w) = \frac{2^w}{w!} e^{-2}, \quad w = 0, 1, 2, \ldots \]

Calculate:
(i) $P(W = 3)$
(ii) $P(W > 1)$.



Section 3: Cumulative distribution functions

P7.4 The random variable X has the following probability function:

\[ P(X = x) = \tfrac{1}{12}(8 - x), \quad x = 2, 4 \text{ or } 6 \]

(i) Sketch the graph of the cumulative distribution function.
(ii) State the cumulative distribution function of X.

Sections 4, 5 and 6: Measures of location and spread

P7.5 The random variable Y has probability function:

\[ P(Y = y) = \frac{k}{y}, \quad y = 1, 2, 3, 4 \]

Find the mean, mode and median of Y.

P7.6 Subject C1, September 1995, Q2 (part)
A private investor has capital of £16,000. He divides this into eight units of £2,000, each of which he invests in a separate one-year investment. Each of these investments has three possible outcomes at the end of the year:

1. total loss of capital           probability 0.1
2. capital payment of £2,000       probability 0.7
3. capital payment of £5,000       probability 0.2

The investments behave independently of one another, and there is no other return from them. What is the expected payment received by the investor at the end of the year? [2]



P7.7 The number of deaths amongst three life assurance policyholders is given by the random variable X :

\[ P(X = x) = {}^3C_x \times 0.1^x \times 0.9^{3-x}, \quad x = 0, 1, 2, 3 \]

Calculate the mean and standard deviation of the number of deaths.

P7.8 Subject 101, April 2001, Q8 (part) Consider two independent lives A and B. The probabilities that A and B die within a specified period are 0.1 and 0.2 respectively. If A dies you lose £50,000, whether or not B dies. If B dies you lose £30,000, whether or not A dies. (i) Calculate the mean and standard deviation of your total losses in the period. [4]

P7.9 Subject C1, September 1997, Q3 (adapted)
Customer electricity charges C are calculated according to the formula $C = 7.00 + 0.0742N$, where N denotes the number of units used. In a particular area, N is modelled as a random variable with mean 600 and variance 250. Calculate the mean and variance of the charges C in their respective units. [2]

P7.10 The mean and standard deviation of the random variable U are 8 and 3, respectively. Find:
(i) $E(3 + 6U)$
(ii) the standard deviation of $8 - 2U$
(iii) $\text{var}\left(\dfrac{U - 8}{3}\right)$
(iv) $E(U^2 - 4U + 7)$



Section 7: Skewness

P7.11 Subject C1, April 1999, Q4 (adapted)
A simple discrete random variable, X, has probability function given by $P(X = 0) = 0.4$ and $P(X = 1) = 0.6$.

Calculate the coefficient of skewness. [3]

P7.12 Subject C1, September 1998, Q9 Calculate the coefficient of skewness for the following discrete probability distribution:

x           −2    −1    0     1
P(X = x)    0.1   0.1   0.5   0.3

[4]



Chapter 7 Summary

Discrete random variables

A random variable uses probabilities to decide its numerical value. A discrete random variable can only take certain numerical values, eg 1, 2, 3, …

$P(X = x)$ is a probability function (PF) of X if:

\[ P(X = x) \ge 0 \quad \text{and} \quad \sum P(X = x) = 1 \]

The cumulative distribution function (CDF) of X is:

\[ F_X(x) = P(X \le x) \]

This gives a stepped graph.

Measures of location

The mean (or expectation) of a population modelled by a random variable X is:

\[ \mu = E(X) = \sum x P(X = x) \]

The median of X is the value(s) that correspond to a cumulative probability of 0.5. It is best found by drawing a graph of the CDF.

The mode of X is the value with the greatest probability.

Variance and standard deviation

The variance of a population modelled by a random variable X is:

\[
\begin{aligned}
\sigma^2 = \text{var}(X) &= E[(X - \mu)^2] \\
&= E(X^2) - E^2(X)
\end{aligned}
\]

It measures the spread squared of the distribution.



The standard deviation of a population modelled by a random variable X is:

\[ \sigma = \sqrt{\text{var}(X)} \]

It measures the spread of the distribution.

Linear functions of a random variable

We can simplify means and variances of linear functions using:

\[
\begin{aligned}
E(aX + b) &= aE(X) + b \\
\text{var}(aX + b) &= a^2\,\text{var}(X)
\end{aligned}
\]

We can also split up expectations, such as $E(3X^2 - 5X + 1)$, as follows:

\[ E(3X^2 - 5X + 1) = 3E(X^2) - 5E(X) + 1 \]

Skewness

The skewness of a population modelled by a random variable X is:

\[
\begin{aligned}
\mu_3 = \text{Skew}(X) &= E[(X - \mu)^3] \\
&= E(X^3) - 3\mu E(X^2) + 2\mu^3
\end{aligned}
\]

The coefficient of skewness is the best way to compare distributions:

\[ \text{coefficient of skewness} = \frac{\text{Skew}(X)}{\sigma^3} \]

Moments

\[
\begin{aligned}
k\text{th moment} &= E[X^k] \\
k\text{th central moment} &= E[(X - \mu)^k] \\
k\text{th moment about } c &= E[(X - c)^k]
\end{aligned}
\]




Solution 7.1

To be a discrete random variable, the item must randomly take a discrete value.

I    the number of months in a year is fixed
II   the number of objectives in the Subject CT3 syllabus is fixed
III  the number of candidates who pass will vary randomly and can only take discrete values (0, 1, 2, …)
IV   the number of fountains is fixed
V    the number of claims received tomorrow will vary randomly and can only take discrete values (0, 1, 2, …)

Therefore items III and V are discrete random variables.

Solution 7.2

Drawing a tree diagram:

[Figure: tree diagram for the 1st coin and 2nd coin, with probability 0.5 on every branch]

Outcome         Probability
HH (2 heads)    0.5 × 0.5 = 0.25
HT (1 head)     0.5 × 0.5 = 0.25
TH (1 head)     0.5 × 0.5 = 0.25
TT (0 heads)    0.5 × 0.5 = 0.25

We obtain:

Number of heads   0      1     2
Probability       0.25   0.5   0.25



Solution 7.3

In Question 7.2, we obtained the following probability distribution:

Number of heads   0      1     2
Probability       0.25   0.5   0.25

So using X for the number of heads, our probability function, $P(X = x)$, is:

\[
\begin{aligned}
P(X = 0) &= 0.25 \\
P(X = 1) &= 0.5 \\
P(X = 2) &= 0.25
\end{aligned}
\]

Solution 7.4

Substituting the values that X could take into our probability function, we get:

\[
\begin{aligned}
P(X = 2) &= \tfrac{1}{12}(8 - 2) = \tfrac{6}{12} \\
P(X = 4) &= \tfrac{1}{12}(8 - 4) = \tfrac{4}{12} \\
P(X = 6) &= \tfrac{1}{12}(8 - 6) = \tfrac{2}{12}
\end{aligned}
\]

So we can see that $P(X = x) \ge 0$ for each value of x.

Now adding up the probabilities:

\[ \sum P(X = x) = P(X = 2) + P(X = 4) + P(X = 6) = \tfrac{6}{12} + \tfrac{4}{12} + \tfrac{2}{12} = 1 \]

Hence, $P(X = x) = \tfrac{1}{12}(8 - x)$ where $x = 2, 4$ or $6$ is a probability function.



Solution 7.5

Writing out the probability distribution of X , we get:

x           1    2     3     4
P(X = x)    c    4c    9c    16c

Since the sum of the probabilities must be 1, we have:

\[
\begin{aligned}
\sum P(X = x) &= 0 + c + 4c + 9c + 16c = 1 \\
\Rightarrow\; 30c &= 1 \;\Rightarrow\; c = \tfrac{1}{30}
\end{aligned}
\]

Solution 7.6

Using the fact that the probabilities sum to 1 is not helpful here:

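\[ \tfrac{5}{8} - \theta + \tfrac{1}{4} - \theta + \tfrac{1}{8} + 2\theta = 1 \]

So, we need to use the fact that each of the probabilities must lie between 0 and 1:

\[
\begin{aligned}
0 \le \tfrac{5}{8} - \theta \le 1 &\;\Rightarrow\; -\tfrac{5}{8} \le -\theta \le \tfrac{3}{8} \;\Rightarrow\; \tfrac{5}{8} \ge \theta \ge -\tfrac{3}{8} \;\Rightarrow\; -\tfrac{3}{8} \le \theta \le \tfrac{5}{8} \\
0 \le \tfrac{1}{4} - \theta \le 1 &\;\Rightarrow\; -\tfrac{1}{4} \le -\theta \le \tfrac{3}{4} \;\Rightarrow\; \tfrac{1}{4} \ge \theta \ge -\tfrac{3}{4} \;\Rightarrow\; -\tfrac{3}{4} \le \theta \le \tfrac{1}{4} \\
0 \le \tfrac{1}{8} + 2\theta \le 1 &\;\Rightarrow\; -\tfrac{1}{8} \le 2\theta \le \tfrac{7}{8} \;\Rightarrow\; -\tfrac{1}{16} \le \theta \le \tfrac{7}{16}
\end{aligned}
\]

Now we have to find the range of values that $\theta$ can take, so that all of these conditions are satisfied. The easiest way to do this is to sketch a number line:

[Figure: number line marking −3/4, −3/8, −1/16, 0, 1/4, 7/16 and 5/8, with the region satisfying all three inequalities shaded]

The shaded region shows the values that satisfy all three inequalities. Hence:

\[ -\tfrac{1}{16} \le \theta \le \tfrac{1}{4} \]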



Solution 7.7

We have:

\[ P(X = x) = \frac{1}{4}\left(\frac{3}{4}\right)^x, \quad x = 0, 1, 2, 3, \ldots \]

(i) \[ P(X = 2) = \frac{1}{4}\left(\frac{3}{4}\right)^2 = \frac{9}{64} = 0.140625 \]

(ii) Since X can only take the values 0, 1, 2, 3, … this means that:

\[ P(X \le 2) = P(X = 0, 1 \text{ or } 2) \]

Since a random variable can only take one value at a time, the values are mutually exclusive. Hence:

\[
\begin{aligned}
P(X \le 2) &= P(X = 0, 1 \text{ or } 2) \\
&= P(X = 0) + P(X = 1) + P(X = 2)
\end{aligned}
\]

Working out each of these probabilities:

\[
\begin{aligned}
P(X = 0) &= \frac{1}{4}\left(\frac{3}{4}\right)^0 = \frac{1}{4} = 0.25 \\
P(X = 1) &= \frac{1}{4}\left(\frac{3}{4}\right)^1 = \frac{3}{16} = 0.1875 \\
P(X = 2) &= \frac{9}{64} = 0.140625 \quad \text{from part (i)}
\end{aligned}
\]

Hence:

\[ P(X \le 2) = \frac{1}{4} + \frac{3}{16} + \frac{9}{64} = \frac{37}{64} = 0.578125 \]



Solution 7.8

(i) The graph will be:

[Figure: stepped graph of the cumulative distribution function, $F_W(w)$ against w]

(ii) From this we get the cumulative distribution function to be:

\[
F_W(w) =
\begin{cases}
0 & w < 2 \\
0.2 & 2 \le w < 4 \\
0.7 & 4 \le w < 5 \\
1 & 5 \le w
\end{cases}
\]

(iii) Either reading off the graph or using the CDF we get:

\[
\begin{aligned}
F_W(1) &= 0 \\
F_W(4.5) &= 0.7 \\
F_W(10) &= 1
\end{aligned}
\]



Solution 7.9

We can solve this by obtaining the probability function or by using the cumulative distribution function directly.

Obtaining the probability function

First we notice that the CDF jumps up at v = 1, 2, 3 and 4. To obtain the probabilities of obtaining each of these values, we simply subtract the cumulative probabilities:

v           1       2       3       4
P(V = v)    0.216   0.432   0.288   0.064

We can now obtain the probabilities:

(i) $P(V = 2) = 0.432$

(ii) $P(V > 1) = P(V = 2) + P(V = 3) + P(V = 4) = 0.432 + 0.288 + 0.064 = 0.784$

(iii) $P(V < 3) = P(V = 1) + P(V = 2) = 0.216 + 0.432 = 0.648$

Using the cumulative distribution function directly

(i) Using the fact that subtracting cumulative probabilities gives the original probabilities:

\[ P(V = 2) = F_V(2) - F_V(1) = 0.648 - 0.216 = 0.432 \]

(ii) Since probabilities of all possible values sum to 1, we get:

\[ P(V > 1) = 1 - P(V \le 1) = 1 - F_V(1) = 1 - 0.216 = 0.784 \]

(iii) Reading directly from the cumulative distribution function, and noting that V < 3 is the same event as V ≤ 2:

\[ P(V < 3) = F_V(2) = 0.648 \]
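The step of subtracting cumulative probabilities can be sketched in a few lines of Python. The dictionary `cdf` below holds the cumulative probabilities at the jump points; the value 0.936 at v = 3 is not quoted in the solution and is inferred here by adding up the probability table, so treat it as an assumption of this sketch.

```python
# Cumulative probabilities F_V(v) at the points where the CDF jumps.
# 0.936 is inferred from the table (0.216 + 0.432 + 0.288), not quoted in the text.
cdf = {1: 0.216, 2: 0.648, 3: 0.936, 4: 1.0}

# Recover P(V = v) as the size of each jump in the CDF
pf = {}
previous = 0.0
for v in sorted(cdf):
    pf[v] = cdf[v] - previous
    previous = cdf[v]

p_equal_2 = pf[2]          # P(V = 2) = F(2) - F(1)
p_greater_1 = 1 - cdf[1]   # P(V > 1) = 1 - F(1)
p_less_3 = cdf[2]          # P(V < 3) = P(V <= 2) = F(2)

print({v: round(p, 3) for v, p in pf.items()})
```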



Solution 7.10

Using the formula, we get:

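\[
\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= \left(1 \times \tfrac{1}{6}\right) + \left(2 \times \tfrac{1}{6}\right) + \left(3 \times \tfrac{1}{6}\right) + \left(4 \times \tfrac{1}{6}\right) + \left(5 \times \tfrac{1}{6}\right) + \left(6 \times \tfrac{1}{6}\right) \\
&= 3\tfrac{1}{2}
\end{aligned}
\]

Note that, just like the sample mean, we can get ‘impossible’ results – I’ve never rolled a 3½ on a die!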

Solution 7.11

(i) Counting through the probabilities until we get to ½:

z           1     2     3     4
P(Z = z)    1/4   1/4   1/4   1/4

The cumulative probability is 1/4 at z = 1 and 1/4 + 1/4 = 1/2 at z = 2, so ½ is in here!

We can see that the median is 2.

(ii) (a) The graph of the cumulative distribution function is the bold line:

[Figure: stepped graph of F(z) against z, with the median marked]

(b) We can see that not only does 2 correspond to a probability of 0.5, but so do all the values in the range $2 \le z < 3$! These are all the median!



Solution 7.12

Since all the values have the same probability of occurring, no value has a higher probability than the rest. Hence, there is no mode.

Solution 7.13

(i) The probability distribution of $W^2$ is:

w²          2²    4²    5²
P(W = w)    0.2   0.5   0.3

\[
\begin{aligned}
\Rightarrow\; E(W^2) &= \sum w^2 P(W = w) \\
&= (2^2 \times 0.2) + (4^2 \times 0.5) + (5^2 \times 0.3) = 16.3
\end{aligned}
\]

We don’t actually need to write out the probability distribution: we can simply use the formula. We’ve only included the distribution here to make it clearer what’s happening.

(ii) The probability distribution of $5W - 2$ is:

5w − 2      8     18    23
P(W = w)    0.2   0.5   0.3

\[
\begin{aligned}
E(5W - 2) &= \sum (5w - 2) P(W = w) \\
&= (8 \times 0.2) + (18 \times 0.5) + (23 \times 0.3) = 17.5
\end{aligned}
\]

(iii) The probability distribution of $\frac{1}{W}$ is:

1/w         1/2   1/4   1/5
P(W = w)    0.2   0.5   0.3

\[
\begin{aligned}
E\left(\tfrac{1}{W}\right) &= \sum \tfrac{1}{w} P(W = w) \\
&= \left(\tfrac{1}{2} \times 0.2\right) + \left(\tfrac{1}{4} \times 0.5\right) + \left(\tfrac{1}{5} \times 0.3\right) = 0.285
\end{aligned}
\]



Solution 7.14

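We are told that $E(Z) = 6$.

Using our rule we get:

(i)
\[
\begin{aligned}
E(5Z + 7) &= 5E(Z) + 7 \\
&= (5 \times 6) + 7 \\
&= 37
\end{aligned}
\]

(ii)
\[
\begin{aligned}
E(9 - 4Z) &= 9 - 4E(Z) \\
&= 9 - (4 \times 6) \\
&= -15
\end{aligned}
\]

(iii) Since:

\[ \frac{3Z + 1}{8} = \frac{3}{8}Z + \frac{1}{8} \]

We get:

\[
\begin{aligned}
E\left(\frac{3Z + 1}{8}\right) &= E\left(\frac{3}{8}Z + \frac{1}{8}\right) \\
&= \frac{3}{8}E(Z) + \frac{1}{8} \\
&= \left(\frac{3}{8} \times 6\right) + \frac{1}{8} \\
&= 2.375
\end{aligned}
\]

(iv) We cannot get $E(Z^2)$ from $E(Z)$. Our rule only works for linear functions.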



Solution 7.15

(i) \[ E(7X^2 + 1) = 7E(X^2) + 1 \]

(ii)
\[
\begin{aligned}
E(9X^2 - 4X + 2) &= E(9X^2) - E(4X) + E(2) \\
&= 9E(X^2) - 4E(X) + 2
\end{aligned}
\]

(iii) Multiplying out the brackets first:

\[
\begin{aligned}
E[(X + 1)^2] &= E[X^2 + 2X + 1] \\
&= E(X^2) + E(2X) + E(1) \\
&= E(X^2) + 2E(X) + 1
\end{aligned}
\]

(iv)
\[
\begin{aligned}
E\left(\tfrac{3}{X} + 2\right) &= E\left(3 \times \tfrac{1}{X} + 2\right) \\
&= 3E\left(\tfrac{1}{X}\right) + 2
\end{aligned}
\]

We cannot simplify $E\left(\tfrac{1}{X}\right)$ to $\tfrac{1}{E(X)}$! We would have to work it out from first principles:

\[ E\left(\tfrac{1}{X}\right) = \sum \tfrac{1}{x} P(X = x) \]



Solution 7.16

Our formula for the variance, $\text{var}(X) = E[(X - \mu)^2]$, requires the mean:

\[
\begin{aligned}
\mu = E(X) &= \sum x P(X = x) \\
&= (0 \times 0.1) + (1 \times 0.2) + (2 \times 0.4) + (3 \times 0.3) \\
&= 1.9
\end{aligned}
\]

Substituting this, we get:

\[
\begin{aligned}
\text{var}(X) &= E[(X - \mu)^2] \\
&= \sum (x - \mu)^2 P(X = x) \\
&= \{(0 - 1.9)^2 \times 0.1\} + \{(1 - 1.9)^2 \times 0.2\} + \{(2 - 1.9)^2 \times 0.4\} + \{(3 - 1.9)^2 \times 0.3\} \\
&= 0.361 + 0.162 + 0.004 + 0.363 \\
&= 0.89
\end{aligned}
\]



Solution 7.17

(i) Using our formula, we get:

\[
\begin{aligned}
E(Y) &= \sum y P(Y = y) \\
&= (0.5 \times 0.15) + (0.8 \times 0.3) + (1.3 \times 0.5) + (2.5 \times 0.05) \\
&= 1.09
\end{aligned}
\]

So the mean value of the investment is £1.09m.

(ii) First, we need $E(Y^2)$:

\[
\begin{aligned}
E(Y^2) &= \sum y^2 P(Y = y) \\
&= (0.5^2 \times 0.15) + (0.8^2 \times 0.3) + (1.3^2 \times 0.5) + (2.5^2 \times 0.05) \\
&= 1.387
\end{aligned}
\]

Hence:

\[
\begin{aligned}
\text{var}(Y) &= E(Y^2) - E^2(Y) \\
&= 1.387 - 1.09^2 \\
&= 0.1989
\end{aligned}
\]

Taking the square root:

\[ \sqrt{0.1989} = 0.4460 \]

The standard deviation of the investment returns is £0.4460m.



Solution 7.18

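Since the standard deviation is 2, we have $\text{var}(Z) = 2^2 = 4$.

(i)
\[
\begin{aligned}
\text{var}(5Z + 7) &= 5^2\,\text{var}(Z) \\
&= 5^2 \times 4 \\
&= 100
\end{aligned}
\]

(ii)
\[
\begin{aligned}
\text{var}(9 - 4Z) &= (-4)^2\,\text{var}(Z) \\
&= 16\,\text{var}(Z) \\
&= 16 \times 4 \\
&= 64
\end{aligned}
\]

(iii) Since:

\[ \frac{3Z + 1}{8} = \frac{3}{8}Z + \frac{1}{8} \]

We get:

\[
\begin{aligned}
\text{var}\left(\frac{3Z + 1}{8}\right) &= \text{var}\left(\frac{3}{8}Z + \frac{1}{8}\right) \\
&= \left(\frac{3}{8}\right)^2 \text{var}(Z) \\
&= \left(\frac{3}{8}\right)^2 \times 4 = \frac{9}{16} = 0.5625
\end{aligned}
\]

(iv) We can’t get $E(Z^2)$ from the mean, $E(Z)$. But rearranging the variance formula:

\[ \text{var}(Z) = E(Z^2) - E^2(Z) \;\Rightarrow\; E(Z^2) = \text{var}(Z) + E^2(Z) \]

Substituting we get:

\[ E(Z^2) = 4 + 6^2 = 40 \]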



Solution 7.19

First, we need the mean:

\[ \mu = E(X) = \sum x P(X = x) = (1 \times 0.2) + (2 \times 0.6) + (3 \times 0.2) = 2 \]

We could have spotted this result straightaway using symmetry. Hence:

\[
\begin{aligned}
\text{Skew}(X) &= E[(X - \mu)^3] \\
&= \sum (x - \mu)^3 P(X = x) \\
&= \{(1 - 2)^3 \times 0.2\} + \{(2 - 2)^3 \times 0.6\} + \{(3 - 2)^3 \times 0.2\} \\
&= -0.2 + 0 + 0.2 \\
&= 0
\end{aligned}
\]



Solution 7.20

To calculate the coefficient of skewness we first require the mean:

\[ \mu = E(X) = \sum x P(X = x) = (1 \times 0.2) + (2 \times 0.3) + (3 \times 0.5) = 2.3 \]

Now calculating the skewness using the first formula:

\[
\begin{aligned}
\text{Skew}(X) &= E[(X - \mu)^3] \\
&= \sum (x - \mu)^3 P(X = x) \\
&= \{(1 - 2.3)^3 \times 0.2\} + \{(2 - 2.3)^3 \times 0.3\} + \{(3 - 2.3)^3 \times 0.5\} \\
&= -0.4394 - 0.0081 + 0.1715 \\
&= -0.276
\end{aligned}
\]

or we could have calculated the skewness using our alternative formula:

\[
\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) = (1^2 \times 0.2) + (2^2 \times 0.3) + (3^2 \times 0.5) = 5.9 \\
E(X^3) &= \sum x^3 P(X = x) = (1^3 \times 0.2) + (2^3 \times 0.3) + (3^3 \times 0.5) = 16.1
\end{aligned}
\]

\[
\begin{aligned}
\Rightarrow\; \text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= 16.1 - (3 \times 2.3 \times 5.9) + (2 \times 2.3^3) \\
&= -0.276
\end{aligned}
\]

To calculate the coefficient of skewness we need the variance:

\[
\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) = (1^2 \times 0.2) + (2^2 \times 0.3) + (3^2 \times 0.5) = 5.9 \\
\Rightarrow\; \text{var}(X) &= E(X^2) - E^2(X) = 5.9 - 2.3^2 = 0.61
\end{aligned}
\]

Note that $E(X^2)$ would already have been calculated had we used the alternative skewness formula.

Hence, the coefficient of skewness is given by:

\[ \frac{-0.276}{0.61^{3/2}} = -0.579 \]



Solution 7.21

(i) The second moment of V is:

\[
\begin{aligned}
E(V^2) &= \sum v^2 P(V = v) \\
&= [(-1)^2 \times 0.4] + [2^2 \times 0.6] \\
&= 2.8
\end{aligned}
\]

(ii) The third order central moment of V is:

\[ E[(V - \mu)^3] \]

Calculating the mean:

\[ \mu = E(V) = \sum v P(V = v) = (-1 \times 0.4) + (2 \times 0.6) = 0.8 \]

Hence:

\[
\begin{aligned}
E[(V - \mu)^3] &= \sum (v - \mu)^3 P(V = v) \\
&= \{(-1 - 0.8)^3 \times 0.4\} + \{(2 - 0.8)^3 \times 0.6\} \\
&= -2.3328 + 1.0368 \\
&= -1.296
\end{aligned}
\]

The third central moment is just the skewness.

(iii) The second order moment about 1 is:

\[
\begin{aligned}
E[(V - 1)^2] &= \sum (v - 1)^2 P(V = v) \\
&= \{(-1 - 1)^2 \times 0.4\} + \{(2 - 1)^2 \times 0.6\} \\
&= 1.6 + 0.6 \\
&= 2.2
\end{aligned}
\]



Solution 7.22

(i) Using U to stand for “under-funded” and N to stand for “not under-funded”, we get the following tree diagram:

[Figure: probability tree with branches U and N at each of the three reviews]

Outcome   Probability
U         1/3
NU        2/3 × 1/2 = 1/3
NNU       2/3 × 1/2 × 1 = 1/3

The third branch is slightly strange: since there is only 1 pension left, it is certain to be the under-funded one.

Now, at the moment we don’t have a discrete random variable, as we should be assigning probabilities to numbers. So using X to be the number of pensions reviewed, we get the following probability distribution:

x           1     2     3
P(X = x)    1/3   1/3   1/3

(ii) The expected number of pensions reviewed is:

\[
\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= \left(1 \times \tfrac{1}{3}\right) + \left(2 \times \tfrac{1}{3}\right) + \left(3 \times \tfrac{1}{3}\right) \\
&= 2
\end{aligned}
\]

Alternatively, we could just have noticed that the distribution is symmetrical, so the mean will be in the middle.

(iii) Using C to stand for the amount charged by the actuary, we have:

\[ C = 50 + 300X \]

The mean amount charged is given by:

\[
\begin{aligned}
E(C) &= E(50 + 300X) \\
&= 50 + 300E(X) && \text{using } E(aX + b) = aE(X) + b \\
&= 50 + (300 \times 2) && \text{since } E(X) = 2 \\
&= £650
\end{aligned}
\]

The variance in the amount charged is:

\[
\begin{aligned}
\text{var}(C) &= \text{var}(50 + 300X) \\
&= 300^2\,\text{var}(X) && \text{using } \text{var}(aX + b) = a^2\,\text{var}(X)
\end{aligned}
\]

So we need to calculate $\text{var}(X)$ first:

\[
\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
&= \left(1^2 \times \tfrac{1}{3}\right) + \left(2^2 \times \tfrac{1}{3}\right) + \left(3^2 \times \tfrac{1}{3}\right) \\
&= 4\tfrac{2}{3}
\end{aligned}
\]

So:

\[
\begin{aligned}
\text{var}(X) &= E(X^2) - E^2(X) \\
&= 4\tfrac{2}{3} - 2^2 \\
&= \tfrac{2}{3}
\end{aligned}
\]

Hence:

\[ \text{var}(C) = 300^2\,\text{var}(X) = 300^2 \times \tfrac{2}{3} = 60{,}000 \]

So the standard deviation is given by:

\[ \sqrt{60{,}000} = £244.95 \]
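The whole of this solution can be reproduced in a few lines, which is a handy way to check the arithmetic. The sketch below is illustrative only: the variable names are made up, and the branch probabilities are those read from the tree diagram.

```python
from fractions import Fraction
from math import sqrt

# Number of pensions reviewed, X, with probabilities read along each branch of the tree
pf = {
    1: Fraction(1, 3),                       # U
    2: Fraction(2, 3) * Fraction(1, 2),      # NU
    3: Fraction(2, 3) * Fraction(1, 2) * 1,  # NNU (last pension is certain to be U)
}

mean_x = sum(x * p for x, p in pf.items())
var_x = sum(x ** 2 * p for x, p in pf.items()) - mean_x ** 2

# Charge C = 50 + 300X, so E(C) = 50 + 300 E(X) and sd(C) = 300 sd(X)
mean_c = 50 + 300 * mean_x
sd_c = 300 * sqrt(var_x)

print(mean_x, var_x, mean_c, round(sd_c, 2))
```

Note that the fixed £50 affects the mean charge but not its standard deviation.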




P7.1 Writing out the probability distribution:

i           −2          −1         0     1          2
P(X = i)    0.2 − 2a    0.2 − a    0.2   0.2 + a    0.2 + 2a

Using the fact that the probabilities sum to 1 is not helpful here:

\[ (0.2 - 2a) + (0.2 - a) + 0.2 + (0.2 + a) + (0.2 + 2a) = 1 \]

So, we need to use the fact that each of the probabilities must lie between 0 and 1:

\[
\begin{aligned}
0 \le 0.2 - 2a \le 1 &\;\Rightarrow\; -0.2 \le -2a \le 0.8 \;\Rightarrow\; 0.1 \ge a \ge -0.4 \;\Rightarrow\; -0.4 \le a \le 0.1 \\
0 \le 0.2 - a \le 1 &\;\Rightarrow\; -0.2 \le -a \le 0.8 \;\Rightarrow\; 0.2 \ge a \ge -0.8 \;\Rightarrow\; -0.8 \le a \le 0.2 \\
0 \le 0.2 + a \le 1 &\;\Rightarrow\; -0.2 \le a \le 0.8 \\
0 \le 0.2 + 2a \le 1 &\;\Rightarrow\; -0.2 \le 2a \le 0.8 \;\Rightarrow\; -0.1 \le a \le 0.4
\end{aligned}
\]

Now we have to find the range of values that a can take, so that all of these conditions are satisfied. The easiest way to do this is to sketch a number line:

[Figure: number line from −0.8 to 0.8 showing the permitted range of a for each of X = −2, X = −1, X = 1 and X = 2, with the common region shaded]

The shaded region shows the values that satisfy all four inequalities. Hence:

\[ -0.1 \le a \le 0.1 \]



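P7.2 Again, using the fact that the probabilities sum to 1 is not helpful here. So we need to use the fact that each of the probabilities must lie between 0 and 1:

\[
\begin{aligned}
0 \le \frac{1 + 3\theta}{4} \le 1 &\;\Rightarrow\; 0 \le 1 + 3\theta \le 4 \;\Rightarrow\; -1 \le 3\theta \le 3 \;\Rightarrow\; -\tfrac{1}{3} \le \theta \le 1 \\
0 \le \frac{1 - \theta}{4} \le 1 &\;\Rightarrow\; 0 \le 1 - \theta \le 4 \;\Rightarrow\; -1 \le -\theta \le 3 \;\Rightarrow\; 1 \ge \theta \ge -3 \;\Rightarrow\; -3 \le \theta \le 1 \\
0 \le \frac{1 + 2\theta}{4} \le 1 &\;\Rightarrow\; 0 \le 1 + 2\theta \le 4 \;\Rightarrow\; -1 \le 2\theta \le 3 \;\Rightarrow\; -\tfrac{1}{2} \le \theta \le \tfrac{3}{2} \\
0 \le \frac{1 - 4\theta}{4} \le 1 &\;\Rightarrow\; 0 \le 1 - 4\theta \le 4 \;\Rightarrow\; -1 \le -4\theta \le 3 \;\Rightarrow\; \tfrac{1}{4} \ge \theta \ge -\tfrac{3}{4} \;\Rightarrow\; -\tfrac{3}{4} \le \theta \le \tfrac{1}{4}
\end{aligned}
\]

Now we have to find the range of values that $\theta$ can take, so that all of these conditions are satisfied. The easiest way to do this is to sketch a number line:

[Figure: number line marking −3, −3/4, −1/2, −1/3, 0, 1/4, 1 and 1½, with the region satisfying all four inequalities shaded]

The shaded region shows the values that satisfy all four inequalities. Hence:

\[ -\tfrac{1}{3} \le \theta \le \tfrac{1}{4} \]

This is option D.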



P7.3 We have:

\[ P(W = w) = \frac{2^w}{w!} e^{-2}, \quad w = 0, 1, 2, \ldots \]

(i) \[ P(W = 3) = \frac{2^3}{3!} e^{-2} = \frac{8}{6} e^{-2} = 0.18045 \]

(ii) Since W can only take the values 0, 1, 2, … this means that:

\[ P(W > 1) = P(W = 2, 3, 4, \ldots) \]

Eek! This will take too long! Using the fact that all the probabilities for W sum to 1:

\[ P(W > 1) = P(W = 2, 3, 4, \ldots) = 1 - P(W = 0 \text{ or } 1) \]

Since a random variable can only take one value at a time, the values are mutually exclusive. Hence:

\[
\begin{aligned}
P(W > 1) &= 1 - P(W = 0 \text{ or } 1) \\
&= 1 - [P(W = 0) + P(W = 1)]
\end{aligned}
\]

Working out each of these probabilities:

\[
\begin{aligned}
P(W = 0) &= \frac{2^0}{0!} e^{-2} = \frac{1}{1} e^{-2} = 0.13534 \\
P(W = 1) &= \frac{2^1}{1!} e^{-2} = \frac{2}{1} e^{-2} = 0.27067
\end{aligned}
\]

Hence:

\[ P(W > 1) = 1 - [0.13534 + 0.27067] = 0.5940 \]
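These probabilities are easy to confirm with Python's standard `math` module. The function name `pf_w` is just a label chosen for this sketch of the probability function given in the question.

```python
from math import exp, factorial


def pf_w(w):
    """P(W = w) = (2^w / w!) * e^(-2) for w = 0, 1, 2, ..."""
    return 2 ** w / factorial(w) * exp(-2)


p_three = pf_w(3)

# Use the complement rather than summing infinitely many terms
p_more_than_one = 1 - (pf_w(0) + pf_w(1))

print(round(p_three, 5), round(p_more_than_one, 4))
```

Working with the complement mirrors the hand calculation: only two terms are needed instead of an infinite sum.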



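P7.4 Writing down the probability distribution for X:

\[
\begin{aligned}
P(X = 2) &= \tfrac{1}{12}(8 - 2) = \tfrac{6}{12} \\
P(X = 4) &= \tfrac{1}{12}(8 - 4) = \tfrac{4}{12} \\
P(X = 6) &= \tfrac{1}{12}(8 - 6) = \tfrac{2}{12}
\end{aligned}
\]

(i) Adding up the probabilities as we go along, the graph will be:

[Figure: stepped graph of the cumulative distribution function, $F_X(x)$ against x, with the vertical axis marked in twelfths]

(ii) From this we get the cumulative distribution function to be:

\[
F_X(x) =
\begin{cases}
0 & x < 2 \\
\tfrac{6}{12} & 2 \le x < 4 \\
\tfrac{10}{12} & 4 \le x < 6 \\
1 & 6 \le x
\end{cases}
\]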



P7.5 Before we can find the mean, mode and median we need to determine k. Writing out the probability distribution of Y, we get:

y           1    2       3       4
P(Y = y)    k    k/2     k/3     k/4

Since the sum of the probabilities must be 1, we have:

\[
\begin{aligned}
\sum P(Y = y) &= k + \tfrac{1}{2}k + \tfrac{1}{3}k + \tfrac{1}{4}k = 1 \\
\Rightarrow\; \tfrac{25}{12}k &= 1 \\
\Rightarrow\; k &= \tfrac{12}{25} = 0.48
\end{aligned}
\]

So the probability distribution is:

y           1      2      3      4
P(Y = y)    0.48   0.24   0.16   0.12

The mean is given by:

\[ E(Y) = \sum y P(Y = y) = (1 \times 0.48) + (2 \times 0.24) + (3 \times 0.16) + (4 \times 0.12) = 1.92 \]

The mode is 1, as this value of y has the greatest probability.

The graph of the cumulative distribution function is:

[Figure: stepped graph of the cumulative distribution function, $F_Y(y)$ against y]

From this it is clear to see that the median is 2.
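The same steps (find k, then the mean, mode and median) can be sketched in Python. The median rule used here, taking the smallest value whose cumulative probability reaches ½, is a simplifying assumption of this sketch; it does not handle the case where a whole range of values shares a cumulative probability of exactly 0.5.

```python
from fractions import Fraction

values = [1, 2, 3, 4]

# P(Y = y) = k / y must sum to 1, so k = 1 / (1/1 + 1/2 + 1/3 + 1/4)
k = 1 / sum(Fraction(1, y) for y in values)
pf = {y: k / y for y in values}

mean = sum(y * p for y, p in pf.items())
mode = max(pf, key=pf.get)  # value with the greatest probability

# Median: first value at which the cumulative probability reaches 1/2
cumulative = Fraction(0)
median = None
for y in values:
    cumulative += pf[y]
    if cumulative >= Fraction(1, 2):
        median = y
        break

print(k, float(mean), mode, median)
```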



P7.6 Let X be the payment for one unit at the end of the year. The probability distribution for X is:

x           £0    £2,000   £5,000
P(X = x)    0.1   0.7      0.2

The expected payment after one year is:

\[
\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= (£0 \times 0.1) + (£2{,}000 \times 0.7) + (£5{,}000 \times 0.2) \\
&= £2{,}400
\end{aligned}
\]

Since there are eight units, the total expected payment after one year is:

\[ 8 \times £2{,}400 = £19{,}200 \]



P7.7 To calculate the mean and standard deviation we require the probabilities:

\[
\begin{aligned}
P(X = 0) &= {}^3C_0 \times 0.1^0 \times 0.9^3 = 0.729 \\
P(X = 1) &= {}^3C_1 \times 0.1^1 \times 0.9^2 = 0.243 \\
P(X = 2) &= {}^3C_2 \times 0.1^2 \times 0.9^1 = 0.027 \\
P(X = 3) &= {}^3C_3 \times 0.1^3 \times 0.9^0 = 0.001
\end{aligned}
\]

So the probability distribution is:

x           0       1       2       3
P(X = x)    0.729   0.243   0.027   0.001

The mean is:

\[
\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= (0 \times 0.729) + (1 \times 0.243) + (2 \times 0.027) + (3 \times 0.001) \\
&= 0.3
\end{aligned}
\]

To calculate the standard deviation, we require $E(X^2)$:

\[
\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
&= (0^2 \times 0.729) + (1^2 \times 0.243) + (2^2 \times 0.027) + (3^2 \times 0.001) \\
&= 0.36
\end{aligned}
\]

Hence:

\[ \text{var}(X) = E(X^2) - E^2(X) = 0.36 - 0.3^2 = 0.27 \]

So the standard deviation is:

\[ \sqrt{0.27} = 0.5196 \]



P7.8 (i) Drawing a tree diagram will make this a bit clearer. Using A to represent “A dying”, B to represent “B dying” and N to represent “not dying”, we get:

[Figure: tree diagram for life A (A with probability 0.1, N with probability 0.9) followed by life B (B with probability 0.2, N with probability 0.8)]

Outcome   Probability
AB        0.1 × 0.2 = 0.02
AN        0.1 × 0.8 = 0.08
NB        0.9 × 0.2 = 0.18
NN        0.9 × 0.8 = 0.72

Now, at the moment we don’t have a discrete random variable, as we should be assigning probabilities to numbers. We are told that if A dies we lose £50,000 and if B dies we lose £30,000. So using X to be the amount lost, we get the following probability distribution:

outcome     AB        AN        NB        NN
x           £80,000   £50,000   £30,000   £0
P(X = x)    0.02      0.08      0.18      0.72

So the mean is given by:

\[
\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= (80{,}000 \times 0.02) + (50{,}000 \times 0.08) + (30{,}000 \times 0.18) + (0 \times 0.72) \\
&= £11{,}000
\end{aligned}
\]

To find the variance we need to find $E(X^2)$:

\[
\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
&= (80{,}000^2 \times 0.02) + (50{,}000^2 \times 0.08) + (30{,}000^2 \times 0.18) + (0^2 \times 0.72) \\
&= 490{,}000{,}000
\end{aligned}
\]

\[ \Rightarrow\; \text{var}(X) = E(X^2) - E^2(X) = 490{,}000{,}000 - 11{,}000^2 = 369{,}000{,}000 \]

So the standard deviation is:

\[ \sqrt{369{,}000{,}000} = £19{,}209 \]
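The tree diagram amounts to enumerating the four combinations of outcomes for the two independent lives, which can be sketched as follows. The dictionaries `p_die` and `loss` and the other variable names are illustrative; the figures are those given in the question.

```python
from itertools import product
from math import sqrt

p_die = {"A": 0.1, "B": 0.2}        # probability each life dies in the period
loss = {"A": 50_000, "B": 30_000}   # loss in pounds if that life dies

# Build the distribution of the total loss by walking every branch of the tree
pf = {}
for a_dies, b_dies in product([True, False], repeat=2):
    prob_a = p_die["A"] if a_dies else 1 - p_die["A"]
    prob_b = p_die["B"] if b_dies else 1 - p_die["B"]
    total = loss["A"] * a_dies + loss["B"] * b_dies
    # Independence lets us multiply the branch probabilities
    pf[total] = pf.get(total, 0.0) + prob_a * prob_b

mean_loss = sum(x * p for x, p in pf.items())
sd_loss = sqrt(sum(x ** 2 * p for x, p in pf.items()) - mean_loss ** 2)

print(round(mean_loss), round(sd_loss))
```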



P7.9 The mean of C is given by:

\[
\begin{aligned}
E(C) &= E(7.00 + 0.0742N) \\
&= 7.00 + 0.0742E(N) && \text{using } E(aX + b) = aE(X) + b \\
&= 7.00 + (0.0742 \times 600) && \text{since } E(N) = 600 \\
&= 51.52
\end{aligned}
\]

The variance of C is given by:

\[
\begin{aligned}
\text{var}(C) &= \text{var}(7.00 + 0.0742N) \\
&= 0.0742^2\,\text{var}(N) && \text{using } \text{var}(aX + b) = a^2\,\text{var}(X) \\
&= 0.0742^2 \times 250 && \text{since } \text{var}(N) = 250 \\
&= 1.37641
\end{aligned}
\]



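P7.10 (i)

$$\begin{aligned}
E(3 + 6U) &= 3 + 6E(U) && \text{using } E(aX+b) = aE(X) + b \\
&= 3 + 6 \times 8 && \text{since } E(U) = 8 \\
&= 51
\end{aligned}$$

(ii)

$$\begin{aligned}
\text{var}(8 - 2U) &= (-2)^2\,\text{var}(U) && \text{using } \text{var}(aX+b) = a^2\,\text{var}(X) \\
&= 4 \times 3^2 && \text{since } \text{var}(U) = 3^2 \\
&= 36
\end{aligned}$$

So the standard deviation is $\sqrt{36} = 6$. Alternatively, we could have used $\text{sd}(aX+b) = |a| \times \text{sd}(X)$, which gives $\text{sd}(8 - 2U) = 2 \times 3 = 6$.

(iii)

$$\begin{aligned}
\text{var}\left(\frac{U - 8}{3}\right) &= \text{var}\left(\tfrac{1}{3}U - \tfrac{8}{3}\right) \\
&= \left(\tfrac{1}{3}\right)^2 \text{var}(U) && \text{using } \text{var}(aX+b) = a^2\,\text{var}(X) \\
&= \tfrac{1}{9} \times 3^2 && \text{since } \text{var}(U) = 3^2 \\
&= 1
\end{aligned}$$

(iv)

$$\begin{aligned}
E(U^2 - 4U + 7) &= E(U^2) - 4E(U) + 7 && \text{splitting up the expectation} \\
&= E(U^2) - 4 \times 8 + 7 && \text{since } E(U) = 8 \\
&= E(U^2) - 25
\end{aligned}$$

To work this out we need $E(U^2)$. We don’t have the distribution, so we can’t work it out from first principles. We use the trick of rearranging the variance formula:

$$\begin{aligned}
\text{var}(U) &= E(U^2) - E^2(U) \\
\Rightarrow E(U^2) &= \text{var}(U) + E^2(U) = 3^2 + 8^2 = 73
\end{aligned}$$

Hence:

$$E(U^2 - 4U + 7) = 73 - 25 = 48$$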



P7.11 Both formulae for the skewness require the mean:

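$$\mu = E(X) = \sum x P(X=x) = (0 \times 0.4) + (1 \times 0.6) = 0.6$$

Substituting this into the first formula gives:

$$\begin{aligned}
\text{Skew}(X) &= E[(X - \mu)^3] \\
&= \sum (x - \mu)^3 P(X=x) \\
&= \{(0 - 0.6)^3 \times 0.4\} + \{(1 - 0.6)^3 \times 0.6\} \\
&= -0.048
\end{aligned}$$

or using our alternative formula gives:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X=x) = (0^2 \times 0.4) + (1^2 \times 0.6) = 0.6 \\
E(X^3) &= \sum x^3 P(X=x) = (0^3 \times 0.4) + (1^3 \times 0.6) = 0.6
\end{aligned}$$

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= 0.6 - 3 \times 0.6 \times 0.6 + 2 \times 0.6^3 \\
&= -0.048
\end{aligned}$$

The coefficient of skewness also requires the variance:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X=x) = (0^2 \times 0.4) + (1^2 \times 0.6) = 0.6 \\
\Rightarrow \text{var}(X) &= E(X^2) - E^2(X) = 0.6 - 0.6^2 = 0.24
\end{aligned}$$

So the coefficient of skewness is:

$$\frac{-0.048}{0.24^{3/2}} = -0.40825$$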



P7.12 Both formulae for the skewness require the mean:

$$\mu = E(X) = \sum x P(X=x) = (-2 \times 0.1) + (-1 \times 0.1) + (0 \times 0.5) + (1 \times 0.3) = 0$$

Substituting this into the first formula gives:

$$\begin{aligned}
\text{Skew}(X) &= E[(X - \mu)^3] \\
&= \sum (x - \mu)^3 P(X=x) \\
&= \{(-2 - 0)^3 \times 0.1\} + \{(-1 - 0)^3 \times 0.1\} + \{(0 - 0)^3 \times 0.5\} + \{(1 - 0)^3 \times 0.3\} \\
&= -0.8 - 0.1 + 0 + 0.3 \\
&= -0.6
\end{aligned}$$

or using our alternative formula gives:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X=x) = ((-2)^2 \times 0.1) + ((-1)^2 \times 0.1) + (0^2 \times 0.5) + (1^2 \times 0.3) = 0.8 \\
E(X^3) &= \sum x^3 P(X=x) = ((-2)^3 \times 0.1) + ((-1)^3 \times 0.1) + (0^3 \times 0.5) + (1^3 \times 0.3) = -0.6
\end{aligned}$$

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= -0.6 - 3 \times 0 \times 0.8 + 2 \times 0^3 \\
&= -0.6
\end{aligned}$$

The coefficient of skewness also requires the variance:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X=x) = ((-2)^2 \times 0.1) + ((-1)^2 \times 0.1) + (0^2 \times 0.5) + (1^2 \times 0.3) = 0.8 \\
\Rightarrow \text{var}(X) &= E(X^2) - E^2(X) = 0.8 - 0^2 = 0.8
\end{aligned}$$

So the coefficient of skewness is:

$$\frac{-0.6}{0.8^{3/2}} = -0.8385$$

Stats Pack-08: Discrete distributions Page 1


Chapter 8

Discrete distributions

Links to CT3: Chapter 4

Syllabus objectives: (ii)2. Evaluate probabilities (by calculation or by referring to tables as appropriate) associated with distributions.

0 Introduction

In this chapter we will be looking at four discrete distributions:

The uniform distribution, which assumes that all values have an equal chance of occurring.

The Bernoulli distribution, which can model the outcome of a simple event.

The binomial distribution, which can model the outcomes of n simple events. This is useful for modelling the number of deaths or retirements amongst a fixed number of policyholders.

The Poisson distribution, which is useful for modelling the number of claims a general insurance company receives per unit time.

For each of these distributions we will look at graphs of the probability functions, derive some basic results and find probabilities.


1 The discrete uniform distribution

1.1 General features of the discrete uniform distribution

If we roll a fair die, we have the following probability distribution:

x          1    2    3    4    5    6
P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6

The graph of this probability function is:

[Graph: P(X = x) against x = 1, 2, ..., 6, with six bars of equal height.]

Each whole number has the same probability of occurring. Well in the same way that people who wear uniform look the same, we call this the discrete uniform distribution.

Now if we were to spin a fair four-sided spinner, we would obtain the following probability distribution:

x          1    2    3    4
P(X = x)   1/4  1/4  1/4  1/4

So for 6 discrete values each had probability $\tfrac{1}{6}$ and for 4 discrete values each had probability $\tfrac{1}{4}$. So in general, if we had k discrete numbers each would have a probability of $\tfrac{1}{k}$.



1.2 The PF of the discrete uniform distribution

As we have seen, the general case where we have k whole numbers each with the same probability of happening is given by:

$$P(X = x) = \frac{1}{k}, \quad x = 1, 2, \ldots, k$$

[Graph: P(X = x) against x = 1, 2, ..., k, with every bar of height 1/k.]

This is a probability function as each probability is non-negative and all the probabilities add up to 1. To get a better ‘feel’ for this distribution, we’ll look at two discrete uniform distributions with different values for k:

[Graphs: P(X = x) against x for k = 2 (bars of height 1/2 at x = 1, 2) and for k = 4 (bars of height 1/4 at x = 1, 2, 3, 4).]

We can see that as k increases, we are sharing the total probability of 1 between more numbers. Hence, the probability of each number occurring is less.

We doubled the number of values (from 2 to 4) and so the probability is halved (from $\tfrac{1}{2}$ to $\tfrac{1}{4}$) as it is shared over twice as many numbers.



1.3 Moments of the uniform distribution

To find the moments, let’s look at the discrete uniform distribution with $k = 5$:

$$P(X = x) = \tfrac{1}{5}, \quad x = 1, 2, 3, 4, 5$$

[Graph: P(X = x) against x = 1, 2, 3, 4, 5, with every bar of height 1/5.]

Mean

Since the distribution is symmetrical the mean is in the middle (ie it is the median value) of the values 1, 2, 3, 4, 5:

$$E(X) = 3$$

We could also have found the mean by using our formula from Chapter 7:

$$E(X) = \sum x P(X = x)$$

Question 8.1

Use this formula to show that the mean is 3 for:

$$P(X = x) = \tfrac{1}{5}, \quad x = 1, 2, 3, 4, 5$$

Variance

There’s no easy way to get the variance except to use our formula from Chapter 7:

$$\text{var}(X) = E(X^2) - E^2(X)$$

Remembering that:

$$E(X^2) = \sum x^2 P(X = x)$$

Question 8.2

Use these formulae to show that the variance is 2 for:

$$P(X = x) = \tfrac{1}{5}, \quad x = 1, 2, 3, 4, 5$$

Median

Since the graph is symmetrical the median is also in the middle and is the same as the mean. Alternatively, we could count through the probabilities to find which value is half way through the distribution (or better still use the graph of the CDF).

Question 8.3

Count through the probabilities to show that the median is 3 (same as $E(X)$) for:

$$P(X = x) = \tfrac{1}{5}, \quad x = 1, 2, 3, 4, 5$$

Mode

Since $P(X = x)$ is the same for all values in the range there is no mode, as no value occurs with a greater probability than the rest.



In general, the results are:

Discrete uniform distribution

$$P(X = x) = \frac{1}{k}, \quad x = 1, 2, \ldots, k$$

$$\begin{aligned}
E(X) &= \tfrac{1}{2}(k + 1) \\
\text{var}(X) &= \tfrac{1}{12}(k^2 - 1)
\end{aligned}$$

These are given in the Tables on page 10 and so do not need to be memorised. The proofs are a little messy and can be found in Appendix A.
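If you’d like some reassurance before wading through Appendix A, the two formulae can be checked from first principles for a few values of k. This is a sketch only: `uniform_moments` is a hypothetical helper name, and exact fractions are used so that there are no rounding worries.

```python
from fractions import Fraction

def uniform_moments(k):
    """Mean and variance of the discrete uniform distribution on 1, ..., k,
    computed from E(X) = sum of x P(X = x) and var(X) = E(X^2) - E(X)^2."""
    p = Fraction(1, k)                                 # every value has probability 1/k
    mean = sum(x * p for x in range(1, k + 1))
    mean_sq = sum(x**2 * p for x in range(1, k + 1))
    return mean, mean_sq - mean**2

for k in (2, 4, 6, 20):
    mean, var = uniform_moments(k)
    # Compare with the general results (k + 1)/2 and (k^2 - 1)/12
    assert mean == Fraction(k + 1, 2)
    assert var == Fraction(k**2 - 1, 12)
    print(k, mean, var)
```

A check like this is not a proof, of course, but it does confirm that the general results agree with the definition for each k tried.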

1.4 Probabilities for a discrete uniform distribution

Probabilities can be calculated easily using the probability function.

Question 8.4

In a game of chance, a large spinner with the numbers 1 – 20 is spun. If the probability of getting any number is the same, calculate the probability of obtaining: (i) an even number (ii) a number greater than 18.

Since calculating probabilities is straightforward, no questions have been asked on this distribution in the Subject CT3 (or Subject 101) exam.




2 The Bernoulli distribution

2.1 General features of the Bernoulli distribution

Suppose I’m driving down the road and I come to a traffic light. The light is either green (in which case I can go) or it’s not green (in which case I’ll have to stop). If the probability that the traffic light is green when I reach it is 0.6, we have:

$$\begin{aligned}
P(\text{green}) &= 0.6 \\
P(\text{not green}) &= 0.4
\end{aligned}$$

Well this isn’t much of a random variable – as we should be assigning probabilities to numbers! What we’ll do is use the number 1 for a ‘success’ (ie a green traffic light) and the number 0 for a ‘failure’ (ie a not green traffic light).

$$\begin{aligned}
P(X = 1) &= 0.6 \\
P(X = 0) &= 0.4
\end{aligned}$$

We could think of these values for X as the number of successes; 1 success (ie 1 green traffic light) and 0 successes (ie no green traffic lights). Now we can write this random variable more concisely using a clever trick:

$$P(X = x) = 0.6^x \, 0.4^{1-x}, \quad x = 0, 1$$

When we have 0 successes (ie a failure):

$$P(X = 0) = 0.6^0 \, 0.4^1 = 0.4$$

When we have 1 success:

$$P(X = 1) = 0.6^1 \, 0.4^0 = 0.6$$



2.2 The PF of the Bernoulli distribution

In general, if we have a probability p of success, we get:

$$\begin{aligned}
P(X = 1) &= p \\
P(X = 0) &= q \quad \text{where } q = 1 - p
\end{aligned}$$

or we could write this using our clever shortcut:

$$P(X = x) = p^x q^{1-x}, \quad x = 0, 1$$

This distribution is called the Bernoulli distribution (named after its creator Jacob Bernoulli) with parameter p. Since p is a probability we must have $0 \le p \le 1$.

The shortcut way of writing ‘X has a Bernoulli distribution with parameter p’ is:

$$X \sim Bernoulli(p)$$

So what does this distribution look like?

[Graphs: P(X = x) against x = 0, 1 for three Bernoulli distributions, with p = 0.3, p = 0.5 and p = 0.6. The vertical axis runs from 0 to 1.]

Above are three Bernoulli distributions with different values of p. As we can see, the greater the value of p, the greater the success ($X = 1$) bar and the smaller the failure ($X = 0$) bar.



2.3 Moments of the Bernoulli distribution

Mean

We can find the mean by using our formula from Chapter 7:

$$E(X) = \sum x P(X = x)$$

Question 8.5

Use this formula to show that $E(X) = p$ for the Bernoulli distribution:

$$\begin{aligned}
P(X = 1) &= p \\
P(X = 0) &= q
\end{aligned}$$

Variance

We can find the variance using the formula from Chapter 7:

$$\text{var}(X) = E(X^2) - E^2(X)$$

where:

$$E(X^2) = \sum x^2 P(X = x)$$

Question 8.6

Use these formulae to show that $\text{var}(X) = pq$ for the Bernoulli distribution:

$$\begin{aligned}
P(X = 1) &= p \\
P(X = 0) &= q
\end{aligned}$$



Median

We can get the median by counting through the probabilities to find which of the two values is half way through the distribution.

Mode

The mode is whichever value (0 or 1) has the greatest probability.

In summary:

Bernoulli distribution, $Bernoulli(p)$

$$\begin{aligned}
P(X = 1) &= p \\
P(X = 0) &= q
\end{aligned}
\quad \text{or} \quad P(X = x) = p^x q^{1-x}, \quad x = 0, 1$$

$$\begin{aligned}
E(X) &= p \\
\text{var}(X) &= pq
\end{aligned}$$

We will see later that the Bernoulli distribution is a special case of the Binomial distribution. We can then use the Binomial formulae on page 11 of the Tables rather than memorising these results.
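For a numerical feel for these results, here is a short sketch using the traffic light example with p = 0.6. The function name `bernoulli_pf` and the variable names are our own choices for illustration.

```python
def bernoulli_pf(x, p):
    """P(X = x) = p^x * q^(1 - x) for x = 0 or 1, where q = 1 - p."""
    q = 1 - p
    return p**x * q**(1 - x)

p = 0.6  # probability of a green light (a 'success')

# Moments from first principles, summing over the two possible values 0 and 1
mean = sum(x * bernoulli_pf(x, p) for x in (0, 1))
variance = sum(x**2 * bernoulli_pf(x, p) for x in (0, 1)) - mean**2

print(mean, variance)  # compare with p and p * (1 - p)
```

Trying other values of p shows the same pattern, which is what Questions 8.5 and 8.6 ask you to show algebraically.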

2.4 Probabilities of a Bernoulli distribution

There are no probabilities to calculate as $P(X = 0)$ and $P(X = 1)$ are both given in the definition of the Bernoulli distribution! Hence, there have been no questions on this distribution in the Subject CT3 (or Subject 101) exam.




3 The binomial distribution

3.1 General features of the binomial distribution

We used the Bernoulli distribution to model a set of traffic lights:

$$\begin{aligned}
P(\text{green}) &= 0.6 \\
P(\text{not green}) &= 0.4
\end{aligned}$$

What happens if I pass through, say, 4 sets of traffic lights (where the probability that each set is green is 0.6 and all sets are independent)? To calculate the probabilities of getting 0, 1, 2, 3 or 4 green lights we draw a tree diagram like we did in Chapter 6:

[Tree diagram: four successive sets of lights, each branching into G (probability 0.6) and N (probability 0.4), giving the 16 outcomes GGGG, GGGN, GGNG, GGNN, GNGG, GNGN, GNNG, GNNN, NGGG, NGGN, NGNG, NGNN, NNGG, NNGN, NNNG, NNNN.]

Then to get the probability of, say, GGGN we multiply the probabilities on the appropriate branches:

$$P(GGGN) = 0.6 \times 0.6 \times 0.6 \times 0.4 = 0.0864$$



Well this isn’t much of a random variable as we should be assigning probabilities to numbers. What we’ll do is to count the number of ‘successes’ (ie green traffic lights) like we did for the Bernoulli distribution. For our example, if X is the number of ‘successes’ for our 4 sets of traffic lights it will range from 0 (NNNN) up to 4 (GGGG).

So how can we work out probabilities for this distribution? Let’s find out the probability of 1 ‘success’ (ie 1 out of the 4 sets of traffic lights was green) for our example. The options we require are:

GNNN   $0.6 \times 0.4 \times 0.4 \times 0.4 = 0.0384$
NGNN   $0.4 \times 0.6 \times 0.4 \times 0.4 = 0.0384$
NNGN   $0.4 \times 0.4 \times 0.6 \times 0.4 = 0.0384$
NNNG   $0.4 \times 0.4 \times 0.4 \times 0.6 = 0.0384$

Hence, the total probability is $4 \times 0.0384 = 0.1536$.

Recall from Chapter 6 that there was a quicker way of obtaining this result. We first note that each option has the same probability since 1 ‘success’ consists of 1 green and 3 not green lights:

$$0.6 \times 0.4^3 = 0.0384$$

We then need to calculate how many different ways there are of choosing 1 green set from the 4 sets of traffic lights. Recall from Chapter 6 that this is:

$${}^{4}C_{1} = 4$$

Hence, the probability is given by:

$$P(X = 1) = {}^{4}C_{1} \times 0.6 \times 0.4^3 = 4 \times 0.0384 = 0.1536$$
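The two routes to $P(X = 1)$ can be compared in a few lines of Python: listing all 16 branches of the tree and adding up the ones with exactly one green light, versus the counting shortcut. This is just a sketch; `P_GREEN`, `outcome_probability` and the other names are our own.

```python
from itertools import product
from math import comb

P_GREEN = 0.6  # probability that any one set of lights is green

def outcome_probability(outcome):
    """Multiply the branch probabilities along one path, eg 'GNNN'."""
    prob = 1.0
    for light in outcome:
        prob *= P_GREEN if light == "G" else 1 - P_GREEN
    return prob

# All 16 paths through the tree for 4 independent sets of lights
outcomes = ["".join(path) for path in product("GN", repeat=4)]

# Long way: add up the paths containing exactly one G
one_green = [o for o in outcomes if o.count("G") == 1]
p_enumerated = sum(outcome_probability(o) for o in one_green)

# Short way: (number of ways of choosing 1 green from 4) x (probability of each way)
p_formula = comb(4, 1) * P_GREEN**1 * (1 - P_GREEN)**3

print(one_green, p_enumerated, p_formula)
```

Changing the `1` in the last two steps to another number of greens is a handy way to check your own expressions for Question 8.7.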

Question 8.7

Write down the expressions to calculate $P(X = 2)$ and $P(X = 3)$.

We call each set of traffic lights a trial.



What happens if I change the number of trials (ie sets of traffic lights) and the probability of success (ie probability of each set being green)? For example, let’s have 10 sets of lights. Now the probability that each set is green when I reach them is 0.8. Let X be the number of successes (ie the number of green sets).

To calculate the probability of 3 successes, we first note that we require all the options where we have 3 successes (green sets) and 7 failures (not green sets). Each of these options has probability:

$$0.8^3 \times 0.2^7$$

The number of different ways of choosing 3 successes (green sets) from the 10 trials (ie sets of traffic lights) is:

$${}^{10}C_{3}$$

Hence, we have:

$$P(X = 3) = {}^{10}C_{3} \times 0.8^3 \times 0.2^7 = 0.000786 \quad \text{(3 SF)}$$

Question 8.8

X is the number of successes in 20 trials (ie sets of traffic lights). The probability of success (ie probability of each set being green) is 0.9. Calculate $P(X = 13)$.

In general, if we have n trials (ie sets of traffic lights) each of which has a probability of success (ie being green) of p, then the probability of x successes is:

$${}^{n}C_{x} \times p^x \times (1 - p)^{n-x}$$

where:

${}^{n}C_{x}$ is the number of ways of choosing x successes from n ‘trials’
$p^x$ is the probability of x successes
$(1 - p)^{n-x}$ is the probability of $n - x$ failures.

This is the binomial distribution with parameters n and p.
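The general expression translates directly into code. Here is a minimal sketch, assuming only Python’s standard `math.comb` for the combinations; the function name `binomial_pf` is our own. It is checked against the 10 traffic lights example above.

```python
from math import comb

def binomial_pf(x, n, p):
    """P(X = x) when X counts successes in n independent trials,
    each with probability of success p."""
    ways = comb(n, x)             # ways of choosing x successes from n trials
    successes = p**x              # probability of x successes
    failures = (1 - p)**(n - x)   # probability of n - x failures
    return ways * successes * failures

# 10 sets of lights, each green with probability 0.8: probability of exactly 3 greens
p3 = binomial_pf(3, 10, 0.8)

# The probabilities over x = 0, 1, ..., 10 should add up to 1
total = sum(binomial_pf(x, 10, 0.8) for x in range(11))

print(p3, total)
```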



3.2 The PF of the binomial distribution

The binomial distribution has two parameters: n (the number of trials) and p (the probability of success in each trial). Its probability function is:

$$P(X = x) = {}^{n}C_{x} \, p^x (1 - p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad 0 < p < 1$$

Subject CT3 writes ${}^{n}C_{x}$ as $\binom{n}{x}$. It means exactly the same; it’s just an alternative way of writing combinations.

The shortcut way of writing ‘X has a binomial distribution with parameters n and p’ is:

$$X \sim Bin(n, p)$$

Remember that X is the random variable for the number of successes in n trials. To get a better feel for this distribution, we’ll look at a number of binomial distributions with different values for n and p.

[Graph: P(X = x) against x = 0, 1, ..., 10 for Bin(10, 0.2).]

In the graph above, we have 10 trials and probability of success of 0.2. We can see that, since the probability of success is so low, we are most likely to get only a small number of successes (1, 2 or 3) out of our 10 trials.



What happens as we increase the probability of success to, say 0.5:

[Graph: P(X = x) against x = 0, 1, ..., 10 for Bin(10, 0.5).]

We can see that we are more likely to get a higher number of successes than before as we have a higher probability of success. We can also see that because $p = 0.5$ the graph is symmetrical.

Increasing p to 0.8 we get:

[Graph: P(X = x) against x = 0, 1, ..., 10 for Bin(10, 0.8).]

We are now at the other extreme. Since the probability of success is high we are more likely to get successes from most of our 10 trials. Note also how $Bin(10, 0.2)$ is a reflection of $Bin(10, 0.8)$.



3.3 Moments of the binomial distribution

Mean

Finding the mean using $E(X) = \sum x P(X = x)$ is an algebraic nightmare best avoided.

Recall from Section 2 that the mean of a Bernoulli trial was p. The binomial distribution is the sum of n independent Bernoulli trials so we would expect the mean to be:

$$E(X) = np$$

This should make sense! If we have 10 traffic lights with a probability of success of 0.6, we would expect $10 \times 0.6 = 6$ of the lights to be green. This result can be proved more rigorously but requires adding expectations of different random variables, which is covered in Chapter 6 of Subject CT3.

Variance

We can use a similar idea to calculate the variance of the binomial. Recall from Section 2 that the variance of a Bernoulli trial was pq. The binomial distribution is the sum of n independent Bernoulli trials so the variance is:

$$\text{var}(X) = npq$$

Again, this result can be proved more rigorously but requires adding variances of different random variables, which is covered in Chapter 6 of Subject CT3.
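Although the algebra is best avoided, a computer doesn’t mind doing the sums. The sketch below works out the mean and variance of Bin(10, 0.6) directly from the probability function and compares them with np and npq. The names used are our own choices.

```python
from math import comb

n, p = 10, 0.6  # 10 traffic lights, each green with probability 0.6
q = 1 - p

# Probability function of Bin(n, p) for x = 0, 1, ..., n
pf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Moments from first principles: E(X) = sum of x P(X = x), var(X) = E(X^2) - E(X)^2
mean = sum(x * prob for x, prob in enumerate(pf))
variance = sum(x**2 * prob for x, prob in enumerate(pf)) - mean**2

print(mean, n * p)          # both should be (very nearly) 6
print(variance, n * p * q)  # both should be (very nearly) 2.4
```

Tiny differences in the last decimal places are just floating-point rounding, not a flaw in the formulae.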

Question 8.9

The probability that an actuarial student studied for a maths degree is 0.7. An ActEd tutorial contains 12 students. State the: (i) expected number of students in a tutorial who have studied for a maths degree (ii) variance of the number of students who have studied for a maths degree.



Median

There is no easy way to get the median value other than counting through the probabilities to find which value is half way through the distribution. For example, when $X \sim Bin(6, 0.4)$ we have:

x          0      1      2      3      4      5      6
P(X = x)   0.047  0.187  0.311  0.276  0.138  0.037  0.004

So we can see that the median is 2.

Mode

The mode is the value that has the greatest probability. Looking at either the probability distribution of $X \sim Bin(6, 0.4)$ given above or its graph below:

[Graph: P(X = x) against x = 0, 1, ..., 6 for Bin(6, 0.4).]

It is easy to see that the mode is 2 as this has the greatest probability and so is most likely to occur.
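‘Counting through the probabilities’ is exactly what a running total does, so the median and mode of Bin(6, 0.4) can be found mechanically. This is a sketch under the convention used here, that the median is the first value at which the cumulative probability reaches 0.5; the variable names are our own.

```python
from itertools import accumulate
from math import comb

n, p = 6, 0.4

# Probability function and running (cumulative) totals for x = 0, 1, ..., 6
pf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
cdf = list(accumulate(pf))

# Median: first x whose cumulative probability reaches 0.5
median = next(x for x, cumulative in enumerate(cdf) if cumulative >= 0.5)

# Mode: the x with the greatest probability
mode = max(range(n + 1), key=lambda x: pf[x])

print([round(prob, 3) for prob in pf])
print(median, mode)
```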

Question 8.10

Calculate the mode and median of $X \sim Bin(3, 0.2)$.

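[Annotation to the $Bin(6, 0.4)$ table: counting through the probabilities gives 0.047, then 0.047 + 0.187 = 0.234, then 0.047 + 0.187 + 0.311 = 0.545, so 0.5 is in here! Hence the median is 2.]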



In summary:

Binomial distribution, $Bin(n, p)$

$$P(X = x) = {}^{n}C_{x} \, p^x (1 - p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, \quad 0 < p < 1$$

$$\begin{aligned}
E(X) &= np \\
\text{var}(X) &= npq
\end{aligned}$$

These are given in the Tables on page 6 and so do not need to be memorised. Note that the binomial distribution with $n = 1$ is the Bernoulli distribution.

3.4 Probabilities for the binomial distribution

We can use the probability function to calculate probabilities. Suppose $X \sim Bin(5, 0.7)$.

To calculate a single probability, $P(X = 3)$, we just substitute the value of x in our probability function:

$$P(X = 3) = {}^{5}C_{3} \times 0.7^3 \times 0.3^2 = 0.3087$$

What about calculating $P(X > 1)$? Since $X \sim Bin(5, 0.7)$ there are 5 trials, so the number of successes (ie the values that X can take) ranges from 0 to 5. So if $X > 1$, it can take the values 2, 3, 4 or 5. Hence:

$$P(X > 1) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)$$

$$\begin{aligned}
P(X = 2) &= {}^{5}C_{2} \times 0.7^2 \times 0.3^3 = 0.1323 \\
P(X = 3) &= 0.3087 \quad \text{from above} \\
P(X = 4) &= {}^{5}C_{4} \times 0.7^4 \times 0.3^1 = 0.36015 \\
P(X = 5) &= {}^{5}C_{5} \times 0.7^5 \times 0.3^0 = 0.16807
\end{aligned}$$

Hence:

$$P(X > 1) = 0.1323 + 0.3087 + 0.36015 + 0.16807 = 0.96922$$



There must be a quicker way! Yes, there is! Since the probabilities of all the values X can take sum to 1, we have:

$$P(X \le 1) + P(X > 1) = 1 \quad \Rightarrow \quad P(X > 1) = 1 - P(X \le 1)$$

But since X can only take the values 0, 1, 2, 3, 4 or 5, we have:

$$P(X \le 1) = P(X = 0) + P(X = 1)$$

As there are fewer values to work out, this way will be (slightly) quicker:

$$\begin{aligned}
P(X = 0) &= {}^{5}C_{0} \times 0.7^0 \times 0.3^5 = 0.00243 \\
P(X = 1) &= {}^{5}C_{1} \times 0.7^1 \times 0.3^4 = 0.02835
\end{aligned}$$

Hence:

$$P(X > 1) = 1 - (0.00243 + 0.02835) = 0.96922$$
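Both routes to $P(X > 1)$ are easy to compare in code. The sketch below uses a hypothetical helper `binomial_pf` (our own name) and works out the answer the long way and via the complement.

```python
from math import comb

def binomial_pf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.7

# Long way: add up P(X = 2), P(X = 3), P(X = 4) and P(X = 5)
direct = sum(binomial_pf(x, n, p) for x in range(2, n + 1))

# Quicker way: 1 - P(X <= 1) = 1 - [P(X = 0) + P(X = 1)]
via_complement = 1 - (binomial_pf(0, n, p) + binomial_pf(1, n, p))

print(direct, via_complement)
```

The saving is small here, but the complement becomes much more attractive when n is large and only a few values are excluded.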

Question 8.11

Given that $X \sim Bin(9, 0.75)$, calculate:

(i) $P(X = 6)$

(ii) $P(X < 8)$

(iii) $P(2 \le X < 4)$.

There are limited tables for the cumulative distribution function, $F(x) = P(X \le x)$, of the binomial distribution in the Tables on pages 186 to 188. These can be used, but we need to be careful to distinguish between $<$ and $\le$. For example:

$$P(X < 3) = P(X \le 2)$$



3.5 Miscellaneous problems

Nearly all of the questions you come across will be presented ‘in context’ rather than stating that the random variable has a binomial distribution. The key to spotting that we have a binomial distribution is that we are counting successes from a number of independent trials (and each trial can only give 1 success). For example the number of days it rains in 20 days, questions right out of a test of 30 questions, matches won in a league, deaths from 10 policyholders, or faulty components in a box of 50.

Question 8.12

An insurance company is assessing the complexity of its claims forms by checking how many of its forms are correctly completed. The company picks a sample of 6 forms and counts the number of incorrect forms. It is believed that the probability of a form being incorrectly completed is 0.1. Calculate the probability that the company discovers: (i) exactly two incorrect forms (ii) more than 4 incorrect forms in a randomly picked sample.



4 The Poisson distribution

4.1 General features of the Poisson distribution

The exponential function can be expressed as a series expansion. For example:

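$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$

We can use this to get a discrete random variable. Dividing both sides by $e^{\lambda}$ (or equivalently multiplying by $e^{-\lambda}$) we get:

$$1 = e^{-\lambda} + \lambda e^{-\lambda} + \frac{\lambda^2}{2!} e^{-\lambda} + \frac{\lambda^3}{3!} e^{-\lambda} + \cdots$$

This is the same as:

$$1 = \frac{\lambda^0}{0!} e^{-\lambda} + \frac{\lambda^1}{1!} e^{-\lambda} + \frac{\lambda^2}{2!} e^{-\lambda} + \frac{\lambda^3}{3!} e^{-\lambda} + \cdots$$

Since the terms add up to 1 (and are all positive) they could be our probabilities! We just need to write out a general expression for each term. This is:

$$\frac{\lambda^x}{x!} e^{-\lambda}$$

where the x can take the values 0, 1, 2, 3, … So we have:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, 3, \ldots$$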

This is called the Poisson distribution (after its creator Siméon Poisson).
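To see that these terms really do behave like probabilities, we can add up the first few dozen of them for a particular $\lambda$. This is a sketch only: `poisson_pf` is our own name, and stopping after 50 terms is an assumption that works because the later terms are vanishingly small for a small $\lambda$ like 1.5.

```python
from math import exp, factorial

def poisson_pf(x, lam):
    """P(X = x) = lam^x / x! * e^(-lam) for x = 0, 1, 2, ..."""
    return lam**x / factorial(x) * exp(-lam)

lam = 1.5

# The first 50 terms of the series (the remainder is negligible for lam = 1.5)
terms = [poisson_pf(x, lam) for x in range(50)]
partial_sum = sum(terms)

print(terms[:4])
print(partial_sum)  # should be (very nearly) 1
```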



Now at the moment we have a distribution that probably means very little to you! Let’s take a quick look at a Poisson distribution with $\lambda = 1.5$:

[Graph: P(X = x) against x = 0, 1, ..., 8 for the Poisson distribution with λ = 1.5.]

This gives us a lovely positively skewed distribution. But what is it useful for? Well it can be shown that this is great for modelling events that occur at a rate of $\lambda$ per unit of time or length. For example, it can be used to model the number of claims per month, deaths per week, people joining a queue per minute, flaws per metre in a piece of metal, or accidents per mile on a road.

This makes the Poisson distribution exceptionally useful in general insurance for modelling the number of claims received by an insurance office. It also has applications for modelling the number of deaths for pensions and life companies.



4.2 The PF of the Poisson distribution

In general, for events occurring at a (mean) rate of $\lambda$ per unit, we get:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, 3, \ldots$$

This distribution is called the Poisson distribution with parameter $\lambda$. The shortcut way of writing ‘X has a Poisson distribution with parameter $\lambda$’ is:

$$X \sim Poi(\lambda)$$

Some textbooks (and the Tables) use $\mu$ instead of $\lambda$ to remind us that the parameter gives us the mean rate of occurrences.

To get a better feel for this distribution, we’ll look at a number of Poisson distributions with different values for $\lambda$.

[Graph: P(X = x) against x = 0, 1, ..., 20 for Poi(2).]

In the graph above, we have $\lambda = 2$, that is we have a mean rate of occurrences of 2. We can see that we are most likely to get about 2 occurrences. We can see that the Poisson distribution is positively skew.



What happens as we increase $\lambda$ to, say, 5?

[Graph: P(X = x) against x = 0, 1, ..., 20 for Poi(5).]

Since we now have a mean rate of occurrences of 5, we can see that we are most likely to get about 5 occurrences. Since $\lambda$ is greater than before, we are more likely to get more occurrences.

Increasing $\lambda$ to 8 we get:

[Graph: P(X = x) against x = 0, 1, ..., 20 for Poi(8).]

Since we now have a mean rate of occurrences of 8, we can see that we are most likely to get about 8 occurrences. Since $\lambda$ is greater than before, we are more likely to get more occurrences. Notice how as $\lambda$ increases the distribution becomes more symmetrical.



4.3 Moments of the Poisson distribution

Mean

We can find the mean by using the formula for the mean from Chapter 7:

$$E(X) = \sum x P(X = x)$$

and by using the series expansion of $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

Question 8.13 (messy)

The Poisson distribution has probability function:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, \ldots$$

Use the formulae above to show that the mean is $\lambda$.

This answer should be obvious: a Poisson distribution with a mean rate $\lambda$ has a mean of $\lambda$. The mean can be obtained much more easily using moment generating functions (Chapter 5 in Subject CT3) than the method used in Question 8.13. So don’t get hung up on it!



Variance

We can find the variance using our formulae from Chapter 7:

$$\text{var}(X) = E(X^2) - E^2(X) \quad \text{where} \quad E(X^2) = \sum x^2 P(X = x)$$

and by using the series expansion of $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

This gives:

$$\text{var}(X) = \lambda$$

However, the proof is messy and can be found in Appendix B. The variance can be obtained much more easily using moment generating functions (Chapter 5 in Subject CT3) than the method used in Appendix B.
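If the algebra feels like hard work, a numerical check can at least convince you that the results are right. The sketch below uses $\lambda = 3$ and adds up the first 100 terms of each sum; the cut-off of 100 and the names used are our own assumptions, chosen because the terms beyond that point are negligibly small.

```python
from math import exp, factorial

lam = 3

# Poisson probabilities for x = 0, 1, ..., 99 (later terms are negligible for lam = 3)
pf = [lam**x / factorial(x) * exp(-lam) for x in range(100)]

# Moments from first principles
mean = sum(x * prob for x, prob in enumerate(pf))         # E(X)
mean_sq = sum(x**2 * prob for x, prob in enumerate(pf))   # E(X^2)
variance = mean_sq - mean**2                              # var(X) = E(X^2) - E(X)^2

print(mean, variance)  # both should be (very nearly) equal to lam
```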

Question 8.14

The number of radioactive particles emitted per hour has a Poisson distribution with $\lambda = 25$.

Find the mean and standard deviation of the number of particles emitted in an hour.



Median

There is no easy way to get the median value other than counting through the probabilities to find which value is half way through the distribution. For example, when $X \sim Poi(2)$ we have:

x          0      1      2      3      4      5      …
P(X = x)   0.135  0.271  0.271  0.180  0.090  0.036  …

So we can see that the median is 2.

Mode

The mode is the value that has the greatest probability. Looking at either the probability distribution of $X \sim Poi(2)$ given above or its graph below:

[Graph: P(X = x) against x = 0, 1, ..., 10 for Poi(2).]

It is easy to see that 1 and 2 are both modes as these have the greatest probability of occurring.

Question 8.15

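Calculate the median and mode of $X \sim Poi(1)$.

[Annotation to the $Poi(2)$ table: counting through the probabilities gives 0.135, then 0.135 + 0.271 = 0.406, then 0.135 + 0.271 + 0.271 = 0.677, so 0.5 is in here! Hence the median is 2.]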



In summary:

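Poisson distribution, $Poi(\lambda)$

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, \ldots$$

$$\begin{aligned}
E(X) &= \lambda \\
\text{var}(X) &= \lambda
\end{aligned}$$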

These results are given in the Tables on page 7 and so do not need to be memorised.

4.4 Probabilities of a Poisson distribution

We can use the probability function to calculate probabilities. Suppose $X \sim Poi(3)$. Our probability function is:

$$P(X = x) = \frac{3^x}{x!} e^{-3}, \quad x = 0, 1, 2, \ldots$$

To calculate a single probability, $P(X = 2)$, we just substitute the value into our probability function:

$$P(X = 2) = \frac{3^2}{2!} e^{-3} = \frac{9}{2} e^{-3} = 0.22404$$

What about calculating $P(X \ge 2)$? Since $X \sim Poi(3)$ can take values 0, 1, 2, … this means:

$$P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) + \cdots$$

Eek! There’s no way we can work it out this way! So we need to use the fact that probabilities of all the values X can take sum to 1:

$$P(X < 2) + P(X \ge 2) = 1 \quad \Rightarrow \quad P(X \ge 2) = 1 - P(X < 2)$$



Since X can take the values 0, 1, 2, 3, … we have:

$$P(X < 2) = P(X = 0) + P(X = 1)$$

All we have to do is work out $P(X = 0)$ and $P(X = 1)$:

$$\begin{aligned}
P(X = 0) &= \frac{3^0}{0!} e^{-3} = e^{-3} = 0.04979 \\
P(X = 1) &= \frac{3^1}{1!} e^{-3} = 3e^{-3} = 0.14936
\end{aligned}$$

Hence:

$$\begin{aligned}
P(X < 2) &= 0.04979 + 0.14936 = 0.19915 \\
\Rightarrow P(X \ge 2) &= 1 - 0.19915 = 0.80085
\end{aligned}$$
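The complement trick is just as short in code. Here is a sketch for $X \sim Poi(3)$, with `poisson_pf` as our own (hypothetical) helper name.

```python
from math import exp, factorial

def poisson_pf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return lam**x / factorial(x) * exp(-lam)

lam = 3

# P(X < 2) only needs the two values 0 and 1
p_less_than_2 = poisson_pf(0, lam) + poisson_pf(1, lam)

# P(X >= 2) would need infinitely many terms directly, so use the complement
p_at_least_2 = 1 - p_less_than_2

print(round(p_less_than_2, 5), round(p_at_least_2, 5))
```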

Question 8.16

Given that $X \sim Poi(1.8)$, calculate:

(i) $P(X = 4)$

(ii) $P(X \ge 1)$

(iii) $P(2 \le X < 5)$.

There are tables for the cumulative distribution function, $F(x) = P(X \le x)$, of the Poisson distribution in the Tables on pages 175 to 185. The parameter of the Poisson distribution is written down the side and the value of the random variable, ie x, is written across the top of the table. The numbers in the main part of the table are values for $P(X \le x)$.

These can be used, but we need to be careful to distinguish between $<$ and $\le$. For example:

$$P(X < 3) = P(X \le 2)$$




‘In context’ questions

Many of the questions you come across will be presented ‘in context’ rather than stating that the random variable has a Poisson distribution. The key to spotting that we have a Poisson distribution (as opposed to a binomial distribution) is that we can have an infinite number of occurrences per unit time/length.

The question will refer to a rate (or mean number) of occurrences per unit time/length. This rate (or the mean) gives us the value of $\lambda$ for that unit time/length. For example, if we are told that telephone calls to a switchboard arrive at a rate of 4 per hour this means that $\lambda = 4$. We then use the Poisson distribution to work out probabilities of calls per hour.

Question 8.17

The mean number of accidents (in a year) on a twenty-mile stretch of motorway is 5. Calculate the probability that: (i) there are exactly 7 accidents on the twenty miles of road in any given year (ii) there are more than 3 accidents on the twenty miles of road in any given year.



Conditional probabilities

We can work out conditional probabilities involving the Poisson distribution using the conditional probability formula from Chapter 5:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

For example, to calculate $P(X = 1 \mid X < 2)$ where $X \sim Poi(3)$:

$$P(X = 1 \mid X < 2) = \frac{P(X = 1 \text{ and } X < 2)}{P(X < 2)} = \frac{P(X = 1)}{P(X < 2)}$$

Now:

$$P(X < 2) = P(X = 0) + P(X = 1) = 0.04979 + 0.14936 = 0.19915$$

since:

$$\begin{aligned}
P(X = 0) &= \frac{3^0}{0!} e^{-3} = 0.04979 \\
P(X = 1) &= \frac{3^1}{1!} e^{-3} = 0.14936
\end{aligned}$$

So:

$$P(X = 1 \mid X < 2) = \frac{P(X = 1)}{P(X < 2)} = \frac{0.14936}{0.19915} = 0.750$$
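The same calculation in Python makes the structure of the conditional probability formula very visible: a numerator for ‘A and B’ and a denominator for ‘B’. This is a sketch with our own names throughout.

```python
from math import exp, factorial

def poisson_pf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return lam**x / factorial(x) * exp(-lam)

lam = 3

# Event B: X < 2, ie X = 0 or X = 1
p_b = poisson_pf(0, lam) + poisson_pf(1, lam)

# Event 'A and B': X = 1 and X < 2, which is just X = 1
p_a_and_b = poisson_pf(1, lam)

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)
```

Notice that the $e^{-3}$ factors cancel, so the answer is exactly $3/(1 + 3)$.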



Changing the units

Let’s look at our telephone example again. We had telephone calls to a switchboard arriving at a rate of 4 per hour. Our distribution was:

$X \sim Poi(4)$ calls per hour

A common trick used by the examiners is to change the unit. For example, they could ask us to calculate the probability that we have 10 calls in 5 hours. Our unit has changed from 1 hour to 5 hours. This means our distribution needs to change as well. A bit of common sense tells us that if we have calls arriving at 4 per hour on average, then we will have 20 calls per 5 hours on average. So to calculate probabilities for 5 hours, we use:

$X \sim Poi(20)$ calls per 5 hours

Question 8.18

(i) A small company receives home insurance claims at a rate of 3 per month. Calculate the probability that they receive 20 claims in a year.

(ii) The same company receives car insurance claims at a rate of 4.8 per working week. Calculate the probability that they receive no claims in a day.



Adding Poisson distributions

Suppose a company receives on average 2 travel insurance claims in a week and on average 1 pet insurance claim in a week. So we have:

$X \sim Poi(2)$ travel insurance claims per week

$Y \sim Poi(1)$ pet insurance claims per week

How can we work out the probability that the company has a total of 4 claims of any type in a week? Well, a bit of common sense tells us that since we have 2 travel and 1 pet claim per week on average, we will have a total of 3 claims of any type per week on average:

$Poi(3)$ total number of claims of any type per week

What this suggests is that we can add Poisson distributions together (provided the two claim numbers are independent of each other):

“$Poi(2) + Poi(1) \sim Poi(3)$”
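We can check the adding rule numerically for the travel and pet claims. The sketch below assumes the two types of claim are independent (an assumption that the argument needs), and works out the probability of 4 claims in total two ways: by going through every split of the 4 claims between the two types, and by using Poi(3) directly. The names are our own.

```python
from math import exp, factorial

def poisson_pf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return lam**x / factorial(x) * exp(-lam)

total = 4  # total number of claims of any type in the week

# Long way: x travel claims and (4 - x) pet claims, for x = 0, 1, ..., 4.
# Multiplying the two probabilities assumes the claim types are independent.
p_by_splitting = sum(
    poisson_pf(x, 2) * poisson_pf(total - x, 1) for x in range(total + 1)
)

# Short way: treat the total as Poi(2 + 1) = Poi(3)
p_direct = poisson_pf(total, 3)

print(p_by_splitting, p_direct)
```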

Examiners sometimes use this trick in questions:

Question 8.19

Car insurance policyholders make claims at a rate of 0.8 per year. Calculate the probability that 5 policyholders make a total of 3 claims in a year.



5 Appendix A – mean and variance for $P(X = x) = \tfrac{1}{k}$

For the discrete uniform distribution the probability function is:

$$P(X = x) = \frac{1}{k}, \quad x = 1, 2, \ldots, k$$

Mean

The mean can be found using:

$$E(X) = \sum x P(X = x)$$

This gives:

$$\begin{aligned}
E(X) &= \left(1 \times \tfrac{1}{k}\right) + \left(2 \times \tfrac{1}{k}\right) + \cdots + \left(k \times \tfrac{1}{k}\right) \\
&= \tfrac{1}{k}(1 + 2 + \cdots + k)
\end{aligned}$$

We now use the pure maths result that:

$$1 + 2 + \cdots + n = \tfrac{1}{2}n(n + 1)$$

This gives:

$$E(X) = \tfrac{1}{k} \times \tfrac{1}{2}k(k + 1) = \tfrac{1}{2}(k + 1)$$

An alternative method is to note that the mean is the middle value (ie the median) due to symmetry. We then recall from Chapter 2 that the median was the $\tfrac{1}{2}(n + 1)$th value. Now since the values are just 1, 2, … the $\tfrac{1}{2}(n + 1)$th value is $\tfrac{1}{2}(n + 1)$, ie $\tfrac{1}{2}(k + 1)$ when there are k values.



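Variance

The variance is found using:

$$var(X) = E(X^2) - [E(X)]^2$$

where:

$$E(X^2) = \sum x^2 P(X = x)$$

This gives:

$$\begin{aligned}
E(X^2) &= \left(1^2 \times \tfrac{1}{k}\right) + \left(2^2 \times \tfrac{1}{k}\right) + \cdots + \left(k^2 \times \tfrac{1}{k}\right) \\
&= \tfrac{1}{k}(1^2 + 2^2 + \cdots + k^2)
\end{aligned}$$

We now use the pure maths result that:

$$1^2 + 2^2 + \cdots + n^2 = \tfrac{1}{6} n(n+1)(2n+1)$$

This gives:

$$E(X^2) = \tfrac{1}{k} \times \tfrac{1}{6} k(k+1)(2k+1) = \tfrac{1}{6}(k+1)(2k+1)$$

Substituting, we get:

$$\begin{aligned}
var(X) &= E(X^2) - [E(X)]^2 \\
&= \tfrac{1}{6}(k+1)(2k+1) - \left[\tfrac{1}{2}(k+1)\right]^2 \\
&= \tfrac{1}{6}(k+1)(2k+1) - \tfrac{1}{4}(k+1)^2 \\
&= \tfrac{1}{6}(2k^2 + 3k + 1) - \tfrac{1}{4}(k^2 + 2k + 1) \\
&= \tfrac{1}{3}k^2 + \tfrac{1}{2}k + \tfrac{1}{6} - \tfrac{1}{4}k^2 - \tfrac{1}{2}k - \tfrac{1}{4} \\
&= \tfrac{1}{12}k^2 - \tfrac{1}{12} \\
&= \tfrac{1}{12}(k^2 - 1)
\end{aligned}$$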
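As a sanity check on the two formulae in this appendix, the sums can be evaluated exactly for a few values of k. This is only an illustrative sketch: the function name `discrete_uniform_moments` is invented here, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def discrete_uniform_moments(k):
    """Mean and variance of X with P(X = x) = 1/k for x = 1, 2, ..., k."""
    p = Fraction(1, k)
    mean = sum(x * p for x in range(1, k + 1))
    mean_of_squares = sum(x * x * p for x in range(1, k + 1))
    return mean, mean_of_squares - mean ** 2


for k in (5, 6, 20):
    mean, variance = discrete_uniform_moments(k)
    # Compare the direct sums with the formulae (k + 1)/2 and (k^2 - 1)/12.
    assert mean == Fraction(k + 1, 2)
    assert variance == Fraction(k * k - 1, 12)
    print(k, mean, variance)
```

For k = 5 this gives a mean of 3 and a variance of 2, matching the formulae.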



6 Appendix B – variance of $X \sim Poi(\lambda)$

For the Poisson distribution the probability function is:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda} \qquad x = 0, 1, 2, \ldots$$

The variance is found using:

$$var(X) = E(X^2) - [E(X)]^2 \quad \text{where} \quad E(X^2) = \sum x^2 P(X = x)$$

This gives:

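$$\begin{aligned}
E(X^2) &= 0^2 \times e^{-\lambda} + 1^2 \times \lambda e^{-\lambda} + 2^2 \times \left(\frac{\lambda^2}{2!} e^{-\lambda}\right) + 3^2 \times \left(\frac{\lambda^3}{3!} e^{-\lambda}\right) + \cdots \\
&= \lambda e^{-\lambda} + \frac{4\lambda^2}{2!} e^{-\lambda} + \frac{9\lambda^3}{3!} e^{-\lambda} + \frac{16\lambda^4}{4!} e^{-\lambda} + \cdots \\
&= \lambda e^{-\lambda} \left\{ 1 + \frac{4\lambda}{2!} + \frac{9\lambda^2}{3!} + \frac{16\lambda^3}{4!} + \cdots \right\} \\
&= \lambda e^{-\lambda} \left\{ 1 + 2\lambda + \frac{3\lambda^2}{2!} + \frac{4\lambda^3}{3!} + \cdots \right\}
\end{aligned}$$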

Recall that the exponential series is:

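$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$

The part in the brackets isn’t quite the exponential series – so we’ll split it up into the exponential series and a bit left over:

$$E(X^2) = \lambda e^{-\lambda} \left\{ 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots + \lambda + \frac{2\lambda^2}{2!} + \frac{3\lambda^3}{3!} + \cdots \right\}$$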

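The first series is just $e^{\lambda}$. We can now simplify the left over bit.

$$\begin{aligned}
E(X^2) &= \lambda e^{-\lambda} \left\{ e^{\lambda} + \lambda + \frac{2\lambda^2}{2!} + \frac{3\lambda^3}{3!} + \cdots \right\} \\
&= \lambda e^{-\lambda} \left\{ e^{\lambda} + \lambda + \lambda^2 + \frac{\lambda^3}{2!} + \cdots \right\} \\
&= \lambda e^{-\lambda} \left\{ e^{\lambda} + \lambda \left( 1 + \lambda + \frac{\lambda^2}{2!} + \cdots \right) \right\} \\
&= \lambda e^{-\lambda} \left\{ e^{\lambda} + \lambda e^{\lambda} \right\}
\end{aligned}$$

Multiplying out the bracket gives:

$$\begin{aligned}
E(X^2) &= \lambda e^{0} + \lambda^2 e^{0} \\
&= \lambda + \lambda^2
\end{aligned}$$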

Therefore the variance is given by:

$$\begin{aligned}
var(X) &= E(X^2) - [E(X)]^2 \\
&= (\lambda + \lambda^2) - \lambda^2 \\
&= \lambda
\end{aligned}$$
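The result can also be checked numerically by adding up enough terms of the two series. This is a rough sketch only: the function name `poisson_moments` and the cut-off of 200 terms are arbitrary choices made here (the terms beyond that are negligible for the values of $\lambda$ tried).

```python
from math import exp


def poisson_moments(lam, terms=200):
    """Approximate the mean and variance of X ~ Poi(lam) by summing the series."""
    p = exp(-lam)            # P(X = 0)
    mean = 0.0
    mean_of_squares = 0.0
    for x in range(terms):
        mean += x * p
        mean_of_squares += x * x * p
        p *= lam / (x + 1)   # P(X = x + 1) = P(X = x) * lam / (x + 1)
    return mean, mean_of_squares - mean ** 2


for lam in (0.8, 5, 25):
    mean, variance = poisson_moments(lam)
    print(lam, round(mean, 6), round(variance, 6))
```

In each case the mean and the variance both come out equal to $\lambda$ (up to rounding), as derived above.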



Extra practice questions

Section 3: The binomial distribution

P8.1 Subject C1, September 1997, Q2 (adapted) In a particular road it is estimated that there is a 25% chance that any specified house will be burgled over a period of two years, independently for each house. There are six houses in the road. Calculate the probability that fewer than two houses will be burgled over the period. [3]

P8.2 Subject C1, September 1998, Q2 (adapted) The probability of suffering a side effect from a certain flu vaccine is 0.005. If 1,000 persons are inoculated, calculate the probability that at most 1 person suffers a side effect. [2]

P8.3 Subject C1, April 1995, Q5 Insurance policies of a particular type covering risks related to accidents are such that each policy, independently of all others, has probability 0.025 of giving rise to a claim exceeding £100,000 in a year. Consider the claims experience of a random sample of 500 such policies over a year. The most likely number of policies in the sample to give rise to claims exceeding £100,000 is: A 11 B 12 C 13 D 14 [3]

P8.4 Subject 101, September 2002, Q1 A very crude model for the distribution of claim size, X, in a particular situation represents X as a discrete random variable, which takes the values £5,000, £10,000, and £20,000 with probabilities 0.4, 0.5, and 0.1 respectively. Calculate the probability that of five randomly selected claims, three are for £5,000 each and the other two are for larger amounts. [2]



P8.5 Subject C1, April 1998, Q5 (adapted) In a sampling inspection scheme 3 bottles in every box of 12 are examined. If all 3 selected are found to be faulty, the remaining 9 bottles are examined. If at least one of the first 3 is satisfactory, no further bottles from the box are examined. Assume that each bottle examined has the same chance independently of being faulty, equal to 0.4. When a box is inspected under this sampling scheme, work out the probability that 5 faulty bottles are found. [3]

Section 4: The Poisson distribution

P8.6 Subject C1, April 1997, Q1 (adapted) If X has a Poisson distribution with mean 5, what is the value of $E(X^2)$? [2]

P8.7 Subject C1, April 1995, Q6 (adapted) For a certain class of non-life insurance business, the number of claims per policy in a year has a Poisson distribution with mean 0.2. What is the probability that there are a total of exactly 3 claims in a year under a group of 12 identical and unrelated such policies? [2]

P8.8 Subject C1, April 1994, Q7 (adapted) For a certain type of insurance business, the number of claims per policy in a year has a Poisson distribution with mean 0.4. Consider a policy which you know has given rise to at least one claim in the last year. What is the probability that this policy has in fact given rise to exactly two claims in the last year? [3]



P8.9 Subject C1, September 1997, Q1 (adapted) The number of demands made on a service team each day has a Poisson distribution with mean 2. Under current arrangements the service team can handle, at most, 3 demands per day and no demands are carried forward. Calculate the mean number of demands handled by the service team per day. [3]



Chapter 8 Summary

The binomial distribution

If X has a binomial distribution with parameters n and p then we write $X \sim Bin(n, p)$.

It models the number of ‘successes’ from n trials where each trial has a probability p of success. It has probability function:

$$P(X = x) = {}^{n}C_{x}\, p^{x} (1-p)^{n-x} \qquad x = 0, 1, 2, \ldots, n \qquad 0 < p < 1$$

The moments are:

$$E(X) = np \qquad var(X) = npq$$

Probabilities can be found by using the probability function, $P(X = x)$, or by using the (limited) tables of $F(x) = P(X \le x)$ on pages 186-188 of the Tables.

The Poisson distribution

If X has a Poisson distribution with parameter $\lambda$ then we write $X \sim Poi(\lambda)$. It models the number of occurrences. It has probability function:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda} \qquad x = 0, 1, 2, 3, \ldots$$

The moments are:

$$E(X) = \lambda \qquad var(X) = \lambda$$

Probabilities can be found by using the probability function, $P(X = x)$, or by using the tables of $F(x) = P(X \le x)$ on pages 175-185 of the Tables.

If the unit changes, we can change the Poisson distribution, eg $Poi(5)$ = claims per month $\Rightarrow$ $Poi(60)$ = claims per year.

We can also add Poisson distributions, eg $Poi(3) + Poi(5) \sim Poi(8)$.




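Solution 8.1

We have:

$$P(X = x) = \tfrac{1}{5} \qquad x = 1, 2, 3, 4, 5$$

So the mean is given by:

$$\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= \left(1 \times \tfrac{1}{5}\right) + \left(2 \times \tfrac{1}{5}\right) + \left(3 \times \tfrac{1}{5}\right) + \left(4 \times \tfrac{1}{5}\right) + \left(5 \times \tfrac{1}{5}\right) \\
&= \tfrac{15}{5} = 3
\end{aligned}$$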

Solution 8.2

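We have:

$$P(X = x) = \tfrac{1}{5} \qquad x = 1, 2, 3, 4, 5$$

The variance is given by:

$$var(X) = E(X^2) - [E(X)]^2$$

So we need $E(X^2)$:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
&= \left(1^2 \times \tfrac{1}{5}\right) + \left(2^2 \times \tfrac{1}{5}\right) + \left(3^2 \times \tfrac{1}{5}\right) + \left(4^2 \times \tfrac{1}{5}\right) + \left(5^2 \times \tfrac{1}{5}\right) \\
&= \tfrac{55}{5} = 11
\end{aligned}$$

$$\begin{aligned}
\Rightarrow \quad var(X) &= E(X^2) - [E(X)]^2 \\
&= 11 - 3^2 \\
&= 2
\end{aligned}$$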



Solution 8.3

The probabilities are:

x          1    2    3    4    5
P(X = x)   1/5  1/5  1/5  1/5  1/5

Counting through these gives the cumulative probabilities:

up to x = 1:  1/5 = 0.2
up to x = 2:  1/5 + 1/5 = 2/5 = 0.4
up to x = 3:  1/5 + 1/5 + 1/5 = 3/5 = 0.6, so 0.5 is in here!

We can see that the median is 3.



Solution 8.4

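The probability distribution is:

$$P(X = x) = \tfrac{1}{20} \qquad x = 1, 2, 3, \ldots, 20$$

(i) The probability is:

$$\begin{aligned}
P(X \text{ even}) &= P(X = 2, 4, 6, \ldots, 20) \\
&= P(X = 2) + P(X = 4) + \cdots + P(X = 20) \\
&= \tfrac{1}{20} + \tfrac{1}{20} + \cdots + \tfrac{1}{20} \\
&= \tfrac{10}{20} \\
&= \tfrac{1}{2}
\end{aligned}$$

(ii) The probability is:

$$\begin{aligned}
P(X > 18) &= P(X = 19, 20) \\
&= P(X = 19) + P(X = 20) \\
&= \tfrac{1}{20} + \tfrac{1}{20} \\
&= \tfrac{1}{10} = 0.1
\end{aligned}$$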

Solution 8.5

We have:

$$P(X = 1) = p \qquad P(X = 0) = q$$

So the mean is given by:

$$\begin{aligned}
E(X) &= \sum x P(X = x) \\
&= (1 \times p) + (0 \times q) \\
&= p
\end{aligned}$$



Solution 8.6

We have:

$$P(X = 1) = p \qquad P(X = 0) = q$$

The variance is given by:

$$var(X) = E(X^2) - [E(X)]^2$$

So we need $E(X^2)$:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
&= (1^2 \times p) + (0^2 \times q) \\
&= p
\end{aligned}$$

Hence:

$$\begin{aligned}
var(X) &= E(X^2) - [E(X)]^2 \\
&= p - p^2 \\
&= p(1 - p) \\
&= pq
\end{aligned}$$



Solution 8.7

For 2 successes we have 2 green and 2 not green lights with probability:

$$0.6^2 \times 0.4^2$$

We then need to calculate how many different ways there are of choosing 2 green sets from the 4 sets of traffic lights:

$${}^{4}C_{2}$$

Hence, the probability is given by:

$$P(X = 2) = {}^{4}C_{2} \times 0.6^2 \times 0.4^2$$

Similarly, we get:

$$P(X = 3) = {}^{4}C_{3} \times 0.6^3 \times 0.4^1$$

Solution 8.8

We want 13 green lights out of 20 lights. This gives a probability of:

$$P(X = 13) = {}^{20}C_{13} \times 0.9^{13} \times 0.1^{7} = 0.00197$$



Solution 8.9

We have 12 students (trials) with a probability of studying maths (success) of 0.7. So we have:

$$X \sim Bin(12, 0.7)$$

(i) The mean is given by $E(X) = np$, therefore:

$$E(X) = 12 \times 0.7 = 8.4$$

(ii) The variance is given by $var(X) = npq$, therefore:

$$var(X) = 12 \times 0.7 \times 0.3 = 2.52$$

Solution 8.10

First, we need to calculate the probabilities:

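$$\begin{aligned}
P(X = 0) &= {}^{3}C_{0} \times 0.2^0 \times 0.8^3 = 0.512 \\
P(X = 1) &= {}^{3}C_{1} \times 0.2^1 \times 0.8^2 = 0.384 \\
P(X = 2) &= {}^{3}C_{2} \times 0.2^2 \times 0.8^1 = 0.096 \\
P(X = 3) &= {}^{3}C_{3} \times 0.2^3 \times 0.8^0 = 0.008
\end{aligned}$$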

The mode is the value with the greatest probability, which is 0. To find the median, we need to count through the probabilities until we get to 0.5. We can see that we are there in the very first value. So the median is also 0.



Solution 8.11

We have $X \sim Bin(9, 0.75)$, so our probability function is:

$$P(X = x) = {}^{9}C_{x} \times 0.75^{x} \times 0.25^{9-x} \qquad x = 0, 1, \ldots, 9$$

(i) Substituting directly into the probability function, we get:

$$P(X = 6) = {}^{9}C_{6} \times 0.75^6 \times 0.25^3 = 0.23360$$

(ii) Now since X can take the values from 0 to 9, it will take too long to calculate $P(X < 8)$ using:

$$P(X < 8) = P(X = 0) + P(X = 1) + \cdots + P(X = 7)$$

So we use the fact that all the probabilities sum to 1:

$$P(X < 8) = 1 - P(X \ge 8)$$

Now:

$$\begin{aligned}
P(X \ge 8) &= P(X = 8) + P(X = 9) \\
&= 0.22525 + 0.07508 = 0.3003
\end{aligned}$$

where:

$$\begin{aligned}
P(X = 8) &= {}^{9}C_{8} \times 0.75^8 \times 0.25^1 = 0.22525 \\
P(X = 9) &= {}^{9}C_{9} \times 0.75^9 \times 0.25^0 = 0.07508
\end{aligned}$$

Hence:

$$P(X < 8) = 1 - P(X \ge 8) = 1 - 0.3003 = 0.6997$$

(iii) We have:

$$\begin{aligned}
P(2 \le X < 4) &= P(X = 2) + P(X = 3) \\
&= 0.00124 + 0.00865 = 0.00989
\end{aligned}$$

where:

$$\begin{aligned}
P(X = 2) &= {}^{9}C_{2} \times 0.75^2 \times 0.25^7 = 0.00124 \\
P(X = 3) &= {}^{9}C_{3} \times 0.75^3 \times 0.25^6 = 0.00865
\end{aligned}$$



Solution 8.12

We have 6 forms (trials) each with a 0.1 chance of being incorrectly filled in (success). So we have $X \sim Bin(6, 0.1)$ with probability function:

$$P(X = x) = {}^{6}C_{x} \times 0.1^{x} \times 0.9^{6-x} \qquad x = 0, 1, \ldots, 6$$

(i) Substituting directly into the probability function, we get:

$$P(X = 2) = {}^{6}C_{2} \times 0.1^2 \times 0.9^4 = 0.098415$$

(ii) Now since X can take the values from 0 to 6, we have:

$$\begin{aligned}
P(X > 4) &= P(X = 5) + P(X = 6) \\
&= 0.000054 + 0.000001 \\
&= 0.000055
\end{aligned}$$

where:

$$\begin{aligned}
P(X = 5) &= {}^{6}C_{5} \times 0.1^5 \times 0.9^1 = 0.000054 \\
P(X = 6) &= {}^{6}C_{6} \times 0.1^6 \times 0.9^0 = 0.000001
\end{aligned}$$



Solution 8.13

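For the Poisson distribution the probability function is:

$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda} \qquad x = 0, 1, 2, \ldots$$

The mean is found using:

$$E(X) = \sum x P(X = x)$$

This gives:

$$\begin{aligned}
E(X) &= 0 \times e^{-\lambda} + 1 \times \lambda e^{-\lambda} + 2 \times \left(\frac{\lambda^2}{2!} e^{-\lambda}\right) + 3 \times \left(\frac{\lambda^3}{3!} e^{-\lambda}\right) + \cdots \\
&= \lambda e^{-\lambda} + \lambda^2 e^{-\lambda} + \frac{\lambda^3}{2!} e^{-\lambda} + \frac{\lambda^4}{3!} e^{-\lambda} + \cdots \\
&= \lambda e^{-\lambda} \left\{ 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots \right\}
\end{aligned}$$

Recall that the exponential series is:

$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots$$

So we have:

$$\begin{aligned}
E(X) &= \lambda e^{-\lambda} e^{\lambda} \\
&= \lambda
\end{aligned}$$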



Solution 8.14

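We have:

$$X \sim Poi(25)$$

So:

$$\begin{aligned}
E(X) &= \lambda = 25 \\
var(X) &= \lambda = 25
\end{aligned}$$

Hence the standard deviation is $\sqrt{25} = 5$.

Solution 8.15

The probabilities are:

x          0      1      2      3      4      5      …
P(X = x)   0.368  0.368  0.184  0.061  0.015  0.003  …

Counting through these gives the cumulative probabilities:

up to x = 0:  0.368
up to x = 1:  0.368 + 0.368 = 0.736, so 0.5 is in here!

So the median is 1. The mode is the value with the highest probability. Here we can see that 0 and 1 are both modes.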



Solution 8.16

(i) Using the probability function:

$$P(X = 4) = \frac{1.8^4}{4!} e^{-1.8} = 0.07230$$

(ii) We have:

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{1.8^0}{0!} e^{-1.8} = 0.83470$$

(iii) We have:

$$\begin{aligned}
P(2 \le X < 5) &= P(X = 2) + P(X = 3) + P(X = 4) \\
&= \frac{1.8^2}{2!} e^{-1.8} + \frac{1.8^3}{3!} e^{-1.8} + \frac{1.8^4}{4!} e^{-1.8} \\
&= 0.26778 + 0.16067 + 0.07230 \\
&= 0.5008
\end{aligned}$$

Solution 8.17

The number of accidents in a year on the twenty-mile stretch of road can be modelled as a Poisson distribution with parameter 5.

(i) We have:

$$P(X = 7) = \frac{5^7}{7!} e^{-5} = 0.10444$$

(ii) We have:

$$\begin{aligned}
P(X > 3) &= 1 - P(X \le 3) \\
&= 1 - \left( P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \right) \\
&= 1 - (0.00674 + 0.03369 + 0.08422 + 0.14037) \\
&= 0.7350
\end{aligned}$$



Solution 8.18

(i) The number of insurance claims per month has a Poisson distribution with parameter 3. Therefore the number of insurance claims per year has a Poisson distribution with parameter $3 \times 12 = 36$. So:

$$P(X = 20) = \frac{36^{20}}{20!} e^{-36} = 0.00127$$

(ii) The number of car insurance claims per week has a Poisson distribution with parameter 4.8. Therefore the number of insurance claims per day has a Poisson distribution with parameter $4.8 \div 5 = 0.96$. So:

$$P(X = 0) = \frac{0.96^0}{0!} e^{-0.96} = 0.38289$$

Solution 8.19

The number of claims per year for each policyholder has a Poisson distribution with parameter 0.8. The number of claims per year for five policyholders has a Poisson distribution with parameter 4. We have:

$$P(X = 3) = \frac{4^3}{3!} e^{-4} = 0.19537$$




P8.1 Let X be the number of houses burgled over the period. Then $X \sim Bin(6, 0.25)$.

$$\begin{aligned}
P(X < 2) &= P(X = 0) + P(X = 1) \\
&= {}^{6}C_{0} \times 0.25^0 \times 0.75^6 + {}^{6}C_{1} \times 0.25^1 \times 0.75^5 \\
&= 0.17798 + 0.35596 = 0.534
\end{aligned}$$

P8.2 Let X be the number of people suffering a side effect. Then $X \sim Bin(1000, 0.005)$.

$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= {}^{1,000}C_{0} \times 0.005^0 \times 0.995^{1,000} + {}^{1,000}C_{1} \times 0.005^1 \times 0.995^{999} \\
&= 0.0066540 + 0.033437 = 0.040
\end{aligned}$$

P8.3 Let X be the number of policies giving rise to a claim exceeding £100,000 in a year. Then $X \sim Bin(500, 0.025)$.

The most likely number is the number of policies that has the highest probability of occurring. Therefore we need to work out the probability of observing the values given in the question:

$$\begin{aligned}
P(X = 11) &= {}^{500}C_{11} \times 0.025^{11} \times 0.975^{489} = 0.109648 \\
P(X = 12) &= {}^{500}C_{12} \times 0.025^{12} \times 0.975^{488} = 0.114568 \\
P(X = 13) &= {}^{500}C_{13} \times 0.025^{13} \times 0.975^{487} = 0.110275 \\
P(X = 14) &= {}^{500}C_{14} \times 0.025^{14} \times 0.975^{486} = 0.098359
\end{aligned}$$

The most likely number of policies is 12.



P8.4 Let X be the number of claims for £5,000 (as opposed to over £5,000). Then $X \sim Bin(5, 0.4)$.

$$P(X = 3) = {}^{5}C_{3} \times 0.4^3 \times 0.6^2 = 0.2304$$

P8.5 The probability that the first three are faulty is $0.4^3$. Otherwise we won’t continue the inspection and won’t find 5. We then require that 2 out of the remaining 9 bottles are faulty. The number of faulty bottles out of 9 has a binomial distribution with parameters $n = 9$ and $p = 0.4$.

The required probability is then $0.4^3 \times {}^{9}C_{2} \times 0.4^2 \times 0.6^7 = 0.0103$.

P8.6 Using the formula for the variance backwards:

$$E(X^2) = var(X) + [E(X)]^2$$

But the mean and variance of the $Poi(5)$ distribution are both 5. So:

$$E(X^2) = 5 + 5^2 = 30$$

P8.7 For one policy, the number of claims has a Poisson distribution with mean 0.2. This means that for 12 policies, the number of claims has a Poisson distribution with mean 2.4. Let X be the number of claims, then we want $P(X = 3)$:

$$P(X = 3) = \frac{e^{-2.4}\, 2.4^3}{3!} = 0.209$$



P8.8 Let X be the number of claims per policy in a year. We want $P(X = 2 \mid X \ge 1)$:

$$P(X = 2 \mid X \ge 1) = \frac{P(X = 2 \text{ and } X \ge 1)}{P(X \ge 1)} = \frac{P(X = 2)}{P(X \ge 1)}$$

We have:

$$P(X = 2) = \frac{e^{-0.4}\, 0.4^2}{2!} = 0.053626$$

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-0.4}\, 0.4^0}{0!} = 1 - 0.670320 = 0.329680$$

Therefore:

$$P(X = 2 \mid X \ge 1) = \frac{0.053626}{0.329680} = 0.1627$$

P8.9 The number of demands handled by the team each day is X, where:

$$P(X = 0) = e^{-2} \qquad P(X = 1) = 2e^{-2}$$

$$P(X = 2) = 2e^{-2} \qquad P(X = 3) = 1 - 5e^{-2}$$

So the mean of this random variable is:

$$E[X] = 0 \times e^{-2} + 1 \times 2e^{-2} + 2 \times 2e^{-2} + 3\left(1 - 5e^{-2}\right) = 3 - 9e^{-2} = 1.782$$

Stats Pack-09: Continuous random variables Page 1


Chapter 9

Continuous random variables

Links to CT3: Chapter 3

Syllabus objectives:

(ii)2. Explain what is meant by a continuous random variable, define the distribution function and the probability density function of such a variable, and use these functions to calculate probabilities.

(ii)3. Define the expected value of a function of a random variable, the mean, the variance, the standard deviation, the coefficient of skewness and the moments of a random variable, and calculate such quantities.

0 Introduction

In Chapter 7 we met discrete random variables. These were variables that used probabilities to decide which discrete value they could take. In this chapter we extend this concept to random variables that can take any value within a given range. This will allow us to model heights, weights, IQs, claim amounts and so on.

Claim amounts? Surely that’s a discrete random variable, as money can only take 1p, 2p, …?! Well yes it is – however trying to get an appropriate function that will give answers to an exact amount of pounds and pence is a nightmare. So what we do is pretend that it’s continuous and use a function to model claim amounts. We then round our answers to the nearest penny.



It’s a good bluff since in real life the amounts we will be dealing with will be in the millions and so numbers given to the nearest penny are pretty much continuous, eg:

£14.27568134 million

So whereas discrete random variables can model the number of claims and the number of deaths, continuous random variables can be used to model the amount of the claims or the exact age at death.

This chapter covers the same material as Chapter 7 (eg mean, mode, median, variance, skewness) but just for continuous random variables. The sections in this chapter have been numbered in the same way as Chapter 7 so that you can easily refer back.



1 Random variables

In Chapter 7, we defined a random variable to be a variable that assigns a probability to each possible value. We also met discrete random variables, which were random variables that could only take certain numerical values (ie discrete values).

1.1 Continuous random variables

Recall from Chapter 1 that continuous data could take any value within a given range. For example, time measured from now can take any positive value. A continuous random variable is a random variable that can take any numerical value within a given range. For example, we could model the lifetime of a battery (in hours) using a continuous random variable. Like before, we use a capital letter, eg X, to stand for the random variable and its lower case equivalent, eg x, to stand for a value that it takes.

1.2 Graphs for continuous random variables

In Chapter 1, we recorded discrete data in a frequency distribution and drew a simple bar chart. In Chapter 8, we extended this by recording discrete random variables in a probability distribution and drawing simple bar charts:

x          1    2    3    4    5    6
P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6

[Figure: bar chart of P(X = x) against x, with six bars of equal height 1/6 at x = 1, 2, …, 6]

Notice how the heights give the probabilities of each of the values.



So what do we do for continuous random variables? In Chapter 1 we recorded continuous data in grouped frequency distributions:

Claim amount (x)      Frequency
0 ≤ x < 500           6
500 ≤ x < 1,000       10
1,000 ≤ x < 1,500     9
1,500 ≤ x < 2,000     8
2,000 ≤ x < 4,000     7

We then represented these data using a histogram:

[Figure: histogram of frequency density against claim amount (£), for claim amounts from 0 to 4,000]

Recall that the area gave the frequencies of each of the groups. We can do the same thing for continuous random variables, but using probabilities instead of frequencies:

[Figure: histogram of probability density against claim amount (£), for claim amounts from 0 to 4,000]



So our histogram will have probability density on the y-axis and the area of the bars will give the probabilities of each group.

This is not totally satisfactory, as we’d prefer to calculate the exact claim amount rather than saying that it’s in a particular group. So we use a continuous function, $f(x)$, to obtain the probability densities. Unsurprisingly, it is called the probability density function. For example, the claim amounts in the histogram above can be modelled by:

$$f(x) = c\, x^{1.5} e^{-0.0018x}$$

where c is a messy constant. (PS Don’t panic about where this formula came from – I actually used it to generate the claim amounts – hence I’m fairly sure that it’s a good model!) This gives us the following beautiful curve:

[Figure: graph of the probability density f(x) against x, for x from 0 to 5,000]

Many students get confused and think that the probability density function, $f(x)$, gives us the probabilities. It doesn’t! It gives us the probability densities. For a histogram (which had frequency density) the area of the bars gave the frequency. In the same way, the area under our graph (which has probability density) gives us the probabilities. Recall from A-level (or equivalent) that we found the area under a graph using integration. You have been warned!



2 Probability density functions

2.1 Definition

A probability density function, $f(x)$, of a random variable X is the function that assigns probability densities to all of the values that X can take. The area under the probability density function (often abbreviated to PDF) gives us the probabilities:

Important result

For a continuous random variable, X, with probability density function, $f(x)$:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

We will practise finding probabilities in Section 2.3.

In Chapter 4, the probability function, $P(X = x)$, for discrete random variables had to satisfy two conditions:

$P(X = x) \ge 0$   ie no negative probabilities

and $\sum_x P(X = x) = 1$   ie probabilities sum up to 1

Exactly the same results hold for the probability density function. However, since our function is continuous we use integration instead of summation (as this is the continuous equivalent):

$f(x) \ge 0$

and $\int f(x)\,dx = 1$



These two conditions have exactly the same meaning as for discrete random variables:

The area under the graph gives us the probability. We find this using integration. Recall from A-level that the integral of the part of a function where $f(x) < 0$ is negative. So the first condition stops us from getting negative probabilities.

The second condition tells us that the total area under the graph is 1. But the area under the graph gives us the probability. So this condition is telling us that the total probability must add up to 1.

These two conditions give us our formal definition:

Definition

$f(x)$ is a probability density function if:

(i) $f(x) \ge 0$

(ii) $\int f(x)\,dx = 1$

Question 9.1

Show that the following function is a probability density function:

$$f(x) = \tfrac{3}{8} x^2 \qquad 0 \le x \le 2$$

ie show that $f(x) \ge 0$ and that $\int f(x)\,dx = 1$.

Technically, the probability density function is defined for all values of x, with $f(x) = 0$ for the values that are impossible. So for our function from Question 9.1, we should have written:

$$f(x) = \begin{cases} \tfrac{3}{8} x^2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

However, because mathematicians are lazy we tend to just assume this is the case without explicitly writing it down.



2.2 Solving probability density function problems

We can use the probability density function to solve problems where a value is missing. For example, suppose we have the following probability density function for a random variable Y:

$$f(y) = ky \qquad 0 \le y \le 4$$

What is k? We can find the value of k by using the fact that the probabilities must sum to 1:

$$\int f(y)\,dy = \int_0^4 ky\,dy = \left[\tfrac{1}{2} k y^2\right]_0^4 = 1 \;\Rightarrow\; 8k = 1 \;\Rightarrow\; k = \tfrac{1}{8}$$
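The same calculation can be sketched numerically: integrate the function without its constant, then choose k so that the total area is 1. The helper `integrate` below is a simple midpoint rule written for illustration only (it is not from the Tables or the course), and the number of strips is an arbitrary choice.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Area under y (ie f(y) without the constant k) between 0 and 4.
area_without_k = integrate(lambda y: y, 0, 4)

# The total probability must be 1, so k = 1 / area.
k = 1 / area_without_k
print(round(k, 6))

# Check that the function from Question 9.1, (3/8)x^2 on [0, 2], has total area 1.
total_area_q91 = integrate(lambda x: 3 / 8 * x ** 2, 0, 2)
print(round(total_area_q91, 6))
```

This only checks the second condition (total area of 1); the first condition, $f(x) \ge 0$, still has to be checked by looking at the function.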

Question 9.2

The probability density function of a random variable X is given by:

$$f(x) = c(1 - x) \qquad 0 \le x \le 1$$

Calculate the value of c.



2.3 Using probability density functions to find probabilities

We have already seen that the area under the probability density function, $f(x)$, gives the probabilities. Therefore, we just integrate $f(x)$ to obtain probabilities.

Let’s try this out! Taking our example from the previous page:

$$f(y) = \tfrac{1}{8} y \qquad 0 \le y \le 4$$

To find $P(2 \le Y \le 3)$, we just integrate between 2 and 3:

$$P(2 \le Y \le 3) = \int_2^3 \tfrac{1}{8} y\,dy = \left[\tfrac{1}{16} y^2\right]_2^3 = \tfrac{9}{16} - \tfrac{4}{16} = \tfrac{5}{16}$$

To find $P(Y \ge 2.8)$, we just integrate from 2.8 upwards. Up to what? Well the function stops at 4 (and is zero thereafter), so we integrate up to 4:

$$P(Y \ge 2.8) = \int_{2.8}^4 \tfrac{1}{8} y\,dy = \left[\tfrac{1}{16} y^2\right]_{2.8}^4 = 1 - 0.49 = 0.51$$

Similarly, to find $P(Y \le 0.7)$, we just integrate from 0.7 downwards. Since the function stops being non-zero at $y = 0$, we’ll stop there:

$$P(Y \le 0.7) = \int_0^{0.7} \tfrac{1}{8} y\,dy = \left[\tfrac{1}{16} y^2\right]_0^{0.7} = 0.030625 - 0 = 0.030625$$
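The three probabilities above can be reproduced with a short numerical sketch. The helper names `integrate` and `pdf` are illustrative, and the midpoint rule is just one simple way of approximating an area; in the exam the integrals are done by hand as shown.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def pdf(y):
    """f(y) = y/8 for 0 <= y <= 4, and 0 otherwise."""
    return y / 8 if 0 <= y <= 4 else 0.0


p_between_2_and_3 = integrate(pdf, 2, 3)    # P(2 <= Y <= 3)
p_at_least_2_8 = integrate(pdf, 2.8, 4)     # P(Y >= 2.8): integrate up to 4
p_at_most_0_7 = integrate(pdf, 0, 0.7)      # P(Y <= 0.7): integrate down to 0

print(round(p_between_2_and_3, 6))
print(round(p_at_least_2_8, 6))
print(round(p_at_most_0_7, 6))
```

The limits of integration are the only thing that changes between the three calculations, which is the point of the worked examples above.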

Question 9.3

The probability density function of a random variable W is given by:

$$f(w) = \tfrac{3}{8}(1 + w^2) \qquad -1 \le w \le 1$$

Calculate:

(i) $P(0.3 \le W \le 0.9)$

(ii) $P(W \le 0.5)$

(iii) $P(W \ge -0.8)$

OK, so far so good. Now let’s throw a spanner in the works.



Taking our example again:

$$f(y) = \tfrac{1}{8} y \qquad 0 \le y \le 4$$

What is $P(Y = 2)$?

Surely, we’d just integrate from 2 to 2?!? This gives:

$$P(Y = 2) = \int_2^2 \tfrac{1}{8} y\,dy = \left[\tfrac{1}{16} y^2\right]_2^2 = \tfrac{1}{4} - \tfrac{1}{4} = 0$$

So the probability is zero. Hold on – using the same approach we’d get $P(Y = 1) = 0$, $P(Y = 1.5) = 0$, $P(Y = 2.628) = 0$, etc. In fact, the probability that Y is exactly equal to any of the values in its range is zero!

Hmmm. This seems odd. Why should this be? Well the problem is that Y can take any value between 0 and 4 and there are an infinite number of values between 0 and 4. So the probability of taking any one single value is 1 out of infinity which is zero.

Think of it another way – if I ask you how tall you are you might say 168 cm (well, we are going metric!). Exactly? Well probably not – if we measured more accurately you might be 168.4 cm. Exactly? Well probably not – if we measured even more accurately you might be 168.38 cm. Exactly? Well probably not – we can continue and may find that you are 168.38100482304728759274 cm tall. You get the idea.

Hopefully this helps you to see how silly it is to say that someone is exactly 168 cm tall, in the same way that it would be daft to say that someone is exactly 185.0000394828494743948 cm tall. Just because it’s a nice number doesn’t make it more likely!!! So it’s only sensible to talk about the probability that someone is roughly 168 cm tall, ie 168 cm to the nearest cm (that is, between 167.5 cm and just below 168.5 cm). So we only calculate probabilities of ranges.

Finally, since we can’t get ‘equal’ probabilities, we tend to be a bit blasé about our ranges as $P(X \ge 3)$ is exactly the same as $P(X > 3)$.



3 The cumulative distribution function

3.1 Definition

In Chapter 1, we found the cumulative frequencies by accumulating (ie adding up) the frequencies as we went through each of the data values. In Chapter 7, we extended this by adding up probabilities to obtain cumulative probabilities. We then defined the cumulative distribution function (CDF) for discrete random variables as:

$$F_X(x) = P(X \le x) \quad \text{for all } x$$

This gives us the cumulative probability for any value of x (ie the total probability so far up to x).

How does this work for continuous random variables? We will use exactly the same definition for the cumulative distribution function. The only difference will be how we work it out. For discrete random variables we just added up the probabilities, whereas for continuous random variables we will integrate.

For example, suppose we have a random variable, Y, with probability density function:

$$f(y) = \tfrac{1}{8} y \qquad 0 \le y \le 4$$

The cumulative distribution function for Y is:

$$F_Y(y) = P(Y \le y) = \int_0^y \tfrac{1}{8} y\,dy$$

Hmmm. This integral is a bit dodgy as we can’t really integrate over y and have one of the limits as y! So we’ll change the variable in the integral to another letter, say t:

$$F_Y(y) = P(Y \le y) = \int_0^y \tfrac{1}{8} t\,dt = \left[\tfrac{1}{16} t^2\right]_0^y = \tfrac{1}{16} y^2$$

Sorted!

Sorted!



Definition

The cumulative distribution function (CDF) of a random variable X is:

$$F_X(x) = P(X \le x) = \int_{\text{start}}^{x} f(t)\,dt$$

One last slight amendment. Recall that, technically, our probability density function is defined for all values of y:

$$f(y) = \begin{cases} \tfrac{1}{8} y & 0 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$

Similarly, our cumulative distribution function must be defined for all values of y. For $y < 0$ we haven’t reached any values that our random variable can take! So our total probability must be zero, eg $F_Y(-1) = 0$.

If $y > 4$, we have already met all the possible values that the function can take and so have reached the cumulative probability of 1, eg $F_Y(5) = 1$.

This gives:

$$F_Y(y) = \begin{cases} 0 & y < 0 \\ \tfrac{1}{16} y^2 & 0 \le y \le 4 \\ 1 & y > 4 \end{cases}$$

However, we often just write $F_Y(y) = \tfrac{1}{16} y^2$ (as we are lazy) and assume that the other parts are ‘obvious’.



Question 9.4

Obtain the cumulative distribution function for the random variable, X , with probability density function:

$$f(x) = \tfrac{3}{7} x^2 \qquad 1 \le x \le 2$$

Great! Now we can work out the cumulative distribution function (CDF) from the probability density function (PDF). But how can we get back again? Well, we integrated the PDF, $f(x)$, to get the CDF, $F(x)$, so it makes sense that to go backwards we differentiate.

Important result

We can obtain the probability density function, $f(x)$, from the cumulative distribution function, $F(x)$, as follows:

$$f(x) = F'(x)$$

If you studied mathematics at university (and can remember any of it), you will recognise that the notation used is the same as the fundamental theorem of calculus (which basically said we integrate to get big F and differentiate to get little f ).

Suppose we have a random variable with cumulative distribution function:

$$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^3}{27} & 0 \le x \le 3 \\ 1 & x > 3 \end{cases}$$

We get the probability density function by differentiating:

$$f(x) = F'(x) = \frac{3x^2}{27} = \frac{x^2}{9} \qquad 0 \le x \le 3$$

The derivatives of the other parts are both 0 (as they are constants).



3.2 Using cumulative distribution functions to find probabilities

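We can use the cumulative distribution function to find the probabilities for a random variable, either by using $F(x)$ directly or by working backwards to find the original probability density function $f(x)$ and then using this. Taking our example from the previous page:

$$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^3}{27} & 0 \le x \le 3 \\ 1 & x > 3 \end{cases}$$

Now suppose we want to find:

(i) $P(X \le 1)$

(ii) $P(X > 2)$

Using $F_X(x)$ to find the probability density function

We could just find the original probability density function by differentiating:

$$f(x) = F'(x) = \frac{3x^2}{27} = \frac{x^2}{9} \qquad 0 \le x \le 3$$

It is now easy to get the probabilities:

(i) $$P(X \le 1) = \int_0^1 \tfrac{1}{9} x^2\,dx = \left[\tfrac{1}{27} x^3\right]_0^1 = \tfrac{1}{27}$$

(ii) $$P(X > 2) = \int_2^3 \tfrac{1}{9} x^2\,dx = \left[\tfrac{1}{27} x^3\right]_2^3 = \tfrac{1}{27}(27 - 8) = \tfrac{19}{27}$$

However, notice that when we integrate to get the probabilities we’re just going back to the cumulative distribution function. So we might as well use that (unless you love integrating).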

However, notice that when we integrate to get the probabilities we’re just going back to the cumulative distribution function. So we might as well use that (unless you love integrating).



Using $F_X(x)$ to find probabilities directly

We have:

$$F(x) = P(X \le x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^3}{27} & 0 \le x \le 3 \\ 1 & x > 3 \end{cases}$$

(i) Since the CDF already gives less than or equal probabilities, we get:

$$P(X \le 1) = F_X(1) = \frac{1^3}{27} = \frac{1}{27}$$

(ii) Since the probabilities of all possible values add up to 1 (and $P(X = 2) = 0$ for a continuous random variable), we have:

$$P(X \le 2) + P(X > 2) = 1$$

Hence:

$$\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) \\
&= 1 - F_X(2) \\
&= 1 - \tfrac{2^3}{27} \\
&= \tfrac{19}{27}
\end{aligned}$$
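Both routes (using the CDF directly, and differentiating to get back to the PDF) can be illustrated with a few lines of code. This is a sketch only: the function names are invented here, and the numerical derivative is a simple central difference with an arbitrarily chosen step size.

```python
def cdf(x):
    """F(x) = x^3 / 27 for 0 <= x <= 3, 0 below that range and 1 above it."""
    if x < 0:
        return 0.0
    if x > 3:
        return 1.0
    return x ** 3 / 27


# (i) The CDF gives 'less than or equal to' probabilities directly.
p_at_most_1 = cdf(1)

# (ii) 'Greater than' probabilities come from 1 minus the CDF.
p_more_than_2 = 1 - cdf(2)


def pdf_from_cdf(x, h=1e-6):
    """Approximate f(x) = F'(x) using a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


print(round(p_at_most_1, 6), round(p_more_than_2, 6))
print(round(pdf_from_cdf(1.5), 6), round(1.5 ** 2 / 9, 6))
```

The last line compares the numerical derivative at x = 1.5 with the formula $x^2/9$ obtained by differentiating by hand.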

Question 9.5

A random variable, V , has cumulative distribution function:

$$F(v) = 1 - e^{-0.5v} \qquad v \ge 0$$

Calculate:

(i) $P(V > 2)$

(ii) $P(V < 1)$

(iii) $P(V = 0.8)$



4 Measures of location

In Chapter 2, we summarised a set of sample data using the mean, mode and median. In Chapter 7, we extended this by calculating the mean, mode and median of a population modelled by a discrete random variable. We will now look at how we can calculate the mean, mode and median for a population modelled by a continuous random variable.

4.1 Mean

In Chapter 2, we found the sample mean using the formula:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

In Chapter 7, we found the mean of a population modelled by a discrete random variable by using probabilities, $P(X = x)$, instead of frequencies, f:

$$\mu = E(X) = \frac{\sum x P(X = x)}{\sum P(X = x)}$$

Since the sum of probabilities of a probability function, $P(X = x)$, is 1 we got:

$$\mu = E(X) = \sum x P(X = x)$$

We used $\mu$ to stand for the mean of a population modelled by a random variable to distinguish it from the mean of a sample, $\bar{x}$. Confusingly, we also called the population mean the expectation of X (and denoted it $E(X)$). This was because it gave us the mean of the values we would expect to get (ie the theoretical mean).

So how does this formula work for continuous random variables? Well, all we do is replace the $P(X = x)$ with $f(x)$ and use integration instead of summation. Hence:

$$\mu = E(X) = \int x f(x)\,dx$$



Definition The mean (or expectation) of a continuous random variable $X$ is:

$$\mu = E(X) = \int x f(x) \, dx$$

where the integral is over all possible values that $X$ can take.

Let’s find the mean of a continuous random variable, $X$, with probability density function:

$$f(x) = \frac{2}{x^3}, \quad x > 1$$

Using the formula:

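$$\begin{aligned}
\mu = E(X) = \int_1^\infty x f(x) \, dx &= \int_1^\infty x \cdot \frac{2}{x^3} \, dx = \int_1^\infty 2x^{-2} \, dx \\
&= \left[-2x^{-1}\right]_1^\infty \\
&= 0 - (-2) \\
&= 2
\end{aligned}$$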

That’s it! However, since there aren’t many marks given in the Subject CT3 exam for this, plenty of practice is advised (especially if your integration is rusty) to ensure you can do it quickly.
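The only slightly odd step above is plugging in infinity as the upper limit. A small Python sketch (the function name is our own) shows what is going on: we stop the integral at a finite upper limit instead, using the same antiderivative $-2x^{-1}$, and watch the answer settle down to 2 as that limit grows.

```python
def partial_mean(upper):
    """Integral of x * f(x) = 2 * x**-2 from 1 to `upper`,
    evaluated with the antiderivative -2 / x."""
    return (-2 / upper) - (-2 / 1)


# The larger the upper limit, the closer we get to E(X) = 2.
for upper in (10, 1_000, 1_000_000):
    print(upper, partial_mean(upper))
```

The term $-2/\text{upper}$ shrinks towards zero, which is exactly why we wrote "0" for the top limit in the working.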

Question 9.6

Find the mean of the random variable, X , that has probability density function:

$$f(x) = \tfrac{1}{18}(x^2 - 2x), \quad 2 \le x \le 5$$



4.2 Median

In Chapter 2, we obtained the median (the middle value) by counting through half of the frequencies. In Chapter 7, we found the median (the middle value) of a discrete random variable by counting through half of the probabilities. So the median, M, has half of the probability below it and half the probability above it:

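[Figure: sketch of a distribution with the median $M$ marked, showing $P(X < M) = \tfrac{1}{2}$ to the left of $M$ and $P(X > M) = \tfrac{1}{2}$ to the right]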

So, we can find the median, M, by using:

$$P(X < M) = \tfrac{1}{2}$$

Since for a continuous random variable $P(a < X < b) = \int_a^b f(x) \, dx$, we have:

$$P(X < M) = \int_{\text{start}}^{M} f(x) \, dx = \tfrac{1}{2}$$

Solving this for $M$ gives the median. For example, if we have a continuous random variable, $X$, with probability density function:

$$f(x) = 2x, \quad 0 \le x \le 1$$

the median, $M$, is given by:

$$P(X < M) = \int_0^M 2x \, dx = \tfrac{1}{2}$$



Solving this gives:

$$\int_0^M 2x \, dx = \left[x^2\right]_0^M = M^2 - 0 = \tfrac{1}{2}$$

$$\Rightarrow M^2 = \tfrac{1}{2} \Rightarrow M = \sqrt{\tfrac{1}{2}} = 0.70711$$

Looking at the graph of the probability density function, $f(x) = 2x$, we can see that this value splits the area (the probability) into two halves:

$$\text{area of triangle} = \tfrac{1}{2} \times \text{base} \times \text{height} = \tfrac{1}{2} \times 0.7071 \times 1.4142 = 0.5$$

$$\text{area of trapezium} = \tfrac{1}{2} \times \text{base} \times (\text{sum of sides}) = \tfrac{1}{2} \times (1 - 0.7071) \times (1.4142 + 2) = 0.5$$
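When the equation for $M$ is too messy to solve by hand, the same idea can be carried out numerically. The sketch below is one possible approach rather than anything prescribed by the course: it integrates the PDF with Simpson's rule and then uses bisection to home in on the point where the cumulative probability reaches 0.5. All the function names are our own.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pdf(x):
    return 2 * x  # density from the example, valid for 0 <= x <= 1


def cdf(m):
    return simpson(pdf, 0.0, m)  # P(X < m): area under the PDF from the start up to m


def median(lo=0.0, hi=1.0, tol=1e-10):
    """Bisection: shrink [lo, hi] until the point where the CDF equals 0.5 is pinned down."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


m = median()
print(round(m, 5))
```

Bisection works here because the CDF never decreases, so there is no danger of stepping past the median and missing it.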

Question 9.7

Calculate the median of the random variable, X , that has probability density function:

$$f(x) = \tfrac{2}{9}(x - 2), \quad 2 \le x \le 5$$

Earlier we defined the cumulative distribution function, $F_X(x) = P(X \le x)$. So we could rewrite $P(X < M) = \tfrac{1}{2}$ as:

$$F_X(M) = \tfrac{1}{2}$$

So what? Well this can make life easier as the cumulative distribution functions are listed in the Tables for the common distributions. So this will save us integrating. Nice!

[Figure: graph of $f(x) = 2x$ for $0 \le x \le 1$, with the median $M$ marked on the $x$-axis]



4.3 Mode

In Chapter 2, we defined the mode as the value with the highest frequency (ie the value which occurs most). In Chapter 7, we found the mode of a discrete random variable by finding the value with the highest probability, $P(X = x)$:

[Figure: bar chart of $P(X = x)$ against $x$, with the tallest bar labelled "greatest probability" and the corresponding value of $x$ labelled "mode"]

For a continuous random variable the mode is the value which has the greatest probability density function, $f(x)$:

[Figure: graph of $f(x)$ against $x$, with the highest point labelled "greatest f(x)" and the corresponding value of $x$ labelled "mode"]

This is just the maximum of $f(x)$. We can find this by differentiating. Recall that:

if $f'(x) = 0$ and $f''(x) < 0$ then $x$ maximises $f(x)$.



For example, a continuous random variable, X , has probability density function:

$$f(x) = 4x - 4x^3, \quad 0 \le x \le 1$$

Differentiating and setting this equal to zero gives:

$$f'(x) = 4 - 12x^2 = 0 \Rightarrow x^2 = \tfrac{1}{3} \Rightarrow x = \pm 0.57735$$

Since $f(x)$ is only defined for $0 \le x \le 1$, this means that the mode must be:

$$x = 0.57735$$

Checking that this gives a maximum:

$$f''(x) = -24x \Rightarrow f''(0.57735) = -13.856 < 0 \Rightarrow \text{max}$$

Hence the mode for X is 0.57735.
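As a cross-check that does not rely on differentiation, we can search for the peak of the PDF directly. The Python sketch below uses a ternary search, which is our own choice of method and assumes the density has a single peak on its range (true here, since $f''(x) \le 0$ for $0 \le x \le 1$). The names are illustrative.

```python
def pdf(x):
    return 4 * x - 4 * x ** 3  # density from the example, valid for 0 <= x <= 1


def find_mode(f, lo, hi, tol=1e-9):
    """Ternary search for the maximum of a single-peaked function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the peak cannot lie to the left of m1
        else:
            hi = m2  # the peak cannot lie to the right of m2
    return (lo + hi) / 2


mode = find_mode(pdf, 0.0, 1.0)
print(round(mode, 4))
```

A search like this also copes with densities whose maximum sits at the end of the range, where setting the derivative to zero finds nothing.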

Question 9.8

Find the mode of the random variable, X , that has probability density function:

$$f(x) = 12x^2(1 - x), \quad 0 \le x \le 1$$

For some functions, the mode cannot be found using this method. For example:

$$f(x) = 1 - \tfrac{1}{2}x \Rightarrow f'(x) = -\tfrac{1}{2} \ne 0$$

The derivative is never zero, so there is nothing to solve! In these cases, a quick sketch of the probability density function will show us the mode.

[Figure: graph of $f(x) = 1 - \tfrac{1}{2}x$ for $0 \le x \le 2$, annotated "biggest value of f(x) so mode is x = 0"]



5 Expectation of functions

5.1 General rule for the expectation of any function

In Chapter 7, we defined the expectation (mean) of a discrete random variable, X, to be:

$$E(X) = \sum x P(X = x)$$

This was extended to find the expectation of other functions. For example:

$$\begin{aligned}
E(X^2) &= \sum x^2 P(X = x) \\
E(X^3) &= \sum x^3 P(X = x) \\
E(2X - 1) &= \sum (2x - 1) P(X = x) \\
E\left[(X - 5)^2\right] &= \sum (x - 5)^2 P(X = x)
\end{aligned}$$

In general, for a function $g(x)$ we had:

$$E[g(X)] = \sum g(x) P(X = x)$$

For continuous random variables we obtained the expectation (mean) by using integration instead of summation:

$$E(X) = \int x f(x) \, dx$$

Similarly, we can also extend this to find the expectation of other functions:

$$\begin{aligned}
E(X^2) &= \int x^2 f(x) \, dx \\
E(X^3) &= \int x^3 f(x) \, dx \\
E(2X - 1) &= \int (2x - 1) f(x) \, dx \\
E\left[(X - 5)^2\right] &= \int (x - 5)^2 f(x) \, dx
\end{aligned}$$



So in general:

Important result If $X$ is a continuous random variable and $g(X)$ is a function of that random variable, then:

$$E[g(X)] = \int g(x) f(x) \, dx$$
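The rule translates almost word for word into code: multiply $g(x)$ by the density and integrate. The Python sketch below is a minimal illustration, assuming the density $f(x) = 2x$ on $0 \le x \le 1$ from the median example and using Simpson's rule for the integration. The helper names are our own.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def expectation(g, pdf, a, b):
    """E[g(X)]: integrate g(x) * f(x) over the range [a, b] of X."""
    return simpson(lambda x: g(x) * pdf(x), a, b)


def pdf(x):
    return 2 * x  # illustrative density on 0 <= x <= 1


e_x = expectation(lambda x: x, pdf, 0.0, 1.0)                   # E(X)
e_x2 = expectation(lambda x: x ** 2, pdf, 0.0, 1.0)             # E(X^2)
e_2x_minus_1 = expectation(lambda x: 2 * x - 1, pdf, 0.0, 1.0)  # E(2X - 1)

print(e_x, e_x2, e_2x_minus_1)
```

By hand these come to $\tfrac{2}{3}$, $\tfrac{1}{2}$ and $\tfrac{1}{3}$, so the same function handles any $g$ without extra work.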

Question 9.9

A continuous random variable, $Y$, has probability density function:

$$f(y) = \frac{3}{y^4}, \quad y \ge 1$$

Calculate:

(i) $E(Y^2)$ (ii) $E(5Y - 2)$ (iii) $E\left(\tfrac{1}{Y}\right)$



5.2 Expectation of linear functions of a random variable

We can find the expectation of a linear function of $X$, say $2X - 1$, using our rule:

$$E(2X - 1) = \int (2x - 1) f(x) \, dx$$

However, there is a cunning shortcut we can use to work this out if we know $E(X)$.

In Chapter 2, we found that if the sample mean of a set of data was $\bar{x}$, then when we multiplied each sample value by $a$ and added $b$ the new mean was $a\bar{x} + b$. In Chapter 7, we found that this rule worked for expectations of discrete random variables:

$$E(aX + b) = aE(X) + b$$

We can show (see Appendix A) that this rule works for continuous random variables:

$$E(aX + b) = aE(X) + b$$

Hence, if $E(X) = 5$ then $E(20X - 7) = 20E(X) - 7 = 20 \times 5 - 7 = 93$.

Question 9.10

A random variable, $Z$, has a mean of 4. Calculate:

(i) $E(6 - 2Z)$ (ii) $E\left(\dfrac{5Z + 1}{6}\right)$ (iii) $E(5)$



5.3 Expectation of linear combinations of a random variable

In Chapter 7, we showed that we could split up more complex expectations of discrete random variables. For example:

$$E(X^2 - 2X) = E(X^2) - 2E(X)$$

Does this work for continuous random variables? Let’s try it out!

$$E(X^2 - 2X) = \int (x^2 - 2x) f(x) \, dx$$

Expanding the bracket and splitting up the integral gives:

$$\begin{aligned}
E(X^2 - 2X) &= \int \left\{x^2 f(x) - 2x f(x)\right\} dx \\
&= \int x^2 f(x) \, dx - \int 2x f(x) \, dx \\
&= \int x^2 f(x) \, dx - 2\int x f(x) \, dx \\
&= E(X^2) - 2E(X)
\end{aligned}$$

Hurrah! This means we can split up all sorts of grotty expectations of continuous random variables. For example:

$$E(7X^2 - 3X + 2) = 7E(X^2) - 3E(X) + 2$$

So for $E(7X^2 - 3X + 2)$ we will only have to work out $E(X^2)$ and $E(X)$, which will be less messy!
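To see the splitting rule in action, the Python sketch below works out $E(7X^2 - 3X + 2)$ twice: once by integrating the whole expression against the density, and once from $E(X^2)$ and $E(X)$. The density $f(x) = 2x$ on $0 \le x \le 1$ is just an illustrative choice (it is the one from the median example), and the helper names are our own.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pdf(x):
    return 2 * x  # illustrative density on 0 <= x <= 1


def expectation(g):
    return simpson(lambda x: g(x) * pdf(x), 0.0, 1.0)


# Direct route: integrate (7x^2 - 3x + 2) f(x) in one go.
direct = expectation(lambda x: 7 * x ** 2 - 3 * x + 2)

# Split route: 7 E(X^2) - 3 E(X) + 2.
split = 7 * expectation(lambda x: x ** 2) - 3 * expectation(lambda x: x) + 2

print(direct, split)
```

Both routes give 3.5, and the split route reuses two expectations we would probably have needed anyway.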

Question 9.11

Simplify:

(i) $E(1 + 8X - 3X^2)$ (ii) $E\left[(X - 3)^2\right]$ (iii) $E\left(\tfrac{5}{X}\right)$



6 Measures of spread

In Chapter 3, we measured the spread of a set of sample data using the range, the IQR and the standard deviation. Recall that the variance is the standard deviation squared. In Chapter 7, we extended this by calculating the standard deviation and variance of a population modelled by a discrete random variable:

$$\begin{aligned}
\operatorname{var}(X) = \sigma^2 &= E\left[(X - \mu)^2\right] \\
&= E(X^2) - E^2(X)
\end{aligned}$$

$$\operatorname{sd}(X) = \sigma = \sqrt{\operatorname{var}(X)}$$

These definitions are exactly the same for continuous random variables. The only difference is how we calculate them. For discrete random variables we used summations:

$$E(X) = \sum x P(X = x) \quad \text{and} \quad E(X^2) = \sum x^2 P(X = x)$$

whereas for continuous random variables we will use integration:

$$E(X) = \int x f(x) \, dx \quad \text{and} \quad E(X^2) = \int x^2 f(x) \, dx$$

Definition The variance of a random variable $X$ is:

$$\operatorname{var}(X) = \sigma^2 = E\left[(X - \mu)^2\right] = E(X^2) - E^2(X)$$

The standard deviation is given by:

$$\sigma = \sqrt{\operatorname{var}(X)}$$

The proof that $E\left[(X - \mu)^2\right] = E(X^2) - E^2(X)$ can be found in Appendix B.



Let’s find the standard deviation for a continuous random variable, $X$, which has probability density function:

$$f(x) = \frac{3}{x^4}, \quad x > 1$$

Now $\operatorname{var}(X) = E(X^2) - E^2(X)$, so we need to calculate $E(X)$ and $E(X^2)$:

$$E(X) = \int_1^\infty x f(x) \, dx = \int_1^\infty x \cdot \frac{3}{x^4} \, dx = \int_1^\infty 3x^{-3} \, dx = \left[-\tfrac{3}{2}x^{-2}\right]_1^\infty = 0 - \left(-\tfrac{3}{2}\right) = 1.5$$

$$E(X^2) = \int_1^\infty x^2 f(x) \, dx = \int_1^\infty x^2 \cdot \frac{3}{x^4} \, dx = \int_1^\infty 3x^{-2} \, dx = \left[-3x^{-1}\right]_1^\infty = 0 - (-3) = 3$$

$$\Rightarrow \operatorname{var}(X) = E(X^2) - E^2(X) = 3 - 1.5^2 = 0.75$$

$$\Rightarrow \operatorname{sd}(X) = \sqrt{0.75} = 0.86603$$

This value tells us how spread out we would expect the values to be when they occur.
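Integrals that run up to infinity need a little care on a computer. One common trick, used in the Python sketch below, is to substitute $x = 1/t$, which squashes the range $1 \le x < \infty$ into $0 < t \le 1$; the midpoint rule then avoids ever evaluating at $t = 0$. This is our own choice of numerical method and naming, not something from the course itself.

```python
from math import sqrt


def integrate_to_infinity(g, a, n=10_000):
    """Midpoint rule for the integral of g over [a, infinity), with a > 0.

    Substituting x = 1/t turns it into the integral of g(1/t) / t^2
    over 0 < t <= 1/a, which has finite limits.
    """
    h = (1 / a) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of each strip, so t is never exactly 0
        total += g(1 / t) / t ** 2
    return total * h


def pdf(x):
    return 3 / x ** 4  # density from the example, valid for x > 1


mean = integrate_to_infinity(lambda x: x * pdf(x), 1.0)
second_moment = integrate_to_infinity(lambda x: x ** 2 * pdf(x), 1.0)
variance = second_moment - mean ** 2
sd = sqrt(variance)

print(round(mean, 4), round(second_moment, 4), round(variance, 4), round(sd, 5))
```

The results agree with the hand calculation: a mean of 1.5, a second moment of 3 and a variance of 0.75.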

Question 9.12

Calculate the standard deviation of the random variable, $Y$, that has probability density function:

$$f(y) = 6y(1 - y), \quad 0 < y < 1$$



6.1 Variance of linear functions of a random variable

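In Chapter 3, we found that if the sample variance of a set of data was $s^2$, then when we multiplied each sample value by $a$ and added $b$ the new variance was $a^2 s^2$. In Chapter 7, we found that this rule worked for variances of discrete random variables:

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$$

We can show (see Appendix C) that this rule works for continuous random variables:

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$$

and $\text{standard deviation}(aX + b) = |a| \times \text{standard deviation}(X)$. (The modulus only matters when $a$ is negative, since a standard deviation can never be negative.)

Hence, if $\operatorname{var}(X) = 3$ then $\operatorname{var}(5X - 2) = 5^2 \operatorname{var}(X) = 25 \times 3 = 75$.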

The reasoning behind this result is that adding a constant to each of the values does not alter the spread of the distribution:

[Figure: two graphs of $f(x)$ against $x$, one for $X$ and one for $X + 1$; the second is the first moved one unit to the right, and the spread marked on each is the same]

Whereas multiplying the distribution by, say, 3 does multiply the spread by 3. Hence, since variance measures spread squared, we multiply it by $3^2$.
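Here is a quick numerical check of the rule, as a Python sketch with names of our own choosing. It assumes the illustrative density $f(x) = 2x$ on $0 \le x \le 1$, calculates the variance of $X$ and of $5X - 2$ from first principles, and compares the second with $5^2$ times the first.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pdf(x):
    return 2 * x  # illustrative density on 0 <= x <= 1


def variance(transform):
    """var[transform(X)] = E[transform(X)^2] - (E[transform(X)])^2."""
    first = simpson(lambda x: transform(x) * pdf(x), 0.0, 1.0)
    second = simpson(lambda x: transform(x) ** 2 * pdf(x), 0.0, 1.0)
    return second - first ** 2


var_x = variance(lambda x: x)
var_linear = variance(lambda x: 5 * x - 2)

print(var_linear, 25 * var_x)
```

The "minus 2" has no effect on the answer, which is the shift-does-not-change-spread idea from the graphs.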



Question 9.13

A random variable, Z , has a mean of 11 and a standard deviation of 4. Calculate:

(i) $\operatorname{var}(10 - 3Z)$ (ii) $\operatorname{sd}\left(\dfrac{5Z - 2}{6}\right)$ (iii) $E(Z^2)$



7 Skewness

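Recall that the skewness reflects the shape of the distribution (or more accurately it is a measure of how asymmetrical the distribution is). The three types were positively skewed, symmetrical and negatively skewed.

In Chapter 3, we measured the skewness of a sample using:

$$\frac{1}{n}\sum (x_i - \bar{x})^3$$

The cube ensures that we get a negative answer for a negatively skewed sample, a positive answer for a positively skewed sample and zero for a symmetrical sample. In Chapter 7, we measured the skewness of a discrete random variable, $X$, using:

$$\begin{aligned}
\text{Skew}(X) = \mu_3 &= E\left[(X - \mu)^3\right] \\
&= E(X^3) - 3\mu E(X^2) + 2\mu^3
\end{aligned}$$

where:

$$\begin{aligned}
\text{Skew}(X) > 0 &\Rightarrow X \text{ is positively skew} \\
\text{Skew}(X) = 0 &\Rightarrow X \text{ is symmetrical} \\
\text{Skew}(X) < 0 &\Rightarrow X \text{ is negatively skew}
\end{aligned}$$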

This definition is exactly the same for continuous random variables. The only difference is how we calculate it. For a discrete random variable we used summations. For example:

$$\text{Skew}(X) = E\left[(X - \mu)^3\right] = \sum (x - \mu)^3 P(X = x)$$

Whereas for a continuous random variable we will use integrals. For example:

$$\text{Skew}(X) = E\left[(X - \mu)^3\right] = \int (x - \mu)^3 f(x) \, dx$$



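Definition The skewness of a random variable $X$ is given by:

$$\begin{aligned}
\text{Skew}(X) = \mu_3 &= E\left[(X - \mu)^3\right] \\
&= E(X^3) - 3\mu E(X^2) + 2\mu^3
\end{aligned}$$

where:

$$\begin{aligned}
\text{Skew}(X) > 0 &\Rightarrow X \text{ is positively skew} \\
\text{Skew}(X) = 0 &\Rightarrow X \text{ is symmetrical} \\
\text{Skew}(X) < 0 &\Rightarrow X \text{ is negatively skew}
\end{aligned}$$

The proof that $E\left[(X - \mu)^3\right] = E(X^3) - 3\mu E(X^2) + 2\mu^3$ can be found in Appendix D.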

So how do we use this formula? Let’s calculate the skewness for a continuous random variable, X , with probability density function:

$$f(x) = \tfrac{3}{125}x^2, \quad 0 \le x \le 5$$

A look at the graph of the PDF shows that X is negatively skew (as all of the values are to the left of the maximum):

[Figure: graph of $f(x)$ against $x$ for $0 \le x \le 5$]



We first need to calculate the mean:

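$$\begin{aligned}
\mu = E(X) = \int_0^5 x f(x) \, dx &= \int_0^5 x \times \tfrac{3}{125}x^2 \, dx = \int_0^5 \tfrac{3}{125}x^3 \, dx \\
&= \left[\tfrac{3}{125} \times \tfrac{1}{4}x^4\right]_0^5 = \tfrac{3}{125}(156.25 - 0) = 3.75
\end{aligned}$$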

To use the first version of the formula, we would have to integrate:

$$\text{Skew}(X) = E\left[(X - \mu)^3\right] = \int (x - \mu)^3 f(x) \, dx = \int_0^5 (x - 3.75)^3 \, \tfrac{3}{125}x^2 \, dx$$

Yuk! The second version of the formula will be much more straightforward:

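$$E(X^2) = \int_0^5 x^2 f(x) \, dx = \int_0^5 \tfrac{3}{125}x^4 \, dx = \left[\tfrac{3}{125} \times \tfrac{1}{5}x^5\right]_0^5 = \tfrac{3}{125}(625 - 0) = 15$$

$$E(X^3) = \int_0^5 x^3 f(x) \, dx = \int_0^5 \tfrac{3}{125}x^5 \, dx = \left[\tfrac{3}{125} \times \tfrac{1}{6}x^6\right]_0^5 = \tfrac{3}{125}\left(\tfrac{15{,}625}{6} - 0\right) = 62.5$$

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= 62.5 - (3 \times 3.75 \times 15) + (2 \times 3.75^3) \\
&= -0.78125
\end{aligned}$$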

We have a negative value, so X is negatively skew.
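The "Yuk!" integral is no trouble for a computer, so we can use it to confirm that both versions of the formula agree. The Python sketch below is a rough check only: it uses Simpson's rule (so the answers are approximate) and helper names of our own.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pdf(x):
    return 3 * x ** 2 / 125  # density from the example, valid for 0 <= x <= 5


def expectation(g):
    return simpson(lambda x: g(x) * pdf(x), 0.0, 5.0)


mu = expectation(lambda x: x)

# First version: integrate (x - mu)^3 f(x) directly.
skew_central = expectation(lambda x: (x - mu) ** 3)

# Second version: E(X^3) - 3 mu E(X^2) + 2 mu^3.
skew_expanded = (expectation(lambda x: x ** 3)
                 - 3 * mu * expectation(lambda x: x ** 2)
                 + 2 * mu ** 3)

print(round(mu, 4), round(skew_central, 5), round(skew_expanded, 5))
```

Both versions come out at about $-0.78125$, matching the hand calculation.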

Question 9.14

The random variable, X , has this symmetrical probability density function:

$$f(x) = \tfrac{3}{4}(1 - x^2), \quad -1 \le x \le 1$$

Show that $\text{Skew}(X) = E(X^3) - E(3\mu X^2) + E(3\mu^2 X) - E(\mu^3)$.

Whilst either definition can be used, $E(X^3) - 3\mu E(X^2) + 2\mu^3$ is usually less messy.



7.1 Coefficient of skewness

Skewness is measured in cubic units. For example, if a random variable $X$ is used to model the claim amount (in £), the skewness would be measured in £³. In Chapter 7, we got round this problem by standardising the skewness, that is, getting rid of the units so we just have a number (a coefficient). We did this by dividing by the standard deviation cubed, $\sigma^3$. Once again, we use the same definition for continuous random variables. The only difference is that we will use integrals to calculate the standard deviation.

Definition The coefficient of skewness of a random variable $X$ is given by:

$$\frac{\text{Skew}(X)}{\sigma^3}$$

The reasoning behind why dividing by the standard deviation ‘standardises’ a measure is explored more fully in Chapters 11 and 12.

Question 9.15

A random variable, X , has probability density function:

$$f(x) = 1 - \tfrac{1}{2}x, \quad 0 \le x \le 2$$

Show that the coefficient of skewness of $X$ is 0.566.

The coefficient of skewness can take any value depending on the shape of the distribution. For example, the exponential distribution (which we will meet in Chapter 10) has a coefficient of skewness of 2.
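The figure of 2 for the exponential distribution can be checked with a few lines of Python. The sketch below is not taken from the course: it assumes the standard result that an exponential distribution with rate $\lambda$ has moments $E(X^k) = k!/\lambda^k$, and the rate of 0.01 is just an illustrative value.

```python
from math import factorial


def exponential_moment(k, rate):
    """k-th moment of an exponential distribution, E(X^k) = k! / rate^k.

    This is a standard result quoted as an assumption here; it is not derived in this chapter.
    """
    return factorial(k) / rate ** k


rate = 0.01  # illustrative value only

mu = exponential_moment(1, rate)
ex2 = exponential_moment(2, rate)
ex3 = exponential_moment(3, rate)

skew = ex3 - 3 * mu * ex2 + 2 * mu ** 3  # second version of the skewness formula
variance = ex2 - mu ** 2
coefficient = skew / variance ** 1.5     # divide by the standard deviation cubed

print(coefficient)
```

Changing `rate` changes the skewness itself but not the coefficient, which is the whole point of standardising.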



8 Population moments

In Chapter 7, we defined moments, central moments and moments about c as follows:

$$\begin{aligned}
k\text{th moment} &= E\left[X^k\right] \\
k\text{th central moment} &= E\left[(X - \mu)^k\right] \\
k\text{th moment about } c &= E\left[(X - c)^k\right]
\end{aligned}$$

The definitions are exactly the same for continuous random variables. The only difference is how we calculate them. For discrete random variables we used summations. For example:

$$E\left[(X - c)^k\right] = \sum (x - c)^k P(X = x)$$

Whereas for continuous random variables we use integrals. For example:

$$E\left[(X - c)^k\right] = \int (x - c)^k f(x) \, dx$$

Question 9.16

The random variable, V, has the following probability density function:

$$f(v) = \tfrac{1}{2}v, \quad 0 \le v \le 2$$

(i) Calculate the fourth moment of $V$.
(ii) Write down integral expressions for the:
(a) third order central moment of $V$
(b) second order moment of $V$ about 1.

Just as for discrete random variables: the first order moment, $E(X)$, is the mean of the random variable, the second order central moment, $E\left[(X - \mu)^2\right]$, is $\operatorname{var}(X)$ and the third order central moment, $E\left[(X - \mu)^3\right]$, is $\text{Skew}(X)$.
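Because every moment has the same shape, $E[(X - c)^k]$, a single function covers the mean, variance and skewness. The Python sketch below illustrates this using the density $f(x) = \tfrac{3}{125}x^2$ on $0 \le x \le 5$ from the skewness section; the function names are our own and the integration is approximate (Simpson's rule).

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def pdf(x):
    return 3 * x ** 2 / 125  # density from the skewness example, 0 <= x <= 5


def moment(k, about=0.0):
    """k-th moment of X about the point `about`: E[(X - about)^k]."""
    return simpson(lambda x: (x - about) ** k * pdf(x), 0.0, 5.0)


mean = moment(1)                  # first order moment
variance = moment(2, about=mean)  # second order central moment
skewness = moment(3, about=mean)  # third order central moment

print(round(mean, 4), round(variance, 4), round(skewness, 5))
```

Setting `about` to any other number gives a moment about that point, as in part (ii)(b) of the question above.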



9 Miscellaneous questions

We have now covered all the types of questions you can be asked on generic continuous random variables. However, exam questions tend to:

lump several areas of this chapter together

use more messy integration (eg integration by parts)

This next question gives practice of these important skills:

Question 9.17

Claims on a particular type of insurance policy (in £) during 2003 are modelled by a continuous random variable, $X$, with probability density function:

$$f(x) = ke^{-0.01x}, \quad x > 0$$

(i) Calculate the value of $k$.
(ii) Show that for 2003, the mean claim amount is £100 and the standard deviation of the claim amounts is also £100.
(iii) For claims during 2004, the insurance company anticipates that claims are going to rise by 12%. What will be the mean and standard deviation of the claim amounts during 2004?



10 Appendix A – proof of $E(aX + b) = aE(X) + b$

Using the fact that the expectation of any function of $X$, say $g(X)$, is given by:

$$E[g(X)] = \int g(x) f(x) \, dx$$

We have:

$$E(aX + b) = \int (ax + b) f(x) \, dx$$

Expanding the brackets gives:

$$E(aX + b) = \int \left\{ax f(x) + b f(x)\right\} dx$$

We can split up the integral:

$$E(aX + b) = \int ax f(x) \, dx + \int b f(x) \, dx$$

Since $a$ and $b$ are constants we can take them out of the integrals:

$$E(aX + b) = a\int x f(x) \, dx + b\int f(x) \, dx$$

But we know that the sum of all probabilities, $\int f(x) \, dx$, is 1:

$$E(aX + b) = a\int x f(x) \, dx + b$$

Recalling the definition of the mean, $E(X) = \int x f(x) \, dx$, we get:

$$E(aX + b) = aE(X) + b$$

By replacing all the $x$’s in the proof with $g(x)$’s, this result can be generalised to show that:

$$E[ag(X) + b] = aE[g(X)] + b$$



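11 Appendix B – proof that $\operatorname{var}(X) = E(X^2) - E^2(X)$

This proof is exactly the same as the proof in Chapter 7.

We can obtain this result by rearranging $\operatorname{var}(X) = E\left[(X - \mu)^2\right]$. Expanding the bracket:

$$\operatorname{var}(X) = E\left[X^2 - 2\mu X + \mu^2\right]$$

Splitting up the expectation:

$$\operatorname{var}(X) = E(X^2) - E(2\mu X) + E(\mu^2)$$

Since $\mu$ is a constant:

$$\begin{aligned}
E(2\mu X) &= 2\mu E(X) \\
E(\mu^2) &= \mu^2
\end{aligned}$$

Hence:

$$\operatorname{var}(X) = E(X^2) - 2\mu E(X) + \mu^2$$

Since $E(X) = \mu$:

$$\begin{aligned}
\operatorname{var}(X) &= E(X^2) - 2\mu^2 + \mu^2 \\
&= E(X^2) - \mu^2 \\
&= E(X^2) - E^2(X)
\end{aligned}$$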



12 Appendix C – proof of $\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$

This proof is exactly the same as the proof in Chapter 7. This result can be proved by considering either definition of the variance. However, we will prove it using the simpler form of:

$$\operatorname{var}(X) = E(X^2) - E^2(X)$$

Replacing $X$ with $aX + b$ in our variance formula gives:

$$\operatorname{var}(aX + b) = E\left[(aX + b)^2\right] - \left[E(aX + b)\right]^2$$

Expanding the brackets in the first term gives:

$$\operatorname{var}(aX + b) = E\left[a^2 X^2 + 2abX + b^2\right] - \left[E(aX + b)\right]^2$$

Splitting up the first expectation:

$$\operatorname{var}(aX + b) = E(a^2 X^2) + E(2abX) + E(b^2) - \left[E(aX + b)\right]^2$$

Using $E(aX + b) = aE(X) + b$ gives:

$$\operatorname{var}(aX + b) = a^2 E(X^2) + 2abE(X) + b^2 - \left[aE(X) + b\right]^2$$

Expanding the brackets in the last term and simplifying:

$$\begin{aligned}
\operatorname{var}(aX + b) &= a^2 E(X^2) + 2abE(X) + b^2 - \left[a^2 E^2(X) + 2abE(X) + b^2\right] \\
&= a^2 E(X^2) - a^2 E^2(X) \\
&= a^2\left\{E(X^2) - E^2(X)\right\} \\
&= a^2 \operatorname{var}(X)
\end{aligned}$$



13 Appendix D – proof that $\text{Skew}(X) = E(X^3) - 3\mu E(X^2) + 2\mu^3$

This is exactly the same as the proof in Chapter 7. We have:

$$\text{Skew}(X) = E\left[(X - \mu)^3\right]$$

Expanding the bracket:

$$\begin{aligned}
\text{Skew}(X) &= E\left[(X - \mu)(X - \mu)(X - \mu)\right] \\
&= E\left[(X - \mu)(X^2 - 2\mu X + \mu^2)\right] \\
&= E\left[X^3 - 3\mu X^2 + 3\mu^2 X - \mu^3\right]
\end{aligned}$$

Splitting up the expectation:

$$\text{Skew}(X) = E(X^3) - E(3\mu X^2) + E(3\mu^2 X) - E(\mu^3)$$

Since $\mu$ is a constant:

$$\begin{aligned}
E(3\mu X^2) &= 3\mu E(X^2) \\
E(3\mu^2 X) &= 3\mu^2 E(X) \\
E(\mu^3) &= \mu^3
\end{aligned}$$

Hence:

$$\text{Skew}(X) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3$$

Since $E(X) = \mu$:

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 3\mu^3 - \mu^3 \\
&= E(X^3) - 3\mu E(X^2) + 2\mu^3
\end{aligned}$$



Extra practice questions

Section 2: Probability functions

P9.1 Subject C1, September 1994, Q10

Consider the family of distributions with probability density functions:

$$f(x) = \tfrac{1}{2}\left[1 + kx(1 - x^2)\right], \quad -1 < x < 1$$

where $-1 < k < 1$.

(i) Verify that the area under the density curve is 1. [2]
(ii) Find the mean of the distributions in terms of $k$. [2]
[Total 4]

P9.2 Subject C1, September 1998, Q9

The random variable $X$ has probability density function:

$$f(x) = \begin{cases} k(1 - x^2) & : -1 \le x \le 1 \\ 0 & : \text{otherwise} \end{cases}$$

Evaluate the constant $k$ and hence calculate $\operatorname{var}(X)$. [4]

P9.3 Subject 101, September 2001, Q7

The probability density function of a random variable $X$ is given by:

$$f(x) = \begin{cases} kx(1 - ax^2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

where $k$ and $a$ are positive constants.

(i) Show that $a \le 1$, and determine the value of $k$ in terms of $a$. [3]
(ii) For the case $a = 1$, determine the mean of $X$. [2]
[Total 5]



P9.4 Subject 101, April 2000, Q3

In an investigation into the proportion ($\theta$) of lapses in the first year of a certain type of policy, the uncertainty about $\theta$ is modelled by taking $\theta$ to have a beta distribution with parameters $\alpha = 1$ and $\beta = 9$, that is, with density:

$$f(\theta) = 9(1 - \theta)^8 : \quad 0 < \theta < 1$$

Using this distribution, calculate the probability that $\theta$ exceeds 0.2. [2]

Section 3: Cumulative distribution functions

P9.5 Subject 101, September 2002, Q3

A random variable $X$, which can be used in certain circumstances as a model for claim sizes, has cumulative distribution function:

$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - \left(\dfrac{2}{2 + x}\right)^3, & x > 0 \end{cases}$$

Calculate the value of the conditional probability $P(X > 3 \mid X > 1)$. [3]

Sections 4, 5 and 6: Measures of location and spread

P9.6 A continuous random variable, $Y$, has probability density function:

$$f(y) = \tfrac{1}{4}(4 - y), \quad 1 \le y \le 3$$

Calculate the mean, median, mode and variance of $Y$.



P9.7 Subject C1, April 1995, Q1 (adapted)

A plumber has a call-out charge of £20 and in addition he charges £30 per hour for all his jobs. In a particular week the mean and standard deviation of the lengths of his jobs are 3.5 hours and 0.5 hours respectively. What are the mean and standard deviation of the invoice values for this particular week? [2]

Section 7: Skewness

P9.8 Calculate the coefficient of skewness for the random variable $W$, which has the following probability density function:

$$f(w) = \tfrac{2}{3}w, \quad 1 \le w \le 2$$



Chapter 9 Summary

Continuous random variables

A random variable uses probabilities to decide its numerical value. Continuous random variables can take any value within a given range, eg $-1 \le x \le 1$.

$f(x)$ is a probability density function (PDF) of $X$ if:

$$f(x) \ge 0 \quad \text{and} \quad \int f(x) \, dx = 1$$

We integrate $f(x)$ to find probabilities:

$$P(a < X < b) = \int_a^b f(x) \, dx$$

The cumulative distribution function (CDF) of $X$ is:

$$F_X(x) = P(X \le x)$$

Measures of location

The mean (or expectation) of a population modelled by a random variable $X$ is:

$$\mu = E(X) = \int x f(x) \, dx$$

The median of $X$ is the value(s) that correspond to a cumulative probability of 0.5:

$$F(M) = \int_{\text{start}}^{M} f(x) \, dx = 0.5$$

The mode of $X$ is the value that maximises $f(x)$:

$$f'(x) = 0 \quad \text{and} \quad f''(x) < 0$$

Variance and standard deviation

The variance of a population modelled by a random variable $X$ is:

$$\operatorname{var}(X) = \sigma^2 = E\left[(X - \mu)^2\right] = E(X^2) - E^2(X)$$

It measures the spread squared of the distribution.

The standard deviation of a population modelled by a random variable $X$ is:

$$\sigma = \sqrt{\operatorname{var}(X)}$$

It measures the spread of the distribution.

Linear functions of a random variable

We can simplify means and variances of linear functions using:

$$\begin{aligned}
E(aX + b) &= aE(X) + b \\
\operatorname{var}(aX + b) &= a^2 \operatorname{var}(X)
\end{aligned}$$

We can also split up expectations as follows:

$$E(3X^2 - 5X + 1) = 3E(X^2) - 5E(X) + 1$$

Skewness

The skewness of a population modelled by a random variable $X$ is:

$$\mu_3 = \text{Skew}(X) = E\left[(X - \mu)^3\right] = E(X^3) - 3\mu E(X^2) + 2\mu^3$$

The coefficient of skewness is the best way to compare the shape of distributions:

$$\text{coefficient of skewness} = \frac{\text{Skew}(X)}{\sigma^3}$$




Solution 9.1

We have:

$$f(x) = \tfrac{3}{8}x^2, \quad 0 \le x \le 2$$

To show that this is a probability density function, first we need to see if $f(x)$ is non-negative. The easiest way to check this is to look at the graph:

[Figure: graph of $f(x) = \tfrac{3}{8}x^2$ for $0 \le x \le 2$]

No problems there. Next we need to check that the total area under the graph (ie the total probability) is 1:

$$\int_0^2 \tfrac{3}{8}x^2 \, dx = \left[\tfrac{1}{8}x^3\right]_0^2 = \tfrac{1}{8}(2^3 - 0) = 1$$

So the function is a probability density function.



Solution 9.2

Using the fact that the total area under the graph (ie the total probability) is 1, we get:

$$\int_0^1 c(1 - x) \, dx = c\left[x - \tfrac{1}{2}x^2\right]_0^1 = c\left\{\left(1 - \tfrac{1}{2}\right) - 0\right\} = \tfrac{1}{2}c = 1$$

$$\Rightarrow c = 2$$

Solution 9.3

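(i) To find $P(0.3 \le W \le 0.9)$, we just integrate between 0.3 and 0.9:

$$\begin{aligned}
P(0.3 \le W \le 0.9) &= \int_{0.3}^{0.9} \tfrac{3}{8}(1 + w^2) \, dw \\
&= \tfrac{3}{8}\left[w + \tfrac{1}{3}w^3\right]_{0.3}^{0.9} \\
&= \tfrac{3}{8}\left\{(0.9 + 0.243) - (0.3 + 0.009)\right\} = 0.31275
\end{aligned}$$

(ii) To find $P(W \le 0.5)$ we integrate from 0.5 down to the start of the function:

$$\begin{aligned}
P(W \le 0.5) &= \int_{-1}^{0.5} \tfrac{3}{8}(1 + w^2) \, dw \\
&= \tfrac{3}{8}\left[w + \tfrac{1}{3}w^3\right]_{-1}^{0.5} \\
&= \tfrac{3}{8}\left\{\left(0.5 + \tfrac{0.125}{3}\right) - \left(-1 - \tfrac{1}{3}\right)\right\} = 0.703125
\end{aligned}$$

(iii) To find $P(W \ge -0.8)$ we integrate from $-0.8$ up to the end of the function:

$$\begin{aligned}
P(W \ge -0.8) &= \int_{-0.8}^{1} \tfrac{3}{8}(1 + w^2) \, dw \\
&= \tfrac{3}{8}\left[w + \tfrac{1}{3}w^3\right]_{-0.8}^{1} \\
&= \tfrac{3}{8}\left\{\left(1 + \tfrac{1}{3}\right) - \left(-0.8 - \tfrac{0.512}{3}\right)\right\} = 0.864
\end{aligned}$$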



Solution 9.4

Integrating from the start up to $x$, we have:

$$F(x) = P(X \le x) = \int_1^x \tfrac{3}{7}t^2 \, dt = \left[\tfrac{1}{7}t^3\right]_1^x = \tfrac{1}{7}(x^3 - 1)$$

Again, note how we used $t$ in the integral so we don’t have $x$ as the variable and as one of the limits. Since, technically, our function is:

$$f(x) = \begin{cases} \tfrac{3}{7}x^2 & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

we should give the cumulative distribution function as:

$$F(x) = \begin{cases} 0 & x < 1 \\ \tfrac{1}{7}(x^3 - 1) & 1 \le x \le 2 \\ 1 & x > 2 \end{cases}$$

In practice in the Subject CT3 exam, the first version would be fine.



Solution 9.5

Solving this question by first deriving the probability density function, we get:

$$f(v) = F'(v) = 0.5e^{-0.5v}$$

(i) To calculate $P(V > 2)$ we integrate from 2 upwards. Up to what? Well the function has no upper limit, so we integrate to infinity:

$$\begin{aligned}
P(V > 2) &= \int_2^\infty 0.5e^{-0.5v} \, dv \\
&= \left[-e^{-0.5v}\right]_2^\infty \\
&= -e^{-\infty} - (-e^{-1}) = 0 + e^{-1} = 0.36788
\end{aligned}$$

(ii) To calculate $P(V < 1)$ we just integrate from 1 down to the start of the function (ie 0):

$$\begin{aligned}
P(V < 1) &= \int_0^1 0.5e^{-0.5v} \, dv \\
&= \left[-e^{-0.5v}\right]_0^1 \\
&= -e^{-0.5} - (-e^{0}) \\
&= 1 - e^{-0.5} = 0.39347
\end{aligned}$$

(iii) Remember that the probability that a continuous random variable is exactly equal to any number is zero. So:

$$P(V = 0.8) = 0$$

Alternatively, we could have calculated the probabilities in (i) and (ii) using $F(v)$ directly:

(i) $P(V > 2) = 1 - P(V < 2) = 1 - F(2) = 1 - \left(1 - e^{-0.5 \times 2}\right) = e^{-1} = 0.36788$

(ii) $P(V \le 1) = F(1) = 1 - e^{-0.5 \times 1} = 1 - e^{-0.5} = 0.39347$
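The CDF route is also the easiest one to check on a computer. This short Python sketch (with names of our own) evaluates $F(v) = 1 - e^{-0.5v}$ and reads off all three answers.

```python
from math import exp


def cdf(v):
    """F(v) = 1 - e^(-0.5 v) for v >= 0, and 0 below the start of the range."""
    return 1 - exp(-0.5 * v) if v >= 0 else 0.0


p_i = 1 - cdf(2)  # P(V > 2) = 1 - F(2)
p_ii = cdf(1)     # P(V < 1) = F(1), since a single point carries no probability
p_iii = 0.0       # P(V = 0.8): always zero for a continuous random variable

print(round(p_i, 5), round(p_ii, 5), p_iii)
```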



Solution 9.6

We have:

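$$f(x) = \tfrac{1}{18}(x^2 - 2x), \quad 2 \le x \le 5$$

Substituting this into the formula for the mean gives:

$$\begin{aligned}
\mu = E(X) = \int_2^5 x f(x) \, dx &= \int_2^5 x \times \tfrac{1}{18}(x^2 - 2x) \, dx = \tfrac{1}{18}\int_2^5 (x^3 - 2x^2) \, dx \\
&= \tfrac{1}{18}\left[\tfrac{1}{4}x^4 - \tfrac{2}{3}x^3\right]_2^5 \\
&= \tfrac{1}{18}\left\{\left(\tfrac{625}{4} - \tfrac{250}{3}\right) - \left(4 - \tfrac{16}{3}\right)\right\} = 4.125
\end{aligned}$$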

Solution 9.7

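Using $P(X < M) = \tfrac{1}{2}$ we get:

$$P(X < M) = \int_2^M \tfrac{2}{9}(x - 2) \, dx = \tfrac{1}{2}$$

Integrating, we get:

$$\begin{aligned}
\tfrac{2}{9}\left[\tfrac{1}{2}x^2 - 2x\right]_2^M &= \tfrac{1}{2} \\
\tfrac{2}{9}\left\{\left(\tfrac{1}{2}M^2 - 2M\right) - (-2)\right\} &= \tfrac{1}{2} \\
\tfrac{1}{9}M^2 - \tfrac{4}{9}M - \tfrac{1}{18} &= 0 \\
2M^2 - 8M - 1 &= 0
\end{aligned}$$

Using the quadratic formula:

$$M = \frac{8 \pm \sqrt{(-8)^2 - 4 \times 2 \times (-1)}}{2 \times 2} = \frac{8 \pm \sqrt{72}}{4} = 4.12 \text{ or } -0.121$$

Since $f(x)$ is only defined for the range $2 \le x \le 5$, the median must be 4.12.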
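The last two steps (solve the quadratic, then throw away the root outside the range) can be written as a short Python sketch. The function name is our own.

```python
from math import sqrt


def quadratic_roots(a, b, c):
    """Real roots of a x^2 + b x + c = 0 from the quadratic formula."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        raise ValueError("no real roots")
    root = sqrt(discriminant)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


roots = quadratic_roots(2, -8, -1)  # 2M^2 - 8M - 1 = 0

# Keep only the root that lies inside the range of the PDF, 2 <= x <= 5.
median = next(r for r in roots if 2 <= r <= 5)

print([round(r, 3) for r in roots], round(median, 2))
```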



Solution 9.8

We have:

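$$f(x) = 12x^2(1 - x) = 12x^2 - 12x^3$$

Differentiating and setting this equal to zero gives:

$$\begin{aligned}
f'(x) &= 24x - 36x^2 = 0 \\
&\Rightarrow 12x(2 - 3x) = 0 \Rightarrow x = 0 \text{ or } \tfrac{2}{3}
\end{aligned}$$

Checking to see which of these values gives the maximum:

$$f''(x) = 24 - 72x$$

$$\begin{aligned}
&\Rightarrow f''(0) = 24 > 0 \Rightarrow \text{min} \\
&\phantom{\Rightarrow{}} f''\left(\tfrac{2}{3}\right) = -24 < 0 \Rightarrow \text{max}
\end{aligned}$$

So the mode is $x = \tfrac{2}{3}$.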



Solution 9.9

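(i) $$\begin{aligned}
E(Y^2) = \int_1^\infty y^2 f(y) \, dy &= \int_1^\infty y^2 \cdot \frac{3}{y^4} \, dy = \int_1^\infty 3y^{-2} \, dy \\
&= \left[-3y^{-1}\right]_1^\infty \\
&= 0 - (-3) \\
&= 3
\end{aligned}$$

(ii) $$\begin{aligned}
E(5Y - 2) = \int_1^\infty (5y - 2) f(y) \, dy &= \int_1^\infty (5y - 2) \frac{3}{y^4} \, dy = \int_1^\infty \left(15y^{-3} - 6y^{-4}\right) dy \\
&= \left[-7.5y^{-2} + 2y^{-3}\right]_1^\infty \\
&= 0 - (-7.5 + 2) \\
&= 5.5
\end{aligned}$$

(iii) $$\begin{aligned}
E\left(\frac{1}{Y}\right) = \int_1^\infty \frac{1}{y} f(y) \, dy &= \int_1^\infty \frac{1}{y} \times \frac{3}{y^4} \, dy = \int_1^\infty 3y^{-5} \, dy \\
&= \left[-0.75y^{-4}\right]_1^\infty \\
&= 0 - (-0.75) \\
&= 0.75
\end{aligned}$$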



Solution 9.10

We are told that $E(Z) = 4$.

Using our rule we get:

(i) $$\begin{aligned}
E(6 - 2Z) &= 6 - 2E(Z) \\
&= 6 - (2 \times 4) \\
&= -2
\end{aligned}$$

(ii) Since:

$$\frac{5Z + 1}{6} = \frac{5}{6}Z + \frac{1}{6}$$

We get:

$$\begin{aligned}
E\left(\frac{5Z + 1}{6}\right) &= E\left(\frac{5}{6}Z + \frac{1}{6}\right) \\
&= \frac{5}{6}E(Z) + \frac{1}{6} \\
&= \left(\frac{5}{6} \times 4\right) + \frac{1}{6} \\
&= 3.5
\end{aligned}$$

(iii) $E(5) = 5$

This follows from the rule $E(aX + b) = aE(X) + b$ when $a = 0$:

$$E(b) = b$$

That is to say the expectation of a constant is just the constant.

Since the constant doesn’t change with what the random variable is, we would expect it to be unchanged. Alternatively, we could see that it is equal to:

$$E(b) = \int b f(x) \, dx = b\int f(x) \, dx = b \times 1 = b$$



Solution 9.11

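(i) $E(1 + 8X - 3X^2) = 1 + 8E(X) - 3E(X^2)$

(ii) Multiplying out the brackets first:

$$\begin{aligned}
E\left[(X - 3)^2\right] &= E\left[X^2 - 6X + 9\right] \\
&= E(X^2) - 6E(X) + 9
\end{aligned}$$

(iii) $$\begin{aligned}
E\left(\frac{5}{X}\right) &= E\left(5 \times \frac{1}{X}\right) \\
&= 5E\left(\frac{1}{X}\right)
\end{aligned}$$

We cannot simplify $E\left(\frac{1}{X}\right)$ to $\frac{1}{E(X)}$! We would have to work it out from first principles:

$$E\left(\frac{1}{X}\right) = \int \frac{1}{x} f(x) \, dx$$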



Solution 9.12

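We have:

$$f(y) = 6y(1 - y), \quad 0 < y < 1$$

To calculate the variance of $Y$ we require $E(Y)$ and $E(Y^2)$:

$$\begin{aligned}
E(Y) = \int_0^1 y f(y) \, dy &= \int_0^1 y \times 6y(1 - y) \, dy = \int_0^1 (6y^2 - 6y^3) \, dy \\
&= \left[2y^3 - \tfrac{3}{2}y^4\right]_0^1 \\
&= \left(2 - \tfrac{3}{2}\right) - 0 \\
&= 0.5
\end{aligned}$$

$$\begin{aligned}
E(Y^2) = \int_0^1 y^2 f(y) \, dy &= \int_0^1 y^2 \times 6y(1 - y) \, dy = \int_0^1 (6y^3 - 6y^4) \, dy \\
&= \left[\tfrac{3}{2}y^4 - \tfrac{6}{5}y^5\right]_0^1 \\
&= \left(\tfrac{3}{2} - \tfrac{6}{5}\right) - 0 \\
&= 0.3
\end{aligned}$$

Hence:

$$\begin{aligned}
\operatorname{var}(Y) &= E(Y^2) - E^2(Y) \\
&= 0.3 - 0.5^2 \\
&= 0.05
\end{aligned}$$

$$\Rightarrow \operatorname{sd}(Y) = \sqrt{0.05} = 0.22361$$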



Solution 9.13

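Since the standard deviation is 4, we have $\operatorname{var}(Z) = 4^2 = 16$.

(i) $$\begin{aligned}
\operatorname{var}(10 - 3Z) &= (-3)^2 \operatorname{var}(Z) \\
&= 9 \operatorname{var}(Z) \\
&= 9 \times 16 \\
&= 144
\end{aligned}$$

(ii) Since:

$$\frac{5Z - 2}{6} = \frac{5}{6}Z - \frac{2}{6}$$

We get:

$$\begin{aligned}
\operatorname{sd}\left(\frac{5Z - 2}{6}\right) &= \operatorname{sd}\left(\frac{5}{6}Z - \frac{2}{6}\right) \\
&= \frac{5}{6}\operatorname{sd}(Z) \\
&= \frac{5}{6} \times 4 \\
&= 3\tfrac{1}{3}
\end{aligned}$$

(iii) Be careful! $E(Z^2) \ne \left[E(Z)\right]^2$. But by rearranging the variance formula we get:

$$\begin{aligned}
\operatorname{var}(Z) &= E(Z^2) - E^2(Z) \\
\Rightarrow E(Z^2) &= \operatorname{var}(Z) + E^2(Z)
\end{aligned}$$

So:

$$E(Z^2) = 16 + 11^2 = 137$$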



Solution 9.14

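We first need to calculate the mean:

$$\begin{aligned}
\mu = E(X) = \int_{-1}^{1} x f(x) \, dx &= \int_{-1}^{1} \tfrac{3}{4}x(1 - x^2) \, dx = \tfrac{3}{4}\int_{-1}^{1} (x - x^3) \, dx \\
&= \tfrac{3}{4}\left[\tfrac{1}{2}x^2 - \tfrac{1}{4}x^4\right]_{-1}^{1} = \tfrac{3}{4}\left\{\left(\tfrac{1}{2} - \tfrac{1}{4}\right) - \left(\tfrac{1}{2} - \tfrac{1}{4}\right)\right\} \\
&= 0
\end{aligned}$$

The first version of the formula is (bizarrely) OK to integrate:

$$\begin{aligned}
\text{Skew}(X) = E\left[(X - \mu)^3\right] &= \int (x - \mu)^3 f(x) \, dx = \int_{-1}^{1} (x - 0)^3 \, \tfrac{3}{4}(1 - x^2) \, dx \\
&= \tfrac{3}{4}\int_{-1}^{1} (x^3 - x^5) \, dx = \tfrac{3}{4}\left[\tfrac{1}{4}x^4 - \tfrac{1}{6}x^6\right]_{-1}^{1} \\
&= \tfrac{3}{4}\left\{\left(\tfrac{1}{4} - \tfrac{1}{6}\right) - \left(\tfrac{1}{4} - \tfrac{1}{6}\right)\right\} \\
&= 0
\end{aligned}$$

Alternatively, we could have used $\text{Skew}(X) = E(X^3) - 3\mu E(X^2) + 2\mu^3$:

$$\begin{aligned}
E(X^2) = \int_{-1}^{1} x^2 f(x) \, dx &= \int_{-1}^{1} \tfrac{3}{4}x^2(1 - x^2) \, dx = \tfrac{3}{4}\int_{-1}^{1} (x^2 - x^4) \, dx \\
&= \tfrac{3}{4}\left[\tfrac{1}{3}x^3 - \tfrac{1}{5}x^5\right]_{-1}^{1} = \tfrac{3}{4}\left\{\left(\tfrac{1}{3} - \tfrac{1}{5}\right) - \left(-\tfrac{1}{3} + \tfrac{1}{5}\right)\right\} = \tfrac{1}{5} = 0.2
\end{aligned}$$

$$\begin{aligned}
E(X^3) = \int_{-1}^{1} x^3 f(x) \, dx &= \int_{-1}^{1} \tfrac{3}{4}x^3(1 - x^2) \, dx = \tfrac{3}{4}\int_{-1}^{1} (x^3 - x^5) \, dx \\
&= \tfrac{3}{4}\left[\tfrac{1}{4}x^4 - \tfrac{1}{6}x^6\right]_{-1}^{1} = \tfrac{3}{4}\left\{\left(\tfrac{1}{4} - \tfrac{1}{6}\right) - \left(\tfrac{1}{4} - \tfrac{1}{6}\right)\right\} = 0
\end{aligned}$$

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= 0 - (3 \times 0 \times 0.2) + (2 \times 0^3) \\
&= 0
\end{aligned}$$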



Solution 9.15

We have:

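$$f(x) = 1 - \tfrac{1}{2}x, \quad 0 \le x \le 2$$

$$\mu = E(X) = \int_0^2 x f(x) \, dx = \int_0^2 \left(x - \tfrac{1}{2}x^2\right) dx = \left[\tfrac{1}{2}x^2 - \tfrac{1}{6}x^3\right]_0^2 = \left\{\left(2 - \tfrac{4}{3}\right) - 0\right\} = \tfrac{2}{3}$$

To use the first version of the formula, we would have to integrate:

$$\text{Skew}(X) = E\left[(X - \mu)^3\right] = \int (x - \mu)^3 f(x) \, dx = \int_0^2 \left(x - \tfrac{2}{3}\right)^3 \left(1 - \tfrac{1}{2}x\right) dx$$

Yuk! Using $\text{Skew}(X) = E(X^3) - 3\mu E(X^2) + 2\mu^3$ will be much easier:

$$E(X^2) = \int_0^2 x^2 f(x) \, dx = \int_0^2 \left(x^2 - \tfrac{1}{2}x^3\right) dx = \left[\tfrac{1}{3}x^3 - \tfrac{1}{8}x^4\right]_0^2 = \left\{\left(\tfrac{8}{3} - 2\right) - 0\right\} = \tfrac{2}{3}$$

$$E(X^3) = \int_0^2 x^3 f(x) \, dx = \int_0^2 \left(x^3 - \tfrac{1}{2}x^4\right) dx = \left[\tfrac{1}{4}x^4 - \tfrac{1}{10}x^5\right]_0^2 = \left\{\left(4 - 3\tfrac{1}{5}\right) - 0\right\} = \tfrac{4}{5}$$

$$\begin{aligned}
\text{Skew}(X) &= E(X^3) - 3\mu E(X^2) + 2\mu^3 \\
&= \tfrac{4}{5} - \left(3 \times \tfrac{2}{3} \times \tfrac{2}{3}\right) + \left(2 \times \left(\tfrac{2}{3}\right)^3\right) = 0.0593
\end{aligned}$$

To calculate the coefficient of skewness, we need the variance:

$$\operatorname{var}(X) = E(X^2) - E^2(X) = \tfrac{2}{3} - \left(\tfrac{2}{3}\right)^2 = \tfrac{2}{9} = 0.\dot{2}$$

$$\Rightarrow \text{coefficient of skewness} = \frac{0.0593}{0.\dot{2}^{\,1.5}} = 0.566$$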



Solution 9.16

We have:

$$f(v) = \tfrac{1}{2}v, \quad 0 \le v \le 2$$

(i) The fourth moment of $V$ is:

$$E(V^4) = \int_0^2 v^4 f(v) \, dv = \int_0^2 \tfrac{1}{2}v^5 \, dv = \left[\tfrac{1}{12}v^6\right]_0^2 = 5\tfrac{1}{3} - 0 = 5\tfrac{1}{3}$$

(ii) (a) The third central moment is given by:

$$E\left[(V - \mu)^3\right] = \int (v - \mu)^3 f(v) \, dv$$

We require the mean, $\mu$:

$$\mu = E(V) = \int_0^2 v f(v) \, dv = \int_0^2 \tfrac{1}{2}v^2 \, dv = \left[\tfrac{1}{6}v^3\right]_0^2 = \tfrac{4}{3} - 0 = \tfrac{4}{3}$$

Therefore, we get:

$$E\left[(V - \mu)^3\right] = \int_0^2 \left(v - \tfrac{4}{3}\right)^3 \tfrac{1}{2}v \, dv$$

(b) The second moment about 1 is given by:

$$E\left[(V - 1)^2\right] = \int (v - 1)^2 f(v) \, dv = \int_0^2 (v - 1)^2 \, \tfrac{1}{2}v \, dv$$



Solution 9.17

(i) Using the fact that the area under a PDF is 1, we get:

0.01 0.01 00.01 0.01 0.010

0

0 ( ) 1

0.01

x xk k kke dx e e

k

• •- -È ˘= - = - - = =Î ˚

fi =

Ú

(ii) Using the formula for the mean, we get:

0.01

0

( ) ( ) 0.01 xE X xf x dx x e dx•

-= =Ú Ú

We need to use integration by parts. The formula given on page 3 of the Tables is:

[ ]b b

ba

a a

dv duu dx uv v dx

dx dx= -Ú Ú

Setting u x= and 0.010.01 xdve

dx-= we get:

0.01 0.01 0.01

00 0

0.0110.01 0

010.01

10.01

0.01

0

0 ( )

£100

x x x

x

x e dx xe e dx

e

e

• ••- - -

•-

È ˘= - - -Î ˚

È ˘= + -Î ˚

= - -

=

=

Ú Ú

To get the variance, we need the second moment, $E(X^2)$:

\[ E(X^2) = \int x^2 f(x)\,dx = \int_0^\infty x^2\,0.01e^{-0.01x}\,dx \]

Using integration by parts again and setting $u = x^2$ and $\frac{dv}{dx} = 0.01e^{-0.01x}$ we get:

\[ \begin{aligned} \int_0^\infty x^2\,0.01e^{-0.01x}\,dx &= \left[-x^2e^{-0.01x}\right]_0^\infty - \int_0^\infty -2xe^{-0.01x}\,dx \\ &= 0 + \int_0^\infty 2xe^{-0.01x}\,dx = \int_0^\infty 2xe^{-0.01x}\,dx \end{aligned} \]

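We now need to use integration by parts to calculate this integral. Setting $u = 2x$ and $\frac{dv}{dx} = e^{-0.01x}$ we get:

\[ \begin{aligned} \int_0^\infty 2xe^{-0.01x}\,dx &= \left[-\tfrac{1}{0.01}2xe^{-0.01x}\right]_0^\infty - \int_0^\infty -\tfrac{2}{0.01}e^{-0.01x}\,dx \\ &= [0 - 0] + \left[-\tfrac{2}{0.01^2}e^{-0.01x}\right]_0^\infty \\ &= 0 - \left(-\tfrac{2}{0.01^2}e^0\right) \\ &= \tfrac{2}{0.01^2} = 20{,}000 \end{aligned} \]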

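Alternatively, we could use the integral already calculated for the mean:

\[ \int_0^\infty 0.01xe^{-0.01x}\,dx = 100 \]

\[ \Rightarrow \int_0^\infty xe^{-0.01x}\,dx = 10{,}000 \Rightarrow \int_0^\infty 2xe^{-0.01x}\,dx = 20{,}000 \]

Hence:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 = 20{,}000 - 100^2 = 10{,}000 \]

So the standard deviation is:

\[ \sqrt{10{,}000} = \text{£}100 \]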



(iii) Now, X is the random variable for the claim size during 2003. So $1.12X$ will be the random variable for the claim size during 2004.

\[ E(1.12X) = 1.12E(X) = 1.12 \times \text{£}100 = \text{£}112 \]

\[ \text{sd}(1.12X) = 1.12\,\text{sd}(X) = 1.12 \times \text{£}100 = \text{£}112 \]




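P9.1 (i) The area under the density curve is given by:

\[ \begin{aligned} \int f(x)\,dx &= \int_{-1}^{1} \tfrac{1}{2}\left[1 + kx(1-x^2)\right]dx \\ &= \tfrac{1}{2}\int_{-1}^{1} (1 + kx - kx^3)\,dx \\ &= \tfrac{1}{2}\left[x + \tfrac{1}{2}kx^2 - \tfrac{1}{4}kx^4\right]_{-1}^{1} \\ &= \tfrac{1}{2}\left\{(1 + \tfrac{1}{2}k - \tfrac{1}{4}k) - (-1 + \tfrac{1}{2}k - \tfrac{1}{4}k)\right\} \\ &= \tfrac{1}{2}\left\{(1 + \tfrac{1}{4}k) - (-1 + \tfrac{1}{4}k)\right\} \\ &= \tfrac{1}{2} \times 2 \\ &= 1 \end{aligned} \]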

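(ii) The mean is given by:

\[ \begin{aligned} E(X) &= \int x f(x)\,dx \\ &= \tfrac{1}{2}\int_{-1}^{1} (x + kx^2 - kx^4)\,dx \\ &= \tfrac{1}{2}\left[\tfrac{1}{2}x^2 + \tfrac{1}{3}kx^3 - \tfrac{1}{5}kx^5\right]_{-1}^{1} \\ &= \tfrac{1}{2}\left\{(\tfrac{1}{2} + \tfrac{1}{3}k - \tfrac{1}{5}k) - (\tfrac{1}{2} - \tfrac{1}{3}k + \tfrac{1}{5}k)\right\} \\ &= \tfrac{1}{2}\left\{(\tfrac{1}{2} + \tfrac{2}{15}k) - (\tfrac{1}{2} - \tfrac{2}{15}k)\right\} \\ &= \tfrac{1}{2} \times \tfrac{4}{15}k \\ &= \tfrac{2}{15}k \end{aligned} \]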



P9.2 Since $\int f(x)\,dx = 1$ we have:

\[ \int_{-1}^{1} k(1-x^2)\,dx = k\left[x - \tfrac{1}{3}x^3\right]_{-1}^{1} = k\left\{(1 - \tfrac{1}{3}) - (-1 + \tfrac{1}{3})\right\} = \tfrac{4}{3}k = 1 \]

\[ \Rightarrow k = \tfrac{3}{4} \]

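To calculate the variance we require $E(X)$ and $E(X^2)$:

\[ \begin{aligned} E(X) = \int x f(x)\,dx &= \tfrac{3}{4}\int_{-1}^{1} (x - x^3)\,dx \\ &= \tfrac{3}{4}\left[\tfrac{1}{2}x^2 - \tfrac{1}{4}x^4\right]_{-1}^{1} \\ &= \tfrac{3}{4}\left\{(\tfrac{1}{2} - \tfrac{1}{4}) - (\tfrac{1}{2} - \tfrac{1}{4})\right\} \\ &= 0 \end{aligned} \]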

Alternatively, a quick sketch of the PDF would show that it is symmetrical about zero. Hence the mean will be zero.

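\[ \begin{aligned} E(X^2) = \int x^2 f(x)\,dx &= \tfrac{3}{4}\int_{-1}^{1} (x^2 - x^4)\,dx \\ &= \tfrac{3}{4}\left[\tfrac{1}{3}x^3 - \tfrac{1}{5}x^5\right]_{-1}^{1} \\ &= \tfrac{3}{4}\left\{(\tfrac{1}{3} - \tfrac{1}{5}) - (-\tfrac{1}{3} + \tfrac{1}{5})\right\} \\ &= \tfrac{3}{4}\left(\tfrac{2}{15} - (-\tfrac{2}{15})\right) \\ &= \tfrac{1}{5} = 0.2 \end{aligned} \]

Hence:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 = 0.2 - 0^2 = 0.2 \]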



P9.3 (i) If $f(x)$ is a PDF then $f(x) \ge 0$ for all values of x.

\[ kx(1-ax^2) \ge 0 \qquad 0 \le x \le 1 \]

Since k is a positive constant and $x \ge 0$ this means that $kx \ge 0$ and so $(1-ax^2) \ge 0$.

\[ (1-ax^2) \ge 0 \;\Rightarrow\; 1 \ge ax^2 \;\Rightarrow\; ax^2 \le 1 \]

Since this must be true for all x in the range $0 \le x \le 1$, taking $x = 1$ gives $a \le \frac{1}{1^2} = 1$.

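To find k we use the fact that if $f(x)$ is a PDF then $\int f(x)\,dx = 1$:

\[ \begin{aligned} \int_0^1 kx(1-ax^2)\,dx = k\int_0^1 (x - ax^3)\,dx &= 1 \\ \Rightarrow k\left[\tfrac{1}{2}x^2 - \tfrac{1}{4}ax^4\right]_0^1 &= 1 \\ \Rightarrow k\left(\tfrac{1}{2} - \tfrac{1}{4}a\right) &= 1 \\ \Rightarrow k = \frac{1}{\tfrac{1}{2} - \tfrac{1}{4}a} &= \frac{4}{2-a} \end{aligned} \]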

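(ii) If $a = 1$ then $k = 4$.

\[ \begin{aligned} E(X) = \int_0^1 4x^2(1-x^2)\,dx &= \int_0^1 (4x^2 - 4x^4)\,dx \\ &= \left[\tfrac{4}{3}x^3 - \tfrac{4}{5}x^5\right]_0^1 \\ &= \tfrac{4}{3} - \tfrac{4}{5} \\ &= \tfrac{8}{15} = 0.5\dot{3} \end{aligned} \]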



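P9.4 To find probabilities for a continuous random variable, we use:

\[ P(\alpha < X < \beta) = \int_\alpha^\beta f(x)\,dx \]

So:

\[ \begin{aligned} P(\theta > 0.2) &= \int_{0.2}^{1} f(\theta)\,d\theta \\ &= \int_{0.2}^{1} 9(1-\theta)^8\,d\theta \\ &= \left[-(1-\theta)^9\right]_{0.2}^{1} \\ &= 0 - (-0.8^9) \\ &= 0.13422 \end{aligned} \]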

P9.5 Using the conditional probability formula:

\[ P(X > 3 \mid X > 1) = \frac{P(X > 3 \text{ and } X > 1)}{P(X > 1)} = \frac{P(X > 3)}{P(X > 1)} \]

Now $F(x) = P(X < x)$, so $P(X > x) = 1 - F(x)$. Hence:

\[ P(X > 3) = 1 - F(3) = 1 - \left[1 - \left(\tfrac{2}{2+3}\right)^3\right] = \left(\tfrac{2}{5}\right)^3 \]

\[ P(X > 1) = 1 - F(1) = 1 - \left[1 - \left(\tfrac{2}{2+1}\right)^3\right] = \left(\tfrac{2}{3}\right)^3 \]

So:

\[ P(X > 3 \mid X > 1) = \frac{(\tfrac{2}{5})^3}{(\tfrac{2}{3})^3} = 0.216 \]



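P9.6 We have:

\[ f(y) = \tfrac{1}{4}(4-y) \qquad 1 \le y \le 3 \]

The mean is given by:

\[ \begin{aligned} E(Y) = \int y f(y)\,dy &= \tfrac{1}{4}\int_1^3 (4y - y^2)\,dy \\ &= \tfrac{1}{4}\left[2y^2 - \tfrac{1}{3}y^3\right]_1^3 \\ &= \tfrac{1}{4}\left\{(18 - 9) - (2 - \tfrac{1}{3})\right\} \\ &= \tfrac{1}{4} \times 7\tfrac{1}{3} \\ &= 1\tfrac{5}{6} = 1.8\dot{3} \end{aligned} \]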

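The median, M, can be found using $P(Y < M) = \tfrac{1}{2}$:

\[ P(Y < M) = \tfrac{1}{4}\int_1^M (4 - y)\,dy = \tfrac{1}{2} \]

Integrating, we get:

\[ \begin{aligned} \tfrac{1}{4}\left[4y - \tfrac{1}{2}y^2\right]_1^M &= \tfrac{1}{2} \\ \tfrac{1}{4}\left\{(4M - \tfrac{1}{2}M^2) - 3\tfrac{1}{2}\right\} &= \tfrac{1}{2} \\ \tfrac{1}{8}M^2 - M + 1\tfrac{3}{8} &= 0 \\ M^2 - 8M + 11 &= 0 \end{aligned} \]

Using the quadratic formula:

\[ M = \frac{8 \pm \sqrt{(-8)^2 - 4 \times 1 \times 11}}{2 \times 1} = \frac{8 \pm \sqrt{20}}{2} = 1.76 \text{ or } 6.24 \]

Since $f(y)$ is only defined for the range $1 \le y \le 3$, the median must be 1.76.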



The mode is the value that gives the greatest value of $f(y)$.

Using differentiation is not helpful:

\[ f'(y) = -\tfrac{1}{4} \ne 0 \]

A quick sketch of the PDF shows that the mode is 1:

[Figure: f(y) plotted against y, with the largest value of f(y) marked at y = 1]

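To get the variance we need to calculate $E(Y^2)$:

\[ \begin{aligned} E(Y^2) = \int y^2 f(y)\,dy &= \tfrac{1}{4}\int_1^3 (4y^2 - y^3)\,dy \\ &= \tfrac{1}{4}\left[\tfrac{4}{3}y^3 - \tfrac{1}{4}y^4\right]_1^3 \\ &= \tfrac{1}{4}\left\{(36 - 20\tfrac{1}{4}) - (\tfrac{4}{3} - \tfrac{1}{4})\right\} \\ &= \tfrac{1}{4} \times 14\tfrac{2}{3} \\ &= 3\tfrac{2}{3} \end{aligned} \]

Hence:

\[ \text{var}(Y) = E(Y^2) - [E(Y)]^2 = 3\tfrac{2}{3} - (1\tfrac{5}{6})^2 = \tfrac{11}{36} = 0.30\dot{5} \]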



P9.7 If H is the number of hours worked, and C is the cost then:

\[ C = 20 + 30H \]

The mean of the cost, C, is given by:

\[ \begin{aligned} E(C) &= E(20 + 30H) \\ &= 20 + 30E(H) && \text{using } E(aX+b) = aE(X) + b \\ &= 20 + 30 \times 3.5 && \text{since } E(H) = 3.5 \\ &= \text{£}125 \end{aligned} \]

The standard deviation of the cost, C, is given by:

\[ \begin{aligned} \text{sd}(C) &= \text{sd}(20 + 30H) \\ &= 30\,\text{sd}(H) && \text{using } \text{sd}(aX+b) = a \times \text{sd}(X) \\ &= 30 \times 0.5 && \text{since } \text{sd}(H) = 0.5 \\ &= \text{£}15 \end{aligned} \]



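P9.8 The skewness is given by:

\[ \text{Skew}(W) = E(W^3) - 3\mu E(W^2) + 2\mu^3 \]

Calculating the moments $E(W)$, $E(W^2)$ and $E(W^3)$:

\[ \mu = E(W) = \int w f(w)\,dw = \int_1^2 \tfrac{2}{3}w^2\,dw = \tfrac{2}{3}\left[\tfrac{1}{3}w^3\right]_1^2 = \tfrac{2}{3}(\tfrac{8}{3} - \tfrac{1}{3}) = 1\tfrac{5}{9} = 1.\dot{5} \]

\[ E(W^2) = \int w^2 f(w)\,dw = \int_1^2 \tfrac{2}{3}w^3\,dw = \tfrac{2}{3}\left[\tfrac{1}{4}w^4\right]_1^2 = \tfrac{2}{3}(4 - \tfrac{1}{4}) = 2\tfrac{1}{2} \]

\[ E(W^3) = \int w^3 f(w)\,dw = \int_1^2 \tfrac{2}{3}w^4\,dw = \tfrac{2}{3}\left[\tfrac{1}{5}w^5\right]_1^2 = \tfrac{2}{3}(6\tfrac{2}{5} - \tfrac{1}{5}) = 4\tfrac{2}{15} = 4.1\dot{3} \]

This gives:

\[ \begin{aligned} \text{Skew}(W) &= E(W^3) - 3\mu E(W^2) + 2\mu^3 \\ &= 4\tfrac{2}{15} - (3 \times 1\tfrac{5}{9} \times 2\tfrac{1}{2}) + (2 \times (1\tfrac{5}{9})^3) \\ &= -0.00521 \end{aligned} \]

To get the coefficient of skewness, we need the variance:

\[ \text{var}(W) = E(W^2) - [E(W)]^2 = 2\tfrac{1}{2} - (1\tfrac{5}{9})^2 = \tfrac{13}{162} = 0.08025 \]

Therefore the coefficient of skewness is:

\[ \frac{-0.00521}{0.08025^{1.5}} = -0.229 \]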

Stats Pack-10: Continuous distributions Page 1


Chapter 10

Continuous distributions

Links to CT3: Chapter 4 Syllabus objectives: (ii)2. Evaluate probabilities (by calculation or by referring to tables as appropriate)


0 Introduction

In this chapter we will be looking at two continuous distributions:

• The uniform distribution, which assumes that all values in a range are equally likely to occur.

• The exponential distribution, which is useful for modelling lifetimes of equipment, claim amounts and the waiting times between events.

For each of these distributions we will look at graphs of the probability density functions, derive the basic results (mean, median, mode and variance) and find probabilities.



1 The uniform distribution

1.1 General features of the uniform distribution

Do you remember school uniform? Everyone was supposed to wear the same clothes. Well this is the idea behind the uniform distribution – all numbers have the same probability of occurring.

Now in Chapter 8 we met the discrete uniform distribution, where each whole number had the same probability of occurring. For example, if we roll a fair die there are 6 numbers, each of which has the same probability ($\tfrac{1}{6}$) of occurring.

[Figure: P(X = x) plotted against x for x = 1, 2, 3, 4, 5, 6]

The continuous uniform extends this by allowing any number in a range to occur:

[Figure: f(x) plotted against x, from x = 0 to x = 6]

How can we work out the PDF for any uniform distribution? Recall that one of the properties of a PDF is that the area under its graph equals 1:

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]



Since the area under the graph is just a rectangle, it’s easy to find the height, $f(x)$, from the area (which must be 1) and the width using:

\[ \text{height} = \frac{\text{area}}{\text{width}} \]

So for our distribution from 0 to 6 the width is $6 - 0 = 6$. Then our PDF is:

\[ f(x) = \text{height} = \frac{\text{area}}{\text{width}} = \frac{1}{6} \]

Question 10.1

What would be the PDF for a continuous uniform distribution between 2 and 6?

1.2 The PDF of the uniform distribution

As we have seen, all we need to calculate the PDF for a continuous uniform distribution is the two values between which it lies, denoted a and b. So what is the PDF for a general continuous uniform distribution between a and b?

[Figure: f(x) plotted against x, constant between x = a and x = b]

Again, using the fact that the area under the graph is 1, we get:

\[ f(x) = \text{height} = \frac{\text{area}}{\text{width}} = \frac{1}{b-a} \]

So we have:

\[ f(x) = \frac{1}{b-a} \qquad a < x < b \]



The shortcut way of writing ‘X has a uniform distribution between a and b’ is:

\[ X \sim U(a,b) \]

To get a better ‘feel’ for this distribution, we’ll look at a number of continuous uniform distributions with different values for a and b. First, let’s look at $U(-1,3)$ and $U(5,9)$:

[Figure: f(x) plotted against x for U(–1,3) and U(5,9)]

Both of these uniform distributions have the same width – so they will both have the same height. Only the position is different.

Next we’ll look at what happens when we make the width bigger, say doubling the width from a $U(5,9)$ to a $U(1,9)$.

[Figure: f(x) plotted against x for U(5,9) and U(1,9)]

We can see that when the width has doubled (from 4 to 8), the height has halved (from $\tfrac{1}{4}$ to $\tfrac{1}{8}$). Since the distribution is spread over a range twice as wide, the probability density at any particular value must be half as much.

1.3 Moments of the uniform distribution

To find the moments for the continuous uniform distribution let’s look at $X \sim U(2,8)$:

[Figure: f(x) plotted against x for U(2,8)]

Mean

Since the distribution is symmetrical, the mean is in the middle:

\[ E(X) = \frac{2+8}{2} = 5 \]

We could also have found the mean by using our formula from Chapter 9:

\[ E(X) = \int x f(x)\,dx \]

Question 10.2

Use integration to prove that the mean of (2,8)U is 5.



Variance

There’s no easy way to get the variance except to use our formula from Chapter 9:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 \]

Remembering that:

\[ E(X^2) = \int x^2 f(x)\,dx \]

Question 10.3

Use the above formulae to show that the variance of (2,8)U is 3.

Median

Since the graph is symmetrical the median is also in the middle and is the same as the mean. Alternatively, we could prove this using the formula from Chapter 9:

\[ P(X \le M) = \int_{\text{start}}^{M} f(x)\,dx = \tfrac{1}{2} \]

Question 10.4

Use the above formula to show that the median of $U(2,8)$ is 5 (same as $E(X)$).

Mode

Since $f(x)$ is the same for all values in the range there is no mode, as no value occurs with a greater probability than the rest.



In general, for a $U(a,b)$ the results are:

Continuous uniform distribution, $X \sim U(a,b)$

\[ f(x) = \frac{1}{b-a} \qquad a < x < b \]

\[ E(X) = \tfrac{1}{2}(a+b) \]

\[ \text{var}(X) = \tfrac{1}{12}(b-a)^2 \]

These are given in the Tables on page 13 and so do not need to be memorised. The proofs are a little messy and can be found in Appendix A.
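As a rough numerical cross-check of these formulae, the sketch below approximates the integrals for $E(X)$ and $E(X^2)$ with a simple midpoint rule. The helper name `integrate` and the choice of $U(2,8)$ are illustrative assumptions, not part of the course material.

```python
def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 2, 8                      # X ~ U(2, 8)

def pdf(x):
    """PDF of U(a, b): constant height 1/(b - a) on the range."""
    return 1 / (b - a)

mean = integrate(lambda x: x * pdf(x), a, b)
second_moment = integrate(lambda x: x**2 * pdf(x), a, b)
variance = second_moment - mean**2

print(round(mean, 4), round(variance, 4))
print((a + b) / 2, (b - a) ** 2 / 12)   # values from the Tables formulae
```

Both printed lines should agree to several decimal places, which is a quick way to catch an arithmetic slip when using the formulae by hand.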

1.4 Probabilities for a uniform distribution

There are two ways of calculating the probabilities for a continuous uniform distribution: integrating the PDF or using the cumulative density function.

Calculating probabilities using integration

Recall from Chapter 9 that to calculate probabilities for a continuous random variable we just integrate the PDF:

\[ P(\alpha < X < \beta) = \int_\alpha^\beta f(x)\,dx \]

This is easy to do for the continuous uniform distribution. For example, to calculate $P(X > 8)$ where $X \sim U(7,12)$ we first need the PDF:

\[ f(x) = \frac{1}{b-a} = \frac{1}{12-7} = \frac{1}{5} \]

Hence:

\[ P(X > 8) = \int_8^{12} f(x)\,dx = \int_8^{12} \tfrac{1}{5}\,dx = \left[\tfrac{1}{5}x\right]_8^{12} = \tfrac{12}{5} - \tfrac{8}{5} = \tfrac{4}{5} \]



Question 10.5

If $X \sim U(13,30)$, calculate:

(i) $P(X < 24)$

(ii) $P(15 < X < 18.2)$.

Calculating probabilities using $F(x)$

Recall from Chapter 9 that the cumulative density function, $F(x)$, was:

\[ F(x) = P(X < x) = \int_{\text{start}}^{x} f(t)\,dt \]

For the uniform distribution $U(a,b)$ it is:

\[ F(x) = \int_a^x \frac{1}{b-a}\,dt = \left[\frac{t}{b-a}\right]_a^x = \frac{x-a}{b-a} \]

This result is also given on page 13 of the Tables. We can use this to calculate any probabilities. For example:

\[ X \sim U(7,12) \Rightarrow F(x) = \frac{x-7}{5} \]

\[ P(X < 10) = F(10) = \frac{10-7}{5} = 0.6 \]

\[ P(X > 8.5) = 1 - P(X < 8.5) = 1 - F(8.5) = 1 - \frac{8.5-7}{5} = 0.7 \]

\[ \begin{aligned} P(8.4 < X < 10.2) &= P(X < 10.2) - P(X < 8.4) \\ &= F(10.2) - F(8.4) \\ &= \frac{10.2-7}{5} - \frac{8.4-7}{5} \\ &= 0.64 - 0.28 = 0.36 \end{aligned} \]
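The three $U(7,12)$ calculations above can be reproduced with a few lines of code. This is only a sketch: the function name `uniform_cdf` is a hypothetical helper, and the clamping to 0 and 1 outside the range is an added convenience rather than something stated in the text.

```python
def uniform_cdf(x, a, b):
    """F(x) = P(X < x) for X ~ U(a, b); 0 below the range and 1 above it."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 7, 12
p_less_10 = uniform_cdf(10, a, b)                              # P(X < 10)
p_more_8_5 = 1 - uniform_cdf(8.5, a, b)                        # P(X > 8.5)
p_between = uniform_cdf(10.2, a, b) - uniform_cdf(8.4, a, b)   # P(8.4 < X < 10.2)

print(round(p_less_10, 2), round(p_more_8_5, 2), round(p_between, 2))
```

Every probability is either a single value of $F$ or a difference of two values, which is why the cumulative function is usually quicker than integrating each time.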



Question 10.6

Write down the cumulative density function, $F(x)$, for $X \sim U(-2,6)$ and use it to calculate:

(i) $P(X < 5)$

(ii) $P(X > 1.7)$

(iii) $P(-1 < X < 5.5)$.


We can also work out conditional probabilities. For example, $P(X > 3 \mid X > 2)$ where $X \sim U(1,5)$. Using the conditional probability formula from Chapter 5:

\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]

So:

\[ P(X > 3 \mid X > 2) = \frac{P(X > 3 \text{ and } X > 2)}{P(X > 2)} = \frac{P(X > 3)}{P(X > 2)} = \frac{0.5}{0.75} = \frac{2}{3} \]

Other problems that could be asked are to calculate the values of a and b given some information such as probabilities, the mean or the variance. For example, find b if $X \sim U(7,b)$ and $E(X) = 14$:

\[ E(X) = \frac{7+b}{2} = 14 \Rightarrow b = 21 \]
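The two examples above can be checked with a short sketch. The helper `uniform_sf` (for $P(X > x)$) is a hypothetical name introduced here for illustration, and it assumes x lies inside the range.

```python
def uniform_sf(x, a, b):
    """P(X > x) for X ~ U(a, b), assuming a <= x <= b."""
    return (b - x) / (b - a)

# Conditional probability: 'X > 3 and X > 2' is the same event as 'X > 3'
p_cond = uniform_sf(3, 1, 5) / uniform_sf(2, 1, 5)

# 'Backwards' problem: E(X) = (a + b) / 2, so b = 2 * E(X) - a
a, mean = 7, 14
b = 2 * mean - a

print(round(p_cond, 4), b)
```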

Question 10.7

The random variable X has the distribution $X \sim U(a,b)$. The mean of X is 10 and the standard deviation is 4. Calculate the values of a and b.



2 The exponential distribution

2.1 General features of the exponential distribution

Consider the lifetime of the humble light bulb. Most last a short while until they break. A few bulbs last a long time and very rarely a bulb lasts for absolutely ages. If we were to draw the PDF for the lifetime of the bulb, it would look something like this:

[Figure: f(x) plotted against x, a decreasing curve labelled ‘most last a short time’, ‘a few last longer’ and ‘very rare to last this long’]

This looks rather like the graph of the exponential $f(x) = e^{-x}$ where $x > 0$. Is this a PDF? Recall from Chapter 9 that a PDF must satisfy:

\[ f(x) \ge 0 \text{ for all } x \]

\[ \int f(x)\,dx = 1 \]

The graph is clearly non-negative so all we need to do is check that the area under the graph is 1:

\[ \int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = 0 - (-e^0) = 1 \]

Great! We have a PDF that can be used to model lifetimes. Our next question is how can we generalise this? For example, suppose we want an exponential with a shorter tail (they could be cheap light bulbs which don’t last as long on average). We could choose:

\[ f(x) = e^{-5x} \qquad x > 0 \]



The graph of this function looks like:

[Figure: graphs of $e^{-x}$ and $e^{-5x}$ plotted against x]

Is this a PDF as well? Well it’s clearly non-negative, but since the area under $e^{-x}$ is 1 we can see that the area under $e^{-5x}$ must be less than this. Darn! Maybe we could multiply the function by a constant, k, to scale the area up to 1:

\[ f(x) = ke^{-5x} \qquad x > 0 \]

We’ll now find out what this constant must be so that we do have a PDF:

\[ \begin{aligned} \int_0^\infty ke^{-5x}\,dx &= 1 \\ \Rightarrow \left[-\tfrac{k}{5}e^{-5x}\right]_0^\infty &= 1 \\ 0 - \left(-\tfrac{k}{5}\right) = 1 &\Rightarrow k = 5 \end{aligned} \]

So the following function is a PDF:

\[ f(x) = 5e^{-5x} \qquad x > 0 \]

Similarly, $f(x) = 3e^{-3x}$ will be a PDF and so on. In general, we have:

\[ f(x) = \lambda e^{-\lambda x} \qquad x > 0 \]

Since these PDFs have the exponential function in them we call them exponential distributions.
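The scaling argument can be checked numerically. The sketch below approximates the area under $e^{-5x}$ with a midpoint rule and takes its reciprocal to get the constant k. The helper `integrate` and the cut-off at x = 10 (where the remaining tail area is negligible) are assumptions made purely for illustration.

```python
import math

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

rate = 5
# The true upper limit is infinity; beyond x = 10 the area is about e^(-50) / 5
area = integrate(lambda x: math.exp(-rate * x), 0, 10)
k = 1 / area          # the constant needed to scale the area up to 1

print(round(area, 4), round(k, 4))
```

Changing `rate` to 3 gives k close to 3, in line with the general form $\lambda e^{-\lambda x}$.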




2.2 The PDF of the exponential distribution

The exponential distribution has one parameter, $\lambda$. Its PDF is given by:

\[ f(x) = \lambda e^{-\lambda x} \qquad x > 0 \]

The shortcut way of writing ‘X has an exponential distribution with parameter $\lambda$’ is:

\[ X \sim Exp(\lambda) \]

Question 10.8

Prove that:

\[ f(x) = \lambda e^{-\lambda x} \qquad x > 0 \]

is a PDF (ie the area under this function is 1).

Once again, to get a better ‘feel’ for this distribution, we’ll look at a couple of exponential distributions with different values for $\lambda$.

[Figure: f(x) plotted against x for $\lambda = 3$ and $\lambda = 1$]

The first thing to notice is that the graph always crosses the vertical axis at $\lambda$. Increasing the value of $\lambda$ ‘squashes’ the graph towards the vertical axis. This means that as $\lambda$ increases, the exponential distribution is more likely to take lower values. A larger value of $\lambda$ corresponds to a ‘cheap’ light bulb that has a shorter lifetime on average.

As we decrease the value of $\lambda$, the graph becomes more ‘stretched out’:

[Figure: f(x) plotted against x for $\lambda = 1$ and $\lambda = \tfrac{1}{2}$]

With a smaller value of $\lambda$ the exponential distribution is less likely to take smaller values (we can see the graph is lower at the start) and more likely to take higher values (the graph is higher towards the end). A smaller value of $\lambda$ would correspond to a ‘deluxe’ light bulb that has a longer lifetime on average.

2.3 Moments of the exponential distribution

Mean

We can find the mean by using our formula from Chapter 9:

\[ E(X) = \int x f(x)\,dx \]

Question 10.9

Use this formula to show that the mean of the exponential distribution is:

\[ E(X) = \frac{1}{\lambda} \]

How does this result tie in with our graphs of the PDFs? Let’s take a look at the means of two exponential distributions:

\[ \lambda = 1 \Rightarrow E(X) = 1 \]

\[ \lambda = 3 \Rightarrow E(X) = \tfrac{1}{3} \]

The bigger the value of $\lambda$, the smaller the mean.

[Figure: f(x) plotted against x for $\lambda = 3$ and $\lambda = 1$, with $E(X)$ marked]

We can see on the graph that as $\lambda$ gets bigger the graph is ‘squashed’ more towards the vertical axis and so the mean will be smaller. Similarly as $\lambda$ gets smaller the graph is more ‘stretched out’ and so the mean will be bigger.

Variance

The formula for the variance from Chapter 9 is:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 \]

where:

\[ E(X^2) = \int x^2 f(x)\,dx \]

Question 10.10

Use the above formulae to show that the variance of the exponential distribution is:

\[ \text{var}(X) = \frac{1}{\lambda^2} \]

Now let’s look at the variances of two exponential distributions:

\[ \lambda = 1 \Rightarrow \text{var}(X) = 1 \]

\[ \lambda = 3 \Rightarrow \text{var}(X) = \tfrac{1}{9} \]

The bigger the value of $\lambda$, the smaller the variance (ie the smaller the spread).

[Figure: f(x) plotted against x for $\lambda = 3$ and $\lambda = 1$, with the spread marked for each]

We can see that (whilst the graphs continue indefinitely) as $\lambda$ gets bigger the graph is ‘squashed’ more towards the vertical axis and so the spread of values is smaller (ie more of the values are bunched towards the start). Similarly as $\lambda$ gets smaller the graph is more ‘stretched out’ and so the spread of values is greater.

Median

To find the median we use the formula from Chapter 9:

\[ \int_{\text{start}}^{M} f(x)\,dx = \tfrac{1}{2} \]

Question 10.11

Use this formula to show that the median of the exponential distribution is:

\[ M = \frac{1}{\lambda}\ln 2 \]

Mode

The graph of the exponential PDF is greatest when $x = 0$ so this is the mode. However, technically it never quite reaches this point as x can only take positive values ($x > 0$).

In summary:

Exponential distribution, $Exp(\lambda)$

\[ f(x) = \lambda e^{-\lambda x} \qquad x > 0 \]

\[ E(X) = \frac{1}{\lambda} \]

\[ \text{var}(X) = \frac{1}{\lambda^2} \]

These are given in the Tables on page 11 and so do not need to be memorised.
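To see how the mean, variance and median move as $\lambda$ changes, the summary results can be wrapped in a small function. This is an illustrative sketch only; `exp_summary` is a made-up name, not a library routine.

```python
import math

def exp_summary(lam):
    """Return (mean, variance, median) of Exp(lam) using the summary formulae."""
    mean = 1 / lam
    variance = 1 / lam**2
    median = math.log(2) / lam
    return mean, variance, median

for lam in (0.5, 1, 3):
    mean, variance, median = exp_summary(lam)
    print(lam, round(mean, 4), round(variance, 4), round(median, 4))
```

In each case the median is about 0.69 of the mean, reflecting the long right-hand tail of the distribution.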

2.4 Probabilities for an exponential distribution

There are two ways of calculating the probabilities for an exponential distribution: integrating the PDF or using the cumulative density function.



Calculating probabilities using integration

Recall from Chapter 9 that to calculate probabilities for a continuous random variable we just integrate the PDF:

\[ P(\alpha < X < \beta) = \int_\alpha^\beta f(x)\,dx \]

This is easy to do for the exponential distribution. For example, let’s calculate $P(X > 5)$ where $X \sim Exp(0.1)$. First, we need the PDF:

\[ f(x) = 0.1e^{-0.1x} \]

Next, we integrate this from 5 upwards:

\[ P(X > 5) = \int_5^\infty f(x)\,dx = \int_5^\infty 0.1e^{-0.1x}\,dx = \left[-e^{-0.1x}\right]_5^\infty = 0 - (-e^{-0.5}) = 0.60653 \]

Question 10.12

Claim amounts for an insurer are exponentially distributed with parameter 0.005. Calculate the probability that a claim:

(i) exceeds 210

(ii) is less than 100

(iii) is between 300 and 500.



Calculating probabilities using $F(x)$

Recall from Chapter 9 that the cumulative density function, $F(x)$, was:

\[ F(x) = P(X < x) = \int_{\text{start}}^{x} f(t)\,dt \]

For the exponential distribution, $X \sim Exp(\lambda)$, it is:

\[ F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x} \]

This result is also given on page 11 of the Tables. We can use this to calculate any probabilities. For example:

\[ X \sim Exp(0.1) \Rightarrow F(x) = 1 - e^{-0.1x} \]

\[ P(X < 5) = F(5) = 1 - e^{-0.5} = 0.39347 \]

\[ P(X > 2) = 1 - P(X < 2) = 1 - F(2) = 1 - (1 - e^{-0.2}) = e^{-0.2} = 0.81873 \]

\[ \begin{aligned} P(3.1 < X < 4.7) &= P(X < 4.7) - P(X < 3.1) \\ &= F(4.7) - F(3.1) \\ &= (1 - e^{-0.47}) - (1 - e^{-0.31}) \\ &= 0.37500 - 0.26655 = 0.10844 \end{aligned} \]
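The $Exp(0.1)$ examples above can be reproduced directly from $F(x) = 1 - e^{-\lambda x}$. The function name `exp_cdf` below is a hypothetical helper introduced for this sketch.

```python
import math

def exp_cdf(x, lam):
    """F(x) = P(X < x) for X ~ Exp(lam); zero for x <= 0."""
    return 1 - math.exp(-lam * x) if x > 0 else 0.0

lam = 0.1
p_less_5 = exp_cdf(5, lam)                             # P(X < 5)
p_more_2 = 1 - exp_cdf(2, lam)                         # P(X > 2)
p_between = exp_cdf(4.7, lam) - exp_cdf(3.1, lam)      # P(3.1 < X < 4.7)

print(round(p_less_5, 5), round(p_more_2, 5), round(p_between, 5))
```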

Question 10.13

Write down the cumulative density function, $F(x)$, for $X \sim Exp(0.5)$ and use it to calculate:

(i) $P(X < 3)$

(ii) $P(X > 0.5)$

(iii) $P(2 < X < 8)$.




Given the mean of the distribution

One trick that the examiners often use is to give the mean, $\mu$, of the exponential distribution rather than the parameter, $\lambda$. This means that we will have to calculate $\lambda$ first before we can write down the PDF and calculate probabilities. To do this we will need to use:

\[ \mu = E(X) = \frac{1}{\lambda} \]

For example, suppose we are given an exponential distribution with mean 100:

\[ \mu = \frac{1}{\lambda} = 100 \Rightarrow \lambda = \frac{1}{100} = 0.01 \]

We can now write down the PDF:

\[ f(x) = 0.01e^{-0.01x} \qquad x > 0 \]

This can be used to calculate probabilities.
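A minimal sketch of this two-step approach is shown below: convert the mean to $\lambda$ first, then use it in a probability. The function name `rate_from_mean` and the example probability $P(X > 100)$ are illustrative choices, not taken from the text.

```python
import math

def rate_from_mean(mean):
    """Parameter of an exponential distribution with the given mean."""
    return 1 / mean

lam = rate_from_mean(100)              # mean 100 gives lambda = 0.01
p_exceeds_mean = math.exp(-lam * 100)  # P(X > 100) = 1 - F(100) = e^(-1)

print(lam, round(p_exceeds_mean, 5))
```

Using 100 directly as the parameter instead of 0.01 would give a probability that is effectively zero, which is the mistake this step guards against.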

Question 10.14

The lifetime of a certain battery is exponentially distributed with mean 500 hours. Calculate the probability that the battery lasts:

(i) more than 700 hours

(ii) between 400 and 600 hours.

Common Error: Students often assume that the parameter $\lambda$ is given without reading the question carefully to see whether the parameter or the mean is given.



Conditional probabilities

We can work out conditional probabilities involving the exponential distribution using the conditional probability formula from Chapter 5:

\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]

For example, to calculate $P(X < 1 \mid X < 2)$ where $X \sim Exp(2)$:

\[ P(X < 1 \mid X < 2) = \frac{P(X < 1 \text{ and } X < 2)}{P(X < 2)} = \frac{P(X < 1)}{P(X < 2)} = \frac{1 - e^{-2}}{1 - e^{-4}} = 0.88080 \]

‘Backwards’ problems

We could be asked to calculate the value of $\lambda$ given a probability. For example, find $\lambda$ if $X \sim Exp(\lambda)$ and $P(X < 10) = 0.05$:

\[ P(X < 10) = \int_0^{10} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^{10} = 1 - e^{-10\lambda} \]

But since $P(X < 10) = 0.05$, we have:

\[ 1 - e^{-10\lambda} = 0.05 \Rightarrow e^{-10\lambda} = 0.95 \Rightarrow \lambda = -\tfrac{1}{10}\ln 0.95 = 0.00513 \]
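The rearrangement above generalises: if $P(X < x) = p$ then $\lambda = -\frac{1}{x}\ln(1-p)$. The sketch below applies this; `rate_from_cdf` is a hypothetical helper name.

```python
import math

def rate_from_cdf(x, p):
    """Solve 1 - exp(-lam * x) = p for lam, given P(X < x) = p with 0 < p < 1."""
    return -math.log(1 - p) / x

lam = rate_from_cdf(10, 0.05)
check = 1 - math.exp(-lam * 10)   # substituting back should recover 0.05

print(round(lam, 5), round(check, 5))
```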

Question 10.15

Given that $X \sim Exp(\lambda)$ and $P(X > 40) = 0.7$:

(i) Calculate $\lambda$.

(ii) Calculate $P(X > 105 \mid X > 80)$.



2.6 Waiting time for a Poisson distribution

The Poisson distribution is excellent for modelling the arrival of claims or the death of policyholders and is one of the most useful distributions for actuaries. We will now show that, if the number of events per unit time has a Poisson distribution (with parameter $\lambda$), the waiting time between events has an exponential distribution (with parameter $\lambda$).

For example, suppose the number of claims arriving per hour has a Poisson distribution with parameter $\lambda$:

\[ Poi(\lambda) \text{ claims per hour} \]

If X is the number of claims arriving in t hours, we would have:

\[ X \sim Poi(\lambda t) \text{ claims per } t \text{ hours} \]

So the probability function for the number of claims arriving in t hours is:

\[ P(X = x) = \frac{(\lambda t)^x}{x!}e^{-\lambda t} \qquad x = 0, 1, 2, 3, \ldots \]

Let T be the number of hours that we have been waiting since the last claim. The probability that we have been waiting for more than t hours for a claim is:

\[ P(T > t) \]

Since we are still waiting, this means that we have had no claims in t hours:

\[ P(T > t) = P(0 \text{ claims in } t \text{ hours}) \]

We can find this probability using $P(X = 0)$:

\[ \begin{aligned} P(T > t) &= P(0 \text{ claims in } t \text{ hours}) \\ &= P(X = 0) \quad \text{where } X \sim Poi(\lambda t) \\ &= \frac{(\lambda t)^0}{0!}e^{-\lambda t} \\ &= e^{-\lambda t} \end{aligned} \]

Since we know $P(T > t)$, we can find $P(T < t)$ as probabilities sum to 1:

\[ P(T < t) = 1 - P(T > t) = 1 - e^{-\lambda t} \]

But wait! $P(T < t)$ is the definition of the cumulative density function of T:

\[ F(t) = P(T < t) = 1 - e^{-\lambda t} \]

Now either we can recognise that this is the cumulative density function of $Exp(\lambda)$ or we can use it to find the PDF:

\[ f(t) = F'(t) = \lambda e^{-\lambda t} \]

This is the PDF of $Exp(\lambda)$. So the waiting time (in hours) has an exponential distribution with parameter $\lambda$.
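The key step in the argument, that $P(T > t)$ equals the Poisson probability of no events in t hours, can be checked numerically. In the sketch below the values $\lambda = 2$ and $t = 1.5$ are arbitrary illustrative choices, and `poisson_pmf` and `exp_cdf` are hypothetical helper names.

```python
import math

def poisson_pmf(x, mean):
    """P(X = x) for X ~ Poi(mean)."""
    return mean**x * math.exp(-mean) / math.factorial(x)

def exp_cdf(t, lam):
    """F(t) = P(T < t) for T ~ Exp(lam), t > 0."""
    return 1 - math.exp(-lam * t)

lam, t = 2.0, 1.5                           # illustrative values only
p_no_claims = poisson_pmf(0, lam * t)       # P(0 claims in t hours)
p_still_waiting = 1 - exp_cdf(t, lam)       # P(T > t)

print(round(p_no_claims, 5), round(p_still_waiting, 5))
```

The two printed values agree, whichever positive values of `lam` and `t` are used.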

Question 10.16

The number of calls arriving at a switchboard each minute has a Poisson distribution with mean 5.

(i) Write down the distribution for the time between calls.

(ii) Calculate the probability that the time from one call to the next exceeds 1 minute.

This proof was asked in the April 2001 Subject 101 exam – so it is worth taking the time to learn it.



3 Appendix A – proof of mean and variance for $U(a,b)$

For $X \sim U(a,b)$ the PDF is $f(x) = \frac{1}{b-a}$.

Mean

The mean is given by:

\[ E(X) = \int x f(x)\,dx \]

This gives:

\[ E(X) = \int_a^b x\frac{1}{b-a}\,dx = \left[\frac{\tfrac{1}{2}x^2}{b-a}\right]_a^b = \frac{\tfrac{1}{2}(b^2 - a^2)}{b-a} \]

To simplify this expression, we use the difference of two squares result:

\[ (x-y)(x+y) = x^2 - y^2 \]

This gives:

\[ E(X) = \frac{\tfrac{1}{2}(b^2 - a^2)}{b-a} = \frac{\tfrac{1}{2}(b-a)(b+a)}{b-a} = \tfrac{1}{2}(a+b) \]

Variance

The variance is found using:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 \]

where:

\[ E(X^2) = \int x^2 f(x)\,dx \]



This gives:

\[ E(X^2) = \int_a^b x^2\frac{1}{b-a}\,dx = \left[\frac{\tfrac{1}{3}x^3}{b-a}\right]_a^b = \frac{\tfrac{1}{3}(b^3 - a^3)}{b-a} \]

To simplify the expression, we use the fact that:

\[ (b-a)(a^2 + ab + b^2) = b^3 - a^3 \]

This gives:

\[ E(X^2) = \frac{\tfrac{1}{3}(b^3 - a^3)}{b-a} = \frac{\tfrac{1}{3}(b-a)(a^2 + ab + b^2)}{b-a} = \tfrac{1}{3}(a^2 + ab + b^2) \]

Substituting, we get:

\[ \begin{aligned} \text{var}(X) &= E(X^2) - [E(X)]^2 \\ &= \tfrac{1}{3}(a^2 + ab + b^2) - \left[\tfrac{1}{2}(a+b)\right]^2 \\ &= \tfrac{1}{3}(a^2 + ab + b^2) - \tfrac{1}{4}(a+b)^2 \\ &= \tfrac{1}{3}(a^2 + ab + b^2) - \tfrac{1}{4}(a^2 + 2ab + b^2) \\ &= \tfrac{1}{12}a^2 - \tfrac{1}{6}ab + \tfrac{1}{12}b^2 \\ &= \tfrac{1}{12}(a^2 - 2ab + b^2) \\ &= \tfrac{1}{12}(b-a)^2 \end{aligned} \]
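The algebra above can be spot-checked with exact fractions. The sketch below evaluates the unsimplified integrals for a few arbitrary (a, b) pairs and compares them with the final formulae. `uniform_moments` is a hypothetical helper name, and checking a handful of cases is of course not a proof.

```python
from fractions import Fraction

def uniform_moments(a, b):
    """Mean and variance of U(a, b) from the unsimplified integral results."""
    a, b = Fraction(a), Fraction(b)
    mean = (b**2 - a**2) / (2 * (b - a))            # [x^2 / 2(b-a)] from a to b
    second_moment = (b**3 - a**3) / (3 * (b - a))   # [x^3 / 3(b-a)] from a to b
    return mean, second_moment - mean**2

for a, b in [(2, 8), (7, 12), (-2, 6)]:
    mean, variance = uniform_moments(a, b)
    assert mean == Fraction(a + b, 2)
    assert variance == Fraction((b - a) ** 2, 12)
    print(a, b, mean, variance)
```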



Extra practice questions

Section 1: The uniform distribution

P10.1 If $X \sim U(15,21)$:

(i) Sketch the PDF of X.

(ii) Calculate:

(a) $P(X > 18)$

(b) $P(19.5 < X < 20.3)$.

(iii) Calculate the mean and variance of X.

P10.2 Subject C1, September 1999, Q10 (part)

The random variable X is distributed uniformly over [2, 4].

(i) State the value of the population mean $E(X)$. [1]

(ii) Show that $\text{var}(X) = \tfrac{1}{3}$. [2]

[Total 3]

P10.3 A random variable, Y, is uniformly distributed with mean 7 and standard deviation 5.

(i) Obtain the PDF of Y.

(ii) Calculate $P(Y > 5 \mid Y > 0)$.



Section 2: The exponential distribution

P10.4 Subject 101, April 2002, Q2

Claim amounts are modelled as an exponential random variable with mean £1,000.

(i) Calculate the probability that one such claim amount is greater than £5,000. [1]

(ii) Calculate the probability that a claim amount is greater than £5,000 given that it is greater than £1,000. [2]

[Total 3]

P10.5 Subject C1, April 1997, Q10

Claim sizes are modelled by an exponential distribution with mean £50. Determine the probabilities that an individual claim:

(i) exceeds £50 [1]

(ii) exceeds £200 [1]

(iii) exceeds £200, given that it exceeds £50. [2]

[Total 4]

P10.6 Subject C1, April 1998, Q11

The median, m, of the distribution of a continuous random variable X is defined by the relation $P(X < m) = 0.5$.

Show that the median of an exponential random variable is about 0.7 of the mean. [4]

P10.7 Subject C1, April 1998, Q6 (adapted)

The number of accidents in a factory is represented by a Poisson distribution averaging 2 accidents per 5 days. Evaluate the probability that the time from one accident to the next exceeds 3 days. [2]



Chapter 10 Summary

The uniform distribution

If X has a uniform distribution between a and b then we write $X \sim U(a,b)$. It has PDF:

\[ f(x) = \frac{1}{b-a} \qquad a < x < b \]

The moments are:

\[ E(X) = \tfrac{1}{2}(a+b) \]

\[ \text{var}(X) = \tfrac{1}{12}(b-a)^2 \]

The median is the same as the mean (by symmetry). There is no mode.

Probabilities can be found by integrating the PDF or using $F(x) = \frac{x-a}{b-a}$.

The exponential distribution

If X has an exponential distribution with parameter $\lambda$ then we write $X \sim Exp(\lambda)$. It models the waiting time between events occurring in a Poisson process. It has PDF:

\[ f(x) = \lambda e^{-\lambda x} \qquad x > 0 \]

The moments are:

\[ E(X) = \frac{1}{\lambda} \]

\[ \text{var}(X) = \frac{1}{\lambda^2} \]

The median can be found by integration or using $F(x)$ to be $\frac{1}{\lambda}\ln 2$. The mode is 0.

Probabilities can be found by integrating the PDF or using $F(x) = 1 - e^{-\lambda x}$.




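Solution 10.1

The graph of the PDF would be:

[Figure: f(x) plotted against x, constant between x = 2 and x = 6]

The area under this graph must be 1 (since it’s a PDF). Our distribution is from 2 to 6 so the width is $6 - 2 = 4$. Therefore the PDF is:

\[ f(x) = \text{height} = \frac{\text{area}}{\text{width}} = \frac{1}{4} \]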

Solution 10.2

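The PDF of $U(2,8)$ is:

\[ f(x) = \frac{1}{8-2} = \frac{1}{6} \qquad 2 < x < 8 \]

So:

\[ \begin{aligned} E(X) &= \int x f(x)\,dx \\ &= \int_2^8 \tfrac{1}{6}x\,dx \\ &= \left[\tfrac{1}{12}x^2\right]_2^8 \\ &= \tfrac{1}{12}(8^2 - 2^2) \\ &= 5 \end{aligned} \]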



Solution 10.3


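The PDF of $U(2,8)$ is:

\[ f(x) = \frac{1}{8-2} = \frac{1}{6} \qquad 2 < x < 8 \]

So:

\[ \begin{aligned} E(X^2) &= \int x^2 f(x)\,dx \\ &= \int_2^8 \tfrac{1}{6}x^2\,dx \\ &= \left[\tfrac{1}{18}x^3\right]_2^8 \\ &= \tfrac{1}{18}(8^3 - 2^3) \\ &= 28 \end{aligned} \]

This gives:

\[ \begin{aligned} \text{var}(X) &= E(X^2) - [E(X)]^2 \\ &= 28 - 5^2 \\ &= 3 \end{aligned} \]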



Solution 10.4


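The PDF of $U(2,8)$ is:

\[ f(x) = \frac{1}{8-2} = \frac{1}{6} \qquad 2 < x < 8 \]

So the median, M, is given by:

\[ \int_2^M \tfrac{1}{6}\,dx = \tfrac{1}{2} \]

Solving this gives:

\[ \begin{aligned} \left[\tfrac{1}{6}x\right]_2^M &= \tfrac{1}{2} \\ \tfrac{1}{6}(M - 2) &= \tfrac{1}{2} \\ M &= 5 \end{aligned} \]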



Solution 10.5

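The PDF of $X \sim U(13,30)$ is:

\[ f(x) = \frac{1}{17} \qquad 13 < x < 30 \]

Integrating the PDF to find the probabilities gives:

(i)

\[ \begin{aligned} P(X < 24) &= \int_{13}^{24} \tfrac{1}{17}\,dx \\ &= \left[\tfrac{1}{17}x\right]_{13}^{24} \\ &= \tfrac{1}{17}(24 - 13) \\ &= \tfrac{11}{17} \end{aligned} \]

(ii)

\[ \begin{aligned} P(15 < X < 18.2) &= \int_{15}^{18.2} \tfrac{1}{17}\,dx \\ &= \left[\tfrac{1}{17}x\right]_{15}^{18.2} \\ &= \tfrac{1}{17}(18.2 - 15) \\ &= 0.18824 \end{aligned} \]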



Solution 10.6

The cumulative density function of $X \sim U(-2,6)$ is:

\[ F(x) = P(X < x) = \frac{x-a}{b-a} = \frac{x+2}{8} \]

(i)

\[ P(X < 5) = F(5) = \frac{5+2}{8} = \frac{7}{8} = 0.875 \]

(ii)

\[ \begin{aligned} P(X > 1.7) &= 1 - P(X < 1.7) \quad \text{since probabilities sum to 1} \\ &= 1 - F(1.7) \\ &= 1 - \frac{3.7}{8} \\ &= 0.5375 \end{aligned} \]

(iii)

\[ \begin{aligned} P(-1 < X < 5.5) &= P(X < 5.5) - P(X < -1) \\ &= F(5.5) - F(-1) \\ &= \frac{7.5}{8} - \frac{1}{8} \\ &= 0.8125 \end{aligned} \]



Solution 10.7

For $X \sim U(a,b)$ we have:

\[ E(X) = \tfrac{1}{2}(a+b) = 10 \]

\[ \text{var}(X) = \tfrac{1}{12}(b-a)^2 = 4^2 \]

Rearranging the mean gives $b = 20 - a$. Substituting this into the variance:

\[ \tfrac{1}{12}(20 - a - a)^2 = 16 \Rightarrow (20 - 2a)^2 = 192 \]

Solving this:

\[ 20 - 2a = \pm\sqrt{192} \Rightarrow a = \frac{20 \pm \sqrt{192}}{2} = 16.93 \text{ or } 3.07 \]

\[ \Rightarrow b = 20 - 3.07 = 16.93 \text{ or } b = 20 - 16.93 = 3.07 \]

Since $a < b$, we have $a = 3.07$ and $b = 16.93$.

Solution 10.8

The area under the graph of $f(x) = \lambda e^{-\lambda x}$ is given by:

\[ \int_0^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^\infty = 0 - (-e^0) = 1 \]

Since the area under this graph is 1 and $f(x) > 0$, $f(x)$ is a PDF.



Solution 10.9

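Using the formula for the mean, we get:

\[ E(X) = \int x f(x)\,dx = \int_0^\infty x\lambda e^{-\lambda x}\,dx \]

Using integration by parts:

\[ \int_a^b u\frac{dv}{dx}\,dx = [uv]_a^b - \int_a^b v\frac{du}{dx}\,dx \]

Setting $u = x$ and $\frac{dv}{dx} = \lambda e^{-\lambda x}$ we get:

\[ \begin{aligned} \int_0^\infty x\lambda e^{-\lambda x}\,dx &= \left[-xe^{-\lambda x}\right]_0^\infty - \int_0^\infty -e^{-\lambda x}\,dx \\ &= [0 - 0] + \left[-\tfrac{1}{\lambda}e^{-\lambda x}\right]_0^\infty \\ &= 0 - \left(-\tfrac{1}{\lambda}e^0\right) \\ &= \frac{1}{\lambda} \end{aligned} \]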



Solution 10.10

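The second moment, $E(X^2)$, is given by:

\[ E(X^2) = \int x^2 f(x)\,dx = \int_0^\infty x^2\lambda e^{-\lambda x}\,dx \]

Using integration by parts:

\[ \int_a^b u\frac{dv}{dx}\,dx = [uv]_a^b - \int_a^b v\frac{du}{dx}\,dx \]

Setting $u = x^2$ and $\frac{dv}{dx} = \lambda e^{-\lambda x}$ we get:

\[ \begin{aligned} \int_0^\infty x^2\lambda e^{-\lambda x}\,dx &= \left[-x^2e^{-\lambda x}\right]_0^\infty - \int_0^\infty -2xe^{-\lambda x}\,dx \\ &= 0 + \int_0^\infty 2xe^{-\lambda x}\,dx = \int_0^\infty 2xe^{-\lambda x}\,dx \end{aligned} \]

We now need to use integration by parts to calculate this integral. Setting $u = 2x$ and $\frac{dv}{dx} = e^{-\lambda x}$ we get:

\[ \begin{aligned} \int_0^\infty 2xe^{-\lambda x}\,dx &= \left[-\tfrac{1}{\lambda}2xe^{-\lambda x}\right]_0^\infty - \int_0^\infty -\tfrac{2}{\lambda}e^{-\lambda x}\,dx \\ &= [0 - 0] + \left[-\tfrac{2}{\lambda^2}e^{-\lambda x}\right]_0^\infty \\ &= 0 - \left(-\tfrac{2}{\lambda^2}e^0\right) \\ &= \frac{2}{\lambda^2} \end{aligned} \]

Hence:

\[ \text{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2} \]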



Solution 10.11

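Using:

\[ \int_{\text{start}}^{M} f(x)\,dx = \tfrac{1}{2} \]

we get:

\[ \begin{aligned} \int_0^M \lambda e^{-\lambda x}\,dx &= \tfrac{1}{2} \\ \Rightarrow \left[-e^{-\lambda x}\right]_0^M &= \tfrac{1}{2} \\ 1 - e^{-\lambda M} &= \tfrac{1}{2} \\ e^{-\lambda M} &= \tfrac{1}{2} \\ -\lambda M &= \ln\tfrac{1}{2} \\ M &= -\tfrac{1}{\lambda}\ln\tfrac{1}{2} \end{aligned} \]

Now using the fact that $a\ln x = \ln x^a$ gives $-\ln\tfrac{1}{2} = \ln\left(\tfrac{1}{2}\right)^{-1} = \ln 2$. So:

\[ M = \frac{1}{\lambda}\ln 2 \]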



Solution 10.12

The PDF of $X \sim Exp(0.005)$ is:

\[ f(x) = 0.005e^{-0.005x} \qquad x > 0 \]

Using integration:

(i)

\[ \begin{aligned} P(X > 210) &= \int_{210}^\infty 0.005e^{-0.005x}\,dx \\ &= \left[-e^{-0.005x}\right]_{210}^\infty \\ &= 0 - (-e^{-1.05}) \\ &= 0.34994 \end{aligned} \]

(ii)

\[ \begin{aligned} P(X < 100) &= \int_0^{100} 0.005e^{-0.005x}\,dx \\ &= \left[-e^{-0.005x}\right]_0^{100} \\ &= -e^{-0.5} - (-e^0) \\ &= 1 - e^{-0.5} = 0.39347 \end{aligned} \]

(iii)

\[ \begin{aligned} P(300 < X < 500) &= \int_{300}^{500} 0.005e^{-0.005x}\,dx \\ &= \left[-e^{-0.005x}\right]_{300}^{500} \\ &= -e^{-2.5} - (-e^{-1.5}) \\ &= e^{-1.5} - e^{-2.5} = 0.14105 \end{aligned} \]



Solution 10.13

The cumulative density function for $X \sim Exp(0.5)$ is:

\[ F(x) = P(X < x) = 1 - e^{-0.5x} \]

(i)

\[ \begin{aligned} P(X < 3) &= F(3) \\ &= 1 - e^{-1.5} \\ &= 0.77687 \end{aligned} \]

(ii)

\[ \begin{aligned} P(X > 0.5) &= 1 - P(X < 0.5) \\ &= 1 - F(0.5) \\ &= 1 - (1 - e^{-0.25}) \\ &= e^{-0.25} \\ &= 0.77880 \end{aligned} \]

(iii)

\[ \begin{aligned} P(2 < X < 8) &= P(X < 8) - P(X < 2) \\ &= F(8) - F(2) \\ &= (1 - e^{-4}) - (1 - e^{-1}) \\ &= e^{-1} - e^{-4} \\ &= 0.34956 \end{aligned} \]



Solution 10.14

We have an exponential distribution with mean 500 hours:

\[ \mu = \frac{1}{\lambda} = 500 \Rightarrow \lambda = \frac{1}{500} = 0.002 \]

The PDF for this exponential distribution is:

\[ f(x) = 0.002e^{-0.002x} \qquad x > 0 \]

We can now find the required probabilities:

(i)

\[ \begin{aligned} P(X > 700) &= \int_{700}^\infty 0.002e^{-0.002x}\,dx \\ &= \left[-e^{-0.002x}\right]_{700}^\infty \\ &= 0 - (-e^{-1.4}) \\ &= 0.24660 \end{aligned} \]

or using the cumulative density function:

\[ P(X > 700) = 1 - P(X < 700) = 1 - F(700) = 1 - (1 - e^{-1.4}) = e^{-1.4} = 0.24660 \]

(ii)

\[ \begin{aligned} P(400 < X < 600) &= \int_{400}^{600} 0.002e^{-0.002x}\,dx \\ &= \left[-e^{-0.002x}\right]_{400}^{600} \\ &= -e^{-1.2} - (-e^{-0.8}) \\ &= e^{-0.8} - e^{-1.2} = 0.14813 \end{aligned} \]

or using the cumulative density function:

\[ \begin{aligned} P(400 < X < 600) &= P(X < 600) - P(X < 400) = F(600) - F(400) \\ &= (1 - e^{-1.2}) - (1 - e^{-0.8}) = e^{-0.8} - e^{-1.2} = 0.14813 \end{aligned} \]



Solution 10.15

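(i) Integrating the PDF, we have:

$$P(X > 40) = \int_{40}^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_{40}^\infty = 0 - (-e^{-40\lambda}) = e^{-40\lambda}$$

or using the cumulative distribution function:

$$P(X > 40) = 1 - F(40) = 1 - \left(1 - e^{-40\lambda}\right) = e^{-40\lambda}$$

But since $P(X > 40) = 0.7$, we have:

$$e^{-40\lambda} = 0.7 \Rightarrow \lambda = -\tfrac{1}{40} \ln 0.7 = 0.00892$$

(ii) Using the conditional probability formula, we get:

$$P(X > 105 \mid X > 80) = \frac{P(X > 105 \text{ and } X > 80)}{P(X > 80)} = \frac{P(X > 105)}{P(X > 80)}$$

So we need to calculate $P(X > 80)$ and $P(X > 105)$:

$$P(X > 80) = \int_{80}^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_{80}^\infty = 0 - (-e^{-0.00892 \times 80}) = 0.49$$

$$P(X > 105) = \int_{105}^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_{105}^\infty = 0 - (-e^{-0.00892 \times 105}) = 0.392$$

Hence:

$$P(X > 105 \mid X > 80) = \frac{0.392}{0.49} = 0.8$$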
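As a rough numerical check of both parts, the sketch below solves for $\lambda$ and then forms the ratio of the two 'more than' probabilities. It is an illustration only; the name `survival` is our own label for $P(X > x)$.

```python
import math

# Part (i): solve e^(-40 * rate) = 0.7 for the rate parameter.
rate = -math.log(0.7) / 40

def survival(x):
    """P(X > x) = e^(-rate * x) for the exponential variable X."""
    return math.exp(-rate * x)

# Part (ii): P(X > 105 | X > 80) = P(X > 105) / P(X > 80).
conditional = survival(105) / survival(80)

print(round(rate, 5))         # 0.00892
print(round(conditional, 3))  # 0.8
```

Working with the unrounded value of $\lambda$ gives a ratio very slightly above 0.8, which rounds to the same answer.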



Solution 10.16

(i) The number of calls per minute arriving at the switchboard has the distribution:

$$X \sim Poi(5)$$

So the waiting time between calls, in minutes, has the distribution:

$$T \sim Exp(5)$$

(ii) $$P(T > 1) = \int_1^\infty 5 e^{-5t}\,dt = \left[-e^{-5t}\right]_1^\infty = 0 - (-e^{-5}) = 0.00674$$




P10.1

(i) The PDF for $X \sim U(15,21)$ is:

$$f(x) = \tfrac{1}{6} \qquad 15 \le x \le 21$$

The graph is:

[Figure: PDF of U(15,21), constant at f(x) = 1/6 for x between 15 and 21]

(ii) (a) $$P(X > 18) = \int_{18}^{21} \tfrac{1}{6}\,dx = \left[\tfrac{1}{6}x\right]_{18}^{21} = \tfrac{1}{6}(21 - 18) = \tfrac{1}{2}$$

Alternatively, using the cumulative distribution function gives:

$$P(X > 18) = 1 - P(X < 18) = 1 - F(18) = 1 - \frac{18 - 15}{21 - 15} = \tfrac{1}{2}$$

(b) $$P(19.5 < X < 20.3) = \int_{19.5}^{20.3} \tfrac{1}{6}\,dx = \left[\tfrac{1}{6}x\right]_{19.5}^{20.3} = \tfrac{1}{6}(20.3 - 19.5) = 0.133$$

Alternatively, using the cumulative distribution function gives:

$$P(19.5 < X < 20.3) = P(X < 20.3) - P(X < 19.5) = F(20.3) - F(19.5) = \frac{20.3 - 15}{21 - 15} - \frac{19.5 - 15}{21 - 15} = 0.133$$

(iii) Using our formulae:

$$E(X) = \tfrac{1}{2}(15 + 21) = 18$$

$$\text{var}(X) = \tfrac{1}{12}(21 - 15)^2 = 3$$

Alternatively, from first principles:

$$E(X) = \int_{15}^{21} \tfrac{1}{6}x\,dx = \left[\tfrac{1}{12}x^2\right]_{15}^{21} = \tfrac{1}{12}(21^2 - 15^2) = 18$$

$$E(X^2) = \int_{15}^{21} \tfrac{1}{6}x^2\,dx = \left[\tfrac{1}{18}x^3\right]_{15}^{21} = \tfrac{1}{18}(21^3 - 15^3) = 327$$

$$\Rightarrow \text{var}(X) = E(X^2) - E^2(X) = 327 - 18^2 = 3$$
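The uniform distribution results used in P10.1 are simple enough to put into a few lines of Python. This is a minimal sketch with function names of our own choosing, just to confirm the arithmetic.

```python
def uniform_cdf(x, a, b):
    """F(x) = P(X < x) for X ~ U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    """E(X) = (a + b) / 2."""
    return (a + b) / 2

def uniform_variance(a, b):
    """var(X) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12

a, b = 15, 21
p_more_18 = 1 - uniform_cdf(18, a, b)                            # part (ii)(a)
p_between = uniform_cdf(20.3, a, b) - uniform_cdf(19.5, a, b)    # part (ii)(b)

print(p_more_18, round(p_between, 3), uniform_mean(a, b), uniform_variance(a, b))
```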

P10.2 Subject C1, September 1999, Q10 (part)

The PDF of $X \sim U(2,4)$ is:

$$f(x) = \frac{1}{4 - 2} = \tfrac{1}{2} \qquad 2 < x < 4$$

(i) $E(X) = 3$ by symmetry.

(ii) $$E(X^2) = \int_2^4 \tfrac{1}{2}x^2\,dx = \left[\tfrac{1}{6}x^3\right]_2^4 = \tfrac{1}{6}(4^3 - 2^3) = 9\tfrac{1}{3}$$

$$\Rightarrow \text{var}(X) = E(X^2) - E^2(X) = 9\tfrac{1}{3} - 3^2 = \tfrac{1}{3}$$



P10.3

(i) For $Y \sim U(a,b)$ we have:

$$E(Y) = \tfrac{1}{2}(a + b) = 7$$

$$\text{var}(Y) = \tfrac{1}{12}(b - a)^2 = 5^2$$

Rearranging the mean gives $b = 14 - a$. Substituting this into the variance:

$$\tfrac{1}{12}(14 - a - a)^2 = 25 \Rightarrow (14 - 2a)^2 = 300$$

Solving this:

$$14 - 2a = \pm\sqrt{300}$$

$$a = \frac{14 \pm \sqrt{300}}{2} = -1.66 \text{ or } 15.66$$

$$\Rightarrow b = 14 - (-1.66) = 15.66 \text{ or } b = 14 - 15.66 = -1.66$$

Since $a < b$, we have $a = -1.66$ and $b = 15.66$. Hence:

$$f(y) = \frac{1}{17.32} \qquad -1.66 < y < 15.66$$

(ii) Using the conditional probability formula:

$$P(Y > 5 \mid Y > 0) = \frac{P(Y > 5 \text{ and } Y > 0)}{P(Y > 0)} = \frac{P(Y > 5)}{P(Y > 0)}$$

$$P(Y > 5) = \int_5^{15.66} \frac{1}{17.32}\,dy = \left[\frac{y}{17.32}\right]_5^{15.66} = 0.61547$$

$$P(Y > 0) = \int_0^{15.66} \frac{1}{17.32}\,dy = \left[\frac{y}{17.32}\right]_0^{15.66} = 0.90415$$

Hence:

$$P(Y > 5 \mid Y > 0) = \frac{0.61547}{0.90415} = 0.681$$



P10.4 Subject 101, April 2002, Q2

Be careful! The mean is £1,000, so:

$$\mu = \frac{1}{\lambda} = 1{,}000 \Rightarrow \lambda = \frac{1}{1{,}000} = 0.001$$

This gives a PDF of:

$$f(x) = 0.001 e^{-0.001x} \qquad x > 0$$

(i) We find the probability by integrating the PDF:

$$P(X > 5{,}000) = \int_{5,000}^\infty 0.001 e^{-0.001x}\,dx = \left[-e^{-0.001x}\right]_{5,000}^\infty = 0 - (-e^{-5}) = 0.00674$$

or by using the cumulative distribution function:

$$P(X > 5{,}000) = 1 - P(X < 5{,}000) = 1 - F(5{,}000) = e^{-5} = 0.00674$$

(ii) Using the conditional probability formula:

$$P(X > 5{,}000 \mid X > 1{,}000) = \frac{P(X > 5{,}000 \text{ and } X > 1{,}000)}{P(X > 1{,}000)} = \frac{P(X > 5{,}000)}{P(X > 1{,}000)}$$

So we need to calculate $P(X > 1{,}000)$ by integrating the PDF:

$$P(X > 1{,}000) = \int_{1,000}^\infty 0.001 e^{-0.001x}\,dx = \left[-e^{-0.001x}\right]_{1,000}^\infty = 0 - (-e^{-1}) = 0.36788$$

or by using the cumulative distribution function:

$$P(X > 1{,}000) = 1 - P(X < 1{,}000) = 1 - F(1{,}000) = e^{-1} = 0.36788$$

Substituting this into our conditional probability formula gives:

$$P(X > 5{,}000 \mid X > 1{,}000) = \frac{0.00674}{0.36788} = 0.0183$$

P10.5 Subject C1, April 1997, Q10

Be careful! The mean is £50, so:

$$\mu = \frac{1}{\lambda} = 50 \Rightarrow \lambda = \frac{1}{50}$$

This gives a PDF of:

$$f(x) = \tfrac{1}{50} e^{-\frac{1}{50}x} \qquad x > 0$$

(i) We find the probability by integrating the PDF:

$$P(X > 50) = \int_{50}^\infty \tfrac{1}{50} e^{-\frac{1}{50}x}\,dx = \left[-e^{-\frac{1}{50}x}\right]_{50}^\infty = 0 - (-e^{-1}) = 0.36788$$

or by using the cumulative distribution function:

$$P(X > 50) = 1 - P(X < 50) = 1 - F(50) = e^{-1} = 0.36788$$

(ii) We find the probability by integrating the PDF:

$$P(X > 200) = \int_{200}^\infty \tfrac{1}{50} e^{-\frac{1}{50}x}\,dx = \left[-e^{-\frac{1}{50}x}\right]_{200}^\infty = 0 - (-e^{-4}) = 0.01832$$

or by using the cumulative distribution function:

$$P(X > 200) = 1 - P(X < 200) = 1 - F(200) = e^{-4} = 0.01832$$

(iii) Using the conditional probability formula:

$$P(X > 200 \mid X > 50) = \frac{P(X > 200 \text{ and } X > 50)}{P(X > 50)} = \frac{P(X > 200)}{P(X > 50)} = \frac{0.01832}{0.36788} = 0.0498$$

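P10.6 Subject C1, April 1998, Q11

Using $P(X < m) = 0.5$, we get:

$$P(X < m) = \int_0^m \lambda e^{-\lambda x}\,dx = 0.5$$

$$\Rightarrow \left[-e^{-\lambda x}\right]_0^m = \tfrac{1}{2}$$

$$1 - e^{-\lambda m} = \tfrac{1}{2}$$

$$e^{-\lambda m} = \tfrac{1}{2}$$

$$-\lambda m = \ln \tfrac{1}{2}$$

$$\lambda m = \ln 2$$

$$m = \frac{1}{\lambda} \ln 2$$

Since the mean is $\mu = \frac{1}{\lambda}$, we can see the median is $\ln 2 = 0.693$ times the mean.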



P10.7 Subject C1, April 1998, Q6 (adapted)

2 accidents per 5 days is 0.4 accidents per day. So, if X is the number of claims per day we have:

$$X \sim Poi(0.4)$$

This means the waiting time, T, between these claims is:

$$T \sim Exp(0.4)$$

The probability that the waiting time is greater than 3 days can be found by integration:

$$P(T > 3) = \int_3^\infty 0.4 e^{-0.4t}\,dt = \left[-e^{-0.4t}\right]_3^\infty = 0 - (-e^{-1.2}) = 0.30119$$

Alternatively, we could use the cumulative distribution function:

$$P(T > 3) = 1 - P(T < 3) = 1 - F(3) = e^{-1.2} = 0.30119$$

Stats Pack-11: The normal distribution Page 1


Chapter 11

The normal distribution

Links to CT3: Chapter 2 Section 5.4.

Syllabus objectives: (ii)2. Evaluate probabilities (by calculation or by referring to Tables as appropriate)


0 Introduction

In this chapter we look at the normal distribution. This is a continuous distribution that occurs naturally, for example in the weights of babies at birth, the time taken to get to work every day, or the heights of adult males. These variables all follow the same pattern. There will be some low values and some high values, but the majority of values will lie somewhere in the ‘middle’. This distribution has a number of applications in Subject CT3 and as such occurs in many other topic areas.



1 Features of the normal distribution

1.1 General features

If we were to plot a histogram of the heights of all female student actuaries, we would probably end up with a histogram like this:

[Figure: histogram of height (cm), from 130 to 200, against percentage]

As we sample a larger and larger group (and use smaller classes) we will approach the following graph:

[Figure: smooth bell-shaped curve of height (cm), from 130 to 200, against percentage]

The distribution is symmetrical, with most heights around the average of 165cm and fewer and fewer people as we approach the extremes. This symmetrical bell-shaped distribution is called the normal distribution. It occurs naturally in many other areas, for example: weights, IQs, exam scores, and so on.



1.2 The PDF of the normal distribution

The normal distribution depends only on two parameters, $\mu$ and $\sigma^2$, which you will recognise as the mean and variance. So $\mu$ and $\sigma^2$ are the only parameters that appear in the probability density function (PDF):

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad -\infty < x < \infty$$

The shortcut way of writing ‘X has a normal distribution with mean $\mu$ and variance $\sigma^2$’ is:

$$X \sim N(\mu, \sigma^2)$$

Now the PDF is grotty, but it does give the lovely symmetrical bell-shaped curve that we saw earlier. To get a better ‘feel’ for the shape of the PDF, we’ll look at a number of normal distributions with different values of $\mu$ and $\sigma^2$.

Let’s start with the distribution of the heights of our female actuaries, which had a mean of 165cm and a standard deviation of 12cm (so variance of $12^2$ cm²), ie $X \sim N(165, 12^2)$:

[Figure: PDF of N(165,12²), f(x) plotted for x from 110 to 220]

We can see that the PDF is greatest around the mean of 165cm and decreases as it moves further and further away from the mean.
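To see these numbers for yourself, the PDF can be evaluated directly. The sketch below is purely illustrative (the function name `normal_pdf` and the chosen heights are our own) and simply codes up the formula for $f(x)$ with $\mu = 165$ and $\sigma = 12$.

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2) evaluated at x."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Heights one and two standard deviations either side of the mean of 165cm.
for height in (141, 153, 165, 177, 189):
    print(height, round(normal_pdf(height, 165, 12), 5))
```

The largest value is printed for 165, and the values for 153 and 177 (and for 141 and 189) match, reflecting the symmetry about the mean.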



So what happens if we increase the mean from 165cm to 175cm? Well, we would expect the female actuaries to be taller on average:

[Figure: PDFs of N(165,12²) and N(175,12²), f(x) plotted for x from 110 to 220]

As you can see, the graph is simply shifted to the right, as all female actuaries are now 10 cm taller on average. Since we didn’t alter the spread (the variance) the shape is still the same. Similarly, if we were to reduce the mean from 165 cm to 150 cm we would see that the graph shifts to the left. All the female actuaries are 15 cm shorter on average. Again, the shape is the same, as we didn’t change the variance.

[Figure: PDFs of N(165,12²) and N(150,12²), f(x) plotted for x from 110 to 220]

In summary, changing the mean, $\mu$, simply moves the position of the normal PDF along the x-axis; it doesn’t change the shape of the PDF.



So what happens if we increase the variance from $12^2$ cm² to $15^2$ cm²? Well, we would expect the heights of the female actuaries to be more spread out:

[Figure: PDFs of N(165,12²) and N(165,15²), f(x) plotted for x from 110 to 220]

Here we can see that the heights of the actuaries are spread over more values than before. Consequently, there is a smaller percentage of people in the middle due to the greater variety in the heights.

If we now reduce the variance from $12^2$ cm² to $10^2$ cm², we should see a smaller range of heights. Consequently, there will be a larger percentage of people in the middle as more people have a similar height:

[Figure: PDFs of N(165,12²) and N(165,10²), f(x) plotted for x from 110 to 220]

In summary, changing the variance, $\sigma^2$, alters the shape of the PDF of the normal distribution. A smaller variance ‘squashes’ it, whereas a larger variance ‘stretches’ it.



One last thing to note is that nearly all (in fact 99.7%) of the results are within 3 standard deviations of the mean. So for our female actuaries, with $N(165, 12^2)$, nearly all the heights are between $(165 - 3 \times 12,\ 165 + 3 \times 12) = (129, 201)$:

[Figure: PDF of N(165,12²) with the central 99.7% of the area shaded, from 3 standard deviations below the mean to 3 standard deviations above]
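The 99.7% figure can be checked by adding up the area under the PDF between 129 and 201. The sketch below does this with a simple midpoint rule; the function names and the number of steps are our own choices, so treat it as a rough numerical illustration and not as an exam method.

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(lower, upper, mu, sigma, steps=10_000):
    """Approximate the integral of the PDF from lower to upper (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        midpoint = lower + (i + 0.5) * width
        total += normal_pdf(midpoint, mu, sigma) * width
    return total

area = area_under_pdf(129, 201, mu=165, sigma=12)
print(round(area, 4))  # 0.9973
```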

Question 11.1

For each part sketch the two graphs on the same diagram:

(i) $X \sim N(50, 5^2)$ and $Y \sim N(50, 10^2)$

(ii) $X \sim N(50, 5^2)$ and $W \sim N(60, 5^2)$.

Recall that one of the properties of a PDF is that the area under its graph equals 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

The proof of this result for the normal distribution PDF is way off the syllabus! The usual proof needs a double integral and a change to polar coordinates, which you might have met if you studied Maths at university.




1.3 Moments of the normal distribution

The moments are easily defined for $X \sim N(\mu, \sigma^2)$ as the mean is just $\mu$ and the variance is just $\sigma^2$!

$$E(X) = \mu$$

$$\text{var}(X) = \sigma^2$$

Question 11.2

What is $E(X^2)$ for $X \sim N(\mu, \sigma^2)$?

From the graph of the PDF of a normal distribution, it is clear that the mode is the same as the mean. Also, by symmetry the median is the same as the mean.

[Figure: normal PDF with the median and mode marked at the mean]

1.4 Probabilities of a normal distribution

To calculate probabilities for a continuous random variable we just integrate the PDF:

$$P(a < X < b) = \int_a^b f(x)\,dx$$

However, as mentioned earlier, integrating the PDF of a normal distribution is a nightmare, so we need another way. We could use Tables, but with so many combinations of $\mu$ and $\sigma^2$ which one do we tabulate? It turns out that we only need $N(0,1)$.



2 The standard normal distribution

The standard normal distribution is a special case of the normal distribution with a mean of 0 and a variance of 1. We use the letter Z to stand for a random variable with a standard normal distribution:

$$Z \sim N(0,1)$$

The PDF of the standard normal distribution is obtained by simply substituting $\mu = 0$ and $\sigma^2 = 1$ into the normal PDF. This gives:

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2} \qquad -\infty < z < \infty$$

The standard normal PDF is often denoted $\phi(z)$. It has the following shape:

[Figure: PDF of the standard normal distribution, plotted for values from -4 to 4]

We can see that the PDF is symmetrical about zero and that nearly all of the values lie within 3 standard deviations of the mean, ie between $(0 - 3 \times 1,\ 0 + 3 \times 1) = (-3, 3)$.

The moments are:

$$E(Z) = 0$$

$$\text{var}(Z) = 1$$

The median is the same as the mean (by symmetry), as is the mode.



3 Probabilities of the standard normal distribution

The standard normal distribution is the only normal distribution that is tabulated. In this section, we will look at how to find probabilities using this table.

3.1 Simple probabilities

The cumulative distribution function of the standard normal is tabulated on pages 160-161 of the Tables. If you have yet to purchase your copy of the Tables, this table has been included in Appendix A. Recall that the cumulative distribution function, $F(x)$, was defined to be:

$$F(x) = P(X < x)$$

ie the probability that the random variable is less than x. The standard normal cumulative distribution function is often denoted $\Phi(z)$:

$$\Phi(z) = P(Z < z)$$

Notice how the table only gives ‘less than’ probabilities for positive values of z. Since the area under the curve represents the probability, this is shown as:

[Figure: standard normal PDF with the area to the left of z shaded, Φ(z) = P(Z < z)]

So for probabilities other than $P(Z < z)$ we will have to be a bit cunning! We will now go through all four of the possible combinations that could be asked.



Calculating $P(Z < a)$

Firstly, we’ll find the probability that Z (the standard normal random variable) is less than a positive number, say, a:

[Figure: standard normal PDF with the area to the left of a shaded, Φ(a) = P(Z < a)]

Well, these probabilities are the ones tabulated! So we simply read the values from the table. For example:

$$P(Z < 1.39) = 0.91774$$

1.39 is the value of x given (in bold) in the columns of the table on pages 160-161 of the Tables (or in the table in Appendix A). The probability is given by $\Phi(x)$, ie the next column (which is not bold).

Question 11.3

Find these probabilities:

(i) $P(Z < 1.23)$

(ii) $P(Z < 2.725)$.



Calculating $P(Z > a)$

So how do we find the probability that Z (the standard normal random variable) is more than a positive number, say, a?

[Figure: standard normal PDF with the area to the right of a shaded, P(Z > a)]

Remember, from Chapter 8, that the whole area under the graph (ie the total probability) is 1. This gives the following relationship:

$$P(Z < a) + P(Z > a) = 1$$

Therefore:

$$P(Z > a) = 1 - P(Z < a)$$

So all we do is look up the ‘less than’ probability given in the Tables and then subtract it from 1. For example:

$$P(Z > 0.26) = 1 - P(Z < 0.26) = 1 - 0.60257 = 0.39743$$

Question 11.4

Find these probabilities:

(i) $P(Z > 2.17)$

(ii) $P(Z > 0.08)$.

Using the notation $\Phi(z) = P(Z < z)$, we could write ‘more than’ probabilities as:

$$P(Z > a) = 1 - \Phi(a)$$



Calculating $P(Z < -a)$

Now we’ll find the probability that Z (the standard normal random variable) is less than a negative number, say, $-a$:

[Figure: standard normal PDF with the area to the left of -a shaded, P(Z < -a)]

The problem is that negative values are not tabulated. So how do we do it? We use the fact that the normal distribution is symmetrical. By symmetry, the area shaded in the diagram above is exactly the same as the area shaded in the diagram below:

[Figure: standard normal PDF with the area to the right of a shaded, P(Z > a)]

This area corresponds to the probability that Z (the standard normal random variable) is greater than the positive number a:

$$P(Z < -a) = P(Z > a)$$

This is handy! We just worked out how to find ‘more than’ probabilities on the previous page. So:

$$P(Z < -a) = P(Z > a) = 1 - P(Z < a)$$

For example, to find:

$$P(Z < -3.08)$$

We first use symmetry:

$$P(Z < -3.08) = P(Z > 3.08)$$

We now use the method for calculating ‘more than’ probabilities:

$$P(Z > 3.08) = 1 - P(Z < 3.08) = 1 - 0.99896 = 0.00104$$

Perhaps the easiest way to remember the symmetry result (other than drawing a diagram) is to note that we swap the sign of the inequality (from ‘less than’ to ‘more than’) and we swap the sign of the number (from negative to positive). So in short we ‘swap the sign and swap the sign’.

Question 11.5

Find these probabilities:

(i) $P(Z < -1.50)$

(ii) $P(Z < -0.21)$

(iii) $P(Z < -2.05)$.

Using the notation $\Phi(z) = P(Z < z)$, we could write these probabilities as:

$$P(Z < -a) = \Phi(-a)$$

We can then write the relationship as:

$$\Phi(-a) = 1 - \Phi(a)$$

ie:

$$P(Z < -a) = 1 - \Phi(a)$$



Calculating $P(Z > -a)$

Now we’ll find the probability that Z (the standard normal random variable) is greater than a negative number, say, $-a$:

[Figure: standard normal PDF with the area to the right of -a shaded, P(Z > -a)]

Again, since negative values are not tabulated we use the fact that the normal distribution is symmetrical. By symmetry, the area shaded in the diagram above is exactly the same as the area shaded in the diagram below:

[Figure: standard normal PDF with the area to the left of a shaded, P(Z < a)]

This area corresponds to the probability that Z (the standard normal random variable) is less than the positive number a:

$$P(Z > -a) = P(Z < a)$$

Hey, this is the probability that’s tabulated! So we can just read this probability from the Tables.

For example, to find:

$$P(Z > -2.62)$$

First we use symmetry:

$$P(Z > -2.62) = P(Z < 2.62)$$

We now simply read this probability from the Tables:

$$P(Z < 2.62) = 0.99560$$

Once again we can remember the symmetry result by noting that we swap the sign of the inequality (from ‘more than’ to ‘less than’) and we swap the sign of the number (from negative to positive). So in short we ‘swap the sign and swap the sign’.

Question 11.6

Find these probabilities:

(i) $P(Z > -3.94)$

(ii) $P(Z > -0.73)$.

Using the notation $\Phi(z) = P(Z < z)$, we could write these probabilities as:

$$P(Z > -a) = \Phi(a)$$



Summary

We have now met all four of the probabilities that you could be asked:

Important result

$P(Z < a)$: read off the Tables.

$P(Z > a) = 1 - P(Z < a)$, since area (and total probability) equals 1.

$P(Z < -a) = P(Z > a)$, using symmetry (swap the sign and swap the sign)
$\phantom{P(Z < -a)} = 1 - P(Z < a)$, since area (and total probability) equals 1.

$P(Z > -a) = P(Z < a)$, using symmetry (swap the sign and swap the sign).
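If you want to check table look-ups on a computer, the four cases can be sketched in Python. This is an illustration only: the function name `phi` is our own, and it computes $\Phi(z)$ from the error function in the standard `math` module instead of reading it from the Tables, so the last decimal place can differ slightly from a table-based answer.

```python
import math

def phi(z):
    """Phi(z) = P(Z < z) for Z ~ N(0,1), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_less_pos = phi(1.39)        # P(Z < 1.39): read straight off the table
p_more_pos = 1 - phi(0.26)    # P(Z > 0.26) = 1 - P(Z < 0.26)
p_less_neg = 1 - phi(3.08)    # P(Z < -3.08) = P(Z > 3.08) = 1 - P(Z < 3.08)
p_more_neg = phi(2.62)        # P(Z > -2.62) = P(Z < 2.62)

for p in (p_less_pos, p_more_pos, p_less_neg, p_more_neg):
    print(round(p, 5))
```

The symmetry rule can also be seen directly: `phi(-3.08)` and `1 - phi(3.08)` give the same value.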

Since we will be using the normal distribution more than any other, it is vital that you can calculate these probabilities quickly. Take the time to learn the methods above. Try to do the following question without referring to the summary above.

Question 11.7

Find these probabilities:

(i) $P(Z > 3.6)$

(ii) $P(Z > -0.76)$

(iii) $P(Z < 1.98)$

(iv) $P(Z < -2.41)$.



3.2 Compound probabilities

We can now calculate ‘single’ probabilities, like $P(Z < 3)$ or $P(Z > -1)$, so how can we use this to calculate ‘compound’ probabilities like $P(-1 < Z < 3)$?

The easiest way to see how to calculate these probabilities is to look at a diagram:

[Figure: standard normal PDF with the area between -1 and 3 shaded]

The probability $P(-1 < Z < 3)$ is the shaded area between $-1$ and 3. How can we get this from ‘single’ probabilities? Imagine that the normal distribution is a piece of paper that we are cutting out and we want to just leave the shaded part. It can be made by starting with the $P(Z < 3)$ part and subtracting the $P(Z < -1)$ part:

[Figure: area to the left of 3, minus area to the left of -1, equals area between -1 and 3]

$$P(Z < 3) - P(Z < -1) = P(-1 < Z < 3)$$

So, in general we have:

Important result

$$P(a < Z < b) = P(Z < b) - P(Z < a) \quad \text{for } a < b$$

So to calculate a ‘compound’ probability, we first split it up into two ‘single’ probabilities. We then calculate each of those ‘single’ probabilities as before.



Completing our example, we get:

$$P(-1 < Z < 3) = P(Z < 3) - P(Z < -1)$$

We can read $P(Z < 3)$ directly from the Tables:

$$P(Z < 3) = 0.99865$$

To get $P(Z < -1)$, we need to use symmetry to get a positive value and then with a little rearranging we get $P(Z < 1)$ which we can read from the Tables:

$$P(Z < -1) = P(Z > 1) = 1 - P(Z < 1) = 1 - 0.84134 = 0.15866$$

Now we substitute these two answers back into our original equation:

$$P(-1 < Z < 3) = P(Z < 3) - P(Z < -1) = 0.99865 - 0.15866 = 0.83999$$
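The ‘compound’ rule is easy to sketch in code. In the illustration below, `phi` and `prob_between` are names of our own, and $\Phi$ is computed from the error function and not read from the Tables.

```python
import math

def phi(z):
    """Phi(z) = P(Z < z) for Z ~ N(0,1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a), for a < b."""
    return phi(b) - phi(a)

p = prob_between(-1, 3)
print(round(p, 5))  # 0.83999
```

Note the minus sign is applied to a ‘less than’ probability in both terms, which is exactly the point of the rule.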

These questions are common, so it is vital that you can calculate these probabilities quickly. It just takes a little practice…

Question 11.8

Find these probabilities:

(i) $P(1.24 < Z < 2.19)$

(ii) $P(-0.92 < Z < 0.83)$

(iii) $P(-2.92 < Z < -1.67)$.

Common Error: Many students calculate $P(a < Z < b) = P(Z < b) - P(Z > a)$ instead.



3.3 Probabilities involving interpolation

We will look at one last thing before we move on to worrying about how we can calculate probabilities for any normal distribution other than the standard normal! How do we calculate probabilities like $P(Z < 0.426)$?

The Tables only have $P(Z < 0.42)$ and $P(Z < 0.43)$ tabulated. So how do we get a probability for a value that is in between these? We can use linear interpolation like we did in Chapter 2 (when we were calculating the position of the median in a group). Linear interpolation assumes that the probabilities increase linearly between values. This means that we can use proportions to find our ‘in between’ probability. Consider a number line:

[Figure: number line with values 0.42, 0.426 and 0.43 matched to probabilities 0.66276, p and 0.66640]

The proportion of the ‘length’ that 0.426 is between 0.42 and 0.43:

$$\frac{0.426 - 0.42}{0.43 - 0.42} = 0.6$$

The proportion of the ‘length’ that the probability, p, is between 0.66276 and 0.66640:

$$\frac{p - 0.66276}{0.66640 - 0.66276}$$

For linear interpolation, we assume that the probabilities are spread out linearly between values. This means that the proportions will be equal. Hence:

$$\frac{p - 0.66276}{0.66640 - 0.66276} = 0.6$$

Rearranging this we get:

$$p = 0.66276 + 0.6 \times (0.66640 - 0.66276) = 0.66494$$



You may think “is it worth it?” Well if we had just rounded 0.426 to 0.43, our answer would have been 66.64% instead of 66.49%. This is fairly serious. Not convinced? How about if I told you that all of the examiners’ solutions to Subject CT3 use interpolation? So now you’re convinced! So we need a quick way of getting to the last line. The first thing to notice is that the proportion 0.6 is just the third decimal place of 0.426. So had we been finding the probability for 1.284 we would have used a proportion of 0.4. In general, we have:

$$p = (\text{start probability}) + 0.d \times (\text{difference between probabilities})$$

where the proportion $0.d$ just uses the 3rd decimal place (onwards) of the value.

We’ll use this shortcut method to calculate a slightly messier probability:

$$P(Z < -1.793)$$

First we need to change it into a probability we can read from the Tables. Using the ‘swap the sign, swap the sign’ symmetry rule and the ‘more than’ rule gives:

$$P(Z < -1.793) = P(Z > 1.793) = 1 - P(Z < 1.793)$$

Now we look up the probabilities each side of 1.793:

$$P(Z < 1.79) = 0.96327 \quad \text{and} \quad P(Z < 1.80) = 0.96407$$

Now we use our linear interpolation rule, with the third decimal place of 1.793 giving a proportion of 0.3:

$$P(Z < 1.793) = 0.96327 + 0.3 \times (0.96407 - 0.96327) = 0.96351$$

So we have:

$$P(Z < -1.793) = 1 - 0.96351 = 0.03649$$
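The interpolation shortcut is a one-line calculation, sketched below in Python. The function name `interpolate` is our own, and the table values are typed in by hand from the two worked examples above.

```python
def interpolate(lower_prob, upper_prob, proportion):
    """Start probability plus proportion times the difference between probabilities."""
    return lower_prob + proportion * (upper_prob - lower_prob)

# P(Z < 0.426): table values for 0.42 and 0.43, proportion 0.6.
p_0426 = interpolate(0.66276, 0.66640, 0.6)

# P(Z < 1.793): table values for 1.79 and 1.80, proportion 0.3.
p_1793 = interpolate(0.96327, 0.96407, 0.3)

print(round(p_0426, 5), round(p_1793, 5), round(1 - p_1793, 5))
```

The final printed value is the ‘less than a negative number’ probability $P(Z < -1.793)$ obtained after interpolating.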



Question 11.9

Calculate these probabilities using linear interpolation:

(i) $P(Z < 1.048)$

(ii) $P(Z > 0.271)$

(iii) $P(Z > -2.389)$

(iv) $P(-0.704 < Z < 0.897)$.



4 Probabilities for any normal distribution

So far we have only calculated probabilities for the standard normal distribution, $Z \sim N(0,1)$. How on earth is this helpful for calculating probabilities for any other normal distribution?

4.1 Transforming normal distributions

Well, we use the fact that a linear function of a normal distribution is also a normal distribution. For example, if X has a normal distribution, say $X \sim N(50, 3^2)$, then $2X + 10$ also has a normal distribution:

[Figure: PDFs of X and 2X+10, f(x) plotted for x from 30 to 140]

All we need to do now is to identify the new mean and variance of our transformed function. First let’s look at how the mean has changed. Recall from Chapter 9 the following result for any distribution:

$$E(aX + b) = aE(X) + b$$

ie if we multiply the random variable by a and then add b, the mean of the random variable is also multiplied by a with b added on.

Now we had $X \sim N(50, 3^2)$ with $E(X) = 50$, so $2X + 10$ should have a mean of:

$$2E(X) + 10 = 2 \times 50 + 10 = 110$$



We can see on the diagram that $2X + 10$ does indeed have a mean of 110. Next let’s look at how the variance has changed. Recall from Chapter 9 the following result for any distribution:

$$\text{var}(aX + b) = a^2\,\text{var}(X)$$

ie if we multiply the random variable by a and then add b, the variance of the random variable is multiplied by $a^2$.

Now we had $X \sim N(50, 3^2)$ with $\text{var}(X) = 3^2$, so $2X + 10$ should have a variance of:

$$2^2\,\text{var}(X) = 2^2 \times 3^2 = 6^2$$

We can see this on the diagram (remembering that the majority of the values of the normal distribution are spread over 3 standard deviations each side of the mean). $2X + 10$ is spread over $(110 - 3 \times 6,\ 110 + 3 \times 6) = (92, 128)$.

So in general, we have:

Important result

If $X \sim N(\mu, \sigma^2)$ then $aX + b \sim N(a\mu + b,\ a^2\sigma^2)$.

The formal proof of this result requires ‘moment generating functions’, which are covered in Chapter 5 of the Subject CT3 notes.
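The result can be turned into a tiny helper that returns the new mean and standard deviation. This is just a sketch; the function name `transform_normal` is our own, and it works with the standard deviation $\sigma$ (not the variance) as its input.

```python
def transform_normal(mu, sigma, a, b):
    """Return (mean, standard deviation) of aX + b when X ~ N(mu, sigma^2)."""
    new_mean = a * mu + b
    new_sd = abs(a) * sigma   # variance is a^2 * sigma^2, so sd is |a| * sigma
    return new_mean, new_sd

mean, sd = transform_normal(50, 3, a=2, b=10)
print(mean, sd)                        # 110 6
print(mean - 3 * sd, mean + 3 * sd)    # 92 128
```

The second line of output is the ‘3 standard deviations each side’ range seen on the diagram for $2X + 10$.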

Question 11.10

If $X \sim N(100, 4^2)$, write down the distribution of:

(i) $3X + 5$ (ii) $\tfrac{1}{2}X - 20$ (iii) $\tfrac{1}{4}(X - 100)$.



4.2 Standardising

We now know how to change one type of normal distribution into another normal distribution. So how does this help us find probabilities for any normal distribution? What we are going to do is transform our normal distribution into the standard normal distribution. We can then look up the probabilities in the Tables. So how do we do it?

Let’s take our normal from before, $X \sim N(50, 3^2)$. We want to change this into the standard normal $Z \sim N(0,1)$. First we want to change the mean from 50 to 0. The easiest way to do this is to subtract 50:

[Figure: PDFs of X and X-50, f(x) plotted for x from -20 to 70]

We had $X \sim N(50, 3^2)$ with $E(X) = 50$, so $X - 50$ has a mean of:

$$E(X - 50) = E(X) - 50 = 50 - 50 = 0$$

Next we want to change the variance from $3^2$ to 1. Since only multiplying or dividing affects the variance, we need to divide the random variable by 3:

$$\text{var}\left(\tfrac{1}{3}X\right) = \left(\tfrac{1}{3}\right)^2 \text{var}(X) = \left(\tfrac{1}{3}\right)^2 \times 3^2 = 1$$

Now do we divide by the 3 before or after subtracting the 50? If we did it before we would have a mean of:

$$E\left(\tfrac{1}{3}X - 50\right) = \tfrac{1}{3}E(X) - 50 = \tfrac{1}{3} \times 50 - 50 = -33\tfrac{1}{3} \qquad \text{No!}$$

Whereas if we did it after we would have a mean of:

$$E\left[\tfrac{1}{3}(X - 50)\right] = \tfrac{1}{3}E(X - 50) = \tfrac{1}{3}\left[E(X) - 50\right] = \tfrac{1}{3}(50 - 50) = 0 \qquad \text{Yes!}$$

[Figure: PDFs of X and (X-50)/3, f(x) plotted for x from -20 to 70]

Although it’s squashed because of the scale, we can see that we have now transformed our normal distribution into the standard normal distribution. This process is called standardising the normal distribution.

How do we do this in general? Well for $X \sim N(50, 3^2)$ we took away the mean, 50, first and then we divided it by 3 which was the standard deviation.

So for $X \sim N(\mu, \sigma^2)$ we would subtract the mean, $\mu$, first and then divide by the standard deviation, $\sigma$, second:

$$\frac{1}{\sigma}(X - \mu)$$

This gives us the standard normal distribution, $Z \sim N(0,1)$.

Standardising

If $X \sim N(\mu, \sigma^2)$ and $Z = \dfrac{X - \mu}{\sigma}$ then $Z \sim N(0,1)$.



4.3 Calculating probabilities for any normal distribution

We can now use this idea of subtracting the mean and dividing by the standard deviation to change the probability for any normal distribution into a probability for a standard normal distribution.

Taking our normal distribution of $X \sim N(50, 3^2)$, let’s find $P(X > 56)$.

First we subtract the mean of 50 from both sides:

$$P(X > 56) = P(X - 50 > 6)$$

Next we divide both sides by the standard deviation of 3:

$$P(X > 56) = P\left(\frac{X - 50}{3} > 2\right)$$

Now we use the fact that when we subtract the mean and divide by the standard deviation we get a standard normal distribution:

$$P(X > 56) = P(Z > 2)$$

We can now look this probability up in our standard normal table:

$$P(Z > 2) = 1 - P(Z < 2) = 1 - 0.97725 = 0.02275$$

So we have:

$$P(X > 56) = 0.02275$$

This may seem a little longwinded at the moment (and more so when we need to use interpolation) but it does get quicker with a little practice. Usually we jump straight to:

$$P(X > 56) = P\left(Z > \frac{56 - 50}{3}\right) = P(Z > 2)$$



Because this is the method for calculating probabilities, we’ll work through another example just to make sure it’s totally clear:

If $X \sim N(15, 25)$, calculate:

(i) $P(X < 18)$ (ii) $P(X > 11.8)$ (iii) $P(9.8 < X < 18.2)$

Working through each of these in turn:

(i) $$P(X < 18) = P\left(Z < \frac{18 - 15}{\sqrt{25}}\right) = P(Z < 0.6) = 0.72575$$

This value is read directly from the Tables.

(ii) $$P(X > 11.8) = P\left(Z > \frac{11.8 - 15}{\sqrt{25}}\right) = P(Z > -0.64)$$

Since we have a negative number we use ‘swap the sign, swap the sign’. We can then read the resulting probability directly from the Tables:

$$P(X > 11.8) = P(Z > -0.64) = P(Z < 0.64) = 0.73891$$

(iii) We first need to split up this ‘compound’ probability:

$$P(9.8 < X < 18.2) = P(X < 18.2) - P(X < 9.8)$$

Now we standardise and calculate each part:

$$P(X < 18.2) = P\left(Z < \frac{18.2 - 15}{\sqrt{25}}\right) = P(Z < 0.64) = 0.73891$$

This value is read directly from the Tables.

$$P(X < 9.8) = P\left(Z < \frac{9.8 - 15}{\sqrt{25}}\right) = P(Z < -1.04)$$

Using the ‘swap the sign, swap the sign’ rule and the fact that less than and more than probabilities sum to 1 we get:

$$P(Z < -1.04) = P(Z > 1.04) = 1 - P(Z < 1.04) = 1 - 0.85083 = 0.14917$$

Hence:

$$P(9.8 < X < 18.2) = 0.73891 - 0.14917 = 0.58974$$
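The whole ‘standardise, then look up’ routine can be sketched in a few lines of Python. The function name `normal_cdf` is our own, and $\Phi$ is computed from the error function instead of the Tables, so answers may differ from table-based ones in the last decimal place. Notice that the function takes the standard deviation, so we pass $\sqrt{25}$ and not 25.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2): standardise, then evaluate Phi."""
    z = (x - mu) / sigma   # subtract the mean, divide by the standard deviation
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# X ~ N(50, 3^2)
p_more_56 = 1 - normal_cdf(56, 50, 3)

# X ~ N(15, 25): the standard deviation is sqrt(25) = 5
mu, sigma = 15, math.sqrt(25)
p_i = normal_cdf(18, mu, sigma)
p_ii = 1 - normal_cdf(11.8, mu, sigma)
p_iii = normal_cdf(18.2, mu, sigma) - normal_cdf(9.8, mu, sigma)

for p in (p_more_56, p_i, p_ii, p_iii):
    print(round(p, 5))
```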

Question 11.11

If $X \sim N(100, 16)$, calculate these probabilities using standardisation:

(i) $P(X > 110)$

(ii) $P(X > 87)$

(iii) $P(95 < X < 107)$.

Common Error: Many students divide by the variance instead of the standard deviation.

This next question involves interpolation as well as standardising:

Question 11.12

The heights of male actuaries are normally distributed with mean 178 cm and variance 250 cm². Calculate the probability that a randomly chosen male actuary has height: (i) less than 186 cm (ii) more than 160 cm (iii) between 150 cm and 190 cm.




This last section covers two types of messy questions involving normal distribution probabilities.

Conditional probabilities

We can work out conditional probabilities using the conditional probability formula from Chapter 5:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

For example, to calculate $P(X < 3 \mid X < 6)$ where $X \sim N(5, 4)$, we get:

$$P(X < 3 \mid X < 6) = \frac{P(X < 3 \text{ and } X < 6)}{P(X < 6)} = \frac{P(X < 3)}{P(X < 6)}$$

We now calculate these probabilities as before and then substitute them into this.

$$P(X < 3) = P\left(Z < \frac{3 - 5}{\sqrt{4}}\right) \qquad \text{standardising}$$

$$= P(Z < -1)$$

$$= P(Z > 1) \qquad \text{by symmetry}$$

$$= 1 - P(Z < 1) \qquad \text{using area} = 1$$

$$= 1 - 0.84134 \qquad \text{from tables}$$

$$= 0.15866$$

$$P(X < 6) = P\left(Z < \frac{6 - 5}{\sqrt{4}}\right) \qquad \text{standardising}$$

$$= P(Z < 0.5)$$

$$= 0.69146 \qquad \text{from tables}$$

Hence:

$$P(X < 3 \mid X < 6) = \frac{0.15866}{0.69146} = 0.2295$$
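The conditional probability example can be checked in the same way. In the sketch below, `normal_cdf` is a helper name of our own that standardises and then evaluates $\Phi$ using the error function.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 5, math.sqrt(4)   # X ~ N(5, 4), so the standard deviation is 2

# "X < 3 and X < 6" is just "X < 3", so the numerator is P(X < 3).
p_conditional = normal_cdf(3, mu, sigma) / normal_cdf(6, mu, sigma)
print(round(p_conditional, 4))
```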



Modulus

The modulus function, $|x|$, gives the size of x ignoring its sign, so the result is never negative:

$$|x| = \begin{cases} x & x \ge 0 \\ -x & x < 0 \end{cases}$$

So if we had:

$$|x| < 3$$

this means that either:

$x < 3$ if x was positive

or:

$-x < 3 \Rightarrow x > -3$ if x was negative

So $|x| < 3$ corresponds to the following range:

$$-3 < x < 3$$

If we are asked to calculate a probability involving a modulus, we rewrite it as a range:

$$P(|X| < 5) = P(-5 < X < 5)$$

$$P(|X - 3| < 10) = P(-10 < X - 3 < 10) = P(-7 < X < 13)$$

We can then calculate these in the normal (no pun intended) way.
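As a sketch of how the rewritten range is then used, the code below calculates $P(|X - 3| < 10)$. The text does not give a distribution for this example, so $X \sim N(3, 5^2)$ is purely a hypothetical choice for illustration, as are the function names.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_modulus_less(centre, k, mu, sigma):
    """P(|X - centre| < k) = P(centre - k < X < centre + k)."""
    lower, upper = centre - k, centre + k
    return normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)

# Hypothetical distribution (not from the text): X ~ N(3, 5^2).
p = prob_modulus_less(centre=3, k=10, mu=3, sigma=5)   # P(-7 < X < 13)
print(round(p, 4))
```

With this particular choice the range is 2 standard deviations either side of the mean, so the answer is roughly 95%.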

Question 11.13

Calculate $P(|X - 5| < 2)$ where $X \sim N(6, 1)$.



z      Φ(z)       z      Φ(z)       z      Φ(z)
2.80   0.99744    3.20   0.99931    3.60   0.99984
2.81   0.99752    3.21   0.99934    3.61   0.99985
2.82   0.99760    3.22   0.99936    3.62   0.99985
2.83   0.99767    3.23   0.99938    3.63   0.99986
2.84   0.99774    3.24   0.99940    3.64   0.99986
2.85   0.99781    3.25   0.99942    3.65   0.99987
2.86   0.99788    3.26   0.99944    3.66   0.99987
2.87   0.99795    3.27   0.99946    3.67   0.99988
2.88   0.99801    3.28   0.99948    3.68   0.99988
2.89   0.99807    3.29   0.99950    3.69   0.99989
2.90   0.99813    3.30   0.99952    3.70   0.99989
2.91   0.99819    3.31   0.99953    3.71   0.99990
2.92   0.99825    3.32   0.99955    3.72   0.99990
2.93   0.99831    3.33   0.99957    3.73   0.99990
2.94   0.99836    3.34   0.99958    3.74   0.99991
2.95   0.99841    3.35   0.99960    3.75   0.99991
2.96   0.99846    3.36   0.99961    3.76   0.99992
2.97   0.99851    3.37   0.99962    3.77   0.99992
2.98   0.99856    3.38   0.99964    3.78   0.99992
2.99   0.99861    3.39   0.99965    3.79   0.99992
3.00   0.99865    3.40   0.99966    3.80   0.99993
3.01   0.99869    3.41   0.99968    3.81   0.99993
3.02   0.99874    3.42   0.99969    3.82   0.99993
3.03   0.99878    3.43   0.99970    3.83   0.99994
3.04   0.99882    3.44   0.99971    3.84   0.99994
3.05   0.99886    3.45   0.99972    3.85   0.99994
3.06   0.99889    3.46   0.99973    3.86   0.99994
3.07   0.99893    3.47   0.99974    3.87   0.99995
3.08   0.99896    3.48   0.99975    3.88   0.99995
3.09   0.99900    3.49   0.99976    3.89   0.99995
3.10   0.99903    3.50   0.99977    3.90   0.99995
3.11   0.99906    3.51   0.99978    3.91   0.99995
3.12   0.99910    3.52   0.99978    3.92   0.99996
3.13   0.99913    3.53   0.99979    3.93   0.99996
3.14   0.99916    3.54   0.99980    3.94   0.99996
3.15   0.99918    3.55   0.99981    3.95   0.99996
3.16   0.99921    3.56   0.99981    3.96   0.99996
3.17   0.99924    3.57   0.99982    3.97   0.99996
3.18   0.99926    3.58   0.99983    3.98   0.99997
3.19   0.99929    3.59   0.99983    3.99   0.99997
3.20   0.99931    3.60   0.99984    4.00   0.99997

(z

) 0.

9918

0 0.

9920

2 0.

9922

4 0.

9924

5 0.

9926

6 0.

9928

6 0.

9930

5 0.

9932

4 0.

9934

3 0.

9936

1 0.

9937

9 0.

9939

6 0.

9941

3 0.

9943

0 0.

9944

6 0.

9946

1 0.

9947

7 0.

9949

2 0.

9950

6 0.

9952

0 0.

9953

4 0.

9954

7 0.

9956

0 0.

9957

3 0.

9958

5 0.

9959

8 0.

9960

9 0.

9962

1 0.

9963

2 0.

9964

3 0.

9965

3 0.

9966

4 0.

9967

4 0.

9968

3 0.

9969

3 0.

9970

2 0.

9971

1 0.

9972

0 0.

9972

8 0.

9973

6 0.

9974

4

z 2.40

2.

41

2.42

2.

43

2.44

2.

45

2.46

2.

47

2.48

2.

49

2.50

2.

51

2.52

2.

53

2.54

2.

55

2.56

2.

57

2.58

2.

59

2.60

2.

61

2.62

2.

63

2.64

2.

65

2.66

2.

67

2.68

2.

69

2.70

2.

71

2.72

2.

73

2.74

2.

75

2.76

2.

77

2.78

2.

79

2.80

(z

) 0.

9772

5 0.

9777

8 0.

9783

1 0.

9788

2 0.

9793

2 0.

9798

2 0.

9803

0 0.

9807

7 0.

9812

4 0.

9816

9 0.

9821

4 0.

9825

7 0.

9830

0 0.

9834

1 0.

9838

2 0.

9842

2 0.

9846

1 0.

9850

0 0.

9853

7 0.

9857

4 0.

9861

0 0.

9864

5 0.

9867

9 0.

9871

3 0.

9874

5 0.

9877

8 0.

9880

9 0.

9884

0 0.

9887

0 0.

9889

9 0.

9892

8 0.

9895

6 0.

9898

3 0.

9901

0 0.

9903

6 0.

9906

1 0.

9908

6 0.

9911

1 0.

9913

4 0.

9915

8 0.

9918

0

z 2.00

2.

01

2.02

2.

03

2.04

2.

05

2.06

2.

07

2.08

2.

09

2.10

2.

11

2.12

2.

13

2.14

2.

15

2.16

2.

17

2.18

2.

19

2.20

2.

21

2.22

2.

23

2.24

2.

25

2.26

2.

27

2.28

2.

29

2.30

2.

31

2.32

2.

33

2.34

2.

35

2.36

2.

37

2.38

2.

39

2.40

(z

) 0.

9452

0 0.

9463

0 0.

9473

8 0.

9484

5 0.

9495

0 0.

9505

3 0.

9515

4 0.

9525

4 0.

9535

2 0.

9544

9 0.

9554

3 0.

9563

7 0.

9572

8 0.

9581

8 0.

9590

7 0.

9599

4 0.

9608

0 0.

9616

4 0.

9624

6 0.

9632

7 0.

9640

7 0.

9648

5 0.

9656

2 0.

9663

8 0.

9671

2 0.

9678

4 0.

9685

6 0.

9692

6 0.

9699

5 0.

9706

2 0.

9712

8 0.

9719

3 0.

9725

7 0.

9732

0 0.

9738

1 0.

9744

1 0.

9750

0 0.

9755

8 0.

9761

5 0.

9767

0 0.

9772

5

z 1.60

1.

61

1.62

1.

63

1.64

1.

65

1.66

1.

67

1.68

1.

69

1.70

1.

71

1.72

1.

73

1.74

1.

75

1.76

1.

77

1.78

1.

79

1.80

1.

81

1.82

1.

83

1.84

1.

85

1.86

1.

87

1.88

1.

89

1.90

1.

91

1.92

1.

93

1.94

1.

95

1.96

1.

97

1.98

1.

99

2.00

(z

) 0.

8849

3 0.

8868

6 0.

8887

7 0.

8906

5 0.

8925

1 0.

8943

5 0.

8961

7 0.

8979

6 0.

8997

3 0.

9014

7 0.

9032

0 0.

9049

0 0.

9065

8 0.

9082

4 0.

9098

8 0.

9114

9 0.

9130

8 0.

9146

6 0.

9162

1 0.

9177

4 0.

9192

4 0.

9207

3 0.

9222

0 0.

9236

4 0.

9250

7 0.

9264

7 0.

9278

5 0.

9292

2 0.

9305

6 0.

9318

9 0.

9331

9 0.

9344

8 0.

9357

4 0.

9369

9 0.

9382

2 0.

9394

3 0.

9406

2 0.

9417

9 0.

9429

5 0.

9440

8 0.

9452

0

z 1.20

1.

21

1.22

1.

23

1.24

1.

25

1.26

1.

27

1.28

1.

29

1.30

1.

31

1.32

1.

33

1.34

1.

35

1.36

1.

37

1.38

1.

39

1.40

1.

41

1.42

1.

43

1.44

1.

45

1.46

1.

47

1.48

1.

49

1.50

1.

51

1.52

1.

53

1.54

1.

55

1.56

1.

57

1.58

1.

59

1.60

(z

) 0.

7881

4 0.

7910

3 0.

7938

9 0.

7967

3 0.

7995

5 0.

8023

4 0.

8051

1 0.

8078

5 0.

8105

7 0.

8132

7 0.

8159

4 0.

8185

9 0.

8212

1 0.

8238

1 0.

8263

9 0.

8289

4 0.

8314

7 0.

8339

8 0.

8364

6 0.

8389

1 0.

8413

4 0.

8437

5 0.

8461

4 0.

8484

9 0.

8508

3 0.

8531

4 0.

8554

3 0.

8576

9 0.

8599

3 0.

8621

4 0.

8643

3 0.

8665

0 0.

8686

4 0.

8707

6 0.

8728

6 0.

8749

3 0.

8769

8 0.

8790

0 0.

8810

0 0.

8829

8 0.

8849

3

z 0.80

0.

81

0.82

0.

83

0.84

0.

85

0.86

0.

87

0.88

0.

89

0.90

0.

91

0.92

0.

93

0.94

0.

95

0.96

0.

97

0.98

0.

99

1.00

1.

01

1.02

1.

03

1.04

1.

05

1.06

1.

07

1.08

1.

09

1.10

1.

11

1.12

1.

13

1.14

1.

15

1.16

1.

17

1.18

1.

19

1.20

(z

) 0.

6554

2 0.

6591

0 0.

6627

6 0.

6664

0 0.

6700

3 0.

6736

4 0.

6772

4 0.

6808

2 0.

6843

9 0.

6879

3 0.

6914

6 0.

6949

7 0.

6984

7 0.

7019

4 0.

7054

0 0.

7088

4 0.

7122

6 0.

7156

6 0.

7190

4 0.

7224

0 0.

7257

5 0.

7290

7 0.

7323

7 0.

7356

5 0.

7389

1 0.

7421

5 0.

7453

7 0.

7485

7 0.

7517

5 0.

7549

0 0.

7580

4 0.

7611

5 0.

7642

4 0.

7673

0 0.

7703

5 0.

7733

7 0.

7763

7 0.

7793

5 0.

7823

0 0.

7852

4 0.

7881

4

z 0.40

0.

41

0.42

0.

43

0.44

0.

45

0.46

0.

47

0.48

0.

49

0.50

0.

51

0.52

0.

53

0.54

0.

55

0.56

0.

57

0.58

0.

59

0.60

0.

61

0.62

0.

63

0.64

0.

65

0.66

0.

67

0.68

0.

69

0.70

0.

71

0.72

0.

73

0.74

0.

75

0.76

0.

77

0.78

0.

79

0.80

(z

) 0.

5000

0 0.

5039

9 0.

5079

8 0.

5119

7 0.

5159

5 0.

5199

4 0.

5239

2 0.

5279

0 0.

5318

8 0.

5358

6 0.

5398

3 0.

5438

0 0.

5477

6 0.

5517

2 0.

5556

7 0.

5596

2 0.

5635

6 0.

5674

9 0.

5714

2 0.

5753

5 0.

5792

6 0.

5831

7 0.

5870

6 0.

5909

5 0.

5948

3 0.

5987

1 0.

6025

7 0.

6064

2 0.

6102

6 0.

6140

9 0.

6179

1 0.

6217

2 0.

6255

2 0.

6293

0 0.

6330

7 0.

6368

3 0.

6405

8 0.

6443

1 0.

6480

3 0.

6517

3 0.

6554

2

z 0.00

0.

01

0.02

0.

03

0.04

0.

05

0.06

0.

07

0.08

0.

09

0.10

0.

11

0.12

0.

13

0.14

0.

15

0.16

0.

17

0.18

0.

19

0.20

0.

21

0.22

0.

23

0.24

0.

25

0.26

0.

27

0.28

0.

29

0.30

0.

31

0.32

0.

33

0.34

0.

35

0.36

0.

37

0.38

0.

39

0.40

Ap

pen

dix

A:

Pro

bab

iliti

es f

or

the

stan

dar

d n

orm

al d

istr

ibu

tio

n0

z

I(z

)
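If you ever need a value that isn't in Appendix A, rows of the table can be regenerated on a computer. The sketch below is an illustration only: it assumes Python's standard-library `statistics.NormalDist`, and the helper name `phi_table_row` is made up here. In the exam you will of course be using the printed Tables.

```python
from statistics import NormalDist

def phi_table_row(start, n=10, step=0.01):
    """Return [P(Z < start), P(Z < start + step), ...] rounded to 5 DP."""
    Z = NormalDist()  # standard normal: mean 0, sd 1
    return [round(Z.cdf(start + k * step), 5) for k in range(n)]

# Print the first three rows (z = 0.0, 0.1, 0.2) in the same layout as the table.
for tenth in range(3):
    row = phi_table_row(tenth / 10)
    print(f"{tenth / 10:.1f}", " ".join(f"{p:.5f}" for p in row))
```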






Extra practice questions

Section 1: Features of the normal distribution

P11.1 Sketch a graph of $N(100,100)$.

Section 3: Probabilities of the standard normal distribution

P11.2 Find:

(i) $P(Z > 1.6)$

(ii) $P(Z < 2.84)$

(iii) $P(Z < -0.42)$

(iv) $P(Z > -2.61)$

(v) $P(-1.69 < Z < 0.52)$

(vi) $P(-3.05 < Z < -1.7)$.

P11.3 Calculate the following using interpolation:

(i) $P(Z < 0.371)$

(ii) $P(Z > 2.598)$

(iii) $P(Z > -0.904)$

(iv) $P(Z < -2.319)$

(v) $P(1.572 < Z < 3.087)$

(vi) $P(-1.382 < Z < 0.493)$.



Section 4: Probabilities for any normal distribution

P11.4 Calculate the following, using interpolation:

(i) $P(X > 117)$ where $X \sim N(128, 8^2)$

(ii) $P(Y < 30)$ where $Y \sim N(23, 5^2)$

(iii) $P(W > 297)$ where $W \sim N(285, 34)$

(iv) $P(V < 90)$ where $V \sim N(96, 12)$

(v) $P(3 < T < 8)$ where $T \sim N(5, 2)$.

P11.5 Given that $X \sim N(17, 5)$, evaluate:

(i) $P(|X - 17| < 2)$

(ii) $P(X < 15 \mid X < 16)$.

P11.6 Subject C1, April 1995, Q4 (adapted)

The sizes of claims, which arise under policies of a certain type, are normally distributed with mean $\mu = £4{,}000$ and standard deviation $\sigma = £600$. The size of a particular claim is known to be greater than £3,400. What is the probability that this claim size is greater than £4,000, the average of all claim sizes? [3]

The many other exam questions that contain the normal distribution use it in a different topic area.



Chapter 11 Summary

The normal distribution

If $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$ then we write $X \sim N(\mu, \sigma^2)$. This distribution occurs naturally and has a symmetrical, bell-shaped PDF:

[Figure: bell-shaped PDF, $f(x)$ against $x$, with $\mu - 3\sigma$ and $\mu + 3\sigma$ marked on the $x$-axis]

The PDF for $X \sim N(\mu, \sigma^2)$ is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad -\infty < x < \infty$$

The moments are:

$$E(X) = \mu \qquad \text{var}(X) = \sigma^2$$

The median is the same as the mean (by symmetry) as is the mode.

Probabilities can only be found by standardising the normal distribution (ie transforming it into a standard normal distribution) using:

$$Z = \frac{X - \mu}{\sigma}$$

We then use the standard normal distribution tables.



The standard normal distribution

This is a normal distribution with mean 0 and variance 1. We write:

$$Z \sim N(0,1)$$

It has PDF:

$$\phi(z) = f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$

The cumulative distribution function $\Phi(z) = P(Z < z)$ is tabulated on pages 160-161 of the Tables. We can use this to calculate probabilities as follows:

$P(Z < a)$: read off the tables

$P(Z > a) = 1 - P(Z < a)$: using area (and total probability) equals 1

$P(Z < -a) = P(Z > a)$: using symmetry
$\phantom{P(Z < -a)} = 1 - P(Z < a)$: using area (and total probability) equals 1

$P(Z > -a) = P(Z < a)$: using symmetry

‘Compound’ probabilities can be found using:

$$P(a < X < b) = P(X < b) - P(X < a)$$

Probabilities involving a modulus can be rewritten as a range:

$$P(|X| < a) = P(-a < X < a)$$
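The rules above translate almost word for word into code. The sketch below is only an illustration: the function names are invented here, and `table()` is a stand-in for looking up the printed Tables (it deliberately refuses negative values, just as the Tables do), using Python's standard-library `NormalDist` behind the scenes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

def table(a):
    """Stand-in for the printed Tables: P(Z < a) for a >= 0 only."""
    if a < 0:
        raise ValueError("the Tables only list non-negative z")
    return Z.cdf(a)

def p_less(a):
    """P(Z < a); for negative a use symmetry, then area = 1."""
    return table(a) if a >= 0 else 1 - table(-a)

def p_greater(a):
    """P(Z > a) = 1 - P(Z < a), using area = 1."""
    return 1 - p_less(a)

def p_between(a, b):
    """'Compound' probability: P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return p_less(b) - p_less(a)

def p_abs_less(a):
    """Modulus rewritten as a range: P(|Z| < a) = P(-a < Z < a)."""
    return p_between(-a, a)

print(round(p_less(-1.5), 5), round(p_between(-0.92, 0.83), 5))
```

Answers obtained this way may differ from table-based answers in the last decimal place, since the Tables round each probability to 5 DP before any subtraction.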

Linear interpolation

It is expected that standard normal values are given to at least 3 decimal places and a more accurate answer is obtained using linear interpolation between the tabulated values:

$$p = (\text{start probability}) + \text{proportion} \times (\text{difference between probabilities})$$

For the proportion, just use the 3rd decimal place (onwards) of the $z$ value, written after "0." (eg 0.8 for $z = 1.048$).
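Linear interpolation is a one-line formula, so it is easy to sketch in code. The function name `interpolate` and its argument names are made up for illustration; the numbers are the tabulated values of $P(Z<1.04)$ and $P(Z<1.05)$ from Appendix A.

```python
def interpolate(p_start, p_end, proportion):
    """Linear interpolation between two adjacent tabulated probabilities.

    proportion is the 3rd decimal place onwards written after '0.',
    e.g. 0.8 for z = 1.048 (which lies between 1.04 and 1.05).
    """
    return p_start + proportion * (p_end - p_start)

# P(Z < 1.048), starting from P(Z < 1.04) = 0.85083 and P(Z < 1.05) = 0.85314
p = interpolate(0.85083, 0.85314, 0.8)
print(round(p, 5))  # 0.85268
```

Rounding to 5 DP at the end mirrors the accuracy of the Tables.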




Solution 11.1

(i) The graph for $X \sim N(50, 5^2)$ is:

[Figure: PDF of $N(50,5^2)$, $f(x)$ against $x$ for $x$ from 0 to 100]

The majority of the values lie between $(50 - 3 \times 5,\ 50 + 3 \times 5) = (35, 65)$.

The height can be found by $f(50) = \frac{1}{\sqrt{2\pi \times 5^2}}\, e^{0} = 0.080$.

Now $Y \sim N(50, 10^2)$ has the same mean but a larger variance, so the PDF will have the same position/centre but will be more spread out.

[Figure: PDFs of $N(50,5^2)$ and $N(50,10^2)$ on the same axes, $f(x)$ against $x$ for $x$ from 0 to 100]

The majority of the values lie between $(50 - 3 \times 10,\ 50 + 3 \times 10) = (20, 80)$. Since it’s twice as spread out, it should be half as high.

(ii) Compared to $N(50, 5^2)$, the mean of $N(60, 5^2)$ has increased by 10 but the variance is the same, so the PDF will have the same shape/spread but will be shifted 10 to the right.

[Figure: PDFs of $N(50,5^2)$ and $N(60,5^2)$ on the same axes, $f(x)$ against $x$ for $x$ from 0 to 100]

Solution 11.2

Since:

$$\text{var}(X) = E(X^2) - [E(X)]^2$$

This gives:

$$\begin{aligned}
E(X^2) &= \text{var}(X) + [E(X)]^2 \\
&= \sigma^2 + \mu^2
\end{aligned}$$



Solution 11.3

(i) Reading from the standard normal Tables on pages 160-161 of the Tables (or from Appendix A):

$$P(Z < 1.23) = 0.89065$$

(ii) 2.725 is halfway between 2.72 and 2.73. Now reading from the standard normal Tables:

$$P(Z < 2.72) = 0.99674 \qquad P(Z < 2.73) = 0.99683$$

Since 2.725 is halfway between these it will be:

$$P(Z < 2.725) = \frac{0.99674 + 0.99683}{2} = 0.996785$$

Don’t worry if you didn’t get this result. We will be covering how to calculate probabilities like these in Section 3.3.

Solution 11.4

(i) Using the fact that the area under the graph is 1, we have:

$$P(Z > 2.17) = 1 - P(Z < 2.17)$$

Now reading $P(Z < 2.17)$ from the standard normal Tables:

$$P(Z > 2.17) = 1 - 0.98500 = 0.01500$$

(ii) Similarly:

$$\begin{aligned}
P(Z > 0.08) &= 1 - P(Z < 0.08) && \text{using area} = 1 \\
&= 1 - 0.53188 \\
&= 0.46812
\end{aligned}$$



Solution 11.5

(i) Using our ‘swap the sign, swap the sign’ symmetry result:

$$P(Z < -1.50) = P(Z > 1.50)$$

Now using the fact that the area under the graph is 1:

$$P(Z < -1.50) = P(Z > 1.50) = 1 - P(Z < 1.50)$$

Reading the value of $P(Z < 1.50)$ from the Tables:

$$P(Z < -1.50) = 1 - 0.93319 = 0.06681$$

(ii) Similarly,

$$\begin{aligned}
P(Z < -0.21) &= P(Z > 0.21) && \text{by symmetry} \\
&= 1 - P(Z < 0.21) && \text{using area} = 1 \\
&= 1 - 0.58317 \\
&= 0.41683
\end{aligned}$$

(iii) We get:

$$\begin{aligned}
P(Z < -2.05) &= P(Z > 2.05) && \text{by symmetry} \\
&= 1 - P(Z < 2.05) && \text{using area} = 1 \\
&= 1 - 0.97982 \\
&= 0.02018
\end{aligned}$$



Solution 11.6

(i) Using our ‘swap the sign, swap the sign’ symmetry result:

$$P(Z > -3.94) = P(Z < 3.94)$$

We can read this value directly from the Tables:

$$P(Z > -3.94) = P(Z < 3.94) = 0.99996$$

(ii) Similarly,

$$\begin{aligned}
P(Z > -0.73) &= P(Z < 0.73) && \text{by symmetry} \\
&= 0.76730 && \text{from tables}
\end{aligned}$$

Solution 11.7

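(i)
$$\begin{aligned}
P(Z > 3.6) &= 1 - P(Z < 3.6) && \text{using area} = 1 \\
&= 1 - 0.99984 \\
&= 0.00016
\end{aligned}$$

(ii)
$$\begin{aligned}
P(Z > -0.76) &= P(Z < 0.76) && \text{by symmetry} \\
&= 0.77637 && \text{from tables}
\end{aligned}$$

(iii) $P(Z < 1.98) = 0.97615$ from tables

(iv)
$$\begin{aligned}
P(Z < -2.41) &= P(Z > 2.41) && \text{by symmetry} \\
&= 1 - P(Z < 2.41) && \text{using area} = 1 \\
&= 1 - 0.99202 \\
&= 0.00798
\end{aligned}$$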



Solution 11.8

(i) Splitting up the ‘compound’ probability:

$$P(1.24 < Z < 2.19) = P(Z < 2.19) - P(Z < 1.24)$$

Both of these probabilities can be read directly from the Tables:

$$P(1.24 < Z < 2.19) = 0.98574 - 0.89251 = 0.09323$$

(ii) Similarly:

$$P(-0.92 < Z < 0.83) = P(Z < 0.83) - P(Z < -0.92)$$

Now:

$$P(Z < 0.83) = 0.79673 \quad \text{from tables}$$

$$\begin{aligned}
P(Z < -0.92) &= P(Z > 0.92) && \text{by symmetry} \\
&= 1 - P(Z < 0.92) && \text{using area} = 1 \\
&= 1 - 0.82121 \\
&= 0.17879
\end{aligned}$$

So this gives:

$$P(-0.92 < Z < 0.83) = 0.79673 - 0.17879 = 0.61794$$

(iii) Splitting up the ‘compound’ probability:

$$P(-2.92 < Z < -1.67) = P(Z < -1.67) - P(Z < -2.92)$$

Now:

$$\begin{aligned}
P(Z < -1.67) &= P(Z > 1.67) && \text{by symmetry} \\
&= 1 - P(Z < 1.67) && \text{using area} = 1 \\
&= 1 - 0.95254 \\
&= 0.04746
\end{aligned}$$

$$\begin{aligned}
P(Z < -2.92) &= P(Z > 2.92) && \text{by symmetry} \\
&= 1 - P(Z < 2.92) && \text{using area} = 1 \\
&= 1 - 0.99825 \\
&= 0.00175
\end{aligned}$$

This gives:

$$P(-2.92 < Z < -1.67) = 0.04746 - 0.00175 = 0.04571$$



Solution 11.9

(i) To calculate $P(Z < 1.048)$ we first look up the tabulated probabilities either side:

$$P(Z < 1.04) = 0.85083 \qquad P(Z < 1.05) = 0.85314$$

Using linear interpolation:

$$\begin{aligned}
P(Z < 1.048) &= 0.85083 + 0.8 \times (0.85314 - 0.85083) \\
&= 0.85268
\end{aligned}$$

Note that this figure is given to only 5 DP as it would be daft to write the full 6 DP answer when the Tables are only accurate to 5 DP!

(ii) Now:

$$P(Z > 0.271) = 1 - P(Z < 0.271) \quad \text{using area} = 1$$

The tabulated probabilities either side of 0.271:

$$P(Z < 0.27) = 0.60642 \qquad P(Z < 0.28) = 0.61026$$

Using linear interpolation:

$$\begin{aligned}
P(Z < 0.271) &= 0.60642 + 0.1 \times (0.61026 - 0.60642) \\
&= 0.60680
\end{aligned}$$

So:

$$P(Z > 0.271) = 1 - 0.60680 = 0.39320$$

(iii) Now:

$$P(Z > -2.389) = P(Z < 2.389) \quad \text{by symmetry}$$

The tabulated probabilities either side of 2.389 are:

$$P(Z < 2.38) = 0.99134 \qquad P(Z < 2.39) = 0.99158$$

Using linear interpolation:

$$\begin{aligned}
P(Z < 2.389) &= 0.99134 + 0.9 \times (0.99158 - 0.99134) \\
&= 0.99156
\end{aligned}$$

So:

$$P(Z > -2.389) = 0.99156$$

(iv) This may take some time! First we split up the ‘compound’ probability:

$$P(-0.704 < Z < 0.897) = P(Z < 0.897) - P(Z < -0.704)$$

For $P(Z < 0.897)$ the tabulated probabilities either side are:

$$P(Z < 0.89) = 0.81327 \qquad P(Z < 0.90) = 0.81594$$

Using linear interpolation:

$$\begin{aligned}
P(Z < 0.897) &= 0.81327 + 0.7 \times (0.81594 - 0.81327) \\
&= 0.81514
\end{aligned}$$

Now:

$$\begin{aligned}
P(Z < -0.704) &= P(Z > 0.704) && \text{by symmetry} \\
&= 1 - P(Z < 0.704) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of $P(Z < 0.704)$ are:

$$P(Z < 0.70) = 0.75804 \qquad P(Z < 0.71) = 0.76115$$

Using linear interpolation:

$$\begin{aligned}
P(Z < 0.704) &= 0.75804 + 0.4 \times (0.76115 - 0.75804) \\
&= 0.75928
\end{aligned}$$

This gives:

$$P(Z < -0.704) = 1 - 0.75928 = 0.24072$$

Hence:

$$P(-0.704 < Z < 0.897) = 0.81514 - 0.24072 = 0.57442$$

Well done – you deserve a nice cuppa after that!



Solution 11.10

(i) $3X + 5 \sim N(3 \times 100 + 5,\ 3^2 \times 4^2) = N(305, 12^2)$

(ii) $\tfrac{1}{2}X - 20 \sim N(\tfrac{1}{2} \times 100 - 20,\ \tfrac{1}{2}^2 \times 4^2) = N(30, 2^2)$

(iii) $\tfrac{1}{4}(X - 100) \sim N\left(\tfrac{1}{4} \times (100 - 100),\ \tfrac{1}{4}^2 \times 4^2\right) = N(0,1)$

We have transformed $N(100, 4^2)$ into the standard normal. This process is called standardising. We shall be using it to calculate probabilities for any normal distribution.
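The three parts above all use the same rule: if $X \sim N(\mu, \sigma^2)$ then $aX + b \sim N(a\mu + b,\ a^2\sigma^2)$. Here is a small sketch of that rule in code, assuming $X \sim N(100, 4^2)$ as in this solution; the function name `linear_transform` is made up for illustration, and it returns the standard deviation rather than the variance.

```python
def linear_transform(mu, sigma, a, b):
    """Return (mean, sd) of aX + b when X ~ N(mu, sigma^2).

    The variance is multiplied by a^2, so the sd is multiplied by |a|.
    """
    return a * mu + b, abs(a) * sigma

print(linear_transform(100, 4, 3, 5))        # 3X + 5
print(linear_transform(100, 4, 0.5, -20))    # (1/2)X - 20
print(linear_transform(100, 4, 0.25, -25))   # (1/4)(X - 100) = (1/4)X - 25
```

The last call gives mean 0 and standard deviation 1, ie the standard normal, which is exactly what standardising means.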



Solution 11.11

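(i)
$$\begin{aligned}
P(X > 110) &= P(X - 100 > 110 - 100) && \text{subtracting the mean} \\
&= P\left(\frac{X - 100}{\sqrt{16}} > \frac{110 - 100}{\sqrt{16}}\right) && \text{dividing by the sd} \\
&= P(Z > 2.5) \\
&= 1 - P(Z < 2.5) && \text{using area} = 1 \\
&= 1 - 0.99379 = 0.00621 && \text{from tables}
\end{aligned}$$

(ii) Following the same steps:

$$\begin{aligned}
P(X > 87) &= P(X - 100 > 87 - 100) && \text{subtracting the mean} \\
&= P\left(\frac{X - 100}{\sqrt{16}} > \frac{87 - 100}{\sqrt{16}}\right) && \text{dividing by the sd} \\
&= P(Z > -3.25) \\
&= P(Z < 3.25) && \text{by symmetry} \\
&= 0.99942 && \text{from tables}
\end{aligned}$$

(iii) $P(95 < X < 107) = P(X < 107) - P(X < 95)$

Now:

$$\begin{aligned}
P(X < 107) &= P\left(Z < \frac{107 - 100}{\sqrt{16}}\right) \\
&= P(Z < 1.75) \\
&= 0.95994 && \text{from tables}
\end{aligned}$$

$$\begin{aligned}
P(X < 95) &= P\left(Z < \frac{95 - 100}{\sqrt{16}}\right) \\
&= P(Z < -1.25) \\
&= P(Z > 1.25) && \text{by symmetry} \\
&= 1 - P(Z < 1.25) && \text{using area} = 1 \\
&= 1 - 0.89435 && \text{using tables} \\
&= 0.10565
\end{aligned}$$

Therefore:

$$P(95 < X < 107) = 0.95994 - 0.10565 = 0.85429$$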



Solution 11.12

Let $X$ be the height of a male actuary, so we have $X \sim N(178, 250)$.

(i)
$$\begin{aligned}
P(X < 186) &= P\left(Z < \frac{186 - 178}{\sqrt{250}}\right) && \text{standardising} \\
&= P(Z < 0.506) && \text{3 DP}
\end{aligned}$$

The tabulated probabilities either side of 0.506 are:

$$P(Z < 0.50) = 0.69146 \qquad P(Z < 0.51) = 0.69497$$

Using linear interpolation:

$$P(Z < 0.506) = 0.69146 + 0.6 \times (0.69497 - 0.69146) = 0.69357$$

Hence:

$$P(X < 186) = 0.69357$$

Note that we rounded the standardised value to 3 DP. We could have kept the complete figure of 0.5059644…, in which case we would have obtained:

$$\begin{aligned}
P(Z < 0.5059644\ldots) &= 0.69146 + 0.59644 \times (0.69497 - 0.69146) \\
&= 0.69355
\end{aligned}$$

This does make a difference. The Subject CT3 exam requires that you use at least 3 DP – we will give both the rounded and full solutions from now on.

(ii)
$$\begin{aligned}
P(X > 160) &= P\left(Z > \frac{160 - 178}{\sqrt{250}}\right) && \text{standardising} \\
&= P(Z > -1.138) && \text{3 DP} \\
&= P(Z < 1.138) && \text{by symmetry}
\end{aligned}$$

The tabulated probabilities either side of 1.138 are:

$$P(Z < 1.13) = 0.87076 \qquad P(Z < 1.14) = 0.87286$$

Using linear interpolation:

$$P(Z < 1.138) = 0.87076 + 0.8 \times (0.87286 - 0.87076) = 0.87244$$

So:

$$P(X > 160) = 0.87244$$

Using the complete standardised figure would give an answer of 0.87253.

(iii)
$$\begin{aligned}
P(150 < X < 190) &= P(X < 190) - P(X < 150) \\
&= P\left(Z < \frac{190 - 178}{\sqrt{250}}\right) - P\left(Z < \frac{150 - 178}{\sqrt{250}}\right) && \text{standardising} \\
&= P(Z < 0.759) - P(Z < -1.771)
\end{aligned}$$

$P(Z < 0.759)$ can be found directly from the Tables using interpolation. The tabulated probabilities either side of 0.759 are:

$$P(Z < 0.75) = 0.77337 \qquad P(Z < 0.76) = 0.77637$$

This gives:

$$P(Z < 0.759) = 0.77337 + 0.9 \times (0.77637 - 0.77337) = 0.77607$$

Now:

$$\begin{aligned}
P(Z < -1.771) &= P(Z > 1.771) && \text{by symmetry} \\
&= 1 - P(Z < 1.771) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 1.771 are:

$$P(Z < 1.77) = 0.96164 \qquad P(Z < 1.78) = 0.96246$$

This gives:

$$P(Z < 1.771) = 0.96164 + 0.1 \times (0.96246 - 0.96164) = 0.96172$$

So:

$$P(Z < -1.771) = 1 - 0.96172 = 0.03828$$

Therefore:

$$P(150 < X < 190) = 0.77607 - 0.03828 = 0.73779$$

Using the complete standardised figures would give 0.73777.



Solution 11.13

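Rewriting this probability as a range:

$$\begin{aligned}
P(|X - 5| < 2) &= P(-2 < X - 5 < 2) \\
&= P(3 < X < 7) \\
&= P(X < 7) - P(X < 3)
\end{aligned}$$

Now:

$$\begin{aligned}
P(X < 7) &= P\left(Z < \frac{7 - 6}{1}\right) && \text{standardising} \\
&= P(Z < 1) && \text{from tables} \\
&= 0.84134
\end{aligned}$$

$$\begin{aligned}
P(X < 3) &= P\left(Z < \frac{3 - 6}{1}\right) && \text{standardising} \\
&= P(Z < -3) \\
&= P(Z > 3) && \text{by symmetry} \\
&= 1 - P(Z < 3) && \text{using area} = 1 \\
&= 1 - 0.99865 \\
&= 0.00135
\end{aligned}$$

Hence:

$$P(|X - 5| < 2) = 0.84134 - 0.00135 = 0.83999$$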
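As a cross-check on the modulus method, here is a short sketch that rewrites $P(|X - c| < w)$ as the range $P(c - w < X < c + w)$ and evaluates it directly. It assumes Python's standard-library `NormalDist`; the helper name `p_abs_deviation` is made up for illustration.

```python
from statistics import NormalDist

def p_abs_deviation(dist, centre, width):
    """P(|X - centre| < width) = P(centre - width < X < centre + width)."""
    return dist.cdf(centre + width) - dist.cdf(centre - width)

X = NormalDist(mu=6, sigma=1)   # X ~ N(6, 1), so the sd is 1
print(round(p_abs_deviation(X, 5, 2), 5))
```

Notice that the range (3, 7) is not symmetrical about the mean of 6, which is why the two tail probabilities in the solution are so different.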




P11.1 The graph for $X \sim N(100,100)$ is:

[Figure: PDF of $N(100,100)$, $f(x)$ against $x$ for $x$ from 50 to 150]

The majority of the values are between $(100 - 3 \times 10,\ 100 + 3 \times 10) = (70, 130)$.

The height can be found from $f(100) = \frac{1}{\sqrt{2\pi \times 100}}\, e^{0} = 0.040$.

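P11.2 (i)
$$\begin{aligned}
P(Z > 1.6) &= 1 - P(Z < 1.6) && \text{using area} = 1 \\
&= 1 - 0.94520 \\
&= 0.05480
\end{aligned}$$

(ii) $P(Z < 2.84) = 0.99774$ from tables

(iii)
$$\begin{aligned}
P(Z < -0.42) &= P(Z > 0.42) && \text{by symmetry} \\
&= 1 - P(Z < 0.42) && \text{using area} = 1 \\
&= 1 - 0.66276 \\
&= 0.33724
\end{aligned}$$

(iv)
$$\begin{aligned}
P(Z > -2.61) &= P(Z < 2.61) && \text{by symmetry} \\
&= 0.99547 && \text{from tables}
\end{aligned}$$

(v) $P(-1.69 < Z < 0.52) = P(Z < 0.52) - P(Z < -1.69)$

Now:

$$P(Z < 0.52) = 0.69847 \quad \text{from tables}$$

$$\begin{aligned}
P(Z < -1.69) &= P(Z > 1.69) && \text{by symmetry} \\
&= 1 - P(Z < 1.69) && \text{using area} = 1 \\
&= 1 - 0.95449 \\
&= 0.04551
\end{aligned}$$

Hence:

$$P(-1.69 < Z < 0.52) = 0.69847 - 0.04551 = 0.65296$$

(vi) $P(-3.05 < Z < -1.7) = P(Z < -1.7) - P(Z < -3.05)$

Now:

$$\begin{aligned}
P(Z < -1.7) &= P(Z > 1.7) && \text{by symmetry} \\
&= 1 - P(Z < 1.7) && \text{using area} = 1 \\
&= 1 - 0.95543 \\
&= 0.04457
\end{aligned}$$

$$\begin{aligned}
P(Z < -3.05) &= P(Z > 3.05) && \text{by symmetry} \\
&= 1 - P(Z < 3.05) && \text{using area} = 1 \\
&= 1 - 0.99886 \\
&= 0.00114
\end{aligned}$$

Hence:

$$P(-3.05 < Z < -1.7) = 0.04457 - 0.00114 = 0.04343$$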



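P11.3 (i) The tabulated probabilities either side of $P(Z < 0.371)$ are:

$$P(Z < 0.37) = 0.64431 \qquad P(Z < 0.38) = 0.64803$$

Using linear interpolation:

$$P(Z < 0.371) = 0.64431 + 0.1 \times (0.64803 - 0.64431) = 0.64468$$

(ii) $P(Z > 2.598) = 1 - P(Z < 2.598)$ using area = 1

The tabulated probabilities either side of 2.598 are:

$$P(Z < 2.59) = 0.99520 \qquad P(Z < 2.60) = 0.99534$$

Using linear interpolation:

$$P(Z < 2.598) = 0.99520 + 0.8 \times (0.99534 - 0.99520) = 0.99531$$

This gives:

$$P(Z > 2.598) = 1 - 0.99531 = 0.00469$$

(iii) $P(Z > -0.904) = P(Z < 0.904)$ by symmetry

The tabulated probabilities either side of 0.904 are:

$$P(Z < 0.90) = 0.81594 \qquad P(Z < 0.91) = 0.81859$$

Using linear interpolation:

$$P(Z < 0.904) = 0.81594 + 0.4 \times (0.81859 - 0.81594) = 0.81700$$

This gives:

$$P(Z > -0.904) = 0.81700$$

(iv)
$$\begin{aligned}
P(Z < -2.319) &= P(Z > 2.319) && \text{by symmetry} \\
&= 1 - P(Z < 2.319) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 2.319 are:

$$P(Z < 2.31) = 0.98956 \qquad P(Z < 2.32) = 0.98983$$

Using linear interpolation:

$$P(Z < 2.319) = 0.98956 + 0.9 \times (0.98983 - 0.98956) = 0.98980$$

This gives:

$$P(Z < -2.319) = 1 - 0.98980 = 0.01020$$

(v) $P(1.572 < Z < 3.087) = P(Z < 3.087) - P(Z < 1.572)$

The tabulated probabilities either side of 3.087 are:

$$P(Z < 3.08) = 0.99896 \qquad P(Z < 3.09) = 0.99900$$

Using linear interpolation:

$$P(Z < 3.087) = 0.99896 + 0.7 \times (0.99900 - 0.99896) = 0.99899$$

The tabulated probabilities either side of 1.572 are:

$$P(Z < 1.57) = 0.94179 \qquad P(Z < 1.58) = 0.94295$$

Using linear interpolation:

$$P(Z < 1.572) = 0.94179 + 0.2 \times (0.94295 - 0.94179) = 0.94202$$

Hence:

$$P(1.572 < Z < 3.087) = 0.99899 - 0.94202 = 0.05697$$

(vi) $P(-1.382 < Z < 0.493) = P(Z < 0.493) - P(Z < -1.382)$

The tabulated probabilities either side of 0.493 are:

$$P(Z < 0.49) = 0.68793 \qquad P(Z < 0.50) = 0.69146$$

Using linear interpolation:

$$P(Z < 0.493) = 0.68793 + 0.3 \times (0.69146 - 0.68793) = 0.68899$$

Now:

$$\begin{aligned}
P(Z < -1.382) &= P(Z > 1.382) && \text{by symmetry} \\
&= 1 - P(Z < 1.382) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 1.382 are:

$$P(Z < 1.38) = 0.91621 \qquad P(Z < 1.39) = 0.91774$$

Using linear interpolation:

$$P(Z < 1.382) = 0.91621 + 0.2 \times (0.91774 - 0.91621) = 0.91652$$

This gives:

$$P(Z < -1.382) = 1 - 0.91652 = 0.08348$$

Hence:

$$P(-1.382 < Z < 0.493) = 0.68899 - 0.08348 = 0.60551$$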



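P11.4 (i)
$$\begin{aligned}
P(X > 117) &= P\left(Z > \frac{117 - 128}{8}\right) && \text{standardising} \\
&= P(Z > -1.375) \\
&= P(Z < 1.375) && \text{by symmetry}
\end{aligned}$$

The tabulated probabilities either side of 1.375 are:

$$P(Z < 1.37) = 0.91466 \qquad P(Z < 1.38) = 0.91621$$

Using linear interpolation (or just finding the average):

$$P(Z < 1.375) = 0.91466 + 0.5 \times (0.91621 - 0.91466) = 0.91544$$

Hence:

$$P(X > 117) = 0.91544$$

(ii)
$$\begin{aligned}
P(Y < 30) &= P\left(Z < \frac{30 - 23}{5}\right) && \text{standardising} \\
&= P(Z < 1.4) \\
&= 0.91924
\end{aligned}$$

(iii)
$$\begin{aligned}
P(W > 297) &= P\left(Z > \frac{297 - 285}{\sqrt{34}}\right) && \text{standardising} \\
&= P(Z > 2.058) && \text{3 DP} \\
&= 1 - P(Z < 2.058) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 2.058 are:

$$P(Z < 2.05) = 0.97982 \qquad P(Z < 2.06) = 0.98030$$

Using linear interpolation:

$$P(Z < 2.058) = 0.97982 + 0.8 \times (0.98030 - 0.97982) = 0.98020$$

Hence:

$$P(W > 297) = 1 - 0.98020 = 0.01980$$

(iv)
$$\begin{aligned}
P(V < 90) &= P\left(Z < \frac{90 - 96}{\sqrt{12}}\right) && \text{standardising} \\
&= P(Z < -1.732) && \text{3 DP} \\
&= P(Z > 1.732) && \text{by symmetry} \\
&= 1 - P(Z < 1.732) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 1.732 are:

$$P(Z < 1.73) = 0.95818 \qquad P(Z < 1.74) = 0.95907$$

Using linear interpolation:

$$P(Z < 1.732) = 0.95818 + 0.2 \times (0.95907 - 0.95818) = 0.95836$$

Hence:

$$P(V < 90) = 1 - 0.95836 = 0.04164$$

Using the complete standardised figure would give:

$$P(V < 90) = 1 - 0.95833 = 0.04167$$

(v)
$$\begin{aligned}
P(3 < T < 8) &= P(T < 8) - P(T < 3) \\
&= P\left(Z < \frac{8 - 5}{\sqrt{2}}\right) - P\left(Z < \frac{3 - 5}{\sqrt{2}}\right) && \text{standardising} \\
&= P(Z < 2.121) - P(Z < -1.414) && \text{3 DP}
\end{aligned}$$

The tabulated probabilities either side of 2.121 are:

$$P(Z < 2.12) = 0.98300 \qquad P(Z < 2.13) = 0.98341$$

Using linear interpolation:

$$P(Z < 2.121) = 0.98300 + 0.1 \times (0.98341 - 0.98300) = 0.98304$$

Now:

$$\begin{aligned}
P(Z < -1.414) &= P(Z > 1.414) && \text{by symmetry} \\
&= 1 - P(Z < 1.414) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 1.414 are:

$$P(Z < 1.41) = 0.92073 \qquad P(Z < 1.42) = 0.92220$$

Using linear interpolation:

$$P(Z < 1.414) = 0.92073 + 0.4 \times (0.92220 - 0.92073) = 0.92132$$

This gives:

$$P(Z < -1.414) = 1 - 0.92132 = 0.07868$$

Therefore:

$$P(3 < T < 8) = 0.98304 - 0.07868 = 0.90436$$

Using the complete standardised figures would have given:

$$P(3 < T < 8) = 0.98305 - 0.07865 = 0.90440$$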



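P11.5 (i) Rewriting this probability as a range:

$$\begin{aligned}
P(|X - 17| < 2) &= P(-2 < X - 17 < 2) \\
&= P(15 < X < 19) \\
&= P(X < 19) - P(X < 15) \\
&= P\left(Z < \frac{19 - 17}{\sqrt{5}}\right) - P\left(Z < \frac{15 - 17}{\sqrt{5}}\right) && \text{standardising} \\
&= P(Z < 0.894) - P(Z < -0.894) && \text{3 DP}
\end{aligned}$$

The tabulated probabilities either side of 0.894 are:

$$P(Z < 0.89) = 0.81327 \qquad P(Z < 0.90) = 0.81594$$

Using linear interpolation:

$$P(Z < 0.894) = 0.81327 + 0.4 \times (0.81594 - 0.81327) = 0.81434$$

Now:

$$\begin{aligned}
P(Z < -0.894) &= P(Z > 0.894) && \text{by symmetry} \\
&= 1 - P(Z < 0.894) && \text{using area} = 1
\end{aligned}$$

Since we have just calculated $P(Z < 0.894)$ we get:

$$P(Z < -0.894) = 1 - 0.81434 = 0.18566$$

Therefore:

$$P(|X - 17| < 2) = 0.81434 - 0.18566 = 0.62868$$

Using the complete figures for interpolation would give:

$$P(|X - 17| < 2) = 0.81445 - 0.18555 = 0.62890$$

(ii) Using the conditional probability formula from Chapter 5:

$$P(X < 15 \mid X < 16) = \frac{P(X < 15 \text{ and } X < 16)}{P(X < 16)} = \frac{P(X < 15)}{P(X < 16)}$$

Now:

$$\begin{aligned}
P(X < 15) &= P\left(Z < \frac{15 - 17}{\sqrt{5}}\right) && \text{standardising} \\
&= P(Z < -0.894) && \text{3 DP} \\
&= P(Z > 0.894) && \text{by symmetry} \\
&= 1 - P(Z < 0.894) && \text{using area} = 1 \\
&= 0.18566 && \text{from part (i)}
\end{aligned}$$

$$\begin{aligned}
P(X < 16) &= P\left(Z < \frac{16 - 17}{\sqrt{5}}\right) && \text{standardising} \\
&= P(Z < -0.447) && \text{3 DP} \\
&= P(Z > 0.447) && \text{by symmetry} \\
&= 1 - P(Z < 0.447) && \text{using area} = 1
\end{aligned}$$

The tabulated probabilities either side of 0.447 are:

$$P(Z < 0.44) = 0.67003 \qquad P(Z < 0.45) = 0.67364$$

Using linear interpolation:

$$P(Z < 0.447) = 0.67003 + 0.7 \times (0.67364 - 0.67003) = 0.67256$$

This gives:

$$P(X < 16) = 1 - 0.67256 = 0.32744$$

Hence:

$$P(X < 15 \mid X < 16) = \frac{0.18566}{0.32744} = 0.5670$$

Using the complete standardised figures would give 0.5668.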



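P11.6 Let $X$ be the sizes of claims, then $X \sim N(4000, 600^2)$.

We want:

$$P(X > 4{,}000 \mid X > 3{,}400) = \frac{P(X > 4{,}000 \text{ and } X > 3{,}400)}{P(X > 3{,}400)} = \frac{P(X > 4{,}000)}{P(X > 3{,}400)}$$

Now:

$$\begin{aligned}
P(X > 4{,}000) &= P\left(Z > \frac{4{,}000 - 4{,}000}{600}\right) && \text{standardising} \\
&= P(Z > 0) \\
&= 1 - P(Z < 0) && \text{using area} = 1 \\
&= 1 - 0.5 && \text{from tables} \\
&= 0.5
\end{aligned}$$

$$\begin{aligned}
P(X > 3{,}400) &= P\left(Z > \frac{3{,}400 - 4{,}000}{600}\right) && \text{standardising} \\
&= P(Z > -1) \\
&= P(Z < 1) && \text{by symmetry} \\
&= 0.84134 && \text{from tables}
\end{aligned}$$

Therefore:

$$P(X > 4{,}000 \mid X > 3{,}400) = \frac{0.5}{0.84134} = 0.59429$$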

Stats Pack-12: Correlation and regression Page 1


Chapter 12

Correlation and regression

Links to CT3: Chapter 13 Sections 1, 2.1 and 3.2

Syllabus objectives:

(x)1. Draw scatterplots for bivariate data and comment on them.

(x)2. Define and calculate the correlation coefficient for bivariate data, explain its interpretation.

(x)3. Explain what is meant by response and explanatory variables.

(x)4. Calculate the least squares estimates of the slope and intercept parameters in a simple linear regression model.

0 Introduction

In this chapter we will be plotting points to see if there is any linear relationship between two variables (eg pints of beer drunk per week and life expectancy). We will then seek to measure the strength of this relationship and draw a line to represent it and make predictions. This chapter covers material from Chapter 13 of the Subject CT3 course.



1 Scatterplots

The table below shows the heights (in cm) and weights (in kg) of 12 individuals (A – L):

              A    B    C    D    E    F    G    H    I    J    K    L
Height (cm)  150  152  155  156  158  160  163  165  170  175  178  180
Weight (kg)   56   62   63   57   64   62   65   66   65   69   66   67

These data are called bivariate data as we have two variables (height and weight) for each individual. A scatterplot (or scatter diagram) is simply a plot of our bivariate data, with one variable (eg height) plotted on the x-axis and the other variable (eg weight) plotted on the y-axis.

[Scatterplot: Weight (kg), from 50 to 75, against Height (cm), from 145 to 185]

We use a scatterplot to see if there is any kind of relationship or connection between the two variables. In this case we can see that there is a connection between height and weight. We have an increasing pattern in the points – generally the taller a person is, the more they weigh. Since this is not an exact relationship the points plotted are scattered, hence the name scatterplot. In this chapter and in Subject CT3, we will only be looking to see if there is a linear relationship between the two variables.



Question 12.1

An experiment was carried out where a person had to draw a shape whilst looking in a mirror. The time taken to draw the shape (in seconds) and the number of mistakes were recorded. A scatterplot for 10 individuals’ results is shown:

[Scatterplot: Mistakes, from 0 to 35, against Time (secs), from 0 to 60]

Describe the relationship (if any) between the time taken and the number of mistakes made.

The variable on the x-axis is called the explanatory variable whereas the variable on the y-axis is called the response variable. These names arise from the use of scatterplots in experiments. For example, suppose we measure the length of a spring when we hang weights on it. The weights we hang on the spring are chosen by the experimenter – so they can be ‘explained’, hence they are the explanatory variable. Once the weights are hung on we can then see the effect or response in the length of the spring, hence the length of the spring is the response variable.



2 Correlation

The mathematical term for the relationship or connection between two variables (eg height and weight) is correlation.

2.1 Types of linear correlation

There are three types of linear correlation between two variables.

[Figure: three scatterplots labelled "negative correlation", "positive correlation" and "no correlation"]

Positive correlation - as one variable increases so does the other variable. This gives a positive sloping (ie upward) graph. For example, we saw that there was positive correlation between people’s height and their weight. As height increases so does their weight.

Negative correlation - as one variable increases, the other variable decreases. This gives a negative sloping (ie downward) graph. For example, we saw in Question 12.1 that there was negative correlation between the time people took to complete the drawing and the number of mistakes made. As the time increases so the number of mistakes decreases.

No correlation - there is no connection between the two variables. For example, there would be no correlation (I hope!) between the height of a person and the amount paid for their insurance.



Question 12.2

For each pair of variables, state the type of correlation that would be shown:

(i) life expectancy and the number of cigarettes smoked per day

(ii) distance lived from work and the time taken to get there

(iii) number of bedrooms and cost of home insurance

(iv) amount of “no claims” discount and cost of car insurance

(v) number of exams passed in a sitting and length of hair.

2.2 Strength of linear correlation

In addition to observing the type of correlation between two variables we can also look at how strong the connection or relationship is between them. The strength of the connection or correlation can be seen in how clear the pattern is:

In this scatterplot we can see that there is positive correlation, but the pattern is not very clear (ie the points are quite scattered). We say that there is weak positive correlation.

In this scatterplot there is again positive correlation, but this time the pattern is much clearer (ie the points are less scattered and more linear). We say that there is strong positive correlation.

We now have the positive correlation with the pattern being a perfect straight line. There is an exact linear relationship between the two variables. We say that there is perfect positive correlation.



Question 12.3

Sketch scatterplots showing:

(i) weak negative correlation

(ii) strong negative correlation

(iii) perfect negative correlation.

Question 12.4

Match each of these pairs of variables to one of the scatterplots shown below:

(i) pounds exchanged and dollars bought on a single day

(ii) height and shoe size of an individual

(iii) size of car engine and cost of car insurance.

[Figure: three scatterplots labelled Scatterplot A, Scatterplot B and Scatterplot C]



2.3 Covariance

We can now draw a scatterplot, state the type of linear correlation shown and describe how strong that correlation is. However, it would be nice to have a single numerical value to summarise the type and strength of the correlation. To do this we first need to define the sample covariance.

Recall from Chapter 3 that we defined the sample variance of $x_1, \ldots, x_n$ to be:

$$\frac{1}{n-1} \sum (x_i - \bar{x})^2$$

Similarly, we could calculate the sample variance of the $y$ values:

$$\frac{1}{n-1} \sum (y_i - \bar{y})^2$$

But what we want is some way to measure how $y$ varies with $x$ – this is called the sample covariance and is defined by:

$$\frac{1}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y})$$

Taking the height and weight data from Section 1:

              A    B    C    D    E    F    G    H    I    J    K    L
Height (cm)  150  152  155  156  158  160  163  165  170  175  178  180
Weight (kg)   56   62   63   57   64   62   65   66   65   69   66   67

The mean of the heights is:

$$\bar{x} = \frac{150 + \cdots + 180}{12} = \frac{1{,}962}{12} = 163.5 \text{ cm}$$

The mean of the weights is:

$$\bar{y} = \frac{56 + \cdots + 67}{12} = \frac{762}{12} = 63.5 \text{ kg}$$

Now using the formula, we get a covariance of:

$$\begin{aligned}
&\tfrac{1}{11}\left[(150 - 163.5)(56 - 63.5) + \cdots + (180 - 163.5)(67 - 63.5)\right] \\
&= \tfrac{1}{11}\left[101.25 + \cdots + 57.75\right] \\
&= \tfrac{1}{11} \times 344 \\
&= 31\tfrac{3}{11} \text{ cm kg}
\end{aligned}$$

This is a positive result as there is positive correlation between these variables. Now the formula we are using is fine for a small list of numbers, but is extremely tedious for this list of 12 numbers (and would be a real pain if the means were awkward numbers). So we are going to rearrange the formula into a nicer format like we did for the sample variance in Chapter 3. Recall that the sample variance of $x_1, \ldots, x_n$ can be rewritten as:

$$\frac{1}{n-1}\left(\sum x_i^2 - n\bar{x}^2\right)$$

Similarly, the sample variance of the y values can be rewritten as:

$$\frac{1}{n-1}\left(\sum y_i^2 - n\bar{y}^2\right)$$

Using a similar method, we can rewrite the covariance as:

$$\frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$$

The proof of this result can be found in Appendix A.

Definition

The sample covariance of $(x_1, y_1), \ldots, (x_n, y_n)$ is given by:

$$\frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$$
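As a rough check, the rearranged formula can be applied to the height and weight data to confirm that it gives the same covariance as the original definition. The function name `sample_covariance_shortcut` is a label of our own, not something from the Tables.

```python
# Height (cm) and weight (kg) for persons A to L, from the height and weight table.
heights = [150, 152, 155, 156, 158, 160, 163, 165, 170, 175, 178, 180]
weights = [56, 62, 63, 57, 64, 62, 65, 66, 65, 69, 66, 67]


def sample_covariance_shortcut(xs, ys):
    """Sample covariance using (sum of x*y - n * x_bar * y_bar) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * x_bar * y_bar) / (n - 1)


cov_shortcut = sample_covariance_shortcut(heights, weights)
print(round(cov_shortcut, 2))  # about 31.27, matching the value from the definition
```

Only the totals of x, y and xy are needed here, which is why exam questions usually quote those sums.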



Question 12.5

A garage is selling a particular make of used car. The table below shows the asking price and the mileage for five of these cars:

Car               1    2    3    4    5
Mileage (000’s)   16   25   38   61   79
Price (£000’s)    7.8  5.2  4.3  2.5  1.7

(i) Show that the sample covariance for these data is $-59.175$.

(ii) What does this value tell us about the type of correlation shown?

Note how the covariance gives a positive answer for our positive correlation between height and weight and a negative answer for our negative correlation between price and mileage.

Question 12.6

What value will the covariance take if there is no correlation between two variables?

Appendix B explains why the covariance works to give us a positive answer for positive correlation and a negative answer for negative correlation.



2.4 The correlation coefficient

The scatterplot below shows the number of dollars purchased in exchange for sterling.

$$\bar{x} = £3 \qquad \bar{y} = \$4.50$$

$$s_x = \sqrt{2.5} = £1.58 \qquad s_y = \sqrt{5.625} = \$2.37$$

covariance = £$3.75

The next scatterplot shows the pay received for the number of hours worked.

$$\bar{x} = 3 \text{ hrs} \qquad \bar{y} = £15$$

$$s_x = \sqrt{2.5} = 1.58 \text{ hrs} \qquad s_y = \sqrt{62.5} = £7.91$$

covariance = 12.5 £hrs

Both of these scatterplots show perfect positive correlation. However, they have very different covariances! They also have different units for their covariances. The difference in the size of the answers is to do with the spread of the results. This makes the covariance pretty unhelpful when comparing the strength of correlation for two separate cases, as does having different units. What we need to do is to standardise the covariance so that any scatterplots with the same degree of correlation have the same value (regardless of the spread of the data). We also need to get rid of the units so we just have a number (ie a coefficient). Recall that in Chapter 11 we standardised a normal distribution by subtracting the mean (which we have already done when calculating the covariance) and dividing by the standard deviation. Well here we will divide by both standard deviations. This gives us the correlation coefficient, r.

[Figure: two scatterplots, $'s bought against £'s sold, and pay received (£) against hrs worked]



So the correlation coefficient, r, for the dollars purchased and sterling sold is:

$$r = \frac{\text{covariance of } X \text{ and } Y}{sd(X) \times sd(Y)} = \frac{3.75}{\sqrt{2.5} \times \sqrt{5.625}} = 1$$
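To see why dividing by both standard deviations removes the units, here is a small sketch. The text does not list the plotted points, so the data below are an assumption: amounts of £1 to £5 exchanged at $1.50 per pound, which reproduces the summary figures quoted above. The same amounts are then re-expressed in pence.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance from the definition (divisor n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Assumed data (not given in the text): £1 to £5 sold at $1.50 per pound.
pounds = [1, 2, 3, 4, 5]
dollars = [1.5 * p for p in pounds]
pence = [100 * p for p in pounds]  # the same amounts in different units

cov_pounds = covariance(pounds, dollars)  # in £$
cov_pence = covariance(pence, dollars)    # in pence$, 100 times larger
r_pounds = correlation(pounds, dollars)
r_pence = correlation(pence, dollars)
print(cov_pounds, cov_pence, r_pounds, r_pence)
```

Changing the units multiplies the covariance by 100, but the correlation coefficient is unchanged because the standard deviation of the sterling amounts is multiplied by 100 as well.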

Question 12.7

Calculate the correlation coefficient for the second scatterplot, pay received for the number of hours worked.

The value of the correlation coefficient ranges from $-1$ to $1$ as follows:

r = –1: perfect negative correlation
r = 0: no correlation
r = +1: perfect positive correlation

Question 12.8

Give an approximate value for the correlation coefficient for each of these:

[Figure: Scatterplot A, Scatterplot B, Scatterplot C]



The formula

The correlation coefficient was defined to be:

$$r = \frac{\text{covariance of } X \text{ and } Y}{sd(X) \times sd(Y)}$$

Using the definitions of the sample standard deviation and covariance, we get:

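$$r = \frac{\frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)}{\sqrt{\frac{1}{n-1}\left(\sum x_i^2 - n\bar{x}^2\right) \times \frac{1}{n-1}\left(\sum y_i^2 - n\bar{y}^2\right)}}$$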

However the $\frac{1}{n-1}$’s all cancel so we get:

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}}$$

Since we don’t need the $\frac{1}{n-1}$’s, it seems a bit pointless calculating them in the first place. We just need to calculate the sums of squares:

$$s_{xx} = \sum x_i^2 - n\bar{x}^2$$

$$s_{yy} = \sum y_i^2 - n\bar{y}^2$$

$$s_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$$

This gives us:

Definition

The sample correlation coefficient of $(x_1, y_1), \ldots, (x_n, y_n)$ is given by:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$



The formulae for $s_{xx}$, $s_{yy}$, $s_{xy}$ and $r$ can be found on pages 24 and 25 of the Tables.
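As an illustration of these formulae, the sketch below works out the three sums of squares and the correlation coefficient for the height and weight data from Section 2.3. The helper name `sums_of_squares` is our own.

```python
from math import sqrt

# Height (cm) and weight (kg) for persons A to L.
heights = [150, 152, 155, 156, 158, 160, 163, 165, 170, 175, 178, 180]
weights = [56, 62, 63, 57, 64, 62, 65, 66, 65, 69, 66, 67]


def sums_of_squares(xs, ys):
    """Return (s_xx, s_yy, s_xy) using the 'sum minus n times mean' versions."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = sums_of_squares(heights, weights)
r = s_xy / sqrt(s_xx * s_yy)
print(s_xx, s_yy, s_xy, round(r, 3))  # r is roughly 0.8
```

A value of roughly 0.8 is consistent with the fairly strong positive correlation seen in the height and weight scatterplot.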

2.5 Correlation and causation

So far we have used scatterplots to see if there is any connection between two variables. We can then quantify the type and strength of that relationship using the correlation coefficient.

For example, gas and electricity bills from a particular household over the last few years have a correlation coefficient of 0.8. There is strong positive correlation between the amounts charged on each of the bills. This means that as the gas bill increases so does the electricity bill for that household.

BUT is the increase in the gas bill the cause of the increase in the electricity bill? Clearly not! Both of them are due to the seasons – in summer we will use less gas for heating and less electricity for lights, whereas in winter we will use more of both.

This is the difference between correlation (ie how they change together) and the cause of that correlation. Just because variables are correlated doesn’t necessarily imply that one changing causes the other to change – there might be some other variable causing them to change together. So be careful before jumping to conclusions!



Question 12.9

The table below compares the literacy rate (x) and life expectancy (y) of men in various countries:

                        Algeria  Angola  Ireland  Bangladesh  Bolivia  Iran
Literacy rate (%)       64       56      99       57          85       89
Life expectancy (yrs)   68       45      74       58          60       69

$$\sum x = 450 \qquad \sum x^2 = 35{,}428 \qquad \sum y = 374 \qquad \sum y^2 = 23{,}850 \qquad \sum xy = 28{,}745$$

(i) Show that the sample correlation coefficient for these data is 0.732.

(ii) What correlation is shown by the value in part (i)?

(iii) Is there cause and effect between literacy and life expectancy (ie does reading improve your life expectancy)? If so, explain how. If not, state the variable which is causing both of these to change together.



3 Regression

Once we have drawn a scatterplot and shown that there is a (strong) linear relationship between the two variables, we can attempt to represent that linear relationship by drawing a line. If we have perfect correlation the line would pass through all the points. If we don’t have a perfect relationship then the line just shows the general pattern of the points.

3.1 Line of ‘best fit’

The line of ‘best fit’ is drawn on the scatterplot by hand to show the relationship between the two variables. We would expect it to have the same slope as the pattern and to have roughly the same number of points on both sides of the line. We would also expect it to go through the mean $(\bar{x}, \bar{y})$ of the co-ordinates. For example:

[Figure: two scatterplots, each with a line of ‘best fit’ drawn on: Weight (kg) against Height (cm), and Mistakes against Time (secs)]



The method for drawing a line of ‘best fit’ by eye is:

• Plot the mean of the co-ordinates $(\bar{x}, \bar{y})$.

• Place your ruler through the mean point and turn it so that it has the same slope as the pattern of points and has roughly the same number of points on either side.

• Draw the line.

Once we have drawn the line of ‘best fit’ we can use it to read off predicted values. For example, we could estimate the height of someone who weighs 65 kg:

[Figure: scatterplot of Weight (kg) against Height (cm), with the line of ‘best fit’ used to read off a predicted value]

We can see, using the line of ‘best fit’, that someone who weighs 65kg would be expected to have a height of about 168 cm. The accuracy of this estimate depends largely on how strong the correlation is. Anyone who has used the age/height chart in the children’s clothes section of Marks & Spencer can testify how depressing it is to be estimated an age of two years younger than you are, based on your height!



Question 12.10

The marks of 10 students in their Subject CT3 mock and their actual Subject CT3 exam are as follows:

Student                1   2   3   4   5   6   7   8   9   10
Subject CT3 mock (x)   36  50  17  42  38  66  30  60  26  45
Subject CT3 exam (y)   49  68  34  55  56  80  46  73  39  60

$$\sum x = 410 \qquad \sum x^2 = 18{,}850 \qquad \sum y = 560 \qquad \sum y^2 = 33{,}308 \qquad \sum xy = 24{,}934$$

A scatterplot of these data is given below:

[Figure: scatterplot of Actual Exam (y) against Mock Exam (x)]

(i) Plot the mean of the co-ordinates $(\bar{x}, \bar{y})$.

(ii) Draw the line of ‘best fit’.

(iii) Hence, estimate the final exam score for a student who obtained 56 in their Subject CT3 mock.



It is extremely useful to calculate the equation of our line of ‘best fit’. This allows us to make predictions without referring to the graph. The equation of a straight line is:

$$y = \alpha + \beta x$$

where $\beta$ is the gradient (ie how many units ‘up’ the line goes for every one unit ‘across’) and $\alpha$ is the y-intercept (ie where the line crosses the y-axis).

For example, a scatterplot of the number of litres of water dispensed per week from an office water machine against air temperature is shown below:

[Figure: scatterplot with litres dispensed on the x-axis and temperature (ºC) on the y-axis, with a line of ‘best fit’ that goes ‘up’ 10 for 190 ‘across’]

The y-intercept is where the graph crosses the y-axis:

$$\alpha = y\text{-intercept} = 15$$

The gradient is how many units ‘up’ the line goes for every one unit ‘across’:

$$\beta = \text{gradient} = \frac{\text{up}}{\text{across}} = \frac{10}{190} = 0.0526$$

So we get $y = 15 + 0.0526x$.



Question 12.11

Here is the line of ‘best fit’ for the mock and exam results from Question 12.10:

[Figure: scatterplot of Actual Exam (y) against Mock Exam (x) with the line of ‘best fit’ drawn on]

By finding the y-intercept and gradient, write down the equation of this line of ‘best fit’.

At school it is likely you would have used the notation $y = mx + c$ where $m$ was the gradient and $c$ was the y-intercept. The Greek letters and different order are just to confuse you!



3.2 Regression line

The line of ‘best fit’ drawn by eye is certainly not the most accurate way to obtain a line that represents the correlation. Nor is finding the equation of the line by looking at the graph an accurate method! What we need is a way to calculate the y-intercept and the gradient just using the points themselves. The line of ‘best fit’ obtained using this method is called the regression line as we are working backwards (ie regressing) from the points to get the equation of the line. The method involves considering how far ‘out’ each of the points is from our regression line:

[Figure: seven points $(x_1, y_1), \ldots, (x_7, y_7)$ scattered about the regression line $y = \alpha + \beta x$, with the vertical errors $e_1, \ldots, e_7$ marked between each point and the line]

If we had perfect correlation, all the points would lie on the regression line:

$$y_1 = \alpha + \beta x_1$$

$$y_2 = \alpha + \beta x_2$$

But since we don’t, the first point $(x_1, y_1)$ is a vertical distance of $e_1$ ‘out’ from the regression line, the second point $(x_2, y_2)$ is a vertical distance of $e_2$ ‘out’ from the regression line, and so on.



So, we actually have:

$$y_1 = \alpha + \beta x_1 + e_1$$

$$y_2 = \alpha + \beta x_2 + e_2$$

The $e_i$’s are called the errors or the residuals – these simply tell us how much our actual y value is ‘out’ from our y value on the regression line.

Now to make the regression line fit these points as closely as possible we would like to make all these errors as small as possible. Since we don’t care whether the errors are positive or negative, we shall make the sum of their squares as small as possible:

$$\min \sum e_i^2$$

So what we are going to do is to find the values of $\alpha$ and $\beta$ that make this sum of squares as small as possible. The values obtained are therefore called the least squares estimates of $\alpha$ and $\beta$.

Now to find the $\alpha$ and $\beta$ that minimise $\sum e_i^2$ we will differentiate it by $\alpha$ and by $\beta$. But first we need to get some $\alpha$’s and $\beta$’s in this equation! Rearranging our expressions above we get:

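$$e_1 = y_1 - \alpha - \beta x_1$$

$$e_2 = y_2 - \alpha - \beta x_2$$

So we want:

$$\min \sum (y_i - \alpha - \beta x_i)^2$$

Differentiating this expression by $\alpha$ and by $\beta$, and setting each derivative equal to zero (to get the minimum), we get:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}}$$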



The complete details of this proof are given in Appendix C. Note the little hats are a mathematical way of saying this is our estimate of the value (rather than the true value).

Let’s calculate the least squares estimates of $\alpha$ and $\beta$ for our scatterplot of the litres of water dispensed against the temperature outside.

[Figure: scatterplot with litres dispensed on the x-axis and temperature (ºC) on the y-axis]

The values used in this graph were:

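Litres dispensed    41  70  150  50  170  200
Temperature (ºC)    17  19  22   18  24   26

$$\sum x = 681 \qquad \sum x^2 = 100{,}481 \qquad \sum y = 126 \qquad \sum y^2 = 2{,}710 \qquad \sum xy = 15{,}507$$

Now:

$$\bar{x} = \frac{\sum x}{n} = \frac{681}{6} = 113.5 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{126}{6} = 21$$

We also need the sums of squares:

$$s_{xx} = \sum x_i^2 - n\bar{x}^2 = 100{,}481 - 6 \times 113.5^2 = 23{,}187.5$$

$$s_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 15{,}507 - 6 \times 113.5 \times 21 = 1{,}206$$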



Hence:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{1{,}206}{23{,}187.5} = 0.0520$$

and:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 21 - 0.0520 \times 113.5 = 15.1$$

So our fitted regression line is:

$$\hat{y} = 15.1 + 0.0520x$$

This is similar to (but obviously more accurate than) our line of ‘best fit’ by eye, which was $y = 15 + 0.0526x$.
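The water machine calculation can be reproduced with a short Python sketch. The function name `least_squares` is our own, and the data are the six pairs from the table of values used in the graph, with litres dispensed as x and temperature as y.

```python
# Litres dispensed (x) and temperature in degrees C (y), from the table of values.
litres = [41, 70, 150, 50, 170, 200]
temperature = [17, 19, 22, 18, 24, 26]


def least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) using beta_hat = s_xy / s_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = least_squares(litres, temperature)
print(round(alpha_hat, 1), round(beta_hat, 4))  # roughly 15.1 and 0.052
```

Once the two estimates are known, a prediction is just `alpha_hat + beta_hat * x` for whatever value of x we are interested in.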

Question 12.12

The table below compares the GDP (billion US$) and child mortality rates (deaths of under 5’s per 1,000 births) for various countries:

                      Algeria  Angola  Ireland  Bangladesh  Bolivia
GDP (x)               45.9     7.4     72.7     32.8        8.1
Child mortality (y)   52       191     6        104         84

$$\sum x = 166.9 \qquad \sum x^2 = 8{,}588.31 \qquad \sum y = 437 \qquad \sum y^2 = 57{,}093 \qquad \sum xy = 8{,}328$$

(i) Show that the least squares estimates of $\alpha$ and $\beta$ are 156.6 and –2.074, respectively.

(ii) Write down the equation of the fitted regression line.

(iii) Use this line to estimate the child mortality for a country with a GDP of 60 billion US$.



4 Appendix A – rearranging the covariance formula

The formula for the sample covariance is:

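$$\frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y})$$

Multiplying out the brackets and splitting up the sum:

$$\begin{aligned}
&= \frac{1}{n-1}\sum \left(x_i y_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}\right) \\
&= \frac{1}{n-1}\left(\sum x_i y_i - \sum x_i\bar{y} - \sum \bar{x}y_i + \sum \bar{x}\bar{y}\right)
\end{aligned}$$

We can take the $\bar{x}$ and $\bar{y}$ terms out of the sums, as they are constants (ie they don’t depend on i):

$$\begin{aligned}
&= \frac{1}{n-1}\left(\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + \bar{x}\bar{y}\sum 1\right) \\
&= \frac{1}{n-1}\left(\sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}\right)
\end{aligned}$$

Now we use the fact that:

$$\bar{x} = \frac{1}{n}\sum x_i \Rightarrow \sum x_i = n\bar{x}$$

$$\bar{y} = \frac{1}{n}\sum y_i \Rightarrow \sum y_i = n\bar{y}$$

This gives:

$$\begin{aligned}
&= \frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y}\right) \\
&= \frac{1}{n-1}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)
\end{aligned}$$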
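If you would like to convince yourself numerically, the sketch below evaluates both versions of the covariance formula using exact fractions, so there is no rounding to worry about. The five data pairs are made up purely for illustration and the function names are our own.

```python
from fractions import Fraction


def cov_definition(xs, ys):
    """Sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def cov_rearranged(xs, ys):
    """(Sum of x*y minus n * x_bar * y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)


# Made-up data, held as exact fractions so the comparison is exact.
xs = [Fraction(v) for v in (3, 7, 8, 12, 20)]
ys = [Fraction(v) for v in (5, 1, 9, 4, 11)]

same = cov_definition(xs, ys) == cov_rearranged(xs, ys)
print(same)
```

A numerical check like this is not a proof, of course, but it is a quick way to catch a slip in the algebra.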



5 Appendix B – why does the covariance formula work?

To show why the covariance formula gives a positive result for variables with positive correlation we’ll look at the height and weight example once more:

[Figure: scatterplot of Weight (kg) against Height (cm), with lines drawn at the mean height and the mean weight and the points (175,69) and (150,56) labelled]

The mean height ($\bar{x} = 163.5$ cm) and the mean weight ($\bar{y} = 63.5$ kg) have been drawn on the scatterplot. From this we can see that for positive correlation, most of the points are in the top right and the bottom left quadrants.

To calculate the covariance we will multiply $x_i - \bar{x}$ and $y_i - \bar{y}$ for each point. From the diagram we can see that for points in the top right quadrant we have:

$$(x_i - \bar{x})(y_i - \bar{y}) = +ve \times +ve = +ve$$

eg $(175, 69) \Rightarrow (x_i - \bar{x})(y_i - \bar{y}) = (175 - 163.5)(69 - 63.5) = 11.5 \times 5.5 = 63.25$

For points in the bottom left quadrant, $x_i - \bar{x}$ and $y_i - \bar{y}$ will both be negative. So:

$$(x_i - \bar{x})(y_i - \bar{y}) = -ve \times -ve = +ve$$

eg $(150, 56) \Rightarrow (x_i - \bar{x})(y_i - \bar{y}) = (150 - 163.5)(56 - 63.5) = -13.5 \times -7.5 = 101.25$

So for positive correlation, the majority of points will give positive values of $(x_i - \bar{x})(y_i - \bar{y})$. Therefore, when we total these up we will get a positive covariance.



To show why the covariance formula gives a negative result for variables with negative correlation we’ll look at the example of time taken and mistakes made whilst completing a drawing by looking in the mirror.

[Figure: scatterplot of Mistakes against Time (secs), with lines drawn at the mean time and the mean number of mistakes and the points (10,29) and (57,5) labelled]

We can see that for negative correlation, most of the points are in the top left and bottom right quadrants. From the diagram we can see that for points in the top left quadrant we have:

$$(x_i - \bar{x})(y_i - \bar{y}) = -ve \times +ve = -ve$$

eg $(10, 29) \Rightarrow (x_i - \bar{x})(y_i - \bar{y}) = (10 - 32)(29 - 14.5) = -22 \times 14.5 = -319$

For points in the bottom right quadrant, we have:

$$(x_i - \bar{x})(y_i - \bar{y}) = +ve \times -ve = -ve$$

eg $(57, 5) \Rightarrow (x_i - \bar{x})(y_i - \bar{y}) = (57 - 32)(5 - 14.5) = 25 \times -9.5 = -237.5$

So for negative correlation, the majority of points will give negative values of $(x_i - \bar{x})(y_i - \bar{y})$. Therefore, when we total these up we will get a negative covariance.

Finally, for variables with no correlation, the points will be scattered in all four quadrants. Therefore there will be a mixture of positive and negative values for $(x_i - \bar{x})(y_i - \bar{y})$. When we add these up they will cancel out to give zero or something near zero.
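The quadrant argument can be seen directly in the height and weight data. The sketch below (variable names are our own) forms the product of deviations for each person and counts how many are positive and how many are negative.

```python
# Height (cm) and weight (kg) for persons A to L.
heights = [150, 152, 155, 156, 158, 160, 163, 165, 170, 175, 178, 180]
weights = [56, 62, 63, 57, 64, 62, 65, 66, 65, 69, 66, 67]

n = len(heights)
x_bar = sum(heights) / n  # 163.5
y_bar = sum(weights) / n  # 63.5

# One product of deviations per point: positive in the top right and
# bottom left quadrants, negative in the other two.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(heights, weights)]

positive = sum(1 for p in products if p > 0)
negative = sum(1 for p in products if p < 0)
total = sum(products)
print(positive, negative, total)
```

Most of the products are positive, and the few negative ones are small, so the total (and hence the covariance) comes out positive.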




6 Appendix C – deriving the least squares estimates

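We want to find the values of $\alpha$ and $\beta$ that minimise:

$$\sum (y_i - \alpha - \beta x_i)^2$$

Let $S = \sum (y_i - \alpha - \beta x_i)^2$. Now differentiating $S$ by $\alpha$ and setting the result equal to zero (to get the minimum):

$$\frac{\partial S}{\partial \alpha} = -2\sum (y_i - \alpha - \beta x_i) = 0$$

We have used the ‘chain rule’ – multiply by the power of the bracket, reduce the power of the bracket by 1 and then multiply by the derivative of the bracket.

Dividing both sides by $-2$ and then splitting up the summation:

$$\begin{aligned}
&\sum (y_i - \alpha - \beta x_i) = 0 \\
\Rightarrow\ &\sum y_i - \sum \alpha - \sum \beta x_i = 0
\end{aligned}$$

Taking out the $\alpha$ and $\beta$ from the summations (as they don’t depend on i):

$$\begin{aligned}
&\sum y_i - \alpha\sum 1 - \beta\sum x_i = 0 \\
\Rightarrow\ &\sum y_i - n\alpha - \beta\sum x_i = 0
\end{aligned}$$

Rearranging:

$$\begin{aligned}
&n\alpha = \sum y_i - \beta\sum x_i \\
\Rightarrow\ &\alpha = \frac{\sum y_i}{n} - \beta\frac{\sum x_i}{n}
\end{aligned}$$

Hence:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$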



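Next we will find the value of $\beta$ that minimises:

$$S = \sum (y_i - \alpha - \beta x_i)^2$$

Differentiating by $\beta$ and setting the result equal to zero (to get the minimum):

$$\frac{\partial S}{\partial \beta} = -2\sum x_i(y_i - \alpha - \beta x_i) = 0$$

Dividing both sides by $-2$ and then splitting up the summation:

$$\begin{aligned}
&\sum x_i(y_i - \alpha - \beta x_i) = 0 \\
\Rightarrow\ &\sum (x_i y_i - \alpha x_i - \beta x_i^2) = 0 \\
\Rightarrow\ &\sum x_i y_i - \sum \alpha x_i - \sum \beta x_i^2 = 0
\end{aligned}$$

Taking out the $\alpha$ and $\beta$ from the summations (as they don’t depend on i):

$$\sum x_i y_i - \alpha\sum x_i - \beta\sum x_i^2 = 0$$

Now, earlier we found that $\alpha = \bar{y} - \beta\bar{x}$. Substituting this in, we get:

$$\begin{aligned}
&\sum x_i y_i - (\bar{y} - \beta\bar{x})\sum x_i - \beta\sum x_i^2 = 0 \\
\Rightarrow\ &\sum x_i y_i - \bar{y}\sum x_i + \beta\bar{x}\sum x_i - \beta\sum x_i^2 = 0
\end{aligned}$$

Rearranging:

$$\begin{aligned}
&\sum x_i y_i - \bar{y}\sum x_i = \beta\left(\sum x_i^2 - \bar{x}\sum x_i\right) \\
\Rightarrow\ &\beta = \frac{\sum x_i y_i - \bar{y}\sum x_i}{\sum x_i^2 - \bar{x}\sum x_i}
\end{aligned}$$

Now using the fact that $\sum x_i = n\bar{x}$ (since $\bar{x} = \frac{\sum x_i}{n}$), we get:

$$\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{s_{xy}}{s_{xx}}$$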
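The two conditions used in this appendix say that, at the least squares estimates, the residuals add up to zero and so do the residuals multiplied by their x values. The sketch below checks this (up to rounding error) for the water machine data from Section 3.2, and also confirms that nudging either estimate makes the sum of squares bigger. The function names and the sizes of the nudges are our own choices.

```python
# Litres dispensed (x) and temperature in degrees C (y), from Section 3.2.
litres = [41, 70, 150, 50, 170, 200]
temperature = [17, 19, 22, 18, 24, 26]


def fit(xs, ys):
    """Least squares estimates (alpha_hat, beta_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


def sum_of_squares(alpha, beta, xs, ys):
    """S = sum of squared vertical errors for the line y = alpha + beta * x."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


alpha_hat, beta_hat = fit(litres, temperature)
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(litres, temperature)]

sum_e = sum(residuals)                                   # dS/d(alpha) = 0 condition
sum_xe = sum(x * e for x, e in zip(litres, residuals))   # dS/d(beta) = 0 condition

best = sum_of_squares(alpha_hat, beta_hat, litres, temperature)
nudges = [(0.1, 0), (-0.1, 0), (0, 0.001), (0, -0.001)]
nudged = [sum_of_squares(alpha_hat + da, beta_hat + db, litres, temperature)
          for da, db in nudges]
print(sum_e, sum_xe, best, min(nudged))
```

Both sums come out as zero apart from floating point noise, and every nudged line has a larger sum of squares than the fitted one.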



Extra practice questions

P12.1 Subject C1, September 1995, Q17 (part)

In a study into vehicle emissions eight vehicles were thoroughly tested and the following data on hydrocarbon emissions (grams/metre) and carbon monoxide emissions (grams/metre) were obtained:

Hydrocarbons (x)      0.83  0.72  0.65  0.57  0.55  0.51  0.43  0.37
Carbon Monoxide (y)   15.1  16.6  14.7  8.0   10.3  5.1   5.5   4.1

$$\sum x = 4.63, \quad \sum x^2 = 2.8391, \quad \sum y = 79.4, \quad \sum y^2 = 962.82, \quad \sum xy = 50.748$$

(i) Draw a scatterplot of these data and comment on the relationship between hydrocarbon and carbon monoxide emissions. [3]

(ii) Calculate the correlation coefficient r. [3]

(iii) Calculate the fitted regression line using carbon monoxide as the response variable. [2]

[Total 8]

P12.2 Subject C1, April 2000, Q12

A random sample of 200 pairs of observations $(x, y)$ from a discrete bivariate distribution $(X, Y)$ is as follows:

the observation $(-2, 2)$ occurs 50 times

the observation $(0, 0)$ occurs 90 times

the observation $(2, -1)$ occurs 60 times.

Calculate the sample correlation coefficient for these data. [4]



P12.3 Subject C1, September 1994, Q16 (part)

One of the conclusions of a 1980 study appearing in the journal Advances in Cancer Research was that “… none of the risk factors for cancer is probably more significant than diet and nutrition”. The following data are from an investigation into the relationship between fat consumption x and prostate cancer deaths y:

Country           Dietary fat x (g/day)   Death rate y (per 100,000)
Philippines       29                      1.3
Mexico            57                      4.5
Colombia          47                      5.4
Yugoslavia        72                      5.6
Panama            58                      7.8
Romania           67                      8.8
Czechoslovakia    96                      9.1
Spain             97                      10.1
Finland           112                     11.7
United Kingdom    143                     12.4
Canada            142                     13.4
France            137                     14.4
Australia         129                     15.1
United States     147                     16.3
Sweden            132                     18.4

$$\sum x = 1{,}465 \qquad \sum x^2 = 165{,}561 \qquad \sum y = 154.3 \qquad \sum y^2 = 1{,}915.39 \qquad \sum xy = 17{,}578.5$$

(i) Draw a scatterplot of these data and comment briefly on your findings. [3]

(ii) (a) Calculate the fitted regression line using death rate as the response variable and dietary fat as the explanatory variable.

(b) Comment briefly on the interpretation of the fitted slope coefficient. [4]

[Total 7]



P12.4 Subject C1, September 2000, Q16 (part)

At the end of the skiing season the tourist board in a mountain region examines the records of ten ski resorts. For each one it obtains the total number (y, thousands) of visitor-days during the season as a measure of the resort’s popularity, and the ski-lift capacity (x, thousands), being the maximum number of skiers that can be transported per hour. The resulting data are given in the following table:

Resort             A     B     C    D     E    F     G    H     I    J
Lift Capacity x:   1.9   3.3   1.2  4.2   1.5  2.2   1.0  5.6   1.9  3.8
Visitor-days y:    15.1  22.6  9.2  37.5  8.9  21.2  5.8  41.0  9.2  32.4

$$\sum x = 26.6, \quad \sum x^2 = 91.08, \quad \sum y = 202.8, \quad \sum y^2 = 5{,}603.12, \quad \sum xy = 707.58$$

(i) Draw a scatterplot of y against x and comment briefly on any relationship between a resort’s popularity and its ski-lift capacity. [2]

(ii) Calculate the correlation coefficient between x and y and comment briefly in the light of your comment in part (i). [3]

(iii) Calculate the fitted linear regression equation of y on x. [2]

[Total 7]



Chapter 12 Summary

Scatterplot

Bivariate data are data that have two variables (eg height and weight). A scatterplot (or scatter diagram) is a plot of our bivariate data, with one variable (eg height) plotted on the x-axis (called the explanatory variable) and the other variable (eg weight) plotted on the y-axis (called the response variable). A scatterplot is used to see if there is any relationship or connection (called correlation) between the two variables.

Correlation

There are three types of linear correlation: negative correlation, no correlation and positive correlation.

The clearer the pattern, the stronger the correlation. If there is an exact linear relationship we say that there is perfect linear correlation.

The sample correlation coefficient, r, measures the type and strength of the connection between two variables:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$

where:

$$s_{xx} = \sum x_i^2 - n\bar{x}^2 \qquad s_{yy} = \sum y_i^2 - n\bar{y}^2 \qquad s_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$$

The value of the correlation coefficient ranges from $-1$ to $1$:

r = –1: perfect negative correlation
r = 0: no correlation
r = +1: perfect positive correlation

Correlation between two variables does not necessarily imply causation!

Regression

If there is a (strong) linear relationship between the two variables we can represent that linear relationship with a regression line:

$$y = \alpha + \beta x$$

where $\beta$ is the gradient (ie how many units ‘up’ the line goes for every one unit ‘across’) and $\alpha$ is the y-intercept (ie where the line crosses the y-axis).

The least squares estimates of $\alpha$ and $\beta$ are given by:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}}$$

They are calculated by minimising the sum of the squares of the residuals or errors (that is how much each y value is ‘out’ from the regression line).




We have a decreasing pattern in the points – generally as the time taken increases, the number of mistakes made decreases. However, it’s not entirely clear whether this relationship is linear or not.

Solution 12.2

(i) Negative correlation – as the number of cigarettes smoked increases, life expectancy would decrease.

(ii) Positive correlation – the further one lives from work, the greater the time taken to get there.

(iii) Positive correlation – more bedrooms means the house is bigger and so the insurance charged on it will increase.

(iv) Negative correlation – as the amount of “no claims” discount increases the cost of car insurance decreases.

(v) No correlation – there should be no connection between the number of exams passed in a sitting and the length of hair.

Solution 12.3

[Figure: three scatterplots showing weak negative correlation, strong negative correlation and perfect negative correlation]



Solution 12.4

(i) There is an exact relationship between pounds exchanged and dollars bought (assuming that there are no fluctuations in the exchange rate over the day). This is Scatterplot C.

(ii) There will be a very weak relationship between height and shoe size (as there are many examples of small people with big feet and vice versa). This is Scatterplot A.

(iii) We would expect there to be a fairly strong relationship between engine size and the cost of car insurance. This is Scatterplot B.

Solution 12.5

(i) The formula for the sample covariance is:

$$\frac{1}{n-1}\left\{\sum x_i y_i - n\bar{x}\bar{y}\right\}$$

Calculating the means:

$$\bar{x} = \frac{\sum x}{n} = \frac{16 + 25 + 38 + 61 + 79}{5} = \frac{219}{5} = 43.8$$

$$\bar{y} = \frac{\sum y}{n} = \frac{7.8 + 5.2 + 4.3 + 2.5 + 1.7}{5} = \frac{21.5}{5} = 4.3$$

Calculating $\sum x_i y_i$:

$$(16 \times 7.8) + (25 \times 5.2) + (38 \times 4.3) + (61 \times 2.5) + (79 \times 1.7) = 705$$

This gives us a covariance of:

$$\frac{1}{4}\left\{705 - 5 \times 43.8 \times 4.3\right\} = \frac{-236.7}{4} = -59.175$$

(ii) We have a negative value for the covariance which means that there is negative correlation between the variables. As the mileage of the car increases, the price will decrease.



Solution 12.6

The covariance will be zero or close to zero.

Solution 12.7

$$r = \frac{\text{covariance of } X \text{ and } Y}{sd(X) \times sd(Y)} = \frac{12.5}{\sqrt{2.5} \times \sqrt{62.5}} = 1$$

There is perfect positive correlation between the pay received and the number of hours worked. Therefore the correlation coefficient is also 1.

Solution 12.8

Scatterplot A shows strong positive correlation, $r \approx 0.8$.

Scatterplot B shows very strong negative correlation, $r \approx -0.95$.

Scatterplot C shows weak negative correlation, $r \approx -0.65$.

The value for Scatterplot C is probably more negative than you might think. The pattern is fairly clear and there are no outliers (ie points that are way off the general pattern).



Solution 12.9

(i) First calculating the means:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{450}{6} = 75 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{374}{6} = 62\tfrac{1}{3}$$

Next we calculate the sums of squares:

$$s_{xx} = \sum x_i^2 - n\bar{x}^2 = 35{,}428 - 6 \times 75^2 = 1{,}678$$

$$s_{yy} = \sum y_i^2 - n\bar{y}^2 = 23{,}850 - 6 \times \left(62\tfrac{1}{3}\right)^2 = 537\tfrac{1}{3}$$

$$s_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 28{,}745 - 6 \times 75 \times 62\tfrac{1}{3} = 695$$

This gives:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}} = \frac{695}{\sqrt{1{,}678 \times 537\tfrac{1}{3}}} = 0.732$$

(ii) There is fairly strong positive correlation, ie as literacy rates increase so does life expectancy.

(iii) No! It is to do with how economically developed the country is. A less economically developed country has low literacy and low life expectancy whereas a more economically developed country will have higher literacy and higher life expectancy.



Solution 12.10

(i) The mean of the co-ordinates $(\bar{x}, \bar{y})$ is $(41, 56)$.

(ii) The line of ‘best fit’ is:

[Figure: scatterplot of Actual Exam (y) against Mock Exam (x) with the line of ‘best fit’ drawn on]

(iii) Reading off the graph, we get a value of about 71.

Solution 12.11

The y-intercept is about 16. Depending on the points you consider, the gradient will be between 0.93 and 1. This will give a line of ‘best fit’ of, say, $y = 16 + 0.95x$.

The true equation of the line is $y = 16.3 + 0.968x$ – our accuracy is limited by the plot of our graph and our ability to read from it!



Solution 12.12

(i) First calculating the means:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{166.9}{5} = 33.38 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{437}{5} = 87.4$$

Next we calculate the sums of squares:

$$s_{xx} = \sum x_i^2 - n\bar{x}^2 = 8{,}588.31 - 5 \times 33.38^2 = 3{,}017.188$$

$$s_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 8{,}328 - 5 \times 33.38 \times 87.4 = -6{,}259.06$$

So we get:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{-6{,}259.06}{3{,}017.188} = -2.074$$

and:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 87.4 - (-2.074) \times 33.38 = 156.6$$

(ii) The fitted regression line is:

$$\hat{y} = 156.6 - 2.074x$$

(iii) Substituting $x = 60$ into our line of regression, we get:

$$\hat{y} = 156.6 - 2.074 \times 60 = 32.2$$




P12.1 (i) The scatterplot for these data is shown below:

[Figure: scatterplot of Carbon Monoxide against Hydrocarbons]

The scatterplot shows positive linear correlation, ie cars emitting more hydrocarbons also emit more carbon monoxide.

(ii) The sample correlation coefficient is given by:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$

So we require $s_{xx}$, $s_{yy}$ and $s_{xy}$, which in turn require $\bar{x}$ and $\bar{y}$.

$$\bar{x} = \frac{\sum x}{n} = \frac{4.63}{8} = 0.57875 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{79.4}{8} = 9.925$$

$$s_{xx} = \sum x^2 - n\bar{x}^2 = 2.8391 - 8 \times 0.57875^2 = 0.1594875$$

$$s_{yy} = \sum y^2 - n\bar{y}^2 = 962.82 - 8 \times 9.925^2 = 174.775$$

$$s_{xy} = \sum xy - n\bar{x}\bar{y} = 50.748 - 8 \times 0.57875 \times 9.925 = 4.79525$$

Hence:

$$r = \frac{4.79525}{\sqrt{0.1594875 \times 174.775}} = 0.908 \text{ (3 SF)}$$

(iii) We have:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{4.79525}{0.1594875} = 30.07$$

and:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 9.925 - 30.07 \times 0.57875 = -7.48$$

Hence, the fitted regression line is:

$$\hat{y} = -7.48 + 30.07x$$



P12.2 The sample correlation coefficient is given by:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{(50 \times -2) + (90 \times 0) + (60 \times 2)}{50 + 90 + 60} = \frac{20}{200} = 0.1$$

$$\bar{y} = \frac{\sum fy}{\sum f} = \frac{(50 \times 2) + (90 \times 0) + (60 \times -1)}{50 + 90 + 60} = \frac{40}{200} = 0.2$$

Now since there are only 3 different values for x and y it’s probably slightly easier to use $s_{xx} = \sum (x - \bar{x})^2$, $s_{yy} = \sum (y - \bar{y})^2$ and $s_{xy} = \sum (x - \bar{x})(y - \bar{y})$. This gives:

$$s_{xx} = 50 \times (-2 - 0.1)^2 + 90 \times (0 - 0.1)^2 + 60 \times (2 - 0.1)^2 = 438$$

$$s_{yy} = 50 \times (2 - 0.2)^2 + 90 \times (0 - 0.2)^2 + 60 \times (-1 - 0.2)^2 = 252$$

$$\begin{aligned}
s_{xy} &= 50 \times (-2 - 0.1)(2 - 0.2) + 90 \times (0 - 0.1)(0 - 0.2) + 60 \times (2 - 0.1)(-1 - 0.2) \\
&= -324
\end{aligned}$$

So the sample correlation coefficient is:

$$r = \frac{-324}{\sqrt{438 \times 252}} = -0.975$$

Alternatively, using $s_{xx} = \sum x^2 - n\bar{x}^2$, etc gives:

$$s_{xx} = \left[50 \times (-2)^2 + 90 \times 0^2 + 60 \times 2^2\right] - 200 \times 0.1^2 = 438$$

$$s_{yy} = \left[50 \times 2^2 + 90 \times 0^2 + 60 \times (-1)^2\right] - 200 \times 0.2^2 = 252$$

$$s_{xy} = \left[50 \times (-2 \times 2) + 90 \times (0 \times 0) + 60 \times (2 \times -1)\right] - 200 \times 0.1 \times 0.2 = -324$$
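When each observation comes with a frequency, every sum simply picks up a factor of f. A minimal Python sketch of the second method is given below; the way the observations are stored (a list of `((x, y), f)` entries) is our own choice.

```python
from math import sqrt

# Each entry is ((x, y), frequency), taken from the question.
observations = [((-2, 2), 50), ((0, 0), 90), ((2, -1), 60)]

n = sum(f for _, f in observations)  # 200 pairs in total
x_bar = sum(f * x for (x, _), f in observations) / n
y_bar = sum(f * y for (_, y), f in observations) / n

# Frequency-weighted versions of the 'sum minus n times mean' formulae.
s_xx = sum(f * x * x for (x, _), f in observations) - n * x_bar ** 2
s_yy = sum(f * y * y for (_, y), f in observations) - n * y_bar ** 2
s_xy = sum(f * x * y for (x, y), f in observations) - n * x_bar * y_bar

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx), round(s_yy), round(s_xy), round(r, 3))
```

This avoids writing out all 200 pairs, which is exactly what the frequency-weighted sums do in the written solution.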



P12.3 (i) The scatterplot for these data is:

[Figure: scatterplot of Death rate against Dietary fat]

The scatterplot shows positive linear correlation, ie countries where the population eat more dietary fat have a greater death rate.

(ii) (a) The fitted regression line is $y = \hat{\alpha} + \hat{\beta}x$ where:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}}$$

So we require $\bar{x}$, $\bar{y}$, $s_{xx}$ and $s_{xy}$:

$$\bar{x} = \frac{\sum x}{n} = \frac{1{,}465}{15} = 97.\dot{6}$$

$$\bar{y} = \frac{\sum y}{n} = \frac{154.3}{15} = 10.28\dot{6}$$

$$s_{xx} = \sum x^2 - n\bar{x}^2 = 165{,}561 - 15 \times 97.\dot{6}^2 = 22{,}479.3$$

$$s_{xy} = \sum xy - n\bar{x}\bar{y} = 17{,}578.5 - 15 \times 97.\dot{6} \times 10.28\dot{6} = 2{,}508.53$$

Hence:

$$\hat{\beta} = \frac{2{,}508.53}{22{,}479.3} = 0.111593$$

$$\hat{\alpha} = 10.28\dot{6} - 0.111593 \times 97.\dot{6} = -0.6122$$

Therefore, our fitted regression line is:

$$\hat{y} = 0.1116x - 0.6122$$

(b) The slope parameter is the gradient of the regression line, so an increase of 1 g/day of fat increases the predicted death rate by 0.1116 per 100,000.

P12.4 (i) The scatterplot for these data is:

[Figure: scatterplot of Visitor days against Lift Capacity]

The scatterplot shows positive linear correlation, ie resorts with a greater lift capacity are more popular.

(ii) The formula for the correlation coefficient is:

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$

$$\bar{x} = \frac{\sum x}{n} = \frac{26.6}{10} = 2.66 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{202.8}{10} = 20.28$$

$$s_{xx} = \sum x^2 - n\bar{x}^2 = 91.08 - 10 \times 2.66^2 = 20.324$$

$$s_{yy} = \sum y^2 - n\bar{y}^2 = 5{,}603.12 - 10 \times 20.28^2 = 1{,}490.336$$

$$s_{xy} = \sum xy - n\bar{x}\bar{y} = 707.58 - 10 \times 2.66 \times 20.28 = 168.132$$

Hence:

$$r = \frac{168.132}{\sqrt{20.324 \times 1{,}490.336}} = 0.966$$

This indicates strong positive correlation, which backs up our theory of a linear relationship from (i).

(iii) The fitted regression line of y on x is $y = \alpha + \beta x$ where:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}}$$

Hence:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{168.132}{20.324} = 8.27$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 20.28 - 8.27 \times 2.66 = -1.73$$

So the equation of the regression line of y on x is:

$$\hat{y} = 8.27x - 1.73$$
