Distribution of total and sample mean


Transcript of Distribution of total and sample mean

Page 1: Distribution of  total and sample mean

Distribution of total and sample mean

Page 2: Distribution of  total and sample mean

Sample Statistics & Data display

We can calculate statistics from a sample. These reflect what is happening in the population as a whole: the statistics of the sample estimate the parameters of the population.

Notation              Population (parameters)   Sample (statistics)
Mean                  $\mu$                     $\bar{x}$
Standard deviation    $\sigma$                  $s$
Variance              $\sigma^2$                $s^2$

Page 3: Distribution of  total and sample mean

Example

25 people in a lift. They have a mean weight of 65 kg and a SD of 7 kg. Find the mean and SD of the load.

$E(T) = n\mu = 25 \times 65 = 1625$ kg

$SD(T) = \sqrt{n}\,\sigma = \sqrt{25} \times 7 = 5 \times 7 = 35$ kg

Page 4: Distribution of  total and sample mean

If we repeat an experiment n times, then the total T is the sum of n independent random variables, each with mean $\mu$ and standard deviation $\sigma$.

$E(T) = n\mu$

$VAR(T) = n\sigma^2$

$SD(T) = \sqrt{n}\,\sigma$
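The formulas for the total can be checked numerically. The sketch below uses the lift figures (25 people, mean 65 kg, SD 7 kg); the function name `total_mean_sd` is made up for illustration and the weights are assumed to be independent.

```python
import math

def total_mean_sd(n, mu, sigma):
    """Mean and SD of T = X1 + ... + Xn for n independent values."""
    mean_t = n * mu                 # E(T) = n * mu
    var_t = n * sigma ** 2          # VAR(T) = n * sigma^2
    return mean_t, math.sqrt(var_t)

mean_t, sd_t = total_mean_sd(25, 65, 7)
print(mean_t, sd_t)  # 1625 35.0
```

Note that the SD grows with the square root of n, not with n itself.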

Page 5: Distribution of  total and sample mean

A fruit and vegetable market accepts deliveries of crates of apples. Each crate has a weight that is normally distributed with a mean of 21kg and a standard deviation of 0.4 kg. The crates are delivered in groups of 18 on pallets that weigh exactly 30kg.

a) Calculate the mean total weight of a pallet with 18 crates of apples

b) Calculate the standard deviation of the total weight of a pallet with 18 crates of apples.
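A possible way to set up this question is sketched below. It assumes the crate weights are independent and that "total weight" includes the 30 kg pallet, which is a constant and so shifts the mean without changing the spread. The variable names are illustrative.

```python
import math

n_crates = 18
crate_mean, crate_sd = 21.0, 0.4   # kg, from the question
pallet_weight = 30.0               # kg, a constant (assumed to be included in the total)

total_mean = n_crates * crate_mean + pallet_weight   # adding a constant shifts the mean
total_sd = math.sqrt(n_crates) * crate_sd            # a constant adds no spread
print(round(total_mean, 1), round(total_sd, 3))
```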

Page 6: Distribution of  total and sample mean

Central Limit Theorem

Consider a sample of size n from a population X with a mean of $\mu$ and a SD of $\sigma$.

Mean of the sample mean: $E(\bar{X}) = \mu$

Standard deviation of the sample mean: $SD(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$

Variance of the sample mean: $VAR(\bar{X}) = \dfrac{\sigma^2}{n}$

$\dfrac{\sigma}{\sqrt{n}}$ is sometimes called the standard error of the sample mean.

If n is large (>30) then the distribution of the sample means will be approximately a normal distribution.

The sample means can be expected to average out to the population mean, with a certain amount of spread about that mean. This spread is the standard error, or standard deviation of the sample mean, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.
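A small simulation can make this concrete. The sketch below is only an illustration: the uniform population, the sample size of 36, the number of samples and the seed are all arbitrary choices, not taken from the slides.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
n, trials = 36, 5000            # sample size and number of samples (arbitrary)

# Population: uniform on [0, 1], mean 0.5, SD sqrt(1/12), clearly not normal
sample_means = [
    statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = (1 / 12) ** 0.5 / n ** 0.5    # sigma / sqrt(n)
print(mean_of_means, sd_of_means, predicted_se)
```

The mean of the sample means should land close to 0.5, and their SD close to the predicted standard error.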

Page 7: Distribution of  total and sample mean

Example

A sample of size 20 is taken from a box of beans. The mean length of the beans in the box is 19 cm with a SD of 2.5cm.

a) What would the expected value of the sample mean be?

b) What would the variance of the sample mean be?

c) What would the standard error of the sample mean be?

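a) The expected value $E(\bar{X}) = \mu = 19$ cm

b) The variance $= \dfrac{\sigma^2}{n} = \dfrac{(2.5)^2}{20} = 0.3125$

c) The standard error is the standard deviation of the sample mean $= \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.5}{\sqrt{20}} = 0.559$ cm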

Page 8: Distribution of  total and sample mean

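Summary Table from p. 182

Random variable          Mean      Variance                Standard deviation
Population X             $\mu$     $\sigma^2$              $\sigma$
Total of n values T      $n\mu$    $n\sigma^2$             $\sqrt{n}\,\sigma$
Sample mean $\bar{X}$    $\mu$     $\dfrac{\sigma^2}{n}$   $\dfrac{\sigma}{\sqrt{n}}$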

We need to know the difference between the mean, variance and standard deviation of the population, of the total of n values, and of the sample mean.

Page 9: Distribution of  total and sample mean
Page 10: Distribution of  total and sample mean

Probabilities for the total

When we deal with the sum of a few variables we use:

$E(T) = n\mu$

$VAR(T) = n\sigma^2$

$SD(T) = \sqrt{n}\,\sigma$

The distribution of the sum is normal, so it is shaped like the bell curve.

[Figure: bell curve with the area between a lower and an upper bound shaded]

The probability that the sum will be within a given range (between Lower and Upper) is the area under the curve over that range.

Page 11: Distribution of  total and sample mean

Example

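A sample of 16 items is taken from a population X with a mean µ=34 and a SD σ=4. Calculate the probability that the total T of 16 items is below 530.

$n = 16,\ \mu = 34,\ \sigma = 4$

$E(T) = n\mu = 16 \times 34 = 544$

$SD(T) = \sqrt{n}\,\sigma = \sqrt{16} \times 4 = 16$

On the calculator (normal cumulative distribution):

Lower: -1E99

Upper: 530

σ = 16

µ = 544

$P(T < 530) = 0.19079 = 0.1908$ (4 dp)

Since 530 lies below the mean of 544, the probability is less than 0.5.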
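The calculator step can be cross-checked in software. This is a sketch using Python's standard-library `statistics.NormalDist`, assuming the total T is normally distributed. Because 530 is below the mean of 544, the result should come out under 0.5.

```python
from statistics import NormalDist

n, mu, sigma = 16, 34, 4
total = NormalDist(mu=n * mu, sigma=n ** 0.5 * sigma)   # T ~ N(544, 16)

p_below = total.cdf(530)    # area under the curve to the left of 530
print(round(p_below, 4))
```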

Page 12: Distribution of  total and sample mean

Example

A lift is licensed to carry a maximum of 25 passengers. It is overloaded when the total passenger load exceeds 1700 kg. The weights of single passengers chosen at random have a mean of 65 kg and a standard deviation of 7 kg. Calculate the probability that the lift is overloaded, assuming the lift is carrying 25 passengers.
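One way to approach this question is sketched below. It assumes the 25 weights are independent and that their total is approximately normal, so that T has mean 25 × 65 and SD √25 × 7.

```python
from statistics import NormalDist

n, mu, sigma, limit = 25, 65, 7, 1700
load = NormalDist(mu=n * mu, sigma=n ** 0.5 * sigma)   # T ~ N(1625, 35)

p_overload = 1 - load.cdf(limit)   # area to the right of 1700 kg
print(round(p_overload, 4))
```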

Page 13: Distribution of  total and sample mean
Page 14: Distribution of  total and sample mean

Probabilities for the sample mean

Sometimes we need to know the probability of where the sample mean is likely to be in relation to the population mean. The sample mean has a smaller spread than individual values, because its standard deviation is smaller.

For this we use:

$VAR(\bar{X}) = \dfrac{\sigma^2}{n}$

$SD(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$

Page 15: Distribution of  total and sample mean

Probability for samples

A sample size of 36 is taken from a normally distributed population with a mean of 40 and a standard deviation of 12. Calculate the probability that the sample mean is

a) Less than 41

b) Between 37 and 42
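A sketch of how both parts might be set up, using the standard error σ/√n = 12/√36 = 2 as the SD of the sample mean. The variable names are illustrative.

```python
from statistics import NormalDist

n, mu, sigma = 36, 40, 12
xbar = NormalDist(mu=mu, sigma=sigma / n ** 0.5)   # sample mean ~ N(40, 2)

p_a = xbar.cdf(41)                   # a) P(sample mean < 41)
p_b = xbar.cdf(42) - xbar.cdf(37)    # b) P(37 < sample mean < 42)
print(round(p_a, 4), round(p_b, 4))
```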

Page 16: Distribution of  total and sample mean
Page 17: Distribution of  total and sample mean

Confidence Intervals

Remember we calculate statistics from a sample to estimate the parameters of the population.

Each sample mean will be slightly different from every other sample mean, so it is better to give an interval that we are confident contains the population mean. How confident we are is our degree of confidence.

The spread of the values that the sample means take gives an idea of how accurate the estimate is. The interval built from this spread is called the confidence interval.

The spread on either side of the mean, that is the standard deviation of the sample mean, is called the standard error.

Page 18: Distribution of  total and sample mean

Using the calculator to find confidence intervals

Construct a 95% confidence interval, given n=25, $\bar{x}$=28.3, σ=4.38

[Figure: standard normal curve centred on 0. The 95% confidence interval lies between two boundaries, with area 0.475 on each side of the centre.]

The calculator only measures area from the far left, so for the calculator use

Area = 0.5 + 0.475 = 0.975

We can use the calculator to find Z, the number of SDs.
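The same "area from the far left" lookup can be done with an inverse normal function. The sketch below uses `NormalDist().inv_cdf` from Python's standard library; the helper name `z_for_confidence` is made up for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z such that the central `level` of the standard normal lies in (-z, z)."""
    area_from_left = 0.5 + level / 2      # e.g. 0.5 + 0.475 = 0.975 for 95%
    return NormalDist().inv_cdf(area_from_left)

print(round(z_for_confidence(0.95), 2))   # 1.96
print(round(z_for_confidence(0.90), 4))   # 1.6449
```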

Page 19: Distribution of  total and sample mean

Calculating the Sample Size

We may want a certain level of confidence that the mean of a sample we are going to take will lie within given boundaries of the population mean. The sample size needed depends on the margin of error.

The margin of error, e, is the distance between one of the end points of the interval and the sample mean.

Eg For 30m<µ<34m, the confidence interval is 32±2m

The margin of error is 2m

$e = z \times \dfrac{\sigma}{\sqrt{n}}$

[Figure: normal curve with the margin of error e marked on either side of the centre]

Page 20: Distribution of  total and sample mean

A certain make of scientific calculator is known to have a voltage rating with a standard deviation of 0.05 V. The mean voltage of 40 of these calculators is 3.02 V.

a. Construct a 90% confidence interval for the average voltage.

b. Explain the meaning of this confidence interval
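Part a) could be set up as in the sketch below, which assumes the known σ = 0.05 V applies and uses the interval (sample mean) ± z σ/√n. The function name `mean_ci` is illustrative, not from the slides.

```python
from statistics import NormalDist

def mean_ci(xbar, sigma, n, level):
    """Confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # z for the central `level`
    margin = z * sigma / n ** 0.5               # margin of error e
    return xbar - margin, xbar + margin

lower, upper = mean_ci(3.02, 0.05, 40, level=0.90)
print(round(lower, 3), round(upper, 3))
```

For part b), the usual reading is that about 90% of intervals built this way from repeated samples would contain the true mean voltage.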

Page 21: Distribution of  total and sample mean

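Construct a 95% confidence interval, given n=25, $\bar{x}$=28.3, σ=4.38

[Figure: standard normal curve with the central 95% shaded, 0.475 on each side of the centre]

From the calculator, using area 0.5 + 0.475 = 0.975: $Z = 1.96$

$\bar{X} - \dfrac{z\sigma}{\sqrt{n}} < \mu < \bar{X} + \dfrac{z\sigma}{\sqrt{n}}$

$28.3 - \dfrac{(1.96)(4.38)}{\sqrt{25}} < \mu < 28.3 + \dfrac{(1.96)(4.38)}{\sqrt{25}}$

$28.3 - 1.717 < \mu < 28.3 + 1.717$

$26.58 < \mu < 30.02$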

Page 22: Distribution of  total and sample mean

Using the Calculator to check your answer

In Stats mode:

F4 (intr)

F1 (Z)

F1 (1 s): sample with one mean

F2 (Var)

Eg #1 Ex14.1

Construct a 95% confidence interval, given n=25, $\bar{x}$=28.3, σ=4.38

Enter values, then press EXE

$26.58 < \mu < 30.02$

Page 23: Distribution of  total and sample mean

WB Eg 11

The time taken for an individual to walk to work is to be estimated. On 15 occasions the times in minutes were: 18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19

a) Find the sample mean and SD

b) Assuming normal distribution and that the sample is sufficiently large, calculate a 95% confidence interval for the mean time to walk to work.

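Use the calculator to answer a): $\bar{x} = 16.73$ (2 dp), $s = 2.17$ (2 dp)

95% confidence: area for the calculator = 0.475 + 0.5 = 0.975, so Z = 1.96

$\bar{X} - \dfrac{z\sigma}{\sqrt{n}} < \mu < \bar{X} + \dfrac{z\sigma}{\sqrt{n}}$

$16.73 - \dfrac{(1.96)(2.17)}{\sqrt{15}} < \mu < 16.73 + \dfrac{(1.96)(2.17)}{\sqrt{15}}$

$16.73 - 1.10 < \mu < 16.73 + 1.10$

$15.6 < \mu < 17.8$ minutes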
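The working for this example can be reproduced from the raw data. In the sketch below, `statistics.pstdev` (which divides by n) is used because it reproduces the SD of 2.17 quoted in the example; `statistics.stdev` (which divides by n - 1) would give a slightly larger value.

```python
import statistics

times = [18, 17, 15, 20, 16, 14, 19, 13, 17, 16, 14, 15, 20, 18, 19]
n = len(times)

xbar = statistics.fmean(times)
sd = statistics.pstdev(times)      # divides by n, matching the 2.17 in the example
z = 1.96                           # 95% confidence
margin = z * sd / n ** 0.5

lower, upper = xbar - margin, xbar + margin
print(round(xbar, 2), round(sd, 2))        # 16.73 2.17
print(round(lower, 1), round(upper, 1))    # 15.6 17.8
```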

Page 24: Distribution of  total and sample mean

Interpreting Confidence Intervals

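$\bar{x} = 16.73$ (2 dp), $s = 2.17$ (2 dp)

$15.6 < \mu < 17.8$ minutes

[Figure: number line showing the interval from 15.6 to 17.8 centred on 16.73]

We are 95% confident that the interval 15.6-17.8 minutes contains the true mean: 95% of intervals constructed in this way would contain it.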

Page 25: Distribution of  total and sample mean

Ex P75 4.01

Page 26: Distribution of  total and sample mean

Confidence Interval for Proportions

Page 27: Distribution of  total and sample mean

Confidence Intervals for Proportions

Another parameter of the population is the population proportion p or π.

This is the probability of success over a large number of trials, which should be similar to the proportion of successes in the population as a whole.

The best estimate of the proportion of successes for the population is the sample proportion

$\hat{p} = \dfrac{x}{n} = \dfrac{\text{successes}}{\text{number of trials}}$

X, the random variable for the number of successes in the sample, has approximately a normal distribution.

$E(X) = np$, where $E(X)$ is the expected value of X

$\hat{p}$ estimates $p$

Page 28: Distribution of  total and sample mean

Example

A random sample of 80 households showed that 30% owned PCs.

Construct a 95% confidence interval for p, the percentage of households that own a PC.

$19.96\% < p < 40.04\%$

We are 95% confident that the interval 19.96%-40.04% contains the true population proportion, that is, the proportion of all households that own PCs.
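The interval quoted for the household example can be reproduced with the usual normal approximation, sample proportion ± z √(pq/n). The sketch below is illustrative; the function name `proportion_ci` is not from the slides.

```python
from statistics import NormalDist

def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5   # z * sqrt(pq / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(0.30, 80)
print(f"{lower:.2%} to {upper:.2%}")   # 19.96% to 40.04%
```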

Page 29: Distribution of  total and sample mean

In a sample of 210 people with high blood pressure a particular drug is found to be effective for 150 of them. Construct a 95% confidence interval for p, the proportion of all patients with high blood pressure for whom this drug is effective.

$\hat{p} = \dfrac{150}{210} = 0.71429, \quad \hat{q} = \dfrac{60}{210} = 0.28571, \quad z = 1.96$

$\hat{p} - Z\sqrt{\dfrac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

$0.71429 - (1.96)\sqrt{\dfrac{(0.71429)(0.28571)}{210}} < p < 0.71429 + (1.96)\sqrt{\dfrac{(0.71429)(0.28571)}{210}}$

$0.653 < p < 0.775$

$65.3\% < p < 77.5\%$

Page 30: Distribution of  total and sample mean

The main purpose of a recent survey was to estimate the proportion of all adult NZers who are opposed to tipping for service in restaurants. The survey used a random sample of 663 adult New Zealanders, of whom 292 indicated that they are opposed to tipping for service.

a) State clearly the parameter of interest in this survey (A)

b) Calculate a 90% confidence interval for the proportion of all adult NZers who oppose tipping.(A)

c) Analyse the effect of increasing the number of adults surveyed on the width of this confidence interval. (E)

d) Suppose 50 independent random samples of adult NZers are taken and a 90% confidence interval is constructed from the results of each sample. Analyse the phrase “90% confidence" by making reference to these 50 confidence intervals. (E)

About 90% of intervals constructed in this way contain the true population proportion. So of the 50 confidence intervals, about 45 would be expected to contain the true population proportion; any single interval either contains it or does not.

Page 31: Distribution of  total and sample mean

Motel occupancy rates for July 1997 from a random sample of 35 motels gave the following statistics:

• Sample size: 35
• Sample mean: 0.572
• Sample standard deviation: 0.065

1) Calculate a 95% confidence interval for the mean occupancy rate for July 1997 for the population sampled. (A)

2) What would be the effect of increasing the level of confidence on the width of this confidence interval? (M)

3) The mean occupancy rate for the same population for July 1996 is 0.585. It is claimed that the mean occupancy rate for July 1997 is the same as the mean occupancy rate for July 1996. Using the confidence interval calculated in 1) at the 95% level of confidence, demonstrate whether the random sample gives us evidence against this claim. (M)

4) Calculate the number of motels needed to be sampled if the mean occupancy rate for July 1997 was to have been estimated to within 0.015 of its true value at the 95% level of confidence. (M)
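Questions 1, 3 and 4 could be approached as in the sketch below. It assumes the sample SD of 0.065 can stand in for σ with a z-based interval, and it rounds the required sample size up to the next whole motel.

```python
import math
from statistics import NormalDist

n, xbar, s = 35, 0.572, 0.065
z = NormalDist().inv_cdf(0.975)               # 95% confidence

margin = z * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin   # question 1
claim_inside = lower < 0.585 < upper          # question 3: is 0.585 in the interval?

n_needed = math.ceil((z * s / 0.015) ** 2)    # question 4, rounded up
print(round(lower, 4), round(upper, 4), claim_inside, n_needed)
```

If 0.585 falls inside the interval, the sample gives no evidence against the claim at this level of confidence.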

Page 32: Distribution of  total and sample mean

Confidence interval for the difference between two means

Page 33: Distribution of  total and sample mean

Confidence interval for the difference between two means

If two populations are similar then we would expect the difference between their two means to be about zero.

If the populations are different then we would expect the means to be different.

So if the confidence interval for the difference between the means does not contain 0, this suggests that the two populations are different.

Notation        Mean       SD           Sample size   Sample mean
Population 1    $\mu_1$    $\sigma_1$   $n_1$         $\bar{x}_1$
Population 2    $\mu_2$    $\sigma_2$   $n_2$         $\bar{x}_2$

We use $\bar{x}_1 - \bar{x}_2$ to estimate $\mu_1 - \mu_2$

Page 34: Distribution of  total and sample mean

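$E(D) = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$

$VAR(D) = VAR(\bar{X}_1 - \bar{X}_2) = VAR(\bar{X}_1) + VAR(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

$SD(D) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Confidence Interval

$(\bar{x}_1 - \bar{x}_2) \pm Z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

On formula sheet

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$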

Page 35: Distribution of  total and sample mean

Example

A random sample of 30 objects is taken from a normally distributed population with a SD of 6; another sample of 50 objects is taken from a population with a SD of 8. The mean of the first sample is 115, and that of the second is 108.

1) Construct a 96% confidence interval for µ1- µ2.

2) Explain whether it's likely that the two groups have the same mean.

$3.77 < \mu_1 - \mu_2 < 10.23$ is the 96% confidence interval for the difference between the two means.

The interval does not contain 0, so it is not likely that the two means are equal. We can say this with at least 96% confidence.
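The interval in this example can be reproduced from the formula for the difference between two means. The sketch below assumes independent samples; the function name `diff_means_ci` is made up for illustration.

```python
from statistics import NormalDist

def diff_means_ci(x1, s1, n1, x2, s2, n2, level):
    """Confidence interval for mu1 - mu2 from two independent samples."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5   # SD of the difference of sample means
    d = x1 - x2
    return d - z * se, d + z * se

lower, upper = diff_means_ci(115, 6, 30, 108, 8, 50, level=0.96)
print(round(lower, 2), round(upper, 2))   # 3.77 10.23
```

Because both end points are positive, 0 is not in the interval.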

Page 36: Distribution of  total and sample mean

Students are told to measure the area of a classroom; the estimates they provide are approximately normally distributed with SD = 0.15 m². 31 students who measured one classroom obtained a mean of 29.76 m², while 26 students who measured another classroom obtained a mean of 31.23 m². What is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first?

$1.392 < \mu_2 - \mu_1 < 1.548$

This is the 95% confidence interval for the amount by which the area of the second classroom exceeds that of the first.

We are 95% confident that the area of the second classroom exceeds that of the first, as zero is not in the confidence interval.

Page 37: Distribution of  total and sample mean

Interpretation

If the confidence interval includes zero then we cannot say that there is a difference between the two population means.

If zero is not included then we are confident that there is a difference between the two population means.

We need to assume that the samples are large enough, that they are independently selected, and that the populations they are selected from are normally distributed.

Page 38: Distribution of  total and sample mean

a< μ2– μ1 <b

• If both a and b are positive, it is reasonable to assume that μ2 is larger than μ1 by between a and b units. It’s unlikely the two means are the same.

• If both a and b are negative, it is reasonable to assume that μ2 is smaller than μ1 by between -b and -a units. It’s unlikely the two means are the same.

• If a and b have opposite signs, it is reasonable to assume that μ2 is smaller than μ1 by up to -a units, or larger than μ1 by up to b units, or somewhere in between. This includes the possibility that the two means are equal.

Page 39: Distribution of  total and sample mean

True or false

A 99% confidence interval for the difference between two means is calculated from sample data. -3.5< μ2– μ1 <9.4.

a. There is a 99% probability that the means are equal because the interval includes 0.

b. 99% of intervals calculated in the same way will include the difference of the two means.

Page 40: Distribution of  total and sample mean

Below is a random sample of times for both male and female competitors to complete the annual Mountain Biking Race.

a) Calculate a 95% confidence interval for the difference between the mean time for males to complete the race and the mean time for females to complete the race.

b) In last year’s race, a similar 95% confidence interval for the difference between µm and µf was calculated and found to be -6.25< µm - µf <1.36. Based on this confidence interval, demonstrate whether there is a significant difference between the mean race times for males and females.

0 is in the 95% interval (-6.25< µm - µf <1.36) so it can be concluded that there is no significant difference between the mean race time for males and females.

          Sample size   Mean      Standard deviation
Male      30            57 min    10 min
Female    30            65 min    14 min

Page 41: Distribution of  total and sample mean

The summary statistics for the length of the snapper surveyed in each region are shown in the table below.

a) Calculate a 95% confidence interval for the difference between the mean length of snapper in the reserve and the non-reserve regions.

b) It is claimed that the ‘average snapper’ in the reserve is at least 130mm longer than the “average snapper” in the non-reserve region. Use the 95% confidence interval from a to analyse the validity of this claim.

The 95% confidence interval for the difference between the mean lengths of reserve and non-reserve snapper is 85.03 mm to 121.15 mm. 130 mm lies above this whole interval, so at the 95% level of confidence the samples do not support the claim.

               Sample size   Sample mean   Sample standard deviation
Reserve        897           360.18        94.48
Non-reserve    47            257.09        59.35

Page 42: Distribution of  total and sample mean

Interpretation of Confidence Intervals

The company produces two different models of batteries, ‘power’ and ‘super’. 95 people were interviewed who have used both ‘power’ and ‘super’ batteries, to find out which of the two models these people prefer to use in their torches. Of the 95 people, 63 said that they prefer to use the ‘power’ model in their torches.

a) Find a 95% confidence interval for the proportion of all people who have used both ‘power’ and ‘super’ batteries and prefer to use the ‘power’ model of battery in their torches.

0.568<π<0.758

b) Write a clear description that gives the meaning of this confidence interval.

We are 95% confident that the true proportion of people who prefer the ‘power’ model lies between 0.568 and 0.758: 95% of intervals constructed in this way contain the true proportion.

Page 43: Distribution of  total and sample mean

Calculating Sample Size

Page 44: Distribution of  total and sample mean

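If we are given a particular level of confidence we can calculate the sample size (n) to give the required margin of error (e).

95% confidence interval, σ=4, margin of error e=2. How big is the sample size?

First we need to find Z, the number of SDs. For 95% the area for the calculator is 0.5 + 0.475 = 0.975, giving Z = 1.96.

$e = z \times \dfrac{\sigma}{\sqrt{n}}$

$2 = 1.96 \times \dfrac{4}{\sqrt{n}}$

$\sqrt{n} = 1.96 \times \dfrac{4}{2}$

$\sqrt{n} = 3.92$

$n = 15.366$

$n = 16$ (rounded up to the next whole number)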
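Rearranging e = z σ/√n for n gives n = (zσ/e)², rounded up. A minimal sketch of that calculation follows; the function name is illustrative, and the second call uses the figures from the next example (σ = 14, e = 2, Z = 1.6448).

```python
import math

def sample_size_for_mean(z, sigma, e):
    """Smallest whole n with z * sigma / sqrt(n) <= e."""
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_for_mean(1.96, 4, 2))      # 16
print(sample_size_for_mean(1.6448, 14, 2))   # 133
```

Rounding up rather than to the nearest whole number keeps the margin of error at or below the target.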

Page 45: Distribution of  total and sample mean

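A random variable is known to have a standard deviation of 14. What sample size would be required to be 90% confident that an estimate of the mean was within 2 units of its true value?

90% confidence: area for the calc = 0.5 + 0.45 = 0.95

$Z = 1.6448$

$e = z \times \dfrac{\sigma}{\sqrt{n}}$

$2 = 1.6448 \times \dfrac{14}{\sqrt{n}}$

$\sqrt{n} = 1.6448 \times \dfrac{14}{2}$

$\sqrt{n} = 11.5136$

$n = 132.56$

$n = 133$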

Page 46: Distribution of  total and sample mean

Calculate sample size for proportion

A pilot survey from a few tax returns has shown that approximately 12% of all taxpayers are in ‘high-income’ category. If the Inland Revenue Department wishes to estimate this percentage to within 1%, with 96% confidence, how many tax returns should it sample?

Page 47: Distribution of  total and sample mean

Calculating sample size for proportion

A market research company wishes to estimate the percentage of people in a certain age bracket who read a current-affairs magazine. The degree of confidence required for this estimate is 90%. What sample size should be taken to estimate the percentage to within 4%?

Margin of error: e = 4% = 0.04

Confidence level: 90% = 0.9

p is unstated so use p = 0.5, q = 0.5

For the calculator: area = 0.5 + 0.45 = 0.95, so Z = 1.6448 ≈ 1.645

$e = Z\sqrt{\dfrac{pq}{n}}$

It is easier to rearrange the formula first:

$\dfrac{e}{Z} = \sqrt{\dfrac{pq}{n}}$

$\dfrac{e^2}{Z^2} = \dfrac{pq}{n}$

$n = \dfrac{pqZ^2}{e^2}$

$n = \dfrac{(0.5)(0.5)(1.645)^2}{(0.04)^2}$

$n = 422.8$, so $n = 423$

The minimum sample size is 423.
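The rearranged formula n = pqZ²/e² is easy to wrap in a small helper. The sketch below defaults to p = 0.5, as the example does when p is unstated; the function name is made up for illustration.

```python
import math

def sample_size_for_proportion(z, e, p=0.5):
    """Smallest whole n with z * sqrt(p * (1 - p) / n) <= e."""
    return math.ceil(p * (1 - p) * z ** 2 / e ** 2)

print(sample_size_for_proportion(1.645, 0.04))   # 423
```

Using p = 0.5 gives the largest possible value of pq, so the resulting n is a safe upper bound when p is unknown.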

Page 48: Distribution of  total and sample mean

Calculating Sample Size

What size of sample should be taken from a population of packets of butter, when the standard deviation of the weights of packets is 4 g, if the mean weight is to be estimated to within 0.5 g with 95% confidence?

1) Use inverse norm to find the Z value (σ=1, μ=0)

2) $n = \left(\dfrac{\sigma z}{e}\right)^2$

Page 49: Distribution of  total and sample mean

Sample size for population proportion

Radio Sport wishes to conduct an opinion poll on whether the captain of the New Zealand netball team should be replaced. The degree of confidence required for this poll is 95%. What size sample should be used to obtain the percentage to within 5% accuracy?

1) Use inverse norm to find the Z value (σ=1, μ=0)

2) $n = \dfrac{pqz^2}{e^2}$

Page 50: Distribution of  total and sample mean

Sample size for proportion and sample mean

• An opinion poll with a level of confidence of 95% and an estimated value of p of 0.5 has a margin of error of 4%.

How many people would have taken part in the poll?

• A sample of containers of car parts has a mean weight of 40 kg and a standard deviation of 5 kg. How many containers would need to have been in the sample to ensure, at the 95% level of confidence, that the sample mean was within 0.5 kg of the population mean?

Page 51: Distribution of  total and sample mean

Confidence Interval Revision

• Sample mean: estimates μ
• Sample proportion: estimates p
• Difference of means: estimates μ1 - μ2

• Margin of error is half the width of the confidence interval
• Sample size for sample mean: $n = \left(\dfrac{\sigma z}{e}\right)^2$
• Sample size for sample proportion: $n = \dfrac{pqz^2}{e^2}$
• Sample size for difference of means when the two σ and n are the same: $n = \dfrac{2\sigma^2 z^2}{e^2}$

Page 52: Distribution of  total and sample mean

Meaning of confidence interval

• Mean (99%): 99% of such intervals include the population mean.

• Proportion (99%): 99% of such intervals include the population proportion.

• Difference of means (99%): 99% of such intervals include the difference of the two population means.

• Confidence interval for difference of means: If 0 is included in the confidence interval, no difference between the two means is suggested. If 0 is not included in the confidence interval, a difference between the two means is suggested.

Page 53: Distribution of  total and sample mean

Confidence Interval Revision

• Mean: A sample of 120 wire cables is tested. The mean breaking strain was found to be 5.4 tonnes with a standard deviation of 1.3 tonnes. Calculate a 95% confidence interval for the mean breaking strain for this type of wire cable.

• Proportion: A sample opinion poll of 200 students is taken and 130 students are found to support the idea of extending opening hours of the library. Calculate a 99% confidence interval for the proportion of all students in the school in favour of extending the library hours.

• Difference between two means: A sample of 150 Longlife batteries showed a mean capability of 140 photos and a standard deviation of 12 photos. A sample of 200 Lastshot batteries showed a mean capability of 120 photos and a standard deviation of 8 photos. Find a 95% confidence interval for the difference in the mean lifetime between the two brands of batteries.

Page 54: Distribution of  total and sample mean

Sample size (use solver)

• The owner of a camera shop knows that 65% of the customers return to his store. How large a sample would the shop owner have to take to be 95% confident that the sample proportion is within 5% of the true value?

• What size of sample should be taken from a population of packets of butter, when the standard deviation of the weights of packets is 4 g, if the mean weight is to be estimated to within 0.5 g with 95% confidence?

Page 55: Distribution of  total and sample mean

We need to know the difference between T=X1+X2 and Y=2X

$T = X_1 + X_2$

T is the sum of two independent random variables, which can take different values.

$E(T) = E(X_1) + E(X_2) = 2\mu$

$VAR(T) = VAR(X_1) + VAR(X_2) = 2\sigma^2$

$SD(T) = \sqrt{2}\,\sigma$

$Y = 2X$

Y can represent the outcome of X multiplied by 2, ie 2 identical values.

$E(Y) = E(2X) = 2E(X) = 2\mu$

$VAR(Y) = VAR(2X) = 2^2\,VAR(X) = 4\sigma^2$

$SD(Y) = 2\sigma$
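The difference between adding two separate values and doubling one value shows up clearly in a simulation. The sketch below is only an illustration: the normal population with μ = 10 and σ = 2, the number of trials and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)                   # fixed seed so the run is repeatable
mu, sigma, trials = 10.0, 2.0, 50_000    # illustrative values only

# T: two independent draws added together; Y: one draw doubled
t_vals = [rng.gauss(mu, sigma) + rng.gauss(mu, sigma) for _ in range(trials)]
y_vals = [2 * rng.gauss(mu, sigma) for _ in range(trials)]

mean_t, var_t = statistics.fmean(t_vals), statistics.pvariance(t_vals)
mean_y, var_y = statistics.fmean(y_vals), statistics.pvariance(y_vals)
print(mean_t, var_t)   # near 2*mu = 20 and 2*sigma^2 = 8
print(mean_y, var_y)   # near 2*mu = 20 and 4*sigma^2 = 16
```

Both have the same mean, but doubling a single value produces twice the variance of adding two independent values.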

Page 56: Distribution of  total and sample mean

Normal Distribution

About 68% of the data is within 1 standard deviation either side of the mean. Data is likely to be in this region.

About 95% of the data is within 2 standard deviations either side of the mean. Data is very likely to be in this region.

About 99.7% of the data is within 3 standard deviations either side of the mean. Data is almost certain to be in this region.

Page 57: Distribution of  total and sample mean

T is the outcome of the same variable multiplied by n.

$T = nX$

$E(T) = n\mu$

$VAR(T) = n^2\sigma^2$

T is the sum of n independent random variables which might take different values.

$T = X_1 + X_2 + X_3 + \dots + X_n$

$E(T) = n\mu$

$VAR(T) = n\sigma^2$

$SD(T) = \sqrt{n}\,\sigma$

Page 58: Distribution of  total and sample mean

The mean petrol usage for a car is 7 litres per day, with a standard deviation of 0.3 litres. The cost of petrol is $1.96 per litre. What are the mean and SD of the cost of petrol per day?

25 people in a lift. They have a mean weight of 65 kg and a SD of 7 kg. Find the mean and SD of the load.

The apples in the baskets have a mean weight of 1.2g each and a SD of 0.3g each. Find the mean and SD of the weight of a basket of 20 apples.

1 kg of apples costs $1.20. A basket of apples produced by ABC factory has a mean weight of 2.5kg and a SD of 3 kg. What is the cost of one basket of apples?
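The petrol question is a case of one variable multiplied by a constant, so both the mean and the SD scale by the price. A short sketch of that reading, with illustrative variable names:

```python
price = 1.96                     # dollars per litre
usage_mean, usage_sd = 7, 0.3    # litres per day

cost_mean = price * usage_mean   # E(aX) = a E(X)
cost_sd = price * usage_sd       # SD(aX) = a SD(X)
print(round(cost_mean, 2), round(cost_sd, 3))   # 13.72 0.588
```

The lift and 20-apple questions are sums of independent values instead, so there the SD scales by √n rather than by n.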