Stat questions Semester 1


SECTION A
Multiple Choice Questions (MCQ)


SCHOOL OF ECONOMICS THE UNIVERSITY OF ADELAIDE

2015 ECON 1008: Business and Economic Statistics I

PRACTICAL QUESTIONS: 2 SECTIONS
SECTION A: Multiple-choice questions (MCQ)
SECTION B: Worked (short) answer questions (WAQ)

All MCQ and short-answer questions set for practicals (tutorials) are taken from this set of PRACTICAL QUESTIONS.

Each Wednesday, students are asked to read an Announcement on the BES MyUni site listing the complete set of questions for the next week's practical (tutorial class). In the practical, students will be assessed on a sample of the questions set for that week, so they should prepare solutions to all of the set MCQ and WAQ beforehand.

Students receive solutions to the MCQ and WAQ by attending tutorials and completing the questions that are tested/submitted and checked by their tutor, or by asking the study coach during study coach sessions.

PLEASE COMPLETE EVERY QUESTION THAT MUST BE HANDED UP FOR YOUR ASSESSMENT.

_______________________________________________

SECTION A: Multiple-choice questions

1) Observations about a continuous quantitative variable
a) can be made in only two categories.
b) must be made in more than two distinct categories.
c) can assume values at all points on a scale of values, with no breaks between possible values.
d) can assume values only at specific points on a scale of values, with inevitable gaps between.
e) can take integer (whole number) values only.

2) A political candidate wants to ascertain her chances of winning a seat at the next election. There are 50,940 registered voters in her electorate. In a survey of 1,000 registered voters in the electorate, 52% stated that they would vote for her. What is the population of interest?
a) The 52% who would vote for her.
b) The 50,940 registered voters.
c) The 1,000 voters surveyed.
d) The 49,940 registered voters who were not surveyed.
e) The 48% of voters surveyed who would not vote for her.


3) Which of the following is most likely a population as opposed to a sample?
a) Respondents to a phone poll.
b) The first five students, in a class of 50 students, to submit a project.
c) Every tenth person who arrives at a bank.
d) Registered voters in a country.
e) 50 people who replied to a survey on the back of a cereal box.

4) Which one of the following represents cross-section data?
a) Listings of the closing prices of different securities traded at the New York Stock Exchange on June 30, 2010.
b) Listings of the profits of a given company during each of the last 15 years.
c) Listings of average annual profits of 10 companies over the last 5 years.
d) Listings of the closing prices of 5 different securities at the end of each of 5 consecutive trading days.
e) Listings of the closing prices of a particular share at the end of each of 5 successive business days.

5) For their mini-project, one of the BES teams decided to estimate the proportion of Bachelor of Economics students at the University of Adelaide who regularly buy their lunch. They took their sample at Hub Central on one particular Monday. What is their population?
a) Students at the University of Adelaide who regularly buy their lunch.
b) University of Adelaide students.
c) Bachelor of Economics students at the University of Adelaide.
d) University of Adelaide students who were at Hub Central that Monday.
e) Bachelor of Economics students at the University of Adelaide who were at Hub Central that Monday.

6) Suppose a sporting authority decides to randomly test athletes for steroid use at a sporting event with 400 athletes. They separate the athletes by gender and take a random sample of 20 females and 20 males for testing. What type of sample is this?
a) Stratified
b) Simple Random
c) Convenience
d) Cluster
e) Convenience.


7) A population is made up of groups that have wide variation within each group but have little variation from group to group. Which of the following is the best type of sampling method to use to sample from this population?
a) Systematic.
b) Stratified.
c) Convenience.
d) Simple Random.
e) Cluster.

8) A local radio show invites listeners to call in and give their opinion on whether live sheep exports should be banned. Which of the following is this survey method most likely to suffer from?
a) Nonresponse bias
b) Response bias
c) This survey will only suffer from sampling error.
d) This survey will not suffer from any bias.
e) Voluntary response bias.

9) What happens when a random sample is made larger?
a) One can eliminate bias inherent in a smaller sample.
b) One can eliminate sampling error.
c) One can reduce, but not eliminate, bias inherent in a smaller sample.
d) One can reduce, but not eliminate, sampling error.
e) One can reduce, but not eliminate, sampling error and bias.

10) Which of the following statements best describes the difference between a sampling frame, the target sample and the actual sample?
a) The sampling frame is the list from which the sample is drawn; the target sample is all the individuals who are asked to participate in the survey; and the actual sample is the respondents.
b) The sampling frame is the list from which the sample is drawn; the target sample is all the individuals who may not participate in the survey; and the actual sample is the respondents.
c) The sampling frame is the list from which the sample is drawn; the target sample may be smaller than the actual sample.
d) The sampling frame is the list from which the sample is drawn; the target sample is the population; and the actual sample is the respondents.
e) The sampling frame is the population; the target sample is all the individuals who are asked to participate in the survey; and the actual sample is the respondents.


For Questions 11) to 14), use the following table, which shows people's ages cross-classified against their attitude to proposed new drink-driving laws:

Age         Agree   Disagree   Don't care
Under 20    10      20         20
20 to 40    20      30         50
Over 40     40      20         40

11) What proportion of people who agree with the new laws happen to be under 20?
a) 10/250
b) 10/250
c) 10/70
d) 10/50
e) None of these choices is correct.

12) (Refer to the previous table) Of the people who are under 20, what proportion disagree with the new laws?
a) 10/250
b) 20/250
c) 20/70
d) 20/50
e) 50/70

13) (Refer to the previous table) What proportion of people are under 20 and disagree with the new laws?
a) 10/250
b) 20/250
c) 20/70
d) 20/50
e) 50/70

14) (Refer to the previous table) Which of the following best describes the answer 30/100? (Hint: the value 30 comes from the cell in the middle of the table and the value 100 is the total of one of the rows or columns.)
a) 30% of people are aged 20 to 40
b) 30% of people who disagree with the new laws are aged 20 to 40
c) 30% of people disagree with the new laws
d) 30% of people who are aged 20 to 40 disagree with the new laws
e) 30% of people are aged 20 to 40 and disagree with the new laws.
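The three kinds of proportion asked about in Questions 11) to 14) differ only in the denominator. A minimal Python sketch, using the counts from the table and the Over 40 / Agree cell purely as an illustration, computes the marginal totals and then divides the same cell by the grand total (joint), its row total (conditional on age) and its column total (conditional on attitude). The helper name `proportion` is illustrative, not part of the course material.

```python
# Counts from the age-by-attitude table for Questions 11) to 14).
COLUMNS = ["Agree", "Disagree", "Don't care"]
TABLE = {
    "Under 20": [10, 20, 20],
    "20 to 40": [20, 30, 50],
    "Over 40": [40, 20, 40],
}

# Marginal totals: one per row (age group), one per column (attitude).
row_totals = {age: sum(counts) for age, counts in TABLE.items()}
col_totals = {
    attitude: sum(counts[j] for counts in TABLE.values())
    for j, attitude in enumerate(COLUMNS)
}
grand_total = sum(row_totals.values())


def proportion(numerator, denominator):
    """Show a proportion as 'num/den = decimal' without reducing the fraction."""
    return f"{numerator}/{denominator} = {numerator / denominator:.3f}"


cell = TABLE["Over 40"][COLUMNS.index("Agree")]  # people over 40 who agree

print("joint:", proportion(cell, grand_total))                # out of everyone
print("given row:", proportion(cell, row_totals["Over 40"]))   # out of the over-40s
print("given column:", proportion(cell, col_totals["Agree"]))  # out of those who agree
```

Changing the row and column labels gives the matching proportions for any other cell; the choice of denominator alone decides whether the answer is joint or conditional.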


15) Combining percentages inappropriately across categories in a contingency table can yield incorrect conclusions. This is known as
a) Bias
b) Frequency
c) Non-sampling error
d) Simpson’s Paradox
e) Sampling error

16) If a test was generally very easy, except for a few students who had very low scores, then the distribution of scores would be:
a) Normal
b) Uniform
c) Symmetric
d) Positively skewed
e) Negatively skewed

17) Which of the following is true when a frequency distribution exhibits positive skewness?
a) The median exceeds the mean.
b) The mean exceeds the median.
c) The median and mode are both greater than the mean.
d) The variance exceeds the standard deviation.
e) The standard deviation exceeds the range.

18) In a group of 12 scores, one of the scores is increased by 36 points. What effect will this have on the mean of the scores?
a) Increase by 12 points.
b) Increase by 0.33 points.
c) Increase by 3 points.
d) Increase by 36 points.
e) Remain unchanged.
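A quick numerical check of the idea behind Question 18: raising one value raises the total by the same amount, and the mean is the total divided by n. The 12 scores below are made up for illustration; any 12 values give the same shift.

```python
# Hypothetical scores: any 12 numbers show the same effect.
scores = [55, 62, 70, 48, 81, 66, 59, 73, 90, 44, 68, 77]

raised = list(scores)
raised[0] += 36  # increase one score by 36 points

old_mean = sum(scores) / len(scores)
new_mean = sum(raised) / len(raised)

# The total rises by 36 but is still divided by 12, so the mean rises by 36 / 12.
print(f"{new_mean - old_mean:.2f}")  # prints 3.00
```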

19) Which of the following may easily be found near either extreme of a data set?
a) The mean.
b) The median.
c) The mode.
d) The mean and the mode.
e) All of the mean, median and mode.


20) The difference between a histogram and a bar chart is that:
a) The histogram reflects qualitative data; the bar chart represents quantitative data.
b) The adjacent rectangles in a histogram have a gap between them; those in a bar chart do not.
c) The histogram reflects actual numbers; the bar chart represents percentages.
d) Adjacent rectangles in a bar chart have a gap between them; those in a histogram do not.
e) There is no practical difference apart from the name.

21) Do men and women run a 5 kilometre race at the same pace? Here are box plots of the time (in minutes) for a recent race. Which of the following best describes the box plots?
a) Women appear to run about 3 minutes faster than men, and the two distributions have different IQR.
b) Men appear to run about 3 minutes faster than women, but the two distributions are very similar in shape and spread.
c) Men appear to run about 10 minutes faster than women, but the two distributions are very similar in shape and spread.
d) Women appear to run about 3 minutes faster than men, but the two distributions are very similar in shape and spread.
e) Men appear to run about 3 minutes faster than women, and the two distributions have different IQR.

22) Anthony’s Pizza offers free delivery of their pizza. The following summary information concerns the time of deliveries: mean time is 20 minutes, median time is 18 minutes, the first quartile is 12 minutes, and the third quartile is 25 minutes. What percent of the deliveries take more than 12 minutes?
a) 50 percent
b) 75 percent
c) 25 percent
d) 95 percent
e) Cannot tell without knowing if the normal model is applicable.


23) Which of the following statements is true regarding the standard deviation?
a) It cannot assume a negative value.
b) If it is zero, then all the data values are the same.
c) It is in the same units as the mean.
d) All of the above are correct.
e) None of the above is correct.

24) Find the variance of the sample 4, 4, 5, 6, 6, 7, 10.
a) 1.93
b) 4.33
c) 2.38
d) 3.71
e) 6.00
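The sample variance divides the sum of squared deviations from the mean by n − 1. A short Python check that computes it by hand and with the standard library's `statistics.variance`, which uses the same n − 1 divisor:

```python
from statistics import mean, variance

sample = [4, 4, 5, 6, 6, 7, 10]

# Sample variance: squared deviations from the mean, divided by n - 1.
ybar = mean(sample)
by_hand = sum((y - ybar) ** 2 for y in sample) / (len(sample) - 1)

print(ybar, round(by_hand, 2), round(variance(sample), 2))
```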

25) A standard score is best described by which of the following statements?
a) The number of standard deviations between a particular observation and the mean of all observations in a data set.
b) The difference between an observed value and the standard deviation, divided by the mean.
c) How many means away from the standard deviation a particular observation is located.
d) The value that would be expected if a randomly chosen observation was selected.
e) The value that would be expected in the long run.

26) Student Alex scored 66% in a macroeconomics exam in which the class average was 52% and the variance 36 %², whilst student Chris scored 52% in a microeconomics exam in which the class average was 73% and the variance 64 %². Find the standardised scores for each and decide which student’s score was relatively more unusual.
a) Chris’s mark was relatively more unusual than Alex’s; the standardised scores were 2.33 for Alex and −2.63 for Chris.
b) Chris’s mark was relatively more unusual than Alex’s; the standardised scores were 2.33 for Alex and 2.63 for Chris.
c) Alex’s mark was relatively more unusual than Chris’s; the standardised scores were 2.33 for Alex and −2.63 for Chris.
d) Alex’s mark was relatively more unusual than Chris’s; the standardised scores were 0.39 for Alex and 0.33 for Chris.
e) Alex’s mark was relatively more unusual than Chris’s; the standardised scores were 0.39 for Alex and −0.33 for Chris.


27) You want to compare your scores for two different subjects. You scored 68% in Macroeconomics, where the class mean was 63% and the variance was 4 %². You scored 70% in Microeconomics, where the class average was 60% and the variance was 25 %². Which of the following correctly describes a comparison of your final scores in these two subjects?
a) You did better in Macroeconomics with a Z-score of 1.5 compared to a Z-score of 0.40 with Microeconomics.
b) You did better in Macroeconomics with a Z-score of 2.5 compared to a Z-score of 2 with Microeconomics.
c) You did better in Microeconomics with a Z-score of 0.40 compared to a Z-score of 1.25 with Macroeconomics.
d) You did better in Microeconomics with a Z-score of 70% than in Macroeconomics with a Z-score of 63%.
e) You did better in Microeconomics with a Z-score of 2 compared to a Z-score of 2.5 with Macroeconomics.
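Questions 26 and 27 both standardise a score with z = (x − mean) / sd, where the standard deviation is the square root of the given variance (so a variance of 36 %² means an sd of 6%). A small sketch follows; the larger the absolute value of z, the more unusual the score, and when comparing performance a higher z is better. Three decimals are printed so that Chris's z of −2.625 is not rounded ambiguously.

```python
import math


def z_score(x, mean, variance):
    """Standardised score: (x - mean) / sd, with sd = sqrt(variance)."""
    return (x - mean) / math.sqrt(variance)


# Question 26: variances 36 and 64, so standard deviations 6 and 8.
alex = z_score(66, 52, 36)
chris = z_score(52, 73, 64)

# Question 27: variances 4 and 25, so standard deviations 2 and 5.
macro = z_score(68, 63, 4)
micro = z_score(70, 60, 25)

print(f"Alex {alex:.3f}, Chris {chris:.3f}")
print(f"Macro {macro:.3f}, Micro {micro:.3f}")
```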

28) A science instructor assigns a group of students to investigate the relationship between the pH of the water of a river and the hardness of the water (measured in grains). Some students wrote this conclusion: "there was a very strong correlation of 1.45 grains between pH of the water and water's hardness." Is it appropriate to calculate the correlation coefficient in this example?
a) No: the correlation coefficient is unit-free.
b) No: correlation cannot be greater than 1.
c) No: the relationship may not be linear.
d) All of the above.
e) None of the above.

29) If the correlation coefficient is r = 1.00, which of the following statements is correct?
a) All the data points must fall exactly on a straight line with a slope that equals 1.00
b) All the data points must fall exactly on a straight line with a negative slope
c) All the data points must fall exactly on a straight line with a positive slope
d) All the data points must fall exactly on a horizontal straight line.
e) All the data points must be identical; there is really only one point.

30) In a scatter diagram, observed data points that lie above the estimated regression line:
a) Involve positive residuals
b) Involve negative residuals
c) Must be wrong because regression minimises errors
d) Must be outliers
e) Involve squared errors because regression minimises squared errors


31) Several scatterplots are below, numbered (1) to (4). Several correlation coefficients are below, labelled A to D. Match the correlation coefficients to the scatterplots.

A = −0.962   B = −0.091   C = 0.719   D = 0.921

[Scatterplots (1) to (4) not reproduced.]

a) 1A, 2C, 3D, 4B
b) 1C, 2D, 3B, 4A
c) 1B, 2D, 3C, 4D
d) 1A, 2C, 3B, 4D
e) 1D, 2B, 3A, 4C


32) Which of the following plots of residuals suggests that a linear model may not be applicable?

[Residual plots I to IV not reproduced.]

a) IV
b) III
c) I
d) II
e) None of these choices is correct.

USE THE FOLLOWING EXCEL OUTPUT TO ANSWER THE NEXT 5 QUESTIONS

The EXCEL output below is from the regression of exam marks (%) on attendance (number of tutorials attended during the semester) for a sample of BES students.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.811
R Square              0.658
Adjusted R Square     0.645
Standard Error        13.983
Observations          27

ANOVA
              df    SS           MS          F         Significance F
Regression    1     9413.176     9413.176    48.142    0.000
Residual      25    4888.231     195.529
Total         26    14301.407

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     19.914         5.620            3.543    0.002     8.339       31.489
attendance    3.884          0.560            6.938    0.000     2.731       5.037

Vivian Piovesan Cavaiuolo 2015

33) Using the Excel output provided, which of the following is the most correct regression equation (model)?
a) Estimated Attendance = 19.914 + 3.884 Exam marks
b) Estimated Exam marks = 19.914 + 3.884 Attendance
c) Attendance = 19.914 + 3.884 Exam marks
d) Exam marks = 19.914 + 3.884 Attendance
e) None of these choices is correct.

34) According to this model, which of the following best describes the estimated effect on the exam mark if students attend one extra BES tutorial during the semester?
a) All students get an extra 19.914 points.
b) All students get an extra 3.884 points.
c) Students get an extra 3.884 points, on average.
d) Students get an extra 19.914 points, on average.
e) Students get an extra 23.754 points.

35) One student who attended 6 tutorials during the semester scored 88 points in the final exam. What final mark did the regression model predict for this student?
a) 23.3 points.
b) 43.2 points.
c) 88 points.
d) 29.8 points.
e) 23.8 points.

36) What percentage of the variation in exam marks is accounted for by this linear relationship with tutorial attendance?
a) 81.1%
b) 65.8%
c) 64.5%
d) 19.9%
e) 13.9%

37) Using this regression output of exam marks on tutorial attendance, find the residual for a student who attended 4 tutorials during the semester, given that they actually scored 50 points in the exam.
a) 50 – 19.914 = 30.09 points.
b) 50 – 3.884(4) = 34.46 points.
c) 3.884(4) = 15.54 points.
d) 19.914 + 3.884(4) = 35.45 points.
e) 50 – [19.914 + 3.884(4)] = 14.55 points.
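Questions 35 and 37 both use the fitted line from the Excel output: the prediction plugs attendance into 19.914 + 3.884 × attendance, and the residual is the observed mark minus that prediction. A sketch with the coefficients copied from the output (the constant and function names are illustrative):

```python
# Coefficients copied from the Excel output (rounded to 3 decimals there).
INTERCEPT = 19.914
SLOPE = 3.884  # estimated change in exam mark per extra tutorial attended


def predicted_mark(tutorials):
    """Fitted value from the estimated line: 19.914 + 3.884 * attendance."""
    return INTERCEPT + SLOPE * tutorials


def residual(tutorials, actual_mark):
    """Residual = observed mark minus predicted mark."""
    return actual_mark - predicted_mark(tutorials)


print(round(predicted_mark(6), 1))  # Question 35
print(round(residual(4, 50), 2))    # Question 37
```

A positive residual means the student scored above the line; the residual is not simply the observed mark minus the slope or minus the intercept.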


38) A random spinner can land on red, green, blue, or yellow. If on the first three spins it lands once each on red, green, and yellow, is it more likely to land on blue on the fourth spin?
a) No, because the spins are disjoint events.
b) No, because knowing one outcome will not affect the next.
c) Yes, because the Law of Averages dictates that all outcomes should occur.
d) Yes, because every colour is equally likely to occur.
e) Yes, because the spinner shows randomness.

39) Suppose you roll 2 dice. Which of the following are disjoint (mutually exclusive) events?
a) Getting a sum of 6; getting doubles.
b) Getting a sum of 2; getting doubles.
c) Getting a 1 on the first die; getting a sum of 6.
d) Getting a 5 on the first die; getting doubles.
e) Getting a sum of 7; getting doubles.
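Two events are disjoint when no single outcome belongs to both. With two dice there are only 36 equally likely outcomes, so a brute-force check is easy. The sketch below (helper names are illustrative) lists the outcomes two events share; an empty list means they are disjoint.

```python
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))


def shared_outcomes(event_a, event_b):
    """Outcomes in both events; the events are disjoint if this list is empty."""
    return [o for o in OUTCOMES if event_a(o) and event_b(o)]


def doubles(o):
    return o[0] == o[1]


def sum_is(total):
    return lambda o: o[0] + o[1] == total


def first_die_is(value):
    return lambda o: o[0] == value


print(shared_outcomes(sum_is(6), doubles))        # option a)
print(shared_outcomes(first_die_is(5), doubles))  # option d)
```

Applying `shared_outcomes` to the remaining option pairs in the same way identifies which pair has no outcomes in common.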

40) If A and B are 2 events such that P(A) = 0.6, P(B) = 0.58 and P(A or B) = 0.70, what is P(A and B)?
a) 0.48
b) 1.18
c) 0.70
d) 0.30
e) 0.35

41) If A and B are two independent events such that P(A) = 0.3 and P(B) = 0.4, what is P(A given B)?
a) 0.40
b) 0.30
c) 0.12
d) 0.01
e) 0 because events are independent

42) If A and B are two events such that P(A) = 0.35, P(B) = 0.45 and P(A and B) = 0.20, what is P(B given A)?
a) 0.80
b) 0.60
c) 0.20
d) 0.57
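Questions 40 to 42 each rearrange one probability rule: the general addition rule, the product rule for independent events, and the definition of conditional probability. A compact Python check of the arithmetic:

```python
# Question 40: general addition rule, P(A or B) = P(A) + P(B) - P(A and B), rearranged.
p_a, p_b, p_a_or_b = 0.6, 0.58, 0.70
p_a_and_b = p_a + p_b - p_a_or_b

# Question 41: for independent events P(A and B) = P(A) * P(B),
# so P(A | B) = P(A and B) / P(B) collapses to P(A).
p_a2, p_b2 = 0.3, 0.4
p_a_given_b = (p_a2 * p_b2) / p_b2

# Question 42: conditional probability, P(B | A) = P(A and B) / P(A).
p_b_given_a = 0.20 / 0.35

print(round(p_a_and_b, 2), round(p_a_given_b, 2), round(p_b_given_a, 2))
```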



43) Which of the following statements is correct?
a) Independent events cannot be disjoint events.
b) Marginal events are conditional events.
c) Joint events are disjoint.
d) Disjoint events cannot be independent events.
e) Marginal events are joint events.

44) Sue owns a medium-sized business. The probability model below describes the number of employees that may call in sick on any given day. What is the expected value of the number of employees calling in sick each day?
a) 2.50 employees
b) 2.00 employees
c) 1.80 employees
d) 1.00 employees
e) 1.85 employees

45) Find the expected value for the random variable X, which has the probability model:

X     P(X)
4     0.3
8     0.5
11    ?

a) 0.80 units
b) 0.20 units
c) 7.67 units
d) 5.20 units
e) 7.40 units

46) A new lottery ticket on the market has players pay $5 for the chance to win up to $1,000 instantly in cash prizes. Let X be the discrete random variable of cash prizes available. The table below shows the discrete probability distribution of the possible cash prizes available to players, with all of the corresponding probabilities. Find the standard deviation of the cash prize per game for this $5 instant lottery ticket, given that the expected value is $3.24.

X        P(X)
0        0.65
5        0.25
10       0.099
1,000    0.001

a) 1,005.65 dollars²
b) 3.24 dollars
c) 31.71 dollars
d) 1,016.15 dollars²
e) 10.50 dollars²
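Questions 45 and 46 both rely on E[X] = Σ x·P(x) and SD(X) = √(Σ (x − μ)²·P(x)). A small sketch follows; the missing probability in Question 45 is filled in as whatever makes the probabilities sum to 1. The variance in Question 46 is in dollars², so the standard deviation is its square root, in dollars. The function name `mean_and_sd` is illustrative.

```python
import math


def mean_and_sd(model):
    """Mean and standard deviation of a discrete random variable.

    model maps each value x to P(X = x); the probabilities must sum to 1.
    """
    if not math.isclose(sum(model.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, math.sqrt(var)


# Question 45: the missing probability makes the total equal 1.
q45 = {4: 0.3, 8: 0.5}
q45[11] = 1 - sum(q45.values())

# Question 46: lottery cash prizes in dollars.
q46 = {0: 0.65, 5: 0.25, 10: 0.099, 1000: 0.001}

mu45, _ = mean_and_sd(q45)
mu46, sd46 = mean_and_sd(q46)
print(round(q45[11], 2), round(mu45, 2))
print(round(mu46, 2), round(sd46, 2))
```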


47) A discrete random variable X takes the values −2, −1, 1 and 2, but we are not given the corresponding probability distribution. Which of the following statements is true?
a) The expected value could be 0.75
b) The expected value must be one of −2, −1, 1 or 2
c) The expected value could be 3
d) The expected value can only be 0
e) The expected value can only be 1.5

48) If X and Y are independent random variables with E[X] = 10, E[Y] = 12, V[X] = 9 and V[Y] = 16, then find the expected value of X – Y.
a) 2
b) −2
c) 22
d) −7


49) If X and Y are independent random variables with E[X] = 10, E[Y] = 12, V[X] = 9 and V[Y] = 16, then find the standard deviation of 2X – Y + 4.
a) √68
b) √52
c) √36
d) √20
e) √12

50) A traveller visits Europe and stays 30 days in 30 different hotels, paying each day with a credit card. The hotels charge an average of 50 euros with a standard deviation of 10 euros. When the charges appear on the credit card, the bank has converted them to AUD at 1 euro = $1.40 and has added a $5 fee for each transaction. What are the mean and standard deviation of the daily hotel charges over the 30 days, in AUD, including the transaction fee?
a) Mean $50, standard deviation $19
b) Mean $70, standard deviation $14
c) Mean $70, standard deviation $19
d) Mean $75, standard deviation $14
e) Mean $75, standard deviation $19
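Questions 48 to 50 rest on the rules for linear combinations: means pass straight through, variances scale by the square of each coefficient and add for independent variables, and added constants move the mean but not the spread. A sketch of those rules in Python (function and constant names are illustrative):

```python
import math

# Questions 48 and 49: X and Y are independent.
E_X, E_Y = 10, 12
V_X, V_Y = 9, 16


def mean_of(a, b, c):
    """E[aX + bY + c] = a E[X] + b E[Y] + c (independence not needed)."""
    return a * E_X + b * E_Y + c


def var_of(a, b):
    """V[aX + bY + c] = a^2 V[X] + b^2 V[Y] for independent X, Y; c adds no spread."""
    return a**2 * V_X + b**2 * V_Y


print(mean_of(1, -1, 0))         # E[X - Y]
print(math.sqrt(var_of(2, -1)))  # SD of 2X - Y + 4, i.e. sqrt(52)

# Question 50: each daily charge in AUD is 1.40 * (charge in euros) + 5.
RATE, FEE = 1.40, 5
mean_aud = RATE * 50 + FEE  # the fee shifts the mean...
sd_aud = RATE * 10          # ...but not the standard deviation
print(f"mean ${mean_aud:.2f}, sd ${sd_aud:.2f}")
```

Note that subtracting Y still adds its variance, because the coefficient −1 is squared.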


51) In a standard Normal model, what values of z cut off the middle 96%?
a) −3.00 to 3.00
b) 0 to 2.05
c) −1.75 to 1.75
d) −2.33 to 2.33
e) −2.05 to 2.05

52) Given that a normally distributed random variable has a mean of 85 and a standard deviation of 8, what is the probability that one randomly selected value will be less than 63?
a) 0.9970
b) 1
c) 0
d) 0.0030
e) −2.75

53) The mean of a normal probability distribution is 50 with a variance of 36. What is the probability that one randomly selected value will be greater than 60?
a) 0.9525
b) 0.3897
c) 0.0475
d) 0.6103
e) 1.67

54) The mean of a normal probability distribution is 87 and the standard deviation is 4. What is the probability of a value between 82 and 90?
a) 0.7734
b) 0.1056
c) 0.8944
d) 0.2266
e) 0.6678

55) Find σ in a Normal model with μ = 0.38 if 20.05% of values are above 0.50.
a) 0.20
b) 0.1
c) 0.14
d) 0.84
e) 1.43
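Python's standard library includes `statistics.NormalDist`, which can stand in for the z-table in Questions 51 to 55: `cdf` gives P(X ≤ x) and `inv_cdf` turns a probability back into a value. Because it does not round z to two decimals first, its answers can differ from table-based answers in the last decimal place (for example, Question 53 comes out near 0.0478 rather than the table's 0.0475).

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal model: mean 0, sd 1

# Question 51: the middle 96% leaves 2% in each tail.
z_96 = Z.inv_cdf(0.98)

# Question 52: P(X < 63) when X ~ N(85, 8^2).
p52 = NormalDist(85, 8).cdf(63)

# Question 53: the variance is 36, so sigma = 6; P(X > 60) = 1 - P(X <= 60).
p53 = 1 - NormalDist(50, 6).cdf(60)

# Question 54: P(82 < X < 90) for X ~ N(87, 4^2).
model54 = NormalDist(87, 4)
p54 = model54.cdf(90) - model54.cdf(82)

# Question 55: 20.05% above 0.50 puts 0.50 at the 79.95th percentile,
# so sigma = (0.50 - 0.38) / z.
sigma55 = (0.50 - 0.38) / Z.inv_cdf(1 - 0.2005)

print(f"{z_96:.2f} {p52:.4f} {p53:.4f} {p54:.4f} {sigma55:.2f}")
```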


56) The proportion of people who agree with building a hotel at Glenelg is 0.40. A random sample of size 100 is drawn. What is the standard deviation of the sampling distribution of the sample proportion?
a) 0.002
b) 0.049
c) 0.240
d) 0.005
e) 0.490

57) It is known that 60% of voters in a large electorate are in favour of a particular policy. A random sample of 48 voters is taken from the electorate. What is the probability that fewer than 24 of those sampled are in favour of the policy?
a) 0.079
b) 0
c) 0.421
d) 0.579
e) 0.921
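For Questions 56 and 57, the sampling distribution of the sample proportion has mean p and standard deviation √(p(1 − p)/n). The sketch below uses the Normal approximation for Question 57 (np = 28.8 and n(1 − p) = 19.2 are both at least 10) and, as an assumption matching the answer options, applies no continuity correction.

```python
import math
from statistics import NormalDist


def sd_phat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Question 56.
sd56 = sd_phat(0.40, 100)

# Question 57: "fewer than 24 of 48" means a sample proportion below 24/48 = 0.5.
sd57 = sd_phat(0.60, 48)
p57 = NormalDist(0.60, sd57).cdf(24 / 48)

print(f"{sd56:.3f} {sd57:.4f} {p57:.3f}")
```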

58) Which of the following best describes what we mean by the sampling distribution of the sample mean?
a) The sampling distribution of the sample mean will be approximately normal even if the original population is not normal.
b) The sampling distribution of the sample mean is the population from which the sample is drawn.
c) The sampling distribution of the sample mean shows us the distribution of possible values that the sample mean could take, generated by calculating the sample mean for repeated samples, all of the same size, from the original population.
d) The sampling distribution of the sample mean will be approximately normal if we take many samples.
e) None of these choices is correct.

59) A random sample of size 3 is drawn from a population with mean 7 and variance 9. What is the standard deviation of the sampling distribution of the sample mean?
a) 9
b) 1.73
c) 3.0
d) 5.2
e) 1.0


60) A random sample of n = 10 is drawn from a normal population with mean 75 and variance 90. Find the probability that the mean of this sample exceeds 81.
a) 0.9772
b) 0.9721
c) 0.7643
d) 0.0228
e) 2
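The standard deviation of the sample mean is σ/√n, so with a variance given, take its square root before dividing by √n. A sketch covering Questions 59 and 60, using `statistics.NormalDist` for the tail probability (the helper name `standard_error` is illustrative):

```python
import math
from statistics import NormalDist


def standard_error(variance, n):
    """SD of the sampling distribution of the sample mean: sigma / sqrt(n)."""
    return math.sqrt(variance) / math.sqrt(n)


se59 = standard_error(9, 3)    # Question 59
se60 = standard_error(90, 10)  # Question 60: sqrt(90 / 10) = 3
p60 = 1 - NormalDist(75, se60).cdf(81)

print(f"{se59:.2f} {se60:.2f} {p60:.4f}")
```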

61) In one of the mini-projects, a team calculated a 95% confidence interval in order to estimate the proportion of people in Adelaide who own Apple branded phones. Assume all required conditions are met. What is the confidence interval if their sample shows that 28 out of 55 people in Adelaide own Apple phones?
a) 0.5091 ± 1.645(0.0045) = (0.5017, 0.5165)
b) 0.5091 ± 1.96(0.0674) = (0.377, 0.6412)
c) 0.5091 ± 2.33(0.0674) = (0.3521, 0.6661)
d) 0.5091 ± 1.645(0.0674) = (0.3982, 0.62)
e) 0.5091 ± 1.96(0.0045) = (0.5003, 0.5179)

62) In one of the mini-projects, a BES team wanted to estimate the proportion of Business students who think that knowing a foreign language is useful in today’s workforce. What sample size should they have taken if they wanted a 95% CI to have a margin of error of 4%?
a) 25
b) 2401
c) 151
d) 423
e) 601
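A sketch of the two calculations behind Questions 61 and 62: the one-proportion z-interval p̂ ± z*·√(p̂(1 − p̂)/n), and the sample-size formula n = z*²·p(1 − p)/ME² rounded up to the next whole person. Since Question 62 gives no prior estimate of p, the sketch assumes the usual conservative choice p = 0.5.

```python
import math
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)  # about 1.96

# Question 61: one-proportion z-interval.
successes, n = 28, 55
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - z95 * se, p_hat + z95 * se)

# Question 62: n = z^2 * p(1 - p) / ME^2, assuming p = 0.5, always rounded up.
z = 1.96
margin = 0.04
n_needed = math.ceil(z**2 * 0.5 * 0.5 / margin**2)

print(f"{p_hat:.4f} {se:.4f} ({interval[0]:.3f}, {interval[1]:.3f}) {n_needed}")
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than requested.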

63) Data collected by child development scientists produced the following 90% confidence interval for the average age (in months) at which children say their first word: 10.4 < μ(age) < 13.8. Interpret the confidence interval:
a) If we took many random samples of children, about 90% of them would produce this confidence interval.
b) We can say with 90% confidence that the mean age at which children say their first word is between 10.4 and 13.8 months.
c) We are 90% confident that a child will say his first word when he is between 10.4 and 13.8 months old.
d) 90% of the children in this sample said their first word when they were between 10.4 and 13.8 months old.
e) We are 90% confident that the average age at which children in this sample said their first word was between 10.4 and 13.8 months.


64) When constructing confidence intervals, for a given level of confidence, if the sample size is decreased,
a) the interval will include the parameter less often.
b) the width of the interval remains the same.
c) the width of the interval increases.
d) the width of the interval decreases.


65) Suppose we use z-tables instead of t-tables in estimating a CI for the mean, when the t-table was appropriate. Which of the following is true? Assume everything else is unchanged.
a) Since the t-distribution involves more variability than z, the CI using the z-tables will be wider than it should be.
b) Since the t-distribution involves less variability than z, the CI using the z-tables will be wider than it should be.
c) Since the t-distribution involves more variability than z, the CI using the z-tables will be narrower than it should be.
d) Since the t-distribution involves less variability than z, the CI using the z-tables will be narrower than it should be.
e) The answer depends on the level of confidence.

66) Calculate the width of a 90% CI for μ if a sample of 16 gave a mean of 25 and variance 36. Assume the conditions are met.
a) 1.753 × (6/4) = 2.6295
b) 2 × [1.753 × (36/4)] = 31.554
c) 1.645 × (6/4) = 2.4675
d) 2 × [1.645 × (6/4)] = 4.935
e) 2 × [1.753 × (6/4)] = 5.259

67) Suppose we wanted to estimate the average amount that students spend on lunch each week of the semester. A random sample of 25 students gave an average of $37.50 with standard deviation $3.20. What are the correct calculations for an 80% CI? (Assume all conditions are met.)
a) 37.50 ± 1.282(3.20/√25) = (36.68, 38.32) in dollars
b) 37.50 ± 1.318(3.20/√25) = (36.66, 38.34) in dollars
c) 37.50 ± 1.282(3.20/25) = (37.34, 37.66) in dollars
d) 37.50 ± 1.318(3.20/25) = (37.33, 37.67) in dollars
e) None of these choices is correct.
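Questions 66 and 67 use t-intervals, ȳ ± t*·s/√n with n − 1 degrees of freedom. Python's standard library has no t-distribution, so the sketch below types in the table values that appear in the options (1.753 for 15 df at 90% and 1.318 for 24 df at 80%); `t_interval` is an illustrative helper name.

```python
import math


def t_interval(ybar, s, n, t_star):
    """ybar +/- t* * s / sqrt(n); t_star must come from a t-table with n - 1 df."""
    margin = t_star * s / math.sqrt(n)
    return ybar - margin, ybar + margin


# Question 66: n = 16, so df = 15; a variance of 36 means s = 6.
low66, high66 = t_interval(25, math.sqrt(36), 16, 1.753)
width66 = high66 - low66  # width = 2 * margin

# Question 67: n = 25, so df = 24; t* = 1.318 for an 80% interval.
low67, high67 = t_interval(37.50, 3.20, 25, 1.318)

print(f"width {width66:.3f}; ({low67:.2f}, {high67:.2f})")
```

The divisor is √n, not n, which is what separates the correct options from the distractors.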

68) Which set of circumstances is most likely to result in a narrow confidence interval?
a) Large n and a 95% confidence interval
b) Large n and a 99% confidence interval
c) Small n and a 95% confidence interval
d) Small n and a 99% confidence interval
e) Any n and a 99.5% confidence interval


69) Which of the following is true about the null hypothesis, symbolised by H₀?
a) It represents a proposition about an unknown population parameter that is tentatively assumed to be true.
b) It represents a proposition about an unknown statistic that is tentatively assumed to be true.
c) It never represents a proposition about an unknown population parameter that is tentatively assumed to be true.
d) It is the hypothesis that is declared null and void at the completion of a hypothesis test.
e) It is the hypothesis that is of no interest.

70) A company is testing the proportion of defective parts in a manufacturing process. If more than 5% of parts are defective, the machinery must be stopped and serviced before production can resume. Which of the following is the correct set of hypotheses that would be used to test whether the machinery needs to be serviced? Hint: p = population proportion of defective parts.
a) H₀: p = 0.05 vs HA: p ≠ 0.05
b) H₀: p = 0.05 vs HA: p < 0.05
c) H₀: p > 0.05 vs HA: p = 0.05
d) H₀: p < 0.05 vs HA: p > 0.05
e) H₀: p = 0.05 vs HA: p > 0.05

71) A company is testing the proportion of defective parts in a manufacturing process. If more than 5% of parts are defective, the machinery must be stopped and serviced before production can resume. Suppose they take a random sample of 200 items from a large batch of parts (assume there were thousands of parts in the batch), and 20 of these were defective. Which of the following is the correct description of how the conditions are checked?
a) Sample is random; need at least 2,000 parts in the population of parts being manufactured (told there are thousands of parts in a batch); number of successes = np = 200(0.10) = 20 > 10; number of failures = nq = 200(0.90) = 180 > 10.
b) Random sample; less than 10% of population; number of successes > 10; number of failures > 10.
c) This is a large sample so the conditions are satisfied.
d) Sample is random; need at least 2,000 parts in the population of parts being manufactured (told there are thousands of parts in a batch); number of successes = np = 200(0.05) = 10 ≥ 10; number of failures = nq = 200(0.95) = 190 > 10.
e) None of these choices is correct.


Consider the following to answer the next 4 questions:

A report on the U.S. economy indicates that 28% of Americans have experienced difficulty in making mortgage payments. A news organization randomly sampled 400 Americans from 10 cities named the “fastest dying cities in the U.S.” (Forbes Magazine, August 2008) and found that 136 reported such difficulty. Does this indicate that the problem is more severe in these cities? Hint: p = population proportion of Americans who have experienced difficulty making mortgage repayments.

72) What are the correct null and alternative hypotheses?
a) H₀: p ≠ 0.28 and HA: p = 0.28
b) H₀: p = 0.28 and HA: p < 0.28
c) H₀: p = 0.28 and HA: p ≠ 0.28
d) H₀: p > 0.28 and HA: p = 0.28
e) H₀: p = 0.28 and HA: p > 0.28

73) What is the correct value of the test statistic?
a) z = 119.05
b) z = 2.53
c) z = 2.67
d) z = −2.67
e) z = −2.53

74) What is the correct P-value associated with the above test statistic?
a) 0.9962
b) 0.9943
c) 0
d) 0.0057
e) 0.0038
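Questions 73 and 74 follow the one-proportion z-test. The standard deviation is computed from the hypothesised p0 = 0.28 (not from p̂), because the test assumes H₀ is true; assuming the upper-tail alternative that matches the question of whether the problem is more severe, the P-value is the area above z. A sketch using `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

p0 = 0.28            # hypothesised proportion under H0
n, x = 400, 136
p_hat = x / n        # 0.34

# Standard deviation of p-hat under H0 uses p0, not p_hat.
sd0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd0

# Upper-tail P-value for HA: p > 0.28.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```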

75) At α = 0.05, what conclusion can we draw from the above test?
a) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is significantly higher than 28%.
b) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is significantly lower than 28%.
c) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is not significantly different from 28%.
d) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is approximately equal to 28%.
e) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is exactly equal to 28%.


76) A study is interested in whether or not adults sleep the recommended 8 hours per night, on average. To investigate this, a random sample of 30 adults is taken, and the sample mean is found to be 7.5 hours with a standard deviation of 0.90 hours. Assuming that the conditions have been satisfied, which of the following is the correct critical value if testing at the 5% level of significance?
a) −1.96
b) ±2.045
c) −2.045
d) ±1.96
e) ±1.699

77) A canning company is concerned about the real average amount of sweet corn it is putting into its cans. The machine is set to 450 grams. A sample of 100 cans is taken randomly; their average weight is 448 grams and the standard deviation is 1.9 grams. The canner is testing the null hypothesis that the average weight of all cans is 450 grams. Assume the conditions are satisfied. Which of the following is the correct value of the calculated test statistic?
a) −10.526
b) −1.96
c) −105.263
d) −1.053
e) Cannot be determined because we do not know if it is a 1- or 2-tail test.

78) Suppose a random sample of size 49 is selected from a population with mean μ, the value of which is unknown. The sample statistics are ȳ = 6.4 and s = 14. The hypothesis test is Ho: μ = 10 against Ha: μ < 10 using α = 0.05.

Which of the following is the correct decision?

a) The calculated value of the test statistic is −1.8 and Ho is retained.
b) The calculated value of the test statistic is −1.8 and Ho is rejected.
c) A type I error must have been committed.
d) The calculated value of the test statistic is −0.257 and Ho is retained.
e) The calculated value of the test statistic is −0.257 and Ho is rejected.
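The test statistics in Questions 77 and 78 use the one-sample t form t = (ȳ − μ0)/(s/√n), with n − 1 degrees of freedom. The short Python sketch below is only a way to check hand working; the function name one_sample_t is our own, not from the course materials. It computes the statistic only: the retain/reject decision still needs a critical value from the t table (or a p-value).

```python
import math

def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic: (ybar - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Question 77: n = 100 cans, ybar = 448 g, s = 1.9 g, H0: mu = 450 g
t_q77 = one_sample_t(448, 450, 1.9, 100)

# Question 78: n = 49, ybar = 6.4, s = 14, H0: mu = 10
t_q78 = one_sample_t(6.4, 10, 14, 49)

print(f"Q77: t = {t_q77:.3f} with df = {100 - 1}")
print(f"Q78: t = {t_q78:.3f} with df = {49 - 1}")
```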

79) Convicting an innocent defendant in a criminal trial is equivalent to making what sort of decision in classical hypothesis testing?

a) A type I error.
b) A type II error.
c) A correct decision.
d) Sampling error.
e) None of these choices is correct.


80) When using the chi-square distribution for tests of independence, we calculate the expected frequencies by multiplying the row total by the column total and then dividing by the grand total. Why is this the correct way of determining expected frequencies?

a) Because the row and column totals must equal the grand total.
b) Because the expected frequencies must be in the same proportion as the row totals.
c) Because this follows from the rules of probability if the null hypothesis (independence) is true.
d) Because the expected frequencies must be in the same proportion as the column totals.
e) Because the joint probability always equals the product of the 2 marginals.

81) In a chi-square test of independence at the 5% level, with a table consisting of 5 rows and 3 columns, the correct critical value would be:

a) 23.685
b) 18.307
c) 17.535
d) 15.507
e) 3.841
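Questions 80 and 81 rest on two formulas. Under independence, P(row i and column j) = P(row i) × P(column j), estimated by (R_i/N)(C_j/N); multiplying by the grand total N gives the expected count R_i × C_j / N. The test then has (rows − 1)(columns − 1) degrees of freedom. The Python sketch below illustrates both on a small made-up 2 × 3 table (the counts and the function names are hypothetical, chosen only so the arithmetic is easy to follow); the critical value itself still comes from the chi-square table.

```python
def expected_counts(table):
    """Expected count for each cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a test of independence."""
    return (n_rows - 1) * (n_cols - 1)

# Hypothetical 2 x 3 table of observed counts (illustration only)
observed = [[20, 30, 50],
            [30, 20, 50]]
expected = expected_counts(observed)
chi_sq = chi_square_statistic(observed, expected)

print("expected:", expected)
print("chi-square =", chi_sq, "with df =", chi_square_df(2, 3))
print("df for a 5 x 3 table:", chi_square_df(5, 3))
```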

The next 4 questions relate to the following EXCEL output. The explanatory variable is INCOME (personal disposable income in billions of $).

The response variable is IMPORTS (expenditure on imports in billions of $).

82) The estimated slope was 0.246. Which of the following is correct?

a) When testing the significance of the slope, we should use an upper tail test because the estimated slope is positive.
b) When testing the significance of the slope, we should use an upper tail test because economic theory suggests that imports increase as income increases.
c) When testing the significance of the slope, we should use a lower tail test because the estimated slope is positive.
d) When testing the significance of the slope, we should use a 2-tail test unless someone asks us to do otherwise.


SUMMARY OUTPUT

Regression Statistics
Multiple R           0.967
R Square             0.936
Adjusted R Square    0.932
Standard Error       22.305
Observations         20

ANOVA
             df    SS            MS            F          Significance F
Regression   1     130904.793    130904.793    263.121    3.46E-12
Residual     18    8955.157      497.509
Total        19    139859.95

             Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept    -260.771       32.067           -8.132    0.000     -328.14     -193.401
Income       0.246          0.015            16.221    0.000     0.214       0.277


83) This question relates to the EXCEL output of Income/Imports.

When testing whether the slope is significantly above 0, the appropriate conclusion would be to:

a) Read off the P-value from the output and reject Ho because the P-value is low.
b) Read off the P-value from the output and retain Ho because the P-value is low.
c) The correct p-value to use here is (Excel P-value)/2 from the output, and retain Ho because this is low.
d) The correct p-value to use here is (Excel P-value)/2 from the output, and reject Ho because this is low.
e) None of these choices is correct.

84) This question relates to the EXCEL output of Income/Imports. Suppose we wish to test whether the slope is significantly different from 0.25.

Which of the following is the correct value of the calculated t value for the t test?

a) 0
b) −0.004
c) −0.267
d) 16.221
e) −8.132

85) This question relates to the EXCEL output of Income/Imports. Suppose we wish to test whether the slope is significantly different from 0.25.

Which of the following is the correct value of the critical t value for the t test at the 5% level?

a) +/− 1.960
b) +/− 2.101
c) +/− 1.734
d) +/− 2.093
e) +/− 1.729
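For Questions 84 and 85, testing a slope against a hypothesised value β1* uses t = (b1 − β1*)/SE(b1), with n − 2 degrees of freedom in a simple regression. The Python sketch below (function name slope_t is our own) plugs in the rounded values shown in the Income/Imports output. Because those values are rounded to three decimals, recomputing the t statistic for H0: β1 = 0 gives 0.246/0.015 = 16.4 rather than Excel's 16.221; the same rounding caveat applies to the 0.25 test. The critical value for the computed degrees of freedom still has to be read from the t table.

```python
def slope_t(b1: float, se_b1: float, beta1_null: float = 0.0) -> float:
    """t statistic for H0: beta1 = beta1_null, i.e. (b1 - beta1_null) / SE(b1)."""
    return (b1 - beta1_null) / se_b1

# Rounded values read from the Income/Imports output
b1, se_b1, n = 0.246, 0.015, 20

df = n - 2                            # one slope plus an intercept estimated
t_vs_025 = slope_t(b1, se_b1, 0.25)   # H0: beta1 = 0.25 (two-tailed test)
t_vs_0 = slope_t(b1, se_b1)           # H0: beta1 = 0, about 16.4 from rounded inputs

print(f"df = {df}, t (H0: 0.25) = {t_vs_025:.3f}, t (H0: 0) = {t_vs_0:.1f}")
```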

86) Look at point A in this scatter plot and decide whether the point has high or low leverage, whether or not it is influential and whether its residual is likely to be high or low.

a) High leverage, not influential, large residual.
b) High leverage, not influential, low residual.
c) High leverage, influential, low residual.
d) Low leverage, not influential, low residual.
e) Low leverage, influential, high residual.

[Scatter plot for Questions 86 and 87, with points A and B marked; image not reproduced]


87) Look at point B in this scatter plot and decide whether the point has high or low leverage, whether or not it is influential and whether its residual is likely to be high or low.

a) High leverage, not very influential, low residual.
b) High leverage, influential, hard to say what the residual would be.
c) High leverage, not very influential, large residual.
d) High leverage, influential, low residual.
e) Low leverage, not very influential, possibly low residual.

88) Monthly closing share prices for a utility company were obtained from January 2007 through August 2008. A regression model was estimated to describe the trend in closing share prices over time. What does the plot of residuals below suggest?

[Figure: Residuals Versus Fits (response is Closing Price). x-axis: Fitted Value, about 40 to 52; y-axis: Residual, about −7.5 to 5.0. Plot not reproduced.]

a) An outlier is present in the data set.
b) The linearity condition is not satisfied.
c) A high leverage point is present in the data set.
d) The data are not normal.
e) The independence condition is not satisfied.



Use the output below to answer the NEXT SIX QUESTIONS.

The following EXCEL output is for a regression of Y on X1, X2 and X3. Here β0 refers to the intercept, β1 is the coefficient (or slope) on X1, β2 is the coefficient (or slope) on X2 and β3 is the coefficient (or slope) on X3.

89) This question relates to the Excel output of the regression of Y on X1, X2 and X3.

As X2 increases by 1 unit, after allowing for the effect of X1 and X3, what do we estimate happens to Y, on average?

a) Increases by 15433.76

b) Increases by 28.92

c) Increases by 172.22

d) Decreases by 5.39

e) Increases by 15629.51

90) This question relates to the Excel output of the regression of Y on X1, X2 and X3.

What is the correct value of the test statistic to test the significance of X2?

a) 1.45

b) 2.58

c) −2.92

d) 2.33

e) 0.16


91) What is the correct null hypothesis to test the overall usefulness of this model?

a) H0: β1 = β2 = β3 = 1

b) H0: β1 = β2 = β3

c) H0: β0 = β1 = β2 = β3 = 1

d) H0: β1 = β2 = β3 = 0

e) H0: β0 = β1 = β2 = β3 = 0

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.60
R Square             0.35
Adjusted R Square    0.28
Observations         30

ANOVA
             df    SS            MS            F       Significance F
Regression   3     3125697695    1041899232    4.76    0.01
Residual     26    5691753717    218913604
Total        29    8817451411

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    15433.76       6610.97          2.33     0.03      1844.72     29022.80
X1           28.92          11.20            2.58     0.02      5.90        51.94
X2           172.22         119.00           1.45     0.16      -72.38      416.82
X3           -5.39          1.85             -2.92    0.01      -9.19       -1.59



92) What is the correct p-value to test the overall usefulness of this model?

a) 0.005

b) 0.01

c) 0.02

d) 0.03

e) 0.16


93) How many degrees of freedom do we use for a t-test of the significance of the population intercept, or to test individually the coefficients of any of the X variables?

a) 29
b) 2
c) 3
d) 30
e) 26
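Every number in the multiple-regression output above can be traced back to a few formulas. With n observations and k explanatory variables, the residual degrees of freedom are n − k − 1, each mean square is SS/df, F = MS(Regression)/MS(Residual), R Square = SS(Regression)/SS(Total), and each individual t statistic is coefficient/standard error. The Python sketch below recomputes these from the printed values as a check on hand working; the variable names are our own.

```python
# Values copied from the ANOVA table of the regression of Y on X1, X2 and X3
ss_regression = 3125697695
ss_residual = 5691753717
ss_total = 8817451411
n, k = 30, 3                      # observations, explanatory variables

df_regression = k
df_residual = n - k - 1           # also the df for individual coefficient t-tests
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total

# Individual t statistic for X2: coefficient / standard error
t_x2 = 172.22 / 119.00

print(f"df = ({df_regression}, {df_residual}), F = {f_stat:.2f}, "
      f"R^2 = {r_square:.2f}, t(X2) = {t_x2:.2f}")
```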

94) What is the correct conclusion from testing the overall usefulness of the model AND from the tests on the individual coefficients? (Use the 5% level of significance.)

a) The equation as a whole is significant but none of the individual variables are significant.
b) The equation as a whole is not significant and neither are any of the individual variables.
c) The equation as a whole is not significant although X2 is significant.
d) The equation as a whole is significant and so are X1 and X3.
e) There has been an error because it is not possible for the individual coefficients to have different significance from the overall equation.

95) When is a dummy variable used as an explanatory variable in a regression model?

a) When two independent variables interact.
b) When the variable involved is quantitative.
c) When a non-linear relationship is suspected.
d) When data for the variable of interest are not available and the researcher uses a made-up proxy variable.
e) When the variable involved is qualitative.


96) In the equation:

Estimated Salary = 100,000 + 500 Year + 1,000 Gender

Salary is the annual salary of a person,
Year is the number of years of experience in the job, and
Gender is a dummy with Gender = 1 for males and 0 for females.

What is the correct interpretation of the coefficient on Gender?

a) Males earn $1,000 more than females.
b) Females earn $1,000 more than males.
c) We estimate that males earn $1,000 more, on average, than females with the same number of years of experience in the job.
d) A male earns $1,000 more than a female with the same number of years of experience in the job.
e) On average, females earn $1,000 more than males with the same number of years of experience in the job.

97) The following linear regression equation estimates the relationship between the selling price ($) of a particular model of sports car, the age of the car (years) and the colour of the car, where Colour = 1 for a red sports car and Colour = 0 for other colours of the sports car.

Estimated Selling Price = 250,000 − 10,000 Year + 2,500 Colour

Which of the following best describes the coefficient of the dummy variable, Colour?

(a) We estimate that a red sports car of this particular model will sell for $2,500 more than one of any other colour.
(b) While holding age constant, we estimate that a red sports car of this particular model will sell for $2,500 less, on average, than one of any other colour.
(c) We estimate that a red sports car of this particular model will sell for $2,500 more, on average, than one of any other colour.
(d) While holding age constant, we estimate that a red sports car of this particular model will sell for $2,500 more, on average, than one of any other colour.
(e) None of these choices is correct.


98) The following linear regression equation estimates the relationship between the selling price ($) of a particular model of sports car, the age of the car (years) and the colour of the car, where Colour = 1 for a red sports car and Colour = 0 for other colours of the sports car.

Estimated Selling Price = 250,000 − 10,000 Year + 2,500 Colour

Which of the following is the best estimate of the selling price of a black sports car of this particular model that is 10 years old?

(a) $152,500
(b) $150,000
(c) $240,000
(d) $100,000
(e) $242,500
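Questions 96 to 98 all turn on how a 0/1 dummy enters a fitted equation: it adds its coefficient to the prediction when the dummy equals 1 and nothing when it equals 0, so at any fixed age the gap between the two groups is exactly the coefficient. The Python sketch below encodes the sports-car equation; the function name predicted_price is our own.

```python
def predicted_price(age_years: float, is_red: int) -> float:
    """Estimated Selling Price = 250,000 - 10,000 * Year + 2,500 * Colour (Colour = 1 if red)."""
    return 250_000 - 10_000 * age_years + 2_500 * is_red

# Holding age fixed, switching the dummy from 0 to 1 shifts the estimate by its coefficient
gap = predicted_price(5, 1) - predicted_price(5, 0)

# A black car is "other colour", so Colour = 0
black_10_years = predicted_price(10, 0)

print(f"gap at any fixed age = {gap}, black car aged 10 = {black_10_years}")
```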

99) Here is the result from EXCEL of a regression of the score given by consumers to gourmet pizzas (Score) on the fat content of the pizza (Fat) and the type of pizza (Type), where Type = 1 for a cheese pizza and 0 for a pepperoni pizza.

Score = −148.817 + 15.634 Type − 3.89 Fat

The p-value for the coefficient on Type is 0.0651. We want to test the significance of the coefficient on Type using a 2-tailed test at α = 5%.

Which of the following is correct?

a) This can be done using p-values and Type is not significant at 5%.
b) This cannot be done from EXCEL because Type is a dummy variable.
c) This can be done using p-values and Type is significant at 5%.
d) This cannot be done using p-values because Type is a dummy variable.
e) The coefficient on Type will always be insignificant because Type is a dummy variable, irrespective of the size of the p-value.

100) Which of the following best defines a p-value?

a) The p-value is the probability of getting our population value (or population results) or more extreme if the null hypothesis were really true, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
b) The p-value is the probability of getting our sample value (or sample results) or more extreme if the null hypothesis were really false, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
c) The p-value is the probability of getting our sample value (or sample results) or more extreme if the null hypothesis were really true, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
d) The p-value is the probability of getting our population value (or population results) or more extreme if the null hypothesis were really false, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
e) The p-value is always the level of significance in a hypothesis test.


101) This table relates to the number of complaints received in a 6-month period (January to June).

If a 3-period moving average is used to smooth this series, what is the value of the last term calculated?

a) 54
b) 114
c) 144
d) 90
e) None of these choices is correct.
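A 3-period moving average replaces each run of three consecutive values with their mean, so a 6-month series yields 4 smoothed terms, the last being the mean of April, May and June. The Python sketch below (function name moving_average is our own) applies an unweighted, uncentred moving average to the monthly complaint counts from the table for this question.

```python
def moving_average(values, window=3):
    """Unweighted moving average: mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

complaints = [36, 45, 81, 90, 108, 144]   # January to June
smoothed = moving_average(complaints)
print(smoothed)
```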

102) Here is a time series regression for quarterly data:

Y = 10 + 2Q1 + 3Q2 − 4Q3 + 2t

with t in quarters with origin March 2013, where Q1, Q2 and Q3 are dummy variables for the first, second and third quarters respectively.

What can be said about Q4?

a) The coefficient on Q4 must be −1 so that the seasonal effects add to 0.
b) Q4 is not included in the equation but it should have been included if we want to estimate fully the seasonal effect.
c) The coefficient on Q4 must be 3 so that the seasonal effects add to 4.
d) There is no dummy variable for Q4 because Q4 is the benchmark variable, to which the coefficients of the other dummy variables are compared.
e) There is no seasonal effect in a fourth quarter.

103) Here is a time series regression for quarterly data:

Y = 10 + 2Q1 + 3Q2 − 4Q3 + 2t

with t in quarters with origin March 2007, where Q1, Q2 and Q3 are dummy variables for the first, second and third quarters respectively. Given that by convention March is quarter 1, June is quarter 2, September is quarter 3 and December is quarter 4, what is the forecast for the June quarter of 2010?

a) 37
b) 14
c) 39
d) 41
e) None of these choices is correct.
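A forecast from a trend-plus-seasonal-dummy regression needs two ingredients: the time index t for the target quarter and the dummy for that quarter (the base quarter, here Q4, gets no dummy). The Python sketch below assumes, as "origin" usually means, that t = 0 in the March quarter of 2007 and rises by 1 each quarter; if your lecture notes instead start at t = 1, every forecast is 2 higher. The helper names are hypothetical.

```python
# Coefficients copied from Y = 10 + 2Q1 + 3Q2 - 4Q3 + 2t
SEASONAL = {1: 2, 2: 3, 3: -4, 4: 0}   # Q4 is the base quarter, so its effect is 0

def time_index(year: int, quarter: int, origin_year: int = 2007, origin_quarter: int = 1) -> int:
    """Quarters elapsed since the origin; ASSUMES t = 0 at the origin quarter."""
    return 4 * (year - origin_year) + (quarter - origin_quarter)

def forecast(year: int, quarter: int) -> int:
    t = time_index(year, quarter)
    return 10 + SEASONAL[quarter] + 2 * t

print("t =", time_index(2010, 2), "forecast =", forecast(2010, 2))
```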

Table for Question 101:

Month       Complaints
January     36
February    45
March       81
April       90
May         108
June        144


104) Data have been collected for the price and quantity of the following basket of goods for 2005 and 2014.

Which of the following is the correct interpretation of the Laspeyres price index for 2014, using 2005 as the base period?

a) LP = 177.78, so there has been a 77.78% increase in the price of this basket of goods from 2005 to 2014.
b) LP = 179.49, so there has been a 79.49% price increase in this basket of goods from 2005 to 2014.
c) LP = 167.44, so there has been a 67.44% price increase in this basket of goods from 2005 to 2014.
d) LP = 162.79, so there has been a 62.79% price increase in this basket of goods from 2005 to 2014.
e) LP = 184.62, so there has been an 84.62% price increase in this basket of goods from 2005 to 2014.

105) Data have been collected for the price and quantity of the following 3 items in 2005, 2010 and 2014, where the products were all home brands of a major supermarket chain.

Which of the following is the correct interpretation of the Paasche price index for 2014 using 2005 as the base?

(a) PP = 335.398, so according to the Paasche price index, prices of this basket of goods have increased by 335.398% from 2005 to 2014.
(b) PP = 315.29, so according to the Paasche price index, prices of this basket of goods have increased by 215.29% from 2005 to 2014.
(c) PP = 346.34, so according to the Paasche price index, prices of this basket of goods have increased by 246.34% from 2005 to 2014.
(d) PP = 335.398, so according to the Paasche price index, prices of this basket of goods have increased by 235.398% from 2005 to 2014.
(e) PP = 319.44, so according to the Paasche price index, prices of this basket of goods have increased by 219.44% from 2005 to 2014.

Table for Question 104:

          2005                         2014
          Price   Quantity   Cost      Price   Quantity   Cost
Good 1    2       10         20        4       8          32
Good 2    3       5          15        5       5          25
Good 3    4       1          4         5       3          15

Table for Question 105:

                  2005               2010               2014
                  Price  Quantity    Price  Quantity    Price  Quantity
Milk (1 litre)    0.6    2           1.5    14          2      12
Bread (1 loaf)    0.7    3           1.2    11          2.8    14
Water (500 ml)    0.8    1           1.3    5           1.8    7
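The Laspeyres and Paasche price indexes (Questions 104 and 105) share one formula, 100 × Σ p1·w / Σ p0·w, and differ only in the quantity weights w: Laspeyres uses base-period quantities q0, Paasche uses current-period quantities q1. The percentage price change is then the index minus 100. The Python sketch below uses a single helper (our own name, price_index) with the two tables for these questions.

```python
def price_index(p0, p1, weights):
    """100 * sum(p1 * w) / sum(p0 * w) for a fixed set of quantity weights."""
    current_cost = sum(p * w for p, w in zip(p1, weights))
    base_cost = sum(p * w for p, w in zip(p0, weights))
    return 100 * current_cost / base_cost

# Question 104: Goods 1-3, base 2005, current 2014, Laspeyres (2005 quantities)
lp = price_index(p0=[2, 3, 4], p1=[4, 5, 5], weights=[10, 5, 1])

# Question 105: milk, bread, water, base 2005, current 2014, Paasche (2014 quantities)
pp = price_index(p0=[0.6, 0.7, 0.8], p1=[2, 2.8, 1.8], weights=[12, 14, 7])

print(f"Laspeyres = {lp:.2f} (change {lp - 100:.2f}%)")
print(f"Paasche   = {pp:.3f} (change {pp - 100:.3f}%)")
```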


106) Which of the following is the best interpretation of a Paasche price index of 78.25, where the base period is 2010 and the current period is 2013?

a) According to the Paasche price index, prices have decreased by 78.25% from 2010 to 2013.
b) According to the Paasche price index, prices have decreased by 178.25% from 2010 to 2013.
c) According to the Paasche price index, prices have decreased by 21.75% from 2013 to 2010.
d) According to the Paasche price index, prices have increased by 21.75% from 2010 to 2013.
e) According to the Paasche price index, prices have decreased by 21.75% from 2010 to 2013.

107) In 2000 you earned an annual salary of $42,500, and in 2011 your annual salary is $64,800. You know that the CPI in 2000 was 124.70 and the CPI in 2011 was 178.50 (CPI = 100 in 1990). What is your real income in 2011?

(a) $64,800
(b) $363.01
(c) $36,302.52
(d) $22,300
(e) $34,081.80

108) If your salary was $34,000 in 2005 and $44,000 in 2010, and the CPI was 100 in 2005 and 162 in 2010, what has happened to your real income?

a) Decreased by 67%.
b) Decreased by 20%.
c) Increased by 32%.
d) Increased by 68%.
e) Increased by 12%.
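Questions 107 and 108 both deflate a nominal amount into base-period dollars with real = nominal / CPI × 100, and then (for Question 108) compare the real amounts as a percentage change. The Python sketch below shows that arithmetic; the function name real_value is our own.

```python
def real_value(nominal: float, cpi: float) -> float:
    """Deflate a nominal amount into base-period dollars: nominal / CPI * 100."""
    return nominal / cpi * 100

# Question 107: 2011 salary in 1990 dollars (CPI = 100 in 1990)
real_2011 = real_value(64_800, 178.50)

# Question 108: real incomes in 2005 dollars (CPI = 100 in 2005)
real_2005 = real_value(34_000, 100)
real_2010 = real_value(44_000, 162)
pct_change = (real_2010 - real_2005) / real_2005 * 100

print(f"Q107 real income = ${real_2011:,.2f}")
print(f"Q108 real income change = {pct_change:.1f}%")
```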

109) In 2010 the CPI was 172.6 and GDP was 1,357,034 millions of dollars. What is the value of real GDP in 2010?

a) 786,231 millions of dollars
b) 1,357,034 millions of dollars
c) 7,862 millions of dollars
d) 7,862,306 millions of dollars
e) Cannot calculate, as we do not know the base period for the CPI.


110) Suppose that in one year the CPI was 123 in Hobart and 144 in Adelaide. Both indexes have the same base year. Which of the following would be correct?

a) Prices in Hobart cannot be lower than prices in Adelaide.
b) Prices in Hobart must be higher than prices in Adelaide.
c) Prices in Hobart must be lower than prices in Adelaide.
d) Prices rose faster in Adelaide than Hobart from the base to the current year.
e) Something that costs $144 in Adelaide costs $123 in Hobart.

111) If the 2009 price relative for bread, with base 1999 = 100, is 300, then what can we say about the price of bread?

a) The price of bread in 2009 has increased by 300% since 1999.
b) The price of bread in 2009 has increased by 200% since 1999.
c) The price of bread in 1999 was 200% lower than in 2009.
d) The price of bread in 1999 was 100% lower than in 2009.
e) The price of bread is 300 times as much in 2009 as in 1999.

112) A 2009 Paasche price index number (with base 2000 = 100) of 235 indicates that quantities that were in fact bought for

a) $100 in 2000 would cost $235 in 2009.
b) $235 in 2009 would cost $100 in 2000.
c) $235 in 2000 would cost $100 in 2009.
d) $100 in 2009 would cost $235 in 2000.
e) $100 in 2000 would cost $335 in 2009.


SECTION B: Worked Answer Questions (WAQ)

SDVV refers to exercises from the textbook, THIRD EDITION. Some Chapter exercises have changed between the 3rd edition and earlier editions of the text; hence, the exercises from the 3rd edition are reproduced in this section. Please note that answers to odd-numbered questions in the text are at the back of the book, and the questions usually come in pairs (an odd and an even one on a similar topic), so you can do an odd-numbered question and check the answer to help you with the even-numbered one. The exercises from earlier editions of the text are still very good and should be used by students in the same way.

1) SDVV Chapter 1, Exercise 6, page 43. A student finds data on an internet site that contains financial information about selected companies. He plans to analyse the data and use the results to develop a stock investment strategy. What kind of data is he using? What concerns might you have about drawing conclusions from this data set?

2) SDVV Chapter 1, Exercise 20, page 44. Please answer only the following parts of this question. In 2013, Consumer Reports published an article comparing smart phones. It listed 46 phones, giving brand, price, display size, operating system (Android, iOS or Windows Phone), camera image size (megapixels), and whether it had a memory card slot.
(a) Identify the five W’s plus the “how”.
(b) Identify the quantitative variables and give the units.
(c) Identify the variables that are categorical.
(d) Identify the variables that are time series or cross-sectional.
(e) Are there any concerns?

3) SDVV Chapter 1, Exercise 22, page 44

Please answer only the following parts of this question. L.L. Bean is a large U.S. retailer that depends heavily on its catalogue sales. It collects data internally and tracks the number of catalogues mailed out, the number of square inches in each catalogue, and the sales ($ thousands) in the four weeks following each mailing. The company is interested in learning more about the relationship (if any) among the timing and space of its catalogues and its sales.
(a) Identify the five W’s plus the “how”.
(b) Identify the quantitative variables and give the units.
(c) Identify the variables that are categorical.
(d) Identify the variables that are time series or cross-sectional.
(e) Are there any concerns?


4) SDVV Chapter 8, Exercise 4, page 292. A movie theatre company is interested in the opinions of its frequent customers about its recently installed online ticketing system. Specifically, it wants to know what proportion of them plan to use the new ticketing system. It took a random sample of 15,000 customers from its database and sent them an SMS message with a request to fill out a survey in exchange for a free ticket to see a movie of their choice.
(a) What is the population?
(b) What is the sampling frame?
(c) What is the population parameter of interest?
(d) What is the sampling method used?

5) Briefly explain why each of the following statements from past BES mini-projects is incorrect.
(a) Our sample was biased. We should have taken a larger sample to prevent this.
(b) We took a sample of 40 BES students by taking the 2 practicals (tutorials) that our tutor runs and asking the tutor to distribute the survey form to 20 students in each practical. This is stratified sampling.
(c) Our target sample was the number of students who responded and answered YES, they were in the Business School.
(d) This was a random sample. It suffered from convenience sampling and non-response error.


6) For their class project, a group of Business students decides to survey the student body to assess opinions about a proposed new student coffee shop, to judge how successful it might be. Their sample of 200 contained 50 first-year students, 50 sophomores, 50 juniors and 50 seniors.
(a) Do you think the group was using an SRS (simple random sample)? Why?
(b) What kind of sampling design do you think they used?

7) SDVV Chapter 8, Exercise 16, page 293. Indicate whether each statement below is true or false. If false, explain why.
(a) Asking viewers to call into an 800 number is a good way to produce a representative sample.
(b) When writing a survey, it’s a good idea to include as many questions as possible to ensure efficiency and to lower costs.
(c) A recent poll on a website was valid because the sample size was over 1,000,000 respondents.
(d) Malls are not necessarily good places to conduct surveys because people who frequent malls may not be representative of the population at large.



8) Hoping to learn what issues may resonate with voters in the coming election, the campaign director for the mayoral candidate selects one block at random from each of the city’s election districts. Staff members go there and interview all the residents they can find. Identify the following items (if possible). If you can’t tell, then say so.
(a) The population
(b) The population parameter of interest
(c) The sampling frame
(d) The sample
(e) The sampling method, including whether or not randomization was employed
(f) Any potential sources of bias you can detect and any problems you can see in generalizing to the population of interest.

9) Here are two entries from a previous semester’s “Best of the Worst graphic” competition. Critically assess the displays.

(a) [Graphic not reproduced]
(b) [Graphic not reproduced]


10) SDVV Chapter 2, Exercises 2, 4 and 8, pages 67 and 68

Exercise 2: As part of the marketing group at Pixar, you are asked to find the age distribution of the audience of Pixar’s latest film. With the help of 10 of your colleagues, you conduct exit interviews by randomly selecting people to question at 20 different movie theatres. You ask them to tell you if they are younger than 6 years old, 6 to 9 years old, 10 to 14 years old, 15 to 21 years old, or older than 21. From 470 responses, you find that 45 are younger than 6, 83 are 6 to 9 years old, 154 are 10 to 14 years old, 18 are 15 to 21 and 170 are older than 21. For the age distribution:
(a) Make a frequency table.
(b) Make a relative frequency table.

Exercise 4: From the age distribution data described in Exercise 2:
(a) Make a bar chart using counts on the y-axis.
(b) Make a relative frequency bar chart using percentages on the y-axis.
(c) Make a pie chart.

Exercise 8: In addition to their age levels, the movie audiences in Exercise 2 (this question) were also asked if they had seen the movie before (Never, Once, More than Once). Here is a table showing the responses by age group:
(a) Find the marginal distribution of their previous viewing of the movie. (Hint: find the row totals.)
(b) Verify that the marginal distribution of ages is the same as that given in Exercise 2.
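A frequency table lists counts, a relative frequency table divides each count by the total, and a marginal distribution of a two-way table is just its row (or column) totals. The Python sketch below, offered only as a way to check hand working, applies these to the Pixar age counts and to the previous-viewing table for Exercise 8; the variable names are our own.

```python
age_groups = ["Under 6", "6 to 9", "10 to 14", "15 to 21", "Over 21"]
counts = [45, 83, 154, 18, 170]
total = sum(counts)   # number of responses

# Exercise 2: frequency and relative frequency table
for group, count in zip(age_groups, counts):
    print(f"{group:<10} {count:>4} {100 * count / total:6.1f}%")

# Exercise 8: rows = previous viewing, columns = the same age groups
viewing = {
    "Never":          [39, 60, 84, 16, 151],
    "Once":           [3, 20, 38, 2, 15],
    "More than Once": [3, 3, 32, 0, 4],
}
viewing_marginal = {answer: sum(row) for answer, row in viewing.items()}   # row totals
age_marginal = [sum(col) for col in zip(*viewing.values())]               # column totals

print(viewing_marginal)
print("column totals match Exercise 2:", age_marginal == counts)
```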

11) SDVV Chapter 2, Exercise 36 and , page 73

It has become more common for shoppers to “comparison shop” using the Internet. Respondents to a Pew survey in 2013 who owned cell phones were asked whether, in the past 30 days, they had looked up the price of a product while they were in a store to see if they could get a better price somewhere else. Here is a table of their responses by income level.

        < $30K    $30K - $49.9K    $50K - $74.9K    > $75K
Yes     207       115              134              204
No      625       406              260              417

(a) Find the conditional distribution (in percentages) of income for those who do not compare prices on the Internet.
(b) Find the conditional distribution (in percentages) of income for shoppers who do compare prices on the Internet.
(c) Create a graph comparing the income distributions of those who compare prices with those who don’t.
(d) Do you see any differences between the conditional distributions? Write a brief summary of what these data show about Internet use and its relationship to income.
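A conditional distribution restricts attention to one row of a two-way table and expresses each cell as a percentage of that row's total, so each conditional distribution sums to 100%. The Python sketch below does this for the "Yes" and "No" rows of the comparison-shopping table; the helper name conditional_percentages is our own.

```python
income_levels = ["< $30K", "$30K-$49.9K", "$50K-$74.9K", "> $75K"]
responses = {
    "Yes": [207, 115, 134, 204],
    "No":  [625, 406, 260, 417],
}

def conditional_percentages(row):
    """Each cell of one row as a percentage of that row's total."""
    row_total = sum(row)
    return [100 * count / row_total for count in row]

cond = {answer: conditional_percentages(row) for answer, row in responses.items()}
for answer, pcts in cond.items():
    print(answer, dict(zip(income_levels, (f"{p:.1f}%" for p in pcts))))
```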

Table for Question 10, Exercise 8: previous viewing by age group

                  Under 6   6 to 9   10 to 14   15 to 21   Over 21
Never             39        60       84         16         151
Once              3         20       38         2          15
More than Once    3         3        32         0          4



12) The U.S. Department of Labor (www.bls.gov) collects data on the number of U.S. workers who are employed at or below the minimum wage. Here is a table showing the number of hourly workers by Age and Gender and the number who were paid at or below the prevailing minimum wage:

          Hourly Workers        At or below minimum wage
          (in thousands)        (in thousands)
Age       Men       Women       Men       Women
16-24     7978      7701        384       738
25-34     9029      7864        150       332
35-44     7696      7783        71        170
45-54     7365      8260        68        134
55-64     4092      4895        35        72
65+       1174      1469        22        50

(a) What percent of the women were aged 16-24?
(b) Using side-by-side bar graphs, compare the proportions of the men and women who worked at or below minimum wage in each Age group. Write a couple of sentences summarizing what you see.
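Both parts of this question are ratios taken from the table: part (a) divides one cell by a column total, and part (b) divides each "at or below minimum wage" count by the matching hourly-worker count. The Python sketch below assumes "the women" in part (a) means women hourly workers (the only women counted in the table); the variable names are our own.

```python
ages = ["16-24", "25-34", "35-44", "45-54", "55-64", "65+"]
hourly = {"Men":   [7978, 9029, 7696, 7365, 4092, 1174],
          "Women": [7701, 7864, 7783, 8260, 4895, 1469]}
at_or_below = {"Men":   [384, 150, 71, 68, 35, 22],
               "Women": [738, 332, 170, 134, 72, 50]}

# (a) share of women hourly workers who are aged 16-24 (ASSUMES hourly workers as the base)
pct_women_16_24 = 100 * hourly["Women"][0] / sum(hourly["Women"])

# (b) proportion paid at or below minimum wage within each age and gender group
proportions = {g: [low / n for low, n in zip(at_or_below[g], hourly[g])] for g in hourly}

print(f"(a) {pct_women_16_24:.1f}%")
for age, men, women in zip(ages, proportions["Men"], proportions["Women"]):
    print(f"{age:<6} men {men:.3f}  women {women:.3f}")
```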

13) SDVV Chapter 2, Exercise 50, page 76 PLUS EXTRA PARTS IN BOLD TYPE. A company must decide which of two delivery services it will contract. During a recent trial period, it shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Here are the data.

Delivery Service   Type of Service   Number of Deliveries   Number of Late Packages
Pack Rats          Regular           400                    12
                   Overnight         100                    16
Boxes R Us         Regular           100                    2
                   Overnight         400                    28

(a) Compare the two services’ overall percentage of late deliveries.
(b) Based on the results in part (a) the company has decided to hire Pack Rats. Do you agree that they deliver on time more often? Why or why not? Be specific (showing workings).
(c) The results here are an instance of what phenomenon? Explain.



14) A random sample of 400 pairs of sunglasses from a Melbourne factory has 16 that are defective, and a sample of 1000 from a Sydney factory has 60 defects.
a) Which factory has the lower rate of defects? Which appears to be better?
b) Suppose now that we have further data. The Melbourne sample has 300 men’s sunglasses with 8 defects and 100 women’s sunglasses with 8 defects, whilst the Sydney sample has 300 men’s sunglasses with 6 defects and 700 women’s sunglasses with 54 defects. Write down a 2-way table for each factory, separately, showing men’s and women’s sunglasses cross-tabulated against defective or not.
c) Now calculate the defective rate for men’s sunglasses at each factory.
d) Now calculate the defective rate for women’s sunglasses at each factory.
e) Explain what is paradoxical about these results.
f) Explain why these paradoxical results occurred. Show your calculations.
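The calculations for this question compare an overall defect rate with the rates within each type of sunglasses, and also look at how each factory's sample is split between the types. The Python sketch below, intended only as a check on hand working, computes all three; the helper names rate and summarise are our own.

```python
# (defective, sample size) for each factory and type of sunglasses
samples = {
    "Melbourne": {"men": (8, 300), "women": (8, 100)},
    "Sydney":    {"men": (6, 300), "women": (54, 700)},
}

def rate(defective: int, n: int) -> float:
    """Defect rate as a percentage."""
    return 100 * defective / n

def summarise(groups):
    """Return (overall defect rate, defect rate within each type) for one factory."""
    total_defective = sum(d for d, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    return rate(total_defective, total_n), {g: rate(d, n) for g, (d, n) in groups.items()}

mel_overall, mel_by_type = summarise(samples["Melbourne"])
syd_overall, syd_by_type = summarise(samples["Sydney"])

# Share of each factory's sample made up of women's sunglasses
women_share = {f: 100 * g["women"][1] / sum(n for _, n in g.values())
               for f, g in samples.items()}

for name, overall, by_type in [("Melbourne", mel_overall, mel_by_type),
                               ("Sydney", syd_overall, syd_by_type)]:
    print(f"{name}: overall {overall:.2f}%, men {by_type['men']:.2f}%, "
          f"women {by_type['women']:.2f}%, women's share {women_share[name]:.0f}%")
```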

15) Chapter 3, Exercises 2, 4 and 6, pages 110 and 111

Exercise 2: As the new manager of a small convenience store, you want to understand the shopping patterns of your customers. You randomly sample 20 purchases from yesterday’s records (all purchases in U.S. dollars).

39.05 2.73 32.92 47.51

37.91 34.35 64.48 51.96

56.95 81.58 47.80 11.72

21.57 40.83 38.24 32.98

75.16 74.30 47.54 65.62

(a) Make a histogram of the data using a bar width of $20.
(b) Make a histogram of the data using a bar width of $10.
(c) Make a relative frequency histogram of the data using a bar width of $10.
(d) Make a stem and leaf plot of the data using $10 as the stems, putting the smallest amounts on top, and rounding the data to the nearest dollar (whole number).

Exercise 4: For the histogram you made in part (a) of the previous exercise, i.e. Exercise 2(a):
(a) Is the distribution unimodal or multimodal?
(b) Where is (are) the mode(s)?
(c) Is the distribution symmetric?
(d) Are there any outliers?

Exercise 6: For the data in Exercise 2:
(a) Would you expect the mean purchase to be smaller than, bigger than, or about the same size as the median? Explain briefly.
(b) Find the mean purchase.
(c) Find the median purchase.
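For Exercises 2 and 6, the bar heights of a $10-wide histogram are counts of purchases in each $10 interval, and with n = 20 the median is the average of the 10th and 11th values once sorted. The Python sketch below uses the standard-library statistics and collections modules to check those counts and summary values; it treats each bar as [lower, lower + 10), which is an assumption about how boundary values are binned.

```python
from collections import Counter
from statistics import mean, median

# The 20 purchase amounts (US$) from Exercise 2, row by row
purchases = [39.05, 2.73, 32.92, 47.51,
             37.91, 34.35, 64.48, 51.96,
             56.95, 81.58, 47.80, 11.72,
             21.57, 40.83, 38.24, 32.98,
             75.16, 74.30, 47.54, 65.62]

# Counts for $10-wide bars, keyed by each bar's lower edge: [lower, lower + 10)
bar_counts = Counter(int(x // 10) * 10 for x in purchases)
for lower in sorted(bar_counts):
    print(f"[{lower}, {lower + 10}): {bar_counts[lower]}")

mean_purchase = mean(purchases)       # sum / n
median_purchase = median(purchases)   # average of the 10th and 11th sorted values
print(f"mean = {mean_purchase:.2f}, median = {median_purchase:.2f}")
```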


16) Chapter 3, Exercise 14, page 112 SLIGHTLY ADJUSTED in bold type
(a) Using the data on shopping patterns for the convenience store in the previous question, draw a boxplot, using the method of finding quartiles taught in lectures.
(b) Does the boxplot nominate any outliers?
(c) What purchase amount would be considered a high outlier?

17) SDVV Chapter 3, Exercise 46, page 117. A real estate agent has surveyed houses in 20 nearby zip codes in an attempt to put together a comparison for a new property that she would like to put on the market. She knows that the size of the living area of a house is a strong factor in the price, and she’d like to market this house as being one of the biggest in the area. Here is a histogram and summary statistics for the sizes of all the houses in the area.
a) What is the range of these sizes?
b) Between what sizes do the central 50% of houses lie?
c) What summary statistics would you use to describe these data?
d) Write a brief description of these data (shape, centre and spread).

18) SDVV Chapter 3, Exercise 54 page 118

Ozone levels (in parts per billion, ppb) were recorded at sites in New Jersey monthly.

Here are boxplots of the data for each month (over the 46 years) lined up in order (January=1).

a) In what month was the highest ozone level ever recorded?

b) Which month has the largest IQR? c) Which month has the smallest range? d) Write a brief comparison of the ozone levels in January and June.

e) Write a brief report on the annual patterns you see in the ozone levels.


19) Chapter 3, Exercise 18, page 113

The convenience store manager from the previous exercise has collected data on purchases from weekdays and weekends. Here are some summary statistics (rounded to the nearest dollar):

Weekdays: n = 230, Min = 4, Q1 = 28, Median = 40, Q3 = 68, Max = 95

Weekends: n = 150, Min = 10, Q1 = 35, Median = 55, Q3 = 70, Max = 100

From these statistics, construct side-by-side boxplots and write a sentence comparing the two distributions.

20) State and briefly explain which is the best measure of central tendency for each of the following. For each of parts (a) to (d), where possible, sketch a suitable frequency curve (symmetric, positively skewed or negatively skewed), labelling where you believe the mean, median and mode would be positioned.

(a) Earnings of employees of an airline company.

(b) Colours of smartphone covers in a random sample.

(c) Final marks of a very easy compulsory test, where a number of students who did not sit the test automatically received a mark of zero.

(d) Methods of travelling to work.

21) Suppose the marketing manager of a large company was earning $129,420 per annum, got a raise and is now earning $140,000 per annum. Indicate how this would affect the following summary statistics (increase, decrease or stay about the same):

(a) Mean (b) Median (c) Range (d) IQR (e) Standard deviation

22) The number of orders received by a company over the last 25 days is as follows:

3 0 1 4 4 4 2 5 3 6

4 5 1 4 2 3 0 2 0 5 4 2 3 3 1

Please calculate parts (a) to (f) manually, using the methods taught in BES lectures.

a) Give the mean, median and mode of this sample of number of orders/day. Show formulae/workings or your reasoning.

b) Find the quartiles and the interquartile range.

c) Write down the five-number summary for these data.

d) Draw a box plot for these data.

e) Calculate the standard deviation.

f) On the 26th day, the company received 8 orders. Use a z-score to determine whether this number of orders was unusual.

g) Now use EXCEL to provide the descriptive statistics for these data, and so check some of your answers above. (See the hints on using Excel below, or the textbook.)


Hints on using EXCEL to provide the default range of descriptive statistics:

i) Type the data in one column. Put the name (for example, orders) in the first cell (say B1) and the data below it (say B2:B26).

ii) Select Data > Data Analysis > Descriptive Statistics and click OK. If you do not see this option, see ADD-INs below.

iii) Type in or select the input range, including the cell with the name (e.g. B1:B26).

iv) Make sure Labels in First Row is checked.

v) Make sure the Output Range option is selected and enter a cell for the output (e.g. C2).

vi) Make sure Summary Statistics is checked.

vii) Click OK.

ADD-INs: EXCEL has Add-Ins, optional components that increase its functionality. Some, such as the Analysis ToolPak needed here for Data Analysis, come with EXCEL, whilst others come from other suppliers. If you do not find Data Analysis on the Data tab of the ribbon, add it as follows:

Go to the File menu and select Options.

Select Add-Ins (on the left-hand side). Then choose Excel Add-ins in the drop-down box next to Manage, towards the bottom of the screen, and click Go.

Ensure that both Analysis ToolPak and Analysis ToolPak - VBA are checked, then click OK. Data Analysis will now appear on the Data tab.
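As an optional cross-check on the hand calculations for Question 22 (alongside the Excel check in part (g)), here is a minimal Python sketch using the standard-library `statistics` module. It is only an illustration; quartiles are deliberately left out because the lecture method and Excel's quartile functions can give different values.

```python
import statistics

# Orders received per day over the 25 days listed in Question 22
orders = [3, 0, 1, 4, 4, 4, 2, 5, 3, 6,
          4, 5, 1, 4, 2, 3, 0, 2, 0, 5, 4, 2, 3, 3, 1]

mean = statistics.mean(orders)      # sum of values / n
median = statistics.median(orders)  # middle (13th) value once sorted
mode = statistics.mode(orders)      # most frequent value
sd = statistics.stdev(orders)       # sample SD, divides by n - 1

# Part (f): standardise the 26th day's 8 orders
z = (8 - mean) / sd

print(f"mean={mean:.2f} median={median} mode={mode} sd={sd:.2f} z={z:.2f}")
```

A z-score above 3 (or below -3) is the usual rule of thumb for "unusual".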


23) The following table shows data on total assets ($ billion) for a small sample of U.S. banks.

Bank Assets ($ billion)

State Street Bank and Trust 160.5

Discover Bank 63.9

Bank West 72.8

Citizens Bank 130.0

Northern Trust 83.8

Huntington Bank 53.8

Key Bank 91.8

People’s United 27.9

(a) Calculate the mean total assets for this sample. (b) Calculate the standard deviation of total assets for this sample.

(c) Standardize the asset value of State Street Bank and Trust (Hint: find the z-score). Interpret the standardized value.
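A minimal Python sketch for checking parts (a) to (c) of Question 23, assuming the sample (n - 1) standard deviation taught for sample data:

```python
import statistics

# Total assets ($ billion) for the eight banks in the table above
assets = {
    "State Street Bank and Trust": 160.5,
    "Discover Bank": 63.9,
    "Bank West": 72.8,
    "Citizens Bank": 130.0,
    "Northern Trust": 83.8,
    "Huntington Bank": 53.8,
    "Key Bank": 91.8,
    "People's United": 27.9,
}

values = list(assets.values())
mean = statistics.mean(values)   # (a) sample mean
sd = statistics.stdev(values)    # (b) sample standard deviation (n - 1 divisor)

# (c) z-score: number of SDs State Street lies above the sample mean
z_state_street = (assets["State Street Bank and Trust"] - mean) / sd
print(f"mean={mean:.4f}  sd={sd:.2f}  z={z_state_street:.2f}")
```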

24) SDVV Chapter 3, Exercise 74, page 122

The World Bank, through their Doing Business project (www.doingbusiness.org), ranks nearly 200 economies on the ease of doing business. One of their rankings measures the ease of starting a business and is made up (in part) of the following variables: number of required start-up procedures, average start-up time (in days) and average start-up cost (in % of per capita income). The following table gives the means and standard deviations of these variables for 95 economies.

Statistic | Procedures (#) | Time (days) | Cost (%)
Mean | 7.9 | 27.9 | 14.2
SD (standard deviation) | 2.9 | 19.6 | 12.9

Here are the data for three countries.

Country | Procedures (#) | Time (days) | Cost (%)
Spain | 10 | 47 | 15.1
Guatemala | 11 | 26 | 47.3
Fiji | 8 | 46 | 25.3

(a) Use z-scores to combine the three measures. (Hint: do this for each country separately: find a z-score for procedures, a z-score for time and a z-score for cost, then sum these to get the total z-score for each country.)

(b) Which country has the best environment after combining the three measures? Be careful: a lower value indicates a better environment for starting up a business.

http://www.doingbusiness.org/
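The combining rule in the hint for Question 24 can be sketched in Python as follows. This is a minimal illustration; the dictionary keys are names chosen here, not from the source.

```python
# Means and SDs for the 95 economies (table in Question 24)
means = {"procedures": 7.9, "time": 27.9, "cost": 14.2}
sds = {"procedures": 2.9, "time": 19.6, "cost": 12.9}

countries = {
    "Spain": {"procedures": 10, "time": 47, "cost": 15.1},
    "Guatemala": {"procedures": 11, "time": 26, "cost": 47.3},
    "Fiji": {"procedures": 8, "time": 46, "cost": 25.3},
}

def total_z(values):
    """Sum of z = (x - mean) / sd over the three measures."""
    return sum((values[k] - means[k]) / sds[k] for k in means)

totals = {name: total_z(vals) for name, vals in countries.items()}
for name, t in totals.items():
    print(f"{name}: total z = {t:.2f}")

# Fewer procedures, less time and lower cost are all better,
# so the smallest combined z-score marks the easiest start-up environment.
best = min(totals, key=totals.get)
print("Best:", best)
```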


25) For each of the following scenarios, indicate which is the explanatory variable and which is the response variable, and sketch a rough scatterplot with labelled x and y axes, showing the suspected direction of any possible linear association.

(a) Salary data (in $) as well as years of managerial experience collected for a sample of executives in the high tech industry.

(b) Interest rates (in % per annum) and number of house mortgage applications.

(c) Data collected on job performance rating (in points) and hours of training for a sample of employees at a telecommunication repair facility.

(d) Price (in $) of flat screen TVs and screen size (in inches).

26) Here are several scatterplots. The calculated correlations are −0.977, −0.021, 0.736 and 0.951. Which is which?

27) SDVV Chapter 4, Exercise 54, page 168

Explain what each of the following residual plots indicates about the appropriateness of the linear model that was fit to the data.


28)

In a paper presented to a Teaching and Learning Forum, Anne Arnold estimated several regression equations using data from BES results. (You may assume that the assumptions and conditions for regression are met.) Here is one of the regression equations:

Est. final = 30.87 + 3.54 tutorial with r = 0.65

The response variable is final, the final mark (%) obtained by students who remained in the course, and the explanatory variable is tutorial, the mark (out of 10) for tutorial participation: students were awarded 1 mark for each tutorial attended, and there were 10 tutorials in that semester.

a) Interpret the slope of this equation.

b) Predict the final marks of a student who attended no tutorials.

c) Predict the final marks of a student who attended all 10 tutorials.

d) Interpret the coefficient of determination. How good are your predictions?

e) List two other factors that we might want to take into account when predicting a student’s final mark in the course.
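The predictions in parts (b) to (d) of Question 28 are direct substitutions into the fitted line; a minimal Python sketch:

```python
# Fitted line from the paper: Est. final = 30.87 + 3.54 * tutorial, r = 0.65
intercept, slope, r = 30.87, 3.54, 0.65

def predict_final(tutorial_mark):
    """Predicted final mark (%) for a tutorial mark between 0 and 10."""
    return intercept + slope * tutorial_mark

pred_none = predict_final(0)   # (b) no tutorials attended
pred_all = predict_final(10)   # (c) all 10 tutorials attended
r_squared = r ** 2             # (d) coefficient of determination
print(f"{pred_none:.2f} {pred_all:.2f} {r_squared:.4f}")
```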

29) SDVV Chapter 4 Exercise 52 page 168

An online clothing retailer examined their transactional database to see if total yearly Purchases ($) were related to customers’ Incomes ($). (You may assume that the assumptions and conditions for regression are met.) The least squares regression equation is

Estimated Purchases = −31.6 + 0.012 Income.

(a) Interpret the intercept in this linear model.

(b) Interpret the slope in this linear model.

(c) If a customer has an Income of $20,000, what is his predicted total yearly Purchases?

(d) This customer’s yearly purchases were actually $100. What is the residual using this linear model? Did the model provide an underestimate or an overestimate for this customer?
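Parts (c) and (d) of Question 29 use the convention residual = actual - predicted. A minimal Python sketch:

```python
# Estimated Purchases = -31.6 + 0.012 * Income (both in $)
def predicted_purchases(income):
    return -31.6 + 0.012 * income

pred = predicted_purchases(20_000)   # (c) about $208.40
residual = 100 - pred                # (d) residual = actual - predicted
# A negative residual means the model predicted more than was actually
# spent, i.e. it overestimated for this customer.
print(f"predicted={pred:.2f} residual={residual:.2f}")
```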


30) Use the EXCEL output below to answer the following questions. The variables are WAGE (hourly wage rate, $US per hour) and EDUC (years of formal education) for 1000 people in the US. The data are from the 1997 population survey.

a) Comment on the scatter plot.

b) Write down the regression equation.

c) Write down the value of the correlation coefficient and interpret it.

d) Interpret the slope of the equation.

e) Predict the wage of someone with 10 years of education.

f) Do you think your estimate would be any good? Explain your answer.

g) Comment on the residual plot.

Descriptive statistics (Excel output):

Statistic | wage | educ
Mean | 10.213 | 13.285
Standard Error | 0.198 | 0.078
Median | 8.790 | 13
Mode | 4.420 | 12
Standard Deviation | 6.247 | 2.468
Sample Variance | 39.021 | 6.092
Kurtosis | 7.051 | 1.539
Skewness | 1.956 | -0.212
Range | 58.160 | 17
Minimum | 2.030 | 1
Maximum | 60.190 | 18
Sum | 10213.020 | 13285
Count | 1000 | 1000

[Figure: scatter plot of wage and educ; x-axis: education (0 to 20), y-axis: wage (0 to 70)]

[Figure: Histogram of Wage; x-axis: wage bins 2, 4, …, 20, More; y-axis: Frequency]

[Figure: Histogram of Educ; x-axis: educ bins 6 to 18, More; y-axis: Frequency]

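Response variable: WAGE

SUMMARY OUTPUT

Multiple R 0.450
R Square 0.202
Observations 1000

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 7888.511 | 7888.511 | 253.200 | 5.59313E-51
Residual | 998 | 31092.986 | 31.155
Total | 999 | 38981.497

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -4.912 | 0.967 | -5.081 | 0.000 | -6.809 | -3.015
educ | 1.139 | 0.072 | 15.912 | 0.000 | 0.998 | 1.279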

[Figure: educ Residual Plot; x-axis: educ (0 to 20), y-axis: Residuals (-20 to 50)]
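For parts (c) and (e) of Question 30, a minimal Python sketch using the coefficients and R Square from the Excel output. It relies on the fact that in simple regression r takes the sign of the slope; small differences from Multiple R come from Excel rounding R Square to three decimals.

```python
import math

# From the Excel SUMMARY OUTPUT (response: WAGE, explanatory: educ)
intercept, slope = -4.912, 1.139
r_squared = 0.202

# In simple linear regression, r = +/- sqrt(R^2) with the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

# (e) predicted hourly wage ($US) for 10 years of education
wage_10 = intercept + slope * 10
print(f"r={r:.3f}  predicted wage={wage_10:.3f}")
```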


31) You are thinking of selling your house and are trying to decide what price to ask for it.

You collect data on the selling price ($000s) and the average floor area (square metres) of properties from the 10 most recent house sales in your area.

Note: An empty block would have an average floor area of 0 square metres. The following EXCEL output is provided.

Average Floor Area (sqm) Price ($000s)

245 625

540 875

458 900

150 500

270 500

100 300

290 635

300 700

350 835

200 430

SUMMARY OUTPUT

Multiple R 0.9171
R Square 0.841
Observations 10

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 304947.1623 | 304947.1623 | 42.314956 | 0.0001871
Residual | 8 | 57652.83769 | 7206.604712
Total | 9 | 362600

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 230.04 | 67.09068433 | 3.428723648 | 0.0089717 | 75.32402 | 384.7468
Average Floor Area (sqm) | 1.3778 | 0.211800779 | 6.504994707 | 0.0001871 | 0.8893495 | 1.866176

[Figure: Scatterplot of Price against Average Floor Area; x-axis: Average Floor Area (sqm), y-axis: Price ($)]

[Figure: Average Floor Area (sqm) Residual Plot; x-axis: Average Floor Area (sqm), y-axis: Residuals]

(a) Check the conditions for linear regression.

(b) What is the estimated regression equation?

(c) Interpret the slope.

(d) Interpret the correlation coefficient.

(e) Your house has a floor area of 250 square metres. Predict your selling price.

(f) How good is your prediction?

(g) If you receive an offer of $550,000 for your house, should you take it? Explain using residuals.

(h) You also own an empty block of land in the same area, with a land size of 600 square metres. Predict the selling price of your empty block of land. Briefly explain whether you think this regression model gives an accurate prediction for the sale of an empty block of land.

(i) List at least two other factors that could affect the selling price of your house.
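As a check on the Excel output for Question 31, the least-squares line can be refitted directly from the ten sales with the formulas b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. This minimal Python sketch treats prices in $000s, so the $550,000 offer in part (g) is entered as 550.

```python
# Ten recent sales: (average floor area in sqm, price in $000s)
area = [245, 540, 458, 150, 270, 100, 290, 300, 350, 200]
price = [625, 875, 900, 500, 500, 300, 635, 700, 835, 430]

n = len(area)
x_bar = sum(area) / n
y_bar = sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(area, price))
s_xx = sum((x - x_bar) ** 2 for x in area)

# Least-squares slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

pred_250 = b0 + b1 * 250          # (e) predicted price ($000s) for 250 sqm
residual_offer = 550 - pred_250   # (g) offer minus predicted price ($000s)
print(f"b0={b0:.2f} b1={b1:.4f} pred={pred_250:.2f} residual={residual_offer:.2f}")
```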


32) A random sample of rental accommodation in Adelaide was taken, and the distance from the city centre (in kilometres) and the weekly rental fee (in dollars) were recorded. The data are provided in the table below.

Note: The city centre is defined as the central square kilometre of Adelaide, bounded by North, South, East and West Terraces, within which rental accommodation exists.

Distance (km) Rent ($)

5 800

15 300

3 700

0 800

7 400

9 450

10 400

14 350

30 1,600

4 600

3 650

1 750

Please use EXCEL for the REGRESSION.

See HINTS on next page on how to use EXCEL for scatter plots and regression. Students MUST hand in the Excel regression output with their solution.

Select the option to create a residual plot when running the regression, as it is needed later in the question.

a) Use EXCEL to prepare a scatterplot of Rent against Distance.

b) What can you say about the direction of the association?

c) What can you say about the form of the relationship?

d) What can you say about the strength of the relationship?

e) Does the scatterplot show any outliers?

NOW REMOVE THE SUSPECTED OUTLIER. Use EXCEL to fit the regression of Rent on Distance.

f) Write down the estimated equation.

g) Interpret the slope.

h) Interpret the intercept. Is it meaningful in the context of this question?

i) Give the value of the correlation coefficient. Interpret.

j) Give the value of the coefficient of determination.

k) What is the weekly rental fee predicted using this model for accommodation that is 9 kilometres from the city centre?

l) Using your answer from part (j), how good is your prediction?

m) What is the residual weekly rental fee for rental accommodation 9 km from the city centre?

n) Briefly explain whether the model has overestimated or underestimated the weekly rental fee for rental accommodation 9 kilometres from the city centre.

o) Submit and comment on the scatterplot of the residuals.

p) Do you think we could use this model to predict the weekly rental fee for accommodation 1,000 km from the city centre? Briefly explain.


HINTS on how to use EXCEL to create a Scatterplot and for Regression

i. Enter the data in columns, with the variable names (labels) in the first row.

ii. For the scatter plot:

Highlight the 2 columns of data. Whether to include the names (labels) depends on the version of Excel; for example, in Excel 2010, highlight only the data, not the labels. The first column is used as the X variable and the second column as the Y variable.

Go to Insert > Scatter.

Insert a title, an x-axis label and a y-axis label (for example, by inserting text boxes).

Alternatively, see the * Note below to create a scatterplot from the same dialogue box used for the regression. NOTE: BE CAREFUL USING THIS METHOD IF YOU HAVE TO REMOVE A SUSPECTED OUTLIER.

iii. For the regression: select Data > Data Analysis (you may have to add in the Analysis ToolPak in your version of Excel), then select Regression and click OK.

Type in or select the input Y range (you must include the cell with the name).

Type in or select the input X range (you must include the cell with the name).

Make sure the Labels box is checked.

Check that you have chosen the correct columns for the X and Y variables.

Select Output Range, then immediately click in the output range box of the dialogue box and click on a cell in your spreadsheet.

Select the option to create a residual plot when running the regression, as it may be needed in the question.

* Note: By selecting Line Fit Plots in the Regression dialogue box, you will get a scatter plot of both the predicted (regression model) values and the actual data values of Y for each value of X.

To get a scatter plot of Y vs X from this line fit plot, delete the predicted values (by clicking on the red plot points and pressing Delete). You can then change the title to Scatterplot of … vs … (using the variable names).
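As an optional cross-check on the Excel regression for Question 32, here is a minimal Python sketch of simple least squares. It assumes the suspected outlier is the 30 km, $1,600 property; the function name `least_squares` is chosen here for illustration.

```python
# Rental data from Question 32: (distance in km, weekly rent in $)
data = [(5, 800), (15, 300), (3, 700), (0, 800), (7, 400), (9, 450),
        (10, 400), (14, 350), (30, 1600), (4, 600), (3, 650), (1, 750)]

# Assumption: the suspected outlier is the 30 km / $1,600 property
kept = [(x, y) for x, y in data if (x, y) != (30, 1600)]

def least_squares(points):
    """Return (intercept, slope, r) for a simple linear regression."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope, s_xy / (s_xx * s_yy) ** 0.5

b0, b1, r = least_squares(kept)
pred_9 = b0 + b1 * 9         # (k) predicted rent 9 km from the centre
residual_9 = 450 - pred_9    # (m) actual minus predicted
print(f"Rent = {b0:.2f} + ({b1:.2f}) Distance, r = {r:.3f}, R^2 = {r * r:.3f}")
print(f"prediction = {pred_9:.2f}, residual = {residual_9:.2f}")
```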


33) A small specialist food store offers different types of olive oil for sale, amongst other food items. To determine whether price affects sales, the manager of the store recorded the volume of oil sales, measured in litres (L), and the price, measured in dollars per litre ($/L), of the different qualities of olive oil the store sold last month. Last month’s olive oil sales are given in the table after part (p).

Please use EXCEL for the REGRESSION.

See HINTS on previous page on how to use EXCEL for scatter plots and regression.

Students MUST hand in the Excel regression output with their solution.

a) Use EXCEL to prepare a scatterplot of volume of oil sales against price.

b) What can you say about the direction of the association?

c) What can you say about the form of the relationship?

d) What can you say about the strength of the relationship?

e) Does the scatterplot show any outliers?

NOW REMOVE THE SUSPECTED OUTLIER. Use EXCEL to fit the regression of volume of oil sales on price.

f) Write down the estimated equation.

g) Interpret the slope.

h) Interpret the intercept. Is it meaningful in the context of this question?

i) Give the value of the correlation coefficient. Interpret.

j) Give the value of the coefficient of determination.

k) What monthly volume of oil sales would you predict for an olive oil that sells in the store for $10 per litre?

l) Using your answer from part (j), how good is your prediction in part (k)?

m) What is the residual monthly volume of oil sales for olive oil that sells for $10 per litre?

n) Does the model overestimate or underestimate the monthly volume of oil sales for olive oil that sells for $10 per litre?

o) Create a scatterplot of the residuals. Comment.

p) The store owner is thinking of stocking the most expensive olive oil in the world: an ultra-premium olive oil called Lambda, made in Greece. The store owner would sell Lambda to customers at $185 per litre. Do you think we could use this model to predict the store’s monthly volume of Lambda sales? Briefly explain.

Price ($/L) Volume of oil sales (L)

18 1,200

7 2,000

10 1,000

50 4,000

12 1,200

25 1,000

4 1,800

6.5 1,700

20 900

30 600
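A minimal Python sketch for cross-checking the Excel regression in Question 33, assuming the suspected outlier is the $50/L oil with 4,000 L sold:

```python
# Olive oil data from Question 33: (price in $/L, monthly volume sold in L)
data = [(18, 1200), (7, 2000), (10, 1000), (50, 4000), (12, 1200),
        (25, 1000), (4, 1800), (6.5, 1700), (20, 900), (30, 600)]

# Assumption: the suspected outlier is the $50/L oil with 4,000 L sold
kept = [(x, y) for x, y in data if x != 50]

n = len(kept)
x_bar = sum(x for x, _ in kept) / n
y_bar = sum(y for _, y in kept) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in kept)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in kept)

b1 = s_xy / s_xx              # least-squares slope (litres per $/L)
b0 = y_bar - b1 * x_bar       # intercept
pred_10 = b0 + b1 * 10        # (k) predicted volume at $10/L
residual_10 = 1000 - pred_10  # (m) actual (1,000 L) minus predicted
print(f"Volume = {b0:.2f} + ({b1:.3f}) Price; pred = {pred_10:.1f}; resid = {residual_10:.1f}")
```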



34) Multigenerational families can be categorized as having two adult generations, such as parents living with adult children; “skip” generation families, such as grandparents living with grandchildren; or three or more generations living in the household. Pew Research surveyed multigenerational households. This table is based on their reported results.

Group | 2 Adult Gens | Skip Gens | 3 or more Gens | Total
White | 509 | 55 | 222 | 786
Hispanic | 139 | 11 | 142 | 292
Black | 119 | 32 | 99 | 250
Asian | 61 | 1 | 48 | 110
Total | 828 | 99 | 511 | 1438

(a) What is the probability that a multigenerational family is Hispanic?

(b) What is the probability that a multigenerational family selected at random is a Black, two-adult-generation family?

(c) What type of probability did you find in parts a and b?

35) SDVV Chapter 5 Exercise 10 page 199 Using the table from Exercise 8 (the previous question),

(a) What is the probability that a randomly selected Black multigenerational family is a 2 Adult Generation family?

(b) What is the probability that a randomly selected multigenerational family is White, given that it is a “skip” generation family?

(c) What is P(3 or more Generations | Asian)?
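The difference between the marginal, joint and conditional probabilities in Questions 34 and 35 is just a change of denominator. A minimal Python sketch using exact fractions (the short column keys "2 Adult", "Skip" and "3+" are labels chosen here):

```python
from fractions import Fraction

# Counts from the multigenerational-household table (Pew Research)
counts = {
    "White":    {"2 Adult": 509, "Skip": 55, "3+": 222},
    "Hispanic": {"2 Adult": 139, "Skip": 11, "3+": 142},
    "Black":    {"2 Adult": 119, "Skip": 32, "3+": 99},
    "Asian":    {"2 Adult": 61,  "Skip": 1,  "3+": 48},
}
grand_total = sum(sum(row.values()) for row in counts.values())  # 1438

def row_total(group):
    return sum(counts[group].values())

def col_total(gen):
    return sum(row[gen] for row in counts.values())

# Question 34: marginal and joint probabilities (denominator = grand total)
p_hispanic = Fraction(row_total("Hispanic"), grand_total)
p_black_2adult = Fraction(counts["Black"]["2 Adult"], grand_total)

# Question 35: conditional probabilities (denominator = the given row or column)
p_2adult_given_black = Fraction(counts["Black"]["2 Adult"], row_total("Black"))
p_white_given_skip = Fraction(counts["White"]["Skip"], col_total("Skip"))
p_3plus_given_asian = Fraction(counts["Asian"]["3+"], row_total("Asian"))

for p in (p_hispanic, p_black_2adult, p_2adult_given_black,
          p_white_given_skip, p_3plus_given_asian):
    print(p, round(float(p), 4))
```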


36) A Mintel study asked consumers if electronic communications devices influenced whether or not they bought a certain car. The table below gives the results, classified by household income:

Communications influence on car purchase, by household income, July 2011

Communication (e.g., hands-free calling) | Income < $50K | $50K – 99.9K | $100K+ | Total
Very much | 30 | 57 | 41 | 128
Somewhat | 26 | 39 | 62 | 127
Not at all | 23 | 39 | 35 | 97
Total | 79 | 135 | 138 | 352

If we select a person at random from this sample:

(a) What is the probability that electronic communication devices somewhat influenced their decisions? (b) What is the probability that the person is earning at least $100K?

(c) What is the probability that the person was somewhat influenced by electronic communications and earns at least $100K?

(d) What is the probability that electronic communications somewhat influenced the purchase or that the person earns at least $100K?
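Part (d) of Question 36 needs the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), because a person can be in both groups. A minimal Python sketch (the income keys are labels chosen here):

```python
from fractions import Fraction

# Mintel table: rows = influence, columns = household income band
table = {
    "Very much":  {"<50K": 30, "50K-99.9K": 57, "100K+": 41},
    "Somewhat":   {"<50K": 26, "50K-99.9K": 39, "100K+": 62},
    "Not at all": {"<50K": 23, "50K-99.9K": 39, "100K+": 35},
}
n = sum(sum(row.values()) for row in table.values())        # 352
somewhat = sum(table["Somewhat"].values())                  # row total, 127
high_income = sum(row["100K+"] for row in table.values())   # column total, 138
both = table["Somewhat"]["100K+"]                           # joint cell, 62

p_a = Fraction(somewhat, n)
p_b = Fraction(high_income, n)
p_c = Fraction(both, n)
# (d) subtract the overlap so it is not counted twice
p_d = p_a + p_b - p_c
print([round(float(p), 4) for p in (p_a, p_b, p_c, p_d)])
```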


37) SDVV Chapter 5, Exercise 58, page 205

A European department store is developing a new advertising campaign for their new U.S. location, and their marketing managers need to understand their target market better. A survey of adult shoppers found the probabilities that an adult would shop at their new U.S. store, classified by age, as shown below.

(a) What is the probability that a survey respondent will shop at the U.S. store?

(b) What is the probability that a survey respondent will shop at the store given that they are younger than 20 years old?

(c) What is the probability that a survey respondent who is older than 40 shops at the store?

(d) What is the probability that a survey respondent is younger than 20 or will shop at the store?

38) SDVV Chapter 5, Exercise 62, page 206 The following questions use the table of data from Chapter 5 Exercise 54, reproduced below:

Communications influence on car purchase, by household income, July 2011

Communication (e.g., hands-free calling) | Income < $50K | $50K – 99.9K | $100K+ | Total
Very much | 30 | 57 | 41 | 128
Somewhat | 26 | 39 | 62 | 127
Not at all | 23 | 39 | 35 | 97
Total | 79 | 135 | 138 | 352

(a) If we select a respondent at random, what is the probability that we choose a person who earns less than $50K and responded “somewhat”?

(b) Among those earning $50K – 99.9K, what is the probability that the person responded “not at all”?

(c) What is the probability that a person who responded “very much” was earning at least $100K?

(d) If the person responded “very much”, what is the probability that they earn between $50K and $99.9K?

(e) Are the responses to the question and income level independent?
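For Question 38, the conditional probabilities divide by a row or column total, and part (e) compares P(response | income) with P(response). A minimal Python sketch; it checks exact equality, whereas with real sample data you would judge whether the differences are large enough to matter.

```python
from fractions import Fraction

table = {
    "Very much":  {"<50K": 30, "50K-99.9K": 57, "100K+": 41},
    "Somewhat":   {"<50K": 26, "50K-99.9K": 39, "100K+": 62},
    "Not at all": {"<50K": 23, "50K-99.9K": 39, "100K+": 35},
}
incomes = ["<50K", "50K-99.9K", "100K+"]
n = sum(sum(row.values()) for row in table.values())
row_tot = {r: sum(cells.values()) for r, cells in table.items()}
col_tot = {i: sum(table[r][i] for r in table) for i in incomes}

p_a = Fraction(table["Somewhat"]["<50K"], n)                             # joint
p_b = Fraction(table["Not at all"]["50K-99.9K"], col_tot["50K-99.9K"])  # given income
p_c = Fraction(table["Very much"]["100K+"], row_tot["Very much"])       # given response
p_d = Fraction(table["Very much"]["50K-99.9K"], row_tot["Very much"])

# (e) independence would need P(response | income) == P(response) in every cell
independent = all(
    Fraction(table[r][i], col_tot[i]) == Fraction(row_tot[r], n)
    for r in table for i in incomes
)
print([round(float(p), 4) for p in (p_a, p_b, p_c, p_d)], independent)
```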

Age | Shop: Yes | Shop: No | Total
< 20 | 0.26 | 0.04 | 0.30
20 – 40 | 0.24 | 0.10 | 0.34
> 40 | 0.12 | 0.24 | 0.36
Total | 0.62 | 0.38 | 1.00
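For Question 37, the table above already holds joint probabilities, so conditional probabilities divide a cell by its row total. A minimal Python sketch; it reads part (c) as the conditional probability P(shop | older than 40), which is an interpretation.

```python
# Joint probabilities from the age-by-shopping table (rows: age group)
joint = {
    "<20":   {"yes": 0.26, "no": 0.04},
    "20-40": {"yes": 0.24, "no": 0.10},
    ">40":   {"yes": 0.12, "no": 0.24},
}
p_shop = sum(row["yes"] for row in joint.values())        # (a) marginal
p_under20 = sum(joint["<20"].values())                     # row total 0.30
p_shop_given_under20 = joint["<20"]["yes"] / p_under20     # (b) conditional
p_shop_given_over40 = joint[">40"]["yes"] / sum(joint[">40"].values())  # (c)
# (d) addition rule: P(<20 or shop) = P(<20) + P(shop) - P(<20 and shop)
p_under20_or_shop = p_under20 + p_shop - joint["<20"]["yes"]
print(round(p_shop, 2), round(p_shop_given_under20, 4),
      round(p_shop_given_over40, 4), round(p_under20_or_shop, 2))
```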



39) Professional polling organizations face the challenge of selecting a representative sample of U.S. adults by telephone. This has been complicated by people who only use cell phones and by others whose landline phones are unlisted. A careful survey by Democracy Corps determined the following proportions:

Cell phone only 39%

Both cell and landline 29%

Landline only listed 22%

Landline only unlisted 7%

(a) What is the probability a randomly selected U.S. adult has a landline?

(b) What is the probability that a U.S. adult has a landline given that he or she has a cell phone?

(c) Are having a cell phone and having a landline independent? Explain.

(d) Are having a cell phone and having a landline disjoint? Explain.
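A minimal Python sketch of the checks in Question 39, using only the four listed categories (which sum to 97%):

```python
# Democracy Corps proportions of U.S. adults
cell_only, both, landline_listed, landline_unlisted = 0.39, 0.29, 0.22, 0.07

p_landline = both + landline_listed + landline_unlisted   # (a)
p_cell = cell_only + both
p_landline_given_cell = both / p_cell                      # (b)

# (c) independent only if P(landline | cell) equals P(landline)
independent = abs(p_landline_given_cell - p_landline) < 1e-9
# (d) disjoint only if nobody has both
disjoint = both == 0
print(round(p_landline, 2), round(p_landline_given_cell, 4), independent, disjoint)
```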

40) SDVV Chapter 6 Exercise 10 page 231 PLUS EXTRA PARTS IN BOLD

Given independent random variables X and Y, with the means and standard deviations shown, find the mean and standard deviation of each of the variables in parts a to f.

Note: Mean X = E[X] and SD X = SD[X], similarly for Y.

Mean SD

X 80 12

Y 12 3

a) X − 20

b) 0.5Y

c) X + Y

d) X − Y

e) X + 0.5Y + 4

f) 2X − 0.5Y
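All six parts of Question 40 follow from two rules for independent X and Y: E[aX + bY + c] = aE[X] + bE[Y] + c and Var[aX + bY + c] = a²Var[X] + b²Var[Y]. Note that variances add even for X − Y. A minimal Python sketch (the helper name `combo` is chosen here):

```python
import math

mu_x, sd_x = 80, 12
mu_y, sd_y = 12, 3

def combo(a, b, c=0):
    """Mean and SD of aX + bY + c for independent X and Y.

    The constant c shifts the mean but has no effect on the SD.
    """
    mean = a * mu_x + b * mu_y + c
    sd = math.sqrt(a**2 * sd_x**2 + b**2 * sd_y**2)
    return mean, sd

parts = {
    "a) X - 20": combo(1, 0, -20),
    "b) 0.5Y": combo(0, 0.5),
    "c) X + Y": combo(1, 1),
    "d) X - Y": combo(1, -1),
    "e) X + 0.5Y + 4": combo(1, 0.5, 4),
    "f) 2X - 0.5Y": combo(2, -0.5),
}
for label, (m, s) in parts.items():
    print(f"{label}: mean = {m}, SD = {s:.3f}")
```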

41) The monthly demand (in hundreds) for a magazine at a newsagent is listed below, along with the corresponding probabilities.

Demand (x) | P(x)
1 | 0.1
2 | 0.25
3 | 0.5
4 | 0.15

a) Find the expected demand and interpret.

b) Find the standard deviation of demand.

c) A newsagent receives a payment of $100 for stocking the magazine plus 90 cents for each magazine sold. What are the mean and variance of the newsagent’s total revenue from selling the magazine?
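A minimal Python sketch for Question 41. For part (c) it assumes every magazine demanded is sold; because demand X is in hundreds, revenue in dollars is R = 100 + 0.90 × 100X = 100 + 90X.

```python
import math

# Monthly demand X (hundreds of magazines) and its probabilities
dist = {1: 0.10, 2: 0.25, 3: 0.50, 4: 0.15}

ex = sum(x * p for x, p in dist.items())               # E[X]
var = sum((x - ex) ** 2 * p for x, p in dist.items())  # Var[X]
sd = math.sqrt(var)

# (c) Assumption: demand = sales. R = 100 + 90 X, so
# E[R] = 100 + 90 E[X] and Var[R] = 90^2 Var[X] (the $100 adds no variance)
e_rev = 100 + 90 * ex
var_rev = 90 ** 2 * var
print(f"E[X]={ex:.2f} SD[X]={sd:.4f} E[R]={e_rev:.2f} Var[R]={var_rev:.2f}")
```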



42) A motor home sales department has created three plans for purchasing a new or used motor home for leisure, to increase potential sales for its fleet. They estimate that 20% will choose plan 1, which includes no down payment with a 10-year finance option; 40% will choose plan 2, which includes a 20% down payment with a 7-year finance option; and 40% will choose plan 3, which includes a 40% down payment with a 5-year finance option. (Hint: create a table showing the discrete probability model of X and p(X), converting % to decimals. Then use the formulas for E[X] and V[X], and take the square root of V[X] to get SD[X].)

(a) Find the expected down payment (as a percentage) that potential customers will need.

(b) Find the standard deviation of the down payment (as a percentage) that potential customers will need.
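Following the hint in Question 42, a minimal Python sketch of E[X], V[X] and SD[X], assuming X is the down payment expressed as a percentage of the price:

```python
import math

# X = down payment (% of price) under each plan, with P(X = x)
plans = {0: 0.20, 20: 0.40, 40: 0.40}

ex = sum(x * p for x, p in plans.items())              # E[X]
vx = sum((x - ex) ** 2 * p for x, p in plans.items())  # V[X]
sdx = math.sqrt(vx)                                    # SD[X]
print(f"E[X] = {ex:.1f}%, SD[X] = {sdx:.2f}%")
```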


43) A small software company will bid on a major contract. It anticipates a profit of $50,000 if it gets the contract, but thinks there is only a 30% chance of that happening.

(Hint: create a table showing the discrete probability model of X and p(X), converting % to decimals. Then use the formulas for E[X] and V[X], and take the square root of V[X] to get SD[X].)

(a) What is the expected profit?

(b) Find the standard deviation of the profit.
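A minimal Python sketch of the two-outcome model in Question 43, assuming the profit is $0 if the contract is not won:

```python
import math

# Assumption: profit is $0 if the contract is not won
outcomes = {50_000: 0.30, 0: 0.70}

ex = sum(x * p for x, p in outcomes.items())
sd = math.sqrt(sum((x - ex) ** 2 * p for x, p in outcomes.items()))
print(f"E[profit] = ${ex:,.0f}, SD = ${sd:,.2f}")
```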

44) SDVV Chapter 6 Exercise 32 page 233 PLUS EXTRA PART IN BOLD

For warranty purposes, analysts want to model the number of defects on the screen of the new tablet they are manufacturing. Let X = number of defective pixels per screen. Suppose X can be modelled by:

X = # of defective pixels | 0 | 1 | 2 | 3 | 4 or more
P(X = x) | 0.95 | 0.04 | 0.008 | 0.002 | 0
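The expected value and standard deviation asked for in parts (a) to (d) below can be sketched in Python as follows, assuming the 100 screens are independent so that means and variances add:

```python
import math

# P(X = x) for defective pixels per screen; "4 or more" has probability 0
pmf = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}

ex = sum(x * p for x, p in pmf.items())
var = sum((x - ex) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(var)

# Next 100 screens: E = 100 E[X]; Var = 100 Var[X], so SD = sqrt(100 Var[X]).
# (This is NOT 100 * SD[X], which would be the SD of 100X.)
n = 100
total_mean, total_sd = n * ex, math.sqrt(n * var)
print(f"per screen: E={ex:.3f}, SD={sd:.4f}; 100 screens: E={total_mean:.1f}, SD={total_sd:.4f}")
```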

(a) What is the expected number of defective pixels per screen? Interpret.

(b) What is the standard deviation of the number of defective pixels per screen?

(c) What is the expected number of defective pixels in the next 100 screens?

(d) What is the standard deviation of the number of defective pixels in the next 100 screens?

45) SDVV Chapter 6 Exercise 34 page 233

At a casino, people play the slot machines in the hope of hitting the jackpot, but most of the time they lose their money. A certain machine pays out an average of $0.92 (for every dollar played), with a standard deviation of $120.

(a) Why is the standard deviation so large?

(b) If a gambler plays 5 times, what are the mean and standard deviation of the casino’s profit?

(c) If gamblers play this machine 1,000 times in a day, what are the mean and standard deviation of the casino’s profit?
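A minimal Python sketch for Question 45, assuming each play is independent. The casino's profit per $1 play is 1 - payout, so its mean is $0.08 and its SD stays $120; over n plays the means add (n × 0.08) and the variances add (SD = √n × 120).

```python
import math

# Casino profit per $1 play = 1 - payout
mean_play, sd_play = 1 - 0.92, 120

def casino_profit(plays):
    """Mean and SD of total casino profit over independent plays."""
    return plays * mean_play, math.sqrt(plays) * sd_play

for plays in (5, 1000):
    m, s = casino_profit(plays)
    print(f"{plays} plays: mean ${m:.2f}, SD ${s:.2f}")
```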


46) Chapter 7 Exercises 10 and 12 page 262 plus EXTRA PARTS IN BOLD

Exercise 10: What percent of a standard Normal model is found in each region? Draw a picture first.

(a) z > −2.05 (b) z < − 0.33 (c) 1.2 < z <1.8

(d) |z| < 1.28, i.e. −1.28 < z < 1.28

Exercise 12: In a standard Normal model, what value(s) of z cut off the region described? Don’t forget to draw a picture.

(a) The lowest 20%

(b) The highest 15%

(c) The highest 20%

(d) The middle 50%

(e) The first quartile

(f) The third quartile
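As an alternative to the printed z-table for Question 46, Python's `statistics.NormalDist` (Python 3.8+) gives areas with `cdf` and cut-offs with `inv_cdf`, where both work with the area to the LEFT. A minimal sketch; table look-ups round z, so answers may differ in the fourth decimal place.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

# Exercise 10: areas
a = 1 - z.cdf(-2.05)             # z > -2.05
b = z.cdf(-0.33)                 # z < -0.33
c = z.cdf(1.8) - z.cdf(1.2)      # 1.2 < z < 1.8
d = z.cdf(1.28) - z.cdf(-1.28)   # |z| < 1.28

# Exercise 12: cut-offs; "highest 15%" means 85% lies to the left
lowest_20 = z.inv_cdf(0.20)
highest_15 = z.inv_cdf(0.85)
highest_20 = z.inv_cdf(0.80)
middle_50 = (z.inv_cdf(0.25), z.inv_cdf(0.75))  # also Q1 and Q3

print([round(v * 100, 2) for v in (a, b, c, d)])
print(round(lowest_20, 4), round(highest_15, 4), round(highest_20, 4),
      [round(v, 4) for v in middle_50])
```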

47) A sample of students was selected and asked to participate in a simple experiment measuring reaction length (in cm). Here are the sample data, descriptive statistics and a histogram.

a) Using the mean and standard deviation from the Excel output and the 68/95/99.7 rule, within what values would you expect the middle 95% of values to lie?

b) In fact, what percentage of the values in the sample data actually fall in the interval you found in part a?

c) Do you think a Normal model is appropriate for the reaction lengths? Explain.

d) One student had a measurement of 15.5 cm. Was this unusual? Answer this by calculating the standardised value and the area under the Normal curve to the right of this value.

[Figure: Histogram; x-axis: Bin (0 to 28), y-axis: Frequency]


48) SDVV Chapter 7, Exercises 28 and 30, pages 264 and 265 PLUS EXTRA PARTS IN BOLD

Exercise 28: For the 300 trading days from January 11, 2012 to March 22, 2013, the daily closing price of IBM stock (in $) is well modelled by a Normal model with mean $197.92 and standard deviation $7.16. According to this model, what is the probability that on a randomly selected day in this period the stock price closed

(a) above $205.08?

(b) below $212.24?

(c) between $183.60 and $205.08?

(d) Which would be more unusual: a day on which the stock price closed above $206, or one on which it closed below $180?

Exercise 30: According to the model in Exercise 28, what cut-off value of price would separate the

(a) lowest 16% of the days?

(b) highest 0.15%?

(c) middle 68%?

(d) highest 50%?

(e) lowest 25%?

(f) highest 25%?
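A minimal Python sketch for parts (b) to (d) of Exercise 28 and all of Exercise 30 in Question 48, using `statistics.NormalDist`. It uses the exact Normal model rather than the 68–95–99.7 rule, so rule-based answers differ slightly (for example, the lowest-16% cut-off comes out near $190.80 rather than μ − σ = $190.76).

```python
from statistics import NormalDist

ibm = NormalDist(mu=197.92, sigma=7.16)   # Normal model for the daily close ($)

def z(price):
    """Standardised value of a closing price under the model."""
    return (price - ibm.mean) / ibm.stdev

# Exercise 28
p_b = ibm.cdf(212.24)                     # (b) P(close < $212.24)
p_c = ibm.cdf(205.08) - ibm.cdf(183.60)   # (c) P($183.60 < close < $205.08)
z_206, z_180 = z(206), z(180)             # (d) larger |z| = more unusual

# Exercise 30: inv_cdf takes the area to the LEFT of the cut-off
cuts = {
    "lowest 16%": ibm.inv_cdf(0.16),
    "highest 0.15%": ibm.inv_cdf(1 - 0.0015),
    "middle 68%": (ibm.inv_cdf(0.16), ibm.inv_cdf(0.84)),
    "highest 50%": ibm.inv_cdf(0.50),
    "lowest 25%": ibm.inv_cdf(0.25),
    "highest 25%": ibm.inv_cdf(0.75),
}
print(f"(b) {p_b:.4f}  (c) {p_c:.4f}  z(206) = {z_206:.2f}  z(180) = {z_180:.2f}")
for label, value in cuts.items():
    print(label, value)
```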

49) SDVV Chapter 7 Exercise 38 page 265 Every Normal model is defined by its parameters, the mean and the standard deviation. For each model described here, find the missing parameter.

Don’t forget to draw a picture.

a) µ = 1250; 35% below 1200; σ = ?

b) µ = 0.64; 12% above 0.70; σ = ?

c) σ = 0.50; 90% above 10; µ = ?

d) σ = 220; 3% below 202; µ = ?
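Each part of Question 49 rearranges x = µ + zσ once z is found from the area to the left. A minimal Python sketch:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # z-value with the given area to its LEFT

# x = mu + z * sigma, rearranged for the missing parameter
sigma_a = (1200 - 1250) / z(0.35)   # a) 35% below 1200
sigma_b = (0.70 - 0.64) / z(0.88)   # b) 12% above 0.70, i.e. 88% below
mu_c = 10 - z(0.10) * 0.50          # c) 90% above 10, i.e. 10% below
mu_d = 202 - z(0.03) * 220          # d) 3% below 202
print(round(sigma_a, 2), round(sigma_b, 4), round(mu_c, 3), round(mu_d, 2))
```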


50) A tyre manufacturer believes that the tread life of its snow tyres can be described by a Normal model, with a mean of 32,000 miles and a standard deviation of 2,500 miles.

(a) If you buy a set of these tyres, would it be reasonable for you to hope that they will last 40,000 miles? Explain.

(b) Approximately what fraction of these tyres can be expected to last less than 30,000 miles?

(c) Approximately what fraction of these tyres can be expected to last between 30,000 and 35,000 miles?

(d) Estimate the IQR for these data.

(e) In planning a marketing strategy, a local tyre dealer wants to offer a refund to any customer whose tyres fail to last a certain number of miles. However, the dealer does not want to take too big a risk. If the dealer is willing to give refunds to no more than 1 of every 25 customers, for what mileage can he guarantee these tyres to last?
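A minimal Python sketch for Question 50 using `statistics.NormalDist`. Part (e) treats "no more than 1 in 25" as the lowest 4% of tread lives.

```python
from statistics import NormalDist

tread = NormalDist(mu=32_000, sigma=2_500)   # tread life in miles

p_40k = 1 - tread.cdf(40_000)                      # (a) lasting 40,000+ miles
p_under_30k = tread.cdf(30_000)                    # (b)
p_30k_35k = tread.cdf(35_000) - tread.cdf(30_000)  # (c)
iqr = tread.inv_cdf(0.75) - tread.inv_cdf(0.25)    # (d) Q3 - Q1
guarantee = tread.inv_cdf(1 / 25)                  # (e) lowest 4% cut-off
print(f"{p_40k:.4f} {p_under_30k:.4f} {p_30k_35k:.4f} {iqr:.0f} {guarantee:.0f}")
```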


51) The active lifetime of a particular brand and model of smart phone is Normally distributed with a mean of 34 months and a standard deviation of 5 months. Draw a picture for each of parts a to d to support your solutions.

(a) What is the probability that one randomly selected smart phone of this particular brand, will last less than 24 months?

(b) What is the probability that one randomly selected smart phone, of this particular brand, will last more than a year and a half?

(c) What is the probability that one randomly selected smart phone, of this particular brand, will last between 24 months and 48 months?

(d) Determine the minimum number of whole months that this particular brand of smart phone will last for in the top 1.1%.
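A minimal Python sketch for Question 51. For part (d) it only computes the cut-off of the top 1.1%; how that is converted to whole months is left to the rounding convention taught in class.

```python
from statistics import NormalDist

life = NormalDist(mu=34, sigma=5)   # active lifetime in months

p_a = life.cdf(24)                  # (a) less than 24 months
p_b = 1 - life.cdf(18)              # (b) more than a year and a half
p_c = life.cdf(48) - life.cdf(24)   # (c) between 24 and 48 months
cut_top = life.inv_cdf(1 - 0.011)   # (d) boundary of the top 1.1%
print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(cut_top, 2))
```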

52) SDVV Chapter 9, Exercises 22 and 24, pages 325 and 326

Exercise 22: An automatic character recognition device can successfully read about 85% of handwritten credit card applications. To estimate what might happen when this device reads a stack of applications, the company did a simulation using samples of size 20, 50, 75 and 100. For each sample size, they simulated 1000 samples with success rate p = 0.85 and constructed the histogram of the 1000 sample proportions, shown here. Explain what these histograms say about the sampling distribution model for sample proportions. Be sure to talk about shape, centre and spread.

Exercise 24

The automatic character recognition device discussed in Exercise 22 successfully reads about 85% of handwritten credit card applications. In Exercise 22, you looked at the histograms showing distributions of sample proportions from 1,000 simulated samples of size 20, 50, 75 and 100. The sample statistics from each simulation are provided in the following table.


(a) According to the Normal model, what should the theoretical mean and standard deviations be for these sample sizes?

(b) How close are those theoretical values to what was observed in these simulations?

(c) Looking at the histograms provided in Exercise 22, at what sample size would you be comfortable using the Normal model as an approximation for the sampling distribution?

(d) What does the Success/Failure Condition say about the choice you made in part (c)?
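For Exercise 24 in Question 52, the theoretical sampling model for a sample proportion has mean p and SD √(pq/n), and the Success/Failure Condition asks for np ≥ 10 and nq ≥ 10. A minimal Python sketch:

```python
import math

p = 0.85  # success rate used in the simulations
results = {}
for n in (20, 50, 75, 100):
    sd = math.sqrt(p * (1 - p) / n)   # theoretical SD of p-hat; its mean is p
    # Success/Failure Condition: expect at least 10 successes and 10 failures
    condition_met = n * p >= 10 and n * (1 - p) >= 10
    results[n] = (sd, condition_met)
    print(f"n={n:3d}: mean={p}, SD={sd:.4f}, condition met: {condition_met}")
```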

53) Based on past experience, a car dealership believes that 30% of its customers who purchase a car using the dealership’s lease-hire agreement do not make their payments on time. The car dealership randomly selects 100 of its customers who purchased a car using the dealership’s lease-hire agreement.

Let p represent the population proportion of this dealership’s lease-hire agreement customers who do not make their payments on time.

(a) Describe the appropriate model for p̂ (the sample proportion). (Hint: check the conditions, and specify the name of the distribution, the mean and the standard deviation.)

(b) What is the probability that more than one third of this sample do not make their payments on time?
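A minimal Python sketch of the sampling model in Question 53, with no continuity correction. Rounding z to 0.73 for a table look-up gives a slightly different answer.

```python
import math
from statistics import NormalDist

p, n = 0.30, 100
# Success/Failure: np = 30 and nq = 70 are both at least 10
sd = math.sqrt(p * (1 - p) / n)      # SD(p-hat) = sqrt(pq/n)
model = NormalDist(mu=p, sigma=sd)   # approximate sampling model for p-hat

prob = 1 - model.cdf(1 / 3)          # (b) P(p-hat > 1/3)
print(f"SD = {sd:.4f}, P = {prob:.4f}")
```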


54) The proportion of adult women in Latvia is approximately 54%. A marketing survey telephones 400 people at random.
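The sampling model used in parts (a) to (e) can be sketched in Python as follows. It is a minimal illustration; "fewer than 180 women" is treated as p̂ < 180/400 = 0.45 without a continuity correction.

```python
import math
from statistics import NormalDist

p, n = 0.54, 400
sd = math.sqrt(p * (1 - p) / n)      # (b) SD of the sample proportion
model = NormalDist(mu=p, sigma=sd)   # (a) approximately Normal(0.54, sd)

def z(p_hat):
    """Standardised value of an observed proportion."""
    return (p_hat - p) / sd

# (c), (d): z-scores for 56% and 51%; (e) fewer than 180 of 400
print(f"SD = {sd:.4f}")
print(f"z(0.56) = {z(0.56):.2f}, z(0.51) = {z(0.51):.2f}, z(0.45) = {z(180 / 400):.2f}")
print(f"P(p-hat < 0.45) = {model.cdf(180 / 400):.5f}")
```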

a) What is the sampling distribution of the observed proportion that are women?

b) What is the standard deviation of that proportion?

c) Would you be surprised to find 56% women in a sample of size 400? Explain.

d) Would you be surprised to find 51% women in a sample of size 400? Explain.

e) Would you be surprised to find that there were fewer than 180 women in the sample? Explain.

55) SDVV Chapter 9 Exercise 58 page 329 PLUS EXTRA PART IN BOLD

During the period of Sep 2 – Oct 10, 2013, a Gallup Poll asked 1,500 Indian adults, aged 18 or over, how they rated economic conditions. Only 29% rated the economy as “Getting better”. Construct a 95% confidence interval for the true proportion of Indians who rated the Indian economy as improving. Interpret your confidence interval.
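A minimal Python sketch of the one-proportion z-interval, p̂ ± z*·√(p̂q̂/n). It assumes the conditions hold: a random sample, 1,500 is under 10% of Indian adults, and np̂ ≈ 435 and nq̂ ≈ 1,065 are both at least 10. The helper name `prop_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def prop_ci(p_hat, n, conf):
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. 1.96 for 95%
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - me, p_hat + me


lo, hi = prop_ci(0.29, 1500, 0.95)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```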


56) In a random sample of 100 customers in “The Olde Coffee Shoppe”, it was found that 25 customers had paid by credit card.

a) Find a 90% confidence interval for the proportion of all customers who pay by credit card. Interpret this interval.

b) Find a 95% confidence interval for the proportion of all customers who pay by credit card.

c) Suppose instead that the sample had been 60 people with 15 using their credit card. Now calculate the 90% and the 95% confidence intervals for the population proportion.

d) Compare the four confidence intervals and comment on the results.
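The four intervals can be compared with a short Python sketch. It assumes the sample is random and that the Success/Failure Condition holds for both samples (15 or more successes and failures in each). The helper `prop_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def prop_ci(p_hat, n, conf):
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


intervals = {}
for n, x in ((100, 25), (60, 15)):        # same p-hat = 0.25, different n
    for conf in (0.90, 0.95):
        lo, hi = prop_ci(x / n, n, conf)
        intervals[(n, conf)] = (lo, hi)
        print(f"n = {n:3d}, {conf:.0%}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

Both levers show up in the widths: a higher confidence level widens the interval, and so does a smaller sample.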


57) In preparing a report on the economy, we need to estimate the percentage of businesses that plan to hire additional employees in the next 60 days.

(a) How many randomly selected employers must we contact in order to create an estimate in which we are 98% confident with a margin of error of 5%?

(b) Suppose we want to reduce the margin of error to 3%. What sample size will suffice?

(c) Why might it not be worth the effort to try to get an interval with a margin of error of 1%?
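A hedged sketch of the sample-size formula n = z*²·p̂q̂/ME², rounded up. Because no prior estimate of the percentage is given, it assumes the conservative guess p = 0.5. The helper name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(me, conf, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= me; p = 0.5 is the conservative guess."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z_star ** 2 * p * (1 - p) / me ** 2)   # always round up


for me in (0.05, 0.03, 0.01):
    print(f"ME = {me:.0%}: n = {sample_size(me, 0.98)}")
```

With the rounded table value z* = 2.326, the 3% answer comes out one lower (1,503), so a small difference from a hand calculation is expected.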

58) SDVV Chapter 9, Exercise 46 page 328

Recently, two students made worldwide headlines by spinning a Belgian euro 250 times and getting 140 heads, that is, 56%. That makes the 90% confidence interval (51%, 61%). What does this mean? Are the conclusions in parts a-e correct? Explain your answers.

a) Between 51% and 61% of all euros are unfair.

b) We are 90% sure that in this experiment this euro landed heads between 51% and 61% of the spins.

c) We are 90% sure that spun euros will land heads between 51% and 61% of the time.

d) If you spin a euro many times, you can be 90% sure of getting between 51% and 61% heads.

e) 90% of all spun euros will land heads between 51% and 61% of the time.
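As a check on where the quoted interval comes from, here is a minimal Python sketch of the 90% one-proportion z-interval for 140 heads in 250 spins. The helper `prop_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def prop_ci(p_hat, n, conf):
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


lo, hi = prop_ci(140 / 250, 250, 0.90)
print(f"90% CI: ({lo:.1%}, {hi:.1%})")   # about (50.8%, 61.2%)
```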

59) Receipts of a small clothing store show that customer purchases have a skewed distribution with mean $32 and standard deviation $20.

a) Explain why you cannot determine the probability that the next customer will spend more than $40.

b) Can you estimate the probability that the next 8 customers will spend an average of more than $40? Explain.

c) Can you estimate the probability that the next 50 customers will spend an average of at least $40? Explain. Calculate an answer if possible.
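A hedged sketch of part (c). It assumes customers’ purchases are independent and that n = 50 is large enough for the Central Limit Theorem to overcome the skew. Neither assumption is reasonable for a single customer or for only 8 customers.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32, 20, 50
se = sigma / math.sqrt(n)                 # SD of the sample mean, about 2.83
prob = 1 - NormalDist(mu, se).cdf(40)     # CLT normal approximation for y-bar
print(f"SD(y-bar) = {se:.3f}; P(y-bar >= 40) = {prob:.4f}")
```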


60) Incomes for production line workers in a certain city average $38.74 per hour with a standard deviation of $8.78. The incomes are skewed to the right.

a) Sketch a frequency curve that would represent the original population.

b) Now describe the sampling distribution for the sample mean for samples of size 100 and sketch this distribution.

c) Assume instead that the sample size was 64. Without working out the problem, state what would happen to the sampling distribution.

61) The lift in the Nexus 10 Building says “Max 17 people or 1140 kg”. This corresponds to 17 people who average 67 kg. If people’s weights are modelled by a normal model with mean 68.68 kg and standard deviation 15.67 kg, find the probability that 17 people would weigh more than 1140 kg.
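A minimal sketch, assuming the 17 weights are independent draws from the stated normal model. Their total is then normal with mean 17 × 68.68 and standard deviation √17 × 15.67. Working with the mean weight, 1140/17 kg against SD 15.67/√17, gives the same answer.

```python
import math
from statistics import NormalDist

n, mu, sigma = 17, 68.68, 15.67
total = NormalDist(n * mu, math.sqrt(n) * sigma)   # sum of 17 independent weights
prob = 1 - total.cdf(1140)
print(f"Total ~ N({total.mean:.2f}, {total.stdev:.2f}); P(total > 1140) = {prob:.4f}")
```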

62) In a test, students averaged 14.2 errors with a standard deviation of 4.2 errors.

a) If errors are known to be normally distributed, what is the probability that a given student will have more than 13 errors in the test?

b) If errors are not known to be normally distributed, what is the probability that a sample of 49 students will average more than 13 errors in the test?

c) Why are your answers different?

d) Why was the assumption of normality required in part (a) but not in (b)?
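A short Python sketch contrasting parts (a) and (b). Part (a) assumes errors are normal. Part (b) assumes the 49 students are independent and relies on the Central Limit Theorem for the sample mean.

```python
import math
from statistics import NormalDist

mu, sigma = 14.2, 4.2
p_a = 1 - NormalDist(mu, sigma).cdf(13)                   # (a) one student, needs normal errors
p_b = 1 - NormalDist(mu, sigma / math.sqrt(49)).cdf(13)   # (b) mean of 49, relies on the CLT
print(f"(a) {p_a:.4f}   (b) {p_b:.4f}")
```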

63) In their mini-project, a BES team posed the question: “What is the average price that people living in Adelaide would reasonably expect to pay for a good cup of regular coffee?” Their sample results were: n = 41, sample mean = $3.50 and sample standard deviation = $0.429.

The histogram of sample data was unimodal and showed only a slight positive skew.

a) Check the conditions for inference about the mean.

b) Construct a 98% CI for the average price that people living in Adelaide would reasonably expect to pay for a good cup of regular coffee.

c) Interpret the CI you calculated in (b) above.

d) How would their confidence interval width change if this team chose to do a 90% confidence interval, holding all else the same?

e) How would their 98% confidence interval in part (b) change if this team chose to use a sample of size 100, holding all else the same?
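A hedged sketch of part (b) as a one-sample t-interval, ȳ ± t*·s/√n. The standard library has no t quantile function, so the critical value t* = 2.423 (df = 40, 98%) is typed in from a t table. That value is an assumption to check against your own table.

```python
import math

n, y_bar, s = 41, 3.50, 0.429
t_star = 2.423            # t* for df = 40 and 98% confidence, taken from a t table
se = s / math.sqrt(n)     # standard error of the mean
me = t_star * se          # margin of error
lo, hi = y_bar - me, y_bar + me
print(f"SE = {se:.4f}; 98% CI: (${lo:.2f}, ${hi:.2f})")
```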


64) SDVV Chapter 11, Exercises 8 and 14 (omit part c), page 387

Exercise 8

A random sample of 24 phone conversations was recorded by a local university switchboard, and the time spent in conversation (in minutes) is listed below:

38.12 2.7 32.82 47.51 36.52 34.2

64 52 26.6 31 5 12.4

32 4 1 17 18 33

12 6 8 42 15 16

The average was 24.45 minutes and the standard deviation was 17.23 minutes.

(a) Find the standard error of the mean.

(b) How would the standard error change if the sample size had been reduced to 10? (Assume that the sample standard deviation does not change.)

Exercise 14 (omit part c)

For the conversation times in Exercise 8:

(a) Construct a 90% confidence interval for the mean conversation time of all calls, assuming that the assumptions and conditions for the confidence interval have been met.

(b) How large is the margin of error?
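A minimal Python sketch of Exercises 8 and 14. It recomputes the mean and standard deviation from the 24 listed times, then the standard errors and a 90% t-interval. The critical value t* = 1.714 (df = 23) is typed in from a t table, as an assumption.

```python
import math
import statistics

times = [38.12, 2.7, 32.82, 47.51, 36.52, 34.2,
         64, 52, 26.6, 31, 5, 12.4,
         32, 4, 1, 17, 18, 33,
         12, 6, 8, 42, 15, 16]
y_bar = statistics.mean(times)      # about 24.45
s = statistics.stdev(times)         # sample SD, about 17.23
se = s / math.sqrt(len(times))      # Exercise 8 (a)
se_10 = 17.23 / math.sqrt(10)       # Exercise 8 (b): same s, n = 10
t_star = 1.714                      # t* for df = 23, 90% confidence (t table)
me = t_star * se                    # Exercise 14 (b)
print(f"mean {y_bar:.2f}, s {s:.2f}, SE {se:.3f}, SE(n=10) {se_10:.3f}")
print(f"90% CI: ({y_bar - me:.2f}, {y_bar + me:.2f}) minutes; ME = {me:.2f}")
```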

65) How many pages can you expect to get from a print cartridge? (Based on question 44 from Lind, Marchal and Mason, Statistical Techniques in Business and Economics, 11th edition, McGraw-Hill, page 324.) Suppose we took a random sample of cartridges and wrote down how many pages each printed. Here is a histogram along with some summary statistics.

a) Check the conditions for inference about the mean.

b) Find a 90% confidence interval for the true mean.

c) Interpret this interval.

d) Write down, from the EXCEL output, the 95% CI for the mean.

e) Which of your two intervals is wider? Is this what you expected?

Number of pages

Mean 2597.783


Median 2698

Mode 2888

Standard Deviation 444.634

Sample Variance 197699.374

Kurtosis -0.901

Skewness -0.054

Range 1541

Minimum 1884

Maximum 3425

Sum 119498

Count 46

Confidence Level(95.0%) 132.040

[Figure: histogram of Number of pages]
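A hedged sketch of parts (b) and (d) using the summary statistics above. The 90% critical value t* = 1.679 (df = 45) comes from a t table and is an assumption. The 95% margin is Excel’s "Confidence Level(95.0%)" value.

```python
import math

n, y_bar, s = 46, 2597.783, 444.634
se = s / math.sqrt(n)                    # standard error, about 65.56
t90 = 1.679                              # t* for df = 45, 90% confidence (t table)
me90 = t90 * se
me95 = 132.040                           # Excel "Confidence Level(95.0%)"
print(f"90% CI: ({y_bar - me90:.1f}, {y_bar + me90:.1f})")
print(f"95% CI: ({y_bar - me95:.1f}, {y_bar + me95:.1f})")
print(f"implied 95% t* = {me95 / se:.3f}")
```

Dividing Excel’s margin by the standard error recovers t* ≈ 2.014. This is a useful check that Excel’s value is a t-based margin of error, not the interval itself.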


66) SDVV Chapter 10, Exercise 12, page 353 PLUS EXTRA PART IN BOLD

Write the null and alternative hypotheses to test each of the following situations. Briefly describe, in the context of the question, the parameter you are testing.

(a) A 2010 Harvard Business Review article looked at 1109 CEOs from global companies and found that 32% had MBAs. Has the percentage changed?

(b) Recently, 20% of cars of a certain model have needed costly transmission work after being driven between 50,000 and 100,000 miles. The car manufacturer hopes that the redesign of a transmission component has solved this problem.

(c) A marketing researcher for a cola company decides to field test a new soft drink flavour, planning to market it only if he is sure that over 60% of the people like the flavour.

67) It is estimated that 40% of service stations have fuel tanks that leak. A new design is supposed to lessen the prevalence of these leaks and has been tested in South Australia. A random sample of 27 of the new design tanks finds that 7 show some signs of leaking.

a) What are the null and alternative hypotheses?

b) Check the conditions necessary for inference.

c) Test the hypothesis at the 5% level.

d) State your conclusion.

e) If the new design actually works, have you made an error? If so, what kind of error?

f) What 2 things could you do to decrease the probability of making this kind of error?
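A minimal sketch of the one-proportion z-test in parts (b) and (c). It assumes H0: p = 0.40 against HA: p < 0.40 and uses the null value p0 in the standard deviation.

```python
import math
from statistics import NormalDist

p0, n, x = 0.40, 27, 7
print(f"n*p0 = {n * p0:.1f}, n*q0 = {n * (1 - p0):.1f}")   # both at least 10
p_hat = x / n
se0 = math.sqrt(p0 * (1 - p0) / n)      # SD of p-hat under H0
z = (p_hat - p0) / se0
p_value = NormalDist().cdf(z)           # lower tail, for HA: p < 0.40
print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```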

68) Two students studying BES I are worried about the failure rate.

a) The first has no idea what the failure rate is, so takes a random sample of 100 students, of whom 28 failed. Based on the sample evidence, calculate the 90% confidence interval and interpret it. Would this student conclude that the failure rate is 20%?

b) The second student has a prior belief that the failure rate is 20%. Set up and test the appropriate hypotheses at α = 0.05, using the same sample as above. Would this student conclude that the failure rate is 20%?

c) In your own words, compare your answers to parts (a) and (b) above: do they reach the same conclusion? Why or why not?
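A hedged Python sketch of parts (a) and (b). For part (b) it assumes a two-sided alternative, HA: p ≠ 0.20. With a one-sided HA: p > 0.20, the p-value would be half as large. Note that the interval uses p̂ in its standard error, while the test uses p0.

```python
import math
from statistics import NormalDist

n, x = 100, 28
p_hat = x / n

# (a) 90% confidence interval
z90 = NormalDist().inv_cdf(0.95)
me = z90 * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - me, p_hat + me

# (b) test of H0: p = 0.20, assuming a two-sided alternative
p0 = 0.20
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_two = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"90% CI ({lo:.3f}, {hi:.3f}); z = {z:.2f}; two-sided p-value = {p_two:.4f}")
```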


69) The Advertiser Voteline* on January 25th 2012, p 17, reported the results of the poll of readers from the preceding day; the question was: Should Unis teach alternative medicine?

[Image: The Advertiser, January 25th 2012, p 17]

Of the 52 callers, 32 said YES and the remaining 20 said NO.

a) Assuming this was a properly conducted random sample, test whether a majority of people agree with the question (Should Unis teach alternative medicine?); do this by calculating and using the appropriate p-value.

b) Repeat the hypothesis test, this time using the critical value method.

i) Use the 5% level of significance. Provide a sketch.

ii) Use the 1% level of significance. Provide a sketch.

c) Your answers in the 2 parts of part (b) above will differ; explain what is going on here.

d) The Advertiser is Adelaide’s daily newspaper; the Voteline consists of a topical question posed each day. Respondents phone in and either agree or disagree. The results are published the next day. Comment on the assumption in part (a) that this was a properly conducted random sample.
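A minimal sketch of parts (a) and (b). It takes at face value the random-sample assumption that part (d) asks you to question, and tests H0: p = 0.5 against HA: p > 0.5. The conditions hold: np0 = nq0 = 26.

```python
import math
from statistics import NormalDist

n, x, p0 = 52, 32, 0.5
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # H0: p = 0.5 vs HA: p > 0.5
p_value = 1 - NormalDist().cdf(z)
for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha)       # upper-tail critical value
    print(f"alpha = {alpha}: z* = {z_crit:.3f}, reject H0: {z > z_crit}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```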

70) SDVV Chapter 10, Exercise 28, page 355 PLUS EXTRA PART IN BOLD

A billing company that collects bills for doctors’ offices in the area is concerned that the percentage of bills being paid by Medicare has risen. Historically, that percentage has been 31%. An examination of 8,368 recent bills reveals that 32% of these bills are being paid by Medicare. Is this evidence of a change in the percentage of bills being paid by Medicare?

(a) Write appropriate hypotheses.

(b) Check the assumptions and conditions.

(c) Perform the test and find the p-value.

(d) State your conclusion.

(e) Do you think this difference is meaningful? Explain.

(f) Interpret the p-value in the context of this question.
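A hedged sketch of part (c). It uses a two-sided test, because the question asks about "a change". It also uses the reported 32% as p̂, since the exact count of Medicare bills is not given, so the p-value is only approximate.

```python
import math
from statistics import NormalDist

p0, n, p_hat = 0.31, 8368, 0.32
se0 = math.sqrt(p0 * (1 - p0) / n)            # SD of p-hat under H0
z = (p_hat - p0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative
print(f"SD = {se0:.5f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

With a sample this large, even a one-percentage-point difference is close to the 5% cut-off, which is the point of part (e).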

71) SDVV Chapter 12 Exercise 34 page 419 PLUS EXTRA PART IN BOLD

Production managers on an assembly line must monitor the output to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the proportion of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired. Write the null and alternative hypotheses for this problem. (Hint: the population proportion of defective parts is not given, so the hypotheses must be written in words.)

a) In this context, what is a Type I error?

b) In this context, what is a Type II error?

c) Which type of error would the factory owner consider more serious?

d) Which type of error might customers consider more serious?

72) A factory believes that the average cost of finishing a part after it comes out of the mould is $260. A new design is supposed to lessen this average cost. To see if this is the case, a random sample of 27 parts built to the new design is taken and their finishing costs are measured. The sample has a mean of $253.80 and a standard deviation of $20.

a) What are the null and alternative hypotheses?

b) What conditions or assumptions will you need to assume in order to carry out inference?

c) Test the hypothesis at the 5% level.

d) State your conclusion.

e) If the new design actually works, have you made an error? If so, what kind of error?

f) What 2 things could you do to decrease the probability of making this kind of error?

g) Suppose that this study was done by the engineer who came up with this new design, who is keen to prove that the new design is better than the old one. Explain how they could modify

(1) the way they conducted the test and took the sample, and

(2) the way they did the inference,

in order to “prove” what they wanted to.

h) Explain why the manufacturer should not proceed as suggested in part (g).
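A minimal sketch of part (c) as a one-sample t-test of H0: μ = 260 against HA: μ < 260. It assumes the parts are a random sample and costs are roughly normal. The lower-tail critical value −1.706 (df = 26) is taken from a t table.

```python
import math

mu0, n, y_bar, s = 260, 27, 253.80, 20
t = (y_bar - mu0) / (s / math.sqrt(n))   # one-sample t statistic, df = 26
t_crit = -1.706                          # lower-tail 5% critical value for df = 26 (t table)
print(f"t = {t:.3f}; reject H0 at 5%: {t < t_crit}")
```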

73) Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. Last year, the average life expectancy was 77 years. A particular insurance company wants to determine if their clients have a longer life expectancy, on average, so they randomly sample 20 of their recently paid policies and find the sample mean was 78.6 years with a standard deviation of 4.48 years. Is there significant evidence that life expectancy has increased?

a) What are the null and alternative hypotheses?

b) What conditions or assumptions will you need to assume in order to carry out inference?

c) Test the hypothesis at the 5% level.

d) State your conclusion.
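A minimal sketch of part (c) as a one-sample t-test of H0: μ = 77 against HA: μ > 77. It assumes a random sample and roughly normal lifetimes. The upper-tail critical value 1.729 (df = 19) is taken from a t table.

```python
import math

mu0, n, y_bar, s = 77, 20, 78.6, 4.48
t = (y_bar - mu0) / (s / math.sqrt(n))   # one-sample t statistic, df = 19
t_crit = 1.729                           # upper-tail 5% critical value for df = 19 (t table)
print(f"t = {t:.3f}; reject H0 at 5%: {t > t_crit}")
```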

74) SDVV Chapter 12 Exercise 6 page 416 PLUS EXTRA PART IN BOLD

For each of the following situations, find the critical value for z or t. Draw a picture for each of parts a-f, labelling the rejection region(s).

(a) Ho: µ = 105 vs HA: µ ≠ 105 at α = 0.05, n = 61, σ unknown

(b) Ho: p = 0.05 vs HA: p > 0.05 at α = 0.05

(c) Ho: p = 0.6 vs HA: p ≠ 0.6 at α = 0.01

(d) Ho: p = 0.5 vs HA: p < 0.5 at α = 0.01, n = 500

(e) Ho: p = 0.2 vs HA: p < 0.2 at α = 0.01

(f) Ho: µ = 10 vs HA: µ > 10 at α = 0.05, n = 30, σ unknown
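The z critical values can be checked with `statistics.NormalDist.inv_cdf`. The two t values (df = 60 and df = 29) are typed in from a t table, as an assumption, because the standard library has no t quantile function. The sample size in (d) does not change a z critical value.

```python
from statistics import NormalDist

z = NormalDist()
crit = {
    "a": (-2.000, 2.000),                          # t, df = 60, two-sided 5% (t table)
    "b": z.inv_cdf(0.95),                          # upper tail, alpha = 0.05
    "c": (z.inv_cdf(0.005), z.inv_cdf(0.995)),     # two-sided, alpha = 0.01
    "d": z.inv_cdf(0.01),                          # lower tail, alpha = 0.01
    "e": z.inv_cdf(0.01),                          # lower tail, alpha = 0.01
    "f": 1.699,                                    # t, df = 29, upper 5% (t table)
}
for part, value in crit.items():
    print(part, value)
```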



75) For each of the following scenarios, state whether a Type I error, a Type II error, or neither has been made.

(a) A test of Ho: µ = 20 vs HA: µ > 20 rejects the null hypothesis. Later it is discovered that µ = 19.9.

(b) A test of Ho: p = 0.7 vs HA: p < 0.7 fails to reject the null hypothesis. Later it is discovered that p = 0.8.

(c) A test of Ho: p = 0.4 vs HA: p ≠ 0.4 rejects the null hypothesis. Later it is discovered that p = 0.55.

(d) A test of Ho: p = 0.6 vs HA: p < 0.6 fails to reject the null hypothesis. Later it is discovered that p = 0.5.

76) SDVV Chapter 12 Exercise 16 page 417

Analysts evaluating a new program to encourage customer retention in a test market find no evidence of an increased rate of retention in a test of 2,000 customers. They based this conclusion on a test using α = 0.01.

Would they have made the same decision at α = 0.05? How about α = 0.001? Explain.

77) SDVV Chapter 14 Exercise 12 page 496 PLUS EXTRA PARTS IN BOLD

To complete the poll reported in Exercise 9 (in Chapter 14 of the text), Pew Research surveyed respondents by telephone, drawing a random sample of landlines and another random sample of cell phones. For those numbers that were valid, they report the following:

Are the results they find independent of the telephone type?

(i) Write the hypotheses.

(ii) Check the conditions.

(a) Under the usual null hypothesis, what are the expected values?

(b) Compute the χ² statistic.

(c) How many degrees of freedom does it have?

(d) What do you conclude?

(e) Standardize the residual for the Land, No Answer/Busy cell. Briefly comment.

Land Cell Total

No Answer/Busy 552 42 594

Voicemail 3347 2843 6190

Contact 8399 8612 17,011

Total 12,298 11,497 23,795
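A minimal Python sketch of the χ² test of independence for the table above. The helper `chi_square` is hypothetical. No continuity correction is applied. The p-value line uses the exact closed form exp(−χ²/2), which holds only for 2 degrees of freedom. The 5% critical value for df = 2 is −2 ln 0.05 ≈ 5.991.

```python
import math


def chi_square(observed):
    """Chi-square test of independence: statistic, df and expected counts."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected


# rows: No Answer/Busy, Voicemail, Contact; columns: Land, Cell
observed = [[552, 42], [3347, 2843], [8399, 8612]]
stat, df, expected = chi_square(observed)
p_value = math.exp(-stat / 2)    # exact chi-square upper tail, valid only for df = 2
resid = (552 - expected[0][0]) / math.sqrt(expected[0][0])   # Land, No Answer/Busy
print(f"chi2 = {stat:.1f}, df = {df}, p-value = {p_value:.3g}, std. residual = {resid:.2f}")
```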


78) An online bookstore wants to determine if coupon redemption is independent of gender. After a special coupon broadcast to its reward members, the following data on coupon redemption at checkout were collected.

                  Coupon redeemed?
Gender          Yes     No    Total
Male             66     66      132
Female          125     74      199
Total           191    140      331

Perform the appropriate hypothesis test at the 5% level of significance, checking conditions.
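A hedged sketch of the χ² test for this 2 × 2 table, without a continuity correction. For 1 degree of freedom, the upper-tail probability is erfc(√(χ²/2)), and the 5% critical value is 3.841. The helper `chi_square` is hypothetical.

```python
import math


def chi_square(observed):
    """Chi-square test of independence: statistic, df and expected counts."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected


observed = [[66, 66], [125, 74]]               # rows: Male, Female; columns: Yes, No
stat, df, expected = chi_square(observed)
p_value = math.erfc(math.sqrt(stat / 2))       # exact upper tail, valid only for df = 1
print(f"chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```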

79) A manufacturing plant for recreational vehicles receives shipments from three different parts vendors. There has been a defect issue with some of the electrical wiring in the recreational vehicles manufactured at the plant. The plant manager believes that the defect issue is the fault of parts received from the plant’s parts vendors. The plant manager reviews a sample of quality assurance inspections from the last six months.

                                        Parts vendors
                                   Perfect      Made-4-U    25 Hours
                                   Parts Co.    Co.         Parts Co.
Part rejected                        53           48          70
Part perfect                         93           71          88
Part not perfect but acceptable      22           31          22

Perform the appropriate hypothesis test at the 5% significance level, checking required conditions.
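A hedged sketch of the χ² test of independence for this 3 × 3 table. For 4 degrees of freedom, the exact upper-tail probability is e^(−χ²/2)(1 + χ²/2), and the 5% critical value is 9.488. The helper `chi_square` is hypothetical.

```python
import math


def chi_square(observed):
    """Chi-square test of independence: statistic, df and expected counts."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected


# rows: rejected, perfect, acceptable; columns: Perfect Parts, Made-4-U, 25 Hours Parts
observed = [[53, 48, 70], [93, 71, 88], [22, 31, 22]]
stat, df, expected = chi_square(observed)
p_value = math.exp(-stat / 2) * (1 + stat / 2)   # exact upper tail, valid only for df = 4
print(f"chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```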


80) The management of a chain of package delivery stores would like to develop a model for predicting the weekly sales (Y, in thousands of dollars) for individual stores based on the number of customers who made purchases (X). A random sample of 20 stores was selected from among all stores in the chain, and the following is the EXCEL output for a linear regression.

a) Write down the estimated linear regression equation.

b) Write down and interpret the correlation coefficient.

c) Write down and interpret the coefficient of determination.

d) Interpret the slope.

e) Test the significance of the slope using α = 0.01. Justify your choice between a one-tail and a two-tail alternative hypothesis.

f) Write down and interpret the 95% confidence interval for the intercept.


81) Excel was used to fit a linear relationship between the capacity of disc drives, in terabytes (TB), and price (in dollars), based on a sample of disc drives. Here is the output.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.933548
R Square            0.871512
Adjusted R Square   0.86657
Observations        28

ANOVA
              df   SS           MS         F          Significance F
Regression    1    288117.653   288117.7   176.3534   4.3E-13
Residual      26   42477.532    1633.751
Total         27   330595.185

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept      8.069286       13.6444352       0.591398   0.559361   -19.9773    36.11582
Capacity (TB)  73.44357       5.53046709       13.27981   4.3E-13    62.07553    84.81161

[Figures: Normal Probability Plot of Price ($) against Sample Percentile; Capacity (TB) Residual Plot; Scatterplot of Price ($) against Capacity (TB)]

(a) Check, to the extent possible, the regression conditions.

(b) Write down the estimated regression equation.

(c) Interpret the intercept. Does this make sense in the context of this question?

(d) Interpret the slope.

(e) Write down and interpret the p-value of the slope.

(f) Test the significance of the slope against a suitable alternative hypothesis, justifying your choice of alternative hypothesis.

(g) Test whether the intercept is significantly different from zero, using the 5% level of significance.

(h) Test whether the slope is significantly greater than 70, at the 5% level of significance.

(i) When testing the significance of the population correlation coefficient, the calculated t statistic is 13.27981. What else has the same calculated t statistic? Explain why it is not surprising that these two have the same calculated t statistic.
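A minimal sketch of parts (h) and (i) using the numbers in the output. It shows that the t statistic for H0: ρ = 0, r√(n−2)/√(1−r²), matches the slope’s t Stat and √F. The upper-tail critical value 1.706 (df = 26) is taken from a t table.

```python
import math

n = 28
b1, se_b1 = 73.44357, 5.53046709     # slope and its standard error from the output
r = 0.933548                         # Multiple R (positive, since the slope is positive)
f_stat = 176.3534                    # ANOVA F statistic

t_70 = (b1 - 70) / se_b1             # (h): H0 beta1 = 70 vs HA beta1 > 70
t_crit = 1.706                       # upper 5% t* for df = 26 (t table)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # (i): t for H0 rho = 0
print(f"(h) t = {t_70:.3f}, reject H0: {t_70 > t_crit}")
print(f"(i) t for rho = {t_corr:.3f}; slope t Stat = {b1 / se_b1:.3f}; sqrt(F) = {math.sqrt(f_stat):.3f}")
```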


82) Nutritional information was collected on a number of muesli bars to investigate a possible relationship between the number of calories and the protein content (in grams) per serve. Excel output is provided below.

SUMMARY OUTPUT

Regression Statistics
Multiple R     0.81815
R Square       0.66937
Observations   48

ANOVA
              df   SS           MS           F             Significance F
Regression    1    100316.828   100316.828   93.12839453   1.24701E-12
Residual      46   49550.6671   1077.18842
Total         47   149867.495

            Coefficients   Standard Error   t Stat       P-value       Lower 95%     Upper 95%
Intercept   95.50172       9.70662211       9.83882099   6.8067E-13    75.96330041   115.040134
Protein     12.53007       1.29841164       9.65030541   1.24701E-12   9.91650179    15.1436359

[Figures: Protein Residual Plot; Scatterplot of Calories against Protein; Normal Probability Plot of Calories against Sample Percentile; Histogram of Residuals (Frequency against Bin)]


(a) Check, to the extent possible, the regression conditions.

(b) Write down the estimated regression equation.

(c) Interpret the slope.

(d) Test the significance of the slope against a suitable alternative hypothesis, justifying your choice of alternative hypothesis.

(e) Test whether the intercept is significantly different from zero, using the 5% level of significance.

(f) Test whether the slope is significantly greater than 10, at α = 5%.

(g) When testing the significance of the population correlation coefficient, the calculated t statistic is 9.6503. What else has the same calculated t statistic? Explain why it is not surprising that these two have the same calculated t statistic.
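A minimal sketch of parts (f) and (g) from the output. The critical value 1.679 for df = 46 is an approximate t-table value, and is an assumption.

```python
import math

n = 48
b1, se_b1 = 12.53007, 1.29841164
r = 0.81815
t_10 = (b1 - 10) / se_b1                              # (f): H0 beta1 = 10 vs HA beta1 > 10
t_crit = 1.679                                        # approx. upper 5% t* for df = 46 (t table)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2) # (g): t for H0 rho = 0
print(f"t = {t_10:.3f}, reject H0: {t_10 > t_crit}; t for rho = {t_corr:.4f}")
```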


83) Each of the following scatterplots (a) to (d) shows a cluster of points and one “stray” point. For each, answer questions (1) to (4).

(HINT: answer as follows: (a): 1 to 4; (b): 1 to 4; etc.)

(1) In what way is the point unusual? Does it have high leverage, a large residual, or both?

(2) Do you think that point is an influential point?

(3) If that point were removed from the data, would the correlation become stronger or weaker? Explain.

(4) If that point were removed from the data, would the slope of the regression line increase, decrease or remain the same? Explain.

84) Below are residual plots for 3 separate linear regressions. Explain what each of the following residual plots indicates about the appropriateness of the linear model that was fitted to the data.

[Figure: residual plots (a), (b) and (c)]


85) This question relates to shows on Broadway for most weeks of 2006-2008 and is based on SDVV Chapter 17, Exercise 12, page 614.

Use the computer output below, which differs slightly from the output shown in the text because of rounding. The response variable is Receipts ($m), and the explanatory variables are Paid Attendance (thousands), #Shows (the number of shows) and Average Ticket Price ($).

a) Check, to the extent possible, the regression conditions.

b) If we found a simple linear regression to predict receipts only from paid attendance, what would the R² of that regression be?

c) Write out the multiple linear regression model.

d) What does the coefficient on average ticket price mean in this regression? Does that make sense?

e) Estimate the receipts in a week in which the paid attendance was 300,000 customers attending 35 shows at an average ticket price of $70.

f) Is this a good prediction? Why do you say that?

g) Test the significance of the coefficient on shows at the 5% level.

h) Test the overall significance of the equation.
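A hedged sketch of parts (e) and (g) using the rounded coefficients in the Broadway output. Paid attendance is entered in thousands (300). The two-sided critical value 1.993 for df = 74 is an approximate t-table value.

```python
intercept, b_att, b_shows, b_price = -18.320, 0.076, 0.007, 0.24   # coefficients from the output
attendance = 300        # paid attendance in thousands (300,000 customers)
shows, price = 35, 70
receipts = intercept + b_att * attendance + b_shows * shows + b_price * price
t_shows = 0.007 / 0.0044    # coefficient / standard error, about 1.6
t_crit = 1.993              # approx. two-sided 5% t* for df = 74 (t table)
print(f"Predicted receipts = ${receipts:.3f}m; t for shows = {t_shows:.2f}, significant: {abs(t_shows) > t_crit}")
```

The output’s P-value of 0.116 for Shows leads to the same conclusion as the critical-value comparison.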

86) SDVV Chapter 17 Exercise 2 page 612

A candy maker surveyed chocolate bars available in a local supermarket and found the following least squares regression model (i.e. a linear regression model):

Estimated calories = 28.4 + 11.37 Fat (g) + 2.91 Sugar (g).

(a) The hand-crafted chocolate she makes has 15 g of fat and 20 g of sugar. How many calories does the model predict for a serving?

(b) In fact, a laboratory test shows that her candy has 227 calories per serving. Find the residual corresponding to this candy. (Be sure to include the units.)

(c) What does that residual say about her candy?
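The prediction and residual (observed minus predicted) can be checked in a few lines of Python:

```python
fat, sugar = 15, 20
predicted = 28.4 + 11.37 * fat + 2.91 * sugar   # calories per serving
residual = 227 - predicted                      # observed minus predicted
print(f"predicted {predicted:.2f} calories; residual {residual:.2f} calories")
```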

Output for question 85 (Broadway shows):

SUMMARY OUTPUT

Regression Statistics
Multiple R     0.999
R Square       0.999
Observations   78

ANOVA
              df   SS        MS        F       Significance F
Regression    3    484.788   161.596   18633   2.122E-106
Residual      74   0.642     0.009
Total         77   485.430

                   Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept          -18.320        0.3127           -58.6    0.000     -18.943     -17.697
Paid Attendance    0.076          0.0006           120.8    0.000     0.075       0.077
Shows              0.007          0.0044           1.6      0.116     -0.002      0.016
Avg Ticket Price   0.24           0.0039           61.0     0.000     0.231       0.246


87) Continuing with the muesli bar example. Another variable, dietary fibre (in grams), was added to the regression to investigate a possible relationship between the number of calories, protein (in grams) and dietary fibre content (in grams) per serve. Assume that the conditions for inference with regression have been satisfied. Excel output is provided below.

SUMMARY OUTPUT

Regression Statistics
Multiple R     0.819285677
R Square       0.671229021
Observations   48

ANOVA
              df   SS         MS           F             Significance F
Regression    2    100595.4   50297.7059   45.93669738   1.34928E-11
Residual      45   49272.08   1094.93518
Total         47   149867.5

            Coefficients   Standard Error   t Stat       P-value       Lower 95%      Upper 95%
Intercept   92.7230491     11.23018         8.25659157   1.46317E-10   70.10429586    115.341802
Protein     12.41686809    1.328161         9.34891453   4.10306E-12   9.741813568    15.0919226
Fibre       1.125161988    2.230648         0.50441032   0.61643412    -3.367594182   5.61791816

(a) Write out the regression equation.

(b) Interpret the coefficient of fibre.

(c) If a muesli bar has 5 grams of protein and 3 grams of fibre, how many calories is it estimated to contain?

(d) Is this a good prediction? Explain.

(e) Test the significance of the coefficient of fibre at α = 5%.

(f) Test the overall significance of the equation.
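A minimal sketch of parts (c) and (e) from the output. The two-sided critical value 2.014 for df = 45 is an approximate t-table value.

```python
b0, b_protein, b_fibre = 92.7230491, 12.41686809, 1.125161988
calories = b0 + b_protein * 5 + b_fibre * 3   # 5 g protein, 3 g fibre
t_fibre = 1.125161988 / 2.230648              # coefficient / SE, matches t Stat of about 0.504
t_crit = 2.014                                # approx. two-sided 5% t* for df = 45 (t table)
print(f"estimated calories = {calories:.2f}; t = {t_fibre:.3f}, significant: {abs(t_fibre) > t_crit}")
```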

88) Here is the regression of Gross Revenue of movies (in $millions) on the budget (in $millions) of the movie and an indicator (dummy) variable, Comedy, which takes the value 1 for movies that are comedies and 0 otherwise.

Estimated Revenue = −7.03913 + 1.00428 Budget($m) + 25.4175 Comedy

a) Write out the regression equation for movies that are comedies and the regression equation for all non-comedy movies.

b) Sketch the 2 equations from (a) above.

c) Interpret the estimated coefficient on the variable Comedy.

d) Predict the gross revenue of a movie that is a comedy, which had a budget of only $890,000.
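A small Python sketch of how the dummy variable shifts the intercept. The function name `revenue` is hypothetical. The budget must be entered in $millions (0.89). If a budget that small lies outside the range of the data, the prediction is an extrapolation.

```python
def revenue(budget_m, comedy):
    """Estimated gross revenue ($m) from the fitted dummy-variable regression."""
    return -7.03913 + 1.00428 * budget_m + 25.4175 * comedy

# comedies: intercept -7.03913 + 25.4175 = 18.37837; non-comedies: intercept -7.03913
pred = revenue(0.89, 1)     # budget of $890,000 is 0.89 in $millions
print(f"Predicted gross revenue: ${pred:.2f}m")
```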


89) Consider the following estimated regression equation, which models the expenditure on food of single people:

Estimated Exp = 5060.24 + 528.99D + 0.089Inc

where

Exp is annual expenditure on food ($)

D is a dummy with D = 1 for males and D = 0 for females

Inc is annual after-tax income ($)

a) Write out the separate equations for males and for females.

b) Sketch the 2 equations from (a) above.

c) Interpret the estimated coefficient on the variable D.

d) Predict expenditure on food for a male with an annual income of $100 000.
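The same dummy-variable idea in a short Python sketch. The function name `food_exp` is hypothetical.

```python
def food_exp(income, male):
    """Estimated annual food expenditure ($); male = 1 for males, 0 for females."""
    return 5060.24 + 528.99 * male + 0.089 * income

# males: 5589.23 + 0.089 Inc; females: 5060.24 + 0.089 Inc (parallel lines)
pred = food_exp(100_000, 1)
print(f"Predicted food expenditure: ${pred:,.2f}")
```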

90) Continuing with the muesli bar example. On the market are a number of muesli bars containing chocolate. We would like to investigate how chocolate, together with protein and dietary fibre, influences the calorie content of muesli bars. A dummy variable for chocolate was added to the regression: the variable Chocolate takes the value 1 if the muesli bar contains chocolate and 0 if it does not. Assume the conditions for regression are satisfied.

Excel output is provided below.

Table of Correlation Coefficients
             Protein        Fibre       Calories      Chocolate
Protein      1
Fibre        0.16897192     1
Calories     0.818150446    0.180739    1
Chocolate    0.196456524    -0.24895    0.25773578    1

SUMMARY OUTPUT

Regression Statistics
Multiple R     0.82754132
R Square       0.68482463
Observations   48

ANOVA
              df   SS           MS        F         Significance F
Regression    3    102632.952   34211     31.8683   4.1801E-11
Residual      44   47234.5424   1073.5
Total         47   149867.495

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   87.247         11.809           7.388    0.000     63.447      111.046
Protein     11.949         1.358            8.798    0.000     9.212       14.687
Fibre       2.054          2.309            0.889    0.379     -2.600      6.708
Chocolate   14.207         10.312           1.378    0.175     -6.576      34.989


(a) If we found a regression to predict calories only from fibre, what would the R² of that regression be?

(b) If we found a regression to predict calories only from chocolate, what would the R² of that regression be?

(c) Write out the regression model.

(d) What does the coefficient on chocolate mean in this regression? Does it make sense? Explain.

(e) Estimate the number of calories for a muesli bar that has 4 grams of protein, 3 grams of fibre and contains chocolate.

(f) Is this a good prediction? Why do you say that?

(g) Test the significance of the coefficient of chocolate at α = 5%.

(h) Test the overall significance of the equation using the F test.
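A minimal sketch of parts (a), (b) and (e). It relies on the fact that, in a simple (one-predictor) regression, R² equals the square of the correlation between the predictor and the response.

```python
r_fibre, r_choc = 0.180739, 0.25773578        # correlations with Calories from the table
r2_fibre, r2_choc = r_fibre ** 2, r_choc ** 2  # simple regression: R^2 = r^2
calories = 87.247 + 11.949 * 4 + 2.054 * 3 + 14.207 * 1   # chocolate = 1
print(f"R^2 (fibre only) = {r2_fibre:.4f}; R^2 (chocolate only) = {r2_choc:.4f}")
print(f"estimated calories = {calories:.3f}")
```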

91) This table shows a Laspeyres index number which has been calculated to measure the change in prices of all inputs used in a production process.

Year            2005   2007   2009   2011
Index Number     100    149    157    193

a) Interpret the value for 2007.

b) Change the base of the index number to 2009.

c) In this example, would you expect a Paasche price index number to be higher or lower than the corresponding Laspeyres price index number? Explain your answer.
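Changing the base divides every index value by the value in the new base year and multiplies by 100. A short Python sketch of part (b):

```python
index = {2005: 100, 2007: 149, 2009: 157, 2011: 193}
new_base = 2009
rebased = {year: 100 * value / index[new_base] for year, value in index.items()}
for year, value in rebased.items():
    print(year, round(value, 1))
```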

92) The data in this question are the number of people entering Australia from New Zealand for short-term stays, quarterly, 1991 to 2012. The source is ABS, 3401.0 Overseas Arrivals and Departures, Australia, Table 5: Short-term Movement, Visitor Arrivals - Selected Countries of Residence: Original. The plot below is from EXCEL:

[Figure: Visitor Arrivals: New Zealand, quarterly, Mar-1991 to Mar-2012]


a) Describe which of the 4 components of time series you see in this chart and which you do not see.

b) The trend equation is given by Trend = 303,027 + 2509t, where t is time in quarters, with origin March quarter 2011. Interpret the trend equation.

c) Calculate the trend for the March quarter of 2014.

d) Is your answer in part (c) likely to be a good estimate of the number of people entering Australia from New Zealand for short-term stays in the March quarter of 2014? Explain.

93) You are the manager of a large Australian seaside resort and have used EXCEL to create the following multiple linear regression model:

Estimated Occupancy = 285 + 142Q1 - 197Q2 + 250Q4 + 5t, with r = 0.8175, where the origin is March 1998. Q1 is the dummy variable for the March quarter: Q1 = 1 if the quarter is March, otherwise it is 0. Q2 is the dummy variable for the June quarter and Q4 is the dummy variable for the December quarter, defined similarly.

(a) Describe which of the 4 components of time series you see in this chart and which you do not see.

(b) Interpret the coefficient of determination.

(c) Interpret the coefficient of trend in the regression model.

(d) Why is there no dummy variable for the September quarter?

(e) Interpret the coefficient of the December quarter.

(f) The coefficients of the dummies for Q1 and Q4 are positive, but Q2 has a negative coefficient. Is this a mistake? Explain in the context of this question.

(g) Predict the occupancy of this resort in the March quarter of 2016.

(h) Predict the occupancy of this resort in the September quarter of 2016.

(i) Using your predictions in the previous parts, how much of the difference between your predictions of occupancy in March 2016 and September 2016 is due to trend, and how much is due to seasonal effects?

[Figure: Hotel occupancy of Large Australian Seaside Resort, quarterly, origin March 1998]
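A hedged sketch of parts (b), (g), (h) and (i). It assumes t = 0 in the March quarter of 1998 (the stated origin) and that t increases by 1 each quarter. September is the base quarter, so all dummies are 0 for September. The function name `occupancy` is hypothetical.

```python
def occupancy(t, q1=0, q2=0, q4=0):
    """Estimated occupancy from the fitted trend-plus-seasonal-dummies model."""
    return 285 + 142 * q1 - 197 * q2 + 250 * q4 + 5 * t

# assumption: t = 0 in the March quarter of 1998 and t rises by 1 each quarter
t_mar16 = (2016 - 1998) * 4            # 72
t_sep16 = t_mar16 + 2                  # 74
mar = occupancy(t_mar16, q1=1)
sep = occupancy(t_sep16)               # September is the base quarter: all dummies 0
trend_part = 5 * (t_mar16 - t_sep16)   # -10
seasonal_part = 142                    # March effect relative to September
r_squared = 0.8175 ** 2                # coefficient of determination
print(f"March 2016: {mar}; September 2016: {sep}; R^2 = {r_squared:.4f}")
print(f"difference {mar - sep} = trend {trend_part} + seasonal {seasonal_part}")
```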


94) We have the monthly number of new passenger cars sold in SA.

Source: ABS, 9314.0, Table 2. The EXCEL output below has a time series plot and a linear regression model. The extra variables are Month, which measures time in months with origin 0 in January 2010, and dummies for each month, so Feb = 1 if the month is February, 0 otherwise; Mar = 1 if the month is March, 0 otherwise; and so on.

a) Why is there no dummy variable for January in the model?

b) What does the coefficient on Month tell you?

c) The coefficients on all the dummies are positive. Is this a mistake? What does it mean if all these coefficients are positive?

d) Forecast sales of new passenger cars in SA for December 2013 and January 2014.

e) OPTIONAL: See if you can find the actual sales for either of these months: go to the ABS website (abs.gov.au) and find the statistics on passenger vehicle sales. You can try typing “passenger vehicles” into the search box (there are also other ways to search).

SUMMARY OUTPUT

Regression Statistics
Multiple R      0.66
R Square        0.44
Observations    228

ANOVA
              df     SS            MS           F       Significance F
Regression    12     16122509.9    1343542.5    13.9    3.8283E-21
Residual      215    20811471.7    96797.5
Total         227    36933981.6

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     2740.23        76.08            36.02    0.00      2590.27     2890.19
Month         2.03           0.31             6.48     0.00      1.41        2.65
February      442.39         100.94           4.38     0.00      243.43      641.35
March         596.93         100.94           5.91     0.00      397.97      795.90
April         117.32         100.95           1.16     0.25      -81.65      316.29
May           526.76         100.95           5.22     0.00      327.79      725.74
June          930.73         100.95           9.22     0.00      731.74      1129.72
July          442.96         100.96           4.39     0.00      243.96      641.96
August        514.67         100.97           5.10     0.00      315.66      713.67
September     460.00         100.97           4.56     0.00      260.98      659.02
October       526.76         100.98           5.22     0.00      327.72      725.80
November      640.94         100.99           6.35     0.00      441.88      839.99
December      537.38         101.00           5.32     0.00      338.30      736.45
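As a check on part (d), here is a minimal Python sketch that plugs the rounded coefficients from the output above into the fitted trend-plus-seasonal equation. It assumes, as the question states, that Month = 0 in January 2010 and that January is the base month (its seasonal effect is 0); the function names are hypothetical.

```python
# Rounded coefficients copied from the EXCEL output for Question 94.
INTERCEPT = 2740.23
MONTH_SLOPE = 2.03
# Seasonal dummy coefficients by calendar month; January is the base month, so 0.
SEASONAL = {
    1: 0.0, 2: 442.39, 3: 596.93, 4: 117.32, 5: 526.76, 6: 930.73,
    7: 442.96, 8: 514.67, 9: 460.00, 10: 526.76, 11: 640.94, 12: 537.38,
}


def month_index(year, month):
    """Months elapsed since January 2010 (assumed origin, Month = 0)."""
    return (year - 2010) * 12 + (month - 1)


def forecast_sales(year, month):
    """Point forecast = intercept + slope * Month + that month's dummy coefficient."""
    return INTERCEPT + MONTH_SLOPE * month_index(year, month) + SEASONAL[month]


for year, month in [(2013, 12), (2014, 1)]:
    print(year, month, round(forecast_sales(year, month), 2))
```

Under these assumptions December 2013 is Month = 47 and January 2014 is Month = 48, giving forecasts of about 3373 and 2838 cars. The big drop comes almost entirely from the December dummy (537.38) falling away in January.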


95) This question gives you a feel for how you would use EXCEL in time series analysis. However, to keep the problem manageable, we will use only a small data set. Be aware that in practice we often use huge datasets! Use the data on the number of live sheep exported, in thousands, quarterly, from ABS, 7215.0 Table 6. We will use only the data for 2003 to 2012 inclusive.

a) Go to the ABS website, abs.gov.au, and choose statistics, then by catalogue number (since we know the catalogue number here); choose 7, then 72, and then 7215.0. Go to downloads and download Table 6.

b) Copy the data for 2003 to 2012 inclusive for sheep (the first column).

c) Construct a line plot of the data.

d) Comment on which of the components of a time series you see in this plot.

e) Construct dummy variables for the quarters. To avoid the dummy variable trap, we need only 3, so let’s use dummies for the June, September and December quarters. Do this by simply creating new variables with their names in the first row and then typing in 0 or 1 as appropriate. For example, for a June quarter observation, you will type 1 for the June dummy and 0 for the September and December dummies.

f) Construct a new variable called t for time in quarters. Make the origin the March quarter of 2010: that is, make t = 0 for Mar 2010.

g) Run the regression as usual, including the dummy variables as well as the time variable.

h) Write out your estimated equation.

i) Interpret the coefficient on t.

j) Interpret the coefficient on the September dummy.

k) Comment on the seasonal pattern of your data.

l) Now consider the December quarter of 2004. What was the actual value then? What would your model have predicted? What is the irregular component for that quarter?
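Parts (e) to (l) can also be sketched outside EXCEL. The standard-library Python below builds the same design matrix (intercept, t with t = 0 in March 2010, then June, September and December dummies), fits it by least squares through the normal equations, and computes an irregular component as actual minus fitted value, assuming an additive model. The function names are hypothetical, and no ABS figures are included: `y` would be the 40 quarterly values you copied from Table 6, in order from March 2003.

```python
def design_row(year, quarter):
    """[1, t, June, Sep, Dec] for one observation; March is the base quarter.

    t counts quarters from the March quarter of 2010 (t = 0 there).
    """
    t = (year - 2010) * 4 + (quarter - 1)
    return [1.0, float(t),
            1.0 if quarter == 2 else 0.0,
            1.0 if quarter == 3 else 0.0,
            1.0 if quarter == 4 else 0.0]


def design_matrix(first_year=2003, last_year=2012):
    """One row per quarter, in chronological order."""
    return [design_row(y, q) for y in range(first_year, last_year + 1) for q in range(1, 5)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def irregular(actual, coefs, year, quarter):
    """Irregular component (additive model): actual value minus fitted value."""
    fitted = sum(b * x for b, x in zip(coefs, design_row(year, quarter)))
    return actual - fitted
```

Usage: `coefs = ols(design_matrix(), y)` returns the intercept, the coefficient on t, then the June, September and December coefficients, in the same order EXCEL lists them; `irregular(actual_dec_2004, coefs, 2004, 4)` answers part (l). Solving the normal equations directly is fine for a 5-column model like this one; larger problems would normally use a library routine.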

96) The number of employees in a small firm is given by

Number = 12 + Q1 − 2Q2 + 3Q4 + 1.5t

where t is time in quarters, with origin March 2011 (t = 0 in the March quarter of 2011), Q1 is a dummy for the March quarter, Q2 is a dummy for the June quarter and Q4 is a dummy for the December quarter.

a) Write out the equations for each of the 4 quarters.

b) Sketch the 4 equations from (a).

c) Comment on the seasonal pattern in the number of employees.

d) Forecast the number of employees for all quarters of 2013, 2014 and 2015.

e) Plot the forecasts in part (d) against time and comment on the plot.

f) Consider the estimates for the December quarter of 2014 and the June quarter of 2015. How much of the difference between these two estimates is due to seasonal effects and how much is due to trend?
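A short Python sketch of parts (d) and (f), assuming t = 0 in the March quarter of 2011 and treating the September quarter as the base (it has no dummy). The trend part of a difference is 1.5 times the number of quarters between the two dates; the seasonal part is the difference between the two dummy coefficients. Function names are hypothetical.

```python
# Coefficients of the given model: Number = 12 + Q1 - 2Q2 + 3Q4 + 1.5t
INTERCEPT = 12.0
TREND = 1.5
SEASONAL = {1: 1.0, 2: -2.0, 3: 0.0, 4: 3.0}  # quarter 3 (September) is the base


def t_index(year, quarter):
    """Quarters elapsed since the March quarter of 2011 (t = 0)."""
    return (year - 2011) * 4 + (quarter - 1)


def employees(year, quarter):
    return INTERCEPT + SEASONAL[quarter] + TREND * t_index(year, quarter)


def decompose(first, second):
    """Split employees(second) - employees(first) into (trend part, seasonal part)."""
    (year_a, qtr_a), (year_b, qtr_b) = first, second
    trend_part = TREND * (t_index(year_b, qtr_b) - t_index(year_a, qtr_a))
    seasonal_part = SEASONAL[qtr_b] - SEASONAL[qtr_a]
    return trend_part, seasonal_part


# Part (d): forecasts for every quarter of 2013 to 2015
for year in (2013, 2014, 2015):
    print(year, [employees(year, q) for q in (1, 2, 3, 4)])

# Part (f): December quarter 2014 versus June quarter 2015
print(decompose((2014, 4), (2015, 2)))
```

For December 2014 (t = 15) and June 2015 (t = 17) this gives 37.5 and 35.5 employees: the fall of 2 is a trend rise of 3 offset by a seasonal drop of 5.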


97) Given the price index number for a particular item:

Year          2007   2008    2009   2010   2011   2012
Price index   100    112.5   117    119    131    152

a) Change the base to 2010.

b) Calculate the % price increase from 2010 to 2012 by using the index in (a).

c) Calculate the % price increase from 2010 to 2012 by using the original index in the table above.

d) Compare your answers to (b) and (c).
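A minimal Python sketch of parts (a) to (c): rebasing divides every value by the new base-year value and multiplies by 100. Because that multiplies the whole series by one constant, the percentage change between any two years is unchanged, which is the comparison part (d) asks about. The helper names are hypothetical.

```python
index = {2007: 100, 2008: 112.5, 2009: 117, 2010: 119, 2011: 131, 2012: 152}


def rebase(series, base_year):
    """Express the series relative to base_year = 100."""
    base = series[base_year]
    return {year: 100 * value / base for year, value in series.items()}


def pct_change(series, start, end):
    """Percentage change from start year to end year."""
    return 100 * (series[end] - series[start]) / series[start]


rebased = rebase(index, 2010)
print({year: round(v, 2) for year, v in rebased.items()})
print(round(pct_change(rebased, 2010, 2012), 2), round(pct_change(index, 2010, 2012), 2))
```

Both percentage changes come out at about 27.73%.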

98) The table following Question 99 lists values of two index numbers of median annual family income in XYZ Country, one with base 2005 and one with base 2010.

a) Calculate the missing values of each index.

b) Interpret the 2012 value for each index.

c) Can you say that one index is better than the other? Why or why not?

99) Data on the cost of living in Adelaide are provided for the guidance of international visitors. Here are some prices and monthly costs for 2 years, 2005 and 2009.

                           Price ($)        Cost ($ per month)
Item                       2005    2009     2005     2009
Bowl of noodles            4.50    5.50     45       66
Slice of pizza             3.00    5.00     39       50
Bottle of water (600ml)    2.00    2.00     40       42
Takeaway coffee            2.50    3.00     25       27
McDonalds Big Mac          3.30    3.60     23.10    28.80

a) Calculate simple price relatives for each item for 2009, using 2005 as base.

b) Calculate the unweighted average of the simple price relatives calculated in (a). What does this index mean?

c) Compute a Laspeyres price index for 2009 with 2005 as base.

d) Interpret the index you calculated in (c).

e) Compute a Paasche price index for 2009 with 2005 as base.

f) Compute a Fisher price index for 2009 with 2005 as base.

g) Arrange the Laspeyres, Paasche and Fisher indexes in order; is this ordering what you expected?
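Assuming each monthly cost in the table equals price × quantity bought per month (so the implied quantities are cost ÷ price, e.g. 45 ÷ 4.50 = 10 bowls of noodles in 2005), the indexes in parts (a) to (f) can be sketched in Python as below. Laspeyres weights by base-period (2005) quantities, Paasche by current-period (2009) quantities, and Fisher is their geometric mean.

```python
from math import sqrt

# (price 2005, price 2009, cost 2005, cost 2009) for each item
items = {
    "Bowl of noodles": (4.50, 5.50, 45.00, 66.00),
    "Slice of pizza": (3.00, 5.00, 39.00, 50.00),
    "Bottle of water (600ml)": (2.00, 2.00, 40.00, 42.00),
    "Takeaway coffee": (2.50, 3.00, 25.00, 27.00),
    "McDonalds Big Mac": (3.30, 3.60, 23.10, 28.80),
}

# (a), (b): simple price relatives (2005 = 100) and their unweighted average
relatives = {name: 100 * p1 / p0 for name, (p0, p1, _, _) in items.items()}
simple_average = sum(relatives.values()) / len(relatives)

# Implied monthly quantities in each year: cost / price
q0 = {name: c0 / p0 for name, (p0, _, c0, _) in items.items()}
q1 = {name: c1 / p1 for name, (_, p1, _, c1) in items.items()}

# (c) Laspeyres: sum(p1 * q0) / sum(p0 * q0)
laspeyres = 100 * sum(items[n][1] * q0[n] for n in items) / sum(items[n][0] * q0[n] for n in items)
# (e) Paasche: sum(p1 * q1) / sum(p0 * q1)
paasche = 100 * sum(items[n][1] * q1[n] for n in items) / sum(items[n][0] * q1[n] for n in items)
# (f) Fisher: geometric mean of Laspeyres and Paasche
fisher = sqrt(laspeyres * paasche)

print(round(simple_average, 2), round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

With these inputs the sketch gives about 123.60 for the unweighted average of relatives, 125.04 (Laspeyres), 122.24 (Paasche) and 123.63 (Fisher), so here Laspeyres > Fisher > Paasche.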

Table for Question 98: index numbers of median annual family income

Year   Base 2005 index   Base 2010 index
2005   100
2006   103
2007   105
2008   107
2009   110               98
2010                     100
2011                     101
2012                     105
2013                     107
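For Question 98 (a), the missing values can be filled by splicing the two series at their overlap year, 2009, assuming both indexes measure the same thing and differ only in their base. The Python sketch below multiplies base-2010 values by 110/98 to put them on base 2005, and divides base-2005 values by the same factor to put them on base 2010.

```python
base_2005 = {2005: 100, 2006: 103, 2007: 105, 2008: 107, 2009: 110}
base_2010 = {2009: 98, 2010: 100, 2011: 101, 2012: 105, 2013: 107}

# 2009 appears in both series, so it links them.
factor = base_2005[2009] / base_2010[2009]  # converts base-2010 values to base 2005

full_2005 = dict(base_2005)
full_2005.update({y: v * factor for y, v in base_2010.items() if y not in base_2005})

full_2010 = dict(base_2010)
full_2010.update({y: v / factor for y, v in base_2005.items() if y not in base_2010})

for year in range(2005, 2014):
    print(year, round(full_2005[year], 2), round(full_2010[year], 2))
```

For example, 2010 becomes about 112.24 on base 2005, and 2005 becomes about 89.09 on base 2010.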


100) This table shows a Laspeyres index number which has been calculated to measure the change in prices of all inputs used in a production process.

Year          2009   2010    2011    2012
Price index   100    121.8   132.5   142.0

a) Interpret the value for 2011.

b) What has been the percentage change in input prices between 2011 and 2012?

c) Change the base of the index number to 2012.

d) State ONE reason why you might want to change the base of an index number.

e) Would you expect a Paasche index to be higher or lower? Explain your answer.
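A short Python sketch of parts (b) and (c). The rise from 132.5 to 142.0 is 9.5 index points; the percentage change divides those points by the 2011 value rather than by the base-year value of 100. Rebasing divides every value by the 2012 value and multiplies by 100.

```python
index = {2009: 100, 2010: 121.8, 2011: 132.5, 2012: 142.0}

# (b): change in index points, then as a percentage of the 2011 value
points = index[2012] - index[2011]
pct = 100 * points / index[2011]

# (c): rebase so that 2012 = 100
rebased_2012 = {year: 100 * v / index[2012] for year, v in index.items()}

print(round(points, 1), round(pct, 2))
print({year: round(v, 2) for year, v in rebased_2012.items()})
```

This gives a rise of 9.5 points, or about 7.17%, and a 2011 value of about 93.31 on the new base.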
