Stat questions Semester 1
SCHOOL OF ECONOMICS THE UNIVERSITY OF ADELAIDE
2015 ECON 1008: Business and Economic Statistics I
PRACTICAL QUESTIONS 2 SECTIONS:
SECTION A: Multiple choice questions (MCQ) SECTION B: Worked (short) ANSWER QUESTIONS (WAQ)
All MCQ questions and short answer questions set for practicals (tutorials) are taken from this set of PRACTICAL QUESTIONS.
Each Wednesday, students are asked to read an Announcement on the BES MyUni site letting students know the complete set of questions for the next Week’s practical
(tutorial class). In the practical (tutorial class), students will be assessed on a sample of these questions set for that Week. Students should prepare solutions to the set of MCQ and WAQ in preparation.
Students receive solutions to MCQ and WAQ by attending tutorials and doing the
questions that are tested/submitted and checked by their tutor or asking the study coach during study coach sessions.
PLEASE DO EVERY QUESTION THAT YOU ARE REQUIRED TO HAND UP FOR YOUR ASSESSMENT.
_______________________________________________
SECTION A: Multiple-choice questions
1) Observations about a continuous quantitative variable
a) can be made in only two categories.
b) must be made in more than two distinct categories.
c) can assume values at all points on a scale of values, with no breaks between possible values.
d) can assume values only at specific points on a scale of values, with inevitable gaps between.
e) can take integer (whole number) values only.
2) A political candidate wants to ascertain her chances of winning a seat in the coming election. There are 50,940 registered voters in this political candidate's electorate. From a survey of 1,000 registered voters in the electorate, 52% of these voters stated that they would vote for her. What is the population of interest?
a) 52% that would vote for her.
b) The 50,940 registered voters.
c) The 1,000 voters surveyed.
d) The 49,940 registered voters that were not surveyed.
e) The 48% of voters surveyed that would not vote for her.
3) Which of the following is most likely a population as opposed to a sample?
a) Respondents to a phone poll.
b) The first five students, in a class of 50 students, to submit a project.
c) Every tenth person that arrives at a bank.
d) Registered voters in a country.
e) 50 people who replied to a survey on the back of a cereal box.
4) Which one of the following represents cross-section data? a) Listings of the closing prices of different securities traded at the New York Stock
Exchange on June 30, 2010. b) Listings of the profits of a given company during each of the last 15 years.
c) Listings of average annual profits of 10 companies over the last 5 years. d) Listings of the closing prices of 5 different securities at the end of each of 5
consecutive trading days.
e) Listings of the closing prices of a particular share at the end of each of 5 successive business days.
5) One of the BES teams for their mini-project, decided to estimate the proportion of
Bachelor of Economics students at the University of Adelaide who regularly buy
their lunch. They took their sample at Hub Central one particular Monday. What is their population?
a) Students at the University of Adelaide who regularly buy their lunch. b) University of Adelaide students. c) Bachelor of Economics students at the University of Adelaide.
d) University of Adelaide students who were at Hub Central that Monday. e) Bachelor of Economics students at University of Adelaide who were at Hub
Central that Monday.
6) Suppose a sporting authority decides to randomly test 400 athletes at a sporting
event for steroid use. They separate the athletes by gender and take a random sample of 20 females and twenty males for testing. What type of sample is this?
a) Stratified b) Simple Random
c) Convenience d) Cluster e) Convenience.
7) A population is made up of groups that have wide variation within each group but
have little variation from group to group. Which of the following is the best type of sampling method to use to sample from this population?
a) Systematic. b) Stratified.
c) Convenience. d) Simple Random. e) Cluster.
8) A local radio show asks callers to call and give their opinion on whether live sheep exports should be banned. Which of the following is this survey method most likely
to suffer from? a) Nonresponse bias b) Response bias
c) This survey will only suffer from sampling error. d) This survey will not suffer from any bias.
e) Voluntary response bias.
9) What happens when a random sample is made larger?
a) One can eliminate bias inherent in a smaller sample.
b) One can eliminate sampling error. c) One can reduce, but not eliminate, bias inherent in a smaller sample.
d) One can reduce, but not eliminate, sampling error. e) One can reduce, but not eliminate, sampling error and bias.
10) Which of the following statements best describes the difference between a
sampling frame, the target sample and the actual sample? a) The sampling frame is the list from which the sample is drawn; the target
sample is all the individuals who are asked to participate in the survey; and the actual sample is the respondents.
b) The sampling frame is the list from which the sample is drawn; the target sample is all the individuals who may not participate in the survey; and the
actual sample is the respondents.
c) The sampling frame is the list from which the sample is drawn; the target sample may be smaller than the actual sample.
d) The sampling frame is the list from which the sample is drawn; the target sample is the population; and the actual sample is the respondents.
e) The sampling frame is the population; the target sample is all the individuals
who are asked to participate in the survey; and the actual sample is the
respondents.
For Questions 11) to 14) use the following table, which shows people's ages cross-classified against their attitude to proposed new drink driving laws:

            Agree   Disagree   Don't care
Under 20     10       20          20
20 to 40     20       30          50
Over 40      40       20          40

11) What proportion of people who agree with the new laws happen to be under 20?
a) 10/250
b) 10/250
c) 10/70
d) 10/50
e) None of these choices is correct.
12) (Refer to the previous table) Of the people who are under 20, what proportion disagree with the new laws?
a) 10/250
b) 20/250
c) 20/70
d) 20/50
e) 50/70
13) (Refer to the previous table) What % of people are under 20 and disagree with the new laws?
a) 10/250
b) 20/250
c) 20/70
d) 20/50
e) 50/70
14) (Refer to the previous table) Which of the following best describes the answer 30/100? (Hint: the value 30 comes from the cell in the middle of the table and the value 100 is a total of one of the rows or columns.)
a) 30% of people are between 20 to 40
b) 30% of people who disagree with the new laws are between 20 to 40
c) 30% of people disagree with the new laws
d) 30% of people who are between 20 to 40 disagree with the new laws
e) 30% of people are 20 to 40 year olds and disagree with the new laws.
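The difference between joint and conditional proportions in Questions 11) to 14) comes down to which total goes in the denominator. The following is a minimal sketch using the counts from the table; the helper names (`joint`, `given_row`, `given_col`) are illustrative choices, not part of the course material.

```python
# Counts from the age-by-attitude table (rows: age group, columns: attitude).
table = {
    "Under 20": {"Agree": 10, "Disagree": 20, "Don't care": 20},
    "20 to 40": {"Agree": 20, "Disagree": 30, "Don't care": 50},
    "Over 40": {"Agree": 40, "Disagree": 20, "Don't care": 40},
}

# Marginal totals for each row, each column, and the whole table.
row_totals = {age: sum(cells.values()) for age, cells in table.items()}
col_totals = {}
for cells in table.values():
    for attitude, count in cells.items():
        col_totals[attitude] = col_totals.get(attitude, 0) + count
grand_total = sum(row_totals.values())


def joint(age, attitude):
    """Proportion of ALL people in this cell (denominator: grand total)."""
    return table[age][attitude] / grand_total


def given_row(age, attitude):
    """Proportion within an age group (denominator: row total)."""
    return table[age][attitude] / row_totals[age]


def given_col(age, attitude):
    """Proportion within an attitude group (denominator: column total)."""
    return table[age][attitude] / col_totals[attitude]


print(grand_total, row_totals, col_totals)
print(joint("Under 20", "Disagree"))      # cell count over grand total
print(given_row("Under 20", "Disagree"))  # cell count over row total
print(given_col("Under 20", "Agree"))     # cell count over column total
```

Reading a question for the phrase "of the people who..." usually identifies the group that supplies the denominator.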
15) Combining percentages inappropriately across categories in a contingency table
can yield incorrect conclusions. This is known as a) Bias
b) Frequency c) Non-sampling error
d) Simpson’s Paradox e) Sampling error
16) If a test was generally very easy, except for a few students who had very low scores, then the distribution of scores would be:
a) Normal b) Uniform c) Symmetric
d) Positively skewed e) Negatively skewed
17) Which of the following is true when a frequency distribution exhibits positive
skewness?
a) The median exceeds the mean. b) The mean exceeds the median.
c) The median and mode are both greater than the mean. d) The variance exceeds the standard deviation. e) The standard deviation exceeds the range.
18) In a group of 12 scores, one of the scores is increased by 36 points. What effect will this have on the mean of the scores?
a) Increase by 12 points.
b) Increase by 0.33 points.
c) Increase by 3 points.
d) Increase by 36 points.
e) Remain unchanged.
19) Which of the following may easily be found near either extreme of a data set?
a) The mean.
b) The median.
c) The mode.
d) The mean and the mode.
e) All of the mean, median and mode.
20) The difference between a histogram and a bar chart is that:
a) The histogram reflects qualitative data; the bar chart represents quantitative data.
b) The adjacent rectangles in a histogram have a gap between them; those in a bar chart do not.
c) The histogram reflects actual numbers; the bar chart represents percentages. d) Adjacent rectangles in a bar chart have a gap between them; those in a
histogram do not.
e) There is no practical difference apart from the name.
21) Do men and women run a 5 kilometre race at the same pace? Here are box plots
of the time (in minutes) for a recent race. Which of the following best describes the
box plots?
a) Women appear to run about 3 minutes faster than men, and the two distributions have different IQR.
b) Men appear to run about 3 minutes faster than women, but the two distributions are very similar in shape and spread.
c) Men appear to run about 10 minutes faster than women, but the two distributions are very similar in shape and spread.
d) Women appear to run about 3 minutes faster than men, but the two
distributions are very similar in shape and spread. e) Men appear to run about 3 minutes faster than women, and the two
distributions have different IQR.
22) Anthony’s Pizza offers free delivery of their pizza. The following summary
information concerns the time of deliveries: mean time is 20 minutes, median time is 18 minutes, the first quartile is 12 minutes, and the third quartile is 25 minutes.
What percent of the deliveries take more than 12 minutes? a) 50 percent b) 75 percent
c) 25 percent d) 95 percent
e) Cannot tell without knowing if the normal model is applicable.
23) Which of the following statements is true regarding the standard deviation?
a) It cannot assume a negative value. b) If it is zero, then all the data values are the same.
c) It is in the same units as the mean. d) All the above are all correct.
e) None of the above is correct.
24) Find the variance of the sample 4, 4, 5, 6, 6, 7, 10
a) 1.93 b) 4.33 c) 2.38
d) 3.71 e) 6.00
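As a check on the arithmetic in Question 24), the sample variance divides the sum of squared deviations by n − 1. The sketch below assumes Python's standard `statistics` module, whose `variance` function uses the n − 1 divisor; the variable names are illustrative.

```python
from statistics import mean, variance

sample = [4, 4, 5, 6, 6, 7, 10]

sample_mean = mean(sample)

# Manual calculation: squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - sample_mean) ** 2 for x in sample]
manual_variance = sum(squared_deviations) / (len(sample) - 1)

# Library calculation for comparison (also uses the n - 1 divisor).
library_variance = variance(sample)

print(sample_mean, round(manual_variance, 2), round(library_variance, 2))
```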
25) A standard score is best described by which of the following statements:
a) The numbers of standard deviations between a particular observation and the
mean of all observations in a data set. b) The difference between an observed value and the standard deviation, divided
by the mean. c) How many means away from the standard deviation a particular observation is
located.
d) The value that would be expected if a randomly chosen observation was selected.
e) The value that would be expected in the long run.
26) Student Alex scored 66% in a macroeconomics exam in which the class average was 52% and the variance 36%², whilst student Chris scored 52% in a microeconomics exam in which the class average was 73% and the variance 64%². Find the standardised scores for each and decide which student's score was relatively more unusual.
a) Chris's mark was relatively more unusual than Alex's; the standardised scores were 2.33 for Alex and −2.63 for Chris.
b) Chris's mark was relatively more unusual than Alex's; the standardised scores were 2.33 for Alex and 2.63 for Chris.
c) Alex's mark was relatively more unusual than Chris's; the standardised scores were 2.33 for Alex and −2.63 for Chris.
d) Alex's mark was relatively more unusual than Chris's; the standardised scores were 0.39 for Alex and 0.33 for Chris.
e) Alex's mark was relatively more unusual than Chris's; the standardised scores were 0.39 for Alex and −0.33 for Chris.
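A standardised score divides the distance from the mean by the standard deviation, so a variance given in the question must be square-rooted first. The sketch below applies that to the figures in Question 26); the function name `z_score` is an illustrative choice.

```python
from math import sqrt


def z_score(value, mean, variance):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sqrt(variance)


alex = z_score(66, 52, 36)   # standard deviation is sqrt(36) = 6
chris = z_score(52, 73, 64)  # standard deviation is sqrt(64) = 8

# The more unusual score is the one further from zero in absolute value.
more_unusual = "Alex" if abs(alex) > abs(chris) else "Chris"
print(round(alex, 2), round(chris, 2), more_unusual)
```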
27) You want to compare your scores for two different subjects. You scored 68% for Macroeconomics, where the class mean was 63% and the variance was 4%². You scored 70% in Microeconomics, where the class average was 60% and the variance was 25%². Which of the following correctly describes a comparison of your final scores in these two subjects?
a) You did better in Macroeconomics with a Z-score of 1.5 compared to a Z-score of 0.40 with Microeconomics.
b) You did better in Macroeconomics with a Z-score of 2.5 compared to a Z-score of 2 with Microeconomics.
c) You did better in Microeconomics with a Z-score of 0.40 compared to a Z-score of 1.25 with Macroeconomics.
d) You did better in Microeconomics with a Z-score of 70% than in Macroeconomics with a Z-score of 63%.
e) You did better in Microeconomics with a Z-score of 2 compared to a Z-score of 2.5 with Macroeconomics.
28) A science instructor assigns a group of students to investigate the relationship
between the pH of the water of a river and the hardness of the water (measured in grains). Some students wrote these conclusions: "there was a very strong
correlation of 1.45 grains between pH of the water and water's hardness." Is it appropriate to calculate the correlation coefficient in this example? a) No: the correlation coefficient is unit-free.
b) No: correlation cannot be greater than 1. c) No: the relationship may not be linear.
d) All of the above. e) None of the above.
29) If the correlation coefficient r = 1.00, then which of the following statements is correct?
a) All the data points must fall exactly on a straight line with a slope that equals 1.00.
b) All the data points must fall exactly on a straight line with a negative slope.
c) All the data points must fall exactly on a straight line with a positive slope.
d) All the data points must fall exactly on a horizontal straight line.
e) All the data points must be identical; there is really only one point.
30) In a scatter diagram, observed data points that lie above the estimated regression line: a) Involve positive residuals
b) Involve negative residuals c) Must be wrong because regression minimises errors
d) Must be outliers e) Involve squared errors because regression minimises squared errors
31) Several scatterplots are below, numbered (1) to (4).
Several correlation coefficients are below, labelled A to D. Match the correlation coefficients to the scatterplots.
A = −0.962 B = −0.091 C = 0.719 D = 0.921
[Scatterplots (1) to (4) are not reproduced in this transcript.]
a) 1A, 2C, 3D, 4B
b) 1C, 2D, 3B, 4A
c) 1B, 2D, 3C, 4D
d) 1A, 2C, 3B, 4D
e) 1D, 2B, 3A, 4C
32) Which of the following plots of residuals suggests that a linear model may not be applicable?
[Residual plots I to IV are not reproduced in this transcript.]
a) IV b) III c) I
d) II e) None of these choices is correct.
USE THE FOLLOWING EXCEL OUTPUT TO ANSWER THE NEXT 5 QUESTIONS
The Excel output below is the regression of exam marks (%) on attendance (number of tutorials attended during the semester) for a sample of BES students.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.811
R Square              0.658
Adjusted R Square     0.645
Standard Error       13.983
Observations         27

ANOVA
              df          SS         MS        F   Significance F
Regression     1    9413.176   9413.176   48.142            0.000
Residual      25    4888.231    195.529
Total         26   14301.407

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept          19.914            5.620    3.543     0.002       8.339      31.489
attendance          3.884            0.560    6.938     0.000       2.731       5.037
Vivian Piovesan Cavaiuolo 2015 Page 13
33) Using the Excel output provided, which of the following is the most correct regression equation (model)?
a) Estimated Attendance = 19.914 + 3.884 Exam marks b) Estimated Exam marks = 19.914 + 3.884 Attendance c) Attendance = 19.914 + 3.884 Exam marks
d) Exam marks = 19.914 + 3.884 Attendance e) None of these choices is correct.
34) According to this model, which of the following best describes the estimated effect
on the exam mark if students attend one extra BES tutorial during the semester? a) All students get an extra 19.914 points.
b) All students get an extra 3.884 points. c) Students get an extra 3.884 points, on average.
d) Students get an extra 19.914 points, on average. e) Students get an extra 23.754 points.
35) One student who attended 6 tutorials during the semester, scored 88 points in the
final exam. What final mark did the regression model predict for this student? a) 23.3 points.
b) 43.2 points. c) 88 points. d) 29.8 points.
e) 23.8 points.
36) What percentage of the variation in exam marks is accounted for by this linear
relationship with tutorial attendance?
a) 81.1% b) 65.8%
c) 64.5% d) 19.9%
e) 13.9%
37) Using this regression output of exam marks on tutorial attendance; find the residual of a student who attended 4 tutorials during the semester, given that they
actually scored 50 points in the exam. a) 50 – 19.914 = 30.09 points.
b) 50 – 3.884(4) = 34.46 points. c) 3.884(4) = 15.54 points. d) 19.914 + 3.884(4) = 35.45 points.
e) 50 – [19.914 + 3.884(4)] = 14.55 points.
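Questions 33) to 37) all use the fitted line from the Excel output: a prediction is intercept plus slope times attendance, and a residual is the actual mark minus the predicted mark. The sketch below hard-codes the coefficients and sums of squares from the output; the function names are illustrative.

```python
# Coefficients and sums of squares copied from the Excel regression output.
INTERCEPT = 19.914
SLOPE = 3.884
SS_REGRESSION = 9413.176
SS_TOTAL = 14301.407


def predict(attendance):
    """Estimated exam mark for a given number of tutorials attended."""
    return INTERCEPT + SLOPE * attendance


def residual(actual_mark, attendance):
    """Actual mark minus predicted mark; positive means above the line."""
    return actual_mark - predict(attendance)


# R squared is the share of total variation explained by the regression.
r_squared = SS_REGRESSION / SS_TOTAL

print(round(predict(6), 1))       # prediction for 6 tutorials
print(round(residual(50, 4), 2))  # residual for a mark of 50 at 4 tutorials
print(round(r_squared, 3))
```

The slope is an average effect per extra tutorial, which is why individual students can sit well above or below the predicted value.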
38) A random spinner can land on red, green, blue, or yellow. If on the first three spins it lands once each on red, green, and yellow, is it more likely to land on blue on the fourth spin?
a) No, because the spins are disjoint events.
b) No, because knowing one outcome will not affect the next.
c) Yes, because the Law of Averages dictates that all outcomes should occur.
d) Yes, because every colour is equally likely to occur.
e) Yes, because the spinner shows randomness.
39) Suppose you roll 2 dice; which of the following are disjoint (mutually exclusive)
events? a) Getting a sum of 6; getting doubles.
b) Getting a sum of 2; getting doubles. c) Getting a 1 on the first die; getting a sum of 6.
d) Getting a 5 on the first die; getting doubles. e) Getting a sum of 7; getting doubles.
40) If A and B are 2 events such that P(A) = 0.6, P(B) = 0.58 and P(A or B) = 0.70, what is P(A and B)?
a) 0.48 b) 1.18 c) 0.70
d) 0.30 e) 0.35
41) If A and B are two independent events such that P(A) = 0.3, P(B) = 0.4 what is P(A given B)?
a) 0.40 b) 0.30
c) 0.12 d) 0.01 e) 0 because events are independent
42) If A and B are two events such that P(A) = 0.35 and P(B) = 0.45 and
P(A and B) = 0.20, what is P(B given A)? a) 0.80 b) 0.60
c) 0.20 d) 0.57
e) None of these choices is correct.
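Questions 40) to 42) rely on two rules: the general addition rule, P(A or B) = P(A) + P(B) − P(A and B), and the definition of conditional probability, P(B given A) = P(A and B) / P(A). A minimal sketch with illustrative function names:

```python
def p_and_from_or(p_a, p_b, p_a_or_b):
    """Rearranged addition rule: P(A and B) = P(A) + P(B) - P(A or B)."""
    return p_a + p_b - p_a_or_b


def p_given(p_joint, p_condition):
    """Conditional probability: P(B given A) = P(A and B) / P(A)."""
    return p_joint / p_condition


# Question 40 figures.
q40 = p_and_from_or(0.6, 0.58, 0.70)

# Question 41: for independent events P(A and B) = P(A) * P(B),
# so conditioning on B leaves P(A) unchanged.
q41 = p_given(0.3 * 0.4, 0.4)

# Question 42 figures.
q42 = p_given(0.20, 0.35)

print(round(q40, 2), round(q41, 2), round(q42, 2))
```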
43) Which of the following statements is correct?
a) Independent events cannot be disjoint events. b) Marginal events are conditional events.
c) Joint events are disjoint. d) Disjoint events cannot be independent events. e) Marginal events are joint events.
44) Sue owns a medium-sized business. The probability model below describes the
number of employees that may call in sick on any given day. What is the expected value of the number of employees calling in sick each day?
a) 2.50 employees b) 2.00 employees
c) 1.80 employees d) 1.00 employees
e) 1.85 employees
45) Find the expected value for the random variable X which has probability model:

X      P(X)
4      0.3
8      0.5
11     ?

a) 0.80 units
b) 0.20 units
c) 7.67 units
d) 5.20 units
e) 7.40 units
46) A new lottery ticket on the market has players pay $5 to have the chance to win up to $1,000 instantly in cash prizes. Let X be the discrete random variable of cash prizes available. The table below shows the discrete probability distribution of the possible cash prizes available to players, with all of the corresponding probabilities. Find the standard deviation of the cash prize per game, for this $5 instant lottery ticket, given that the expected value is $3.24.

X        P(X)
0        0.65
5        0.25
10       0.099
1,000    0.001

a) 1,005.65 dollars²
b) 3.24 dollars
c) 31.71 dollars
d) 1,016.15 dollars²
e) 10.50 dollars²
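For a discrete random variable, the expected value is the probability-weighted sum of the outcomes and the variance is the probability-weighted sum of squared deviations from that expected value. The sketch below applies both to the lottery table in Question 46); the dictionary layout and function names are illustrative.

```python
from math import sqrt


def expected_value(dist):
    """Sum of outcome times probability over all outcomes."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Probability-weighted squared deviations from the expected value."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Cash prize (dollars) mapped to its probability, from the question's table.
lottery = {0: 0.65, 5: 0.25, 10: 0.099, 1000: 0.001}

mu = expected_value(lottery)
var = variance(lottery)   # in dollars squared
sd = sqrt(var)            # back in dollars

print(round(mu, 2), round(var, 2), round(sd, 2))
```

Note the units: the variance is in dollars squared, while the standard deviation is in dollars.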
47) A discrete random variable X takes the values −2, −1, 1 and 2 but we are not
given the corresponding probability distribution. Which of the following statements is true?
a) The expected value could be 0.75 b) The expected value must be one of −2, −1, 1 or 2 c) The expected value could be 3
d) The expected value can only be 0 e) The expected value can only be 1.5
48) If X and Y are independent random variables with E[X] = 10, E[Y] = 12, V[X] = 9 and V[Y] = 16, then find the expected value of X – Y.
a) 2
b) −2 c) 22 d) −7
e) None of these choices is correct.
49) If X and Y are independent random variables with E[X] = 10, E[Y] = 12, V[X] = 9 and V[Y] = 16, then find the standard deviation of 2X – Y + 4.
a) √68 b) √52
c) √36 d) √20 e) √12
50) A traveller visited Europe and stays 30 days in 30 different hotels, paying each day with a credit card. The hotels charge an average of 50 Euros with standard deviation of 10 Euros. When the charges appear on the credit card, the Bank has
converted them to AUD by saying that 1 euro is $1.40, and has added a $5 fee for each transaction. What are the mean and standard deviation of the 30 day hotel
charges in AUD including the transactions fee? a) Mean $50, standard deviation $19
b) Mean $70, standard deviation $14 c) Mean $70, standard deviation $19
d) Mean $75, standard deviation $14 e) Mean $75, standard deviation $19
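Questions 48) to 50) use the rules for linear combinations: for independent X and Y, E[aX + bY + c] = aE[X] + bE[Y] + c and V[aX + bY + c] = a²V[X] + b²V[Y]. An added constant shifts the mean but does not change the spread. A minimal sketch with illustrative function names:

```python
from math import sqrt


def combo_mean(a, mean_x, b=0.0, mean_y=0.0, c=0.0):
    """Mean of aX + bY + c."""
    return a * mean_x + b * mean_y + c


def combo_variance(a, var_x, b=0.0, var_y=0.0):
    """Variance of aX + bY + c for independent X and Y (c has no effect)."""
    return a ** 2 * var_x + b ** 2 * var_y


# Question 48: X - Y with E[X] = 10 and E[Y] = 12.
q48_mean = combo_mean(1, 10, -1, 12)

# Question 49: 2X - Y + 4 with V[X] = 9 and V[Y] = 16.
q49_variance = combo_variance(2, 9, -1, 16)

# Question 50: each charge in AUD is 1.40 * (charge in euros) + 5.
q50_mean = combo_mean(1.40, 50, c=5)
q50_sd = sqrt(combo_variance(1.40, 10 ** 2))

print(q48_mean, q49_variance, round(q50_mean, 2), round(q50_sd, 2))
```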
51) In a standard Normal model, which values of z cut off the middle 96%?
a) −3.00 to 3.00 b) 0 to 2.05
c) −1.75 to 1.75 d) −2.33 to 2.33 e) −2.05 to 2.05
52) Given that a normally distributed random variable has a mean of 85 and a
standard deviation of 8, what is the probability that one randomly selected value
will be less than 63? a) 0.9970
b) 1 c) 0
d) 0.0030 e) −2.75
53) The mean of a normal probability distribution is 50 with a variance of 36. What is the probability that one randomly selected value will be greater than 60?
a) 0.9525 b) 0.3897 c) 0.0475
d) 0.6103 e) 1.67
54) The mean of a normal probability distribution is 87 and the standard deviation is 4. What is the probability of a value between 82 and 90?
a) 0.7734 b) 0.1056
c) 0.8944 d) 0.2266 e) 0.6678
55) Find σ in a Normal model with μ = 0.38 if 20.05% of values are above 0.50 a) 0.20
b) 0.1 c) 0.14 d) 0.84
e) 1.43
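The Normal-model calculations in Questions 51) to 55) can be checked without tables. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist` with `cdf` and `inv_cdf` methods. Results may differ from table-based answers in the third or fourth decimal place because tables round z to two decimals.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Q51: the middle 96% leaves 2% in each tail, so find the 98th percentile.
z_cut = standard.inv_cdf(0.98)

# Q52: P(X < 63) when the mean is 85 and the standard deviation is 8.
p_below_63 = NormalDist(85, 8).cdf(63)

# Q53: P(X > 60) when the mean is 50 and the variance is 36 (sd = 6).
p_above_60 = 1 - NormalDist(50, 6).cdf(60)

# Q54: P(82 < X < 90) when the mean is 87 and the standard deviation is 4.
model = NormalDist(87, 4)
p_between = model.cdf(90) - model.cdf(82)

# Q55: 20.05% above 0.50 means 0.50 sits at the 79.95th percentile,
# so sigma = (0.50 - mean) / z for that percentile.
z_q55 = standard.inv_cdf(1 - 0.2005)
sigma = (0.50 - 0.38) / z_q55

print(round(z_cut, 2), round(p_below_63, 4), round(p_above_60, 4))
print(round(p_between, 4), round(sigma, 2))
```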
56) The proportion of people who agree with building a hotel at Glenelg is 0.40. A
random sample of size 100 is drawn. What is the standard deviation of the sampling distribution of the sample proportion?
a) 0.002 b) 0.049 c) 0.240
d) 0.005 e) 0.490
57) It is known that 60% of voters in a large electorate are in favour of a particular policy. A random sample of 48 voters is taken from the electorate. What is the
probability that less than 24 of those sampled are in favour of the policy? a) 0.079
b) 0 c) 0.421 d) 0.579
e) 0.921
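The standard deviation of the sampling distribution of a sample proportion is √(p(1 − p)/n). The sketch below applies it to Questions 56) and 57), assuming the Normal approximation without a continuity correction for Question 57); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_sd(p, n):
    """Standard deviation of the sample proportion for sample size n."""
    return sqrt(p * (1 - p) / n)


# Q56: p = 0.40, n = 100.
sd_hotel = proportion_sd(0.40, 100)

# Q57: p = 0.60, n = 48; fewer than 24 in favour means a sample
# proportion below 24/48 = 0.50 (Normal approximation, an assumption here).
sd_policy = proportion_sd(0.60, 48)
p_fewer_than_24 = NormalDist(0.60, sd_policy).cdf(24 / 48)

print(round(sd_hotel, 3), round(p_fewer_than_24, 3))
```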
58) Which of the following best describes what we mean by the sampling distribution
of the sample mean?
a) The sampling distribution of the sample mean will be approximately normal even if the original population is not normal.
b) The sampling distribution of the sample mean is the population from which the sample is drawn.
c) The sampling distribution of the sample mean shows us the distribution of
possible values that the sample mean could take, generated by calculating the sample mean for repeated samples, all of the same size, from the original
population. d) The sampling distribution of the sample mean will be approximately normal if
we take many samples. e) None of these choices is correct.
59) A random sample of size 3 is drawn from a population with mean 7 and variance 9. What is the standard deviation of the sampling distribution of the sample mean?
a) 9 b) 1.73 c) 3.0
d) 5.2 e) 1.0
60) A random sample of n=10 is drawn from a normal population with mean 75 &
variance 90: find the probability that the sample mean of this sample exceeds 81 a) 0.9772
b) 0.9721 c) 0.7643 d) 0.0228
e) 2
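For the sample mean, the standard deviation of the sampling distribution is √(σ²/n). The sketch below checks Questions 59) and 60), assuming Python's `statistics.NormalDist` for the tail probability; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_sd(population_variance, n):
    """Standard deviation of the sample mean for sample size n."""
    return sqrt(population_variance / n)


# Q59: variance 9, n = 3.
sd_q59 = mean_sd(9, 3)

# Q60: Normal population, mean 75, variance 90, n = 10.
sd_q60 = mean_sd(90, 10)
p_exceeds_81 = 1 - NormalDist(75, sd_q60).cdf(81)

print(round(sd_q59, 2), round(sd_q60, 2), round(p_exceeds_81, 4))
```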
61) In one of the mini-projects, a team calculated a 95% confidence interval in order
to estimate the proportion of people in Adelaide who own Apple branded phones. Assume all required conditions are met. What is the confidence interval if their
sample shows that 28 out of 55 people in Adelaide own Apple phones?
a) 0.5091 ± 1.645(0.0045) = (0.5017, 0.5165) b) 0.5091 ± 1.96(0.0674) = (0.377, 0.6412) c) 0.5091 ± 2.33(0.0674) = (0.3521, 0.6661)
d) 0.5091 ± 1.645(0.0674) = (0.3982, 0.62) e) 0.5091 ± 1.96(0.0045) = (0.5003, 0.5179)
62) In one of the mini-projects, a BES team wanted to estimate the proportion of Business students who think that knowing a foreign language is useful in today’s workforce. What sized sample should they have taken if they wanted a 95% CI to
have a margin of error of 4%? a) 25
b) 2401 c) 151 d) 423
e) 601
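Questions 61) and 62) use the one-proportion z-interval, p̂ ± z*·√(p̂(1 − p̂)/n), and the sample-size formula n = z*²·p(1 − p)/ME². The sketch below takes p = 0.5 as the conservative planning value for Question 62), which is an assumption since the question gives no prior estimate; function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size(margin, confidence, p=0.5):
    """Smallest n giving at most the requested margin of error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z_star ** 2 * p * (1 - p) / margin ** 2)


low, high = proportion_ci(28, 55, 0.95)  # Q61
n_needed = sample_size(0.04, 0.95)       # Q62, assuming p = 0.5

print(round(low, 3), round(high, 3), n_needed)
```

Rounding the sample size up rather than to the nearest integer keeps the margin of error at or below the target.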
63) Data collected by child development scientists produced the following 90%
confidence interval for the average age (in months) at which children say their first word: 10.4 < μ(age) < 13.8.
Interpret the confidence interval: a) If we took many random samples of children, about 90% of them would
produce this confidence interval. b) We can say with 90% confidence that the mean age at which children say their
first word is between 10.4 and 13.8 months. c) We are 90% confident that a child will say his first word when he is between
10.4 and 13.8 months old.
d) 90% of the children in this sample said their first word when they were between 10.4 and 13.8 months old.
e) We are 90% confident that the average age at which children in this sample said their first word was between 10.4 and 13.8 months.
64) When constructing confidence intervals, for a given level of confidence, if the
sample size is decreased, a) the interval will include the parameter less often.
b) the width of the interval remains the same. c) the width of the interval increases. d) the width of the interval decreases.
e) None of these choices is correct.
65) Suppose we use z tables instead of t-tables in estimating a CI for the mean, when
the t-table was appropriate. Which of the following is true? Assume everything else is unchanged. a) Since the t-distribution involves more variability than z, the CI using the z tables
will be wider than it should be b) Since the t-distribution involves less variability than z, the CI using the z tables
will be wider than it should be c) Since the t-distribution involves more variability than z, the CI using the z tables
will be narrower than it should be d) Since the t-distribution involves less variability than z, the CI using the z tables
will be narrower than it should be e) The answer depends on the level of confidence
66) Calculate the width of a 90% CI for μ if a sample of 16 gave a mean of 25 and variance 36. Assume the conditions are met.
a) 1.753 × (6/4) = 2.6295 b) 2 × [1.753 × (36/4)] = 31.554
c) 1.645 × (6/4) = 2.4675 d) 2 × [1.645 ×(6/4)] = 4.935 e) 2 × [1.753 × (6/4)] = 5.259
67) Suppose we wanted to estimate the average amount that students spend on lunch each week of the semester. A random sample of 25 students gave an average of
$37.50 with standard deviation $3.20. What are the correct calculations for an 80% CI? (Assume all conditions are met). a) 37.50 ± 1.282(3.20/√25) = (36.68, 38.32) in dollars
b) 37.50 ± 1.318(3.20/√25) = (36.66, 38.34) in dollars c) 37.50 ± 1.282(3.20/25) = (37.34, 37.66) in dollars
d) 37.50 ± 1.318(3.20/25) = (37.33, 37.67) in dollars e) None of these choices is correct.
68) Which set of circumstances is most likely to result in a narrow confidence interval?
a) Large n and a 95% confidence interval b) Large n and a 99% confidence interval
c) Small n and a 95% confidence interval d) Small n and a 99% confidence interval
e) Any n and a 99.5% confidence interval
69) Which of the following is true about the null hypothesis, symbolized by Hₒ?
a) It represents a proposition about an unknown population parameter that is tentatively assumed to be true.
b) It represents a proposition about an unknown statistic that is tentatively assumed to be true.
c) It never represents a proposition about an unknown population parameter that is tentatively assumed to be true.
d) It is the hypothesis that is declared null and void at the completion of a hypothesis test.
e) It is the hypothesis that is of no interest.
70) A company is testing the proportion of defective parts in a manufacturing process. If there are more than 5% of parts that are defective, the machinery must be stopped and serviced before production can resume. Which of the following is the
correct set of hypotheses that would be used to test whether the machinery needs to be serviced? Hint: p = population proportion of defective parts
(a) HO: p= 0.05 vs HA: p ≠ 0.05 (b) HO: p= 0.05 vs HA: p < 0.05 (c) HO: p > 0.05 vs HA: p = 0.05
(d) HO: p < 0.05 vs HA: p > 0.05 (e) HO: p= 0.05 vs HA: p > 0.05
71) A company is testing the proportion of defective parts in a manufacturing process.
If there are more than 5% of parts that are defective, the machinery must be stopped and serviced, before production can resume. Suppose they take a random sample of 200 items from a large batch of parts (assume there were thousands of
parts in the batch) and of these 20 were defective. Which of the following is the correct description of how the conditions are checked?
(a) Sample is random Need at least 2,000 parts in the population of parts being manufactured; told
thousands of parts in a batch Number of successes = np = 200(0.10) = 20 > 10 Number of failures = nq = 200(0.90) = 180 > 10
(b) Random sample
Less than 10% of population Number of successes > 10
Number of failures > 10 (c) This is a large sample so the conditions are satisfied.
(d) Sample is random
Need at least 2,000 parts in the population of parts being manufactured; told thousands of parts in a batch Number of successes = np = 200(0.05) = 10 ≥ 10
Number of failures = nq = 200(0.95) = 190 > 10
(e) None of these choices is correct.
Consider the following to answer the next 4 questions:
A report on the U.S. economy indicates that 28% of Americans have experienced difficulty in making mortgage payments. A news organization randomly sampled
400 Americans from 10 cities named the “fastest dying cities in the U.S.” (Forbes Magazine, August 2008) and found that 136 reported such difficulty. Does this
indicate that the problem is more severe among these cities? Hint: p = population proportion of Americans who have experienced difficulty making mortgage repayments.
72) What are the correct null and alternative hypotheses?
a) Ho: p ≠ 0.28 and HA: p = 0.28 b) Ho: p = 0.28 and HA: p < 0.28 c) Ho: p = 0.28 and HA: p ≠ 0.28
d) Ho: p > 0.28 and HA: p = 0.28 e) Ho: p = 0.28 and HA: p > 0.28
73) What is the correct value of the test statistic? a) z = 119.05 b) z = 2.53
c) z = 2.67 d) z = −2.67
e) z = −2.53
74) What is the correct P-value associated with the above test statistic? a) 0.9962 b) 0.9943
c) 0 d) 0.0057
e) 0.0038
75) At α = 0.05, what conclusion can we draw from the above test?
a) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is significantly higher than 28%.
b) We can conclude that the percentage of Americans in these cities experiencing
difficulty making mortgage payments is significantly lower than 28%.
c) We can conclude that the percentage of Americans in these cities experiencing
difficulty making mortgage payments is not significantly different from 28%.
d) We can conclude that the percentage of Americans in these cities experiencing difficulty making mortgage payments is approximately equal to 28%.
e) We can conclude that the percentage of Americans in these cities experiencing
difficulty making mortgage payments is exactly equal to 28%.
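The test in Questions 72) to 75) is a one-proportion z-test with an upper-tail alternative. The standard error uses the hypothesised proportion, not the sample proportion. A minimal sketch, assuming Python's `statistics.NormalDist` for the P-value; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_null = 0.28         # hypothesised proportion under the null hypothesis
n = 400               # sample size
p_hat = 136 / n       # sample proportion reporting difficulty
alpha = 0.05

# Standard error is computed from the null value.
standard_error = sqrt(p_null * (1 - p_null) / n)
z = (p_hat - p_null) / standard_error

# Upper-tail P-value because the alternative is p > 0.28.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha

print(round(z, 2), round(p_value, 4), reject_null)
```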
76) A study is interested in whether or not adults sleep a recommended 8 hours per night, on average. To investigate this, a random sample of 30 adults is taken, and
the sample mean is found to be 7.5 hours with standard deviation of 0.90 hours. Assuming that the conditions have been satisfied, which of the following is the correct critical value, if testing at the 5% level of significance?
a) −1.96 b) +/− 2.045
c) −2.045 d) +/ −1.96 e) +/− 1.699
77) A canning company is concerned about the real average amount of sweet corn it is putting into its cans. The machine is set to 450 grams. A sample of 100 cans is
taken randomly and their average weight is 448 grams and the standard deviation is 1.9 grams. The canner is testing the null hypothesis that the average weight of all cans is 450 grams. Assume the conditions are satisfied. Which of the following
is the correct value of the calculated test statistic? a) −10.526
b) −1.96 c) −105.263
d) −1.053 e) Cannot be determined because we do not know if it is a 1 or 2 tail test.
78) Suppose a random sample of size 49 is selected from a population with mean μ,
the value of which is unknown. The sample statistics are ȳ = 6.4 and s = 14. The hypothesis test is Ho: μ = 10 against Ha: μ < 10 using α = 0.05.
Then which of the following is the correct decision? a) The calculated value of the test statistic is −1.8 and Ho is retained. b) The calculated value of the test statistic is −1.8 and Ho is rejected.
c) A type I error must have been committed. d) The calculated value of the test statistic is −0.257 and Ho is retained.
e) The calculated value of the test statistic is −0.257 and Ho is rejected.
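The test statistic for a one-sample t-test of a mean follows the pattern t = (sample mean − hypothesised mean) / (s / √n), with n − 1 degrees of freedom. A minimal Python sketch, using hypothetical figures rather than the data in the questions above:

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """t = (sample mean - hypothesised mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Hypothetical sample: mean 98, s = 8, n = 64, testing Ho: mu = 100
n = 64
t = t_statistic(98.0, 100.0, 8.0, n)
df = n - 1   # degrees of freedom used to look up the critical t value
print(t, df)
```

The calculated t is then compared with the critical value from the t table for the chosen level of significance and the direction of the alternative hypothesis.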
79) Condemning an innocent defendant in a criminal trial is equivalent to committing what sort of decision in classical hypothesis testing?
a) A type I error. b) A type II error.
c) A correct decision. d) Sampling error. e) None of these choices is correct.
80) When using the chi-square distribution for tests of independence, we calculate the
expected frequencies by multiplying the row total by the column total and then dividing by the grand total. Why is this the correct way of determining expected
frequencies?
a) Because the row and column totals must equal the grand total.
b) Because the expected frequencies must be in the same proportion as the row totals.
c) Because this follows from the rules of probability if the null hypothesis (independence) is true.
d) Because the expected frequencies must be in the same proportion as the column totals.
e) Because the joint probability always equals the product of the 2 marginals.
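The rule stated in the question (row total × column total ÷ grand total) can be sketched in Python as below. The 2 × 2 table of observed counts is hypothetical and is used only to show the mechanics, including the chi-square statistic and its degrees of freedom, (rows − 1) × (columns − 1).

```python
def expected_frequencies(observed):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (2 rows, 2 columns)
observed = [[20, 30],
            [30, 20]]
expected = expected_frequencies(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, chi_sq, df)
```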
81) In a chi-square test of independence at the 5% level, with a table consisting of 5
rows and 3 columns, the correct critical value would be: a) 23.685
b) 18.307 c) 17.535 d) 15.507
e) 3.841
The next 4 questions relate to the following EXCEL output. The explanatory variable is INCOME (personal disposable income in billions of $).
The response variable is IMPORTS (expenditure on imports in billions of $).
82) The estimated slope was 0.246. Which of the following is correct:
a) When testing the significance of the slope, we should use an upper tail test
because the estimated slope is positive. b) When testing the significance of the slope, we should use an upper tail test
because economic theory suggests that imports increase as income increases. c) When testing the significance of the slope, we should use a lower tail test
because the estimated slope is positive.
d) When testing the significance of the slope, we should use a 2-tail test unless someone asks us to do otherwise.
e) None of these choices is correct.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.967
R Square 0.936
Adjusted R Square 0.932
Standard Error 22.305
Observations 20
ANOVA
df SS MS F Significance F
Regression 1 130904.793 130904.793 263.121 3.46E-12
Residual 18 8955.157 497.509
Total 19 139859.95
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -260.771 32.067 -8.132 0.000 -328.14 -193.401
Income 0.246 0.015 16.221 0.000 0.214 0.277
83) This question relates to the EXCEL output of Income/Imports.
When testing whether the slope is significantly above 0, the appropriate conclusion would be to:
a) Read off the P-value from the output and reject Ho because the P-value is low.
b) Read off the P-value from the output and retain Ho because the P-value is low.
c) The correct p-value to use here is (Excel P-value)/2 from the output, and retain Ho because this is low.
d) The correct p-value to use here is (Excel P-value)/2 from the output, and reject Ho because this is low.
e) None of these choices is correct.
84) This question relates to the EXCEL output of Income/Imports. Suppose we wish to test whether the slope is significantly different from 0.25.
Which of the following is the correct value of the calculated t value for the t test? a) 0
b) −0.004 c) −0.267 d) 16.221
e) −8.132
85) This question relates to the EXCEL output of Income/Imports. Suppose we wish to test whether the slope is significantly different from 0.25.
Which of the following is the correct value of the critical t value for the t test at the 5% level? a) +/− 1.960
b) +/− 2.101 c) +/− 1.734
d) +/− 2.093 e) +/− 1.729
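When the slope is tested against a hypothesised value other than 0, the t Stat printed by EXCEL (which tests against 0) does not apply directly; the statistic has the form t = (estimated slope − hypothesised slope) / (standard error of the slope), with n − 2 degrees of freedom in simple regression. A minimal Python sketch with hypothetical numbers, not the Income/Imports output:

```python
def slope_t(b1, hypothesised_slope, se_b1):
    """t statistic for Ho: slope = hypothesised_slope."""
    return (b1 - hypothesised_slope) / se_b1


# Hypothetical regression: estimated slope 1.8, standard error 0.1, n = 25
n = 25
t = slope_t(1.8, 2.0, 0.1)
df = n - 2   # simple regression estimates two parameters (intercept and slope)
print(round(t, 3), df)
```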
86) Look at point A in this scatter plot and decide whether the point has high or low leverage, whether or not it is influential and whether its residual is likely to be high
or low?
a) High leverage, not influential, large residual.
b) High leverage, not influential, low residual.
c) High leverage, influential, low residual.
d) Low leverage, not influential, low residual.
e) Low leverage, influential, high residual.
[Scatter plot for Question 86, with point A marked.]
87) Look at point B in this scatter plot and decide whether the point has high or low
leverage, whether or not it is influential and whether its residual is likely to be high or low?
a) High leverage, not very influential, low residual. b) High leverage, influential, hard to say what the residual would be.
c) High leverage, not very influential, large residual d) High leverage, influential, low residual.
e) Low leverage, not very influential, possibly low residual.
88) Monthly closing share prices for a utility company were obtained from January 2007 through August 2008. A regression model was estimated to describe the
trend in closing share prices over time. What does the plot of residuals below suggest?
[Residual plot: Versus Fits (response is Closing Price). Horizontal axis: Fitted Value, from 40 to 52. Vertical axis: Residual, from −7.5 to 5.0.]
a) An outlier is present in the data set.
b) The linearity condition is not satisfied. c) A high leverage point is present in the data set. d) The data are not normal.
e) The independence condition is not satisfied.
[Scatter plot for Question 87, with point B marked.]
Use the output below to answer the NEXT SIX QUESTIONS.
The following EXCEL output is for a regression of Y on X1, X2 and X3. Here β0 refers to the intercept, β1 is the coefficient (or slope) on X1,
β2 is the coefficient (or slope) on X2 and β3 is the coefficient (or slope) on X3.
89) This question relates to the Excel output of the regression of Y on X1, X2 and X3.
As X2 increases by 1 unit, after allowing for the effect of X1 and X3, what do we
estimate happens to Y, on average?
a) Increases by 15433.76
b) Increases by 28.92
c) Increases by 172.22
d) Decreases by 5.39
e) Increases by 15629.51
90) This question relates to the Excel output of Y on X1, X2 and X3.
What is the correct value of the test statistic to test the significance of X2?
a) 1.45
b) 2.58
c) −2.92
d) 2.33
e) 0.16
91) This question relates to the Excel output of Y on X1, X2 and X3.
What is the correct null hypothesis to test the overall usefulness of this model?
a) H0: β1 = β2 = β3 = 1
b) H0: β1 = β2 = β3
c) H0: β0 = β1 = β2 = β3 = 1
d) H0: β1 = β2 = β3 = 0
e) H0: β0 = β1 = β2 = β3 = 0
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.60
R Square 0.35
Adjusted R Square 0.28
Standard Error 14795.73
Observations 30
ANOVA
df SS MS F Significance F
Regression 3 3125697695 1041899232 4.76 0.01
Residual 26 5691753717 218913604
Total 29 8817451411
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 15433.76 6610.97 2.33 0.03 1844.72 29022.80
X1 28.92 11.20 2.58 0.02 5.90 51.94
X2 172.22 119.00 1.45 0.16 -72.38 416.82
X3 -5.39 1.85 -2.92 0.01 -9.19 -1.59
92) This question relates to the Excel output of Y on X1, X2 and X3.
What is the correct p-value to test the overall usefulness of this model?
a) 0.005
b) 0.01
c) 0.02
d) 0.03
e) 0.16
93) This question relates to the Excel output of Y on X1, X2 and X3.
What number of degrees of freedom do we use for a t-test of significance of the population intercept or to individually test the coefficients of any of the X variables? a) 29
b) 2 c) 3
d) 30 e) 26
94) What is the correct conclusion from testing the overall usefulness of the model AND from tests on the individual coefficients? (Use the 5% level of significance)
a) The equation as a whole is significant but none of the individual variables are
significant.
b) The equation as a whole is not significant and neither are any of the individual
variables.
c) The equation as a whole is not significant although X2 is significant.
d) The equation as a whole is significant and so are X1 and X3.
e) There has been an error because it is not possible for the individual coefficients
to have different significance from the overall equation.
95) When is a dummy variable used as an explanatory variable in a regression model?
a) When two independent variables interact. b) When the variable involved is quantitative.
c) When a non-linear relationship is suspected. d) When data for the variable of interest is not available and the researcher uses a
made up proxy variable. e) When the variable involved is qualitative.
96) In the equation: Estimated Salary = 100,000 + 500Year + 1,000Gender
where Salary is the annual salary of a person,
Year is the number of years of experience in the job and Gender is a dummy variable with Gender = 1 for males and 0 for females.
What is the correct interpretation of the coefficient on Gender?
a) Males earn $1,000 more than females.
b) Females earn $1,000 more than males.
c) We estimate that males earn $1,000 more, on average, than females with the same number of years of experience in the job.
d) A male earns $1,000 more than a female with the same number of years of experience in the job.
e) On average, females earn $1,000 more than males with the same number of years of experience in the job.
97) The following linear regression equation estimates the relationship between the
selling price ($) of a particular model of sports car based on the age of the car (years) and the colour of the car, where Colour = 1 for a red sports car and
Colour = 0 for other colours of the sports car.
Estimated Selling Price = 250,000 – 10,000Year + 2,500Colour
Which of the following best describes the coefficient of the dummy variable,
Colour?
(a) We estimate that a red sports car of this particular model will sell for $2,500 more than any other colour.
(b) While holding age constant, we estimate that a red sports car of this particular model will sell for $2,500 less, on average, than any other colour.
(c) We estimate that a red sports car of this particular model will sell for $2,500 more, on average, than any other colour.
(d) While holding age constant, we estimate that a red sports car of this particular model will sell for $2,500 more, on average, than any other colour.
(e) None of these choices is correct.
98) The following linear regression equation estimates the relationship between the
selling price ($) of a particular model of sports car based on the age of the car (years) and the colour of the car, where Colour = 1 for a red sports car and Colour = 0 for other colours of
the sports car.
Estimated Selling Price = 250,000 – 10,000Year + 2,500Colour
Which of the following is the estimated selling price of a black sports car of this particular model that is 10 years old?
(a) $152,500 (b) $150,000 (c) $240,000
(d) $100,000 (e) $242,500
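A prediction from an equation that contains a dummy variable is obtained by setting the dummy to 1 or 0 and substituting the other explanatory values. The Python sketch below uses a hypothetical equation (Estimated Price = 30,000 − 2,000Age + 1,500Auto, with Auto = 1 for an automatic car), not the equations in the questions above.

```python
def predict_price(age, auto):
    """Hypothetical fitted equation: 30,000 - 2,000*Age + 1,500*Auto."""
    return 30_000 - 2_000 * age + 1_500 * auto


# Two cars of the same age that differ only in the dummy variable
with_dummy = predict_price(age=5, auto=1)
without_dummy = predict_price(age=5, auto=0)

# The gap between the two predictions equals the dummy coefficient,
# because age is held constant.
print(with_dummy, without_dummy, with_dummy - without_dummy)
```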
99) Here is the result from EXCEL of a regression of the score given by consumers to gourmet pizzas (score) on the fat content of the pizza (fat) and the type of pizza (Type) where Type = 1 for a cheese pizza and 0 for a pepperoni pizza.
Score = −148.817 + 15.634 Type − 3.89 Fat
The p-value for the coefficient on Type is 0.0651. We want to test the significance of the coefficient on Type using a 2-tailed test at α = 5%.
Which of the following is correct? a) This can be done using p-values and Type is not significant at 5%. b) This cannot be done from EXCEL because Type is a dummy variable.
c) This can be done using p-values and Type is significant at 5%. d) This cannot be done using p-values because Type is a dummy variable.
e) The coefficient on Type will always be insignificant because Type is a dummy variable, irrespective of the size of the p-value.
100) Which of the following best defines a p-value? a) The p-value is the probability of getting our population value (or population
results) or more extreme if the null hypothesis was really true, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
b) The p-value is the probability of getting our sample value (or sample results) or
more extreme if the null hypothesis was really false, and as such is a measure
of the statistical evidence in favour of the alternative hypothesis.
c) The p-value is the probability of getting our sample value (or sample results) or more extreme if the null hypothesis was really true, and as such is a measure of
the statistical evidence in favour of the alternative hypothesis.
d) The p-value is the probability of getting our population value (or population
results) or more extreme if the null hypothesis were really false, and as such is a measure of the statistical evidence in favour of the alternative hypothesis.
e) The p-value is always the level of significance in a hypothesis test.
101) This table relates to the number of complaints received in a 6 month period:
If a 3 period moving average is used to smooth this series, what is the value of the
last term calculated? a) 54
b) 114 c) 144
d) 90 e) None of these choices is correct.
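A 3 period moving average replaces each run of three consecutive observations with their mean, so the last term calculated uses the final three observations. A minimal Python sketch on a hypothetical series (not the complaints data):

```python
def moving_average(series, window=3):
    """Mean of each run of `window` consecutive observations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical six-period series
series = [10, 20, 30, 40, 50, 60]
smoothed = moving_average(series)
print(smoothed)       # one value per complete window
print(smoothed[-1])   # last term: mean of the final three observations
```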
102) Here is a time series regression for quarterly data:
Y=10 + 2Q1 + 3Q2 − 4Q3 + 2t
with t in quarters with origin March 2013, where Q1, Q2 and Q3 are dummy variables for the first, second and third quarters respectively.
What can be said about Q4?
a) The coefficient on Q4 must be −1 so that the seasonal effects add to 0. b) Q4 is not included in the equation but it should have been included if we want
to estimate fully the seasonal effect.
c) The coefficient on Q4 must be 3 so that the seasonal effects add to 4. d) There is no dummy variable for Q4 because Q4 is the benchmark variable, to
which the coefficients of the other dummy variables are compared. e) There is no seasonal effect in a fourth quarter.
103) Here is a time series regression for quarterly data:
Y=10 + 2Q1 + 3Q2 − 4Q3 + 2t
with t in quarters with origin March 2007 where Q1, Q2 and Q3 are dummy variables for the first, second and third quarters respectively. Given that by
convention March is quarter 1; June is quarter 2; September is quarter 3; and December is quarter 4, what is the forecast for the June quarter of 2010? a) 37
b) 14 c) 39
d) 41 e) None of these choices is correct.
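A forecast from a trend equation with quarterly dummies needs two inputs: the value of t for the target quarter and the dummy that is switched on in that quarter. The Python sketch below uses a hypothetical equation, Y = 50 + 5Q1 − 3Q2 + 4Q3 + 1.5t, and assumes the origin quarter has t = 0; both the equation and the origin (March 2015) are assumptions for illustration.

```python
def quarters_since(origin_year, origin_quarter, year, quarter):
    """Number of quarters from the origin (t = 0 at the origin quarter)."""
    return (year - origin_year) * 4 + (quarter - origin_quarter)


def forecast(year, quarter):
    """Hypothetical model: Y = 50 + 5*Q1 - 3*Q2 + 4*Q3 + 1.5*t."""
    t = quarters_since(2015, 1, year, quarter)   # origin: March quarter 2015
    dummy_coefficients = {1: 5.0, 2: -3.0, 3: 4.0}
    # No dummy is switched on in quarter 4, so its seasonal term is 0.
    seasonal = dummy_coefficients.get(quarter, 0.0)
    return 50.0 + seasonal + 1.5 * t


sept_2016 = forecast(2016, 3)   # t = 6, Q3 dummy switched on
dec_2015 = forecast(2015, 4)    # t = 3, no dummy switched on
print(sept_2016, dec_2015)
```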
Table for Question 101:
Month Complaints
January 36
February 45
March 81
April 90
May 108
June 144
104) Data have been collected for the price and quantity for the following basket of
goods for 2005 and 2014.
Which of the following is the correct interpretation of the Laspeyres price index for
2014, using 2005 as the base period? a) LP = 177.78, so there has been a 77.78% increase in the price of this basket of
goods from 2005 to 2014.
b) LP = 179.49, so there has been a 79.49% price increase in this basket of goods from 2005 to 2014.
c) LP = 167.44, so there has been a 67.44% price increase in this basket of goods from 2005 to 2014.
d) LP = 162.79, so there has been a 62.79% price increase in this basket of goods from 2005 to 2014.
e) LP = 184.62, so there has been an 84.62% price increase in this basket of goods from 2005 to 2014.
105) Data have been collected for the price and quantity of the following 3 items in
2005, 2010 and 2014, where the products were all home brands of a major supermarket chain.
Which of the following is the correct interpretation of the Paasche price index for 2014 using 2005 as the base?
(a) PP = 335.398, so according to the Paasche price index, prices of this basket of goods have increased by 335.398% from 2005 to 2014.
(b) PP = 315.29, so according to the Paasche price index, prices of this basket of goods have increased by 215.29% from 2005 to 2014.
(c) PP = 346.34, so according to the Paasche price index, prices of this basket of goods have increased by 246.34% from 2005 to 2014.
(d) PP = 335.398, so according to the Paasche price index, prices of this basket of goods have increased by 235.398% from 2005 to 2014.
(e) PP = 319.44, so according to the Paasche price index, prices of this basket of goods have increased by 219.44% from 2005 to 2014.
Table for Question 104:
        2005                     2014
        price  quantity  cost    price  quantity  cost
Good 1  2      10        20      4      8         32
Good 2  3      5         15      5      5         25
Good 3  4      1         4       5      3         15
Table for Question 105:
                2005             2010             2014
                Price  Quantity  Price  Quantity  Price  Quantity
Milk (1 litre)  0.6    2         1.5    14        2      12
Bread (1 loaf)  0.7    3         1.2    11        2.8    14
Water (500 ml)  0.8    1         1.3    5         1.8    7
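The two price indexes differ only in the quantities used as weights: the Laspeyres index values both periods' prices at base period quantities, while the Paasche index values them at current period quantities. A Python sketch with a hypothetical two-item basket (not the baskets tabulated above):

```python
def laspeyres(p0, pt, q0):
    """100 * sum(current price * base quantity) / sum(base price * base quantity)."""
    return 100 * sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))


def paasche(p0, pt, qt):
    """100 * sum(current price * current quantity) / sum(base price * current quantity)."""
    return 100 * sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))


# Hypothetical basket of two goods
base_prices, base_quantities = [1, 2], [10, 5]
current_prices, current_quantities = [2, 3], [8, 6]

lp = laspeyres(base_prices, current_prices, base_quantities)
pp = paasche(base_prices, current_prices, current_quantities)
print(lp, pp)
```

An index of, say, 175 with base 100 corresponds to a 75% price increase over the base period, since the percentage change is the index minus 100.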
106) Which of the following is the best interpretation of a Paasche price index of 78.25, where the base period is 2010 and the current period is 2013?
a) According to the Paasche price index, prices have decreased by 78.25% from
2010 to 2013.
b) According to the Paasche price index, prices have decreased by 178.25% from 2010 to 2013.
c) According to the Paasche price index, prices have decreased by 21.75% from 2013 to 2010.
d) According to the Paasche price index, prices have increased by 21.75% from
2010 to 2013. e) According to the Paasche price index, prices have decreased by 21.75% from
2010 to 2013.
107) In 2000, you earned an annual salary of $42,500 and in 2011, your annual
salary is $64,800. You know that the CPI in 2000 was 124.70 and the CPI in 2011 was 178.50, (CPI = 100 in 1990). What is your real income in 2011?
(a) $64,800
(b) $363.01 (c) $36,302.52 (d) $22,300
(e) $34,081.80
108) If your salary was $34,000 in 2005 and $44,000 in 2010 and the CPI was 100 in
2005 and 162 in 2010, what has happened to your real income?
a) Decreased by 67%. b) Decreased by 20%.
c) Increased by 32%. d) Increased by 68%.
e) Increased by 12%.
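Real income is nominal income deflated by the CPI: real = nominal × 100 / CPI, expressed in base period dollars. The Python sketch below uses hypothetical salaries and CPI values (not those in the questions above) to show the deflation and the percentage change in real income.

```python
def real_income(nominal, cpi, base=100.0):
    """Nominal income expressed in base period dollars."""
    return nominal * base / cpi


# Hypothetical figures: $36,000 when CPI = 100, then $50,000 when CPI = 125
real_then = real_income(36_000, 100.0)
real_now = real_income(50_000, 125.0)
pct_change = (real_now - real_then) / real_then * 100
print(real_then, real_now, round(pct_change, 2))
```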
109) In 2010 the CPI was 172.6 and GDP was 1,357,034 in millions of dollars. What is the value of real GDP in 2010?
a) 786,231 millions of dollars
b) 1,357,034 in millions of dollars c) 7,862 in millions of dollars d) 7,862,306 in millions of dollars
e) Cannot calculate, as we do not know the base period for the CPI.
110) Suppose that one year, the CPI in Hobart was 123 and 144 in Adelaide. They
both have the same base year. Which of the following would be correct? a) Prices in Hobart cannot be lower than prices in Adelaide.
b) Prices in Hobart must be higher than prices in Adelaide. c) Prices in Hobart must be lower than prices in Adelaide. d) Prices rose faster in Adelaide than Hobart from the base to the current year.
e) Something that costs $144 in Adelaide costs $123 in Hobart.
111) If the 2009 price relative for bread, with base 1999=100, is 300, then what can
we say about the price of bread?
a) The price of bread in 2009 has increased by 300% since 1999. b) The price of bread in 2009 has increased by 200% since 1999.
c) The price of bread in 1999 was 200% lower than in 2009. d) The price of bread in 1999 was 100% lower than in 2009.
e) The price of bread is 300 times as much in 2009 as 1999.
112) A 2009 Paasche price index number (with base 2000 = 100) of 235 indicates that quantities that were in fact bought for
a) $100 in 2000 would cost $235 in 2009. b) $235 in 2009 would cost $100 in 2000.
c) $235 in 2000 would cost $100 in 2009. d) $100 in 2009 would cost $235 in 2000. e) $100 in 2000 would cost $335 in 2009
SECTION B: Worked Answer Questions.
SDVV refers to exercises from the textbook, THIRD EDITION. Some Chapter exercises
have changed between the 3rd edition and earlier editions of the text. Hence, the exercises from the 3rd edition of the text are reproduced in this section. Please note that answers to odd numbered questions in the text are at the back of the
book and usually the questions are in pairs – an odd and an even one on a similar topic, so you can do an odd numbered question and check the answer to help you
with the even numbered ones. The exercises from earlier editions of the text are still very good and should be used by students in the same way.
1) SDVV Chapter 1, Exercise 6, page 43. A student finds data on an internet site that contains financial information about selected companies. He plans to analyze the data and the results to develop a stock
investment strategy. What kind of data is he using? What concerns may you have about drawing conclusions from this data set?
2) SDVV Chapter 1, Exercise 20, page 44. Please answer only the following parts of this question. In 2013, Consumer Reports published an article comparing smart phones. It listed
46 phones, giving brand, price, display size, operating system (Android, iOS, or Windows phones), camera image size (megapixels), and whether it had a memory
card slot.
(a) Identify the five W’s plus the “how”.
(b) Identify the quantitative variables and give the units.
(c) Identify the variables that are categorical.
(d) Identify the variables that are time series or cross-sectional.
(e) Are there any concerns?
3) SDVV Chapter 1, Exercise 22, page 44
Please answer only the following parts of this question.
L.L. Bean is a large U.S. retailer that depends heavily on its catalogue sales. It
collects data internally and tracks the number of catalogues mailed out, the number of square inches in each catalogue, and the sales ($ thousands) in the four weeks
following each mailing. The company is interested in learning more about the relationship (if any) among the timing and space of their catalogues and their sales.
(a) Identify the five W’s plus the “how”.
(b) Identify the quantitative variables and give the units.
(c) Identify the variables that are categorical.
(d) Identify the variables that are time series or cross-sectional.
(e) Are there any concerns?
4) SDVV Chapter 8, Exercise 4, page 292 A movie theatre company is interested in the opinions of their frequent customers
about their recently installed online ticketing system. Specifically they want to know what proportion of them plan to use the new ticketing system. They took a random sample of 15,000 customers from their data base and sent them an SMS message
with a request to fill out a survey in exchange for a free ticket to see a movie of their choice. (a) What is the population? (b) What is the sampling frame? (c) What is the population parameter of interest?
(d) What is the sampling method used?
5) Briefly explain why each of the following statements from past BES mini-projects
is incorrect.
(a) Our sample was biased. We should have taken a larger sample to prevent this.
(b) We took a sample of 40 BES students by taking the 2 practicals (tutorials) that our tutor runs and asking the tutor to distribute the survey form to 20 students in each practical. This is stratified sampling.
(c) Our target sample was the number of students who responded and answered YES, they were in the Business School.
(d) This was a random sample. It suffered from convenience sampling and non-response error.
6) SDVV Chapter 8, Exercise 6, page 292
For their class project, a group of Business students decides to survey the student body to assess opinions about a proposed new student coffee shop to judge how successful it might be. Their sample of 200 contained 50 first-year students, 50
sophomores, 50 juniors and 50 seniors. (a) Do you think the group was using an SRS (simple random sample)? Why?
(b) What kind of sampling design do you think they used?
7) SDVV Chapter 8, Exercise 16, page 293 Indicate whether each statement below is true or false. If false, explain why.
(a) Asking viewers to call into an 800 number is a good way to produce a representative sample.
(b) When writing a survey, it’s a good idea to include as many questions as possible to ensure efficiency and to lower costs.
(c) A recent poll on a website was valid because the sample size was over 1,000,000 respondents. (d) Malls are not necessarily good places to conduct surveys because people who
frequent malls may not be representative of the population at large.
8) SDVV Chapter 8, Exercise 26, page 294
Hoping to learn what issues may resonate with voters in the coming election, the campaign director for the mayoral candidate selects one block at random from each
of the city’s election districts. Staff members go there and interview all the residents they can find. Identify the following items (if possible). If you can’t tell, then say so. (a) The population (b) The population parameter of interest
(c) The sampling frame (d) The sample (e) The sampling method, including whether or not randomization was employed
(f) Any potential sources of bias you can detect and any problems you can see in generalizing to the population of interest.
9) Here are two entries from a previous semester’s “Best of the Worst graphic”
competition. Critically assess the displays.
(a)
(b)
10) SDVV Chapter 2, Exercises 2, 4 and 8, pages 67 and 68
As part of the marketing group at Pixar, you are asked to find the age distribution of the audience of Pixar’s latest film. With the help of 10 of your colleagues, you conduct
exit interviews by randomly selecting people to question at 20 different movie theatres. You ask them to tell you if they are younger than 6 years old, 6 to 9 years old, 10 to 14 years old, 15 to 21 years old, or older than 21. From 470 responses, you
find out that 45 are younger than 6, 83 are 6 to 9 years old, 154 are 10 to 14 years old, 18 are 15 to 21 and 170 are older than 21. For the age distribution:
(a) Make a frequency table. (b) Make a relative frequency table
Exercise 4 From the age distribution data described in Exercise 2: (a) Make a bar chart using counts on the y-axis.
(b) Make a relative frequency bar chart using percentages on the y-axis. (c) Make a pie chart.
Exercise 8: In addition to their age levels, the movie audiences in Exercise 2
were also asked if they had seen the movie before (Never, Once, More than Once). Here is a table showing the responses by age group:
(a) Find the marginal distribution of their previous viewing of the movie. (Hint: find the
row totals). (b) Verify that the marginal distribution of ages is the same as that given in Exercise 2.
11) SDVV Chapter 2, Exercise 36, page 73
It has become more common for shoppers to “comparison shop” using the Internet. Respondents to a Pew survey in 2013 who owned cell phones were asked whether they had in the past 30 days, looked up the price of a product while they were in a
store to see if they could get a better price somewhere else. Here is a table of their responses by income level.
< $30K $30K - $49.9K $50K - $74.9K >$ 75K
Yes 207 115 134 204
No 625 406 260 417
(a) Find the conditional distribution (in percentages) of income distribution for those
who do not compare prices on the internet. (b) Find the conditional distribution (in percentages) of income distribution for
shoppers who do compare prices (on the internet). (c) Create a graph comparing the income distributions of those who compare prices with those who don’t.
(d) Do you see any differences between the conditional distributions? Write a brief (short) summary of what these data show about Internet use and its relationship to
income.
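A conditional distribution is found by taking one row (or column) of the two-way table and dividing each count by that row's (or column's) total. The Python sketch below uses a small hypothetical table with three categories, not the survey counts above.

```python
def conditional_distribution(counts):
    """Percentage of a row's total that falls in each category."""
    total = sum(counts)
    return [100 * count / total for count in counts]


# Hypothetical two-way table: rows are responses, columns are three groups
table = {
    "Yes": [12, 18, 30],
    "No": [40, 40, 20],
}
percentages = {row: conditional_distribution(counts) for row, counts in table.items()}
print(percentages)
```

Each row of percentages sums to 100, which allows the two rows to be compared even though their totals differ.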
Table for Question 10, Exercise 8 (previous viewings by age group):
Under 6 6 to 9 10 to 14 15 to 21 Over 21
Never 39 60 84 16 151
Once 3 20 38 2 15
More than Once 3 3 32 0 4
12) SDVV Chapter 2, Exercise 44, page 74
The U.S. Department of Labour (www.bls.gov) collects data on the number of U.S. workers who are employed at or below the minimum wage. Here is a table showing
the number of hourly workers by Age and Gender and the number who were paid at or below the prevailing minimum wage:
        Hourly Workers      At or below minimum wage
        (in thousands)      (in thousands)
Age     Men     Women       Men     Women
16-24   7978    7701        384     738
25-34   9029    7864        150     332
35-44   7696    7783        71      170
45-54   7365    8260        68      134
55-64   4092    4895        35      72
65+     1174    1469        22      50
(a) What percent of the women were ages 16 – 24? (b) Using side-by-side bar graphs, compare the proportions of the men and women
who worked at or below minimum wage at each Age group. Write a couple of sentences summarizing what you see.
13) SDVV Chapter 2, Exercise 50, page 76 PLUS EXTRA PARTS IN BOLD TYPE A company must decide which of two delivery services they will contract. During a
recent trial period, they shipped numerous packages with each service and have kept track of how often deliveries did not arrive on time. Here are the data.
(a) Compare the two services’ overall percentage of late deliveries.
(b) Based on the results in part (a) the company has decided to hire Pack Rats. Do you agree they deliver on time more often? Why or why not? Be specific (showing workings).
(c) The results here are an instance of what phenomenon? Explain.
Delivery Service  Type of Service  Number of Deliveries  Number of Late Packages
Pack Rats         Regular          400                   12
Pack Rats         Overnight        100                   16
Boxes R Us        Regular          100                   2
Boxes R Us        Overnight        400                   28
14) A random sample of 400 pairs of sunglasses from a Melbourne factory has 16
that are defective, and a sample of 1000 from a Sydney factory has 60 defects.
a) Which factory has the lower rate of defects? Which appears to be better? b) Suppose now that we have further data. The Melbourne sample has 300 men’s
sunglasses with 8 defects and 100 women’s sunglasses with 8 defects whilst the
Sydney sample has 300 men’s sunglasses with 6 defects and 700 women’s sunglasses with 54 defects.
Write down a 2-way table for each factory, separately, showing men’s and women’s sunglasses cross tabulated against defective or not.
c) Now calculate the defective rate for men’s sunglasses at each factory.
d) Now calculate the defective rate for women’s sunglasses at each factory. e) Explain what is paradoxical about these results.
f) Explain why these paradoxical results occurred. Show your calculations.
15) Chapter 3, Exercises 2, 4 and 6, pages 110 and 111
Exercise 2: As the new manager of a small convenience store, you want to understand the
shopping patterns of your customers. You randomly sample 20 purchases from yesterday’s records (all purchases in U.S. dollars).
39.05 2.73 32.92 47.51
37.91 34.35 64.48 51.96
56.95 81.58 47.80 11.72
21.57 40.83 38.24 32.98
75.16 74.30 47.54 65.62
(a) Make a histogram of the data using a bar width of $20. (b) Make a histogram of the data using a bar width of $10.
(c) Make a relative frequency histogram of the data using a bar width of $10. (d) Make a stem and leaf plot of the data using $10 as the stems and putting the
smallest amounts on top and round the data to the nearest $ (whole number).
Exercise 4: For the histogram you made in part (a) of the previous exercise, i.e. Exercise 2(a):
(a) Is the distribution unimodal or multimodal?
(b) Where is (are) the mode(s)?
(c) Is the distribution symmetric?
(d) Are there any outliers?
Exercise 6: For the data in Exercise 2:
(a) Would you expect the mean purchase to be smaller than, bigger than, or about the same size as the median? Explain (briefly).
(b) Find the mean purchase.
(c) Find the median purchase.
16) Chapter 3, Exercise 14 page 112 SLIGHTLY ADJUSTED in bold type
(a) Using the data of shopping patterns for the convenience store in the previous question, draw a boxplot, using the method of finding quartiles taught in lectures.
(b) Does the boxplot nominate any outliers?
(c) What purchase amount would be considered a high outlier?
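Boxplot outliers are commonly nominated with fences placed 1.5 × IQR beyond the quartiles; the sketch below assumes that rule is the one used in lectures. The quartiles and data values are hypothetical, and the quartiles themselves should still be found by the method taught in lectures.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences: quartiles extended by k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical quartiles and data values
lower, upper = outlier_fences(q1=20.0, q3=50.0)
data = [5, 22, 35, 48, 120]
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```

Any value above the upper fence would be nominated as a high outlier, and any value below the lower fence as a low outlier.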
17) SDVV Chapter 3, Exercise 46, page 117
A real estate agent has surveyed houses in 20 nearby zip codes in an attempt to put together a comparison for a new property that she would like to put on the market. She knows that the size of the living area of a house is a strong factor in the price, and she’d like to market this house as being one of the biggest in the area. Here is a histogram and summary statistics for the sizes of all the houses in the area.
a) What is the range of these sizes?
b) Between what sizes do the central 50% of houses lie?
c) What summary statistics would you use to describe these data?
d) Write a brief description of these data (shape, centre and spread).
18) SDVV Chapter 3, Exercise 54 page 118
Ozone levels (in parts per billion, ppb) were recorded at sites in New Jersey monthly.
Here are boxplots of the data for each month (over the 46 years) lined up in order (January=1).
a) In what month was the highest ozone level ever recorded?
b) Which month has the largest IQR?
c) Which month has the smallest range?
d) Write a brief comparison of the ozone levels in January and June.
e) Write a report (brief) on the annual patterns you see in the ozone levels.
19) Chapter 3, Exercise 18, page 113
The convenience store manager from the previous exercise has collected data on purchases from weekdays and weekends. Here are some summary statistics (rounded to the nearest dollar):
Weekdays: n = 230, Min = 4, Q1 = 28, Median = 40, Q3 = 68, Max = 95
Weekends: n = 150, Min = 10, Q1 = 35, Median = 55, Q3 = 70, Max = 100
From these statistics, construct side-by-side boxplots and write a sentence comparing the two distributions.
20) State and briefly explain which is the best measure of central tendency for each of the following. For each of parts (a) to (d), provide a sketch of a suitable frequency curve where possible (either a symmetric curve, a positively skewed curve or a negatively skewed curve), labelling where you believe the mean, median and mode would be positioned.
(a) Earnings of employees of an airline company.
(b) Colours of smartphone covers in a random sample.
(c) Final marks of a very easy compulsory test, where a number of students who did not sit the test automatically received a mark of zero.
(d) Methods of travelling to work.
21) Suppose the marketing manager of a large company was earning $129,420 per annum, got a raise and is now earning $140,000 per annum. Indicate how this would affect the following summary statistics (increase, decrease or stay about the same):
(a) Mean (b) Median (c) Range (d) IQR (e) Standard deviation
22) The number of orders received by a company over the last 25 days are as follows:
3 0 1 4 4 4 2 5 3 6
4 5 1 4 2 3 0 2 0 5 4 2 3 3 1
Please calculate parts (a) to (g) manually, using the methods taught in the BES lectures.
a) Give the mean, median and mode of this sample of number of orders/day. Show formulae/workings or your reasoning.
b) Find the quartiles and the interquartile range.
c) Write down the 5 number summary for this data.
d) Draw a box plot for this data.
e) Calculate the standard deviation.
f) On the 26th day, the company received 8 orders. Use a Z-score to determine if this number of orders was unusual.
g) Now use EXCEL to provide the descriptive statistics for these data and so check some of your answers above. (Hints on using Excel below or see text book)
Hints to use EXCEL to provide the default range of descriptive statistics:
i) Type the data in one column. Put the name (for example, orders) in the first cell (maybe B1) and then the data below that (maybe B2:B26).
ii) Select Data > Data Analysis > Descriptive Statistics and click OK. If you do not see this option in the menu, see Add-ins below!
iii) Type in or select the input range (including the cell with the name), e.g. B1:B26.
iv) Make sure Labels in First Row is checked.
v) Make sure the circle on Output Range is checked and enter a cell for the output (e.g. C2).
vi) Make sure Summary Statistics is checked.
vii) Click on OK.
ADD-INs: EXCEL has Add-Ins, optional components which increase its functionality. Some, such as the Analysis ToolPak needed here for Data Analysis, come with EXCEL whilst others come from other suppliers. If you do not find Data Analysis in the Data part of the ribbon, add it in as follows:
Go to the File menu. Select Options.
Next select Add-Ins (on the left hand side).
Then choose Excel Add-ins in the drop down box to the right of Manage, towards the bottom of the screen, and click on Go.
Ensure that both Analysis ToolPak and Analysis ToolPak VBA are checked and then click on OK.
Now you will find Data Analysis on the Data menu.
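For students who would like to cross-check the Excel output outside Excel, the following is a minimal Python sketch, not part of the course materials. It uses the 25 order counts from Question 22 and only the standard library; the names `orders`, `describe` and `z_score` are illustrative choices. Quartiles are deliberately left out, because software packages use several different quartile conventions that may not match the method taught in lectures.

```python
from statistics import mean, median, mode, stdev

# Orders received per day over the 25 days listed in Question 22.
orders = [3, 0, 1, 4, 4, 4, 2, 5, 3, 6,
          4, 5, 1, 4, 2, 3, 0, 2, 0, 5, 4, 2, 3, 3, 1]


def describe(values):
    """Return a few of the summary statistics that the Excel tool reports."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "sample_sd": stdev(values),  # sample SD: n - 1 in the denominator
        "minimum": min(values),
        "maximum": max(values),
    }


def z_score(value, values):
    """Standardise `value` using the sample mean and sample SD of `values`."""
    return (value - mean(values)) / stdev(values)


summary = describe(orders)
for name, number in summary.items():
    print(f"{name:>10}: {number}")
print(f"z-score for 8 orders: {z_score(8, orders):.2f}")
```

The same `z_score` helper can be reused for part (f), where a single new observation is compared with the sample.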
23) The following table shows data on total assets ($ billion) for a small sample of U.S. banks.
Bank Assets ($ billion)
State Street Bank and Trust 160.5
Discover Bank 63.9
Bank West 72.8
Citizens Bank 130.0
Northern Trust 83.8
Huntington Bank 53.8
Key Bank 91.8
People’s United 27.9
(a) Calculate the mean total assets for this sample. (b) Calculate the standard deviation of total assets for this sample.
(c) Standardize the asset value of State Street Bank and Trust (Hint: find the z score). Interpret the standard value.
24) SDVV Chapter 3, Exercise 74, page 122
The World Bank, through their Doing Business project (www.doingbusiness.org), ranks nearly 200 economies on the ease of doing business. One of their rankings measures the ease of starting a business and is made up (in part) of the following variables: number of required start-up procedures, average start-up time (in days), and average start-up cost (in % per capita income). The following table gives the means and standard deviations of these variables for 95 economies.
                          Procedures (#)   Time (days)   Cost (%)
Mean                      7.9              27.9          14.2
SD (standard deviation)   2.9              19.6          12.9
Here are the data for three countries.
            Procedures (#)   Time (days)   Cost (%)
Spain       10               47            15.1
Guatemala   11               26            47.3
Fiji        8                46            25.3
(a) Use Z scores to combine the three measures. (Hint: Do this for each country separately: find a z score for procedures, a z score for time and a z score for cost, then sum these to get the total z score for each country).
(b) Which country has the best environment after combining the three measures? Be careful - a lower rank indicates a better environment to start up a business.
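The hint in part (a) can be sketched in a few lines of Python. This is only an illustration of the arithmetic under the hint's own recipe (one z score per variable, then a sum per country); the dictionary layout and the names `means`, `sds`, `countries` and `total_z` are assumptions made for the example.

```python
# Means and standard deviations for the 95 economies (table in Question 24).
means = {"procedures": 7.9, "time": 27.9, "cost": 14.2}
sds = {"procedures": 2.9, "time": 19.6, "cost": 12.9}

# Data for the three countries given in the question.
countries = {
    "Spain": {"procedures": 10, "time": 47, "cost": 15.1},
    "Guatemala": {"procedures": 11, "time": 26, "cost": 47.3},
    "Fiji": {"procedures": 8, "time": 46, "cost": 25.3},
}


def total_z(values):
    """Sum the z scores of one country's three measures."""
    return sum((values[key] - means[key]) / sds[key] for key in means)


totals = {name: total_z(values) for name, values in countries.items()}

# Print from the lowest total z score to the highest.
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {total:.2f}")
```

Because all three variables measure a burden (more procedures, more days, higher cost), the totals can be compared directly: a smaller total means fewer obstacles relative to the 95 economies.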
25) For each of the following scenarios indicate which is the explanatory variable and which is the response variable, and do a rough sketch of a scatterplot with labelled x and y axes, showing the suspected direction of any possible linear association.
(a) Salary data (in $) as well as years of managerial experience collected for a sample of executives in the high tech industry.
(b) Interest rates (in % per annum) and number of house mortgage applications.
(c) Data collected on job performance rating (in points) and hours of training for a sample of employees at a telecommunication repair facility.
(d) Price (in $) of flat screen TVs and screen size (in inches).
26) Here are several scatterplots. The calculated correlations are −0.977, −0.021, 0.736 and 0.951. Which is which?
27) SDVV Chapter 4, Exercise 54, page 168 Tell what each of the following residual plots indicates about the appropriateness of
the linear model that was fit to the data.
28)
In a paper presented by Anne Arnold to a Teaching and Learning Forum, she estimated several regression equations using data from BES results. (You may
assume that the assumptions and conditions for regression are met). Here is one of the regression equations:
Est. final = 30.87 + 3.54 tutorial with r = 0.65
The response variable is final: the final marks obtained by students who remained in the course, %, and the explanatory variable is tutorial: the mark (out of 10) for tutorial participation, where students were awarded 1 mark for each tutorial
attended, where there were 10 tutorials in that semester.
a) Interpret the slope of this equation.
b) Predict the final marks of a student who attended no tutorials.
c) Predict the final marks of a student who attended all 10 tutorials.
d) Interpret the coefficient of determination. How good are your predictions?
e) List 2 other factors that we might want to take into account when predicting a
student’s final mark in the course.
29) SDVV Chapter 4 Exercise 52 page 168
An online clothing retailer examined their transactional database to see if total yearly Purchases ($) were related to customers’ Incomes ($). (You may assume that the
assumptions and conditions for regression are met). The least squares regression equation is
Estimated Purchases = −31.6 + 0.012 Income.
(a) Interpret the intercept in this linear model.
(b) Interpret the slope in this linear model.
(c) If a customer has an Income of $20,000, what is his predicted total yearly Purchases?
(d) This customer’s yearly purchases were actually $100. What is the residual using this linear model? Did the model provide an underestimate or overestimate for this
customer?
30) Use the EXCEL output below to answer the following questions. The variables are
WAGE (hourly wage rate, $US per hour) and EDUC (years of formal education) for 1000 people in the US. The data are from the 1997 population survey.
a) Comment on the scatter plot. b) Write down the regression equation.
c) Write down the value of and interpret the correlation coefficient. d) Interpret the slope of the equation.
e) Predict the wage of someone with 10 years of education. f) Do you think your estimate would be any good? Explain your answer. g) Comment on the residual plot.
                     wage        educ
Mean                 10.213      13.285
Standard Error       0.198       0.078
Median               8.790       13
Mode                 4.420       12
Standard Deviation   6.247       2.468
Sample Variance      39.021      6.092
Kurtosis             7.051       1.539
Skewness             1.956       -0.212
Range                58.160      17
Minimum              2.030       1
Maximum              60.190      18
Sum                  10213.020   13285
Count                1000        1000
[Figure: Scatter plot of wage and educ (x-axis: education, y-axis: wage)]
[Figure: Histogram of Wage (y-axis: Frequency)]
[Figure: Histogram of Educ (y-axis: Frequency)]
Response variable: WAGE
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.450
R Square 0.202
Adjusted R Square 0.202
Standard Error 5.582
Observations 1000
ANOVA
df SS MS F Significance F
Regression 1 7888.511 7888.511 253.200 5.59313E-51
Residual 998 31092.986 31.155
Total 999 38981.497
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -4.912 0.967 -5.081 0.000 -6.809 -3.015
educ 1.139 0.072 15.912 0.000 0.998 1.279
[Figure: educ Residual Plot (x-axis: educ, y-axis: Residuals)]
31) You are thinking of selling your house and are trying to decide what price to ask for.
You collect data on the Selling price ($ 000’s) and the Average Floor Area (square metres) of properties from the 10 most recent house sales in your area.
Note: An empty block would have an Average Floor Area of 0 square metres. The following EXCEL output is provided.
Average Floor Area (sqm) Price ($ 000’s)
245 625
540 875
458 900
150 500
270 500
100 300
290 635
300 700
350 835
200 430
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9171
R Square 0.841
Adjusted R Square 0.8211
Standard Error 84.892
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 304947.1623 304947.1623 42.314956 0.0001871
Residual 8 57652.83769 7206.604712
Total 9 362600
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 230.04 67.09068433 3.428723648 0.0089717 75.32402 384.7468
Average Floor Area (sqm) 1.3778 0.211800779 6.504994707 0.0001871 0.8893495 1.866176
[Figure: Scatterplot of Price against Average Floor Area (x-axis: Average Floor Area (sqm), y-axis: Price ($))]
[Figure: Average Floor Area (sqm) Residual Plot (x-axis: Average Floor Area (sqm), y-axis: Residuals)]
(a) Check the conditions for linear regression.
(b) What is the estimated regression equation?
(c) Interpret the slope.
(d) Interpret the correlation coefficient.
(e) Your house has a floor area of 250 square metres. Predict your selling price.
(f) How good is your prediction?
(g) If you receive an offer of $550,000 for your house, should you take it? Explain using residuals.
(h) You also own an empty block of land in the same area, with a land size of 600 square metres. Predict the sale price of your empty block of land. Briefly explain if you think that this regression model gives an accurate prediction for selling an empty block of land.
(i) List at least two other factors that could affect the selling price of your house.
32) A random sample of rental accommodations was taken from Adelaide, where the
distance from the city centre, in kilometres, and the weekly rental fee, in dollars, was recorded with the data provided in the table below.
Note: The city centre is defined as the central square kilometre in the centre of Adelaide bordered between North, South, East and West Terraces, within which rental accommodations exist.
Distance (km) Rent ($)
5 800
15 300
3 700
0 800
7 400
9 450
10 400
14 350
30 1,600
4 600
3 650
1 750
Please use EXCEL for the REGRESSION.
See HINTS on next page on how to use EXCEL for scatter plots and regression. Students MUST hand in the Excel regression output with their solution.
Select to create a residual plot when doing the regression, as needed later in question.
a) Use EXCEL to prepare a scatterplot of Rent against Distance.
b) What can you say about the direction of the association?
c) What can you say about the form of the relationship?
d) What can you say about the strength of the relationship?
e) Does the scatterplot show any outliers?
NOW REMOVE THE SUSPECTED OUTLIER. Use EXCEL to fit the regression of Rent on Distance.
f) Write down the estimated equation.
g) Interpret the slope.
h) Interpret the intercept. Is it meaningful in the context of this question?
i) Give the value of the correlation coefficient. Interpret.
j) Give the value of the coefficient of determination.
k) What is the weekly rental fee predicted using this model, for accommodation that is 9 kilometres from the city centre?
l) Using your answer from part (j), how good is your prediction?
m) What is the residual weekly rental fee for rental accommodation 9 km from the city centre?
n) Briefly explain whether the model has overestimated or underestimated the weekly rental fee for rental accommodation 9 kilometres from the city centre.
o) Submit and comment on the scatterplot of the residuals.
p) Do you think we could use this model to predict the weekly rental fee for accommodation 1,000 km from the city centre? Briefly explain.
HINTS on how to use EXCEL to create a Scatterplot and for Regression
i. Enter the data in columns, with the variable names (labels) in the first row.
ii. For the scatter plot:
- Highlight the 2 columns of data.
- Include names (labels) only in some versions of Excel. For example, in Excel 2010, highlight only the data, not the labels, to get a scatter plot where the first column is the X variable and the second column is the Y variable. Insert the title, x-axis label and y-axis label by inserting text boxes.
- Go to Insert, Scatter plot, etc.
- Alternatively, see * Note below to create a scatterplot using the same dialogue box when creating a regression. NOTE: BE AWARE OF DOING THIS IF YOU HAVE TO REMOVE A SUSPECTED OUTLIER
iii. For the regression:
- Data Analysis (you may have to add in the Data Analysis ToolPak in your version of Excel)
- Regression and click OK
- Type in or select the input Y range (you must include the cell with the name)
- Type in or select the input X range (you must include the cell with the name)
- Make sure the Labels box is checked.
- Check you have chosen the correct columns for the X and Y variables.
- Ensure you select the Output Range; then immediately click in the output range section of the dialogue box, then click on a cell in your spreadsheet.
- Select to create a residual plot when doing the regression, as may be needed in the question.
* Note: By selecting Line fit plots in the Regression dialogue box, you will get a
scatter plot of the predicted (regression model) values and the actual data values of Y for the same value of X.
To get a scatter plot of Y vs X from this Line fit plot, delete the predicted values, (by clicking on the red coloured plot points and pressing delete). You can then change the title to Scatterplot of … vs … (using names).
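For readers who want to see what the Excel Regression tool is computing, here is a minimal Python sketch of a simple least squares fit. It is an illustration only, not the method required for hand-in. It uses the floor area data from Question 31, because the Excel output for those data is printed in this handout and the result can be checked against it. The names `area`, `price` and `least_squares` are illustrative.

```python
from math import sqrt

# Floor area (sqm) and selling price ($000's) from the table in Question 31.
area = [245, 540, 458, 150, 270, 100, 290, 300, 350, 200]
price = [625, 875, 900, 500, 500, 300, 635, 700, 835, 430]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return intercept, slope, r


intercept, slope, r = least_squares(area, price)

# Residual = actual value minus predicted value, one per observation.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(area, price)]

print(f"intercept = {intercept:.2f}, slope = {slope:.4f}")
print(f"r = {r:.4f}, R Square = {r ** 2:.3f}")
```

The printed intercept, slope, r and R Square can be compared with the Coefficients, Multiple R and R Square entries in the Question 31 SUMMARY OUTPUT. Plotting `residuals` against `area` gives the same kind of picture as Excel's residual plot.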
33) A small specialist food store offers different types of olive oil for sale, amongst other food items. To determine whether price impacts sales, the manager of the store recorded the volume of oil sales, measured in litres (L), and the price, measured in dollars per litre ($/L), of the different qualities of olive oil the store sold last month. Here is a table of last month’s olive oil sales:
Please use EXCEL for the REGRESSION.
See HINTS on previous page on how to use EXCEL for scatter plots and regression.
Students MUST hand in the Excel regression output with their solution.
a) Use EXCEL to prepare a scatterplot of volume of oil sales against price.
b) What can you say about the direction of the association?
c) What can you say about the form of the relationship?
d) What can you say about the strength of the relationship?
e) Does the scatterplot show any outliers?
NOW REMOVE THE SUSPECTED OUTLIER
Use EXCEL to fit the regression of volume of oil sales on price.
f) Write down the estimated equation.
g) Interpret the slope.
h) Interpret the intercept. Is it meaningful in the context of this question?
i) Give the value of the correlation coefficient. Interpret.
j) Give the value of the coefficient of determination.
k) What monthly volume of oil sales would you predict for an olive oil that sells in the store for $10 per litre?
l) Using your answer from part (j), how good is your prediction in part (k)?
m) What is the residual monthly volume of oil sales for olive oil that sells for $10 per litre?
n) Does the model overestimate or underestimate the monthly volume of oil sales?
o) Create a scatterplot of the residuals. Comment.
p) The store owner is thinking of stocking the most expensive olive oil in the world: an ultra-premium olive oil called Lambda, made in Greece. The store owner would sell Lambda at $185 per litre to customers. Do you think we could use this model to predict the store’s monthly volume of Lambda sales? Briefly explain.
Price ($/L)   Volume of oil sales (L)
18            1,200
7             2,000
10            1,000
50            4,000
12            1,200
25            1,000
4             1,800
6.5           1,700
20            900
30            600
34) SDVV Chapter 5, Exercise 8, page 199
Multigenerational families can be categorized as having two adult generations such as parents living with adult children, “skip” generation families, such as grandparents
living with grandchildren, and three or more generations living in the household. Pew Research surveyed multigenerational households. This table is based on their reported results.
            2 Adult Gens   3 Skip Gens   3 or more Gens   Total
White       509            55            222              786
Hispanic    139            11            142              292
Black       119            32            99               250
Asian       61             1             48               110
Total       828            99            511              1438
(a) What is the probability that a multigenerational family is Hispanic?
(b) What is the probability that a multigenerational family selected at random is a Black, two-adult-generation family?
(c) What type of probability did you find in parts a and b?
35) SDVV Chapter 5 Exercise 10 page 199 Using the table from Exercise 8 (the previous question),
(a) What is the probability that a randomly selected Black multigenerational family is a 2 Adult Generation family? (b) What is the probability that a randomly selected multigenerational family is White,
given that it is a “skip” generation family? (c) What is P(3 or more Generations | Asian)?
36) SDVV Chapter 5, Exercise 54, page 204
A Mintel study asked consumers if electronic communications devices influenced whether or not they bought a certain car. The table below gives the results classified by household income:
Communications influence on car purchase, by household income, July 2011
                                           Income
Communication (e.g., hands free calling)   < $50K   $50K – 99.9K   $100K+   Total
Very much                                  30       57             41       128
Somewhat                                   26       39             62       127
Not at all                                 23       39             35       97
Total                                      79       135            138      352
If we select a person at random from this sample:
(a) What is the probability that electronic communication devices somewhat influenced their decisions? (b) What is the probability that the person is earning at least $100K?
(c) What is the probability that the person was somewhat influenced by electronic communications and earns at least $100K?
(d) What is the probability that electronic communications somewhat influenced the purchase or that the person earns at least $100K?
37) SDVV Chapter 5, Exercise 58, page 205
A European department store is developing a new advertising campaign for their new U.S. location, and their marketing managers need to understand their target market better. A survey of adult shoppers found the probabilities that an adult would shop at their new U.S. store, classified by age, as shown below.
(a) What is the probability that a survey respondent will shop at the U.S. store?
(b) What is the probability that a survey respondent will shop at the store given that they are younger than 20 years old?
(c) What is the probability that a survey respondent who is older than 40 shops at the store?
(d) What is the probability that a survey respondent is younger than 20 or will shop at the store?
38) SDVV Chapter 5, Exercise 62, page 206
The following questions use the table of data from Chapter 5 Exercise 54, reproduced below:
Communications influence on car purchase, by household income, July 2011
                                           Income
Communication (e.g., hands free calling)   < $50K   $50K – 99.9K   $100K+   Total
Very much                                  30       57             41       128
Somewhat                                   26       39             62       127
Not at all                                 23       39             35       97
Total                                      79       135            138      352
(a) If we select a respondent at random, what is the probability that we choose a person earning less than $50K and responded “somewhat”?
(b) Among those earning $50-99.9K, what is the probability that the person responded “not at all”?
(c) What is the probability that a person who responded “very much” was earning at least $100K? (d) If the person responded “very much”, what is the probability that they earn
between $50K and 99.9K? (e) Are the responses to the question and income level independent?
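Table for Question 37 (probabilities by Age and whether the respondent would Shop at the store):
Age       Shop: Yes   Shop: No   Total
< 20      0.26        0.04       0.30
20 – 40   0.24        0.10       0.34
> 40      0.12        0.24       0.36
Total     0.62        0.38       1.00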
39) SDVV Chapter 5 Exercise 64 page 206
Professional polling organizations face the challenge of selecting a representative sample of U.S. adults by telephone. This has been complicated by people who only
use cell phones and by others whose landline phones are unlisted. A careful survey by Democracy Corps determined the following proportions:
Cell phone only 39%
Both cell and landline 29%
Landline only listed 22%
Landline only unlisted 7%
(a) What is the probability a randomly selected U.S. adult has a landline?
(b) What is the probability that a U.S. adult has a landline given that he or she has a cell phone?
(c) Are having a cell phone and having a landline independent? Explain. (d) Are having a cell phone and a landline disjoint? Explain.
40) SDVV Chapter 6 Exercise 10 page 231 PLUS EXTRA PARTS IN BOLD
Given independent random variables, X and Y, with means and standard deviations shown, find the mean and standard deviation of each of the variables in parts a to d.
Note: Mean X = E[X] and SD X = SD[X], similarly for Y.
Mean SD
X 80 12
Y 12 3
a) X − 20 b) 0.5Y
c) X + Y d) X – Y e) X + 0.5Y + 4
f) 2X – 0.5Y
41) The monthly demand (in hundreds) for a magazine at a newsagent is listed
below along with corresponding probabilities.
Demand (x)   P(x)
1            0.1
2            0.25
3            0.5
4            0.15
a) Find the expected demand and interpret.
b) Find the standard deviation of demand.
c) A newsagent receives a payment of $100 for stocking the magazine plus 90 cents for each magazine sold. What are the mean and variance of the total revenue of the newsagent from selling the magazine?
42) SDVV Chapter 6 Exercise 8 page 231
A motor home sales department has created three plans for purchasing a new or used motor home for leisure to increase potential sales for its fleets. They estimate that 20% will choose plan 1, which includes no down payment with a 10-year finance option; 40% will choose plan 2, which includes a 20% down payment with a 7-year finance option; and 40% will choose plan 3, which includes a 40% down payment and a 5-year finance option.
(Hint: create a table showing the discrete probability model of X and p(X), converting % to decimals. Then use the formulas for E[X] and V[X], and take the square root of V[X] to get SD[X].)
(a) Find the expected value of the type of down payment potential customers will need.
(b) Find the standard deviation of the type of down payment potential customers will need.
43) SDVV Chapter 6, Exercise 24, page 232
A small software company will bid on a major contract. It anticipates a profit of $50,000 if it gets it, but thinks that there is only a 30% chance of that happening.
(Hint: create a table showing the discrete probability model of X and p(X), converting % to decimals. Then use the formulas for E[X] and V[X], and take the square root of V[X] to get SD[X].)
(a) What is the expected profit?
(b) Find the standard deviation for the profit.
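The recipe in the hint (table of X and p(X), then E[X], V[X] and SD[X]) can be sketched in Python as follows. This is a minimal illustration, not part of the course materials; the helper names are made up for the example, and the second row of the table (a profit of $0 with probability 0.70 when the contract is lost) is an assumption, since the question does not state a loss.

```python
from math import sqrt


def expected_value(model):
    """E[X]: sum of x * p(x) over the rows of the probability table."""
    return sum(x * p for x, p in model)


def variance(model):
    """V[X]: sum of (x - E[X])^2 * p(x) over the rows of the table."""
    mu = expected_value(model)
    return sum((x - mu) ** 2 * p for x, p in model)


def standard_deviation(model):
    """SD[X]: square root of V[X]."""
    return sqrt(variance(model))


# Question 43: profit of $50,000 with probability 0.30.
# Assumption: the only other outcome is a profit of $0 with probability 0.70.
profit_model = [(50_000, 0.30), (0, 0.70)]

# A valid probability model must have probabilities that sum to 1.
assert abs(sum(p for _, p in profit_model) - 1.0) < 1e-9

print(f"E[X]  = {expected_value(profit_model):,.2f}")
print(f"V[X]  = {variance(profit_model):,.2f}")
print(f"SD[X] = {standard_deviation(profit_model):,.2f}")
```

Any other discrete probability model in these questions can be checked the same way by replacing `profit_model` with a list of (x, p(x)) pairs.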
44) SDVV Chapter 6 Exercise 32 page 233 PLUS EXTRA PART IN BOLD
For warranty purposes, analysts want to model the number of defects on the screen of the new tablet they are manufacturing. Let X = number of defective pixels per screen. If X can be modeled by:
X = # of Defective pixels   0      1      2       3       4 or more
P(X = x)                    0.95   0.04   0.008   0.002   0
(a) What is the expected number of defective pixels per screen? Interpret.
(b) What is the standard deviation of the number of defective pixels per screen?
(c) What is the expected number of defective pixels in the next 100 screens?
(d) What is the standard deviation of the number of defective pixels in the next 100 screens?
45) SDVV Chapter 6 Exercise 34 page 233
At a casino, people play the slot machines in hopes of hitting the jackpot, but most of the time, they lose their money. A certain machine pays out an average of $0.92 (for every dollar played), with a standard deviation of $120.
(a) Why is the standard deviation so large?
(b) If a gambler plays 5 times, what are the mean and standard deviation of the casino’s profit?
(c) If gamblers play this machine 1,000 times in a day, what are the mean and standard deviation of the casino’s profit?
46) Chapter 7 Exercises 10 and 12 page 262 plus EXTRA PARTS IN BOLD
Exercise 10: What percent of a standard Normal model is found in each region? Draw a picture first.
(a) z > −2.05 (b) z < − 0.33 (c) 1.2 < z <1.8
(d) |z|<1.28 which is −1.28 < z < 1.28
Exercise 12: In a standard Normal model, what value(s) of z cuts off the region described? Don’t forget to draw a picture. (a) The lowest 20%
(b) The highest 15% (c) The highest 20%
(d) The middle 50% (e) The first quartile
(f) The third quartile
47) A sample of students were selected and asked to participate in a simple
experiment, measuring reaction length (in cm). Here are the sample data, descriptive statistics and a histogram.
a) Using the mean and standard deviation from the Excel output and the
68/95/99.7 rule, within what values would you expect the middle 95% of values
to lie? b) In fact, what % of all the values in the sample data, actually fall in that interval
you found in part a? c) Do you think a Normal model is appropriate for the reaction times? Explain. d) One student had a measurement of 15.5 cm. Was this unusual? Answer this by
calculating the standardised value and the area under the normal curve to the right of this value.
[Figure: Histogram (x-axis: Bin, y-axis: Frequency)]
48) SDVV Chapter 7, Exercise 28 and 30, pages 264 and 265 PLUS EXTRA PARTS IN BOLD
Exercise 28: For the 300 trading days from January 11, 2012 to March 22, 2013, the daily closing price of IBM stock (in $) is well modelled by a Normal model with mean $197.92 and standard deviation $7.16. According to this model, what is the probability that on a randomly selected day in this period the stock price closed
(a) above $205.8?
(b) below $212.24?
(c) between $183.60 and $205.08?
(d) Which would be more unusual, a day on which the stock price closed above $206 or below $180?
Exercise 30: According to the model in Exercise 28, what cut-off value of price would
separate the (a) lowest 16% of the days?
(b) highest 0.15%? (c) middle 68%? (d) highest 50%?
(e) lowest 25%? (f) highest 25%?
49) SDVV Chapter 7 Exercise 38 page 265 Every Normal model is defined by its parameters, the mean and the standard deviation. For each model described here, find the missing parameter.
Don’t forget to draw a picture.
a) µ = 1250; 35% below 1200; σ = ? b) µ = 0.64; 12% above 0.70; σ = ? c) σ = 0.50; 90% above 10; µ = ?
d) σ = 220; 3% below 202; µ = ?
50) SDVV Chapter 7 Exercise 46 page 266
A tyre manufacturer believes that the tread life of its snow tyres can be described by a Normal model, with a mean of 32,000 miles and standard deviation of 2,500 miles. (a) If you buy a set of these tyres, would it be reasonable for you to hope that they
will last 40,000 miles? Explain. (b) Approximately, what fraction of these tyres, can be expected to last less than
30,000 miles? (c) Approximately, what fraction of these tyres, can be expected to last between
30,000 and 35,000 miles? (d) Estimate the IQR for these data. (e) In planning a marketing strategy, a local tyre dealer wants to offer a refund to
any customer whose tyres fail to last a certain number of miles. However, the dealer does not want to take too big a risk. If the dealer is willing to give refunds to no
more than 1 of every 25 customers, for what mileage can he guarantee these tyres to last?
51) The active lifetime of a particular brand and model of smart phone is Normally
distributed with a mean of 34 months and a standard deviation of 5 months. Draw a picture for each of parts a to d to support your solutions.
(a) What is the probability that one randomly selected smart phone of this particular brand, will last less than 24 months?
(b) What is the probability that one randomly selected smart phone, of this particular brand, will last more than a year and a half?
(c) What is the probability that one randomly selected smart phone, of this particular brand, will last between 24 months and 48 months? (d) Determine the minimum number of whole months that this particular brand of
smart phone will last for in the top 1.1%.
52) SDVV Chapter 9, Exercises 22 and 24 pages 325 and 326 Exercise 22: An automatic character recognition device can successfully read about 85% of handwritten credit card applications. To estimate what might happen when
this device reads a stack of applications, the company did a simulation using samples of size 20, 50, 75 and 100. For each sample size, they simulated 1000 samples with
success rate p = 0.85 and constructed the histogram of the 1000 sample proportions, shown here. Explain what these histograms say about the sampling distribution
model for sample proportions. Be sure to talk about shape, centre and spread.
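The simulation described in Exercise 22 can be imitated with a short Python sketch. This is an illustration under stated assumptions, not the textbook's own simulation: each application is treated as an independent success with probability 0.85, the seed value is arbitrary, and the names `simulate_proportions` and `results` are made up for the example. The square root expression is the usual standard deviation of a sample proportion under that independence assumption.

```python
import random
from math import sqrt
from statistics import mean, stdev

P_SUCCESS = 0.85              # success rate stated in Exercise 22
N_SAMPLES = 1000              # simulated samples per sample size
SAMPLE_SIZES = [20, 50, 75, 100]


def simulate_proportions(n, p, n_samples, rng):
    """Simulate n_samples sample proportions, each from n independent reads."""
    proportions = []
    for _ in range(n_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


rng = random.Random(2015)     # fixed (arbitrary) seed so the run is repeatable
results = {}
for n in SAMPLE_SIZES:
    props = simulate_proportions(n, P_SUCCESS, N_SAMPLES, rng)
    results[n] = (mean(props), stdev(props))
    theory_sd = sqrt(P_SUCCESS * (1 - P_SUCCESS) / n)
    print(f"n={n:3d}: mean={results[n][0]:.3f}, sd={results[n][1]:.3f}, "
          f"theoretical sd={theory_sd:.3f}")
```

A histogram of each `props` list would play the role of the histograms referred to in the exercise, so shape, centre and spread can be compared across the four sample sizes.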
Exercise 24
The automatic character recognition device discussed in Exercise 22 successfully reads about 85% of handwritten credit card applications. In Exercise 22, you looked at the histograms showing distributions of sample proportions from 1,000 simulated samples of size 20, 50, 75 and 100. The sample statistics from each simulation are provided in the following table.
(a) According to the Normal model, what should the theoretical mean and standard deviations be for these sample sizes?
(b) How close are those theoretical values to what was observed in these simulations?
(c) Looking at the histograms provided in Exercise 22, at what sample size would you be comfortable using the Normal model as an approximation for the sampling distribution?
(d) What does the Success/Failure Condition say about the choice you made in part (c)?
53) Based on past experience, a car dealership believes that 30% of its customers who purchase a car using the dealership’s lease-hire agreement do not make their payments on time. The car dealership randomly selects 100 of its customers who purchased a car using the dealership’s lease-hire agreement.
Let p represent the population proportion of this dealership’s lease-hire agreement customers who do not make their payments on time.
(a) Describe the appropriate model for the sample proportion, p̂.
(Hint: Check conditions; specify the name of the distribution; specify the mean; specify the standard deviation.)
(b) What is the probability that more than one third of this sample do not make their payments on time?
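The model in question 53 can be explored numerically. The sketch below assumes the Normal approximation for the sample proportion with mean p and standard deviation sqrt(p(1 − p)/n); the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p = 0.30   # believed population proportion of late payers
n = 100    # sample size

# Success/Failure condition: both expected counts should be at least 10.
expected_late = n * p
expected_on_time = n * (1 - p)

# Normal model for the sample proportion.
sd_p_hat = sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd_p_hat)

# Part (b): probability that more than one third of the sample pay late.
p_more_than_third = 1 - model.cdf(1 / 3)

print(expected_late, expected_on_time)
print(round(sd_p_hat, 4), round(p_more_than_third, 4))
```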
54) SDVV Chapter 9 Exercise 4 page 323
The proportion of adult women in Latvia is approximately 54%. A marketing survey telephones 400 people at random.
a) What is the sampling distribution of the observed proportion that are women?
b) What is the standard deviation of that proportion?
c) Would you be surprised to find 56% women in a sample of size 400? Explain.
d) Would you be surprised to find 51% women in a sample of size 400? Explain.
e) Would you be surprised to find that there were fewer than 180 women in the sample? Explain.
55) SDVV Chapter 9 Exercise 58 page 329 PLUS EXTRA PART IN BOLD
During the period of Sep 2 to Oct 10, 2013, a Gallup Poll asked 1,500 Indian adults, aged 18 or over, how they rated economic conditions. Only 29% rated the economy as “Getting better”. Construct a 95% confidence interval for the true proportion of Indians who rated the Indian economy as improving. Interpret your confidence interval.
56) In a random sample of 100 customers in “The Olde Coffee Shoppe”, it was found that 25 customers had paid by credit card.
a) Find a 90% confidence interval for the proportion of all customers who pay by credit card. Interpret this interval.
b) Find a 95% confidence interval for the proportion of all customers who pay by credit card.
c) Suppose instead that the sample had been 60 people with 15 using their credit card. Now calculate the 90% and the 95% confidence intervals for the population proportion.
d) Compare the four confidence intervals and comment on the results.
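The four intervals in question 56 all use the same one-proportion z-interval, p̂ ± z* sqrt(p̂(1 − p̂)/n). A minimal sketch follows; the helper name proportion_ci is an assumption made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """One-proportion z-interval: p_hat +/- z* x sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# The four cases from parts (a) to (c).
for successes, n in [(25, 100), (15, 60)]:
    for level in (0.90, 0.95):
        low, high = proportion_ci(successes, n, level)
        print(f"n={n}, level={level:.0%}: ({low:.4f}, {high:.4f})")
```

Running it shows the pattern that part (d) asks about: the interval widens when the confidence level rises and when the sample size falls.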
57) SDVV Chapter 9 Exercise 20 page 325
In preparing a report on the economy, we need to estimate the percentage of businesses that plan to hire additional employees in the next 60 days.
(a) How many randomly selected employers must we contact in order to create an estimate in which we are 98% confident with a margin of error of 5%?
(b) Suppose we want to reduce the margin of error to 3%. What sample size will suffice?
(c) Why might it not be worth the effort to try and get an interval with a margin of error of 1%?
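The sample-size calculation in question 57 rearranges the margin of error formula to n = (z*)² p(1 − p) / ME². The sketch below assumes the conservative guess p = 0.5, since the question gives no prior estimate; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, level, p_guess=0.5):
    """Smallest n giving the requested margin of error for a proportion.

    p_guess = 0.5 is an assumption: it maximises p(1 - p), so the
    resulting n is large enough whatever the true proportion is.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

n_5_percent = required_sample_size(0.05, 0.98)   # part (a)
n_3_percent = required_sample_size(0.03, 0.98)   # part (b)
n_1_percent = required_sample_size(0.01, 0.98)   # part (c)

print(n_5_percent, n_3_percent, n_1_percent)
```

A hand calculation with a rounded table value such as z* = 2.33 can give a slightly larger n than the unrounded critical value used here.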
58) SDVV Chapter 9, Exercise 46 page 328
Recently, two students made worldwide headlines by spinning a Belgian euro 250 times and getting 140 heads; that’s 56%. That makes the 90% confidence interval (51%, 61%). What does this mean? Are the conclusions in parts a-e correct? Explain your answers.
a) Between 51% and 61% of all euros are unfair.
b) We are 90% sure that in this experiment this euro landed heads between 51% and 61% of the spins.
c) We are 90% sure that spun euros will land heads between 51% and 61% of the time.
d) If you spin a euro many times, you can be 90% sure of getting between 51% and 61% heads.
e) 90% of all spun euros will land heads between 51% and 61% of the time.
59) Receipts of a small clothing store show that customer purchases have a skewed distribution with mean $32 and standard deviation $20.
a) Explain why you cannot determine the probability that the next customer will spend more than $40.
b) Can you estimate the probability that the next 8 customers will spend an average of more than $40? Explain.
c) Can you estimate the probability that the next 50 customers will spend an average of at least $40? Explain. Calculate an answer if possible.
60) Incomes for production line workers in a certain city average $38.74 per hour with a standard deviation of $8.78. The incomes are skewed to the right.
a) Sketch a frequency curve that would represent the original population.
b) Now describe the sampling distribution for the sample mean for samples of size 100 and sketch this distribution.
c) Assume instead that the sample size was 64. Without working out the problem, state what would happen to the sampling distribution.
61) The lift in the Nexus 10 Building says Max 17 people or 1140 kg. This means 17 people who average 67 kg. If people’s weights are modelled by a normal model with mean 68.68 kg and standard deviation 15.67 kg, find the probability that 17 people would weigh more than 1140 kg.
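Question 61 can be recast as a question about the mean weight of 17 people. The sketch below assumes the 17 weights are independent draws from the stated normal model, so that the sample mean has standard deviation σ/sqrt(n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 68.68, 15.67   # population model from the question (kg)
n = 17                     # people in the lift
limit_total = 1140         # posted weight limit (kg)

# A total above 1140 kg is the same event as a mean above 1140 / 17 kg.
limit_mean = limit_total / n

# Sampling distribution of the mean (assumes independent weights).
se_mean = sigma / sqrt(n)
mean_model = NormalDist(mu=mu, sigma=se_mean)

p_overweight = 1 - mean_model.cdf(limit_mean)

print(round(limit_mean, 2), round(se_mean, 3), round(p_overweight, 3))
```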
62) In a test, students averaged 14.2 errors with a standard deviation of 4.2 errors.
a) If errors are known to be normally distributed, what is the probability that a given student will have more than 13 errors in the test?
b) If errors are not known to be normally distributed, what is the probability that a
sample of 49 students will average more than 13 errors in the test?
c) Why are your answers different?
d) Why was the assumption of normality required in part (a) but not in (b)?
63) In their mini-project, a BES team posed the question: “What is the average price that people living in Adelaide would reasonably expect to pay for a good cup of regular coffee?” Their sample results were: n = 41, sample mean = $3.50 and sample standard deviation = $0.429.
The histogram of sample data was unimodal and showed only a slight positive skew.
a) Check the conditions for inference about the mean.
b) Construct a 98% CI for the average price that people living in Adelaide would reasonably expect to pay for a good cup of regular coffee.
c) Interpret the CI you calculated in (b) above.
d) How would their confidence interval width change, if this team chose to do a 90% confidence interval, holding all else the same?
e) How would their 98% confidence interval in part (b) change, if this team chose to use a sample of size 100, holding all else the same?
64) SDVV Chapter 11, Exercises 8 and 14 (omit part c), page 387
Exercise 8
A random sample of 24 phone conversations was recorded by a local university switch board and the time spent in conversation (in minutes) was noted below:
38.12 2.7 32.82 47.51 36.52 34.2
64 52 26.6 31 5 12.4
32 4 1 17 18 33
12 6 8 42 15 16
The average was 24.45 minutes and the standard deviation was 17.23 minutes.
(a) Find the standard error of the mean.
(b) How would the standard error change if the sample size had been reduced to 10? (Assume that the sample standard deviation did not change.)
Exercise 14 (omit part c)
For the purchase amounts in Exercise 8:
(a) Construct a 90% confidence interval for the mean purchases of all customers, assuming that the assumptions and conditions for the confidence interval have been met.
(b) How large is the margin of error?
65) How many pages can you expect to get from a print cartridge? (Based on question 44 from Lind, Marchal and Mason, Statistical Techniques in Business and Economics, 11th edition, McGraw-Hill, page 324.)
Suppose we took a random sample of cartridges and wrote down how many pages each printed. Here is a histogram along with some summary statistics.
a) Check the conditions for inference about the mean.
b) Find a 90% confidence interval for the true mean.
c) Interpret this interval.
d) Write down, from the EXCEL output, the 95% CI for the mean.
e) Which of your two intervals is wider? Is this what you expected?
Number of pages
Mean                       2597.783
Standard Error             65.558
Median                     2698
Mode                       2888
Standard Deviation         444.634
Sample Variance            197699.374
Kurtosis                   -0.901
Skewness                   -0.054
Range                      1541
Minimum                    1884
Maximum                    3425
Sum                        119498
Count                      46
Confidence Level(95.0%)    132.040
[Figure: Histogram of Number of pages, with bins from 1800 to 3600 in steps of 200, plus a "More" bin]
66) SDVV Chapter 10, Exercise 12, page 353 PLUS EXTRA PART IN BOLD
Write the null and alternative hypotheses to test each of the following situations. Briefly describe what the parameter you are testing is, in the context of the question.
(a) A 2010 Harvard Business Review article looked at 1109 CEOs from global companies and found that 32% had MBAs. Has the percentage changed?
(b) Recently, 20% of cars of a certain model have needed costly transmission work after being driven between 50,000 and 100,000 miles. The car manufacturer hopes that the redesign of a transmission component has solved this problem.
(c) A market researcher for a cola company decides to field test a new soft drink flavour, planning to market it only if he is sure that over 60% of the people like the flavour.
67) It is estimated that 40% of service stations have fuel tanks that leak. A new design is supposed to lessen the prevalence of these leaks and has been tested in South Australia. A random sample of 27 of the new design tanks finds that 7 show some signs of leaking.
a) What are the null and alternative hypotheses?
b) Check the conditions necessary for inference.
c) Test the hypothesis at the 5% level.
d) State your conclusion.
e) If the new design actually works have you made an error? If so, what kind of error?
f) What 2 things could you do to decrease the probability of making this kind of error?
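The test in question 67 is a one-proportion z-test. The sketch below assumes a lower-tail alternative (the new design is meant to reduce leaks) and uses the null value p0 = 0.40 in the standard error; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.40      # null hypothesis: proportion of leaking tanks is unchanged
n = 27         # tanks sampled
leaking = 7    # tanks showing signs of leaking
alpha = 0.05

# Success/Failure condition under the null hypothesis.
expected_leaking = n * p0          # about 10.8
expected_sound = n * (1 - p0)      # about 16.2

p_hat = leaking / n
se_null = sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
z = (p_hat - p0) / se_null

# Lower-tail p-value, assuming HA: p < 0.40.
p_value = NormalDist().cdf(z)
reject_null = p_value < alpha

print(round(z, 3), round(p_value, 4), reject_null)
```

Because the null hypothesis is not rejected here, parts (e) and (f) concern the error that arises when a false null hypothesis is retained.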
68) Two students studying BES I are worried about the failure rate.
a) The first has no idea what the failure rate is so takes a random sample of 100 students of whom 28 failed. Based on the sample evidence, calculate the 90% confidence interval and interpret it. Would this student conclude that the failure rate is 20%?
b) The second student has a prior belief that the fail rate is 20%. Set up and test the appropriate hypotheses, at α = 0.05, using the same sample as above. Would this student conclude that the fail rate is 20%?
c) In your own words, compare your answers to parts (a) and (b) above. Do they reach the same conclusion? Why or why not?
69) The Advertiser Voteline* on January 25th 2012 p 17 reported the results of the poll of readers from the preceding day; the question was: Should Unis teach alternative medicine?
Source: The Advertiser, January 25th 2012, p 17
Of the 52 callers, 32 said YES and the remaining 20 said NO.
a) Assuming this was a properly conducted random sample, test whether a majority of people agree with the question (Should Unis teach alternative medicine?); do this by calculating and using the appropriate p-value.
b) Repeat the hypothesis test, this time using the critical value method.
i) Use the 5% level of significance. Provide a sketch.
ii) Use the 1% level of significance. Provide a sketch.
c) Your answers in the 2 parts of part (b) above will differ; explain what is going on here.
d) The Advertiser is Adelaide’s daily newspaper; the Voteline consists of a topical question posed each day. Respondents phone in and either agree or disagree. The results are published the next day. Comment on the assumption in part (a) that this was a properly conducted random sample.
70) SDVV Chapter 10, Exercise 28, page 355 PLUS EXTRA PART IN BOLD
A billing company that collects bills for doctors’ offices in the area is concerned that the percentage of bills being paid by Medicare has risen. Historically, that percentage has been 31%. An examination of 8,368 recent bills reveals that 32% of these bills are being paid by Medicare. Is this evidence of a change in the percentage of bills being paid by Medicare?
(a) Write appropriate hypotheses.
(b) Check the assumptions and conditions.
(c) Perform the test and find the p-value.
(d) State your conclusion.
(e) Do you think this difference is meaningful? Explain.
(f) Interpret the p-value in the context of this question.
71) SDVV Chapter 12 Exercise 34 page 419 PLUS EXTRA PART IN BOLD
Production managers on an assembly line must monitor the output to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the proportion of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired. Write the null and alternative hypotheses for this problem. (Hint: the population proportion of defective parts is not given, so this must be a worded set of hypotheses.)
a) In this context, what is a Type I error?
b) In this context, what is a Type II error?
c) Which type of error would the factory owner consider more serious?
d) Which type of error might customers consider more serious?
72) A factory believes that the average cost of finishing a part after it comes out of the mould is $260. A new design is supposed to lessen this average cost and so, to see if this is the case, a random sample of 27 parts built to the new design is taken and their finishing costs measured. The sample has a mean of $253.80 and a standard deviation of $20.
a) What are the null and alternative hypotheses?
b) What conditions or assumptions will you need to assume in order to carry out inference?
c) Test the hypothesis at the 5% level.
d) State your conclusion.
e) If the new design actually works have you made an error? If so, what kind of error?
f) What 2 things could you do to decrease the probability of making this kind of error?
g) Suppose that this study was done by the engineer who came up with this new design and is keen to prove that their new design is better than the old one. Explain how they could modify
(1) the way they conducted the test and took the sample, and
(2) the way they did the inference
in order to “prove” what they wanted to.
h) Explain why the manufacturer should not proceed as suggested in part (g).
73) Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. Last year, the average life expectancy was 77 years. A particular insurance company wants to determine if their clients have a longer life expectancy, on average, so they randomly sample 20 of their recently paid policies and find the sample mean was 78.6 years with a standard deviation of 4.48 years. Is there significant evidence that life expectancy has increased?
a) What are the null and alternative hypotheses?
b) What conditions or assumptions will you need to assume in order to carry out inference?
c) Test the hypothesis at the 5% level.
d) State your conclusion.
74) SDVV Chapter 12 Exercise 6 page 416 PLUS EXTRA PART IN BOLD
For each of the following situations, find the critical value for z or t.
Draw a picture for each of parts a-f, labelling the rejection region(s).
(a) Ho: µ = 105 vs HA: µ ≠ 105 at α = 0.05, n = 61, σ unknown
(b) Ho: p = 0.05 vs HA: p > 0.05 at α = 0.05
(c) Ho: p = 0.6 vs HA: p ≠ 0.6 at α = 0.01
(d) Ho: p = 0.5 vs HA: p < 0.5 at α = 0.01, n = 500
(e) Ho: p = 0.2 vs HA: p < 0.2 at α = 0.01
(f) Ho: µ = 10 vs HA: µ > 10 at α = 0.05, n = 30, σ unknown
75) SDVV Chapter 12 Exercise 12 page 416
For each of the following scenarios, state whether a Type I, a Type II, or neither error has been made.
(a) A test of Ho: µ = 20 vs HA: µ > 20 rejects the null hypothesis. Later it is discovered that µ = 19.9
(b) A test of Ho: p = 0.7 vs HA: p < 0.7 fails to reject the null hypothesis. Later it is discovered that p = 0.8
(c) A test of Ho: p = 0.4 vs HA: p ≠ 0.4 rejects the null hypothesis. Later it is discovered that p = 0.55
(d) A test of Ho: p = 0.6 vs HA: p < 0.6 fails to reject the null hypothesis. Later it is discovered that p = 0.5
76) SDVV Chapter 12 Exercise 16 page 417
Analysts evaluating a new program to encourage customer retention in a test market find no evidence of an increased rate of retention in a test of 2,000 customers. They based this conclusion on a test using α = 0.01.
Would they have made the same decision at α = 0.05? How about α = 0.001? Explain.
77) SDVV Chapter 14 Exercise 12 page 496 PLUS EXTRA PARTS IN BOLD
To complete the poll reported in Exercise 9 (in Chapter 14 of the text), Pew Research surveyed respondents by telephone, drawing a random sample of landlines and another random sample of cell phones. For those numbers that were valid, they report the following:
                   Land      Cell      Total
No Answer/Busy      552        42        594
Voicemail          3347      2843       6190
Contact            8399      8612     17,011
Total            12,298    11,497     23,795
Are the results they find independent of the telephone type?
(i) Write the hypotheses.
(ii) Check the conditions.
(a) Under the usual null hypothesis, what are the expected values?
(b) Compute the χ² statistic.
(c) How many degrees of freedom does it have?
(d) What do you conclude?
(e) Standardize the cell’s residual for Land and No Answer/Busy. Briefly comment.
78) An online bookstore wants to determine if coupon redemption is independent of gender. After a special coupon broadcast to its reward members, the following data on coupon redemption at checkout were collected.
                  Coupon redeemed?
Gender            Yes      No      Total
Male               66      66        132
Female            125      74        199
Total             191     140        331
Perform the appropriate hypothesis test at the 5% level of significance, checking conditions.
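The mechanics of the chi-square test of independence in question 78 can be sketched as follows. The code assumes the 2 × 2 table of counts given in the question; the p-value shortcut through erfc is valid only for 1 degree of freedom, and all names are illustrative.

```python
from math import erfc, sqrt

# Observed counts: rows = Male, Female; columns = Yes, No.
observed = [[66, 66], [125, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1, chi-square is the square of a standard Normal variable,
# so the upper-tail probability equals erfc(sqrt(chi_sq / 2)).
p_value = erfc(sqrt(chi_sq / 2))

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_sq, 3), df, round(p_value, 4))
```

All expected counts exceed 5, which is the usual expected cell frequency condition to check before relying on the chi-square model.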
79) A manufacturing plant for recreational vehicles receives shipments from three different parts vendors. There has been a defect issue with some of the electrical wiring in the recreational vehicles manufactured at the plant. The plant manager believes that the defect issue is the fault of parts received from the plant’s parts vendors. The plant manager reviews a sample of quality assurance inspections from the last six months.
                                   Parts vendors
                                   Perfect Parts Co.   Made-4-U Co.   25 Hours Parts Co.
Part rejected                             53                48                70
Part perfect                              93                71                88
Part not perfect but acceptable           22                31                22
Perform the appropriate hypothesis test at the 5% significance level, checking required conditions.
80) The management of a chain of package delivery stores would like to develop a model for predicting the weekly sales (Y, in thousands of dollars) for individual stores based on the number of customers who made purchases (X). A random sample of 20 stores was selected from among all stores in the chain, and the following is the EXCEL output for a linear regression.
a) Write down the estimated linear regression equation.
b) Write down and interpret the correlation coefficient.
c) Write down and interpret the coefficient of determination.
d) Interpret the slope.
e) Test the significance of the slope using α = 0.01. Justify your choice between a one-tail and a two-tail alternative hypothesis.
f) Write down and interpret the 95% confidence interval for the intercept.
81) Excel was used to create a linear relationship between the capacity of disc drives, in terabytes (TB), and price (in dollars) based on a sample of disc drives. Here is the output.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.933548
R Square             0.871512
Adjusted R Square    0.86657
Standard Error       40.41969
Observations         28
ANOVA
              df    SS            MS          F           Significance F
Regression     1    288117.653    288117.7    176.3534    4.3E-13
Residual      26    42477.532     1633.751
Total         27    330595.185
                 Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept        8.069286       13.6444352       0.591398    0.559361    -19.9773    36.11582
Capacity (TB)    73.44357       5.53046709       13.27981    4.3E-13     62.07553    84.81161
[Figure: Normal Probability Plot, Price ($) against Sample Percentile]
[Figure: Capacity (TB) Residual Plot, Residuals against Capacity (TB)]
[Figure: Scatterplot of Price ($) against Capacity (TB)]
(a) Check, to the extent possible, the regression conditions.
(b) Write down the estimated regression equation.
(c) Interpret the intercept. Does this make sense in the context of this question?
(d) Interpret the slope.
(e) Write down and interpret the p-value of the slope.
(f) Test the significance of the slope against a suitable alternative hypothesis, justifying your choice of alternative hypothesis.
(g) Test whether the intercept is significantly different from zero, using the 5% level of significance.
(h) Test whether the slope is significantly greater than 70, at the 5% level of significance.
(i) When testing the significance of the population correlation coefficient, the calculated t statistic is 13.27981. What else has the same calculated t statistic? Explain why it is not surprising that these two have the same calculated t-statistic.
82) Nutritional information was collected on a number of muesli bars to investigate a possible relationship between the number of calories and the protein content (in grams) per serve. Excel output is provided below.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.81815
R Square             0.66937
Adjusted R Square    0.662183
Standard Error       32.82055
Observations         48
ANOVA
              df    SS            MS            F              Significance F
Regression     1    100316.828    100316.828    93.12839453    1.24701E-12
Residual      46    49550.6671    1077.18842
Total         47    149867.495
             Coefficients   Standard Error   t Stat        P-value        Lower 95%      Upper 95%
Intercept    95.50172       9.70662211       9.83882099    6.8067E-13     75.96330041    115.040134
Protein      12.53007       1.29841164       9.65030541    1.24701E-12    9.91650179     15.1436359
[Figure: Protein Residual Plot, Residuals against Protein]
[Figure: Scatterplot of Calories against Protein]
[Figure: Normal Probability Plot, Calories against Sample Percentile]
[Figure: Histogram of Residuals, Frequency against Bin]
(a) Check, to the extent possible, the regression conditions.
(b) Write down the estimated regression equation.
(c) Interpret the slope.
(d) Test the significance of the slope against a suitable alternative hypothesis, justifying your choice of alternative hypothesis.
(e) Test whether the intercept is significantly different from zero, using the 5% level of significance.
(f) Test whether the slope is significantly greater than 10, at α of 5%.
(g) When testing the significance of the population correlation coefficient, the calculated t-statistic is 9.6503. What else has the same calculated t statistic? Explain why it is not surprising that these two have the same calculated t-statistic.
83) SDVV Chapter 16 Exercise 32 page 575
Each of the following scatterplots (a) to (d) shows a cluster of points, and one “stray” point. For each, answer questions (1) to (4).
(HINT: answer as follows (a): 1 to 4; (b): 1 to 4 etc.)
(1) In what way is the point unusual? Does it have high leverage, a large residual, or both?
(2) Do you think that point is an influential point?
(3) If that point were removed from the data, would the correlation become stronger or weaker? Explain.
(4) If that point were removed from the data, would the slope of the regression line increase, decrease or remain the same? Explain.
84) Below are residual plots for 3 separate linear regressions. Explain what each of the following residual plots indicates about the appropriateness of the linear model that was fit to the data.
[Figure: three residual plots, labelled (a), (b) and (c)]
85) This question relates to shows on Broadway for most weeks of 2006-2008 and is based on SDVV Chapter 17, Exercise 12, page 614.
Use the computer output below, which differs slightly from the output shown in the text, because of rounding. The response variable is Receipts, $m, and the explanatory variables are Paid Attendance, (thousands), #Shows (the number of shows) and Average Ticket Price, ($).
a) Check, to the extent possible, the regression conditions.
b) If we found a simple linear regression to predict receipts only from paid attendance, what would the R² of that regression be?
c) Write out the multiple linear regression model.
d) What does the coefficient on average ticket price mean in this regression? Does that make sense?
e) Estimate the receipts in a week in which the paid attendance was 300,000 customers attending 35 shows at an average ticket price of $70.
f) Is this a good prediction? Why do you say that?
g) Test the significance of the coefficient on shows at the 5% level.
h) Test the overall significance of the equation.
86) SDVV Chapter 17 Exercise 2 page 612
A candy maker surveyed chocolate bars available in a local supermarket and found the following least squares regression model (i.e. linear regression model):
Estimated calories = 28.4 + 11.37 Fat (g) + 2.91 Sugar (g).
(a) The hand-crafted chocolate she makes has 15g of fat and 20g of sugar. How many calories does the model predict for a serving?
(b) In fact, a laboratory test shows that her candy has 227 calories per serving. Find the residual corresponding to this candy. (Be sure to include the units.)
(c) What does that residual say about her candy?
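The prediction and residual in question 86 follow directly from the fitted equation, with residual = observed − predicted. A short sketch, using a hypothetical function name:

```python
def estimated_calories(fat_g, sugar_g):
    """Fitted model from the question: 28.4 + 11.37 Fat (g) + 2.91 Sugar (g)."""
    return 28.4 + 11.37 * fat_g + 2.91 * sugar_g

predicted = estimated_calories(fat_g=15, sugar_g=20)   # part (a)
observed = 227                                         # laboratory result
residual = observed - predicted                        # part (b), in calories

print(round(predicted, 2), round(residual, 2))
```

A negative residual means the model over-predicts for this candy, which is the starting point for part (c).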
SUMMARY OUTPUT (computer output for question 85)
Regression Statistics
Multiple R           0.999
R Square             0.999
Adjusted R Square    0.999
Standard Error       0.093
Observations         78
ANOVA
              df    SS         MS         F        Significance F
Regression     3    484.788    161.596    18633    2.122E-106
Residual      74    0.642      0.009
Total         77    485.430
                    Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept           -18.320        0.3127           -58.6    0.000     -18.943     -17.697
Paid Attendance     0.076          0.0006           120.8    0.000     0.075       0.077
Shows               0.007          0.0044           1.6      0.116     -0.002      0.016
Avg Ticket Price    0.24           0.0039           61.0     0.000     0.231       0.246
87) Continuing with the muesli bar example.
Another variable was added to the regression: dietary fibre (in grams). This was to investigate a possible relationship between the number of calories, protein (in grams) and the dietary fibre content (in grams) per serve. Assume that the conditions for inference with regression have been satisfied. Excel output is provided below.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.819285677
R Square             0.671229021
Adjusted R Square    0.656616977
Standard Error       33.08980474
Observations         48
ANOVA
              df    SS          MS            F              Significance F
Regression     2    100595.4    50297.7059    45.93669738    1.34928E-11
Residual      45    49272.08    1094.93518
Total         47    149867.5
             Coefficients   Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept    92.7230491     11.23018         8.25659157    1.46317E-10    70.10429586     115.341802
Protein      12.41686809    1.328161         9.34891453    4.10306E-12    9.741813568     15.0919226
Fibre        1.125161988    2.230648         0.50441032    0.61643412     -3.367594182    5.61791816
(a) Write out the regression equation.
(b) Interpret the coefficient of fibre.
(c) If a muesli bar has 5 grams of protein and 3 grams of fibre, how many calories is it estimated to contain?
(d) Is this a good prediction? Explain.
(e) Test the significance of the coefficient of fibre at α of 5%.
(f) Test the overall significance of the equation.
88) Here is the regression of Gross Revenue of movies (in $millions) on the budget (in $millions) of the movie and an indicator (dummy) variable, Comedy; this takes the value 1 for movies that are comedies and 0 otherwise.
Estimated Revenue = −7.03913 + 1.00428 Budget($m) + 25.4175 Comedy
a) Write out the regression equation for movies that are comedies and the regression equation for all non-Comedy movies.
b) Sketch the 2 equations from (a) above.
c) Interpret the estimated coefficient on the variable Comedy.
d) Predict the gross revenue of a movie that is a comedy, which had a budget of only $890,000.
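A dummy variable such as Comedy shifts the intercept and leaves the slope unchanged, which the sketch below makes explicit for question 88. It assumes the fitted equation as printed, with the budget entered in $ millions (so $890,000 becomes 0.89); the function name is hypothetical.

```python
def estimated_revenue(budget_millions, comedy):
    """Fitted model from the question; comedy is 1 for comedies, 0 otherwise."""
    return -7.03913 + 1.00428 * budget_millions + 25.4175 * comedy

# Part (a): the two equations share a slope and differ only in intercept.
intercept_non_comedy = estimated_revenue(0, comedy=0)
intercept_comedy = estimated_revenue(0, comedy=1)

# Part (d): the budget must be converted to $ millions before use.
budget = 890_000 / 1_000_000
prediction = estimated_revenue(budget, comedy=1)

print(round(intercept_non_comedy, 5), round(intercept_comedy, 5))
print(round(prediction, 4))   # in $ millions
```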
89) Consider the following estimated regression equation which models the expenditure on food of single people:
Estimated Exp = 5060.24 + 528.99D + 0.089Inc
where
Exp is annual expenditure on food ($)
D is a dummy with D = 1 for males and D = 0 for females
Inc is annual after tax income ($)
a) Write out the separate equations for males and for females.
b) Sketch the 2 equations from (a) above.
c) Interpret the estimated coefficient on the variable D.
d) Predict expenditure on food for a male with an annual income of $100 000.
90) Continuing with the muesli bar example.
On the market are a number of muesli bars containing chocolate. We would like to investigate how chocolate, together with protein and dietary fibre, influences the calorie content of muesli bars. A dummy variable for the chocolate variable was added to the regression, where the variable chocolate takes the value of 1 if the muesli bar contained chocolate, or takes the value of 0 if the muesli bar did not contain chocolate. Assume the conditions for regression are satisfied.
Excel output is provided below.
Table of Correlation Coefficients
             Protein        Fibre       Calories      Chocolate
Protein      1
Fibre        0.16897192     1
Calories     0.818150446    0.180739    1
Chocolate    0.196456524    -0.24895    0.25773578    1
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.82754132
R Square             0.68482463
Adjusted R Square    0.66333541
Standard Error       32.764498
Observations         48
ANOVA
              df    SS            MS        F          Significance F
Regression     3    102632.952    34211     31.8683    4.1801E-11
Residual      44    47234.5424    1073.5
Total         47    149867.495
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    87.247         11.809           7.388    0.000     63.447      111.046
Protein      11.949         1.358            8.798    0.000     9.212       14.687
Fibre        2.054          2.309            0.889    0.379     -2.600      6.708
Chocolate    14.207         10.312           1.378    0.175     -6.576      34.989
(a) If we found a regression to predict calories only from fibre, what would the R² of that regression be?
(b) If we found a regression to predict calories only from chocolate, what would the R² of that regression be?
(c) Write out the regression model.
(d) What does the coefficient on chocolate mean in this regression? Does it make sense? Explain.
(e) Estimate the number of calories for a muesli bar that has 4 grams of protein, 3 grams of fibre and contains chocolate.
(f) Is this a good prediction? Why do you say that?
(g) Test the significance of the coefficient of chocolate at α of 5%.
(h) Test the overall significance of the equation using the F test.
91) This table shows a Laspeyres index number which has been calculated to
measure the change in prices of all inputs used in a production process.
Year 2005 2007 2009 2011
Index Number 100 149 157 193
a) Interpret the value for 2007.
b) Change the base of the index number to 2009.
c) In this example, would you expect a Paasche price index number to be higher or lower than the corresponding Laspeyres price index number? Explain your answer.
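Changing the base of an index, as in part (b) of question 91, divides every value by the index value in the new base year and multiplies by 100. A minimal sketch follows; the dictionary and function names are illustrative only.

```python
# Index values from the question, with base year 2005 = 100.
index_2005_base = {2005: 100, 2007: 149, 2009: 157, 2011: 193}

def rebase(index, new_base_year):
    """Rescale an index series so that new_base_year equals 100."""
    base_value = index[new_base_year]
    return {year: value * 100 / base_value for year, value in index.items()}

index_2009_base = rebase(index_2005_base, 2009)
rounded = {year: round(value, 2) for year, value in index_2009_base.items()}

print(rounded)
```

Rebasing changes the level of every index number but not the percentage change between any two years.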
92) The data in this question are the number of people entering Australia from New
Zealand for short term stays, quarterly, 1991 to 2012. The source is ABS, 3401.0 Overseas Arrivals and Departures, Australia, Table 5: Short-term Movement, Visitor
Arrivals---Selected Countries of Residence: Original. The plot below is from EXCEL:
[Figure: Time series plot, Visitor Arrivals: New Zealand, quarterly from March 1991 to March 2012, vertical axis from 0 to 350,000]
a) Describe which of the 4 components of time series you see in this chart and which you do not see.
b) The trend equation is given by
Trend = 303,027 + 2509 t
where t is time in quarters, with origin March quarter 2011. Interpret the trend equation.
c) Calculate the trend for the March quarter of 2014.
d) Is your answer in part (c) likely to be a good estimate of the number of people entering Australia from New Zealand for short term stays in the March quarter of 2014? Explain.
93) You are the manager of a large Australian seaside resort and have used EXCEL to create the following multiple linear regression model:
Estimated Occupancy = 285 + 142Q1 – 197Q2 + 250Q4 + 5t with r = 0.8175
where the origin is March 1998. Q1 is the dummy variable for the March quarter, where Q1 = 1 if the quarter is March, otherwise it is 0. Q2 is the dummy variable for the June quarter and Q4 is the dummy variable for December quarter, similarly.
(a) Describe which of the 4 components of time series you see in this chart and which you do not see.
(b) Interpret the coefficient of determination.
(c) Interpret the coefficient of trend in the regression model.
(d) Why is there no dummy variable for the September quarter?
(e) Interpret the coefficient of the December quarter.
(f) The coefficients of the dummies for Q1 and Q4 are positive, but Q2 has a negative coefficient. Is this a mistake? Explain in the context of this question.
(g) Predict the occupancy of this resort in the March quarter of 2016.
(h) Predict the occupancy of this resort in the September quarter of 2016.
(i) Using your predictions in the previous parts, how much of the difference between your predictions of occupancy in March 2016 and in September of 2016 is due to trend and how much of the difference is due to seasonal effects?
[Figure: EXCEL line chart "Hotel occupancy of Large Australian Seaside Resort, origin March 1998"; vertical axis from 0 to 900; horizontal axis labelled with alternating Q1 and Q3 quarters.]
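As an illustration only, the occupancy model in question 93 can be evaluated in Python once a quarter has been converted to its time index t. The function names and the quarter numbering (1 = March, 2 = June, 3 = September, 4 = December) are assumptions made for this sketch; the coefficients are those stated in the question.

```python
# Sketch only: function names and the quarter numbering (1 = March, 2 = June,
# 3 = September, 4 = December) are assumptions made for this illustration.

ORIGIN_YEAR = 1998  # t = 0 in the March quarter of 1998


def time_index(year, quarter):
    """Number of quarters between the origin (March 1998) and the given quarter."""
    return 4 * (year - ORIGIN_YEAR) + (quarter - 1)


def estimated_occupancy(year, quarter):
    """Evaluate 285 + 142*Q1 - 197*Q2 + 250*Q4 + 5*t for one quarter."""
    q1 = 1 if quarter == 1 else 0
    q2 = 1 if quarter == 2 else 0
    q4 = 1 if quarter == 4 else 0  # September has no dummy: it is the base quarter
    t = time_index(year, quarter)
    return 285 + 142 * q1 - 197 * q2 + 250 * q4 + 5 * t


# Time index and estimated occupancy for the March and September quarters of 2016.
print(time_index(2016, 1), estimated_occupancy(2016, 1))
print(time_index(2016, 3), estimated_occupancy(2016, 3))
```

Counting t from the stated origin is the step most easily got wrong by hand, so it is kept in its own function here.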
94) We have the monthly number of new passenger cars sold in SA. Source: ABS, 9314.0, Table 2. The EXCEL output below has a time series plot and a linear regression model. The extra variables are Month, which measures time in months with origin 0 in January 2010, and dummies for each month, so Feb = 1 if the month is February, 0 otherwise, Mar = 1 if the month is March, 0 otherwise, and so on.
a) Why is there no dummy variable for January in the model?
b) What does the coefficient on Month tell you?
c) The coefficients on all the dummies are positive. Is this a mistake? What does it mean if all these coefficients are positive?
d) Forecast sales of new passenger cars in SA for December 2013 and January 2014.
e) OPTIONAL: See if you can find the actual sales for either of these months: go to the ABS website (abs.gov.au) and find the statistics on passenger vehicle sales. You can try typing “passenger vehicles” into the search box (or there are other ways to search).
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.66
R Square 0.44
Adjusted R Square 0.41
Standard Error 311.12
Observations 228
ANOVA
df SS MS F Significance F
Regression 12 16122509.9 1343542.5 13.9 3.8283E-21
Residual 215 20811471.7 96797.5
Total 227 36933981.6
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 2740.23 76.08 36.02 0.00 2590.27 2890.19
Month 2.03 0.31 6.48 0.00 1.41 2.65
February 442.39 100.94 4.38 0.00 243.43 641.35
March 596.93 100.94 5.91 0.00 397.97 795.90
April 117.32 100.95 1.16 0.25 -81.65 316.29
May 526.76 100.95 5.22 0.00 327.79 725.74
June 930.73 100.95 9.22 0.00 731.74 1129.72
July 442.96 100.96 4.39 0.00 243.96 641.96
August 514.67 100.97 5.10 0.00 315.66 713.67
September 460.00 100.97 4.56 0.00 260.98 659.02
October 526.76 100.98 5.22 0.00 327.72 725.80
November 640.94 100.99 6.35 0.00 441.88 839.99
December 537.38 101.00 5.32 0.00 338.30 736.45
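As an illustration only, a forecast from this output adds the intercept, the Month coefficient multiplied by the time index, and the dummy coefficient for the calendar month. The sketch below assumes the origin stated in the question (Month = 0 in January 2010); the names `month_index` and `forecast_sales` are hypothetical, and the coefficients are copied from the SUMMARY OUTPUT above.

```python
# Sketch only: names are illustrative; coefficients are copied from the
# SUMMARY OUTPUT above. January is the omitted (base) month, so its dummy
# contribution is zero.

INTERCEPT = 2740.23
MONTH_SLOPE = 2.03
MONTH_DUMMIES = {
    1: 0.0,      # January (base month, no dummy in the model)
    2: 442.39,   # February
    3: 596.93,   # March
    4: 117.32,   # April
    5: 526.76,   # May
    6: 930.73,   # June
    7: 442.96,   # July
    8: 514.67,   # August
    9: 460.00,   # September
    10: 526.76,  # October
    11: 640.94,  # November
    12: 537.38,  # December
}


def month_index(year, month):
    """Months elapsed since the assumed origin, January 2010 (Month = 0)."""
    return 12 * (year - 2010) + (month - 1)


def forecast_sales(year, month):
    """Point forecast: intercept + slope * Month + seasonal dummy coefficient."""
    return INTERCEPT + MONTH_SLOPE * month_index(year, month) + MONTH_DUMMIES[month]


for year, month in [(2013, 12), (2014, 1)]:
    print(year, month, month_index(year, month), round(forecast_sales(year, month), 2))
```

These are point forecasts only. The Standard Error of 311.12 in the output gives a rough sense of how far actual monthly sales tend to lie from the fitted values.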
95) This question is to give you a feel for how you would use EXCEL in time series analysis. However, to keep the problem manageable, we will only use a small data set. Be aware that in practice we often use huge datasets! Use the data on the number of live sheep exported, in thousands, quarterly, from ABS, 7215.0 Table 6. We will use only the data for 2003 to 2012 inclusive.
a) Go to the ABS website, abs.gov.au, and choose statistics, then by catalogue number (since we know the catalogue number here); choose 7, then 72 and then 7215.0. Go to downloads and download Table 6.
b) Copy the data for 2003 to 2012 inclusive for sheep (first column).
c) Construct a line fit plot of the data.
d) Comment on which of the components of time series you see in this plot.
e) Construct dummy variables for the quarters. To avoid the dummy variable trap, we need 3, so let’s take dummies for the June, September and December quarters. Do this by simply creating new variables with names in the first row and then typing in 0 or 1 as appropriate. For example, for a June quarter observation, you will type 1 for the June dummy and 0 for the September and December dummies.
f) Construct a new variable called t for time in quarters. Make the origin the March quarter of 2010: that is, make t = 0 for Mar 2010.
g) Run the regression as usual, including the dummy variables as well as the time variable.
h) Write out your estimated equation.
i) Interpret the coefficient on t.
j) Interpret the coefficient on the September dummy.
k) Comment on the seasonal pattern of your data.
l) Now consider December 2004. What was the actual value then? What would your model have predicted? What is the irregular component for that quarter?
96) The number of employees in a small firm is given by
Number = 12 + Q1 − 2Q2 + 3Q4 + 1.5t
where t is time in quarters, with origin March 2011,
Q1 is a dummy for the March quarter,
Q2 is a dummy for the June quarter,
Q4 is a dummy for the December quarter.
a) Write out the equations for each of the 4 quarters.
b) Sketch the 4 equations from (a).
c) Comment on the seasonal pattern in the number of employees.
d) Forecast the number of employees for all quarters of 2013, 2014 and 2015.
e) Plot the forecasts in part (d) above against time and comment on the plot.
f) Consider the estimates for the December quarter of 2014 and the June quarter of 2015. How much of the difference between these two estimates is due to seasonal effects and how much is due to trend?
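As an illustration only, the difference between two estimates from the model in question 96 can be split into a trend part (the slope multiplied by the change in t) and a seasonal part (the change in the dummy coefficient). The names below, and the quarter numbering 1 = March to 4 = December, are assumptions made for this sketch.

```python
# Sketch only: names and the quarter numbering (1 = March, 2 = June,
# 3 = September, 4 = December) are assumptions made for this illustration.

INTERCEPT = 12.0
TREND_SLOPE = 1.5
SEASONAL = {1: 1.0, 2: -2.0, 3: 0.0, 4: 3.0}  # September is the base quarter


def time_index(year, quarter):
    """Quarters elapsed since the origin, March 2011 (t = 0)."""
    return 4 * (year - 2011) + (quarter - 1)


def employees(year, quarter):
    """Evaluate Number = 12 + Q1 - 2*Q2 + 3*Q4 + 1.5*t for one quarter."""
    return INTERCEPT + SEASONAL[quarter] + TREND_SLOPE * time_index(year, quarter)


def split_difference(start, end):
    """Return (trend part, seasonal part) of employees(end) - employees(start)."""
    trend_part = TREND_SLOPE * (time_index(*end) - time_index(*start))
    seasonal_part = SEASONAL[end[1]] - SEASONAL[start[1]]
    return trend_part, seasonal_part


start, end = (2014, 4), (2015, 2)  # December 2014 and June 2015
trend_part, seasonal_part = split_difference(start, end)
print(employees(*start), employees(*end), trend_part, seasonal_part)
```

The two parts always add up to the total difference, because the intercept cancels when one estimate is subtracted from the other.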
97) Given the price index number for a particular item:
Year 2007 2008 2009 2010 2011 2012
Price index 100 112.5 117 119 131 152
a) Change the base to 2010.
b) Calculate the % price increase from 2010 to 2012 by using the index in (a).
c) Calculate the % price increase from 2010 to 2012 by using the original index in the table above.
d) Compare your answers to (b) and (c).
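As a hedged aside, the comparison in question 97 rests on the fact that rebasing multiplies every index value by the same constant. In the assumed notation below, I_t is the original index in year t and I_t^{(b)} is the index rebased to year b; these symbols are introduced here for illustration and are not part of the question.

```latex
I_t^{(b)} = 100 \cdot \frac{I_t}{I_b}
\quad\Longrightarrow\quad
\frac{I_t^{(b)} - I_s^{(b)}}{I_s^{(b)}} \times 100
= \frac{I_t - I_s}{I_s} \times 100
```

The factor 100 / I_b appears in both the numerator and the denominator on the left-hand side, so it cancels.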
98) This table lists values of two index numbers of median annual family income in XYZ Country:
Year    Base 2005 index    Base 2010 index
2005    100
2006    103
2007    105
2008    107
2009    110                98
2010                       100
2011                       101
2012                       105
2013                       107
a) Calculate the missing values of each index.
b) Interpret the 2012 value for each index.
c) Can you say that one index is better than the other? Why or why not?
99) Data on the cost of living in Adelaide is provided for the guidance of international visitors. Here are some prices and costs for 2 years, 2005 and 2009.
                           Price ($)        Cost ($ per month)
Item                       2005     2009    2005      2009
Bowl of noodles            4.50     5.50    45        66
Slice of pizza             3.00     5.00    39        50
Bottle of water (600ml)    2.00     2.00    40        42
Takeaway coffee            2.50     3.00    25        27
McDonalds Big Mac          3.30     3.60    23.10     28.80
a) Calculate simple price relatives for the 4 items for 2009 using 2005 as base.
b) Calculate the unweighted average of the simple price relatives calculated in (a). What does this index mean?
c) Compute a Laspeyres price index for 2009 with 2005 as base.
d) Interpret the index you calculated in (c) above.
e) Compute a Paasche price index for 2009 with 2005 as base.
f) Compute a Fisher price index for 2009 with 2005 as base.
g) Arrange the Laspeyres, Paasche and Fisher indexes in order; is this ordering what you expected?
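As an illustration only, the three weighted price indexes in question 99 can be sketched in Python. The sketch makes two assumptions that are not stated in the question: that "Cost ($ per month)" equals price times monthly quantity, so quantities are recovered as cost divided by price, and that all five rows listed in the table are included in the basket. The names used are hypothetical.

```python
from math import sqrt

# Sketch only. Assumptions: monthly cost = price x quantity, so each
# quantity is cost / price; all five rows of the table are in the basket.
rows = [
    # (item, price 2005, price 2009, cost 2005, cost 2009)
    ("Bowl of noodles", 4.50, 5.50, 45.00, 66.00),
    ("Slice of pizza", 3.00, 5.00, 39.00, 50.00),
    ("Bottle of water (600ml)", 2.00, 2.00, 40.00, 42.00),
    ("Takeaway coffee", 2.50, 3.00, 25.00, 27.00),
    ("McDonalds Big Mac", 3.30, 3.60, 23.10, 28.80),
]

p0 = [r[1] for r in rows]         # base-year (2005) prices
p1 = [r[2] for r in rows]         # current-year (2009) prices
q0 = [r[3] / r[1] for r in rows]  # implied 2005 quantities
q1 = [r[4] / r[2] for r in rows]  # implied 2009 quantities


def basket_cost(prices, quantities):
    """Total cost of buying the given quantities at the given prices."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyres weights by base-year quantities, Paasche by current-year quantities.
laspeyres = 100 * basket_cost(p1, q0) / basket_cost(p0, q0)
paasche = 100 * basket_cost(p1, q1) / basket_cost(p0, q1)
# Fisher is the geometric mean of the two.
fisher = sqrt(laspeyres * paasche)

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Since the Fisher index is a geometric mean, it always lies between the Laspeyres and Paasche values.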
100) This table shows a Laspeyres index number which has been calculated to measure the change in prices of all inputs used in a production process.
Year 2009 2010 2011 2012
Price index 100 121.8 132.5 142.0
a) Interpret the value for 2011.
b) What has been the percentage change in input prices between 2011 and 2012?
c) Change the base of the index number to 2012.
d) State ONE reason why you might want to change the base of an index number.
e) Would you expect a Paasche index to be higher or lower? Explain your answer.