ST104A_03_June

    © University of London 2015    UL15/0850    D1

    ST104A ZA d0

    This paper is not to be removed from the Examination Halls

    UNIVERSITY OF LONDON ST104A

    BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences, the Diplomas in Economics and Social Sciences and Access Route

    Statistics 1

    Wednesday, 3 June 2015: 10:00 to 12:00

    Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly advised to divide their time accordingly.

    A list of formulae and extracts from statistical tables are provided after the final question on this paper.

    Graph paper is provided at the end of this question paper. If used, it must be detached and fastened securely inside the answer book.

    A calculator may be used when answering questions on this paper and it must comply in all respects with the specification given with your Admission Notice. The make and type of machine must be clearly stated on the front cover of the answer book.

    SECTION A

    Answer all parts of Question 1 (50 marks in total).

    (a) Classify each one of the following variables as either measurable (continuous) or categorical. If a variable is categorical, further classify it as either nominal or ordinal. Justify your answer. (Note that no marks will be awarded without a justification.)

    i. The manufacturer of a car.

    ii. The amount of money in a bank account.

    iii. The Gross Domestic Product (GDP) of a country.

    iv. The rating of a hotel according to the number of stars it has.

    [8 marks]

    (b) Consider the following sample dataset:

    4, x, 8, 7, 2

    You are told that the value of the sample mean is 5.

    i. Calculate the value of x.

    ii. Find the sample variance.

    [4 marks]

    (c) The salaries of the employees of a company are normally distributed with mean £25,000 and a standard deviation of £10,000.

    i. What is the proportion of employees with a salary of at least £20,000?

    ii. What is the proportion of employees with salaries between £15,000 and £35,000?

    [4 marks]

    (d) Suppose that $x_1 = -3$, $x_2 = 5$, $x_3 = 5$, $x_4 = -1$, $x_5 = 2$, and $y_1 = 1$, $y_2 = -4$, $y_3 = 5$, $y_4 = -1$, $y_5 = 2$. Calculate the following quantities:

    i. $\sum_{i=3}^{5} 2x_i$

    ii. $\sum_{i=2}^{4} 3(y_i - 3)$

    iii. $y_4^2 + \sum_{i=1}^{3} (2x_i + y_i^2)$

    [6 marks]

    (e) The variable X takes the values 2, 4, 6 and 8 according to the following distribution:

    x         2     4     6     8
    p_X(x)   0.3   0.2   0.1   0.4

    i. What is the probability that X is an odd number?

    ii. Find E(X), the expected value of X.

    iii. Find the probability that X/2 > 3.

    [5 marks]

    (f) You toss two fair dice independently.

    i. What is the probability that both numbers are sixes?

    ii. What is the probability that both numbers are odd?

    iii. You are now told that the first one of them shows a two. What is the probability in this case that both are twos?

    [4 marks]

    (g) It is stated in a consumer magazine that the average price of football shirts in London is £19.00. A random sample is taken by obtaining a single football shirt from each of 16 randomly chosen London retailers. The sample mean is £20.20 and the sample standard deviation is £2.40. Carry out a hypothesis test, at two appropriate significance levels, to determine whether the price of football shirts in London is more expensive than the price stated in the consumer magazine. State your hypotheses, the test statistic and its distribution under the null hypothesis, and your conclusion in the context of the problem.

    [7 marks]

    (h) State whether the following are true or false and give a brief explanation. (Note that no marks will be awarded for a simple true/false answer.)

    i. The chance that a normal random variable is less than two standard deviations from its mean is 99%.

    ii. The lower the regression coefficient in absolute value, the weaker the correlation.

    iii. Increasing the sample size will increase the width of a confidence interval for a population mean (assuming that everything else remains constant).

    iv. When testing a hypothesis, we use a two-tailed test if we want to test whether the parameter is greater than what is stated in the null hypothesis.

    v. A population list is needed in order to conduct quota sampling.

    vi. The regression of the variable Y on the variable X will always have the same slope as the regression of the variable X on the variable Y.

    [12 marks]

    SECTION B

    Answer two questions from this section (25 marks each).

    2. (a) Questionnaires were mailed to 300 households, in three different areas of a city, to assess the level of local sporting facilities. The collected data are shown in the table below:

                  Sporting Facilities Level
              Very good   Fairly good   Poor   Total
    Area 1       44           30         26     100
    Area 2       29           26         45     100
    Area 3       45           28         27     100
    Total       118           84         98     300

    i. Based on the data in the table, and without conducting a significance test, would you say there is an association between areas and level of local sporting facilities?

    ii. Calculate the χ² statistic and use it to test for independence, using two appropriate significance levels. What do you conclude?

    [14 marks]

    (b) i. Provide the definition of simple random sampling and cluster sampling designs.

    ii. Why might a researcher prefer cluster sampling rather than simple random sampling?

    iii. Name one other random sampling scheme, provide its definition and one of its advantages.

    [11 marks]

    3. The following data show the recorded times (y) in seconds taken by 10 international athletes to run 100 metres together with the corresponding wind speeds (x) at the time of running. A positive wind speed indicates the wind is in the direction of running and therefore considered to be helpful, whereas a negative wind speed indicates the wind is against the runner.

    Athlete    #1     #2     #3     #4     #5     #6     #7     #8     #9     #10
    x        -2.45  -1.23  -0.78  -0.33  -0.37   0.34   0.53   1.17   2.35   2.91
    y        10.52  10.47  10.41  10.25  10.54  10.09  10.30   9.99   9.92   9.87

    The summary statistics for these data are:

    Sum of x data: 2.14
    Sum of the squares of x data: 24.13
    Sum of y data: 102.36
    Sum of the squares of y data: 1048.34
    Sum of the products of x and y data: 18.56

    (a) i. Draw a scatter diagram of these data on the graph paper provided. Label the diagram carefully.

    ii. Calculate the sample correlation coefficient. Interpret your findings.

    iii. Calculate the least squares line of y on x and draw the line on the scatter diagram.

    iv. Based on the regression equation in part (iii.), what will be the predicted time for a runner for a wind speed of 1.5? Will you trust this value? Justify your answer.

    [13 marks]

    (b) Behavioural researchers have developed an index designed to measure managerial success. Of interest is whether there is a difference in average managerial success based on the level of interaction with people outside a manager's immediate work unit. Managers in group 1 engage in a high volume of interactions with people outside their work unit, while those in group 2 rarely do. The data are summarised in the table below:

               Sample size   Sample mean   Sample standard deviation
    Group 1        22           65.33               6.61
    Group 2        25           61.58               5.37

    i. Carry out a hypothesis test to determine whether the mean managerial success index scores are different between the two groups. Test at two suitable significance levels, stating clearly the hypotheses, the test statistic and its distribution under the null hypothesis. Comment on your findings.

    ii. State clearly any assumptions you made in (i.).

    iii. Adjust the procedure above to determine whether the mean managerial success for managers who have a high volume of interactions with people outside their work unit is higher than that of those who rarely do.

    [12 marks]

    4. (a) The following data show the length (in inches) of fish caught in one day in a river:

    10.1 10.4 10.5 10.9 11.1
    11.2 11.2 11.5 11.7 11.9
    12.1 12.1 12.2 12.2 12.3
    12.4 12.5 12.6 12.8 12.9
    13.2 13.4 13.5 13.6 13.7
    14.3 14.5 14.8 15.2 15.5

    i. Carefully construct, draw and label a histogram of these data on the graph paper provided.

    ii. Find the mean (given that the sum of the data is 376.3), the median and the modal group.

    iii. Comment on the data given the shape of the histogram and the measures you have calculated.

    iv. Name two other types of graphical displays that would be suitable to represent the data.

    [12 marks]

    (b) In order to estimate the percentage of city households that have high speed internet access, a random sample of 140 city households was taken. Of these, 70 had high speed internet access. A similar sample of 170 rural households was also taken and it was found that 61 of them had high speed internet access.

    The data are summarised in the table below:

                                City Households   Rural Households
    With high speed internet          70                 61
    Total                            140                170

    i. Give a 95% confidence interval for the difference between the proportions of high speed internet access in city and rural households.

    ii. Carry out a hypothesis test, at two suitable significance levels, to determine whether city households are more likely to have high speed internet access compared to rural households. State the test hypotheses, and specify your test statistic and its distribution under the null hypothesis. Comment on your findings.

    iii. State any assumptions you made in (ii.).

    [13 marks]

    END OF PAPER

    ST104a Statistics 1

    Examination Formula Sheet

    Expected value of a discrete random variable:

    $\mu = E(X) = \sum_{i=1}^{N} p_i x_i$

    Standard deviation of a discrete random variable:

    $\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}$
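    As an illustration of the two formulae above, the following is a minimal Python sketch. The function name `discrete_mean_sd` and the three-point distribution are hypothetical and are not taken from the paper.

```python
import math


def discrete_mean_sd(values, probs):
    """Return (mu, sigma) for a discrete random variable.

    mu    = sum of p_i * x_i
    sigma = sqrt(sum of p_i * (x_i - mu)^2)
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(p * x for x, p in zip(values, probs))
    variance = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, math.sqrt(variance)


# Hypothetical distribution chosen only to illustrate the arithmetic.
mu, sigma = discrete_mean_sd([1, 2, 3], [0.2, 0.5, 0.3])
print(mu, sigma)
```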

    The transformation formula:

    $Z = \frac{X - \mu}{\sigma}$

    Finding Z for the sampling distribution of the sample mean:

    $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

    Finding Z for the sampling distribution of the sample proportion:

    $Z = \frac{P - \pi}{\sqrt{\pi(1 - \pi)/n}}$

    Confidence interval endpoints for a single mean (σ known):

    $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

    Confidence interval endpoints for a single mean (σ unknown):

    $\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$

    Confidence interval endpoints for a single proportion:

    $p \pm z_{\alpha/2} \cdot \sqrt{\frac{p(1 - p)}{n}}$

    Sample size determination for a mean:

    $n \geq \frac{(z_{\alpha/2})^2 \sigma^2}{e^2}$

    Sample size determination for a proportion:

    $n \geq \frac{(z_{\alpha/2})^2 \, p(1 - p)}{e^2}$

    z test of hypothesis for a single mean (σ known):

    $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

    t test of hypothesis for a single mean (σ unknown):

    $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
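    The single-mean t statistic and the matching confidence interval endpoints (σ unknown) can be sketched in Python as follows. The helper names and the sample figures are hypothetical; the critical value 2.131 is the two-sided 5% point of the t distribution with 15 degrees of freedom and would normally be read from the statistical tables.

```python
import math


def t_statistic(xbar, mu0, s, n):
    """T = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))


def mean_ci(xbar, s, n, t_crit):
    """Endpoints xbar -/+ t_crit * s / sqrt(n).

    t_crit must be supplied by the caller (for example from tables).
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample: n = 16, mean 52, standard deviation 4, H0: mu = 50.
t_value = t_statistic(52.0, 50.0, 4.0, 16)
lower, upper = mean_ci(52.0, 4.0, 16, t_crit=2.131)
print(t_value, lower, upper)
```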

    z test of hypothesis for a single proportion:

    $Z \cong \frac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$

    z test for the difference between two means (variances known):

    $Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

    t test for the difference between two means (variances unknown):

    $T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \,(1/n_1 + 1/n_2)}}$

    Confidence interval endpoints for the difference between two means:

    $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1+n_2-2} \cdot \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

    Pooled variance estimator:

    $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
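    A short Python sketch of the pooled variance estimator and the two-sample t statistic (variances unknown) is given here for illustration. The function names and the summary figures are hypothetical, and the null hypothesis difference is assumed to be zero unless stated otherwise.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """S_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def two_sample_t(xbar1, xbar2, n1, s1, n2, s2, diff0=0.0):
    """T = (xbar1 - xbar2 - diff0) / sqrt(S_p^2 * (1/n1 + 1/n2)).

    diff0 is the hypothesised value of mu1 - mu2 (assumed 0 by default).
    The statistic has n1 + n2 - 2 degrees of freedom.
    """
    sp2 = pooled_variance(n1, s1, n2, s2)
    return (xbar1 - xbar2 - diff0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical summary statistics, not taken from the paper.
sp2 = pooled_variance(10, 2.0, 12, 3.0)
t_value = two_sample_t(20.0, 18.0, 10, 2.0, 12, 3.0)
print(sp2, t_value)
```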

    t test for the difference in means in paired samples:

    $T = \frac{\bar{X}_d - \mu_d}{S_d/\sqrt{n}}$

    Confidence interval endpoints for the difference in means in paired samples:

    $\bar{x}_d \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$

    z test for the difference between two proportions:

    $Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)\,(1/n_1 + 1/n_2)}}$

    Pooled proportion estimator:

    $P = \frac{R_1 + R_2}{n_1 + n_2}$

    Confidence interval endpoints for the difference between two proportions:

    $(p_1 - p_2) \pm z_{\alpha/2} \cdot \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

    χ² test of association:

    $\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
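    The χ² statistic can be computed from a table of observed counts as in the Python sketch below. The expected counts use the standard rule (row total × column total / grand total), which the formula sheet does not spell out, and the 2 × 2 table is hypothetical.

```python
def chi_squared(observed):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Expected counts are assumed to be E_ij = row_i total * col_j total / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            statistic += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2 x 2 table chosen only to illustrate the arithmetic.
stat, df = chi_squared([[10, 20], [20, 10]])
print(stat, df)
```

    The resulting statistic would then be compared with the χ² critical value for the returned degrees of freedom, read from the tables.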

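    Sample correlation coefficient:

    $r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$

    Spearman rank correlation:

    $r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

    Simple linear regression line estimates:

    $b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$

    $a = \bar{y} - b\bar{x}$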
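    The correlation and regression formulae work directly from summary sums, as in this Python sketch. The function name `correlation_and_line` and the four data points are hypothetical.

```python
import math


def correlation_and_line(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (r, a, b) from summary sums, following the sheet's formulae."""
    xbar, ybar = sum_x / n, sum_y / n
    s_xy = sum_xy - n * xbar * ybar      # numerator shared by r and b
    s_xx = sum_xx - n * xbar ** 2
    s_yy = sum_yy - n * ybar ** 2
    r = s_xy / math.sqrt(s_xx * s_yy)
    b = s_xy / s_xx                      # slope of the line of y on x
    a = ybar - b * xbar                  # intercept
    return r, a, b


# Hypothetical data: x = 1, 2, 3, 4 and y = 2, 4, 5, 7.
xs, ys = [1, 2, 3, 4], [2, 4, 5, 7]
r, a, b = correlation_and_line(
    len(xs),
    sum(xs),
    sum(ys),
    sum(x * x for x in xs),
    sum(y * y for y in ys),
    sum(x * y for x, y in zip(xs, ys)),
)
print(r, a, b)
```

    A prediction at a given x is then a + b·x; whether it can be trusted depends on whether x lies within the range of the observed data.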

    Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
