
Advanced Data Analysis 1, Stat 427/527

Clicker Questions

Erik B. Erhardt

Department of Mathematics and Statistics, MSC01 1115

1 University of New Mexico, Albuquerque, New Mexico, 87131-0001

Office: MSLC [email protected]

Fall 2014


Ch 00, Introduction and R + RStudio

Erik B. Erhardt, UNM Stat 427/527, ADA1, Ch 00, Clicker Questions 2/1

Ch 0, Learning outcomes, General

Q 1. More generally, thinking of what you want to get out of your college education and this course, which of the following is most important to you?

A Acquiring factual knowledge

B Learning how to use knowledge in new situations

C Developing skills to continue learning after college

Mark this number on your sheet.


Ch 0, R building blocks, Subset

Q 2. What value will R return for z?

x <- 3:7

y <- x[c(1, 2)] + x[-c(1:3)]

z <- prod(y)

z

A 99

B 20

C 91

D 54

E NA


Answer: Ch 0, R building blocks, Subset

x <- 3:7

x

## [1] 3 4 5 6 7

x[c(1, 2)]

## [1] 3 4

x[-c(1:3)]

## [1] 6 7

y <- x[c(1, 2)] + x[-c(1:3)]

y

## [1] 9 11

z <- prod(y)

z

## [1] 99


Ch 0, R building blocks, T/F selection 1

Q 3. What value will R return for z?

x <- seq(-3, 3, by = 2)

a <- x[(x > 0)]

b <- x[(x < 0)]

z <- a[1] - b[2]

z

A −2

B 0

C 1

D 2

E 6


Answer: Ch 0, R building blocks, T/F selection 1

x <- seq(-3, 3, by = 2)

x

## [1] -3 -1 1 3

a <- x[(x > 0)]

a

## [1] 1 3

b <- x[(x < 0)]

b

## [1] -3 -1

z <- a[1] - b[2]

z

## [1] 2


Ch 0, R building blocks, T/F selection 2

Q 4. What value will R return for z?

a <- 2:-3

b <- a[(a > 0) & (a <= 0)]

d <- a[!(a > 1) & (a <= -1)]

z <- sum(c(b,d))

z

A −6

B −3

C 0

D 3

E 6


Answer: Ch 0, R building blocks, T/F selection 2

a <- 2:-3

a

## [1] 2 1 0 -1 -2 -3

a[(a > 0)]

## [1] 2 1

a[(a <= 0)]

## [1] 0 -1 -2 -3

b <- a[(a > 0) & (a <= 0)]

b

## integer(0)

a[!(a > 1)]

## [1] 1 0 -1 -2 -3

a[(a <= -1)]

## [1] -1 -2 -3

d <- a[!(a > 1) & (a <= -1)]

d

## [1] -1 -2 -3

z <- sum(c(b, d))

z

## [1] -6


Ch 01, Summarizing and Displaying Data


Ch 1, Random variables, Darts

Q 5. Draw the following dart board: A dart board is constructed from three concentric circles with radii 1 inch, 2 inches, and 3 inches, respectively. If a dart lands in the innermost circle, the player receives 4 points. If the dart lands between the innermost circle and the middle circle, the player receives 2 points. If the dart lands between the middle circle and the outermost circle, the player receives 1 point. Assume that the probability of a dart landing in any particular region is proportional to the area of that region.

Define the random variable X to be the sum of the player’s score on two successive throws. Then X is what type of random variable?

A discrete

B continuous

Ok49


Answer: Ch 1, Random variables, Darts

(A). The possible values for X are 2, 3, 4, 5, 6, and 8, a countable (in fact finite) number of values.

Ok49
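The possible values of X can be checked by enumeration. The following is a minimal R sketch, assuming every dart lands somewhere on the board and the two throws are independent (neither assumption is stated in the question); the variable names are illustrative only.

```r
# Single-throw scores with probabilities proportional to region area
# (assumption: every dart lands on the board)
radii <- c(1, 2, 3)
areas <- diff(c(0, pi * radii^2))   # pi, 3*pi, 5*pi
score <- c(4, 2, 1)                 # innermost, middle, outer region
p     <- areas / sum(areas)         # 1/9, 3/9, 5/9

# All ordered pairs of two throws (assumption: throws are independent)
throws <- expand.grid(first = 1:3, second = 1:3)
x      <- score[throws$first] + score[throws$second]
px     <- p[throws$first] * p[throws$second]

# Distribution of X: a finite list of values, so X is discrete
dist_x <- tapply(px, x, sum)
dist_x
```

The names of `dist_x` are the attainable sums; 7 never appears because no pair of single-throw scores adds to it.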


Ch 1, Random variables

Q 6.

ABCDE

Ok50


Answer: Ch 1, Random variables

Ok50


Ch 1, Random variables, Radioactive 1

Q 7. A radioactive mass emits particles at an average rate of 15 particles per minute. Define the random variable X to be the number of particles emitted in a 10-minute time frame. Then X is what type of random variable?

A discrete

B continuous

Ok STT.04.03.030


Answer: Ch 1, Random variables, Radioactive 1

(A). The possible values for X are all integers between 0 and the number of particles in the mass. Even if there were an infinite number of particles in the mass, this would still be a discrete random variable, since the possible values are countable (0, 1, 2, 3, . . .).

Ok STT.04.03.030


Ch 1, Random variables, Radioactive 2

Q 8. A radioactive mass emits particles at an average rate of 15 particles per minute. A particle is emitted at noon today. Define the random variable X to be the time elapsed between noon and the next emission. Then X is what type of random variable?

A discrete

B continuous

Ok STT.04.03.040


Answer: Ch 1, Random variables, Radioactive 2

(B). X can take on any positive value, which is an uncountable set of values.

Ok STT.04.03.040


Ch 1, Numerical summaries, Unemployment

Q 9. Many individuals, after the loss of a job, receive temporary pay (unemployment compensation) until they are re-employed. Consider the distribution of time to re-employment as obtained in an employment survey. One broadcast reporting on the survey said that the average time until re-employment was 4.5 weeks. A second broadcast reported that the average was 9.9 weeks. One of your colleagues wanted a better understanding of the situation and learned (through a Google search) that one report was referring to the mean and the other to the median and also that the standard deviation was about 14 weeks. Knowing that you are a statistically-savvy person, your colleague asked you which is most likely the mean and which is the median?

A 4.5 is the mean and 9.9 is the median.

B 4.5 is the median and 9.9 is the mean.

C Neither (A) nor (B) is possible given the SD of the data.

D I am not a statistically-savvy person, so how should I know?

Ok STT.01.02.020

Answer: Ch 1, Numerical summaries, Unemployment

(B). The data must be right-skewed since the distribution is truncated at 0 weeks on the left side of the distribution. Data that are truncated at one end tend to have a skew in the direction away from the truncated end. In a right-skewed distribution the long tail pulls the mean above the median, so 9.9 is the mean and 4.5 is the median.

Ok STT.01.02.020
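To see the mean land above the median in right-skewed data bounded at 0, here is a small R sketch. The exponential model, its rate, and the seed are illustrative assumptions, not the survey data.

```r
set.seed(427)                        # arbitrary seed, for reproducibility only

# Hypothetical re-employment times in weeks: nonnegative and right-skewed
# (assumption: exponential with mean 10; not fitted to the survey)
weeks <- rexp(10000, rate = 1 / 10)

# The long right tail pulls the mean above the median
c(mean = mean(weeks), median = median(weeks), sd = sd(weeks))
```

Any nonnegative distribution with a long right tail would show the same ordering; the exponential is used only because it is simple.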


Ch 1, Stem-and-leaf plot

Q 10. A data set consists of fifty three-digit numbers ranging from 180 to 510. The best choice for stems in a stem-and-leaf display would be to use ______.

A 1 digit stems (1, 2, . . . , 5)

B 2 digit stems (18, 19, . . . , 51)

C 3 digit stems (180, 181, . . . , 510)

Ok STT.01.01.010


Answer: Ch 1, Stem-and-leaf plot

(A) 1 digit stems (1, 2, . . . , 5)

Ok STT.01.01.010


Ch 1, Boxplot, Ok10, STT.01.02.070

Below are boxplots for two data sets.

[Figure: side-by-side boxplots of data sets 1 (left) and 2 (right); vertical axis from 0 to 8.]

TRUE or FALSE: There is a greater proportion of values outside the box for the set on the right than for the set on the left.

A True, and I am very confident.

B True, and I am not very confident.

C False, and I am not very confident.

D False, and I am very confident.


Answer: Ch 1, Boxplot, Ok10, STT.01.02.070

Answer: (False). These are boxplots, so the box represents the middle 50% of data in both cases, meaning that what’s outside of the box is also 50% in both cases. (The only exception is if the data set has a lot of repeated values right at the first or third quartile. These values would be “in” the box and could increase the proportion of data in the box beyond the standard 50%.)


Ch 1, Mean vs median, Histogram, Ok06, STT.01.02.030

For the data set displayed in the following histogram, which would be larger?

A mean

B median

C Can’t tell from the given histogram.


Answer: Ch 1, Mean vs median, Histogram, Ok06, STT.01.02.030

Answer: (A). The mean is larger because of the right skew.


Ch 2, Inference for a population mean, VLBW, Ok43, STT.03.02.010

Researchers believe that one possible cause of Very Low Birth Weight (VLBW) infants is the presence of undiagnosed infections in the mother. To assess this possibility, they collected data on all pregnant women presenting themselves for prenatal care at large urban hospitals. What is the appropriate population for this study?

A All infants.

B All infants born as VLBW infants.

C All infants born in large urban centers.

D All pregnant women.

E All pregnant women living in large urban centers.

Answer: Ch 2, Inference for a population mean, VLBW, Ok43, STT.03.02.010

Answer: (E).

(A), (B), (C) Infants are not the unit of analysis. The researchers believe that VLBW infants result from undiagnosed infections in the mother, thus pregnant women are the unit of analysis.

(D) This approach is not the most conservative because where the pregnant women live may have an impact on VLBW infants.

(E)* correct: This approach is the most conservative. Pregnant women at large urban centers are the target population.


Ch 2, Inference for a population mean, fundamental concept, Ok84, STT.06.01.010

The fundamental concept underlying statistical inference is that

A through the use of sample data we are able to draw conclusions about a sample from which the data were drawn.

B through the examination of sample data we can derive appropriate conclusions about a population from which the data were drawn.

C when generalizing results to a sample we must make sure that the correct statistical procedure has been applied.

D Two of the above are true.

E All of the above are true.


Answer: Ch 2, Inference for a population mean, fundamental concept, Ok84, STT.06.01.010

Answer: (B).

(A) With statistical inference, we use samples to draw conclusions about the population, not the sample.

(B)* correct: This statement is the definition of statistical inference.

(C) We do not generalize results to a sample but to a population. Furthermore, using the correct procedure (to generalize to a population) is not the fundamental concept of inferential statistics.

(D), (E) Only (B) is correct.


Ch 2, CI for µ, definition, Ok85, STT.06.01.020

A 95% confidence interval is an interval calculated from

A sample data that will capture the true population parameter for at least 95% of all samples randomly drawn from the same population.

B population data that will capture the true population parameter for at least 95% of all samples randomly drawn from the same population.

C sample data that will capture the true sample statistic for at least 95% of all samples randomly drawn from the same population.

D population data that will capture the true sample statistic for at least 95% of all samples randomly drawn from the same population.


Answer: Ch 2, CI for µ, definition, Ok85, STT.06.01.020

Answer: (A).

Note: One point of this question is that inferential statistics is about estimating population parameters from sample data.

(A)* correct: This statement refers to the ideas behind sampling and the Central Limit Theorem.

(B) A calculation from population data would capture the true population parameter with 100% confidence.

(C) Sample statistics have a sampling distribution so there is no one true sample statistic.

(D) See the explanations for (B) and (C).
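The "95% of all samples" idea can be illustrated by simulation. This is a rough R sketch under assumed values: a normal population with mean 10 and SD 2, samples of size 25, and 2000 repetitions, all chosen only for illustration.

```r
set.seed(527)                          # arbitrary seed, for reproducibility only

mu     <- 10                           # assumed true population mean
sigma  <- 2                            # assumed population SD
n      <- 25                           # assumed sample size
n_sims <- 2000                         # number of repeated samples

# For each sample, build a 95% t-interval and record whether it captures mu
covered <- replicate(n_sims, {
  samp <- rnorm(n, mean = mu, sd = sigma)
  ci   <- t.test(samp, conf.level = 0.95)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

coverage <- mean(covered)              # proportion of intervals capturing mu
coverage
```

The observed coverage should land near 0.95; it will not equal 0.95 exactly because the number of simulated samples is finite.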


Ch 2, CI for µ, definition, Ok86, STT.06.01.050

A 95% confidence interval has been constructed around a sample mean of 28. The interval is (21, 35). Which of the following statement(s) is true?

A The margin of error in the interval is 7.

B 95 out of 100 confidence intervals constructed around sample means will contain the true population mean.

C The interval (21, 35) contains the true population mean.

D Both (A) and (B) are true.

E (A), (B), and (C) are true.


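Answer: Ch 2, CI for µ, definition, Ok86, STT.06.01.050

Answer: (A).

(A)* correct: The margin of error is half the interval width, (35 − 21)/2 = 7.

(B) The probability is 0.95, but there is no guarantee that exactly 95 of 100 intervals will contain µ.

(C) This particular interval either contains µ or it does not; we cannot know which.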
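The margin of error follows directly from the interval endpoints. A short R check, using only the numbers given in the question (the variable names are illustrative):

```r
ci <- c(lower = 21, upper = 35)       # interval from the question

center <- mean(ci)                    # midpoint of the interval: the sample mean
margin <- unname(diff(ci)) / 2        # half the interval width

c(center = center, margin = margin)
```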


Ch 2, CI, quickie, Ok89, STT.06.01.080

A 95% confidence interval for birth weights is found to be (6.85, 7.61). Is it correct to say that 95% of all birth weights will be between 6.85 and 7.61 pounds?

A Yes

B No


Answer: Ch 2, CI, quickie, Ok89, STT.06.01.080

Answer: (B). No. This confidence interval gives us a sense of where the population mean lies, not which individual observations are likely to occur (that’s called a prediction interval).


Ch 2, One-sided tests on µ, Foster care, Ok95, STT.06.02.030

Child and Protective Services, a branch of the Department of Health and Human Services, is investigating the monthly average number of children in foster care over the last several years. They are interested in seeing if the average is dropping from 235 children per month in 2001. The null hypothesis for this problem would be:

A H0 : µ < 235

B H0 = 235

C H0 : p = 235

D H0 : µ = 235

E None of the above

Answer: Ch 2, One-sided tests on µ, Foster care, Ok95, STT.06.02.030

Answer: (D). The question asks for the null hypothesis, not the alternative.


Ch 2, P-value, Ok100, STT.06.03.020

A P-value represents

A the probability, given the null hypothesis is true, that the results could have been obtained purely on the basis of chance alone.

B the probability, given the alternative hypothesis is true, that the results could have been obtained purely on the basis of chance alone.

C the probability that the results could have been obtained purely on the basis of chance alone.

D Two of the above are proper representations of a P-value.

E None of the above is a proper representation of a P-value.


Answer: Ch 2, P-value, Ok100, STT.06.03.020

Answer: (A).

(A)* correct: This answer gives the definition of a p-value.

(B) The definition of a p-value is not conditional on the alternative hypothesis because the probability that the alternative hypothesis is true is difficult to determine (The Bayesian Problem).

(C) A hypothesis test begins with the assumption that the null hypothesis is true (a conditional probability, not an unconditional probability).

(D) Only (A) is correct.

(E) (A) is correct.


Ch 3, Independent or paired 1, Ok127, STT.08.02.030

Two catalysts are being analyzed to determine how they affect the mean yield of a chemical process. Catalyst 1 is used in the process eight times and the yield in percent is measured each time. Then catalyst 2 is used in the process eight times and the yield is measured each time. What kind of t-test should be used to compare these data?

A Independent t-test

B Paired t-test


Answer: Ch 3, Independent or paired 1, Ok127, STT.08.02.030

Answer: (A). In this case, catalyst 1 is applied to a different set of processes than catalyst 2, thus there is no way to match data from the first set with data from the second set.


Ch 3, Independent or paired 2, Ok128, STT.08.02.040

Six river locations are selected and the zinc concentration is determined for both surface water and bottom water at each location. What kind of t-test should be used to compare these data?

A Independent t-test

B Paired t-test


Answer: Ch 3, Independent or paired 2, Ok128, STT.08.02.040

Answer: (B). In this case, each pair of data has something in common: both measurements are taken at the same river location.
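The difference between the two analyses can be sketched in R. The zinc values below are invented for illustration (the question gives no data); only the `paired` argument of `t.test()` changes between the two calls.

```r
# Hypothetical zinc concentrations at six river locations (invented values)
bottom  <- c(0.43, 0.27, 0.57, 0.53, 0.71, 0.72)
surface <- c(0.41, 0.24, 0.39, 0.41, 0.61, 0.61)

# Paired analysis: works on the six within-location differences
paired_test <- t.test(bottom, surface, paired = TRUE)

# Independent (Welch) analysis: ignores which values share a location
indep_test <- t.test(bottom, surface)

c(paired = paired_test$p.value, independent = indep_test$p.value)
```

With these made-up numbers the location-to-location variation is large relative to the within-location differences, so the paired test gives the smaller p-value. That pattern is typical when pairing is real, though not guaranteed for every data set.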


Ch 3, t-interval, Ok125, STT.08.02.010, variation

A two-sample t-interval was constructed for the difference in the two population means, µ1 − µ2. The resulting 99% confidence interval was (−0.004, 0.12). A conclusion that could be drawn is:

A There is no significant difference between µ1 and µ2.

B There is a significant difference between µ1 and µ2.

C The range of possible differences between the two means could be from a difference of 0.004 with µ2 being larger up to a difference of 0.12 with µ1 being larger.

D Both (A) and (C) are correct.

E Both (B) and (C) are correct.


Answer: Ch 3, t-interval, Ok125, STT.08.02.010, variation

Answer: (A). The interval contains 0, so the difference is not significant at the 1% level. Answer (C) is almost correct, but we could only say that with 99% confidence.


Ch 3, Reporting results, Ok99, STT.06.03.010

Robert is asked to conduct a clinical trial on the comparative efficacy of Aleve versus Tylenol for relieving the pain associated with muscle strains. He creates a carefully controlled study and collects the relevant data. To be most informative in his presentation of the results, Robert should report

A whether a statistically significant difference was found between the two drug effects.

B a P-value for the test of no drug effect.

C the mean difference and the variability associated with each drug’s effect.

D a confidence interval constructed around the observed difference between the two drugs.

Answer: Ch 3, Reporting results, Ok99, STT.06.03.010

Answer: (D).

(A) Reporting only a statistically significant difference is the least informative.

(B) Reporting a p-value is more informative than reporting only a statistically significant difference (answer (A)) and more informative than reporting the mean difference and variability (answer (C)), but not as informative as reporting a confidence interval (answer (D)).

(C) Reporting the mean difference and the variability gives no indication of statistical significance.

(D)* correct: A confidence interval simultaneously provides information about the mean differences, variability, direction, a sense of minimum and maximum effect, as well as a conservative and unconservative estimate.


Ch 5, ANOVA, Fat 1/2

What are the correct hypotheses for testing for a difference between the mean doughnut fat absorption amounts among the four types?

A H0 : µ1 = µ2 = µ3 = µ4 vs HA : µ1 ≠ µ2 ≠ µ3 ≠ µ4.

B H0 : µ1 ≠ µ2 ≠ µ3 ≠ µ4 vs HA : µ1 = µ2 = µ3 = µ4.

C H0 : µ1 = µ2 = µ3 = µ4 vs HA : At least one pair of means is different.

D H0 : µ1 = µ2 = µ3 = µ4 = 0 vs HA : µ1 ≠ µ2 ≠ µ3 ≠ µ4.

E H0 : µ1 = µ2 = µ3 = µ4 vs HA : µ1 > µ2 > µ3 > µ4.


Answer: Ch 5, ANOVA, Fat 1/2

Answer: (c).


Ch 5, ANOVA, Fat 2/2

What is the conclusion of the hypothesis test? The data provide convincing evidence that the mean fat absorption amounts

A are different for all types.

B is lower for fat4 than the other fats.

C are different for at least two of the types.

D are the same for all types.


Answer: Ch 5, ANOVA, Fat 2/2

Answer: (c).


Ch 5, ANOVA, Bonferroni

In the doughnut data set fat has 4 types: 1, 2, 3, and 4. If α = 0.05, what should be the modified Bonferroni significance level for two-sample t-tests for determining which pairs of groups have significantly different means?

A α∗ = 0.05

B α∗ = 0.05/2 = 0.0250

C α∗ = 0.05/3 = 0.0167

D α∗ = 0.05/4 = 0.0125

E α∗ = 0.05/6 = 0.0083


Answer: Ch 5, ANOVA, Bonferroni

Answer: (E). There are 6 comparisons: (1,2), (1,3), (1,4), (2,3), (2,4), and (3,4).
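The count of comparisons and the adjusted level can be verified in R with base functions; the variable names are illustrative only.

```r
k     <- 4                      # number of fat types
alpha <- 0.05                   # family-wise significance level

n_pairs    <- choose(k, 2)      # number of pairwise comparisons among k groups
alpha_star <- alpha / n_pairs   # Bonferroni-adjusted level per comparison

combn(k, 2)                     # each column is one pair of groups
round(alpha_star, 4)
```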


Ch 5, ANOVA, Multiple Comparisons

Goal: Create a summary for the multiple comparisons of 4 groups.

Story: Percent of a Standard 50-word list heard correctly in the presence of background noise. 24 subjects with normal hearing listened to standard audiology tapes of English words at low volume with a noisy background. They repeated the words and were scored correct or incorrect in their perception of the words. The order of list presentation was randomized.

(5 slides). . .



[Figure: Hearing, the score received on the hearing test (vertical axis, ticks at 20, 30, 40), by code for each list played (List1, List2, List3, List4); legend: ListID.]

(Consider the order of the means). . .



ANOVA results:

##             Df Sum Sq Mean Sq F value Pr(>F)
## ListID       3    920   306.8    4.92 0.0033 **
## Residuals   92   5738    62.4
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

. . .



Bonferroni multiple comparisons:

##
##  Pairwise comparisons using t tests with pooled SD
##
## data:  hearing$Hearing and hearing$ListID
##
##       List1  List2  List3
## List2 1.0000 -      -
## List3 0.0085 0.3347 -
## List4 0.0135 0.4594 1.0000
##
## P value adjustment method: bonferroni

. . .



A) List: 3 4 2 1

B) List: 1 2 4 3

C) List: 1 2 3 4

D) List: 3 4 2 1

E) None of A–D


Answer: Ch 5, ANOVA, Multiple Comparisons

Answer: (D). Only List 1 differs significantly from Lists 3 and 4.
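The significant pairs can be read off the Bonferroni-adjusted p-values programmatically. This sketch re-enters the printed p-values by hand (the `hearing` data frame itself is not reproduced here) and compares them with α = 0.05, which is an assumed level.

```r
# Bonferroni-adjusted p-values copied from the pairwise output above
p_adj <- matrix(NA_real_, nrow = 3, ncol = 3,
                dimnames = list(c("List2", "List3", "List4"),
                                c("List1", "List2", "List3")))
p_adj["List2", "List1"]             <- 1.0000
p_adj["List3", c("List1", "List2")] <- c(0.0085, 0.3347)
p_adj["List4", ]                    <- c(0.0135, 0.4594, 1.0000)

# Adjusted p-values are compared directly with alpha (assumed 0.05)
sig <- which(p_adj < 0.05, arr.ind = TRUE)

data.frame(group1 = colnames(p_adj)[sig[, "col"]],
           group2 = rownames(p_adj)[sig[, "row"]],
           p_adj  = p_adj[sig])
```

Because `pairwise.t.test()` already multiplies the p-values by the number of comparisons, no further division of α is needed at this step.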


Ch 7, CI for proportions, Ok114, STT.08.01.010

To estimate the proportion of students at a university who watch reality TV shows, a random sample of 50 students was selected and resulted in a sample proportion of 0.3. A 95% confidence interval for the proportion that watches reality TV would be ______ a 90% confidence interval.

A narrower than

B the same width as

C wider than


Answer: Ch 7, CI for proportions, Ok114, STT.08.01.010

Answer: (C).
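A quick R comparison of the two widths, assuming the large-sample (Wald) interval p̂ ± z* · SE is the one intended; the question does not say which interval method is used, and the helper `ci_width` is a hypothetical name.

```r
p_hat <- 0.3                                   # sample proportion
n     <- 50                                    # sample size
se    <- sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat

# Width of a Wald interval at a given confidence level (hypothetical helper)
ci_width <- function(conf) 2 * qnorm(1 - (1 - conf) / 2) * se

c(width_90 = ci_width(0.90), width_95 = ci_width(0.95))
```

The standard error is the same for both intervals; only the critical value grows with the confidence level, which is why the 95% interval is wider.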


Ch 7, Test statistic, Ok112, STT.07.01.057

In a random sample of 2013 adults, 1283 indicated that they believe that rudeness is a more serious problem than in past years. Which of the test statistics shown below would be appropriate to determine if there is sufficient evidence to conclude that more than three-quarters of U.S. adults believe that rudeness is a worsening problem?

A $\dfrac{\hat{p} - 0.5}{\sqrt{(0.5)(1 - 0.5)/2013}}$

B $\dfrac{\hat{p} - 0.75}{\sqrt{(0.75)(1 - 0.75)/2013}}$

C $\dfrac{\bar{x} - 0.75}{\sqrt{s/2013}}$


Answer: Ch 7, Test statistic, Ok112, STT.07.01.057

Answer: (b).
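The statistic in (B) can be evaluated with the numbers from the question. This is a sketch of the one-proportion z statistic with null value 0.75 and an upper-tailed alternative; the variable names are illustrative.

```r
x  <- 1283                                    # adults agreeing
n  <- 2013                                    # sample size
p0 <- 0.75                                    # null value (three-quarters)

p_hat <- x / n                                # sample proportion
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tailed p-value, since the alternative is p > 0.75
p_value <- pnorm(z, lower.tail = FALSE)

c(p_hat = p_hat, z = z, p_value = p_value)
```

Note that the sample proportion is below 0.75, so z is negative and these data give no support for the "more than three-quarters" claim; the question asks only which statistic is appropriate.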


Ch 7, Parachute null hypothesis, Ok117, STT.08.01.040

A parachute manufacturer is concerned that the failure rate of 0.1% advertised by his company may in fact be higher. What is the null hypothesis for the test he would run to address his worries?

A H0 : µ = 0.001

B H0 : p > 0.001

C H0 : µ < 0.001

D H0 : p = 0.001


Answer: Ch 7, Parachute null hypothesis, Ok117, STT.08.01.040

Answer: (d).


Ch 7, Parachute conclusion, Ok117, STT.08.01.050

A parachute manufacturer is concerned that the failure rate of 0.1% advertised by his company may in fact be higher. A hypothesis test was run and the result was a P-value of 0.03333. The most likely conclusion the manufacturer might make is:

A My parachutes are safer than I claim.

B My parachutes are not as safe as I claim them to be.

C I can make no assumption of safety based on a statistical test.

D The probability of a parachute failure is 0.03333.

E Both (B) and (D) are true.


Answer: Ch 7, Parachute conclusion, Ok117, STT.08.01.050

Answer: (b).


Ch 7, Parachute p-value, Ok119, STT.08.01.060

To explain the meaning of a P-value of 0.033, you could say:

A There is approximately a 96.7% chance ofobtaining my sample results.

B Assuming the null hypothesis is accurate,results like those found in my sample shouldoccur only 3.3% of the time.

C We can’t say anything for sure withoutknowing the sample results.

D There is approximately a 3.3% chance ofobtaining my sample results.


Answer: Ch 7, Parachute p-value, Ok119, STT.08.01.060

Answer: (b).


Ch 7, Excess successes, Ok125, STT.05.01.030

In 1938, Duke University researchers Pratt and Woodruff conducted an experiment looking for evidence of ESP (extrasensory perception). In the experiment, students were presented with five standard ESP symbols (square, wavy lines, circle, star, cross). The experimenter shuffled a deck of ESP cards, each of which had one of the five symbols on it. The experimenter drew a card from this deck, looked at it, and concentrated on the symbol on the card. The student would then guess the symbol, perhaps by reading the experimenter’s mind. This experiment was repeated with 32 students for a total of 60,000 trials. The students were correct 12,489 times.

If the students were selecting one of the five symbols at random, the probability of success would be p = 0.2 and we would expect the students to be correct 12,000 times out of 60,000. Should we write off the observed excess of 489 as nothing more than random variation?

A Yes

B No

Answer: Ch 7, Excess successes, Ok125, STT.05.01.030

Answer: (B). The Central Limit Theorem gives us that if X ∼ Bin(n, p), then X is approximately normal with the same mean and standard deviation. This fact can be used to compute P(X ≥ 12489), which turns out to be a very small number.

binom.test(x = 12489, n = 60000, p = 0.2, alternative = "two.sided")

##
##  Exact binomial test
##
## data:  12489 and 60000
## number of successes = 12489, number of trials = 60000, p-value
## = 6.85e-07
## alternative hypothesis: true probability of success is not equal to 0.2
## 95 percent confidence interval:
##  0.2049 0.2114
## sample estimates:
## probability of success
##                 0.2082
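The normal approximation mentioned in the answer can be computed alongside the exact upper tail. This is a sketch; the continuity correction of 0.5 is an added choice, not something stated in the answer.

```r
n     <- 60000                                # trials
p     <- 0.2                                  # chance of a correct guess at random
x_obs <- 12489                                # observed number correct

mu    <- n * p                                # mean of X under random guessing
sigma <- sqrt(n * p * (1 - p))                # SD of X under random guessing

# Normal approximation to P(X >= 12489), with continuity correction (a choice)
p_normal <- pnorm(x_obs - 0.5, mean = mu, sd = sigma, lower.tail = FALSE)

# Exact binomial upper tail, P(X >= 12489) = P(X > 12488)
p_exact <- pbinom(x_obs - 1, size = n, prob = p, lower.tail = FALSE)

c(z = (x_obs - mu) / sigma, normal = p_normal, exact = p_exact)
```

Both are one-sided tail probabilities, so they are not directly comparable to the two-sided p-value reported by `binom.test()` above.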


Ch 7, Comparing two proportions, Ok125, STT.08.02.010

A two-proportion z interval was constructed for the difference in the two population proportions, p1 and p2. The resulting 99% confidence interval was (−0.004, 0.12). A conclusion that could be drawn is:

A There is no significant difference between p1 and p2.

B There is a significant difference between p1 and p2.

C The range of possible differences between the two proportions could be from a 0.4% difference with p2 being larger up to a 12% difference with p1 being larger.

D Both (A) and (C) are correct.

E Both (B) and (C) are correct.


Answer: Ch 7, Comparing two proportions, Ok125, STT.08.02.010

Answer: (A). The interval contains 0, so the difference is not significant at the 1% level. Answer (C) is almost correct, but we could only say that with 99% confidence.


Ch 8, Correlation coefficients, Ok STT.02.02.010

The scatterplots below display three bivariate data sets. The correlation coefficients for these data sets are 0.03, 0.68, and 0.89. Which scatterplot corresponds to the data set with r = 0.03?

A Plot 1

B Plot 2

C Plot 3


Answer: Ch 8, Correlation coefficients, Ok STT.02.02.010

Answer: (b).


Ch 8, Strong correlation, Ok STT.02.02.020

Joe Bob found a strong correlation in an empirical study showing that individuals’ physical ability decreased significantly with age. Which numerical result below best describes this situation?

A −1.2

B −1.0

C −0.8

D +0.8

E +1.0

F +1.2


Answer: Ch 8, Strong correlation, Ok STT.02.02.020

Answer: (C).

(A), (F) The correlation coefficient must satisfy −1 ≤ r ≤ 1, so these values are impossible.

(B) This is a perfect negative correlation, which is unlikely to happen with empirical data.

(C)* correct: The problem statement assumes increasing age, so the best answer is a strong, negative correlation.

(D) Although this correlation is strong, it is also positive, whereas the problem statement implies that the correlation should be negative.

(E) This correlation is both perfect (unlikely with empirical data) and positive (whereas the problem statement implies that the correlation should be negative).


Ch 8, Coney Island, Ok STT.02.02.050

A researcher found that r = +.92 between the high temperature of the day and the number of ice cream cones sold at Coney Island. This result tells us that

A high temperatures cause people to buy icecream.

B buying ice cream causes the temperature togo up.

C some extraneous variable causes both hightemperatures and high ice cream sales.

D temperature and ice cream sales have astrong positive linear relationship.


Answer: Ch 8, Coney Island, Ok STT.02.02.050

Answer: (D).

(A) This claim may be true, but correlation tells us only about the strength and direction of a relationship, not about the cause-effect aspect of the relationship.

(B) Correlation does not imply causation, in either direction.

(C) Correlation does not imply the existence of a lurking variable.

(D)* correct: A correlation of r = +.92 implies a strong, positive, linear relationship.


Ch 8, Equation, Ok STT.02.03.010

A store manager conducted an experiment in which he systematically varied the width of a display for toothpaste from 3 ft. to 6 ft. and recorded the corresponding number of tubes of toothpaste sold per day. The data were used to fit a regression line, which was

tubes sold per day = 20 + 10(display width)

What is the predicted number of tubes sold per day for adisplay width of 12 feet?

A 120

B 140

C It would be unwise to use the regression line to make aprediction for a display width of 12 ft.


Answer: Ch 8, Equation, Ok STT.02.03.010

Answer: (c).
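The arithmetic and the range check behind this answer can be written out in R. The function name `predict_tubes` is hypothetical; the coefficients and the 3 ft to 6 ft range come from the question.

```r
# Fitted line from the question: tubes sold per day = 20 + 10 * (display width)
predict_tubes <- function(width_ft) 20 + 10 * width_ft

observed_range <- c(3, 6)     # display widths used in the experiment, in feet
new_width      <- 12          # width asked about, in feet

predict_tubes(new_width)      # the line itself returns a number

# Is the new width inside the range where the line was fitted?
in_range <- new_width >= observed_range[1] && new_width <= observed_range[2]
in_range
```

The line produces a value at 12 ft, but that width is twice the largest width studied, so the prediction is an extrapolation with no data to support the linear pattern there.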


Ch 8, Salaries, coefficient of determination, Ok STT.02.02.060

The salary and the number of years of teaching experience were recorded for 20 social studies teachers in rural west Texas. When the data points were plotted, there was a roughly linear relationship and a positive correlation between salary and number of years of teaching experience, with r = 0.8. What percentage of the variation in the salaries is explained by the linear relationship between salary and years of service?

A 80%

B 64%

C 36%

D 20%


Answer: Ch 8, Salaries, coefficient of determination, Ok STT.02.02.060

Answer: (b).
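For simple linear regression the proportion of variation explained is r squared. A brief R check follows; since the teacher data are not available, the identity is illustrated on R's built-in `cars` data set, which is unrelated to the question.

```r
r <- 0.8
r_squared <- r^2            # proportion of variation explained by the line
r_squared

# Same identity on R's built-in 'cars' data (not the teacher salary data)
fit <- lm(dist ~ speed, data = cars)
all.equal(summary(fit)$r.squared, cor(cars$speed, cars$dist)^2)
```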


Ch 8, Outliers, Ok STT.02.04.040

Why is it important to look for outliers in data prior to applying regression?

A Outliers always affect the magnitude of the regression slope.

B Outliers are always bad data.

C Outliers should always be eliminated from the data set.

D Outliers should always be considered because of their potential influence.

E We shouldn’t look for outliers, because all the data must be analyzed.


Answer: Ch 8, Outliers, Ok STT.02.04.040

Answer: (D).

(A) Outliers don’t always affect the regression slope.

(B), (C) Outliers may be the data of most interest and are certainly not always bad data.

(D)* correct: Outliers should always be considered but are not always influential.

(E) Even if one analyzes all the data, one should be aware of outliers because of their impact.

