Exam II Answers Fall 2018 -...


  • 1

    Exam II Answers Fall 2018

    1. (10pts) In a clinical trial of a new drug designed to alleviate some of the tremors associated with Parkinson's disease, patients were randomly assigned to one of three treatments: (1) current medication, (2) placebo, (3) new medication. The number of tremors over a 3-hour observation period was estimated after one month on the new treatment regime. Use the data in the file drugtrial.csv to determine whether the number of tremors differs among treatments.

    Answer

    This study is a single factor ANOVA design in which we compare the mean number of tremors experienced by patients receiving one of three different therapies. As with any ANOVA, which assumes that the observations arise from normally distributed populations, we first need to check for normality. Given that our data are slightly skewed, I used a log transformation, and the data appear approximately normal on the log scale. Therefore I performed my analysis on the log scale data, as well as performing a Kruskal-Wallis non-parametric analysis on the untransformed data. The results were similar, indicating that there were differences in response among the treatments, so the null hypothesis that the treatments did not differ in their effects on tremors is rejected. I followed up my analysis with a graph of the treatment means and their standard errors, which indicates that the new drug resulted in the smallest number of tremors.

    # Problem 1
    pp=read.csv("G:/Biometry/exams/Exam2/drugtrial.csv",header=TRUE)
    head(pp)

    ##   X tr tremors
    ## 1 1  1      11
    ## 2 2  1      12
    ## 3 3  1      15
    ## 4 4  1      13

    #Check normality
    library(QuantPsyc)

    ## Attaching package: 'QuantPsyc'

    norm(pp$tremors)

    ##          Statistic        SE    t-val          p
    ## Skewness 0.7828540 0.4472136 1.750515 0.04001476
    ## Kurtosis 0.9006515 0.8944272 1.006959 0.15697725
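    The standard errors and p-values printed by norm() above are consistent with the large-sample approximations SE(skewness) = sqrt(6/n) and SE(kurtosis) = sqrt(24/n), followed by a one-tailed normal p-value. The sketch below reproduces the printed numbers under that assumption. The sample size n = 30 is inferred from the 27 residual degrees of freedom plus three groups; this is a reading of the output, not a statement about how QuantPsyc is implemented.

```r
# Reproduce the printed skewness/kurtosis tests from the reported statistics.
# Assumption: SE = sqrt(6/n) and sqrt(24/n), with a one-tailed normal p-value.
n    <- 30          # inferred: 27 residual df + 3 treatment groups
skew <- 0.7828540   # skewness statistic as printed above
kurt <- 0.9006515   # kurtosis statistic as printed above

se_skew <- sqrt(6 / n)
se_kurt <- sqrt(24 / n)

z_skew <- skew / se_skew
z_kurt <- kurt / se_kurt

p_skew <- 1 - pnorm(abs(z_skew))
p_kurt <- 1 - pnorm(abs(z_kurt))

print(c(se_skew = se_skew, z_skew = z_skew, p_skew = p_skew))
print(c(se_kurt = se_kurt, z_kurt = z_kurt, p_kurt = p_kurt))
```

    Because the standard errors depend only on n, the same check applies to the other norm() outputs in this key.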

  • 2

    logtrem=log(pp$tremors)
    norm(logtrem)

    ##           Statistic        SE      t-val         p
    ## Skewness -0.1173669 0.4472136 -0.2624405 0.3964909
    ## Kurtosis -0.2396873 0.8944272 -0.2679785 0.3943579

    m2=aov(logtrem~factor(pp$tr))
    summary(m2)

    ##               Df Sum Sq Mean Sq F value   Pr(>F)
    ## factor(pp$tr)  2  1.722  0.8612   14.96 4.25e-05 ***
    ## Residuals     27  1.555  0.0576
    ## ---

    kruskal.test(pp$tremors~pp$tr)

    ##  Kruskal-Wallis rank sum test
    ##
    ## data:  pp$tremors by pp$tr
    ## Kruskal-Wallis chi-squared = 17.358, df = 2, p-value = 0.0001701

    # plot graph
    library(ggplot2)
    mlogtrem=aggregate(logtrem, by=list(factor(pp$tr)),FUN=mean)
    sdlogtrem=aggregate(logtrem, by=list(factor(pp$tr)),FUN=sd)
    trr=c("current", "placebo","new")
    mtrem=mlogtrem[,2]
    setrem=(sdlogtrem[,2])/sqrt(10)
    data=data.frame(trr,mtrem,setrem)
    ggplot(data) +
      geom_bar( aes(x=trr, y=mtrem), stat="identity", fill="skyblue", alpha=0.5) +
      geom_linerange( aes(x=trr, ymin=mtrem-setrem, ymax=mtrem+setrem), colour="orange", alpha=0.9, size=1.3)
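    The plotting code divides each standard deviation by sqrt(10), which assumes 10 patients per treatment (consistent with 30 observations in three groups). A minimal base R sketch that computes the group size from the data instead is shown below. The vectors y and g are made-up illustration data, not the drug trial data.

```r
# Group means and standard errors with n computed per group.
# y and g are hypothetical illustration data.
y <- c(1, 2, 3, 2, 4, 6)
g <- factor(c("a", "a", "a", "b", "b", "b"))

grp_mean <- tapply(y, g, mean)
grp_sd   <- tapply(y, g, sd)
grp_n    <- tapply(y, g, length)
grp_se   <- grp_sd / sqrt(grp_n)   # standard error of each group mean

summ <- data.frame(
  group = names(grp_mean),
  mean  = as.numeric(grp_mean),
  se    = as.numeric(grp_se)
)
print(summ)
```

    The resulting data frame has the same shape as the one passed to ggplot above, so it could be substituted without changing the plotting calls.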

  • 3

    2. (20pts) The ability of the cigarette beetle, Lasioderma serricorne, to infest stored products was examined using 5 stored products that were selected at random from a long list of possible products. Three storage temperatures (60°, 70°, and 80° F) were selected to represent the range of temperatures commonly used in storing these products. The products were flour, paprika, chili powder, cayenne pepper, and millet. For each combination of stored product and temperature, 5 replicate containers with 5 ounces of the product were seeded with 2 male-female pairs of beetles. After storage for 30 days, the number of beetles in each container was counted. Use the data in the file beetles.csv to test all relevant hypotheses about effects of product type and temperature on the success of the beetle using those products.

    Answer

    This experiment is a 2 factor ANOVA. Both factors are between subjects, but the product factor is a random factor. This necessitates modification of the F test for the temperature effect. I first checked for normality of the data and concluded they were approximately normal. I then used aov to perform the ANOVA. From the ANOVA table, I extracted the MS(temperature) and the MS(interaction) to compute the modified F value. Based on this analysis I conclude that there are significant main effects of temperature and product type on beetle abundance, but no interactive effects. The boxplot shows that beetle numbers were much higher on flour and that numbers were higher at higher temperatures.
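    The reason temperature is tested over the interaction can be seen from the expected mean squares. The block below is a sketch under the usual restricted mixed model, with temperature (T) fixed at a = 3 levels, product (P) random at b = 5 levels, and n = 5 replicates per cell; the symbols are introduced here for illustration and do not appear in the original answer.

```latex
\begin{aligned}
E[MS_{T}]  &= \sigma^2 + n\,\sigma^2_{TP} + \frac{n\,b \sum_{i=1}^{a} \tau_i^2}{a-1} \\
E[MS_{P}]  &= \sigma^2 + n\,a\,\sigma^2_{P} \\
E[MS_{TP}] &= \sigma^2 + n\,\sigma^2_{TP} \\
E[MS_{E}]  &= \sigma^2
\end{aligned}
\qquad\Longrightarrow\qquad
F_{T} = \frac{MS_{T}}{MS_{TP}} \sim F_{a-1,\,(a-1)(b-1)} \quad \text{under } H_0:\ \tau_i = 0
```

    With a = 3 and b = 5 the temperature test has 2 and 8 degrees of freedom, while product and the interaction are still tested over the residual mean square.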

    # Problem 2
    beet=read.csv("G:/Biometry/exams/Exam2/beetles.csv", header=TRUE)
    head(beet)

    ##   X temperature product number
    ## 1 1          60       1     26
    ## 2 2          60       1     29
    ## 3 3          60       1     19
    ## 4 4          60       1     31
    ## 5 5          60       1     33
    ## 6 6          60       2     20

    # check normality
    norm(beet$number)

    ##           Statistic        SE     t-val          p
    ## Skewness  0.4088240 0.2828427  1.445411 0.07417123
    ## Kurtosis -0.7598773 0.5656854 -1.343286 0.08958970

    beet.mod1=aov(number~factor(temperature)*factor(product), data=beet)
    summary(beet.mod1)

    ##                                     Df Sum Sq Mean Sq F value   Pr(>F)
    ## factor(temperature)                  2   1467   733.4  35.844 5.73e-11 ***
    ## factor(product)                      4   3308   826.9  40.415  < 2e-16 ***
    ## factor(temperature):factor(product)  8    139    17.4   0.848    0.565

  • 4

    ## Residuals                           60   1228    20.5
    ## ---

    # since product is a random factor temperature must be tested over the interaction
    temF=733.4/17.4
    temF

    ## [1] 42.14943

    1-pf(temF,2,8)

    ## [1] 5.64384e-05

    # plot graph

    # first reshape the data to create a data frame with one observation per line
    ftemp=factor(beet$temperature)
    num=beet$number
    prod1=c(rep("flour",5),rep("paprika",5),rep("chili powder",5),rep("cayenne pepper",5),rep("millet",5))
    prodt=c(rep(prod1,3))
    data2=data.frame(prodt,ftemp,num)
    p=ggplot(data2,aes(x= prodt,y= num,fill=ftemp)) + geom_boxplot(position=position_dodge(1))
    p + xlab("product") +labs(fill="temperature")

  • 5

    3. (20pts) To determine the effects of competing plants and shading on the growth of Peltandra virginica, an ecologist established 20 plots, 10 of which were shaded and 10 of which were not. On each of these plots, half of the plot was randomly assigned to be weeded, leaving only Peltandra virginica in the weeded portion of the plot. In the unweeded portion, P. virginica and competitors were left undisturbed. At the end of the growing season the average biomass of P. virginica in the weeded and unweeded portion of each plot was determined. Use the data in file peltandra.csv to determine the individual and interactive effects of competitors and shading on P. virginica.

    Answer

    This experiment is a 2 factor ANOVA with the sunshade factor being between subject, and the weed/noweed factor being within subject. I first checked normality by combining all the observations. The data appeared to be approximately normal. I then created a matrix of the within subject observations called "weed" and a factor giving the within-subjects structure of the data called "weeding." I fit the linear model (pelt.mod1), which was then passed to the car package function Anova for univariate analysis. The results (highlighted below) lead me to reject the null hypothesis for weeding and conclude that there is an effect of weeding, but no significant effect of shading (p = 0.075) and no weed by shading interaction. The boxplot seems to support these conclusions, since there are large differences between weed and noweed, but the boxes are broadly overlapping for the sun-shade treatment.
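    When the within-subject factor has only two levels and the groups are balanced, the interaction test has a simple interpretation: it equals a pooled two-sample t-test on the per-plot differences (weed minus noweed), with F = t^2. The sketch below illustrates this in base R on simulated data and cross-checks it against aov with an Error() term. The simulated values and all object names are made up for illustration; this is an alternative route to the same kind of test, not the car-based analysis used in this answer.

```r
# Simulated split-plot data: 20 plots, shade between plots, weeding within plots.
set.seed(1)
n_plots <- 20
shade   <- factor(rep(c("sun", "shade"), each = 10))
weed    <- rnorm(n_plots, mean = 42, sd = 5)
noweed  <- weed - rnorm(n_plots, mean = 4, sd = 2)

# Interaction as a pooled two-sample t-test on the within-plot differences
d       <- weed - noweed
tt      <- t.test(d ~ shade, var.equal = TRUE)
F_inter <- unname(tt$statistic)^2

# Same test from a long-format repeated-measures aov
long <- data.frame(
  plot    = factor(rep(seq_len(n_plots), 2)),
  shade   = rep(shade, 2),
  weeding = factor(rep(c("weed", "noweed"), each = n_plots)),
  biomass = c(weed, noweed)
)
fit        <- aov(biomass ~ shade * weeding + Error(plot / weeding), data = long)
within_tab <- summary(fit)[["Error: plot:weeding"]][[1]]

print(F_inter)
print(within_tab)   # row 2 is the shade:weeding interaction
```

    The second row of the within-plot table should show the same F as the squared t statistic, with 1 and 18 degrees of freedom as in the output for this problem.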

    # Problem 3
    pelt=read.csv("G:/Biometry/exams/Exam2/peltandra.csv", header=TRUE)
    head(pelt)

    ##   X sunshade     weed   noweed
    ## 1 1      sun 38.73525 36.79654
    ## 2 2      sun 44.78438 39.24548
    ## 3 3      sun 43.23150 35.52886
    ## 4 4      sun 50.54296 40.30452
    ## 5 5      sun 38.43025 36.53835
    ## 6 6      sun 38.31575 30.96264

    # check normality by combining observations
    weedcomb=c(pelt$weed,pelt$noweed)
    norm(weedcomb)

    ##           Statistic        SE      t-val         p
    ## Skewness -0.1286118 0.3872983 -0.3320741 0.3699166
    ## Kurtosis -0.6224907 0.7745967 -0.8036321 0.2108048

    # since weeding is a within subjects factor
    weed=cbind(pelt$weed,pelt$noweed)

  • 6

    weeding=factor(c("weed","noweed"))
    pelt.mod1=lm(weed~pelt$sunshade)
    library(car)

    ## Attaching package: 'car'

    pelt.mod2=Anova(pelt.mod1,idata=data.frame(weeding),idesign=~weeding,type="II")
    summary(pelt.mod2, multivariate=FALSE, univariate=TRUE)

    ##
    ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
    ##
    ##                          SS num Df Error SS den Df        F    Pr(>F)
    ## (Intercept)           58078      1  1967.01     18 531.4668 8.180e-15 ***
    ## pelt$sunshade           390      1  1967.01     18   3.5720   0.07498 .
    ## weeding                 204      1    97.07     18  37.9106 8.182e-06 ***
    ## pelt$sunshade:weeding     3      1    97.07     18   0.4714   0.50110
    ## ---

    # plot graph
    ss=rep(pelt$sunshade,2)
    wnw=c(rep("weed",20),rep("noweed",20))
    data3=data.frame(ss,wnw,weedcomb)
    p2=ggplot(data3,aes(x= ss,y=weedcomb,fill=wnw)) + geom_boxplot()
    p2

  • 7

    4. (10 pts) Name that design! Gn = group of subjects; A, B, C, etc. are factors; an, bn, cn, etc. are levels of factors. Describe each factor: how many levels and the type of factor.

    a.  A          a1            a2
        B       b1  b2  b3    b4  b5  b6
                G1  G1  G1    G2  G2  G2

    b.  A          a1            a2
        B       b1  b2  b3    b1  b2  b3
        C  c1   G1  G1  G1    G5  G5  G5
           c2   G2  G2  G2    G6  G6  G6
           c3   G3  G3  G3    G7  G7  G7
           c4   G4  G4  G4    G8  G8  G8

    c.  A          a1            a2
        B       b1  b2  b3    b4  b5  b6
                G1  G2  G3    G4  G5  G6

    Answer

    a. This is a 2 factor ANOVA. Factor A has 2 levels and is between subject. Factor B has 6 levels, 3 nested within each level of A, and B is within subject. This design as outlined is impossible, and everyone will get credit for it.

    b. This is a 3 factor ANOVA. Factor A has 2 levels and is between subject, Factor C has 4 levels and is between subject. Factor B has 3 levels and is within subject.

    c. This is a two factor ANOVA. Factor A has 2 levels and is between subject. Factor B has 6(3) levels and the levels of B are nested within the levels of A.

  • 8

    5. (20pts) The leaf-mining moth, Cameraria hamadryadella, is found at higher densities low in the crown of white oak trees (Quercus alba). To test the hypothesis that this pattern is due to differences in foliage quality between low and high foliage, potted trees that had been treated identically prior to the experiment, and so should have similar foliage quality, were randomly assigned to 3 different heights (0, 1.25, and 3.25 m above the ground), and intervening foliage was placed under half the trees in each height treatment. Intervening foliage was non-oak foliage and was used to simulate the interference that moths emerging from the leaf litter might encounter when seeking oak foliage. Because of the limited availability of potted trees, the sample sizes differed among treatment combinations. Use the file cameraria.csv, which contains data on the density of leaf mines on each experimental tree (number/leaf), to test all relevant hypotheses concerning the effects of height and intervening foliage on leaf miner density.

    Answer

    This experiment is a 2 Factor ANOVA with both factors between subject. However, there are unequal sample sizes so the analysis will require the car package Anova function to obtain Type II sums of squares. I first checked for normality and decided that the data was approximately normal on the log scale. Therefore, I conducted the analysis with the log transformed data. I first fit the model using the lm function, and then passed the resulting model to the Anova function to obtain the ANOVA table with Type II sums of squares. The highlighted results indicate that there is a significant effect of height, but no effect of intervening foliage nor an interaction of height and intervening foliage. The graph supports these conclusions. I had to reorder the factor levels because ggplot would plot the levels alphabetically rather than in the sequence low-mid-high.
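    The need for Type II sums of squares with unequal sample sizes can be illustrated in base R. With unbalanced cells, the sequential (Type I) sum of squares for a main effect depends on the order in which terms enter the model, whereas the Type II sum of squares for a main effect is the reduction in residual sum of squares when that effect is added to a model already containing the other main effect. The sketch below uses simulated data with made-up cell counts; the factor names mirror the data file, but the values are not the exam data, and only sums of squares are shown (the F denominators reported by car are not reproduced here).

```r
# Simulated unbalanced 2 x 3 design (cell counts are hypothetical).
set.seed(42)
cells  <- expand.grid(inter  = c("inter", "nointer"),
                      height = c("low", "mid", "high"))
counts <- c(6, 3, 3, 6, 5, 5)               # unequal, non-proportional cells
d      <- cells[rep(seq_len(nrow(cells)), counts), ]
d$y    <- rnorm(nrow(d), mean = as.numeric(d$height))

# Sequential SS for 'inter' depends on whether it enters first or last
seq_first <- anova(lm(y ~ inter + height, data = d))["inter", "Sum Sq"]
seq_last  <- anova(lm(y ~ height + inter, data = d))["inter", "Sum Sq"]

# Type II SS for 'inter': drop in residual SS when added after 'height'
rss         <- function(f) sum(residuals(lm(f, data = d))^2)
type2_inter <- rss(y ~ height) - rss(y ~ height + inter)

print(c(seq_first = seq_first, seq_last = seq_last, type2 = type2_inter))
```

    The Type II value matches the sequential value only when the factor is entered last, which is why the order-independent Anova table is preferred for this problem.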

    # Problem 5
    cam=read.csv("G:/Biometry/exams/Exam2/cameraria.csv", header=TRUE)
    head(cam)

    ##   X internointer height  density
    ## 1 1        inter    low 3.969302
    ## 2 2        inter    low 5.369349
    ## 3 3        inter    low 6.120054
    ## 4 4        inter    low 3.190005
    ## 5 5        inter    low 6.453231
    ## 6 6        inter    mid 2.813481

    #check normality
    norm(cam$density)

    ##           Statistic        SE      t-val          p
    ## Skewness  0.7329206 0.4264014  1.7188511 0.04282074
    ## Kurtosis -0.8473702 0.8528029 -0.9936296 0.16020161

    lgden=log(cam$density)
    norm(lgden)

  • 9

    ##            Statistic        SE       t-val         p
    ## Skewness  0.01295367 0.4264014  0.03037904 0.4878824
    ## Kurtosis -1.27796186 0.8528029 -1.49854311 0.0669961

    cam.mod1=lm(lgden~internointer*height, data=cam)
    Anova(cam.mod1,type="II")

    ## Anova Table (Type II tests)
    ##
    ## Response: lgden
    ##                      Sum Sq Df F value    Pr(>F)
    ## internointer         0.1523  1  1.4506    0.2389
    ## height              13.2868  2 63.2793 6.437e-11 ***
    ## internointer:height  0.4560  2  2.1720    0.1335
    ## Residuals            2.8346 27
    ## ---

    # to reorder factor levels for graph
    cam$height=factor(cam$height,levels(cam$height)[c(2,3,1)])

    # plot graph
    p2=ggplot(cam,aes(x=height,y=density,fill=internointer)) + geom_boxplot()
    p2

    6. (20pts) Using the data from problem 1, test the a priori hypotheses that the average number of tremors for the drug treatments differs from the placebo, and that the new medication differs from the current medication. Adjust your tests for experiment-wise error at the 5% level.

    Answer

  • 10

    This problem asks specifically that you test 2 a priori contrasts. Therefore, I set up the contrast weights to test whether the drugs differ from the placebo, and whether the new drug differs from the old. I then adjusted the per-contrast error rate so that the experiment-wise error was 0.05: alphac = 1 - (1 - 0.05)^(1/2) = 0.0253. The results highlighted below indicate that for the first contrast the effect of the drugs on average differs from the placebo (p = 1.74e-05 < 0.0253). However, the second contrast suggests that the effects of the old and new drug do not differ (p = 0.106 > 0.0253). Note that the first F is 27.11842; R prints it in exponential notation.
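    The per-contrast error rate and the two decisions can be checked with a few lines of R. This is a minimal sketch using the p-values printed in the output for this problem; the object names are chosen here for illustration.

```r
# Per-contrast alpha that keeps the experiment-wise error rate at 0.05
# for k a priori contrasts, as computed in the answer above.
alpha_e <- 0.05
k       <- 2
alpha_c <- 1 - (1 - alpha_e)^(1 / k)

# p-values copied from the contrast output for this problem
p_vals <- c(drugs_vs_placebo = 1.744437e-05, new_vs_current = 0.1063108)
reject <- p_vals < alpha_c

print(round(alpha_c, 4))
print(reject)
```

    The two weight vectors used below, c(-1,2,-1) and c(1,0,-1), are orthogonal (their elementwise products sum to zero), so the two contrasts address independent questions.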

    # Problem 6

    library(knitr)
    read_chunk('G:/Biometry/Biometry-Fall-2018/conestim.R')

    # set up contrasts weights
    c1=c(-1,2,-1)
    c2=c(1,0,-1)

    # run conest function on log transformed number of tremors
    conest(pp$tr,logtrem,c1)

    ## [1] "F-value" "df-num" "df-denom" "sig level"

    ## [1] 2.711842e+01 1.000000e+00 2.700000e+01 1.744437e-05

    conest(pp$tr,logtrem,c2)

    ## [1] "F-value" "df-num" "df-denom" "sig level"

    ## [1] 2.7916354 1.0000000 27.0000000 0.1063108
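    The conest function comes from the course file conestim.R, which is not reproduced in this key. For readers without that file, the sketch below shows one standard way to compute a single-degree-of-freedom contrast F in base R: the contrast sum of squares, (sum of w_i times the group means)^2 divided by the sum of w_i^2/n_i, is tested over the residual mean square from the one-way ANOVA. The function name contrast_F and the toy data are hypothetical, and this is not claimed to be the code inside conest.

```r
# Hypothetical helper: F test for one a priori contrast in a one-way design.
contrast_F <- function(group, y, w) {
  group  <- factor(group)
  means  <- tapply(y, group, mean)
  n      <- tapply(y, group, length)
  fit    <- aov(y ~ group)
  df_err <- df.residual(fit)
  mse    <- sum(residuals(fit)^2) / df_err      # residual mean square
  ss_con <- sum(w * means)^2 / sum(w^2 / n)     # contrast sum of squares (1 df)
  Fval   <- ss_con / mse
  c(F = Fval, df_num = 1, df_denom = df_err,
    p = pf(Fval, 1, df_err, lower.tail = FALSE))
}

# Toy data: three groups of two observations each
g <- rep(1:3, each = 2)
y <- c(1, 3, 4, 6, 8, 10)

res1 <- contrast_F(g, y, c(1, 0, -1))    # group 1 vs group 3
res2 <- contrast_F(g, y, c(-1, 2, -1))   # group 2 vs average of 1 and 3
print(res1)
print(res2)
```

    The returned vector has the same four entries as the conest output above (F-value, numerator df, denominator df, significance level).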

    Extra Credit

    7. (5 pts) What is a fixed and what is a random factor? How do you distinguish between them?

    Answer

    A fixed factor is one for which the experimenter controls the exact magnitude of the experimental treatments, while the magnitude of a random factor is a random variable. One can distinguish between a fixed and a random factor because a random factor represents a random sample of only a few levels from what may be a large or infinite number of possible levels, whereas a fixed factor exhausts the possible levels or at least represents the relevant range of variation in the magnitude of the factor levels.

  • 11

    8. (20pts) In an experiment to determine the effects of conventional and reduced tillage agriculture on crop yield for oats, 3 varieties of oats and two levels of fertilization (0.5 and 1 kg/acre) were examined using conventional and reduced tillage techniques. Twenty 20 x 60 m plots were each partitioned into six 10 x 20 m subplots, and subplots were assigned at random to receive a combination of oat variety and fertilizer treatment. Ten of these plots were subjected to conventional tillage practices and 10 were subjected to reduced tillage practices. Each subplot was harvested at season's end, and crop yield is expressed in bushels/acre. Examine the data provided in the data file oats.csv. Report hypothesis tests for all possible effects.

    Answer

    This experiment is a 3 factor ANOVA with tillage being a between subject factor, and both variety and fertilizer being within subjects factors. I first combined the columns of within subject observations into a matrix (mat). I then fit a linear model to mat with tillage as the between subjects factor (mod1). I created a separate .csv file (form2.csv) that contained the correspondences between the columns of the matrix (mat) and the combinations of the levels of the factors variety and fertilizer. Using the Anova function from the car package, and the dataframe into which I read the form2.csv file (formnew), I specified the within subject structure of the data and the experimental design. I then obtained the univariate tests for the effects. The highlighted results (GG epsilon adjusted tests) indicate that all main effects were statistically significant, as was the variety by fertilizer interaction. In the graph, the separation of the lines illustrates the tillage main effect, and the fact that they are not level illustrates the variety main effect. That the right of the two plots has higher values indicates the fertilizer main effect. The fertilizer by variety interaction is illustrated by the differences between varieties being greater for the 2nd (higher) level of fertilizer. To produce the graph, all the data needed to be placed in a dataframe with one observation per line (unlike the within subjects data file, which has all observations on a subject on a single line of data). I then used a function from the package "dae" to obtain the plot.
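    The within-subject design table read from form2.csv could also be built directly in R. The sketch below uses expand.grid, whose first argument varies fastest, so the rows come out in the same order as the matrix columns fert1v1, fert1v2, fert1v3, fert2v1, fert2v2, fert2v3. This is an alternative under the assumption that form2.csv holds exactly the six rows printed in the output for this problem; it is not how the original analysis was run.

```r
# Build the within-subject design table in code instead of reading a CSV.
# Row i describes column i of the response matrix.
formnew <- expand.grid(
  Variety    = c("VAR1", "VAR2", "VAR3"),   # varies fastest
  Fertilizer = c("Fert1", "Fert2")
)
print(formnew)
```

    Keeping the design table next to the code that builds the response matrix makes it easier to confirm that the row order and the column order agree.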

    # Problem 8
    qq=read.csv("G:/Biometry/exams/Exam2/oats.csv", header=TRUE)
    head(qq)

    ##   X tr  fert1v1  fert1v2  fert1v3  fert2v1  fert2v2  fert2v3
    ## 1 1  1 55.59555 62.28367 54.02856 52.30820 66.11916 47.28625
    ## 2 2  1 54.83042 66.45280 58.34965 51.20269 70.35817 59.59235
    ## 3 3  1 47.17417 53.11721 48.09960 42.20308 56.72854 49.01972
    ## 4 4  1 49.92158 59.02375 51.37804 47.13270 63.70916 51.65932
    ## 5 5  1 63.28770 69.37552 67.33111 60.75426 76.44359 64.48788
    ## 6 6  1 43.26808 51.84271 43.80456 43.46588 58.33783 41.24954

    # create data matrix for within subjects factors
    mat=cbind(qq$fert1v1,qq$fert1v2,qq$fert1v3,qq$fert2v1,qq$fert2v2,qq$fert2v3)
    #rename tr to tillage
    tillage=qq$tr
    # fit model with tillage as between subject factor
    mod1=lm(mat~factor(tillage))

  • 12

    formnew=read.csv('G:/Biometry/exams/Exam2/form2.csv',header=TRUE)
    head(formnew)

    ##   Variety Fertilizer
    ## 1    VAR1      Fert1
    ## 2    VAR2      Fert1
    ## 3    VAR3      Fert1
    ## 4    VAR1      Fert2
    ## 5    VAR2      Fert2
    ## 6    VAR3      Fert2

    library(car)
    mm1=Anova(mod1, idata=formnew, idesign=~Variety*Fertilizer, type="II")
    summary(mm1, multivariate =FALSE, univariate=TRUE)

    ## Warning in summary.Anova.mlm(mm1, multivariate = FALSE, univariate = TRUE):

    ## Univariate Type II Repeated-Measures ANOVA Assuming Sphericity
    ##
    ##                                        SS num Df Error SS den Df         F
    ## (Intercept)                        407378      1   3552.9     18 2063.9104
    ## factor(tillage)                      1138      1   3552.9     18    5.7652
    ## Variety                              2386      2    332.1     36  129.3171
    ## factor(tillage):Variety                20      2    332.1     36    1.1071
    ## Fertilizer                             28      1     41.4     18   11.9726
    ## factor(tillage):Fertilizer              2      1     41.4     18    0.7688
    ## Variety:Fertilizer                    295      2    104.8     36   50.7147
    ## factor(tillage):Variety:Fertilizer      1      2    104.8     36    0.0933
    ##                                       Pr(>F)
    ## (Intercept)                        < 2.2e-16 ***
    ## factor(tillage)                     0.027366 *
    ## Variety                            < 2.2e-16 ***
    ## factor(tillage):Variety             0.341492
    ## Fertilizer                          0.002793 **
    ## factor(tillage):Fertilizer          0.392147
    ## Variety:Fertilizer                 3.373e-11 ***
    ## factor(tillage):Variety:Fertilizer  0.911185
    ## ---
    ##
    ## Mauchly Tests for Sphericity
    ##
    ##                                    Test statistic p-value
    ## Variety                                   0.99613 0.96759
    ## factor(tillage):Variety                   0.99613 0.96759
    ## Variety:Fertilizer                        0.73028 0.06913
    ## factor(tillage):Variety:Fertilizer        0.73028 0.06913
    ##

  • 13

    ## Greenhouse-Geisser and Huynh-Feldt Corrections
    ##  for Departure from Sphericity
    ##
    ##                                     GG eps Pr(>F[GG])
    ## Variety                            0.99615  < 2.2e-16 ***
    ## factor(tillage):Variety            0.99615     0.3413
    ## Variety:Fertilizer                 0.78758  2.884e-09 ***
    ## factor(tillage):Variety:Fertilizer 0.78758     0.8676
    ## ---

    # put data into flat file for plotting
    yield=c(qq$fert1v1,qq$fert1v2,qq$fert1v3,qq$fert2v1,qq$fert2v2,qq$fert2v3)
    till=c(rep("conventional",10),rep("reduced",10))
    tillage=c(rep(till,6))
    fertilizer=c(rep("fert1",60),rep("fert2",60))
    varint=c(rep("var1",20),rep("var2",20),rep("var3",20))
    variety=(rep(varint,2))
    data8=data.frame(tillage,fertilizer,variety,yield)
    head(data8)

    ##        tillage fertilizer variety    yield
    ## 1 conventional      fert1    var1 55.59555
    ## 2 conventional      fert1    var1 54.83042
    ## 3 conventional      fert1    var1 47.17417
    ## 4 conventional      fert1    var1 49.92158
    ## 5 conventional      fert1    var1 63.28770
    ## 6 conventional      fert1    var1 43.26808
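    The flat file above is built with hard-coded repeat counts (10, 20, 60), which only work for exactly 20 plots in the order given. A sketch of the same wide-to-long step that derives the labels from the column names is shown below. The small data frame wide is made-up illustration data with the same column naming as oats.csv, not the real file.

```r
# Hypothetical wide data: one row per plot, one column per fertilizer x variety cell.
wide <- data.frame(
  tr      = c(1, 1, 2, 2),
  fert1v1 = c(55, 54, 47, 49),
  fert1v2 = c(62, 66, 53, 59),
  fert1v3 = c(54, 58, 48, 51),
  fert2v1 = c(52, 51, 42, 47),
  fert2v2 = c(66, 70, 56, 63),
  fert2v3 = c(47, 59, 49, 51)
)

# Stack the response columns and recover factor labels from the column names
cols <- grep("^fert", names(wide), value = TRUE)
long <- data.frame(
  tillage    = rep(wide$tr, times = length(cols)),
  fertilizer = rep(sub("v[0-9]+$", "", cols), each = nrow(wide)),
  variety    = rep(sub("^fert[0-9]+", "", cols), each = nrow(wide)),
  yield      = unlist(wide[cols], use.names = FALSE)
)
print(head(long))
```

    Because the labels come from the column names, the same code keeps working if plots are added or removed.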

    # plot graph
    library(dae)

    ## Warning: package 'dae' was built under R version 3.4.4

    interaction.ABC.plot(yield,variety,tillage,fertilizer,data=data8,title="Yield by fertilizer, variety, and tillage", fun="mean")

  • 14

    9. (5pts) What test can be performed in lieu of the omnibus ANOVA F-test?

    Answer

    In lieu of the omnibus ANOVA F-test, one can use a priori contrasts.