Visualization
• Exam data from CSCE 350
• Analysis?
• Mean & SD?
Anscombe’s Quartet
• Start by visualization
• Model selection
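A minimal sketch of the Anscombe point, using the anscombe data frame that ships with base R. The four y columns have almost the same mean and standard deviation, and roughly the same fitted line, yet the scatter plots look nothing alike. The variable names y_means and y_sds are illustrative only.

```r
# The anscombe data frame is part of base R (package 'datasets').
data(anscombe)

# Summary statistics of the four y columns are nearly identical.
y_cols  <- anscombe[, c("y1", "y2", "y3", "y4")]
y_means <- sapply(y_cols, mean)
y_sds   <- sapply(y_cols, sd)
print(round(y_means, 2))
print(round(y_sds, 2))

# The plots tell four different stories.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i))
  abline(lm(y ~ x), col = "red")   # least-squares line for this set
}
par(op)
```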
• Example: import education.csv
• Just for chuckles: calculate mean
• How to visualize?
• Suggestions?
• Plot individual columns?
• Plot everything?
– plot(education)
• No obvious correlation between high school and other dimensions
• What about bs and adv?
• How can we investigate?
• plot(education$bs ~ education$adv)
• Model with linear regression
• How?
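The exploration steps above can be sketched as follows. The contents of education.csv are not shown in these slides, so the sketch builds a synthetic stand-in: the column names bs and adv come from the commands above, while hs and all the numbers are assumptions made up for illustration.

```r
# Synthetic stand-in for education.csv (values are invented).
# With the real file this would be: education <- read.csv("education.csv")
set.seed(1)
n   <- 50
adv <- runif(n, 4, 16)
education <- data.frame(
  hs  = runif(n, 75, 92),                     # hypothetical column name
  bs  = 8 + 1.2 * adv + rnorm(n, sd = 1.5),   # made to track adv
  adv = adv
)

# Column means: easy to compute, not very informative on their own.
col_means <- colMeans(education)
print(col_means)

# plot() on a data frame draws a scatter-plot matrix of every pair of columns.
plot(education)

# Zoom in on the one pair that looks related.
plot(education$bs ~ education$adv)
r <- cor(education$bs, education$adv)
print(r)
```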
Linear model
• Check out lm: ?lm
• lm(education$bs ~ education$adv)
• Visualize: use abline check out: ?abline
• abline(lm(education$bs ~ education$adv), col="red")
• Is it linear?
• Try non‐linear model: ?lowess
• lowess(education$bs ~ education$adv)
• Now plot with different color
• Try lines: ?lines
• lines(lowess(education$bs ~ education$adv), col="blue")
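Put together, the linear fit and the lowess smoother can be overlaid on one scatter plot. This is a sketch on synthetic data (the real education.csv is not available here), with a slight curvature built in so the two lines differ. Note that lowess returns smoothed x and y coordinates rather than a model object, which is why it is drawn with lines instead of abline.

```r
# Synthetic data with mild curvature (values are invented for illustration).
set.seed(2)
adv <- runif(60, 4, 16)
bs  <- 10 + 0.08 * adv^2 + rnorm(60, sd = 1)
education <- data.frame(bs = bs, adv = adv)

# Linear model: the data= argument avoids repeating education$ everywhere.
fit <- lm(bs ~ adv, data = education)
print(coef(fit))

# Scatter plot, straight-line fit in red, lowess smoother in blue.
plot(bs ~ adv, data = education)
abline(fit, col = "red")
smooth <- lowess(education$adv, education$bs)   # list with sorted x and smoothed y
lines(smooth, col = "blue")
```

If the blue curve bends away from the red line, a straight line is probably not the right model.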
Drug trials
• Null hypothesis: 4 drugs have similar outcome
• Create data:
– trials = sample(c("drug1", "drug2", "drug3", "drug4"), size=1000, replace=T)
– What does this command do?
– Look at result
• outcome = ifelse(trials=="drug1", rlnorm(1000,meanlog=log(35)), ifelse(trials=="drug2", rlnorm(1000,meanlog=log(50)), ifelse(trials=="drug4", rlnorm(1000,meanlog=log(60)), rlnorm(1000,meanlog=log(80)))))
• What does this command do?
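One way to answer both questions is to run the commands and inspect the pieces. sample draws 1000 labels with replacement, so each drug appears about 250 times. Each rlnorm call generates a full vector of 1000 log-normal values, and the nested ifelse picks, position by position, the value from the vector that matches the label. The lookup-vector version at the end is an equivalent alternative, not part of the original slides; the names centers and outcome2 are illustrative.

```r
set.seed(42)   # fixed seed so the run is repeatable

# 1000 random drug labels, drawn with replacement.
trials <- sample(c("drug1", "drug2", "drug3", "drug4"), size = 1000, replace = TRUE)
counts <- table(trials)
print(counts)

# Log-normal outcomes whose median depends on the drug:
# drug1 -> 35, drug2 -> 50, drug4 -> 60, everything else (drug3) -> 80.
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))

# Group medians should land near 35, 50, 80 and 60.
med <- tapply(outcome, trials, median)
print(round(med, 1))

# Alternative: look up each label's median in a named vector.
centers  <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
outcome2 <- rlnorm(1000, meanlog = log(unname(centers[trials])))
```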
• Create data frame:
– dt = data.frame(trial=trials, results=outcome)
• Examine the data frame
• How?
• Could print it
• Use summary command
– summary(dt)
– What does the summary tell us?
– So which drug is better?
• Find the mean response of each drug
• How?
• Lookup aggregate function: ?aggregate
• aggregate(x=dt$results, by=list(dt$trial), FUN="mean")
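For comparison, here are three ways to get the per-drug mean: the aggregate call above, the formula interface of aggregate, and tapply. The sketch rebuilds a dt with the same structure from a fixed seed; the lookup vector centers is an illustrative shortcut, not the construction used in the slides.

```r
set.seed(42)
centers <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)   # medians per drug
trials  <- sample(names(centers), size = 1000, replace = TRUE)
dt <- data.frame(trial   = trials,
                 results = rlnorm(1000, meanlog = log(unname(centers[trials]))))

# 1. The call from the slide: columns come back named Group.1 and x.
by_list <- aggregate(x = dt$results, by = list(dt$trial), FUN = "mean")

# 2. Formula interface: keeps the original column names.
by_formula <- aggregate(results ~ trial, data = dt, FUN = mean)

# 3. tapply: returns a named vector instead of a data frame.
by_tapply <- tapply(dt$results, dt$trial, mean)

print(by_list)
print(by_formula)
print(by_tapply)
```

The means come out well above 35, 50, 60 and 80 because meanlog sets the median of a log-normal distribution, and the long right tail pulls the mean higher.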
• Now plot result using boxplot
– boxplot(results ~ as.factor(trial), data = dt)
• What does “data=dt” do?
• Scaling not so good. What to do?
• Try log plot
– boxplot(results ~ as.factor(trial), data = dt, log="y")
– Better!
• Use lm again to create model
– model = lm(log10(results) ~ as.factor(trial), data=dt)
– What does this command do?
  • log10()?
  • as.factor()?
• Examine the resulting model
– summary(model)
– How do we interpret the summary?
– Look at the p‐value on the F‐stat.
– Should we reject, or fail to reject, the null hypothesis?
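A sketch of how the F-statistic and its p-value can be pulled out of the summary, assuming a dt built like the one above (rebuilt here from a fixed seed so the block runs on its own; centers is an illustrative lookup vector). summary(model) prints the p-value but does not store it, so it is recomputed with pf from the stored F-statistic and its degrees of freedom.

```r
set.seed(42)
centers <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
trials  <- sample(names(centers), size = 1000, replace = TRUE)
dt <- data.frame(trial   = trials,
                 results = rlnorm(1000, meanlog = log(unname(centers[trials]))))

# log10() makes the skewed outcomes roughly symmetric;
# as.factor() tells lm to treat the drug label as a category.
model <- lm(log10(results) ~ as.factor(trial), data = dt)
s <- summary(model)

# Intercept = mean log10 outcome for drug1 (the baseline level);
# the other coefficients are differences from that baseline.
print(coef(s))

# F-statistic with its numerator and denominator degrees of freedom.
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)

# The same test shown as a one-way ANOVA table.
print(anova(model))
```

A very small p-value here is evidence against the null hypothesis that the four drugs have similar outcomes.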
• Let's try some more visualization
– library(lattice)
– densityplot(~ results, groups=trial, data=dt, auto.key=T)
– densityplot(~ log10(results), groups=trial, data=dt, auto.key=T)
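If lattice is not available, a similar per-drug density picture can be approximated with base graphics. This is an alternative sketch, not the lattice method used above; dt is rebuilt from a fixed seed so the block is self-contained, and centers and dens are illustrative names.

```r
set.seed(42)
centers <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
trials  <- sample(names(centers), size = 1000, replace = TRUE)
dt <- data.frame(trial   = trials,
                 results = rlnorm(1000, meanlog = log(unname(centers[trials]))))

# One kernel density estimate per drug, on the log10 scale.
dens <- lapply(split(log10(dt$results), dt$trial), density)

# Common axis limits so that all four curves fit in one plot.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

# Empty plot first, then one coloured curve per drug.
plot(xlim, ylim, type = "n", xlab = "log10(results)", ylab = "Density")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = i)
}
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
```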