Visualizationrose/590B/PDF/Presentation1.pdf · auto.key=T) Title: Microsoft PowerPoint -...

Page 1: Visualizationrose/590B/PDF/Presentation1.pdf · auto.key=T) Title: Microsoft PowerPoint - Presentation1.pptx Author: rose Created Date: 2/5/2013 9:25:54 AM ...
Page 2:

Visualization

• Exam data from CSCE 350

• Analysis?

• Mean & SD?

Page 3:

Anscombe’s Quartet
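Anscombe's quartet ships with R as the built-in `anscombe` data frame (columns x1 to x4 and y1 to y4). A minimal sketch of why mean and SD alone can mislead: the four x/y pairs have nearly identical summary statistics, yet look very different once plotted. The `stats` table and the 2x2 panel layout are illustrative choices, not part of the slides.

```r
# Built-in dataset: four x/y pairs with near-identical summary statistics
data(anscombe)

xs <- anscombe[, paste0("x", 1:4)]
ys <- anscombe[, paste0("y", 1:4)]

stats <- data.frame(
  mean_x = sapply(xs, mean),
  mean_y = sapply(ys, mean),
  sd_x   = sapply(xs, sd),
  sd_y   = sapply(ys, sd),
  cor_xy = mapply(cor, xs, ys),
  row.names = paste0("set", 1:4)
)
print(round(stats, 2))

# Plot the four pairs side by side to see how different they really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(xs[[i]], ys[[i]], xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(ys[[i]] ~ xs[[i]]), col = "red")
}
par(op)
```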

Page 4:

• Start with visualization

• Model selection

• Example: import education.csv
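The slides do not show the contents of education.csv, so the following sketch writes a small made-up file to a temporary location and reads it back with `read.csv`. The column names `hs`, `bs`, and `adv` are assumptions (the slides only mention "high school", bs, and adv), and every value is synthetic.

```r
# Hypothetical stand-in for education.csv: the real file is not shown in the
# slides, so the column names (hs, bs, adv) and all values are made up.
set.seed(1)
fake <- data.frame(
  hs  = round(runif(50, 80, 92), 1),
  adv = round(runif(50, 5, 20), 1)
)
fake$bs <- round(8 + 0.9 * fake$adv + rnorm(50), 1)

path <- file.path(tempdir(), "education.csv")
write.csv(fake, path, row.names = FALSE)

# The import step itself
education <- read.csv(path)

str(education)       # column types and a few values
colMeans(education)  # per-column means
```

With the real file, only the `read.csv` line would be needed, pointed at wherever education.csv is stored.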

Page 5:

• Just for chuckles: calculate mean

• How to visualize?

• Suggestions?

Page 6:

• Plot individual columns?

• Plot everything?

– plot(education)

Page 7:

• No obvious correlation between high school and other dimensions

• What about bs and adv?

• How can we investigate?

Page 8:

• plot(education$bs ~ education$adv)

• Model with linear regression

• How?

Page 9:

Linear model

• Check out lm: ?lm

• lm(education$bs ~ education$adv)

• Visualize: use abline; check out ?abline

• abline(lm(education$bs ~ education$adv), col="red")
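The linear-model steps on this slide can be sketched end to end as follows. Since the real education data is not available here, the example uses a synthetic stand-in (made-up values with a roughly linear relation between `adv` and `bs`); the name `fit` is illustrative.

```r
# Synthetic stand-in for the education data (values are made up)
set.seed(1)
adv <- runif(50, 5, 20)
education <- data.frame(adv = adv, bs = 8 + 0.9 * adv + rnorm(50))

# Using data= keeps the formula short and the coefficient names readable
fit <- lm(bs ~ adv, data = education)
coef(fit)  # intercept and slope

plot(bs ~ adv, data = education)
abline(fit, col = "red")  # abline() reads intercept and slope from the fit
```

Storing the fit in a variable avoids refitting the model when it is needed again, for example in `summary(fit)`.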

Page 10:

• Is it linear?

• Try non‐linear model: ?lowess

• lowess(education$bs ~ education$adv)

• Now plot with different color

Page 11:

• Try lines: ?lines

• lines(lowess(education$bs ~ education$adv), col="blue")
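A minimal sketch of overlaying both fits on one scatter plot, again on a synthetic stand-in for the education data (made-up values). `lowess()` returns a list of sorted x values and smoothed y values, which is why it can be passed straight to `lines()`. The legend is an addition for readability, not something the slides call for.

```r
# Synthetic stand-in for the education data (values are made up)
set.seed(1)
adv <- runif(50, 5, 20)
education <- data.frame(adv = adv, bs = 8 + 0.9 * adv + rnorm(50))

plot(bs ~ adv, data = education)
abline(lm(bs ~ adv, data = education), col = "red")

# lowess() returns a list with sorted x values and smoothed y values
smooth <- lowess(education$adv, education$bs)
str(smooth)
lines(smooth, col = "blue")

legend("topleft", legend = c("lm", "lowess"),
       col = c("red", "blue"), lty = 1)
```

If the blue curve tracks the red line closely, the linear model is probably adequate for this pair of variables.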

Page 12:

Drug trials

• Null hypothesis: the 4 drugs have similar outcomes

• Create data:

– trials = sample(c("drug1", "drug2", "drug3", "drug4"), size=1000, replace=T)

• What does this command do?

• Look at the result
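One way to look at the result: `sample()` with `replace = TRUE` draws 1000 labels independently and with equal probability, so each drug should appear roughly 250 times. The seed below is an assumption, added only so the example is repeatable.

```r
set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)

head(trials)   # first few draws
table(trials)  # counts per drug; roughly 250 each, 1000 in total
```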

Page 13:

• outcome = ifelse(trials=="drug1", rlnorm(1000,meanlog=log(35)), ifelse(trials=="drug2", rlnorm(1000,meanlog=log(50)), ifelse(trials=="drug4", rlnorm(1000,meanlog=log(60)), rlnorm(1000,meanlog=log(80)))))

• What does this command do?
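The nested `ifelse` is vectorized: each `rlnorm(1000, ...)` call draws a full vector of 1000 log-normal values, and `ifelse` picks, element by element, the value from the branch matching that trial's drug. Because `meanlog = log(m)` with the default `sdlog = 1`, `m` is the median of each distribution, not its mean. The sketch below is a compact equivalent under those assumptions (it will not reproduce the same random draws); the lookup vector `medians` and the seed are illustrative.

```r
set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)

# Medians implied by the nested ifelse on the slide
# (drug3 falls through to the final "else" branch, log(80))
medians <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)

# rlnorm() is vectorized over meanlog, so one call is enough
outcome <- rlnorm(length(trials), meanlog = unname(log(medians[trials])))

# Per-drug sample medians, to compare against 35, 50, 80, 60
tapply(outcome, trials, median)
```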

Page 14:

• Create data frame:

– dt = data.frame(trial=trials, results=outcome)

• Examine the data frame

• How?

• Could print it
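Printing 1000 rows is rarely helpful, so a few lighter-weight ways to examine the data frame are sketched here. The data is rebuilt with a compact equivalent of the slides' setup (assumed seed, lookup vector `medians` is illustrative), so the block runs on its own.

```r
set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
trials  <- sample(c("drug1", "drug2", "drug3", "drug4"),
                  size = 1000, replace = TRUE)
medians <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
outcome <- rlnorm(1000, meanlog = unname(log(medians[trials])))

dt <- data.frame(trial = trials, results = outcome)

dim(dt)   # number of rows and columns
str(dt)   # column types and a preview of values
head(dt)  # first six rows
```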

Page 15:

• Use summary command

– summary(dt)

– What does the summary tell us?

– So which drug is better?

Page 16:

• Find the mean response of each drug

• How?

• Look up the aggregate function: ?aggregate

Page 17:

• Find the mean response of each drug

• aggregate(x=dt$results, by=list(dt$trial), FUN="mean")

• Now plot result using boxplot

– boxplot(results ~ as.factor(trial), data = dt)

• What does “data=dt” do?

• Scaling not so good. What to do?
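The per-drug means can also be computed with the formula interface of `aggregate`, or with `tapply`; both are sketched here on data rebuilt with a compact equivalent of the slides' setup (assumed seed, illustrative `medians` lookup). The per-drug medians are included because log-normal data has a long right tail, which is also why the boxplot scaling looks poor on a linear axis.

```r
set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
trials  <- sample(c("drug1", "drug2", "drug3", "drug4"),
                  size = 1000, replace = TRUE)
medians <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
dt <- data.frame(trial   = trials,
                 results = rlnorm(1000, meanlog = unname(log(medians[trials]))))

# Formula interface: the result columns keep their names (trial, results)
agg <- aggregate(results ~ trial, data = dt, FUN = mean)
agg

# tapply() gives the same numbers as a named vector
means <- tapply(dt$results, dt$trial, mean)
means

# Medians are less sensitive to the long right tail
tapply(dt$results, dt$trial, median)
```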

Page 18:

• Try log plot

– boxplot(results ~ as.factor(trial), data = dt, log="y")

– Better!

Page 19:

• Use lm again to create model

– model = lm(log10(results) ~ as.factor(trial), data=dt)

– What does this command do?

• log10()?

• as.factor()?
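A small sketch of what the two helper functions do, using toy inputs rather than the trial data. `log10()` turns multiplicative differences into additive ones, and `as.factor()` marks a variable as categorical so that `lm()` expands it into 0/1 indicator columns, with the first level as the baseline.

```r
# log10() compresses multiplicative differences into additive ones
log10(c(10, 100, 1000))   # 1 2 3

# as.factor() turns labels into a categorical variable with levels
f <- as.factor(c("drug2", "drug1", "drug2", "drug3"))
levels(f)                 # "drug1" "drug2" "drug3"

# lm() expands a factor into 0/1 indicator columns; the first level
# (drug1 here) is the baseline absorbed into the intercept
model.matrix(~ f)
```

In the model on this slide, each coefficient other than the intercept is therefore the difference in mean log10(results) between one drug and the baseline drug.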

Page 20:

• Examine the resulting model

– summary(model)

– How do we interpret the summary?

– Look at the p‐value on the F‐stat.

– Should we reject or fail to reject the null hypothesis?
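The p-value on the F-statistic is printed by `summary(model)` but not stored directly; it can be recomputed from the `fstatistic` component with `pf()`. The sketch rebuilds the data with a compact equivalent of the slides' setup (assumed seed, illustrative `medians` lookup), and the 0.05 threshold is a conventional choice, not one stated in the slides.

```r
set.seed(42)  # assumption: a fixed seed, only so the example is repeatable
trials  <- sample(c("drug1", "drug2", "drug3", "drug4"),
                  size = 1000, replace = TRUE)
medians <- c(drug1 = 35, drug2 = 50, drug3 = 80, drug4 = 60)
dt <- data.frame(trial   = trials,
                 results = rlnorm(1000, meanlog = unname(log(medians[trials]))))

model <- lm(log10(results) ~ as.factor(trial), data = dt)
s <- summary(model)

# fstatistic is a named vector: value, numdf, dendf
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_value

# A small p-value is evidence against the null hypothesis that all four
# drugs share the same mean log10 outcome
p_value < 0.05
```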

Page 21:

• Let's try some more visualization

– library(lattice)

– densityplot(~ results, groups=trial, data=dt, auto.key=T)

– densityplot(~ log10(results), groups=trial, data=dt, auto.key=T)