Graphical Display 1 Pictures of Data. Edward Tufte Visual Display of Quantitative Information Using...
Date posted: 19-Dec-2015
Graphical Display 1
Pictures of Data
Edward Tufte
• Visual Display of Quantitative Information
• Using graphics for
– Data
– Concepts
– Instruction
– Finding patterns and answering questions
Simple Plots
• One variable or one variable plus a grouping variable
• Reveals shape of the distribution
• Distribution of cases over a categorical variable
• Evidence that a metric variable follows a normal distribution
Tufte’s Rules
• Above all else show the data
• Maximize the data-ink ratio
• Erase non-data-ink
• Erase redundant data-ink
• Revise and edit
Pie Charts
• One variable, nominal scale, percentages
• A few categories (3-6)
• Exploded for emphasis
• Best in groups (2-4)
• Best replaced by other graphic displays
Rcmdr pie diagram of language family in the CA Indians data set
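A minimal base R sketch of a pie chart for one nominal variable with percentage labels. The factor below is made up for illustration (it is not the CAIndians data), and the category names are placeholders.

```r
# Hypothetical nominal variable with four categories; NOT the CAIndians data
family <- factor(c("A", "B", "A", "C", "B", "A", "D", "B"))

counts <- table(family)                     # number of cases per category
pct <- round(100 * prop.table(counts), 1)   # percentages for the slice labels

# One slice per category, labelled with its name and percentage
pie(counts,
    labels = paste0(names(counts), " (", pct, "%)"),
    main = "Language family (hypothetical data)")
```

With only four categories the slices stay readable, which is the range the slide recommends.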
3D Pies
• Strongly discouraged
• plotrix has them:
– install.packages("plotrix")
– library(plotrix)
– pie3D(table(CAIndians$Language), radius=.85, labels = levels(CAIndians$Language), explode=.1)
Bar Charts
• One variable, nominal scale, counts or percentages
• More categories than pie charts (4-15)
• Color or shaded
• Can be stacked, set side-by-side or rotated
• Can compare 2-3 groups
barplot
• Rcmdr uses this command:
– barplot(table(CAIndians$Language), xlab="Language", ylab="Frequency")
• Just insert the following before the closing (right) parenthesis to get it in color (rainbow_hcl() is in the colorspace package):
– , col=rainbow_hcl(6)
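A small sketch of a colored bar chart that runs without Rcmdr or colorspace. It assumes made-up counts (not the CAIndians data) and swaps in base R's hcl.colors() (R 3.6 or later) as a stand-in for rainbow_hcl().

```r
# Hypothetical category counts; NOT the CAIndians data
counts <- c(A = 12, B = 7, C = 5, D = 3)

# One color per bar; hcl.colors() is a base R stand-in for rainbow_hcl()
cols <- hcl.colors(length(counts), palette = "Dark 3")

# The col= argument goes inside the call, before the closing parenthesis
mids <- barplot(counts, xlab = "Language", ylab = "Frequency", col = cols)
```

barplot() returns the bar midpoints (stored here in mids), which is handy if text or points are added on top of the bars later.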
Barplots with 2 groups
• Side by side or stacked bar plots make it possible to compare two categories at a time
• Load MissFauna and extract the Lilbourn samples:
– Lilbourn <- as.matrix(MissFauna[5:7,])
– LilbournPct <- prop.table(Lilbourn, 1)*100
Lilbourn Barplots
• par(mfrow=c(2,2))
• barplot(LilbournPct)
• barplot(t(LilbournPct))
• barplot(LilbournPct, beside=TRUE, legend=TRUE)
• barplot(t(LilbournPct), beside=TRUE, legend=TRUE)
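The row-percentage step and the effect of transposing can be sketched with a small made-up matrix. The sample and taxon names below are hypothetical, not the MissFauna data.

```r
# Hypothetical counts: 3 samples (rows) by 4 taxa (columns); NOT MissFauna
fauna <- matrix(c(10, 30, 40, 20,
                   5, 15, 20, 10,
                  25, 25, 25, 25),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("S1", "S2", "S3"),
                                c("Deer", "Fish", "Bird", "Turtle")))

# Margin 1 = rows, so each sample's percentages sum to 100
faunaPct <- prop.table(fauna, 1) * 100
rowSums(faunaPct)

# barplot() draws one bar (or one cluster) per matrix COLUMN,
# so transposing gives one bar per sample
par(mfrow = c(1, 2))
barplot(t(faunaPct), legend.text = TRUE, main = "Stacked")
barplot(t(faunaPct), beside = TRUE, legend.text = TRUE, main = "Side by side")
```

Without t(), the bars are taxa and the stacked segments are samples, so stacked bar heights no longer sum to 100.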
Dot Chart
• Dot charts plot the amounts in each group along a common scale so they are more easily comparable:
– dotchart(as.vector(table(CAIndians$Language)), pch=16, labels = levels(CAIndians$Language), xlab = "Frequency")
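A self-contained sketch of the same idea with made-up counts (not the CAIndians data). Sorting the counts first is an optional extra, not part of the command above; it puts the groups in rank order along the common scale.

```r
# Hypothetical category counts; NOT the CAIndians data
counts <- c(A = 12, B = 7, C = 5, D = 3)

# Optional: sort so the dots read from smallest (bottom) to largest (top)
ord <- sort(counts)

dotchart(as.vector(ord), labels = names(ord), pch = 16, xlab = "Frequency")
```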
Stem and Leaf Plot
• Tukey proposed it as a way of looking at the distribution of a numeric variable
• With a small sample, can preserve the original data while showing its shape
> stem.leaf(DartPoints$Length, unit=1, na.rm=TRUE)
1 | 2: represents 12
 leaf unit: 1
            n: 55
    7    3* | 1223334
   13    3. | 556788
   25    4* | 011222333344
  (11)   4. | 55777888999
   19    5* | 224
   16    5. | 55679
   11    6* | 01144
    6    6. | 567
    3    7* | 4
    2    7. | 8
    1    8* | 4
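stem.leaf() comes from the aplpack package. If that package is not available, base R's stem() gives a plainer display without the depth column. A minimal sketch, assuming a made-up vector of lengths (not the DartPoints data):

```r
# Hypothetical lengths; NOT the DartPoints data
len <- c(31, 32, 35, 41, 42, 44, 47, 53, 58, 62)

# Base R stem-and-leaf display, printed to the console
stem(len)

# scale = 2 roughly doubles the number of stems, similar in spirit to the
# split "3*" / "3." stems in the output above
stem(len, scale = 2)
```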
Histograms
• Like a bar plot but the x-axis is a continuous measurement.
• Shape of data distribution is shown, but the number of bars (bin width) can change the apparent shape.
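To see how the number of bars changes the picture, the same data can be binned two ways. This is a sketch with simulated values (not the DartPoints data). Note that breaks= is only a suggestion, which R rounds to "pretty" cut points.

```r
set.seed(1)
x <- rnorm(60, mean = 50, sd = 10)   # simulated measurements, for illustration only

# Compute the bins without plotting so the counts can be inspected
h_few  <- hist(x, breaks = 4,  plot = FALSE)
h_many <- hist(x, breaks = 15, plot = FALSE)

# Same data, two different impressions of its shape
par(mfrow = c(1, 2))
plot(h_few,  main = "About 4 breaks",  xlab = "x")
plot(h_many, main = "About 15 breaks", xlab = "x")
```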
Boxplot
• Also called box-and-whisker plots; they show the quartiles and outlying points
• Multiple boxplots let you compare groups
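A minimal sketch of side-by-side boxplots, assuming made-up lengths in two hypothetical groups "A" and "B" with one planted outlier. boxplot() also returns the numbers it draws, which makes the quartiles and outliers easy to check.

```r
# Hypothetical lengths in two groups; 110 is a planted outlier
len <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 110)
grp <- factor(rep(c("A", "B"), each = 5))

# One box per group; the formula reads "len split by grp"
b <- boxplot(len ~ grp, xlab = "Group", ylab = "Length")

b$stats   # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points drawn individually as outliers
b$group   # which group each outlier belongs to
```

By default a point is flagged when it lies more than 1.5 times the box height beyond a hinge.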
Stripchart
• A stripchart plots the actual values along the y-axis
• It hides less information, but is not as familiar to many people
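A short sketch with made-up lengths in two hypothetical groups. In base R, stripchart() draws horizontally by default, so vertical = TRUE is needed to put the values on the y-axis. Jittering is an optional choice here, so that tied values do not overprint.

```r
set.seed(1)   # makes the random jitter reproducible

# Hypothetical lengths in two groups; note the tied value 65 in group A
len <- c(53, 59, 62, 65, 65, 67, 69, 71, 72, 77)
grp <- factor(rep(c("A", "B"), each = 5))

# Every observation is drawn as its own point
stripchart(len ~ grp, vertical = TRUE, method = "jitter",
           pch = 16, xlab = "Group", ylab = "Length")
```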
Kernel Density Plot
• Uses the data to approximate a smooth distribution; the result varies according to the bandwidth
• A normal (or other distribution) is placed on each point and then the distributions are summed
• plot(density(DartPoints$Length))
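The "place a normal on each point and sum" description can be checked numerically. This sketch uses the ten values from the Example that follows and assumes a Gaussian kernel with a fixed bandwidth of 3 (an illustrative choice, not density()'s default); the estimate at each grid point is the average of the ten normal curves.

```r
x  <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)
bw <- 3                          # assumed bandwidth = sd of each normal curve
xi <- seq(45, 90, by = 0.5)      # grid on which to evaluate the estimate

# By hand: average of one normal density per data point
manual <- sapply(xi, function(g) mean(dnorm(g, mean = x, sd = bw)))

# Same bandwidth and grid using density()
auto <- density(x, bw = bw, kernel = "gaussian",
                from = 45, to = 90, n = length(xi))

plot(auto, main = "Kernel density, bw = 3")
lines(xi, manual, lty = 2, col = "red")   # should lie on top of the solid line

# density() uses a fast binned approximation, so expect a tiny difference
max(abs(auto$y - manual))
```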
Example
• x <- c(53, 59, 62, 63, 65, 67, 69, 71, 72, 77)
• hist(x, col="blue", las=1, cex.axis=1.5, cex.lab=1.5)
plot(density(x), main="Kernel Density Plot")
rug(x)
xi <- seq(45, 90, .5)
for (i in 1:10) lines(xi, dnorm(xi, mean=x[i], sd=3)/10)
d <- dnorm(x[1], mean=x[1], sd=3)/10
matlines(rbind(x, x), rbind(rep(0,10), rep(d, 10)), lty=3, lwd=2, col="dark gray")

# Density plots
# Load DartPoints.RData
par(mfrow=c(2, 2))
# Vary bandwidth
plot(density(DartPoints$Length), cex.main=.8)
plot(density(DartPoints$Length, bw=2), cex.main=.8)
plot(density(DartPoints$Length, bw=6), cex.main=.8)
plot(density(DartPoints$Length, bw=8), cex.main=.8)

# To compare two distributions
par(mfrow=c(1,1))
a <- density(DartPoints$Length[DartPoints$Name=="Darl"])
b <- density(DartPoints$Length[DartPoints$Name=="Pedernales"])
plot(a, main="Dart Points", xlab="Length", ylab="Density", xlim=c(min(a$x, b$x), max(a$x, b$x)), ylim=c(0, max(a$y, b$y)), col="red")
lines(b, col="blue")
legend("topright", levels(DartPoints$Name), lty=1, col=c("red", "blue"))
Violin Plot
• Combine box plot and kernel density plot (package vioplot)
• with(DartPoints, vioplot(Length[Name=="Darl"], Length[Name=="Pedernales"], names=levels(Name)))
Beanplot
• Add strip plot to violin plot (package beanplot)
• with(DartPoints, beanplot(Length[Name=="Darl"], Length[Name=="Pedernales"], names=levels(Name)))