Graphics in EG and R
HRP223 – 2009
November 16th, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.


*
Robbins
*
Why Do Data Visualization?
Well-designed pictures show you both the details and the overall pattern in your data.
Numeric descriptions can easily hide important patterns.
Some patterns are hard to detect in tables.
*
*
Based on code written by Robert Allison at SAS Institute
Year 1
Year 2
*
*
Bad Things
First, I want to talk about bad graphics that I frequently see.
3d
Pie
Donuts
Don’t, Don’t, Don’t
*
Tufte is a God to many.
The empiricist in me is very nervous about the amount of pontificating in his books…
I want to have evidence-based advice.
His best advice is to put no extra ink on the page.
Think about the ink-to-information ratio.
Remove all chart junk.
*
You can improve a graphic by removing ink rather than adding it.
*
*
Based on Soukup & Davidson, 2002 Visual Data Mining
*
Cleveland
If you want to know how to do scientific visualization, you must read William Cleveland’s work.
He attempted to quantify what makes a good graphic good.
*
People are bad at judging the relative magnitude of angles.
If you twist the rotation of the pie, you can cause people to systematically misjudge the size of the angles.
A third dimension makes judgment worse.
If you get a glossy handout with a 3D pie, assume someone is lying to you.
Don’t use them.
Don’t Explode!
*
Forbidden Donut…
*
Stacking is Bad
*
*
Don’t make your audience think unnecessarily!
Minimize the amount of ink on the page.
This needs to be studied.
Show the central tendency and the variability.
Plot the quantity (inference) that you want people to notice.
Be sure colorblind people can understand it.
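Here is a minimal R sketch that shows both at once. It uses R's built-in PlantGrowth data as a stand-in, not data from these slides. Every observation is plotted with jitter, and a short bar marks each group mean.

```r
# Show every observation (variability) plus each group's mean (central
# tendency). PlantGrowth is a built-in R data set used only as a stand-in.
stripchart(weight ~ group, data = PlantGrowth, vertical = TRUE,
           method = "jitter", pch = 1, las = 1,
           xlab = "Treatment group", ylab = "Dried plant weight")

group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
x_pos <- seq_along(group_means)   # stripchart places the groups at x = 1, 2, 3

# Short horizontal bar at each group mean
segments(x_pos - 0.25, group_means, x_pos + 0.25, group_means, lwd = 3)
```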
*
Avoid Thinking
Put labels on the graphic directly instead of using a key.
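Here is a rough base R sketch of direct labeling. It uses the built-in Orange data set as an illustrative stand-in. Each line gets its name written at its right-hand end, so no legend box is needed. The extra 10% on the x-axis is an arbitrary choice that leaves room for the labels.

```r
# Direct labeling: write each series' name next to its line instead of
# using a legend. Orange (5 trees, age vs. trunk circumference) is a stand-in.
trees <- split(Orange, Orange$Tree)

plot(circumference ~ age, data = Orange, type = "n", bty = "l", las = 1,
     xlim = c(0, max(Orange$age) * 1.1),   # room on the right for labels
     xlab = "Age (days)", ylab = "Trunk circumference (mm)")

for (tr in trees) {
  lines(tr$age, tr$circumference)
  last <- which.max(tr$age)                # final measurement of this tree
  text(tr$age[last], tr$circumference[last],
       labels = paste("Tree", tr$Tree[last]), pos = 4, cex = 0.8)
}

# Where each label sits vertically: the final circumference of each tree
end_values <- sapply(trees, function(tr) tr$circumference[which.max(tr$age)])
```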
*
Bivariate Comparisons with Lines
People are extremely bad at judging the distance between two curves. Never ask people to judge up and down (vertical) distances between curves.
Based on: Robbins Creating More Effective Graphs, 2005
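One way to follow this advice is to compute the difference yourself and plot that single curve. This R sketch uses R's built-in monthly UK lung disease death series for men (mdeaths) and women (fdeaths) as example data. It is not the example from Robbins.

```r
# Plot the gap itself instead of asking readers to judge the vertical
# distance between two curves. mdeaths/fdeaths are built-in monthly series.
gap <- mdeaths - fdeaths   # ts arithmetic keeps the monthly time index

plot(gap, las = 1, bty = "l", ylim = range(0, gap),
     xlab = "Year", ylab = "Male minus female deaths per month")
abline(h = 0, lty = 2)     # reference line: no difference
```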
*
*
A one dimensional quantity is being “expanded” into two dimensions.
*
SAS Bar Charts
SAS makes the reader do extra work by rotating the axis labels in ActiveX images.
They pointlessly include variable labels by default.
*
Notice you can Edit the data and apply filters.
*
First create the format.
*
The GUI is Solid
*
Saving the Graphic for Publication
*
Default Output and Graphics
*
Types of Images
The default formats of the images are determined by the ODS destinations you are using:
LISTING: png, visible in the Windows Picture and Fax Viewer
HTML: png, gif, or jpg, contained in web pages and visible in Internet Explorer, Firefox, or Opera
LATEX: PostScript, epsi, gif, jpeg, or png, visible in GhostView
PCL or PS: contained in a PostScript file, visible in GhostView
PDF: contained in a PDF file, visible with Adobe Reader
RTF: visible in MS Word
*
I Typically Use HTML
Include image_dpi = 200 to set the resolution higher than the default of 100 dots per inch. Use 200 for final images that you will paste into MS Office.
This option makes the images show tooltips with extra statistical details when you hover the mouse over parts of the graphic. (I can’t capture these in a screenshot.)
This is the appearance template. For optimal results use:
Analysis: color
Journal or journal2, etc: black and white
Statistical or statistical2, etc: color
*
imagename = "fileName"
imagefmt = jpg
width = 4.5in
height = 4.5in
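For comparison, here is a minimal R sketch of the same settings: a 4.5 by 4.5 inch image at 200 dots per inch. The file name mirrors the placeholder above and is hypothetical. The cars plot is only a built-in stand-in.

```r
# R analogue of the ODS image settings: 4.5 x 4.5 inches at 200 dpi.
# "fileName.png" is a hypothetical placeholder written to a temp directory.
out_file <- file.path(tempdir(), "fileName.png")

png(out_file, width = 4.5, height = 4.5, units = "in", res = 200)
plot(dist ~ speed, data = cars, las = 1)   # any plot; cars is a stand-in
dev.off()                                  # close the device to write the file
```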
*
ODS Graphics Editor
If you want to do extensive tweaking to a graphic, you can use the WYSIWYG ODS Graphics editor. Unfortunately it only works with ODS graphics procedures and you need to rerun the code in SAS to invoke it.
Right click on the graphic node and choose Open… Open Last Submitted Code.
Copy the code beginning with the SQL that makes the data.
*
*
*
WYSIWYG Editing
Right click and/or double click to set properties for objects in the plot.
*
Colors, text details, fonts
Point and click annotation
Symbols, arrows, text, circles
What is ODS?
The Output Delivery System (ODS) controls the type and appearance of SAS output.
Different appearance templates
Different output destinations/types.
*
*
ODS Graphics
Compared to the competition, for the last 10 years SAS graphics have been between poor and pathetic.
The graphics procedures rendered okay quality, at best.
No “what you see is what you get” editing.
Many plots were nearly impossible to render.
Custom graphics required extensive programming.
SAS 9.x has attempted to solve this problem.
*
The old (commonly used) graphics procedures were gchart and gplot.
Now most analysis procedures have built in high quality graphics that can be invoked with an ODS graphics on statement.
Early on in the class I told you to tweak the EG options to include “ODS graphics on” with every run.
*
*
proc sgPlot
proc sgScatter
proc sgPanel
Proc sgRender
It replaces proc greplay
*
*
Download R for Mac or PC:
cran.cnr.berkeley.edu/bin/macosx/
cran.cnr.berkeley.edu/bin/windows/base
*
If you use a PC, also get Perl and Tinn-R.
Perl is a text manipulation language that is used by a couple of key R packages. It ships with Mac OS X. PC users can get ActivePerl (what I use) or Strawberry Perl for Windows.
R Help
R help files are user-hostile. To learn about the options for dotchart, type:
?dotchart
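As a small example of what dotchart does, this sketch draws a Cleveland dot chart of hair color counts from R's built-in HairEyeColor table, used here as a stand-in data set. Sorting the categories by value makes the ranking easy to read. That ranking is hard to judge from pie wedges.

```r
# Cleveland dot chart: one dot per category on a common linear scale.
# HairEyeColor is a built-in 3-way table (Hair x Eye x Sex).
hair <- margin.table(HairEyeColor, 1)   # counts by hair color, summed over eye color and sex
hair <- sort(hair)                      # order the categories by count

dotchart(as.numeric(hair), labels = names(hair),
         xlab = "Count", xlim = c(0, max(hair)))
```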
To see why people use R for graphics look here:
Additional Libraries
*
Waffle Plots
I have not found software to do them. I need to find their real name…
*
*
*
*
*
Violin
*
Grouped Categorical Data
To graph categorical data in SAS, you need Michael Friendly’s Visualizing Categorical Data. Unfortunately, his macros are copyrighted with the book, so I will show you the R versions.
Fourfold plots
Mosaic plots
Association plots
Fourfold Plots
*
There is clear evidence of sexist policies in admissions!
*
Department A admitted a higher proportion of females than males, and every other department had no bias!
The joy of Simpson’s paradox.
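The slides do not name the data set. The classic example of this paradox is R's built-in UCBAdmissions table (Admit × Gender × Dept), and the sketch below assumes that is the one shown. It compares the pooled admission rates with the rates inside each department. It then draws one fourfold display per department with fourfoldplot().

```r
# Simpson's paradox with the built-in UCBAdmissions table (Admit x Gender x Dept).
admit_rate <- function(tab) tab["Admitted", ] / colSums(tab)   # rate per gender

pooled  <- margin.table(UCBAdmissions, c(1, 2))   # collapse over departments
overall <- admit_rate(pooled)                     # pooled rates: men look favored
by_dept <- apply(UCBAdmissions, 3, admit_rate)    # Gender x Dept matrix of rates

round(overall, 2)
round(by_dept, 2)   # within departments the gap mostly vanishes, and reverses in A

fourfoldplot(UCBAdmissions, mfrow = c(2, 3))      # one fourfold display per department
```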
*
Mosaic Plots
*
Some basic voodoo in R shows which combinations are over-represented (in blue) or under-represented (in red).
values = c(5, 29, 14, 16, 15, 54, 14, 10, 20, 84, 17, 94, 68, 119, 26, 7)
values = matrix(values, nrow = 4, byrow = TRUE)
rownames(values) = c("Green", "Hazel", "Blue", "Brown")
colnames(values) = c("Black", "Brown", "Red", "Blond")
names(dimnames(values)) = c("Eye", "Hair")   # these names label the plot axes
mosaicplot(values, shade = TRUE)
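The red and blue shading comes from Pearson residuals under independence, (observed - expected) / sqrt(expected). Larger residuals get stronger shading. This sketch computes them by hand for the same eye-by-hair table and checks them against chisq.test(), which reports the same quantity.

```r
# Pearson residuals behind mosaicplot(shade = TRUE) for the eye-by-hair table.
values <- matrix(c(5, 29, 14, 16, 15, 54, 14, 10, 20, 84, 17, 94, 68, 119, 26, 7),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Eye  = c("Green", "Hazel", "Blue", "Brown"),
                                 Hair = c("Black", "Brown", "Red", "Blond")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(values), colSums(values)) / sum(values)
pearson  <- (values - expected) / sqrt(expected)

round(pearson, 1)   # positive = over-represented, negative = under-represented

# chisq.test() returns the same residuals
all.equal(pearson, chisq.test(values)$residuals)
```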
I prefer the simpler association plots.
values = c(5, 29, 14, 16, 15, 54, 14, 10, 20, 84, 17, 94, 68, 119, 26, 7)
values = matrix(values, nrow = 4, byrow = TRUE)
rownames(values) = c("Green", "Hazel", "Blue", "Brown")
colnames(values) = c("Black", "Brown", "Red", "Blond")
names(dimnames(values)) = c("Eye", "Hair")   # these names label the plot axes
assocplot(values)   # bar height is proportional to the Pearson residual
*
*
Grouped Continuous Variables
You can use the Distribution Analysis to get basic grouped plots.
*
*
*
*
endpoints = 2 to 10 by 0.5
midpoints = 5.6 5.8 6.0 6.2 6.4
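In R, the analogous control is the breaks argument of hist(), which takes the bin boundaries directly. This sketch reuses the endpoints above (2 to 10 by 0.5). The variable iris$Sepal.Length is only an assumed stand-in.

```r
# R analogue of endpoints = 2 to 10 by 0.5: pass explicit bin boundaries.
bins <- seq(2, 10, by = 0.5)   # boundaries 2, 2.5, ..., 10
h <- hist(iris$Sepal.Length, breaks = bins, las = 1,
          main = "", xlab = "Sepal length (cm)")
h$mids                          # the bin midpoints R used
```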
*
*
*
*
Notice the family is gone.
They jittered off of the graphic.
*
*
[Figure: plot with an axis labeled “Final component size (m…”, tick marks from 40 to 65]
Ordinary Least Squares Regression
*
*
Bisquare
*
*
Loess
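Here is a hedged R sketch of the three fits named on these slides, drawn on one plot. Least squares uses lm(). Bisquare-weighted robust regression uses rlm() from the MASS package, which ships with standard R installations. Loess uses loess(). The built-in cars data is a stand-in, because the slides' own data is not recoverable here.

```r
# OLS, bisquare robust regression, and loess on one scatter plot.
library(MASS)   # provides rlm() and psi.bisquare

ols    <- lm(dist ~ speed, data = cars)
bisq   <- rlm(dist ~ speed, data = cars, psi = psi.bisquare, maxit = 50)  # downweights outliers
lo_fit <- loess(dist ~ speed, data = cars)                                # default span = 0.75

plot(dist ~ speed, data = cars, las = 1, bty = "l",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(ols, lty = 1)    # solid: ordinary least squares
abline(bisq, lty = 2)   # dashed: bisquare
grid_x <- seq(min(cars$speed), max(cars$speed), length.out = 100)
lines(grid_x, predict(lo_fit, newdata = data.frame(speed = grid_x)), lty = 3)  # dotted: loess
```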
*
*
*
*
*
*
*
*
*
*
Specialized Plots
Most analysis procedures now have customized high resolution graphics. Some are automatically produced if you type ods graphics on.
Proc Freq
I wanted a deviation plot for a 2x2 table (or really a table of any size) showing which cell is driving a significant chi-square. They only give you a plot for a one-way table.
The ORPlot is very nice.
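An odds ratio plot shows one odds ratio and confidence interval per 2x2 table. The sketch below builds a similar display in base R. It assumes Wald (Woolf) intervals on the log scale, which may differ from the intervals SAS reports. It uses the six departments of R's built-in UCBAdmissions table as example strata.

```r
# Odds ratio with a Wald confidence interval on the log scale for a 2x2 table.
odds_ratio_ci <- function(tab, level = 0.95) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]
  log_or <- log((n11 * n22) / (n12 * n21))
  se     <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)   # Woolf standard error
  z      <- qnorm(1 - (1 - level) / 2)
  exp(c(OR = log_or, lower = log_or - z * se, upper = log_or + z * se))
}

# Each slice is Admit x Gender, so OR = odds of admission for men / for women
ors <- t(apply(UCBAdmissions, 3, odds_ratio_ci))   # rows = departments A-F

y <- seq_len(nrow(ors))
plot(ors[, "OR"], y, log = "x", xlim = range(ors), ylim = c(0.5, nrow(ors) + 0.5),
     pch = 19, yaxt = "n", ylab = "",
     xlab = "Odds ratio for admission, men vs. women (log scale)")
axis(2, at = y, labels = rownames(ors), las = 1)
segments(ors[, "lower"], y, ors[, "upper"], y)   # confidence intervals
abline(v = 1, lty = 2)                           # OR = 1: no association
```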
*
proc freq data = sashelp.heart;
;
format smoking_status $amount.;
run;
Too Many Graphics
*
*
Proc phreg has a lot of new features but nothing major in the graphics. With phreg, if you specify ods graphics on you do not automatically get any plots. Here I request survival and cumulative hazard plots including the global confidence limits option (cl).
*
Proc lifetest can show the number at risk, but the implementation is weak. It labels the groups with numbers even if the strata are character strings. You have to edit them manually, and this affords ample opportunity for mistakes.
*
Splitting a Grid
Some procedures produce a grid of plots. You can get access to the individual plots by specifying plots(unpack). Then you can use plots(only)=tableName to get just the right parts.
ODS select or exclude statements will not work.
*
*
Beyond the Basic Univariate plots
*
proc sgPlot
proc sgScatter
proc sgPanel
Proc sgRender
It replaces proc greplay
loess, regression, penalized b-splines, ellipse
Distributions
Categorization
From Heath 2007. SAS/Graph procedures for creating statistical graphics
*
As you add more requests to the plot, it resizes and shifts things to make room. It draws the requests in the order you list them, from first to last. Change the order if you want an item to appear layered on top of, or behind, another.
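Base R graphics follow the same painter's rule: each call draws over whatever is already on the plot. This small sketch uses made-up illustrative coordinates. The grey band is drawn first, the line second, and the open points last, so the points sit on top of the line.

```r
# Drawing order decides layering: later calls are painted on top.
x <- c(1, 2, 3)   # illustrative coordinates only
y <- c(1, 3, 2)

plot(x, y, type = "n", xlim = c(0, 4), ylim = c(0, 4), las = 1)
rect(0.5, 0.5, 3.5, 3.5, col = "grey85", border = NA)  # first: background band
lines(x, y, lwd = 2)                                  # second: line over the band
points(x, y, pch = 21, bg = "white", cex = 2)          # last: points cover the line
```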
*
*
*
*
*
*
*
*
*
*
*