Post on 01-Jan-2016
Marti Hearst, SIMS 247
SIMS 247 Lecture 4: Graphing Multivariate Information
January 29, 1998

Follow-up to previous lecture
• Docuverse:
  – length of arc is proportional to number of subdirectories
  – radius for a given arc is long enough to contain marks for all the files in the directory
• Nightingale’s “coxcomb”
  – keep arc length constant
  – vary radius length (proportional to sqrt(freq))
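
The coxcomb rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the `scale` parameter, and the sample frequencies are invented for the example. It shows that with equal arc angles and radius proportional to sqrt(freq), wedge area ends up proportional to frequency.

```python
import math

def coxcomb_wedges(freqs, scale=1.0):
    """Return (start_angle, end_angle, radius) for each category.

    Every wedge gets the same arc angle; only the radius varies,
    as scale * sqrt(freq).
    """
    step = 2 * math.pi / len(freqs)
    return [(i * step, (i + 1) * step, scale * math.sqrt(f))
            for i, f in enumerate(freqs)]

def wedge_area(start, end, radius):
    """Area of a circular sector: (1/2) * angle * radius^2."""
    return 0.5 * (end - start) * radius ** 2

# Made-up frequencies, chosen so the square roots are easy to read.
wedges = coxcomb_wedges([4, 9, 16, 25])
areas = [wedge_area(*w) for w in wedges]
```

Because the area of a sector grows with the square of the radius, taking the square root of the frequency keeps the visual area, rather than the radius, in proportion to the data.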

Today: Multivariate Information
• We see a 3D world
• How do we handle more than 3 variables?
  – multi-functioning elements
    • Tufte examples
    • cinematography example
  – multiple views

Example Data Sets
How do we handle 9 variables?
– Our web access dataset
– Factors involved in alcoholism
  • ALCOHOL
    – USE
    – AVAILABILITY
    – CONCERN ABOUT USE
    – COPING MECHANISMS
  • PERSONALITY MEASURES
    – EXTROVERSION
    – DISINHIBITION
  • OTHER
    – GENDER
    – GPA

Graphing Multivariate Information
How do we handle cases with more than three variables?
– Scatterplot matrices
– Parallel coordinates
– Multiple views
– Overlay space and time
– Interaction/animation across time

Multiple Variables: Scatterplot Matrices
(from Wegman et al.)
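
As a rough sketch of how a scatterplot matrix is laid out (not taken from Wegman et al.; the function name and the panel tuples are assumptions made for this example), each cell of a p-by-p grid plots one variable against another, with the diagonal reserved for the variable label:

```python
from itertools import product

def scatterplot_matrix_panels(names):
    """Map each grid cell (row, col) to what it should display.

    Off-diagonal cells hold a scatterplot of the column variable (x)
    against the row variable (y); diagonal cells hold the variable name.
    """
    panels = {}
    for (row, y_var), (col, x_var) in product(enumerate(names), repeat=2):
        if row == col:
            panels[(row, col)] = ("label", y_var)
        else:
            panels[(row, col)] = ("scatter", x_var, y_var)
    return panels

# Placeholder names for a nine-variable data set.
names = [f"v{i}" for i in range(1, 10)]
panels = scatterplot_matrix_panels(names)
n_scatter = sum(1 for p in panels.values() if p[0] == "scatter")
```

For nine variables the grid has 81 cells, 72 of them scatterplots, covering 36 distinct pairs twice (once in each orientation). This growth is part of why the other techniques in this lecture are worth considering.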

Multiple Variables: Scatterplot Matrices
(from Schall 95)

Multiple Views: Star Plot
(Discussed in Fienberg 79. Works better with animation. Example taken from Behrens & Yu 95.)

Multiple Dimensions: Parallel Coordinates
(earthquake data; color indicates longitude, y axis severity of earthquake; from Schall 95)
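
The idea behind parallel coordinates can be sketched as follows. This is a minimal illustration, not Schall's implementation; the function name and the three sample rows are invented. Each variable gets its own vertical axis, each axis is rescaled to [0, 1], and every record becomes a polyline connecting its value on each axis.

```python
def parallel_coordinates(rows):
    """Turn records into polylines across parallel vertical axes.

    rows: list of equal-length numeric tuples, one tuple per record.
    Returns one polyline per record as a list of (axis_index, height)
    pairs, with height rescaled to [0, 1] separately for each axis.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(value, lo, hi):
        # A constant column has no spread; draw it at mid-height.
        return 0.5 if hi == lo else (value - lo) / (hi - lo)

    return [[(k, scale(v, lows[k], highs[k])) for k, v in enumerate(row)]
            for row in rows]

# Made-up data: three records with three variables each.
rows = [(1.0, 10.0, 300.0), (2.0, 30.0, 100.0), (3.0, 20.0, 200.0)]
lines = parallel_coordinates(rows)
```

Adding a variable only adds one more axis, so the display grows linearly with the number of dimensions rather than quadratically as a scatterplot matrix does.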

Multiple Dimensions: Multivariate Star Plot
(from Behrens & Yu 95)
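
A star plot places one spoke per variable at equal angles around a center and marks each value along its spoke; joining the marks gives the star outline. The sketch below is an assumption-laden illustration (the function name, the choice to normalize by a per-variable maximum, and the sample values are all invented), not the method used by Behrens & Yu.

```python
import math

def star_vertices(values, maxima):
    """Return the (x, y) outline of one star.

    values: one non-negative number per variable for a single record.
    maxima: the largest value of each variable, used to scale each
            spoke length into [0, 1].
    """
    n = len(values)
    points = []
    for k, (value, largest) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n   # spokes spaced evenly around the circle
        radius = value / largest      # spoke length proportional to the value
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

# Made-up record with four variables.
outline = star_vertices([1, 2, 3, 4], [2, 2, 6, 4])
radii = [math.hypot(x, y) for x, y in outline]
```

Drawing one such star per record produces the "multiple views" arrangement: records with similar profiles have similar shapes.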

Chernoff Faces
• Assumption: people have built-in face recognizers
• Map variables to features of a cartoon face
  – Example: eyes
    • location, separation, angle, shape, width
  – Example: entire face
    • area, shape, nose length, mouth location, smile curve
• Originally tongue-in-cheek, but taken seriously
• Sometimes seems to work for small numbers of points
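
The mapping step can be sketched as a linear rescaling from each variable's data range to a facial feature's drawing range. Everything named here is hypothetical: the feature names, their numeric ranges, and the variable-to-feature assignment are invented for illustration and are not Chernoff's original parameterization.

```python
# Hypothetical drawing ranges for a handful of facial features.
FEATURE_RANGES = {
    "eye_separation": (0.2, 0.6),
    "eye_angle": (-30.0, 30.0),
    "nose_length": (0.1, 0.4),
    "mouth_location": (0.2, 0.5),
    "smile_curve": (-1.0, 1.0),
}

def face_parameters(record, data_ranges, assignment):
    """Map each variable of one record onto its assigned facial feature.

    record:      {variable: value}
    data_ranges: {variable: (min, max)} over the whole data set
    assignment:  {variable: feature name in FEATURE_RANGES}
    """
    params = {}
    for variable, feature in assignment.items():
        lo, hi = data_ranges[variable]
        t = (record[variable] - lo) / (hi - lo)     # position within data range
        f_lo, f_hi = FEATURE_RANGES[feature]
        params[feature] = f_lo + t * (f_hi - f_lo)  # same position in feature range
    return params

# Made-up record using two variable names from the alcoholism example.
face = face_parameters(
    {"use": 5, "gpa": 4.0},
    {"use": (0, 10), "gpa": (0.0, 4.0)},
    {"use": "smile_curve", "gpa": "nose_length"},
)
```

The choice of which variable drives which feature is arbitrary in this sketch, and that choice is exactly what the Marchette experiment varies.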

Chernoff Example (Marchette)
• Three groups of points
  – each drawn from a different distribution with 5 variables
• First show scatter-plot matrix
• Then graph with Chernoff faces
  – vary faces overall
  – vary eyes
  – vary mouth and eyebrows
• Which seems to be most effective?

Chernoff Experiment (Marchette)

Overlaying Space and Time
(Minard’s graph of Napoleon’s march through Russia)

A Detective Story (Inselberg 97)
• Domain: Manufacture of computer chips
• Objectives: create batches with
  – high yield (X1)
  – high quality (X2)
• Hypothesized cause of problem:
  – 10 types of defects (X3-X12)
• Some physical properties (X13-X16)
• Approach:
  – examine data for 473 batches
  – use interactive parallel coordinates

Multidimensional Detective
• Long term objectives:
  – high quality, high yield
• Logical approach given the hypothesis:
  – try to eliminate defects
• First clue:
  – what patterns can be found among batches with high yield and quality?
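
The "first clue" query amounts to selecting an interval on the X1 and X2 axes and keeping only the batches that pass through both. A minimal sketch follows; the `brush` helper, the thresholds, and the four batches are invented for illustration and are not Inselberg's data or tool.

```python
def brush(batches, **ranges):
    """Keep the batches whose values fall inside every (low, high) interval.

    Each keyword names a variable and gives the interval selected on
    that variable's axis.
    """
    return [b for b in batches
            if all(lo <= b[key] <= hi for key, (lo, hi) in ranges.items())]

# Invented batches; values are on an arbitrary 0-1 scale.
batches = [
    {"id": 1, "X1": 0.90, "X2": 0.80, "X3": 0.4},
    {"id": 2, "X1": 0.30, "X2": 0.20, "X3": 0.0},
    {"id": 3, "X1": 0.80, "X2": 0.90, "X3": 0.5},
    {"id": 4, "X1": 0.85, "X2": 0.30, "X3": 0.1},
]

# Select batches with both high yield (X1) and high quality (X2).
good = brush(batches, X1=(0.75, 1.0), X2=(0.75, 1.0))
good_ids = [b["id"] for b in good]
```

Once the good batches are isolated, the remaining axes (defects and physical properties) can be inspected for patterns among just those batches.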

Detectives aren’t intimidated!
X1 seems to be normally distributed; X2 bipolar

High quality yields obtained despite defects
• good batches
• some low X3 defect batches don’t appear here
• X15 breaks into two clusters (important physical property)
• at least one good batch with defects

Low-defect batches are not highest quality!
• few defects
• low yield, low quality

Original plot shows defect X6 behaves differently; exclude it from the 9-out-of-10 defects constraint; the best batches return

Isolate the best batches.
Conclusion: defects are necessary!
• The very best batch has X3 and X6 defects
• Ensure this is not an outlier: look at the top few batches. The same result is found.

How to graph web page traversals?

References for this Lecture
• Behrens, John and Yu, Chong Ho. Visualization Techniques of Different Dimensions, 1995. http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
• Fienberg, S. E. Graphical methods in statistics. The American Statistician, 33, 165-178, 1979.
• Friendly, Michael. Gallery of Data Visualization. http://www.math.yorku.ca/SCS/Gallery
  – scan of Minard’s graph from Tufte 1983
  – multivariate means comparison
• Wegman, Edward J. and Luo, Qiang. High Dimensional Clustering Using Parallel Coordinates and the Grand Tour. Conference of the German Classification Society, Freiberg, Germany, 1996. http://galaxy.gmu.edu/papers/inter96.html
• Cook, Dennis R and Weisberg, Sanford. An Introduction to Regression Graphics, 1995. http://stat.umn.edu/~rcode/node3.html
• Schall, Matthew. SPSS DIAMOND: a visual exploratory data analysis tool. Perspective, 18 (2), 1995. http://www.spss.com/cool/papers/diamondw.html
• Marchette, David. An Investigation of Chernoff Faces for High Dimensional Data Exploration. http://farside.nswc.navy.mil/CSI803/Dave/chern.html
• Chernoff, H. The use of Faces to Represent Points in k-Dimensional Space Graphically. Journal of the American Statistical Association, 68, 361-368, 1973.

Next Time: Brushing and Linking
• An interactive technique
• Brushing:
  – pick out some points from one viewpoint
  – see how this affects other viewpoints
  – (Cleveland scatterplot matrix example)
• Graphs must be linked together
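
The linking requirement can be sketched with a single selection set shared by every view. The class below is a hypothetical illustration (the class name, method names, and sample records are invented and do not describe any of the systems named in these slides): brushing a rectangle in one scatterplot updates the shared selection, and every other view then flags the same records.

```python
class LinkedViews:
    """Several scatterplot views over one shared set of records."""

    def __init__(self, records):
        self.records = records
        self.selected = set()  # indices of brushed records, shared by all views

    def brush(self, x_key, y_key, x_range, y_range):
        """Select the records inside a rectangle drawn in one view."""
        (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
        self.selected = {
            i for i, r in enumerate(self.records)
            if x_lo <= r[x_key] <= x_hi and y_lo <= r[y_key] <= y_hi
        }
        return self.selected

    def view(self, x_key, y_key):
        """Points for any view, each flagged with the shared selection."""
        return [(r[x_key], r[y_key], i in self.selected)
                for i, r in enumerate(self.records)]

# Made-up records with three variables.
records = [
    {"a": 1, "b": 5, "c": 9},
    {"a": 2, "b": 1, "c": 7},
    {"a": 8, "b": 6, "c": 2},
]
views = LinkedViews(records)
views.brush("a", "b", x_range=(0, 3), y_range=(4, 7))  # brush in the (a, b) view
highlighted = views.view("b", "c")                     # read it back in the (b, c) view
```

Keeping the selection as record indices rather than screen coordinates is what lets a brush in one view carry over to views that plot entirely different variables.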

Brushing and Linking Systems
• VISAGE: Roth et al.
• Attribute Explorer: Tweedie et al.
• SpotFire (IVEE): Ahlberg et al.