PPL IVS 9500170 - gatech.edu

16
Information Visualization (2008) 7, 18 -- 33 ©2008 PalgraveMacmillan Ltd.All rightsreserved 1473-8716 $30.00 www.palgrave-journals.com/ivs DataMeadow: a visual canvas for analysis of large-scale multivariate data Niklas Elmqvist 1 John Stasko 2 Philippas Tsigas 3 1 INRIA, Université Paris-Sud, France; 2 School of Interactive Computing and the GVU Center, Georgia Institute of Technology, Atlanta, GA, U.S.A.; 3 Department of Computer Science & Engineering, Chalmers University of Technology, Gothenburg, Sweden Correspondence: Niklas Elmqvist, INRIA/LRI, Bat 490, Université Paris-Sud, 91405 Orsay Cedex, France. Tel: +33 1 69 15 61 97; Fax: +33 1 69 15 65 86; E-mail: [email protected] Received: 1 December 2007 Accepted: 5 January 2008 Online publication date: 28 February 2008 Abstract Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conven- tional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a data set displayed as multivariate visualiza- tions with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by itera- tively selecting and filtering into the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipu- lation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method. Information Visualization (2008) 7, 18 -- 33. doi:10.1057/palgrave.ivs.9500170 Keywords: Multivariate data; Visual analytics; Parallel coordinates; Dynamic queries; Progressive analysis; Starplots Introduction Managing and presenting large, high-dimensional data sets is one of the core problems in information visualization, and the vast number of different approaches to solving this problem attests to its difficulty. 1 However, allowing for efficient analysis of such data sets also requires smooth and powerful interaction techniques for selecting, filtering, and combining the data. In fact, realistic multivariate analysis tasks often involve correlation and comparison of data from several sources, requiring that these techniques be capable of operating on multiple large-scale data sets instead of just one. Finally, insights are meaningless if they are not communicated to others, so our tools should support the dissemination of results of the analysis to an outside audience. 2 In this paper, we present a method called the DataMeadow that was designed to meet these requirements. The DataMeadow (Figure 1) provides users with a canvas for exploring multidimensional data sets using advanced visual queries. The data itself are represented by a DataRose, a color-coded, parallel coordinate starplot displaying selected variables of the set. Each displayed variable can be filtered using dynamic query bars 3,4 attached to each rose axis. Individual DataRoses are connected in a data flow fashion; these connections are illustrated by arrows exiting the center of one DataRose and entering the center of another, as illus- trated in the figure. In this way, the user can progressively build more and more complex queries with varying subsets of the data being passed along.

Transcript of PPL IVS 9500170 - gatech.edu

Page 1: PPL IVS 9500170 - gatech.edu

Information Visualization (2008) 7, 18 -- 33© 2008 Palgrave Macmillan Ltd. All r ights r eser ved 1473-8716 $30.00

www.palgrave-journals.com/ivs

DataMeadow: a visual canvas for analysis oflarge-scale multivariate data

Niklas Elmqvist1

John Stasko2

Philippas Tsigas3

1INRIA, Université Paris-Sud, France; 2Schoolof Interactive Computing and the GVUCenter, Georgia Institute of Technology,Atlanta, GA, U.S.A.; 3Department of ComputerScience & Engineering, Chalmers Universityof Technology, Gothenburg, Sweden

Correspondence:Niklas Elmqvist, INRIA/LRI, Bat 490, UniversitéParis-Sud, 91405 Orsay Cedex, France.Tel: +33169156197;Fax: +33 169156586;E-mail: [email protected]

Received: 1 December 2007Accepted: 5 January 2008Online publication date: 28 February 2008

AbstractSupporting visual analytics of multiple large-scale multidimensional data setsrequires a high degree of interactivity and user control beyond the conven-tional challenges of visualizing such data sets. We present the DataMeadow,a visual canvas providing rich interaction for constructing visual queries usinggraphical set representations called DataRoses. A DataRose is essentially astarplot of selected columns in a data set displayed as multivariate visualiza-tions with dynamic query sliders integrated into each axis. The purpose ofthe DataMeadow is to allow users to create advanced visual queries by itera-tively selecting and filtering into the multidimensional data. Furthermore, thecanvas provides a clear history of the analysis that can be annotated to facilitatedissemination of analytical results to stakeholders. A powerful direct manipu-lation interface allows for selection, filtering, and creation of sets, subsets, anddata dependencies. We have evaluated our system using a qualitative expertreview involving two visualization researchers. Results from this review arefavorable for the new method.Information Visualization (2008) 7, 18--33. doi:10.1057/palgrave.ivs.9500170

Keywords: Multivariate data; Visual analytics; Parallel coordinates; Dynamic queries;Progressive analysis; Starplots

IntroductionManaging and presenting large, high-dimensional data sets is one ofthe core problems in information visualization, and the vast numberof different approaches to solving this problem attests to its difficulty.1

However, allowing for efficient analysis of such data sets also requiressmooth and powerful interaction techniques for selecting, filtering, andcombining the data. In fact, realistic multivariate analysis tasks ofteninvolve correlation and comparison of data from several sources, requiringthat these techniques be capable of operating on multiple large-scale datasets instead of just one. Finally, insights are meaningless if they are notcommunicated to others, so our tools should support the dissemination ofresults of the analysis to an outside audience.2

In this paper, we present a method called the DataMeadow that wasdesigned to meet these requirements. The DataMeadow (Figure 1) providesusers with a canvas for exploring multidimensional data sets usingadvanced visual queries. The data itself are represented by a DataRose,a color-coded, parallel coordinate starplot displaying selected variablesof the set. Each displayed variable can be filtered using dynamic querybars3,4 attached to each rose axis. Individual DataRoses are connected ina data flow fashion; these connections are illustrated by arrows exitingthe center of one DataRose and entering the center of another, as illus-trated in the figure. In this way, the user can progressively build more andmore complex queries with varying subsets of the data being passed along.

Page 2: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al19

Figure 1 Sample house value and acreage vs number of rooms and owner income query in the DataMeadow.

Furthermore, the incrementally refined queries canbe annotated with various visual representations inorder to communicate the results to stakeholders (i.e.communication-minded visualization5). For added flex-ibility, the roses can be freely moved around, resized,and manipulated on the meadow canvas to allow foreasy comparison to other data sets. To provide for morecomplex comparisons, DataRoses come in different types,either representing a data source or a specific set operationsuch as union, intersection, or uniqueness. This allowsroses to be connected to other roses using dependencies,forming visual query chains. In essence, the DataMeadowprovides a form of visual pivot table, enabling the userto refine and examine selected portions of a large multi-variate data set in parallel.

In order to assess the utility and interaction effi-ciency of the method, we performed an expert reviewusing a think-aloud protocol involving two visualizationresearchers. Our observations from this study indicatethat the DataMeadow is a useful way of thinking andinteracting with multivariate data. The participants bothremarked on the ease of creating queries and the powerof being able to play with the data using the interactiontechniques and getting immediate feedback.

The rest of this paper is organized as follows: We beginwith a tour of the existing work on visualization andexploration of multivariate data. We then formulate therequirements for an analysis tool intended for such data,including identifying the user group and the main usertasks. We describe the DataMeadow visual canvas in detailand describe two typical scenarios using the tool. This isfollowed by our user evaluation and the results we gainedfrom it. We finish the paper with a discussion and ourconclusions.

Related workThe work presented in this paper builds on ideasand inspiration both from techniques for visualizing

multivariate data, as well as the application of thesetechniques to visual exploration, visual analytics, andprogressive analysis. We describe these areas in turn inthe following sections.

Multivariate visualizationMuch work has been conducted on visually presentingmultidimensional data in a form suitable for under-standing; Keim1 gives an overview and taxonomy of suchtechniques. For large sets of multidimensional data, stan-dard 2D or 3D symbolic displays such as plots, diagrams,and charts are generally insufficient due to scalabilityreasons, and more advanced methods are needed. Exam-ples of such methods include geometrically transformeddisplays,6 iconic displays,7 dense pixel displays,8–10 andstacked displays.11–13

One prolific geometrically transformed display tech-nique is parallel coordinates,14,15 which abandons thestandard practice of orthogonal dimension axes andinstead stacks up the axes in parallel, tracing a lineinstead of a point through the axes for each data case.The diagram is then easily extended with new parallelaxes for each new dimension that is to be visualized. Toavoid a linear extension of the diagram, an alternativerepresentation called a starplot is constructed by trans-forming the diagram into polar space, mapping eachaxis on the radius of a circle. The DataRose compo-nent presented in this work is a direct descendant ofthe starplot and has indeed a parallel coordinate mode,but also other visual representations showing data dis-tribution.

Fua et al.16 introduce hierarchical parallel coordinatesthat are rendered in clusters using opacity bands insteadof drawing each individual data point. The approachwas later extended to starplot displays. The DataRoseuses opacity bands as one visual representation, but theyare manually clustered by the analyst. The system also

Information Visualization

Page 3: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al20

supports histogram bands that show the distribution ofthe underlying data.

The parallel sets technique17 is another approach torepresenting distribution for categorical data in a parallelcoordinate diagram. It uses proportional scales and colorpaths to show how different categories divide amongadjacent dimensions. Sifer18 extends the idea by removingthe color paths and instead relying on implict colorcoding. The DataRose also makes data distribution inthe parallel coordinate display explicit, but our approachdoes not require categorical or hierarchical data.

The DataMeadow presented here can support a verylarge number of data cases, but if the number of variablesto visualize grows too large, the scalability of the tech-nique is affected. In such cases, we must employ tech-niques for very high-dimensional representation, such asthe dense pixel displays8–10 and stacked displays11–13

mentioned above. While our current data sets are not ofthis magnitude, we can easily foresee integrating visualelements based on these visualizations onto our canvasas well.

General visual explorationVisual exploration enables gaining insights from (possiblyunknown) large data sets using visualization. Its precursor,data exploration, was performed primarily using variousstatistical analyses6,19 combined with static diagrams. Forexample, Trellis displays20 combine several diagrams –typically 2D scatterplots or line diagrams – into one panel,enabling comparison of related views with small param-eter changes.

Visual exploration is traditionally employed for multi-dimensional data where the analyst will typically havelittle or no a priori intuition about the relationship andcorrelation between the dimensions. Thus, the initialpart of the exploration task often becomes one of gettingan overview of the data before looking at details.21

What primarily distinguishes visualization from statis-tical approaches to data exploration is interaction.22

Dynamic queries4,23 were one of the early approaches tointeractive multidimensional exploration.

In a more recent work, Theron presents the conceptof interactive parallel coordinate plots (IPCPs)24 asan interactive tool for analysis. IPCPs provide interac-tion techniques such as brushing25 and axis filtering26

similar to the DataRose approach in this paper. However,the DataMeadow allows for linking several DataRosestogether to construct composite queries that are dynam-ically updated as the analyst interacts with the visualelements.

Finally, other work on visual exploration has tackledthe problem of multivariate data: Xie et al.27 consider twoapproaches to incorporating quality information in multi-variate visualization. Pivot graphs28 extend the conceptof pivot tables (foundd in applications such as MicrosoftExcel) by visually aggregating data cases on values forselected dimensions. The Dust & Magnets29 technique is

an example of the power of interaction for exploration,and shows how a simple interaction can provide impor-tant insights into a complex data set through anima-tion. Another example is the parallel coordinate tree30

introduced by Brodbeck and Girardin for presentinghierarchical and multidimensional data using a treerepresentation. Their use of focus + context distortion forinteracting with the visualization fulfills an integral rolein the exploration of the data.

Visual exploration environmentsSeveral visual environments have been proposed andbuilt for explicitly supporting multidimensional visualexploration. Roughly speaking, they are divided intofree-form and structured platforms, where the formerprovides a sketchboard for constructing analyses, andthe latter uses a more rigid data format to allow forprocessing and visualizing structured information. Thetwo main systems that characterize these approaches arethe Sandbox and Polaris, respectively.

The Sandbox31 system is an example of a free-formvisual analytics platform. The tool emphasizes fluid inter-action on a 2D canvas using direct manipulation topromote visual thinking. Towards this end, the Sandboxeven supports a gesture detection component. Analysesare iteratively built and refined, bringing together manydifferent types of media ranging from documents andtext, to images and annotations.

Polaris32 and Tableau (http://www.tableausoftware.com/), its commercial successor, are structured visualanalysis platforms that allow for exploring large multidi-mensional databases based on data cubes (basically thepivot table mechanism popularized by Microsoft Excel).Users explore the data by visually constructing a specifi-cation for the graphical representation, mapping dimen-sions to fields and columns in a table, and aggregating theothers. The results can then be visualized using standardtechniques such as bar charts, scatterplots, and parallelcoordinate displays.

The VITE33,34 system represents a mix of the two aboveextremes by maintaining a two-way mapping betweenstructured information in a database and the informal,spatial arrangements of information that is most naturalto a human analyst. Similar to the DataMeadow, thesystem provides a visual canvas for interacting andmanipulating information. Instead of enforcing a strictformalization on the information created by the user, thesystem stores both informal and formal representations.This allows for retaining intermediate and semi-structuredinformation that is important for problem solving.33

Similar to VITE, the DataMeadow system presentedin this paper is a combination of free-form and struc-tured visual exploration. We provide a semi-structuredvisual canvas for effortlessly constructing and refiningvisual queries, just like the Sandbox. We even support thesame kind of gesture detection support to further stream-line the interaction. However, visual components on the

Information Visualization

Page 4: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al21

DataMeadow canvas all obey the same multivariate dataformat, supporting data flow and automatic processing ofthe data, akin to more structured systems like Polaris. TheDataMeadow canvas also allows for visual pivot table-likequeries. On the other hand, the data flow model in theDataMeadow is not nearly as strict as the one employedby Polaris and Tableau.

Several other systems beyond these exist in the litera-ture; a few representative examples follow here. Brennanet al.35 present a framework for exploration of multidi-mensional data and employ a visual canvas, but focuson collaborative aspects of the platform. Yang et al.36

present an analysis-guided exploration system thatsupports the user by automatically identifying impor-tant data – ‘nuggets’ – based on the interests of theusers. ClusterSculptor37 is a visual environment for high-dimensional cluster analysis consisting of several visualand analytical components.

Progressive analysisGotz et al.38 define progressive analysis as an interactiveprocess where newly synthesized knowledge becomes thefoundation for future discovery. Their HARVEST systemallows for marshaling both existing analytical knowl-edge as well as new knowledge discovered during theanalysis process. This concept of progressive analysis iscore to many visual analytics environments, such as theSandbox31 and Polaris.32

Spreadsheet applications, such as Microsoft Excel, areclassic examples of tools supporting progressive analysis,where users perform calculation by chaining togethercells using dependencies and mathematical operations.However, the visualization aspects of most spreadsheetapplications are primitive and typically lack interaction.Another problem is that dependencies are virtually in-visible on the spreadsheet; Shiozawa et al.39 addressthis by lifting dependencies into 3D, but their work isstill concerned with traditional spreadsheets.

Chi et al.40 present an approach to unifying visual-ization of multidimensional data sets into a spreadsheetframework. The implicit data dependency model in theirwork resembles the more explicit data flow model in theDataMeadow, and similarly allows for the visualizationand modification of (possibly several different branchesof) intermediate results in a larger analysis. Furthermore,the tabular layout is familiar to users, allows for compar-ison between different cells, and supports customizedlayout depending on the task. Since single cells can holdentire data sets, their framework also provides compoundoperations such as set subtraction and addition.

However, Chi et al. also note that one of the featuresof a spreadsheet approach is that it hides the operatorsand instead show the operands, unlike data flow envi-ronments that typically hide operands and focus on theoperators. Our objective with the DataMeadow data flowmodel was to make both operands and operators visibleon the analysis canvas.

RequirementsA number of both functional (task-centered) and non-functional (general) requirements apply for a visualanalytics application designed for multivariate data.These requirements have been derived from theoret-ical treatments1,2,21 as well as existing cognitive taskanalyses31 of visual exploration and the analysis process.

The primary users of the DataMeadow tool are expertanalysts familiar with multidimensional data manage-ment and representation. Some of the operations, suchas filter and set operations, may be too complex for anovice user to easily grasp, yet are necessary to satisfy therequirements of the target user group.

GeneralBelow are the main non-functional guiding requirementsnecessary to fulfill analyst goals:

(R1) Interaction: Interaction with the system must besmooth and effortless in order not to slow down or distractthe user. It should also cater to expert-level users who areexperienced in using the system.

(R2) Exploration: Support and encourage data explo-ration by providing easy access to analysis tools such asfiltering, sorting, correlation, etc.21

(R3) Iterative refinement: The approach should lend itselfto progressive analysis38 of the data in small multiples.41

(R4) Communication: Support the production, presenta-tion, and dissemination of analytical results.2,5

User tasksVisual exploration1,42 often follows the ‘information-seeking mantra’:21 overview first, zoom and filter, andprovide details on demand. Any visual analytics applica-tion should support these basic tasks, and this is the casefor the system discussed in this paper.

More specifically, in this work we are targeting simulta-neous visualization of multiple large-scale data sets. Themain user task the application needs to support is compar-ison; either comparison between different data sets, suchas data for different states in the United States, or betweensubsets of the same or different sets, such as data fordifferent cities or counties in the same state.

In the task taxonomy of Amar et al.,43 comparison isclassified as a higher-level meta-operation. In our model ofthe DataMeadow, this is certainly true: in order to supportthis broad comparison operation, we must provide for awide range of lower-level user tasks such as (using theterminology of Amar et al.) retrieve value, filter, correlate,characterize distribution, etc. Wehrend and Lewis44 referto this operation as compare within and between relations– this also applies to the DataMeadow, where we supportboth comparison between data sets as well as betweensubsets within the same data set.

The DataMeadow methodThe DataMeadow method is a free-form visual analyticsenvironment based on a visual canvas with a structured

Information Visualization

Page 5: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al22

data flow model designed for visualization of multiplehigh-dimensional data sets. Users build visual queries byconnecting visual components into dependency chainsthat allow for iterative refining. The main driving usertask behind the design of the technique is comparisonbetween different sets or subsets of data.

In this section, we describe the visualization method,including the data flow model, the visual elements, andthe interaction techniques.

DataMeadow canvasThe actual DataMeadow component is an infinite 2Dcanvas populated with a collection of visual elements usedfor multivariate visual exploration. A visual element is agraphical entity with an appearance, a number of inte-grated user controls, an operation on its input data cases(potentially the identity function), and input and outputdependencies. Elements can be created, modified, anddestroyed as needed. Individual elements are chainedtogether using dependencies, forming a data flow thatpropagates data cases between the elements.

• Canvas: The infinite 2D plane on which all componentsare anchored. Supports sort and layout operations ofelements. Has an associated data format that describesthe meta information about the available dimensionsand their data type.

• Visual elements: A graphical entity used for data analysis.Different element types perform different operations.Example types include DataRoses, textual annotations,data viewers, etc.

• Dependencies: Directed connections linking one visualelement to another. Data cases are propagated throughdependencies from the source to the destinationelement.

Taken together, these components can be assembledto construct complex visual queries. In the followingsections, we will describe this process in more detail.

Data formatEach DataMeadow conforms to a specific data formatthat describes the format of the data sets, that is, thecolumns and their metadata (name, data type, domain,labels, operations, etc.). The format specifies what infor-mation is stored in the visual elements as well as what ispassed through the dependencies connecting them. Themeadow can contain several different data sets as long asthey all conform to the same data format, allowing forcomparison of multiple related data sets (such as baseballstatistics for different seasons, teams, or players, or USCensus data from different years or states).

The data types for each dimension may be one of{literal, nominal, ordinal, quantitative}, and govern permis-sible operations on the dimension (i.e. ordinal data canbe sorted, nominal data can be reordered to optimize

readability, quantitative data supports mean, median andmin/max computation, etc.). Data labels for nominal orordinal data may be used as labels in the visual elementson the DataMeadow canvas.

Visual elementsThe basic building block of the DataMeadow method isthe visual element, a component consisting of a visualappearance, a variable number of user controls (nonefor some elements), and input and output dependencies.Furthermore, elements have a transformation operationthat is applied to all of the input data cases (potentiallythe identity function for basic elements).

Each element follows a strict multivariate data modelbased on the currently active data format for the canvas.As discussed above, the data format governs how informa-tion flows through the system through the dependenciesand how it can be transformed by the elements.

There are three types of visual elements in theDataMeadow method (examples of each type are givenin brackets):

• Sources: Producer elements from where data origi-nates. The data are propagated to other elements usingoutgoing dependencies (database readers, noise gener-ators, number generators, etc.).

• Sinks: Consumer elements that accept incoming dataand consume it, potentially changing its visual appear-ance to reflect the nature of the data (viewers, annota-tions, flags).

• Transformers: Input/output elements that transformincoming data using some operation and output it tooutgoing dependencies (DataRoses).

In the following sections, we will be describing some ofthe elements in greater detail.

Dependencies and data flowA dependency is a directed connection between two visualelements on the same DataMeadow. Data cases from thesource flow along the dependency to the destinationelement using the data format of the meadow. This is thebasic principle supporting the iterative refinement (R3)requirement from the section General.

Dependencies are never filtered or constrained (filteringis performed in the visual elements). Mutual or circulardependencies are not possible on the DataMeadow canvasdue to the flow-directed nature of the underlying datamodel. Attempting to create a circular dependency will bedetected and prevented by the system.

Dependencies ensure that changes in source data areproperly propagated to all dependees. Thus, when ananalyst changes the parameters of a visual element in achain, all elements further down in the chain are imme-diately updated to provide feedback to the user. This way,the user can directly see the effect of a parameter changeto the visual query.

Information Visualization

Page 6: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al23

Figure 2 Sample DataRose visualization for a university student database of a computer science department. (A) Color histogrammode (high brightness equals high density). (B) Opacity bands mode. (C) Parallel coordinate mode.

DataRoseThe core visual elements in the DataMeadow methodare called DataRoses: 2D starplots displaying multivariatedata of the currently selected dimensions of the dataset. The data can have different visual representationsdepending on the task; examples include color histogrammode, opacity band mode,16 and standard parallel coor-dinates mode. The design intention of the DataRose isto provide a self-contained visual entity that lends itselfto side-by-side comparison to other data sets.

A DataRose represents one specific data set, and can bederived either from a database or noise generator source,or be the result of a set operation (see below for more onthis). More specifically, a DataRose is a mathematical set,that is, all entities contained in a rose appear only once.

Visual representation Figure 2 shows the three visual roserepresentations for a fictitious university student database.

The database contains 500 student entries and maintainsfive value dimensions (beyond the student name): theage (quantitative), major (nominal), gender (nominal),GPA (quantitative), and graduation year (ordinal) ofeach student (Figure 6 gives a sample of the data formatspecification).

For all three visual representations, a single black poly-line is used to show the average for each dimension. Lowvalues are close to the origin, high values reside on theouter radius. Data labels can be visualized on the axes, butrendering them all may cause high visual clutter. Instead,hovering the cursor over an axis or moving the dynamicquery filter handles (see below) will show the currentlyhighlighted data value.

In color histogram mode, the data distribution for eachdimension is shown on the surface of the rose using acontinuous color scale. The color transitions betweencolor values of adjacent axes are rendered using smoothinterpolation.

Information Visualization

Page 7: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al24

Figure 2(A) shows a color histogram using the OCS46

color scale. High brightness indicates high density in theunderlying distribution, so it appears that age (the 12o’clock dimension) is fairly evenly distributed across thedata set. Going clock-wise around the rose, for major thereis a brighter cyan area around the mid-term mark of thedimension, indicating a high concentration for this value.As it turns out, this is a database of students attendingcourses in a computer science department, and lookingat the value legend reveals that this particular value is forstudents who have computer science as their major. Thegender dimension reinforces this fact, as there appearsto be a skewed gender balance in the data set (i.e. theinner half of the gender dimension, representing males,in the DataRose is brighter than the outer part, repre-senting females, indicating a higher number of male thanfemale students) that is characteristic of computer sciencecourses. The students have an above-average, bell-shapedGPA (there is a gradual ‘hump’ of brighter values centeredin the upper part of this dimension), and most of themseem to be freshmen or sophomores (the lower half (yearsone and two) of the year dimension is brighter than thehigher half (years three and four)).

In opacity band mode, the underlying data are abstractedusing opacity bands that smoothly go from full opacity atthe average to full transparency at the extremes (minimaand maxima). Transitions between adjacent axes are againrendered using smooth interpolation. Figure 2(B) showsan opacity band where the amount of purple color indi-cates the data density. The same trends that we notedfrom the color histogram representation are visible hereas well, albeit at a higher abstraction level. Furthermore,the density of the data for different values is less obvious,and the observation about most students being computerscience majors is hard to make here.

Finally, the parallel coordinate mode uses traditionalparallel coordinate rendering, where all cases of theunderlying data set are rendered using polylines thatconnect the values for each dimension. However, thedisadvantage is that data distribution is more difficult tosee in this visual representation.

Different representations are suitable for different tasks;while parallel coordinates certainly display the most infor-mation, it is sometimes useful to be able to abstract awaysome of the details when trying to get an overview of thedata set. Opacity bands are suitable for getting an idea ofthe average and extreme values of the underlying data set.For some analysis tasks, it is important to be able to seethe data distribution, something that can be very diffi-cult in parallel coordinate mode where a lot of data casesmight map to the same position on the axis (especiallyfor nominal dimensions). Color histogram mode shows adetailed breakdown of how the data cases divide amongthe values along each dimension.

Starplot layout Starplots are constructed by splittinga full 360◦ circle into n parts, one for each of the

dimensions D = {d1, d2, . . . , dn} to be visualized. Thisassigns each dimension 360◦/n of the circle. For eachdata field, an axis is drawn radially from the center ofthe circle to its perimeter. The center part of the plot isreserved for interaction, such as dragging the plot aroundthe canvas and creating dependencies, and this part isalso used for the visual icon for the specific plot type. Theremaining part of the axis is normalized to the range ofthe associated data field and is used for plotting individualdata cases.

Note that in all visual modes, we use the starplot axesas continuous dimensions, even for nominal data. Thisis perhaps counter-intuitive and imposes an artificialordering between these values. For future iterations ofthe technique, it would be useful to employ the DQC45

reordering approach to impose an optimal ordering ofcoordinate mappings of nominal variables.

Axis filtering In the DataMeadow, as shown in Figure 1,each DataRose starplot axis also has a dynamic query4,23

slider to allow for axis filtering.26 The handles for eachslider are shown as two small opposing triangles on theaxis plotted at the extremes of the current filter selec-tion. In addition, a semi-transparent area is drawn overthe areas of the DataRose falling outside of the currentfilter selection. The user can grab the query handles andmove them, dynamically changing the filter selection andcausing the visual elements further down in the chain ofconnected elements to be updated. This allows the analystto go back and make upstream filter changes that affect awhole query.

This iterative refinement using dynamic queries isan important distinction to software systems that arebased on dynamic queries (DQ), such as Spotfire. Inthese systems, the sliders are typically global in scope,whereas they are local for the data flow chain in the Data-Meadow.

Figure 3 shows an example of axis filtering where theanalyst has filtered the student database example fromabove to only include students of 25 or above with acertain range of graduation year, major, and GPA. Anyoutgoing dependencies from this rose will only propagatethe filtered data. Furthermore, the data flow model showsinteractive feedback at all times as the analyst is changingthe DQ, promoting visual exploration of the data. Movingthe slider handles also shows the labels of the currentrange of the filter.

In Figure 1, the analyst has constructed a completevisual query from a house database for the state ofVermont. By changing the DQ filter settings in the middlerose, the analyst is able to study the correlation of highvalue and acreage on the number of rooms and bedroomsof a house by looking at the average and extreme valuesin the result rose to the right. The leftmost database roseand the result rose have also been connected to a barchartviewer (the section viewer elements) to show the relativesizes of the two roses.

Information Visualization

Page 8: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al25

Figure 3 Dynamic query axis filtering for the student database.

Rose types To support complex user tasks such as corre-lation and characterization, we introduce additionalDataRose types other than the standard database source,which represents an external database loaded from a fileor connected to over the network. We define the rosetypes as set operations, allowing us to construct advancedvisual queries through constrained and unconstraineddependencies. All rose types accept variable input depen-dencies, that is, they have been generalized from standardbinary set theory operations.

• Source: External database loaded from a file or connectedto over the network (Figure 4(A)).

• Union: Set representing the union of all input depen-dencies, that is, the combination of all input cases(Figure 4(B)).

• Intersection: Set representing the intersection of all inputcases, that is, only cases that are present in all inputdependencies (Figure 4(C)).

• Uniqueness: Set representing unique inputs, that is,only cases that exist in only one input dependency(Figure 4(D)).

Set operation rose types are useful for advanced correla-tions, such as between different visual query branches. Forexample, in the case study below, the analyst uses an inter-section rose to see whether any of the high value houseshe has identified in one visual query also are present inthe high acreage subset he derives in another (Figure 7).

Additional rose types representing other, more complexoperations can added to the DataMeadow framework.

Viewer elementsViewers are sinks that accept input and have nooutput dependencies, typically changing their visual

Figure 4 DataRose type icons. (A) Database (source). (B)Union. (C) Intersection. (D) Uniqueness.

representation to reflect the incoming data. They areuseful for studying the results of more complex queriesinvolving DataRoses. Lacking the complexity of trans-former elements such as the DataRose, they are alsosuitable for inclusion in reports and presentations.

The following viewer elements are supported by theDataMeadow canvas:

• Quantity barchart: Shows the relative amount of casescoming in from the different dependencies as a bar-chart.

• Quantity piechart: Same as the above, but using a piechartrepresentation.

• Linear histogram: Data distribution of each dimensionshown as a standard linear histogram.

Examples of viewer elements can be seen in Figures 5(barchart and piechart).

Annotation elementsAnnotations are sink elements whose primary purpose isto support the communication requirement by providinga way for the analyst to incrementally annotate findingsusing free-text messages and media. Because they aresinks, an annotation object typically has inbound depen-dencies, and can thus present reports on the data. Thefollowing annotation element types are supported in theDataMeadow:

• Labels: Names and labels to denote a specific element oranalysis result.

• Notes: Longer textual descriptions (more than a singleline).

Information Visualization

Page 9: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al26

Figure 5 The DataMeadow prototype implementation. The main panel shows the visualization canvas and the smaller panels tothe right show the available and currently visualized dimensions in the data.

• Images: Bitmap images to illustrate particular elementsor analysis results.

• Reports: Textual reports of the incoming data, suchas average, minima and maxima, etc. Automaticallyupdated as the data changes.

ImplementationThe DataMeadow application was implemented using theC# programming language and the Microsoft .NET frame-work. The application uses the Tao (http://taoframework.com/) C# bindings for OpenGL to get access to both 2Dand 3D accelerated graphics functionality but no specialvisualization toolkit was used. The interface componentswere realized using the Windows Forms toolkit.

The prototype implementation has been optimized todeliver interactive framerates even for very large data sets(more than 500,000 data cases). On the data managementside, we achieve this using cascaded data tables, datacolumns, and filters inspired by the InfoVis toolkit47 andsoftware design patterns for information visualization.48

On the rendering side, we render the color histogram andopacity band modes using hardware-accelerated OpenGLtriangle rendering; parallel coordinate rendering has amuch larger performance overhead and is discouragedfor data sets of this size (more than 100,000 entities).

The DataMeadow application supports loading ofcomma-separated spreadsheet files (commonly havingthe .csv suffix) based on an XML-encoded data format. Aspecial loader has been added to the application to allowfor loading the Public Use Microdata Sample (PUMS) dataset from the US Census data.

As can be seen from Figure 5, the prototype imple-mentation has three distinct interface parts: (i) a mainvisualization window, (ii) a dimension selection part(upper right), and (iii) a currently visible dimension part(lower right). The main visualization window is a contin-uously zoomable viewport into the infinite 2D canvasrepresenting the DataMeadow. Users can zoom and panacross the whole canvas using simple mouse interactions.The dimension selection interface boxes allow the user toselect which dimensions in the data format to visualize– this can be dynamically changed, hiding or showingdimensions as necessary.

Dynamic queriesSince the axis-filtering dynamic queries are so centralto the DataMeadow application, special attention hasbeen devoted to optimizing these. The mechanism mayfilter potentially hundreds of thousands of data casesfor every update, putting severe requirements on theimplementation.

Information Visualization

Page 10: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al27

Following the example of the InfoVis Toolkit,47 ourimplementation assigns one bit for every axis-filtereddimension per data case. If a particular row falls withinthe query for a particular axis, the bit will be set. Updatingthe query for one axis will only cause recomputation ofthe particular bit corresponding to that axis, resulting inperformance linear to the number of rows (data cases).When computing its output, the DataRose only propa-gates cases whose bits are all set.

Tanin et al.49 describe additional optimization tech-niques for dynamic queries, but these have so far not beennecessary for interactive performance in the DataMeadowimplementation.

Interaction techniquesThe DataMeadow implementation provides a numberof interaction techniques (supporting the interactionrequirement of the section General):

• Mouse navigation: The viewport can be panned bypressing the center mouse button and dragging, orzoomed in or out by pressing the right mouse buttonand dragging.

• Brushing: Selecting a data case in one DataRose will high-light the case in all of its appearances in other DataRoses(parallel coordinates only).

• Keyboard accelerators: Common functionality can beaccessed using keyboard hotkeys.

• Mouse gesture detection: The user can perform complexmouse gestures on the canvas to create new set opera-tion DataRoses.

The keyboard accelerators and mouse gesture supportallows the analyst to construct visual queries withouthaving to leave the visualization window to access menuoptions or toolbar buttons. For example, drawing aU-shaped gesture on the canvas will create a union rose,and an upside-down U will create an intersection rose.

While interface shortcuts, such as mouse gesturesand keyboard accelerators, are hard to learn and lackvisibility,50 they are important factors for supporting theinteraction requirement (R1) of the DataMeadow system.

Layout mechanismsThe DataMeadow canvas lends itself nicely to employinga number of layout mechanisms for arranging the rosesand their dependencies. A number of simple layouts suchas circle, grid, and dependency depth order have beenimplemented. A more complex physically based layoutscheme using springs and dampers can also be employed.The ambition is to provide semi-automatic layout (akinto31) to aid the user in organizing the visual elements.

Format and data set builderAs mentioned above, the current DataMeadow imple-mentation supports loading comma-separated spread-sheet (.csv) files as well as the US Census PUMS data set.

<?xml ...?>

<dm:format>

<dm:dimension id="Name" index="0" type="4"/>

<dm:dimension id="Age" index="1" type="0"/>

<dm:dimension id="Major" index="2" type="0">

...

<dm:value val="5" id="ComputerScience"/>

...

</dm:dimension>

<dm:dimension id="Gender" index="3" type="0">

<dm:value val="0" id="Male"/>

<dm:value val="1" id="Female"/>

</dm:dimension>

<dm:dimension id="GPA" index="4" type="0"/>

<dm:dimension id="Year" index="5" type="1"/>

</dm:format>

Figure 6 Sample data format for a university student database.

However, the application requires a data format to be ableto process and visualize data sets correctly. Data formatsare stored as XML files (Figure 6 shows an example).Manually building such a format can be difficult as wellas tedious, particularly for complex data sets.

To remedy this problem, we have implemented apreprocessing format and data set builder in Java. Theapplication has two different purposes:

1. To build a data format using a large set of raw data setsas input; and

2. To build structured data set files using a data format.

The first task simply involves loading all (or at least a repre-sentative subset) of the data sets (as comma-separated textfiles) that will be included in the analysis. The applicationparses all of the dimensions found in the different datasets and automatically determines the type and domainof each dimension. The user can then edit these settingsand select which of the available dimensions should beincluded in the final data format. The application alsosupport elimination of synonyms as well as special trans-formations, such as splitting a compound dimension intoseveral separate dimensions. Finally, the user can save anXML representation of the computed data format for usein the DataMeadow application.

The second task is performed after the user has acomplete data format, and takes the raw data (as comma-separated files) and builds structured data tables that

Information Visualization

Page 11: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al28

Figure 7 Two visual query branches (value and acreage) crossed using an intersection rose (brown).

conform to the format. In the process, the applicationcan join several input tables into a single output tableusing a primary key, such as when merging two separatetables for the same entity but with different dimensionsinto a single, coherent table.

Case studiesWe illustrate the use of the DataMeadow using twocase studies: a classic demographics analysis and a moreeveryday task of finding a suitable digital camera.

US census dataLet us follow a fictitious analyst (Alan) who is using theDataMeadow to study the Public Use Microdata Sample(PUMS 1%) of the US Census data from 2000. The proto-type implementation has support for loading data formatsbased on either the person or housing records of the PUMSdata set. This allows Alan to easily select and load thedatabase file for a specific state into the application. Alanis interested in studying the PUMS housing records, so hefirst loads the housing data format. He then decides tostart his analysis in the state of Vermont, so he loads thisdata set into the application.

Upon finishing loading, Alan is presented with anempty DataRose representing the Vermont data set,

containing 3151 entries. First, he selects which of the 18dimensions in the database he wants to display, optingfor build year, number of rooms, number of bedrooms,acreage, value, and owner income. He quickly creates adata flow chain by right-clicking and dragging on theVermont rose to create a first derived rose, and thenagain on the new rose to create a second. He will use thefirst derived rose for filtering, and the second to view theresults, so he labels them accordingly. Finally, he createsa barchart viewer and connects the Vermont rose to theresult rose so that he can easily observe size ratios ashe explores the data. See Figure 1 for his starting setup.

Now Alan is free to get a feeling for the data by changingthe filter selection on the filter rose. He does this byclicking and dragging on the DQ handles on this roseand observing the visual results in the results rose as wellas the barchart. He is able to quickly confirm some thingsthat he already knows: for instance, that high value andhigh acreage implies many rooms and bedrooms.

Next, Alan wants to start a new line of reasoning, sohe creates a second two-element chain of derived rosesfrom the Vermont database. He is free to leave his firstquery undisturbed. He decides to remove the number ofbedrooms dimension and instead look at the number ofpersons in the household. Feeling that he may be on tosomething, he decides to cross the results of the first querywith the results of the second. In order to do so, he creates

Information Visualization

Page 12: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al29

Figure 8 Comparing person data for houses of Vermont (upper branch) and New York (lower branch).

an intersection rose and connects the two queries to it.This rose will now show the houses from the originaldata set that are part of both results from the two sepa-rate queries. See Figure 7 for the state of his DataMeadowcanvas.

Alan decides to bring the state of New York into thepicture to contrast against Vermont. He gets rid of thesecond query branch and loads the New York data set,resulting in a second blue database rose. All dimensionaxes are automatically rescaled by the application to usethe same scaling factor, making it possible to directlycompare roses from two different data sets. Alan builds upa new query chain for New York and starts exploring thedata using axis filtering. By imposing the same constraintson the chains of both states, he can see differences in thedata sets. At one point, he notices that in Vermont state,a high number of persons in a household often implies alarge acreage, but that this is not at all the case for NewYork state. See Figure 8 for his final analysis result.

Digital camerasTo show how the DataMeadow tool might also beemployed by a normal (i.e. non-analyst) user, we will

follow Anna, who is trying to figure out which digitalcamera to buy. Anna has a number of different criteria forher new camera that she knows are conflicting and thatshe has to weigh against each other, but feels that neithervendors nor online review sites give her sufficient andreliable support in doing this. Therefore, she resolves touse the DataMeadow application to explore her options.

Lacking a coherent data set of digital cameras, Anna’sfirst step will be to prepare one for her analysis. Speakingto her vendor, she manages to get a collection of Excelspreadsheet files, one for each camera manufacturer. Intotal, these files contain data for 1088 cameras. However,the files are in an idiosyncratic format on both the avail-able dimensions and their domains. Furthermore, thevendor provides camera prices as separate Excel files thatwill have to be paired with the rest of the camera data.

Anna turns to the DataMeadow format builder (thesection Format and data set builder) to transform heridiosyncratic digital camera database into a coherent setof datafiles obeying a unified data format. Loading all ofthe manufacturer Excel files into the application, she isable to select the dimensions she wants to preserve for heranalysis as well as edit their domains to get rid of aliases.

Information Visualization

Page 13: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al30

Figure 9 Union DataRose (center) of 931 digital cameras from 11 different manufacturers.

Finally, she uses the builder application to join the pricefiles with the main camera data sets – using the cameramodel names as the key – to produce a single comma-separated data set file for each camera manufacturer.

She is now ready to begin her analysis. She starts theDataMeadow application, loads the newly derived dataformat XML file, and then loads all of the data sets intothe application. They appear as DataRoses on the visualcanvas, one per manufacturer. Since Anna is interested inlooking at all of the cameras in the data set, she createsa union DataRose and connects all of the manufacturerroses to it to create a rose representing the whole data set.See Figure 9 for an example of this.

Collapsing all of the 11 database roses into a union roserepresenting them all, she starts to build a visual query.She first derives a filter rose and a result rose by right-dragging on the ‘all cameras’ rose. Anna is an amateurornithologist, so she primarily wants a camera that hasa high zoom factor. She moves the filters for the ‘Zoomtele’ dimension to a high range (from 500 mm and up).Birdwatching is only a pastime for Anna, so she decidesthat she does not want to spend more than $500. Finally,since she is not interested in lugging around a large camerain the forest, she puts the upper limit on weight to be500 g.

Figure 10 shows Anna’s visual query. The resultingDataRose yields 14 different cameras that satisfy herdemands, and drilling down into the data gives her atable of these cameras. She finally settles for an OlympusSP-550 Ultra Zoom, a 7.1 megapixel compact SLR-likecamera with 18× (504 mm) optical zoom weighing in at460 g and a price around $449.

User studyWe conducted a qualitative expert review on our proto-type implementation. Our goal was to explore the capa-bilities of the method and get an idea of its utility. Thestudy involved two visualization researchers from thefield. Neither of the two had prior knowledge of the tool.

ProcedureWe structured our expert review based on the US Census2000 Public Use Microdata Sample (PUMS) data setand a set of questions to drive the visual exploration.Only four out of 50 available states in the PUMS dataset were included in the study. In total, there werenine open-ended questions divided into three differentgroups (inspired by the conceptual levels for situation

Information Visualization

Page 14: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al31

Figure 10 Visual query for the digital camera data set for a high-zoom and inexpensive camera weighing less than 500g.

Table 1 Categories and examples of open-ended questions about the US Census 2000 data employed in the expertreview

Conceptual level Example

Direct facts What is the average house value in Georgia?Comprehension Which state has the highest ratio of small and expensive houses?Extrapolation Is there a relation between fuel type and building size in Alaska?

awareness51): direct facts, comprehension, and extra-polation (Table 1). In this way, we hoped to be able toevaluate all aspects of our method.

Our two experts were introduced to the DataMeadowusing a tour of the system in which the experimentershowed its main features and analysis methods. This tourlasted 10 min. After that, the participants were allowed tofamiliarize themselves with the application. This sessiontypically lasted 10 min as well.

During the solving of the nine questions on the USCensus data set, the participants were instructed to followa think-aloud protocol. The experimenter collected obser-vations and comments by taking notes. Each evaluationsession lasted around one hour in total. At the end, weconducted an informal interview about their experienceusing the tool.

Results and discussionWe intentionally designed our nine questions to be of anopen-ended nature – we were not interested in quantita-tively recording the performance of our experts, but ratherto have them exercise all parts of the system and get theirfeedback on its utility. Still, both participants were ableto arrive at answers that they were satisfied with to allquestions.

The participants liked the free-form type of interactionand both remarked it was a good match to how onemight think about the analysis process. Being able to filterin situ on the dimension axes themselves correspondedwell to how the participants perceived multidimensionalfiltering. The ability to ‘play’ with the filter settings atdifferent levels in a dependency chain was often used

to both form of hypotheses and to inform the next lineof reasoning. This was also one of the key strengths ofthe platform that both participants emphasized, bothwhile performing the analysis as well as in the debriefinginterview.

Both participants thought that the opacity bands repre-sentation was the most efficient for general analysis. Insome cases, involving the distribution of mostly nominaldata (e.g. fuel type), the color histogram was used. Partici-pants remarked that this representation was often too darkbecause the data was often distributed rather evenly acrossthe dimensions, resulting in only the lower half of mostcolor scales to be used. None of the participants really likedthe parallel coordinate representation, remarking that it‘showed too much’ for the analysis task they were doing.

Some improvements that were pointed out were toinclude viewers with logarithmic scales to avoid onedata set dwarfing another, to be able to copy query filtersettings from one rose to another, and to be able to setcolor scales for individual data roses.

None of our participants during the session used mousegestures or keyboard shortcuts despite having seen themdemonstrated during the training session. We believe thisis due to the low visibility of this kind of shortcuts, andthat this might change for an expert-level user.

Conclusions and future workThis paper presents a visual analytics method calledthe DataMeadow for reasoning about multiple large-scale sets of multidimensional data. The primaryuser task supported by the method is comparison, a

Information Visualization

Page 15: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al32

high-level meta-task that requires a considerable numberof low-level user tasks such as retrieve value, correlation,and filtering. The method consists of an exploratory2D canvas and individual data sets called DataRoses.DataRoses are variable-dimension starplots that employa visual multivariate data representations to visualizethe data distribution along the coordinate axes beingdisplayed. To summarize, the contributions of this paperare the following:

• a highly interactive canvas (the DataMeadow) for multi-variate data analysis;

• a visual representation (the DataRose) based on axis-filtered parallel coordinate starplots that can be linkedtogether to form complex and dynamically updatedvisual queries; and

• results from a user study indicating that our method is auseful way to reason about and query multivariate data.

In the future, we expect to integrate additional visualrepresentations into the DataMeadow. Another inter-esting approach would be the use of both non-standardinput devices (e.g. stylii and pen-based interfaces) andoutput devices (large displays) for the application. Weare also currently planning an in-depth longitudinal fieldstudy using the tool with real analysts attempting to solvereal problems.

References1 Keim DA. Information visualization and visual data mining. IEEETransactions on Visualization and Computer Graphics 2002; 8: 1–8.

2 Thomas JJ, Cook KA, editors. Illuminating the Path: The Researchand Development Agenda for Visual Analytics. IEEE ComputerSociety Press: Silver Spring, MD, 2005.

3 Shneiderman B. Dynamic queries for visual information seeking.IEEE Software 1994; 11: 70–77.

4 Williamson C, Shneiderman B. The dynamic HomeFinder:Evaluating dynamic queries in a real-estate informationexploration system. In: Belkin N, Ingwersen P, Pejtersen A(Eds). Proceedings of the ACM SIGIR Conference on Research andDevelopment in Information Retrieval (Copenhagen, Denmark),ACM Press: New York, 1992; 338–346.

5 Viégas FB, Wattenberg M. Communication-minded visualization:A call to action. IBM Systems Journal 2006; 45: 801–812.

6 Cleveland WS. Visualizing Data. Hobart Press: Summit, NJ, USA,1993.

7 Chernoff H. Using faces to represent points in k–dimensionalspace graphically. Journal of the American Statistical Association1973; 68: 361–368.

8 Keim DA. Designing pixel-oriented visualization techniques:theory and applications. IEEE Transactions on Visualization andComputer Graphics 2000; 6: 59–78.

9 Keim DA, Hao MC, Dayal U, Hsu M. Pixel bar charts: avisualization technique for very large multi-attribute data sets?Information Visualization 2002; 1: 20–34.

10 Keim DA, Kriegel H-P. VisDB: Database exploration usingmultidimensional visualization. IEEE Computer Graphics andApplications 1994; 14: 40–49.

11 Johnson B, Shneiderman B. Tree maps: a space-filling approachto the visualization of hierarchical information structures. In:Nielson GM, Rosenblum L (Eds). Proceedings of the IEEE Conferenceon Visualization (San Diego, California, USA), IEEE ComputerSociety Press: Silver Spring, MD, 1991; 284–291.

12 LeBlanc J, Ward MO, Wittels N. Exploring N-dimensionaldatabases. In: Kaufman A (Ed). Proceedings of the IEEE Conferenceon Visualization (San Fransisco, California, USA), IEEE ComputerSociety Press: Silver Spring, MD, 1990; 230–237.

13 Shneiderman B. Tree visualization with treemaps: a 2-D space-filling approach. ACM Transactions on Graphics 1992; 11: 92–99.

14 Inselberg A. The plane with parallel coordinates. The VisualComputer 1985; 1: 69–91.

15 Inselberg A. Multidimensional detective. In: Dill J, Gershon N(Eds). IEEE Symposium on Information Visualization (Phoenix,Arizona, USA), IEEE Computer Society Press: Silver Spring, MD,1997; 100–107.

16 Fua Y-H, Ward MO, Rundensteiner EA. Hierarchical parallelcoordinates for exploration of large datasets. Proceedings of theIEEE Conference on Visualization (San Fransisco, California, USA),IEEE Computer Society Press: Silver Spring, MD, 1999; 43–50.

17 Bendix F, Kosara R, Hauser H. Parallel sets: a visual analysis ofcategorical data. In: Stasko J, Ward M (Eds). Proceedings of the IEEESymposium on Information Visualization (Minneapolis, Minnestoa,USA), IEEE Computer Society Press: Silver Spring, MD, 2005;133–140.

18 Sifer M. User interfaces for the exploration of hierarchical multi-dimensional data. In: Wong PC, Keim D (Eds). Proceedings ofthe IEEE Symposium on Visual Analytics Science & Technology(Baltimore, Maryland, USA), IEEE Computer Society Press: SilverSpring, MD, 2006; 175–182.

19 Cleveland WS, McGill ME, editors. Dynamic Graphics for Statistics.Statistics/Probability Series. Wadsworth & Brooks/Cole: PacificGrove, CA, USA, 1988.

20 Becker RA, Cleveland WS, Shyu M-J. The visual design and controlof trellis display. Journal of Computational and Graphical Statistics1996; 5: 123–155.

21 Shneiderman B. The eyes have it: A task by data type taxonomyfor information visualizations. Proceedings of the IEEE Symposiumon Visual Languages (Boulder, Colorado, USA), IEEE ComputerSociety Press: Silver Spring, MD, 1996; 336–343.

22 Yi JS, Kang Y, Stasko JT, Jacko JA. Toward a deeper understandingof the role of interaction in information visualization. In:van Wijk J, Stasko J (Eds). IEEE Transactions of Visualizationand Computer Graphics (Sacramento, California, USA), 2007; 13:1224–1231.

23 Ahlberg C, Williamson C, Shneiderman B. Dynamic queriesfor information exploration: an implementation and evaluation.In: Bauersfeld P, Bennett J, Lynch G (Eds). Proceedings ofthe ACM CHI’92 Conference on Human Factors in ComputingSystems (Monterey, California, USA), ACM Press: New York, 1992;619–626.

24 Theron R. Visual analytics of paleoceanographic conditions. In:Wong PC, Keim D (Eds). Proceedings of the IEEE Symposium onVisual Analytics Science & Technology (Baltimore, Maryland, USA),IEEE Computer Society Press: Silver Spring, MD, 2006; 19–26.

25 Becker RA, Cleveland WS. Brushing scatterplots. Technometrics1987; 29: 127–142.

26 Seo J, Shneiderman B. Interactively exploring hierarchicalclustering results. IEEE Computer 2002; 35: 80–86.

27 Xie Z, Huang S, Ward MO, Rundensteiner EA. Exploratoryvisualization of multivariate data with variable quality. In: WongPC, Keim D (Eds). Proceedings of the IEEE Symposium on VisualAnalytics Science & Technology (Baltimore, Maryland, USA), IEEEComputer Society Press: Silver Spring, MD, 2006; 183–190.

28 Wattenberg M. Visual exploration of multivariate graphs. In:Rodden T, Grinter R (Eds). Proceedings of the ACM CHI 2006Conference on Human Factors in Computing Systems (Montreal,Canada), ACM Press: New York, 2006; 811–819.

29 Yi JS, Melton R, Stasko J, Jacko J. Dust & Magnet: Multivariateinformation visualization using a magnet metaphor. InformationVisualization 2005; 4: 239–250.

30 Brodbeck D, Girardin L. Visualization of large-scale customersatisfaction surveys using a parallel coordinate tree. In: MunznerT, North S (Eds). Proceedings of the IEEE Symposium on InformationVisualization (Seattle, Washington, USA), IEEE Computer SocietyPress: Silver Spring, MD, 2003; 197–201.

Information Visualization

Page 16: PPL IVS 9500170 - gatech.edu

DataMeadow: multivariate visual analytics Niklas Elmqvist et al33

31 Wright W, Schroh D, Proulx P, Skaburskis A, Cort B. The sandboxfor analysis: Concepts and evaluation. In: Rodden T, Grinter R(Eds). Proceedings of the ACM CHI 2006 Conference on Human Factorsin Computing Systems (Montreal, Canada), ACM Press: New York,2006; 801–810.

32 Stolte C, Tang D, Hanrahan P. Polaris: a system for query, analysis,and visualization of multidimensional relational databases. IEEETransactions on Visualization and Computer Graphics 2002; 8:52–65.

33 Hsieh H, Shipman III FM. Manipulating structured informationin a visual workspace. In: Edwards K (Ed). Proceedings of the ACMSymposium on User Interface Software and Technology (Paris, France),ACM Press: New York, 2002; 217–226.

34 Hsieh H, Shipman III FM. VITE: A visual interface supporting thedirect manipulation of structured data using two-way mappings.In: Riecken D, Benyon D, Lieberman H (Eds). Proceedings of theInternational Conference on Intelligent User Interfaces (New Orleans,Louisiana, USA), ACM Press: New York, 2000; 141–148.

35 Brennan SE, Mueller K, Zelinsky G, Ramakrishnan IV, Warren DS,Kaufman A. Toward a multi-analyst collaborative framework forvisual analytics. In: Wong PC, Keim D (Eds). Proceedings of the IEEESymposium on Visual Analytics Science and Technology (Baltimore,Maryland, USA), IEEE Computer Society Press: Silver Spring, MD,2006; 129–136.

36 Yang D, Rundensteiner EA, Ward MO. Analysis guided visualexploration of multivariate data. In: Dill J, Ribarsky W (Eds).Proceedings of the IEEE Symposium on Visual Analytics Science andTechnology (Sacramento, California, USA), IEEE Computer SocietyPress: Silver Spring, MD, 2007; 83–90.

37 Nam EJ, Han Y, Mueller K, Zelenyuk A, Imre D. ClusterSculptor:A visual analytics tool for high-dimensional data. In: Dill J,Ribarsky W (Eds). Proceedings of the IEEE Symposium on VisualAnalytics Science & Technology (Sacramento, California, USA), IEEEComputer Society Press: Silver Spring, MD, 2007; 75–82.

38 Gotz D, Zhou MX, Aggarwal V. Interactive visual synthesis ofanalytic knowledge. In: Wong PC, Keim D (Eds). Proceedingsof the IEEE Symposium on Visual Analytics Science & Technology(Baltimore, Maryland, USA), IEEE Computer Society Press: SilverSpring, MD, 2006; 51–58.

39 Shiozawa H, Okada K, Matsushita Y. 3D interactive visualizationfor inter-cell dependencies of spreadsheets. In: Keim D, Wills G(Eds). Proceedings of the IEEE Symposium on InformationVisualization (San Fransisco, California, USA), IEEE ComputerSociety Press: Silver Spring, MD, 1999; 79–82.

40 Chi H, Barry P, Riedl J, Konstan J. A spreadsheet approach toinformation visualization. In: Dill J, Gershon N (Eds). Proceedingsof the IEEE Symposium on Information Visualization (Phoenix,Arizona, USA), IEEE Computer Society Press: Silver Spring, MD,1997; 17–24.

41 Tufte ER. Envisioning Information. Graphics Press: Cheshire, CT,1990.

42 Keim DA, Mansmann F, Schneidewind J, Ziegler H. Challenges invisual data analysis. Proceedings of the Tenth International Conferenceon Information Visualization, IEEE Computer Society Press: SliverSpring, MD, 2006; 9–16.

43 Amar RA, Eagan J, Stasko JT. Low-level components of analyticactivity in information visualization. In: Stasko J, Ward M (Eds).Proceedings of the IEEE Symposium on Information Visualization(Minneapolis, Minnesota, USA), IEEE Computer Society Press:Silver Spring, MD, 2005; 111–117.

44 Wehrend S, Lewis C. A problem-oriented classification ofvisualization techniques. In: Kaufman A (Ed). Proceedings of theIEEE Conference on Visualization (San Fransisco, California, USA),IEEE Computer Society Press: Silver Spring, MD, 1990; 139–143.

45 Rosario GE, Rundensteiner EA, Brown DC, Ward MO, Huang S.Mapping nominal values to numbers for effective visualization.Information Visualization 2004; 3: 80–95.

46 Levkowitz H, Herman GT. Color scales for image data. IEEEComputer Graphics and Applications 1992; 12: 72–80.

47 The InfoVis Toolkit, Fekete J-D. In: Ward M, Munzner T (Eds).Proceedings of the IEEE Symposium on Information Visualization(Austin, Texas, USA), IEEE Computer Society Press: Silver Spring,MD, 2004; 167–174.

48 Heer J, Agrawala M. Software design patterns for informationvisualization. IEEE Transactions on Visualization and ComputerGraphics 2006; 12: 853–860.

49 Tanin E, Beigel R, Shneiderman B. Incremental data structures andalgorithms for dynamic query interfaces. SIGMOD Record 1996;25: 21–24.

50 Grossman T, Dragicevic P, Balakrishnan R. Strategies foraccelerating on-line learning of hotkeys. In: Begole B, Payne S,Churchill E, Amant R, Gilmore D, Rosson MB (Eds). Proceedingsof the ACM CHI Conference on Human Factors in Computing Systems(San Jose, California, USA), ACM Press: New York, 2007; 1591–1600.

51 Endsley MR, Bolté B, Jones DG. Designing for Situation Awareness:An Approach to User-Centered Design. CRC Press: Boca Raton, FL,2003.

Information Visualization