Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf ·...

8
Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data Yalong Yang 1* Kang Zhang 1,2† Quang Vinh Nguyen 3‡ Jianrong Wang 1 School of Computer Software, Tianjin University, Tianjin, China 2 Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA 3 MARCS Institute & School of Computing, Eng. and Maths, University of Western Sydney, Sydney, Australia 4 School of Computer Science and Technology, Tianjin University, Tianjin, China ABSTRACT Treemaps are well-known for visualizing hierarchical data. Most approaches have been focused on layout algorithms and paid little attention to other graphic properties. Furthermore, the structural information in Treemaps is too implicit for viewers to perceive. This paper presents Cabinet Tree, an approach that: i) draws branches explicitly to show relational structures, ii) adapts a space-optimized layout for leaves and maximizes the space utilization, iii) uses coloring strategies to clearly reveal patterns and contrast different attributes intuitively. Our quantitative evaluations demonstrate that Cabinet Tree achieves good scalability for increased resolutions and datasets, high proximity of sibling nodes and real-time performance. Keywords: orthogonal enclosure, tree drawing, hierarchical visualization, cabinet tree, big data. Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces—Graphical user interfaces (GUI), screen design; I.3.6 [Computer Graphics]: Methodology and Techniques—Graphics data structures and data types; I.3.8 [Computer Graphics]: Applications 1 I NTRODUCTION Much of data we use today has a hierarchical structure. Examples of hierarchical structures include university-department structure, family tree, library catalogues and so on. Such structures not only play significant roles in their own right, but also provide means for representing a complex domain in a manageable form. Current GUI tools, such as traditional node-link diagrams or file browsers, are an effective means for users to locate information, but information embedded in large hierarchies can be harder to find unless the nodes/branches are systematically laid out [1]. Unfortunately in the real world, hierarchical structures are often very large with thousands or even millions of elements and relationships. Therefore, a capability of visualizing the entire structure while supporting deep exploration at different levels of granularity is urgently needed for effective knowledge discovery [2]. Enclosure or space-filling visualization, such as Treemaps techniques [1][3][4], visualizes large tree structures through enclosure partitioning for space utilization. Treemaps have been considered a successful method for visualizing large hierarchical data sets with attributed properties. The Treemap algorithm works by dividing the display area into a nested sequence of rectangles * e-mail:[email protected] e-mail:kzhang@{tju.edu.cn, utdallas.edu} e-mail:[email protected] § e-mail:[email protected] whose areas correspond to an attribute of the dataset, effectively combining aspects of a Venn diagram and a pie chart [3]. The set of rectangles forms the layout of a Treemap. Voronoi Treemap [5][6] shows a new way to visualize things in arbitrary polygons rather than rectangles, but only with simple relationships between items. Interactive control allows users to specify the presentation of both structural and content information [7]. This is in contrast to traditional static methods of displaying hierarchically structured information, which generally makes either poor use of display space or hide much information useful to users. Originally designed to visualize files on a hard drive [8], Treemaps have been applied to a wide variety of areas ranging from financial analysis, sport reporting [9], image browsing [10] and software and file system analysis [11]. Although space-filling is useful for visualizing large hierarchies, most of the Treemap techniques do not show the relational structure explicitly. It requires extra cognitive effort for viewers to perceive and understand the relational structures that are implicit in the enclosure [12]. This paper presents a space-filling technique, called Cabinet Tree, for visualizing big hierarchical data. Our contributions include the following aspects in the visualization design of Cabinet Tree: Orthogonal and explicit drawing of branches and space-optimized layout for leaves, generating a highly compact view; Scalable visualization suitable for big data (including hundreds of thousands of nodes) and increased resolutions, measured in terms of visible nodes; A contrast-enhanced color strategy and color-coded sorting of leaves to reveal visual patterns; Quantitative evaluations to support the design choices for the above aspects and to compare with similar Treemap approaches. The remainder of the paper is organized as follows. Section 2 discusses related work of other hierarchical visualization and exploration techniques. Section 3 gives a detailed description of Cabinet Tree. Section 4 presents case studies that use Cabinet Tree to visualize file systems. Section 5 discusses the quantitative evaluation with comparison between our method and other hierarchical visualization techniques. The final section concludes our work as well as future directions. 2 RELATED WORK The best-known enclosure approach is Treemap, first proposed by Johnson and Shneiderman in 1991, called Slice and Dice Treemap (S&D Treemap) [4]. Variations of Treemap include Squarified Treemap [1], Ordered Treemap [13], Quantum Treemap [3] and etc. A brief and concise definition of treemap-like layout is [14]: nodes are represented as nested rectangles and their layout reveals the structure of the tree. The size of such a rectangle can be used to represent a quantitative attribute. A node’s color or

Transcript of Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf ·...

Page 1: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing andExploring Big Data

Yalong Yang1∗ Kang Zhang1,2† Quang Vinh Nguyen3‡ Jianrong Wang4§

1School of Computer Software, Tianjin University, Tianjin, China2Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083, USA

3MARCS Institute & School of Computing, Eng. and Maths, University of Western Sydney, Sydney, Australia4School of Computer Science and Technology, Tianjin University, Tianjin, China

ABSTRACT

Treemaps are well-known for visualizing hierarchical data. Mostapproaches have been focused on layout algorithms and paidlittle attention to other graphic properties. Furthermore, thestructural information in Treemaps is too implicit for viewers toperceive. This paper presents Cabinet Tree, an approach that: i)draws branches explicitly to show relational structures, ii) adaptsa space-optimized layout for leaves and maximizes the spaceutilization, iii) uses coloring strategies to clearly reveal patternsand contrast different attributes intuitively. Our quantitativeevaluations demonstrate that Cabinet Tree achieves good scalabilityfor increased resolutions and datasets, high proximity of siblingnodes and real-time performance.

Keywords: orthogonal enclosure, tree drawing, hierarchicalvisualization, cabinet tree, big data.

Index Terms: H.5.2 [Information Interfaces and Presentation]:User Interfaces—Graphical user interfaces (GUI), screendesign; I.3.6 [Computer Graphics]: Methodology andTechniques—Graphics data structures and data types; I.3.8[Computer Graphics]: Applications

1 INTRODUCTION

Much of data we use today has a hierarchical structure. Examplesof hierarchical structures include university-department structure,family tree, library catalogues and so on. Such structures not onlyplay significant roles in their own right, but also provide means forrepresenting a complex domain in a manageable form. Current GUItools, such as traditional node-link diagrams or file browsers, arean effective means for users to locate information, but informationembedded in large hierarchies can be harder to find unless thenodes/branches are systematically laid out [1]. Unfortunatelyin the real world, hierarchical structures are often very largewith thousands or even millions of elements and relationships.Therefore, a capability of visualizing the entire structure whilesupporting deep exploration at different levels of granularity isurgently needed for effective knowledge discovery [2].

Enclosure or space-filling visualization, such as Treemapstechniques [1][3][4], visualizes large tree structures throughenclosure partitioning for space utilization. Treemaps have beenconsidered a successful method for visualizing large hierarchicaldata sets with attributed properties. The Treemap algorithm worksby dividing the display area into a nested sequence of rectangles

∗e-mail:[email protected]†e-mail:kzhang@{tju.edu.cn, utdallas.edu}‡e-mail:[email protected]§e-mail:[email protected]

whose areas correspond to an attribute of the dataset, effectivelycombining aspects of a Venn diagram and a pie chart [3]. The setof rectangles forms the layout of a Treemap. Voronoi Treemap[5][6] shows a new way to visualize things in arbitrary polygonsrather than rectangles, but only with simple relationships betweenitems. Interactive control allows users to specify the presentationof both structural and content information [7]. This is in contrastto traditional static methods of displaying hierarchically structuredinformation, which generally makes either poor use of displayspace or hide much information useful to users. Originally designedto visualize files on a hard drive [8], Treemaps have been appliedto a wide variety of areas ranging from financial analysis, sportreporting [9], image browsing [10] and software and file systemanalysis [11].

Although space-filling is useful for visualizing large hierarchies,most of the Treemap techniques do not show the relational structureexplicitly. It requires extra cognitive effort for viewers to perceiveand understand the relational structures that are implicit in theenclosure [12]. This paper presents a space-filling technique,called Cabinet Tree, for visualizing big hierarchical data. Ourcontributions include the following aspects in the visualizationdesign of Cabinet Tree:

• Orthogonal and explicit drawing of branches andspace-optimized layout for leaves, generating a highlycompact view;

• Scalable visualization suitable for big data (includinghundreds of thousands of nodes) and increased resolutions,measured in terms of visible nodes;

• A contrast-enhanced color strategy and color-coded sorting ofleaves to reveal visual patterns;

• Quantitative evaluations to support the design choices forthe above aspects and to compare with similar Treemapapproaches.

The remainder of the paper is organized as follows. Section2 discusses related work of other hierarchical visualization andexploration techniques. Section 3 gives a detailed descriptionof Cabinet Tree. Section 4 presents case studies that useCabinet Tree to visualize file systems. Section 5 discusses thequantitative evaluation with comparison between our method andother hierarchical visualization techniques. The final sectionconcludes our work as well as future directions.

2 RELATED WORK

The best-known enclosure approach is Treemap, first proposed byJohnson and Shneiderman in 1991, called Slice and Dice Treemap(S&D Treemap) [4]. Variations of Treemap include SquarifiedTreemap [1], Ordered Treemap [13], Quantum Treemap [3] andetc. A brief and concise definition of treemap-like layout is[14]: nodes are represented as nested rectangles and their layoutreveals the structure of the tree. The size of such a rectangle canbe used to represent a quantitative attribute. A node’s color or

Page 2: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

other graphical properties could also be used to represent otherattributes of the node. Treemap-like visualizations are capable ofshowing a large number of nodes because they can mostly ensure100% use of the space. However, they show the relationships(or structures) implicitly which would cost viewers extra cognitiveeffort to understand the relations between the nodes.

Figure 1: An example of enclosure approaches (S&D Treemap [4])

There are also approaches that aim at combining both thenode-link style and enclosure concept into one visualization.Sunburst conveys both area and structure using a circular or radialspace-filling layout methodology [15], but much space is stillwasted The Space-Optimized Tree uses an explicit node-link stylewhile each non-leaf node subdivides its area for its child nodes[12], however, the varied shaped polygons and their relationshipsare hard to perceive. Elastic Hierarchies [16] combines node-linkstyle and Treemap in a straightforward manner, but space utilizationremains an issue.

Figure 2: Sunburst (top-left) [15], Space-Optimized Tree (top-right)[12] and Elastic Hierarchies (bottom) [16]

Apart from the layout algorithm, the use of graphicalattributes on hierarchical visualizations, or also called attributedvisualizations, has been investigated. In 1992, Shneidermanproposed color coding in Treemap could present different types offiles, owners of programs, frequency of use or the age of the file[8] and a serious of colors could be used for continuous valuedattributes [17]. Cushion Treemap [18] uses intuitive shading toprovide insights in hierarchical structures. Labeling is an importantaspect of tree visualization for effective applications. For example,in a labeled Treemap, parent nodes make room for their names andborders to allow users to quickly find the data they are interestedin, improving over the conventional Treemap [19]. Meanwhile,labels guide the navigation in typical Treemaps [14]. A 3Dversion of Treemap could be made 3-dimensional where childnodes are colored according to their last update time and givenheights according to their access frequencies [20]. Fractal Tree[21] uses different colors to present the depths of nodes. Cascaded

Treemap [22] takes advantages of 2.5D techniques to enhance thepresentation of containment relationships.

As an important application issue, scalability refers to thecapability of effectively displaying large amounts of data [23].Screen resolutions become the limiting factor for scalablevisualizations. Larger displays with higher resolutions are beingdeveloped for visualization [24] (e.g. the large wall at the AT&TGlobal Network Operations Center [25]). Therefore, scalability forhigh resolutions and large data sets become crucial for visualizingbig data.

Our previous work on Drawer Tree [26] aims at combining theconnection and enclosure techniques to achieve a good presentationof the structural information. Drawer Tree uses lines to presentleaves, which are hard to perceive. Having much waste betweenbranches, its space utilization is not optimal.

Figure 3: Drawer Tree visualization of Eclipse folder [26]

3 CABINET TREE

To support effective visualization of big hierarchical data, weconsider the following design criteria:

i Balanced trade-off between space utilization and clarity ofrelational structures, suitable for the rectangular displays ofboth mobile and desktop screens;

ii Scalable for increased resolutions and data sizes;iii High proximity of sibling nodes for quick visual grouping;iv Fast layout algorithm.

We call our approach Cabinet Tree which resembles objectsstacked in a large cabinet. Branches of the tree partition the cabinetwhile leaves fill the remaining partitioning space.

The node-link style displays the relationships of informationexplicitly, but most of the pixels are wasted as background. WhileTreemaps ensure 100% use of the space, the implicit view costsmore cognitive effort to understand their structures. CabinetTree aims at balancing space utilization and clarify of structuralinformation. Our design follows the enclosure concept to maximizespace utilization and the 2D orthogonal concept of S&D Treemapto enable a fast cognitive process [3]. It draws branches explicitly toreveal their hierarchical relationships. Moreover, a space-optimizedlayout algorithm is applied to leaves to obtain a compact view andhigh proximity of sibling nodes. Cabinet Tree is generally scalablefor increased resolutions and data sizes as demonstrated by ourexperiments.

Comparison of attributes within a hierarchical structure is crucialfor many applications, however, rectangles in S&D Treemap aredifficult to compare [1]. In Cabinet Tree, a contract-enhancedcolor strategy is applied to the mapping of attributes for viewersto compare them easily. We apply color-coded sorting to dataitems to reveal visual patterns among groups of related nodes. Theremaining part of this section addresses the design rationale andrealization of Cabinet Tree.

3.1 Interleaved Horizontal-Vertical PartitioningThe drawing starts from the bottom horizontal line that representsthe root of the tree. Level-1 branches are drawn as verticallines partitioning the space above the root. Level-2 branches

Page 3: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

partition the space horizontally between two neighboring level-1lines, representing the children of the vertical line on the left.Similarly, level-3 branches partition the space vertically betweentwo neighboring level-2 lines, the partitioning process continuesdown to the lowest level, which can be leaves or empty branches.Leaves occupy the remaining space within the surrounding levellines. In the following discussion, we will generally call a branchor a leaf as a node.

The partitioning process is demonstrated in Figure 4. The topfigure shows a tree in the node-link style where rectangles arebranches and circles are leaves. Each node is labeled uniquely bya letter followed by its weight. The bottom figure demonstrates thesame tree drawn as a Cabinet Tree.

In typical hierarchical visualization, the type of a node is usuallydetermined by whether it has children. The nodes with children areconsidered parent nodes and those without as child nodes. Thereare, however, applications that need to visualize nodes with nochildren, for instance, empty folders in a file system, and catalogueswith no products in an e-business system. Cabinet Trees allowbranches to have no leaves, as noted in Figure 4 where G1, I1, J1,and K1 are leaf-less branches.

Figure 4: Interleaved horizontal-vertical partitioning

The layout algorithm is outlined in Algorithm 1, that isconceptually recursive but implemented iteratively:

Algorithm 1 Partitioning1: procedure PARTITION BRANCH2: Calculate thickness for B3: For each sub branch B-1:4: Calculate space for B-1 according to its weight5: Partition branch(B-1)6: Divide the remaining space into slices of equal widths

enough for all leaves in B7: For each leaf L of B:8: Calculate for the space of L according to its weight9: Fill L into the next slice(s) available

The algorithm follows the partitioning concept of S&D Treemapfor branches and adopts a space-optimized approach to allocatespace for leaves. The time complexity of this algorithm is linear(O(n)) where n is the number of nodes, since it calculates thelayout for each node only once. It is therefore suitable for real-timevisualization of big hierarchical data.

3.2 WeightIn an enclosure approach, the weight (w) of a node may encode aquantitative attribute represented by the node, such as the number

of clicks on a web link, size of a folder, and etc. We combine anode’s depth in the hierarchy with one of the node’s quantitativeattributes as its weight (w) using Formula (1).

w = Attr ∗Cl (1)

where C determines the trend of weight in terms of the depth level(l).

3.3 BranchesBranches are orthogonally drawn with decreasing thicknesses fromthe root to the lowest level. The exact thickness of a branch isdetermined by the available space to the branch. Once the spacefor a branch is allocated, its thickness is calculated according to thespace allocated for the branch with a minimal and maximum limits.All the branches are expanding upward vertically and rightwardhorizontally.

The visual comparison between implicit branches and explicitbranches in Figure 5 demonstrates that explicit branches makerelational structures easier for the viewer to perceive.

Figure 5: Implicit branches (top) and explicit branches (bottom) with4245 nodes in Eclipse application folder

To evaluate whether the strategy of decreasing thicknesses wouldsignificantly impact on the number of visible nodes, i.e. thescalability, we set all the branches at a constant thickness of 1 pixel.Figure 6 shows the number of visible nodes for varied and fixed (1pixel) branch thicknesses on various screen resolutions for the samedataset containing 530,290 nodes. As Figure 6 demonstrates, thedifference is insignificant, we therefore decide to use the varied anddecreasing thicknesses, which provides a better visualization.

Figure 6: Number of visible nodes with increasing resolutions fordifferent thickness strategies

3.4 LeavesThe space allocated for each leaf is calculated according to itsweight and the space available from its branch. Each node is placednext to its siblings to achieve a high space utilization and also highproximity of sibling nodes (to be evaluated in Section 5.1).

Page 4: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

Interleaved horizontal-vertical partitioning may result in eithervertical or horizontal leaves. The strategies for allocating bothhorizontal and vertical leaves are illustrated in Figure 6, whereLeaf1.1.1, Leaf1.1.2, Leaf1.1.3 and Leaf1.1.4 are leaves underBranch1.1 (left), and similarly Leaf3.1.1, Leaf3.1.2, Leaf3.1.3 andLeaf3.1.4 are under Branch3.1 (right).

Figure 7: Allocation of leaves

3.5 Color-Coded SortingWith sorted data, viewers can locate the needed items and see thequantitative differences easily. The attribute to be sorted could beeither discrete (e.g. file type, owners) or continuous (e.g. size,time). Figure 8 compares the method of separate coloring andsorting with the one that combines coloring and sorting. Wecombine color-coding and sorting on a given attribute to reinforcethe perception of grouped leaves of similar values.

Figure 8: Separate coloring and sorting (top) and combined coloringand sorting (bottom) with 2,211 leaves in a file system

3.6 Contrast-Enhanced Color StrategyWhen assigning colors to continuous valued attributes of leaves,the conventional method is linear mapping. However, if attributevalues are concentrated on two ends, this method generates a poorview with only the lightest and darkest colors as illustrated in Figure9.

Figure 9: A case not suitable for linear color mapping

To address the problem of polarizing attribute values, we usea contrast-enhanced strategy to post-process the results of linear

mapping. The process is demonstrated in Figure 10. Figure 10ashows the original color distribution among the nodes. We first seta threshold, and any node whose intensity less than the thresholdwill be recorded and removed as illustrated in Figure 10b. Wethen spread the remaining intensity values to fit the entire rangeproportionally, as shown in Figure 10c. Finally, the removed nodesare added back to the distribution, as in Figure 10d. This processgenerates much smoother color distribution for leaves than thelinear mapping.

Figure 10: Process of contrast-enhanced color strategy

We use HSI color space to present the color of each branch, thedeeper of the level is, the lighter the intensity.

The visual comparison between the linear mapping andcontrast-enhanced color strategies is demonstrated in Figure 11.

Figure 11: Visualization of 77,402 files and 7,850 directories inMiKTex folder. Linear color mapping (top) and contrast-enhancedcolor strategy (bottom) with color-coding on modified times, thedarker the older the file is

3.7 LabelingText labels are not pre-attentive but are important to provide thecontext in which visualized data appear. Labeling every itemstatically on a dense visualization is unrealistic [17], we thereforelabel only the nodes with enough spaces. To suit easy reading, anode’s label is horizontally placed within the node if there is anenough space and the label’s size is made proportional to the node’ssize. For a vertical node that has no space to fit its label horizontally,the label is then placed vertically.

The color of a text label is crucial for the label to stand out fromthe background. We use YUV color space to generate the colorfor labels, because YUV works well for optimal foreground andbackground detection [27][28][29]. Assuming that Y representsa node’s overall brightness or luminance, we compare Y with athreshold, and label it black or white depending on whether Y isless or larger than the threshold. This strategy ensures a high colorcontrast for labeling.

Page 5: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

4 CASE STUDIES

We use Cabinet Tree to visualize file systems to illustrate itseffectiveness and scalability. File types are color-coded as shownin Figure 13.

Figure 13: Color coding for different file types

Cabinet Tree first visualizes the file system of “CodeBlocksApplication” which contains 2,673 files and 262 directories, asshown in Figure 14. Figure 14a uses the file count as the weight,clearly showing that “CodeBlocks Application” consists of 4 majorparts:

• Binary files (in Moderate Pink) and source code files (in SoftRed),

• Media files (in Soft Orange),• Small folders (bundled in Green) and files with unspecified

types (in Vivid Yellow), and• Compressed files (in Light Yellow).

Figure 14b uses the file size as the weight, and reveals that binaryfiles (in Moderate Pink) takes the largest space.

Another example shown in Figure 12 is “Visual Studio2012 Windows Application” folder with 36,722 files and 7,399directories (including 16 empty ones), weighted on file counts.Figure 14 clearly shows that several folders share similar

Figure 14: Cabinet Tree visualizing Codeblocks, (a) node sizesweighted on file counts and (b) weighted on file sizes

structures in this dataset. For example, “ProjectTemplates”and “ProjectTemplatesCache” subfolders have similar structures;“ItemTemplates” and “ItemTemplatesCache” subfolders sharesimilar structures. By manually inspecting these folders, wefind that “ProjectTemplatesCache” is the backup folder for“ProjectTemplates” and “ItemTemplatesCache” is the backupfolder for “ItemTemplates”. Original and backup folders shouldof course be similar.

Cabinet Tree is able to handle big data with hundreds ofthousands of nodes. For example, Figure 15 presents a C drivecontents on a personal computer with 455,940 files and 74,350directories (including 3,520 empty directories).

Figure 12: Cabinet Tree visualizing 36,722 files and 7,399 directories in Visual Studio 2012 Folder

kzhang
Highlight
kzhang
Sticky Note
change to "14"
kzhang
Highlight
kzhang
Sticky Note
12
kzhang
Highlight
kzhang
Highlight
kzhang
Sticky Note
13
kzhang
Highlight
kzhang
Highlight
kzhang
Highlight
kzhang
Highlight
kzhang
Sticky Note
14
Page 6: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

Figure 15: Cabinet Tree visualizing 455,940 files and 74,350 directories in C Drive

5 EVALUATION AND DISCUSSION

Having implemented Cabinet Tree using JavaScript, we compareit with Squarified Treemap and S&D Treemap both built onData-Driven Documents (D3js) [30] which also uses JavaScript.The “round” configuration of D3js Treemap was set to “true”to ensure every node to occupy an integer number of pixels.All the experiments ran on Google Chrome Browser Version36.0.1985.125m with 1600×900 resolution using a PC with 2.9Ghz and 8GB RAM. The data sets experimented are summarizedin Table 1.

Table 1: Data sets and counts of nodes

Data Set Maxlevel

Total # of files # ofdirectories

# of emptydirectories

IE 11 3 142 136 6 0Java 1.8.0 10 1,662 1,530 132 3

Office 2013 7 5,092 4,797 295 4Hadoop 19 14,688 12,711 1,977 60

Ctex 2.9.2 11 90,981 82,901 8,080 364D Drive 21 105,302 91,183 14,119 549Qt 5.3.0 18 120,458 109,220 11,238 91

C Drive 1 20 337,953 290,737 47,580 1,954C Drive 2 26 530,290 455,940 74,350 3,520

5.1 ScalabilityThis section evaluates the scalability measured in terms of the totalnumber of visible nodes (i.e. each occupying at least one pixel),with increasing resolutions and increasing sizes of data.

To evaluate scalability with increasing resolutions, we use twolarge datasets (D Drive and C Drive 1 in Table 1) to visualizeon 7 wide screen resolutions. In this evaluation, we wish to findthe trend of the percentage of visible nodes (among all the nodes)and the layout time (excluding the file reading, rendering anddisplaying time) with increasing resolutions. Figure 16 shows thatthe percentage of visible nodes raises with the gradually increasingresolutions for both data sets.

Figure 17 shows that as the resolution grows the layout time forboth data sets is linearly increasing.

In summary, Cabinet Tree is highly scalable with increasedresolutions in both the speed and percentage of visible nodes.

Figure 16: Percentage of number of visible nodes within the totalnumber of nodes in different resolutions

Figure 17: Percentage of number of visible nodes within the totalnumber of nodes in different resolutions

To evaluate the scalability of Cabinet Tree against increased datasizes, we tested it using the data sets in Table 1 with the resultsshown in Table 2.

Table 2: Number of visible nodes with different data sets

Data Set Total # of visiblenodes

# of visible files # of visibledirectories

IE 11 142 136 6Java 1.8.0 1,662 1,530 132

Office 2013 5,064 4,771 293Hadoop 13,519 11,739 1,780

Ctex 2.9.2 62,466 60,016 2,450D Drive 67,760 58,707 9,053Qt 5.3.0 95,569 87,690 7,879

C Drive 1 190,035 179,258 10,777C Drive 2 262,381 244,063 18,318

Page 7: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

We compare Cabinet Tree with Squarified Treemap and S&DTreemap for their scalability with increased data sizes, as in Figure18.

Figure 18: Comparison of visible numbers among three approaches

Figure 19 compares the percentages of visible nodes within thetotal number of nodes for the three approaches with increaseddata sizes. It is clear that Cabinet Tree outperforms S&DTreemap in all datasets. For some smaller datasets, Cabinet Treeis better than Squarified Treemap because the former visualizesbranches (directories) explicitly while the latter only shows leaves.Squarified Treemap demonstrates a better scalability than CabinetTree in some bigger data sets because the drawing of branches inCabinet Tree costs much space, particularly for deep hierarchiesand large number of branches. The comparison of space utilizationon different data sets is illustrated in Figure 20.

Figure 19: Comparison of three approaches for visible nodes

Figure 20: Space utlization by Cabinet Tree for increased data sizes

Figure 20 shows that when the data sets get bigger, the spaceutilization for leaves by Cabinet Tree is decreasing, and dropsdramatically from Office 2013 (5,064 total number) to Hadoop(13,519 total number).

Comparing the average number of pixels for leaves among threedifferent methods, Figure 21 shows that Cabinet Tree uses the leastfor all the data sets. Cabinet Tree therefore delivers the mostcompact visualization among the three methods.

In summary, Cabinet Tree is more scalable than S&D Treemapin all the cases we experimented. Due to its explicit visualizationof branches that take spaces, Cabinet Tree performs poorer than

Figure 21: Comparison of average pixels for leaves

Squarified Treemap in scalability. However, given a space, CabinetTree can fill more leaves than S&D Treemap and SquarifiedTreemap since the average number of pixels per leaf is less thanthe latter two methods (Figure 21).

5.2 Proximity of Sibling NodesAnother aesthetic measurement is the proximity of sibling nodes.It states that [31] logical neighbors should be placed close inthe visualization. The distance between two neighboring siblingnodes is measured by the distance between their centers. Smallerdistances are desirable.

Figure 22 suggests while S&D Treemap produces the bestaverage distance between sibling nodes, Cabinet Tree achievesa very comparable result, which is much better than SquarifiedTreemap.

Figure 22: Comparison of average pixels for leaves

5.3 Running TimeFigure 23 shows the computational time for the layout algorithm(excluding file reading and rendering) on the same data sets.

Figure 23: Comparison of average pixels for leaves

Figure 23 shows that Cabinet Tree is comparable with S&DTreemap which takes least execution time. The complexity of theCabinet Tree layout is linear, so is S&D Treemap. Since CabinetTree draws more nodes than S&D Treemap as shown in Figure 18,it takes more computational time. Squarified Treemap has worstperformance among the three approaches.

kzhang
Highlight
kzhang
Highlight
kzhang
Sticky Note
delete
kzhang
Highlight
kzhang
Highlight
kzhang
Sticky Note
delete
kzhang
Highlight
kzhang
Sticky Note
change to "number of pixels per leaf"
kzhang
Sticky Note
insert "(usually sibling nodes)"
kzhang
Highlight
kzhang
Sticky Note
change to "distance between sibling nodes" (is this change correct??)
kzhang
Highlight
kzhang
Sticky Note
change ????
Page 8: Yang et al, PDF - University of Texas at Dallaskzhang/Publications/YangEtAlDrawerTree.pdf · Cabinet Tree: An Orthogonal Enclosure Approach to Visualizing and Exploring Big Data ...

6 CONCLUSION AND FUTURE WORK

This paper has presented a 2D approach for visualizing largehierarchical data, called Cabinet Tree. Using the enclosureand orthogonal drawing methods, Cabinet Tree performs aspace-optimized layout for leaves and explicit branches withcarefully designed color schemes for aesthetic and clearvisualization. Quantitative evaluations have indicated that CabinetTree is capable of visualizing large hierarchical datasets. Beingscalable for increased resolutions and data sizes, with highproximity of sibling nodes and high real-time layout speed, CabinetTree can be considered an effective tool for large hierarchyvisualization in a wider range of applications.

We are planning to apply an interaction framework intoCabinet Tree to facilitate effective exploration of large hierarchicalstructures, such as node selection and zoom-in. We will conduct aformal usability study to evaluate the effectiveness of our approach.

REFERENCES

[1] M. Bruls, K. Huizing, and J. J. V. Wijk, “Squarified treemaps,” inProceedings of the Joint Eurographics and IEEE TCVG Symposiumon Visualization, pp. 33–42, 1999.

[2] W. Huang, P. Eades, S.-H. Hong, and C.-C. Lin, “Improvingmultiple aesthetics produces better graph drawings,” Journal of VisualLanguages & Computing, vol. 24, pp. 262–272, Aug. 2013.

[3] B. B. Bederson, B. Shneiderman, and M. M. Wattenberg, “Orderedand quantum treemaps: Making effective use of 2d space to displayhierarchies,” ACM Transactions on Graphics, vol. 21, pp. 833–854,Oct. 2002.

[4] B. Johnson and B. Shneiderman, “Tree-maps: a space-fillingapproach to the visualization of hierarchical information structures,”in Proceeding of Visualization ’91, pp. 284–291, IEEE Comput. Soc.Press, 1991.

[5] M. Balzer, O. Deussen, and C. Lewerentz, “Voronoi treemaps for thevisualization of software metrics,” in Proceedings of the 2005 ACMsymposium on Software visualization - SoftVis ’05, p. 165, ACM Press,May 2005.

[6] M. Balzer and O. Deussen, “Voronoi treemaps,” in IEEE Symposiumon Information Visualization, 2005. INFOVIS 2005., pp. 49–56, IEEE,2005.

[7] B. Shneiderman, “The eyes have it: a task by data type taxonomy forinformation visualizations,” in Proceedings 1996 IEEE Symposium onVisual Languages, pp. 336–343, IEEE Comput. Soc. Press, 1996.

[8] B. Shneiderman, “Tree visualization with tree-maps: 2-d space-fillingapproach,” ACM Transactions on Graphics, vol. 11, pp. 92–99, Jan.1992.

[9] D. Banks, “TennisViewer: a browser for competition trees,” IEEEComputer Graphics and Applications, vol. 17, no. 4, pp. 63–65, 1997.

[10] B. B. Bederson, “Quantum Treemaps and Bubblemaps for a ZoomableImage Browser,” Proc. User Interface Systems and Technology,pp. 71–80, 2001.

[11] M. J. Baker and S. G. Eick, “Space-filling Software Visualization,”Journal of Visual Languages & Computing, vol. 6, pp. 119–133, June1995.

[12] Q. V. Nguyen and M. L. Huang, “Space-optimized tree: aconnection+enclosure approach for the visualization of largehierarchies,” Information Visualization, vol. 2, pp. 3–15, Mar. 2003.

[13] B. Shneiderman and M. M. Wattenberg, “Ordered treemap layouts,” inIEEE Symposium on Information Visualization, 2001. INFOVIS 2001.,pp. 73–78, IEEE, 2001.

[14] R. Blanch and E. Lecolinet, “Browsing zoomable treemaps:structure-aware multi-scale navigation techniques.,” IEEEtransactions on visualization and computer graphics, vol. 13,pp. 1248–53, Jan. 2007.

[15] J. STASKO, R. CATRAMBONE, M. GUZDIAL, andK. MCDONALD, “An evaluation of space-filling informationvisualizations for depicting hierarchical structures,” InternationalJournal of Human-Computer Studies, vol. 53, pp. 663–694, Nov.2000.

[16] M. McGuffin and M. Chignell, “Elastic hierarchies: combiningtreemaps and node-link diagrams,” in IEEE Symposium onInformation Visualization, 2005. INFOVIS 2005., pp. 57–64, IEEE,2005.

[17] J.-D. Fekete and C. Plaisant, “Interactive information visualization ofa million items,” in IEEE Symposium on Information Visualization,2002. INFOVIS 2002., pp. 117–124, IEEE Comput. Soc, 2002.

[18] J. Van Wijk and H. Van de Wetering, “Cushion treemaps: visualizationof hierarchical information,” in Proceedings 1999 IEEE Symposiumon Information Visualization (InfoVis’99), pp. 73–78, IEEE Comput.Soc, 1999.

[19] H. Lu, S. X. Liu, M. M. Wattenberg, and J. Ma XI, “Constructing alabeled treemap with balanced layout,” Apr. 2007.

[20] T. Itoh, Y. Yamaguchi, Y. Ikehata, and Y. Kajinaga, “Hierarchicaldata visualization using a fast rectangle-packing algorithm.,” IEEEtransactions on visualization and computer graphics, vol. 10,pp. 302–13, Jan. 2004.

[21] J. Leggett, “Visualizing Hierarchies and Collection Structures withFractal Trees,” in Sixth Mexican International Conference onComputer Science (ENC’05), pp. 31–40, IEEE, 2005.

[22] H. Lu and J. Fogarty, “Cascaded treemaps: examining the visibilityand stability of structure in treemaps,” in GI ’08 Proceedingsof Graphics Interface 2008, pp. 259–266, Canadian InformationProcessing Society, May 2008.

[23] S. G. Eick and A. F. Karr, “Visual Scalability,” Journal ofComputational and Graphical Statistics, vol. 11, pp. 22–43, Mar.2002.

[24] B. Yost and C. North, “The perceptual scalability of visualization.,”IEEE transactions on visualization and computer graphics, vol. 12,no. 5, pp. 837–44, 2006.

[25] B. Wei, C. Silva, E. Koutsofios, S. Krishnan, and S. North,“Visualization research with large displays,” IEEE ComputerGraphics and Applications, vol. 20, no. 4, pp. 50–54, 2000.

[26] Y. Yang, N. Dou, S. Zhao, Z. Yang, K. Zhang, and Q. V. Nguyen,“Visualizing large hierarchies with drawer trees,” in Proceedings ofthe 29th Annual ACM Symposium on Applied Computing - SAC ’14,pp. 951–956, ACM Press, Mar. 2014.

[27] S. Ji and H. Park, “Image segmentation of color image based on regioncoherency,” in Proceedings 1998 International Conference on ImageProcessing. ICIP98 (Cat. No.98CB36269), vol. 1, pp. 80–83, IEEEComput. Soc, 1998.

[28] Y. Ivanov, A. Bobick, and J. Liu, “Fast Lighting IndependentBackground Subtraction,” International Journal of Computer Vision,vol. 37, pp. 199–207, June 2000.

[29] J. Fan, D. Y. Yau, A. K. Elmagarmid, and W. G. Aref, “Automaticimage segmentation by integrating color-edge extraction and seededregion growing.,” IEEE transactions on image processing : apublication of the IEEE Signal Processing Society, vol. 10,pp. 1454–66, Jan. 2001.

[30] M. Bostock, V. Ogievetsky, and J. Heer, “D: Data-DrivenDocuments.,” IEEE transactions on visualization and computergraphics, vol. 17, pp. 2301–9, Dec. 2011.

[31] J. Liang, S. Simoff, Q. V. Nguyen, and M. L. Huang, “Visualizinglarge trees with divide & conquer partition,” in Proceedings of the 6thInternational Symposium on Visual Information Communication andInteraction - VINCI ’13, p. 79, ACM Press, Aug. 2013.