Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases

Post on 25-Feb-2016



Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases

Pat Hanrahan
With Chris Stolte and Diane Tang

Computer Science Department
Stanford University

Motivation

Large databases have become very common
• Corporate data warehouses: Amazon, Walmart, …
• Scientific projects: Human Genome Project, Sloan Digital Sky Survey

Need tools to extract meaning from these databases

Related Work

Formalisms for graphics
• Bertin’s “Semiology of Graphics”
• Mackinlay’s APT
• Roth et al.’s Sage and SageBrush
• Wilkinson’s “Grammar of Graphics”

Visual exploration of databases
• DeVise
• DataSplash/Tioga-2

Visualization and data mining
• SGI’s MineSet
• IBM’s Diamond

Formalism

Polaris Formalism

UI interpreted as visual specification that defines:
• Table configuration
• Type of graphic in each pane
• Encoding of data as visual properties of marks
• Data transformations and queries

Schema

Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product

Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

Coffee chain data [Visual Insights]

Polaris Visual Encodings

Principle of Importance Ordering: Encode the most important information in the most effective way [Cleveland & McGill]

The Pivot Table Interface

Common interface to statistical packages/Excel

Cross-tabulations

Simple interface based on drag-and-drop

Data Cubes

Structure relation as n-dimensional cube

Each cell aggregates all measures for those dimensions

Each cube axis corresponds to a dimension in the relation
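A minimal sketch of the cube idea in Python, not the Polaris implementation. The rows are invented for illustration, `build_cube` is a hypothetical helper, and SUM is assumed as the aggregate. Each cell is keyed by one value per dimension and holds the aggregated measures.

```python
# Hypothetical rows: (quarter, product_type, profit, sales). Values are invented.
rows = [
    ("Qtr1", "Coffee", 10, 40),
    ("Qtr1", "Coffee", 5, 20),
    ("Qtr1", "Tea", 3, 12),
    ("Qtr2", "Coffee", 7, 30),
]

def build_cube(rows, n_dims):
    """Key each cell by the first n_dims fields and sum the remaining measures."""
    cube = {}
    for row in rows:
        key, measures = row[:n_dims], row[n_dims:]
        totals = cube.get(key, (0,) * len(measures))
        cube[key] = tuple(t + m for t, m in zip(totals, measures))
    return cube

cube = build_cube(rows, n_dims=2)
print(cube[("Qtr1", "Coffee")])  # (15, 60)
```

Here the two cube axes are Quarter and Product Type, and the cell for (Qtr1, Coffee) carries the summed Profit and Sales.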

Table Algebra: Operands

Ordinal fields: interpret domain as a set that partitions table into rows and columns:

Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}

Quantitative fields: treat domain as single element set and encode spatially as axes:

Profit = {(Profit)}
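The two operand rules can be sketched in Python as follows. This is an illustration only: the function names `ordinal` and `quantitative` are hypothetical, and ordered lists of tuples stand in for the ordered sets on the slide.

```python
def ordinal(domain):
    """An ordinal field yields one 1-tuple per domain value, in domain order."""
    return [(value,) for value in domain]

def quantitative(name):
    """A quantitative field yields a single-element set holding the field name."""
    return [(name,)]

quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
profit = quantitative("Profit")

print(quarter)  # [('Qtr1',), ('Qtr2',), ('Qtr3',), ('Qtr4',)]
print(profit)   # [('Profit',)]
```

Each entry of an ordinal operand becomes a row or column of the table, while the single entry of a quantitative operand becomes an axis.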

Concatenation (+) Operator

Ordered union of two sets

Quarter + ProductType

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee),(Espresso)}

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}

Profit + Sales = {(Profit),(Sales)}
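A small Python sketch of concatenation, using ordered lists of tuples for the operand sets. The helper name `concat` is hypothetical, and dropping entries that already appear in the left operand is an assumption about what "union" means here.

```python
def concat(a, b):
    """Ordered union: all of a, then the entries of b not already in a."""
    return list(a) + [entry for entry in b if entry not in a]

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]

print(concat(quarter, product_type))
print(concat([("Profit",)], [("Sales",)]))  # [('Profit',), ('Sales',)]
```

The order of the left operand is preserved, so Quarter + ProductType and ProductType + Quarter produce different table layouts.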

Cross (×) Operator

Direct product of two sets

Quarter × ProductType = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}

ProductType × Profit = {(Coffee, Profit), (Tea, Profit)}
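The cross operator can be sketched in Python with `itertools.product`, again using ordered lists of tuples as operands. The helper name `cross` is hypothetical. Applied to an ordinal and a quantitative operand, it pairs every ordinal entry with the single axis entry.

```python
from itertools import product

def cross(a, b):
    """Direct product: every entry of a joined with every entry of b, a varying slowest."""
    return [x + y for x, y in product(a, b)]

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
profit = [("Profit",)]

print(cross(quarter, product_type)[:3])
# [('Qtr1', 'Coffee'), ('Qtr1', 'Tea'), ('Qtr2', 'Coffee')]
print(cross(product_type, profit))
# [('Coffee', 'Profit'), ('Tea', 'Profit')]
```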

SQL Dataflow

[Figure: dataflow diagram with stages Relational Table, Sort, Tuples in Panes, Marks in Panes]

Notes
• Aggregation operators applied after sort
• Only one layer is shown; additional z-sort
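A rough Python sketch of this dataflow, under stated assumptions: panes are keyed by one row field and one column field, SUM is the aggregate, and the rows and helper names (`sort_into_panes`, `aggregate`) are invented for illustration. It shows only the ordering noted on the slide, with tuples sorted into panes first and aggregated afterwards.

```python
from collections import defaultdict

# Hypothetical relational table; values are invented.
rows = [
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 10},
    {"Quarter": "Qtr1", "ProductType": "Tea", "Profit": 3},
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 5},
    {"Quarter": "Qtr2", "ProductType": "Coffee", "Profit": 7},
]

def sort_into_panes(rows, row_field, col_field):
    """Assign each tuple to the pane named by its row and column field values."""
    panes = defaultdict(list)
    for r in rows:
        panes[(r[row_field], r[col_field])].append(r)
    return dict(panes)

def aggregate(panes, measure):
    """Aggregate after sorting: one summed value per pane."""
    return {key: sum(r[measure] for r in tuples) for key, tuples in panes.items()}

panes = sort_into_panes(rows, "Quarter", "ProductType")
totals = aggregate(panes, "Profit")
print(totals[("Qtr1", "Coffee")])  # 15
```

Each aggregated tuple in a pane would then be drawn as one mark; mark encoding is left out of this sketch.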

Multiscale Visualization

Hierarchical Structure

Challenge: these databases are very large
• Queries/Vis should not require all the records

Augment database with hierarchical structure
• Provide meaningful levels of abstraction
• Derived from domain or clustering
• Provides metadata (missing data for context)

Hierarchies and Data Cubes

Each dimension in the cube is structured as a tree

Each level in tree corresponds to level of detail
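As an illustration, moving one level up a dimension tree amounts to re-aggregating child cells under their parent. The sketch below assumes a hypothetical Month-to-Quarter mapping, invented profit values, and SUM as the aggregate; `roll_up` is not a Polaris function.

```python
# Hypothetical slice of the Time hierarchy: each month's parent quarter.
MONTH_TO_QUARTER = {"Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1", "Apr": "Qtr2"}

# Invented measure values at the Month level of detail.
profit_by_month = {"Jan": 4, "Feb": 6, "Mar": 5, "Apr": 9}

def roll_up(values, parent_of):
    """Aggregate child-level values into their parent level of the tree."""
    totals = {}
    for child, value in values.items():
        parent = parent_of[child]
        totals[parent] = totals.get(parent, 0) + value
    return totals

profit_by_quarter = roll_up(profit_by_month, MONTH_TO_QUARTER)
print(profit_by_quarter)  # {'Qtr1': 15, 'Qtr2': 9}
```

Applying the same step with a Quarter-to-Year mapping would give the next coarser level of detail.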

Schema: Star Schema

Fact table: State, Month, Product, Profit, Sales, Payroll, Marketing, Inventory, Margin, ...

Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, ...

Existence tables
• Location: Market, State
• Time: Year, Quarter, Month
• Products: Product Type, Product Name

Generalizations
• Snowflake schemas
• Lattices (DAGs)

Categorical Hierarchies

Quarter × Month
• Direct product of two sets
• Would create twelve entries for each quarter, e.g. (Qtr1, December)

Quarter / Month
• Based on tuples in database, not semantics
• Would only create three entries per quarter
• Can be expensive to compute

Quarter . Month
• Based on tuples in existence tables (not db)
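The difference between the three operators can be sketched in Python. Everything here is an assumption made for illustration: the helper names, the two-quarter domain, the invented fact rows, and the hand-written existence table. In this sketch the fact rows happen to contain no March tuple, so the database-driven operator omits (Qtr1, Mar) while the existence-table operator still lists it.

```python
from itertools import product

quarters = ["Qtr1", "Qtr2"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Hypothetical fact rows, reduced to (quarter, month); no row exists for Mar.
fact_rows = [("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Jan"),
             ("Qtr2", "Apr"), ("Qtr2", "May"), ("Qtr2", "Jun")]

# Hypothetical existence table describing the Time hierarchy itself.
existence = [("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Mar"),
             ("Qtr2", "Apr"), ("Qtr2", "May"), ("Qtr2", "Jun")]

def cross(a, b):
    """Quarter x Month: every combination, including meaningless ones."""
    return list(product(a, b))

def nest(rows):
    """Quarter / Month: distinct pairs found by scanning the data."""
    return list(dict.fromkeys(rows))

def dot(existence_table):
    """Quarter . Month: pairs read from the existence table, no data scan."""
    return list(existence_table)

print(len(cross(quarters, months)))  # 12
print(len(nest(fact_rows)))          # 5
print(len(dot(existence)))           # 6
```

The scan in `nest` touches every fact row, which is why it can be expensive on a large database, whereas `dot` only reads the small existence table.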

Cartographic Generalization

[Figure: maps of Canterbury and East Kent at scales 1:50,000 and 1:625,000]

Generalization: Techniques

Selection

Simplification

Exaggeration

Regularization

Displacement

Aggregation

Summary

Polaris
• Spreadsheet or table-based displays
• Simple drag-and-drop interface
• Built on a formalism that allows algebraic manipulation of visual mapping of tuples to marks

Multiscale visualizations using data and visual abstraction

Connects to SQL/MDX servers

See http://www.graphics.stanford.edu/projects/polaris

Future Work

• Articulate full set of multiscale design patterns
• Transition between levels of detail
• Develop system infrastructure for browsing VLDB
• Support layers/lenses/linking with tuple flow
• Device independence through graphical encodings
• Extend formalism to 3D
• Couple scientific and information visualization
• …