2013.10.24 big datavisualization
-
Upload
sean-kandel -
Category
Technology
-
view
6.622 -
download
0
description
Transcript of 2013.10.24 big datavisualization
![Page 1: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/1.jpg)
Visualizing “Big” DataSean Kandel & Je!rey Heer Trifacta Inc. @trifacta
![Page 2: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/2.jpg)
How can we visualize and interact with billion+ record
databases in real-time?
![Page 3: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/3.jpg)
Two Challenges:1. E!ective visual encoding2. Real-time interaction
![Page 4: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/4.jpg)
Perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the
number of records.
![Page 5: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/5.jpg)
Perception
![Page 6: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/6.jpg)
Data Sampling
ModelingBinning
![Page 7: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/7.jpg)
Google Fusion Tables (Sampling)
![Page 8: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/8.jpg)
imMens (Binned Aggregation)
![Page 9: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/9.jpg)
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”Categories: Already discrete (but check cardinality)Numbers: Choose bin intervals (uniform, quantile, ...)Time: Choose time unit: Hour, Day, Month, etc.Geo: Bin x, y coordinates after cartographic projection
![Page 10: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/10.jpg)
Number of Bins?
![Page 11: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/11.jpg)
100,000 Data Points Rectangular BinsHexagonal Bins
Hexagonal or Rectangular Bins?
Hex bins better estimate density for 2D plots,but the improvement is marginal [Scott 92], whilerectangles support reuse and query processing.
![Page 12: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/12.jpg)
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”Categories: Already discrete (but check cardinality)Numbers: Choose bin intervals (uniform, quantile, ...)Time: Choose time unit: Hour, Day, Month, etc.Geo: Bin x, y coordinates after cartographic projection
2. Aggregate Count, Sum, Average, Min, Max, ...
![Page 13: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/13.jpg)
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”Categories: Already discrete (but check cardinality)Numbers: Choose bin intervals (uniform, quantile, ...)Time: Choose time unit: Hour, Day, Month, etc.Geo: Bin x, y coordinates after cartographic projection
2. Aggregate Count, Sum, Average, Min, Max, ...
(3. Smooth Optional: smooth aggregates [Wickham ’13])
![Page 14: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/14.jpg)
[1] Wickham 2013
![Page 15: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/15.jpg)
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”Categories: Already discrete (but check cardinality)Numbers: Choose bin intervals (uniform, quantile, ...)Time: Choose time unit: Hour, Day, Month, etc.Geo: Bin x, y coordinates after cartographic projection
2. Aggregate Count, Sum, Average, Min, Max, ...
(3. Smooth Optional: smooth aggregates [Wickham ’13])
4. Plot Visualize the aggregate summary values
![Page 16: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/16.jpg)
Plot: Visual Encoding
Choose Most E!ective Encoding [Cleveland & McGill ’84]
1D Plot -> Position or Length EncodingHistograms, line charts, etc.
2D Plot -> Area or Color EncodingSpatial dimensions (x, y) already allocated.While less e!ective than area for magnitude estimation, color can be used at the per-pixel level and provides an overall “gestalt”
![Page 17: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/17.jpg)
Standard Color RampCounts near zero are white.
-> Outliers are missed
Add Discontinuity after ZeroCounts near zero remain visible.
-> Outliers can be seen
![Page 18: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/18.jpg)
Linear Alpha Interpolationis not perceptually linear.
Cube-Root Alpha Interpolationapproximates perceptual linearity.
![Page 19: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/19.jpg)
Color Encoding
Luminance (in range 0-1)
Min. Non-Zero Intensity (α=0.15) [1] Perceptual Scaling (γ=1/3) [2]
User-Adjustable Min/Max Values [3]
[1] Keep small non-zero values visible (outliers!)[2] Match color ramp to perceptual distances[3] Enable exploration across value ranges
![Page 20: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/20.jpg)
Design Space of Binned Plots
![Page 21: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/21.jpg)
Interaction
![Page 22: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/22.jpg)
Interaction Techniques?1. Select Detail-on-Demand2. Navigate Pan & Zoom3. Query Brush & Link
![Page 23: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/23.jpg)
![Page 24: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/24.jpg)
![Page 25: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/25.jpg)
5-D Data CubeMonth, Day, Hour, X, Y
X
Y
256
…
767
512 1023…
Day
Hour
Month
23…
0 1 … 30
0 …
11
1
23…
0…
11
0 1 … 30 0 1 … 30 0
23…
0
11
10
…
10
12 x 31 x 24 x 512 x 512 = ~2.3 billion cells
![Page 26: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/26.jpg)
X
Y
256
…
767
512 1023…
Day
Hour
Month
23…
0 1 … 30
0 …
11
1
23…
0…
11
0 1 … 30 0 1 … 30 0
23…
0
11
10
…
10
Brushing JanuaryMonth, Day, Hour, X, Y
31 x 24 x 512 x 512 = ~195 million cells
![Page 27: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/27.jpg)
![Page 28: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/28.jpg)
![Page 29: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/29.jpg)
Multivariate Data Tiles1. Send data, not pixels2. Embed multi-dim data
![Page 30: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/30.jpg)
Full 5-D Cube
For any pair of 1D or 2D binned plots, the maximum number of dimensions needed to support brushing & linking is four.
Σ Σ Σ Σ
![Page 31: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/31.jpg)
X : 512 bins
Y :
512
bins
![Page 32: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/32.jpg)
![Page 33: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/33.jpg)
![Page 34: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/34.jpg)
![Page 35: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/35.jpg)
![Page 36: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/36.jpg)
![Page 37: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/37.jpg)
![Page 38: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/38.jpg)
![Page 39: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/39.jpg)
![Page 40: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/40.jpg)
![Page 41: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/41.jpg)
![Page 42: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/42.jpg)
![Page 43: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/43.jpg)
![Page 44: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/44.jpg)
![Page 45: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/45.jpg)
![Page 46: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/46.jpg)
![Page 47: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/47.jpg)
![Page 48: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/48.jpg)
~2.3B bins
~17.6M bins (in 352KB!)
Full 5-D Cube
13 3-D Data Tiles
Σ Σ Σ Σ
![Page 49: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/49.jpg)
Query & Render on GPU via WebGL
Pack data tiles as PNG image files,bind to WebGL as image textures.
![Page 50: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/50.jpg)
Query & Render on GPU via WebGL
Σ
Invoke program for each output bin.Executes in parallel on GPU.
![Page 51: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/51.jpg)
Query & Render on GPU via WebGL
Σ
![Page 52: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/52.jpg)
Performance BenchmarksSimulate interaction:brushing & linkingacross binned plots.
- imMens vs. Profiler- 4x4 and 5x5 plots- 10 to 50 bins
Measure time from selection to render.
Test setup:2.3 GHz MacBook Pro (4-core)
NVIDIA GeForce GT 650MGoogle Chrome v.23.0
![Page 53: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/53.jpg)
~50fps querying of visualsummaries of 1B data points.
In-Memory Data Cube
imMens
Number of Data Points
5 dimensions x 50 bins/dim x 25 plots
![Page 54: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/54.jpg)
[1] Lins et. al. Infovis 2013
[2] Sismanis et. al. SIGMOD 2002
NanoCubes
![Page 55: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/55.jpg)
[1] Lins et. al. Infovis 2013
NanoCubes
![Page 56: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/56.jpg)
ResourcesimMens vis.stanford.edu/projects/immensTableau Public tableausoftware.com/publicBigVis (R) github.com/hadley/bigvisNanocubes nanocubes.netBlinkDB blinkdb.orgMapD geops.csail.mit.edu/docs/
![Page 57: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/57.jpg)
AcknowledgmentsZhicheng “Leo” LiuBiye Jiang
![Page 58: 2013.10.24 big datavisualization](https://reader034.fdocuments.in/reader034/viewer/2022042713/540dd38e8d7f72747e8b4b86/html5/thumbnails/58.jpg)
Visualizing “Big” DataSean Kandel & Je!rey Heer Trifacta Inc. @trifacta