On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU.
On the role of Interactivity and Data Placement in Big Data Analytics
Srini Parthasarathy, OSU
The Data Deluge: Data Data Everywhere
Data Storage is Cheap
$600 buys a disk drive that can store all of the world’s music
[McKinsey Global Institute Special Report, June ’11]
Data does not exist in isolation.
Data almost always exists in connection with other data, and those connections are an integral part of the value proposition.
Social networks, Protein Interactions, Internet,
VLSI networks, Data dependencies, Neighborhood graphs
Big Data Problem: All this data is useful only if we can scalably extract knowledge from it, despite its complexity
THIS TALK
• THE ROLE OF DATA PLACEMENT IN BIG DATA SYSTEMS
• THE ROLE OF VISUALIZATION AND INTERACTION IN BIG DATA ANALYSIS
GLOBAL GRAPHS
• What?
– System for deploying applications that process complex data
• Why?
– Seeks a balance between high productivity and high performance
• How?
– Built on top of PNL’s GlobalArrays
– Trees (GlobalTrees, GlobalForests)
– Relational Arrays (ArrayDB-GA)
– Graphs (GlobalGraphs)
• Data Placement is key to high performance
Importance of Data Placement
• Locality
– Placing related items close to each other so they may be processed together
• Mitigating Impact of Data Skew
– Reducing load imbalance in a parallel setting
– Reducing variance in partition samples
• Generating Stratified Samples
– Improving interactive performance
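The skew and sampling goals above can be illustrated with a small sketch. It assumes each record already carries a stratum label and deals the members of every stratum round-robin across partitions, so each partition receives a similar mix. The function name `place_by_strata` and the toy "heavy"/"light" records are hypothetical and are not taken from the talk or from GlobalGraphs; locality-oriented placement would instead keep a stratum together.

```python
from collections import Counter, defaultdict


def place_by_strata(items, stratum_of, num_partitions):
    """Deal the members of each stratum round-robin across partitions.

    Assumption: `stratum_of(item)` returns a sortable stratum label.
    """
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    partitions = [[] for _ in range(num_partitions)]
    slot = 0  # running position, so partition sizes stay within one of each other
    for key in sorted(strata):
        for item in strata[key]:
            partitions[slot % num_partitions].append(item)
            slot += 1
    return partitions


# Toy data: 12 expensive ("heavy") records and 4 cheap ("light") ones.
items = [("heavy", i) for i in range(12)] + [("light", i) for i in range(4)]
parts = place_by_strata(items, lambda item: item[0], num_partitions=4)
mix = [Counter(kind for kind, _ in part) for part in parts]
for index, counts in enumerate(mix):
    print(index, dict(counts))
```

Because every partition holds the same proportion of each stratum, a single partition can also serve as a stratified sample of the whole data set, which is one way to read the "interactive performance" point.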
Key Ideas
• Pivotization
– Convert data with complex structure into sets
– Each element of a set captures features of the local topology
• Hashing into Strata: Hash related sets into similar bins
– Can employ a sketch-clustering algorithm
• Partitioning: Place Strata into partitions for
  • Locality
  • Mitigating Data Skew
  • Samples
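The first two ideas can be sketched in a few lines, under stated assumptions. Here the pivot set of a graph node is assumed to be its neighbour set (the talk does not spell out the pivot transformation), sketches are min-wise hashes built from seeded BLAKE2b digests, and strata are formed by grouping identical sketches, which is only a stand-in for the sketch sort or sketch-clustering step. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def pivot_sets(edges):
    """Pivotization (assumed form): one set per node holding its neighbours."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return dict(neighbours)


def _hash(seed, element):
    """Deterministic 64-bit hash of an element under a given seed."""
    data = f"{seed}:{element}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(pivot_set, num_hashes=4):
    """Min-wise hashing: keep the minimum hash value for each seed."""
    return tuple(
        min(_hash(seed, element) for element in pivot_set)
        for seed in range(num_hashes)
    )


def strata_from_sketches(sketches):
    """Stand-in for sketch sort/cluster: identical sketches share a stratum."""
    strata = defaultdict(list)
    for item, sketch in sketches.items():
        strata[sketch].append(item)
    return list(strata.values())


# Nodes "a" and "b" have the same neighbourhood; so do "c" and "d".
edges = [("a", "x"), ("a", "y"), ("a", "z"),
         ("b", "x"), ("b", "y"), ("b", "z"),
         ("c", "p"), ("c", "q"), ("d", "p"), ("d", "q")]
sketches = {node: minhash_sketch(ps) for node, ps in pivot_sets(edges).items()}
strata = strata_from_sketches(sketches)
print(sorted(sorted(s) for s in strata))
```

Sets with high Jaccard similarity agree on many sketch positions, so a real implementation would group near-identical sketches rather than only exact matches as done here.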
[Figure: placement pipeline. DATA (Δ1 … Δ25) → PIVOT TRANSFORMATIONS → PIVOT SETS (PS-1 … PS-25) → MIN-WISE HASHING on PIVOT SETS → SKETCHES (SK), e.g. SK-1 = {1050, 2020, 3130, 1800}, SK-25 = {1050, 2020, 7225, 2020} → SKETCH SORT or SKETCH CLUSTER → Strata (S-1 … S-128), e.g. S-4 = (Δ1, SK-1) (Δ5, SK-5) (Δ12, SK-12) (Δ25, SK-25) → PARTITIONING & REPLICATION → partitions (P-1 … P-8), e.g. P-2 = S-4, S-7, S-8, S-12, …, S-128; P-8 = S-3, S-4, S-9, S-12, …, S-127]
Frequent Tree Mining
• Our proposed approaches show 100X gains
WebGraph Compression
• Linear Scaleup with no loss in compression ratio
PRISM-HD: PRobing the Intrinsic Structure and Makeup of High-dimensional Data
Visualization and Interactivity are key to discovery
PRISM-HD
• What?
– A novel mechanism for exploring complex data
• Why?
– User is often overwhelmed by the characteristics of the data
– Befuddled about where to start
• How?
– Given a similarity measure of interest
– Compute the similarity graph at threshold (t)
  • Key: Graphs are dimensionless
– Provide the user with graph visualization cues
  • User determines the next threshold and repeats
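The threshold loop described above can be sketched as follows. This is a minimal brute-force illustration, not PRISM-HD's actual algorithm: it assumes records are sets, uses Jaccard similarity as the measure of interest, and compares all pairs. The record names and contents are made up for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(records, threshold, similarity=jaccard):
    """Return the edges (i, j) whose similarity is at least `threshold`."""
    names = sorted(records)
    return {
        (i, j)
        for i, j in combinations(names, 2)
        if similarity(records[i], records[j]) >= threshold
    }


# Hypothetical records, e.g. keyword sets extracted from short messages.
records = {
    "d1": {"flood", "rescue", "road"},
    "d2": {"flood", "rescue", "bridge"},
    "d3": {"flood", "shelter"},
    "d4": {"music", "concert"},
}

# The user lowers the threshold step by step and inspects each graph.
for t in (0.9, 0.5, 0.2):
    print(t, sorted(similarity_graph(records, t)))
```

Every edge present at a higher threshold is still present at a lower one, so results from earlier rounds can in principle be reused when the user relaxes the threshold.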
[Figure panels: HIGH THRESHOLD, MODERATE THRESHOLD, LOW THRESHOLD]
Benefits of Knowledge Caching
Benefits of Incremental Processing on Twitter
Incremental estimates on Twitter t1 = 0.95
PRISM-HD and Global Graphs in Context: Leveraging Social Media in Emergency Response
Concluding Remarks
• Data is everywhere
• Data is fraught with complexities
– Dimensionality, dynamics, structure, massive…
• Both data placement and data interactivity have an important role to play in big data analytics
– PRISM-HD and GlobalGraphs can help!
Thanks for your attention
Contact: [email protected]
Mining Simulation Data
Medical Image Analysis
Protein Interaction Network (yeast)
Acknowledgements: Various NSF, NIH, DOE and industry grants