When Graph Meets Big Data: Opportunities and Challenges
Yinglong Xia
Huawei Research America
11/13/2016
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16)
High Performance Graph Data Management and Processing (HPGDM 2016)
Introduction
Recent Growth
Revenue
Net Profits
Cash flow
http://www.huawei.com/en/about-huawei
Huawei does business in over 170 countries and has 150,000 employees, approximately 70,000 of whom are engaged in research and development (R&D). Huawei operates a global network of 14 regional headquarters, 16 R&D Centers, 28 Innovation Centers jointly operated with customers, and 45 Training Centers.
Graph Analytics Basics
Brief History
N.T. Bliss, Confronting the Challenges of Graphs and Networks, Lincoln Laboratory Journal, 2013
2016
Neuronal network @ Human Brain Project 89 billion V & 100 trillion E
61.6 million V, 1.47 billion E
40 million V, 300 million E
Important properties/metrics:
- Small-world effect
- Betweenness
- Eccentricity/Centrality
- Transitivity
- Resilience
- Community structure
- Clustering coefficient
- Matching index
Complex Network Analysis
Real-world complex networks include the WWW, social networks, biological networks, citation networks, power grids, food webs, metabolic networks, etc.
Complex network models:
- Poisson random graph
  - degree ~ Poisson
  - Small-world effect
- Watts and Strogatz graph
  - Transitivity
  - Small-world effect
- Barabasi and Albert graph
  - Small world
  - Power law
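To make the Poisson random graph model and the clustering coefficient metric more concrete, here is a minimal sketch in plain Python. The function names and the toy graph are assumptions made for illustration; they do not come from the talk.

```python
import random
from itertools import combinations

def poisson_random_graph(n, p, seed=0):
    """G(n, p) model: keep each possible undirected edge independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def build_adjacency(edges, vertices=()):
    """Undirected adjacency map {vertex: set of neighbors}; `vertices` adds isolated ones."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant vertex 3.
toy_adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])

# A random graph with 50 vertices and edge probability 0.1 (expected degree 4.9).
random_adj = build_adjacency(poisson_random_graph(50, 0.1, seed=1), range(50))
```

In the toy graph, vertices 0 and 1 sit in a closed triangle, while only one of the three neighbor pairs of vertex 2 is connected, so its local coefficient is lower.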
Diversity in Graph Technology
Dynamic graph helps analyze the spatial and temporal influence over the entities in the network
RDF graph enables knowledge inference over linked data
Streaming graph monitors sentiment propagation over time and how the graph structure can impact it
Property graph is widely used as a data storage model to manage the properties of entities as well as the interconnections
Vertex ID
Edge label
Edge property
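The property graph model described above (vertex IDs, edge labels, and properties on both vertices and edges) can be sketched as a small in-memory structure. This is an illustrative assumption, not the storage layout of any particular product; the class names and sample data are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vertex_id: int
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: int
    dst: int
    label: str
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    """Directed property graph: vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}    # vertex id -> Vertex
        self.out_edges = {}   # vertex id -> list of outgoing Edge objects

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, properties)
        self.out_edges.setdefault(vertex_id, [])

    def add_edge(self, src, dst, label, **properties):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        edge = Edge(src, dst, label, properties)
        self.out_edges[src].append(edge)
        return edge

    def neighbors(self, vertex_id, label=None):
        """Destination ids of outgoing edges, optionally filtered by edge label."""
        return [e.dst for e in self.out_edges[vertex_id]
                if label is None or e.label == label]

# Hypothetical sample data.
g = PropertyGraph()
g.add_vertex(1, name="Alice", age=30)
g.add_vertex(2, name="Bob")
g.add_vertex(3, name="Acme")
g.add_edge(1, 2, "knows", since=2010)
g.add_edge(1, 3, "works_at", role="engineer")
```

Filtering by label, as in `g.neighbors(1, "knows")`, is the kind of access that lets one structure serve both as storage and as a query target.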
Graph technology leads to rich analytic abilities
Graphical models leverage statistics to infer latent factors in a complex system
Industrial Use Cases
Financial Risk Management
• 56 million small business owners in China
• 1/3 want to finance their ventures
• Account for 24.3% of the GDP
• 16.3% received loans from banks
• 3/4 of citizens in China have no credit history
Identifying Fraud by Graph
Analyze one’s social connections
Motivation
Credit default swap record
Visits gambling forum
Called a liar in someone’s comment
• Strong variables
• Weak variables
• Individual regression
• Collective regression
Traditional credit scoring organizations
Emerging credit scoring organizations
• Customer profiling
• Precise engagement
• Anti-fraud control
• Credit scoring in real time
• Post-loan management
• Lost-customer engagement
Credit Scoring
Samsung Group cross-ownership structure in Gephi network visualization
Auditing such labyrinthian beasts is enough to bring a grown auditor to tears. This is where network ‘graph analytics’ can be invaluable
Auditing
Emerging approach
Traditional approach
Public Security
• Knowledge graph
• Heterogeneous data
• Information integration
• Complex search
• Security analysis
Insider Threat Solution
• Graph is natural to link data from different sources for exploration
• Graph can combine with probabilistic models for inference
• Graph helps find anomalous behaviors
Advantage of Graph for Security
Use Case 1
Use Case 2
Telecom Fraud
In May 2016, the Communications Fraud Control Association (CFCA), the Forum for International Irregular Network Access (FIINA), and operators ranging from AT&T, Vodafone, and Korea Telecom to Orange and Deutsche Telekom shed light on how old and newer forms of fraud are detected and combatted.
Anti-Fraud platform proposed by TMForum
Existing Graph Systems
Some Existing Products
Visualization
Analytics
Frameworks
Storage
ScaleGraph
Flink/Gelly
Neo4j System Architecture and Storage Format
Traversals
Core API
Cypher
Vertex/Edge Cache Thread local diffs
FS Cache HA
Record files
Transaction log
Disk
Graph structure and data buffers
i.e. mmap
LFU-protocol
Edges incident to a vertex are linked through the relationship data structure, which causes performance issues when handling celebrities (very high-degree vertices) in power-law graphs, e.g., social networks
Neo’s declarative query language
for TX roll-back
changes in a TX
High Availability based on TX
Easy to implement horizontal partitioning in FS
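The note about linking a vertex's incident edges through the relationship data structure can be illustrated with a deliberately simplified in-memory model. This is an assumption-laden sketch, not Neo4j's actual record layout: the class, field names, and sample data are hypothetical, and the chain is singly linked for brevity. It shows why a high-degree "celebrity" vertex is costly: finding one relationship means walking the whole chain.

```python
NIL = -1  # sentinel meaning "no further record in this chain"

class RelationshipStore:
    """Toy store: each node points at its newest relationship record, and each
    record points at the next record in the chain of its source and of its
    destination node."""

    def __init__(self):
        self.first_rel = {}  # node id -> id of the newest relationship touching it
        self.records = []    # relationship records, addressed by list index

    def add_relationship(self, src, dst, rel_type):
        self.first_rel.setdefault(src, NIL)
        self.first_rel.setdefault(dst, NIL)
        rel_id = len(self.records)
        self.records.append({
            "src": src, "dst": dst, "type": rel_type,
            "src_next": self.first_rel[src],  # next record in src's chain
            "dst_next": self.first_rel[dst],  # next record in dst's chain
        })
        self.first_rel[src] = rel_id
        self.first_rel[dst] = rel_id
        return rel_id

    def relationships(self, node, rel_type=None):
        """Walk the node's chain; return (matching record ids, records visited)."""
        matches, visited = [], 0
        rel_id = self.first_rel[node]
        while rel_id != NIL:
            rec = self.records[rel_id]
            visited += 1
            if rel_type is None or rec["type"] == rel_type:
                matches.append(rel_id)
            rel_id = rec["src_next"] if rec["src"] == node else rec["dst_next"]
        return matches, visited

# Hypothetical data: vertex 0 is a celebrity with 1000 followers and one other edge.
store = RelationshipStore()
store.add_relationship(0, "city", "LIVES_IN")
for follower in range(1, 1001):
    store.add_relationship(follower, 0, "FOLLOWS")

celebrity_matches, celebrity_visited = store.relationships(0, "LIVES_IN")
follower_matches, follower_visited = store.relationships(1)
```

Here the single `LIVES_IN` record of the celebrity is found only after every `FOLLOWS` record in its chain has been visited, whereas an ordinary follower's chain holds one record.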
Titan System Architecture and Storage Format
Store Manager
Transaction store
Relations
Index Store
OrientDB System Architecture
Graph JDBC
DocDB based storage
Supports distributed platforms, offering a key-value store, docDB, and graphDB in one system
Glance at Graph Computing Engines
Spark/GraphX
GraphChi
Graph in ONOS
HotSDN’2014
Challenges
Challenges - Performance
● Understand the performance bottleneck by breaking down the execution time
● Bottleneck comes from the memory sub-system
  ● DTLB is inefficient
  ● Cache performs well
  ● Cache MPKI (misses per kilo-instructions) is high
Core graph algorithms from 21 real-world use cases
3 different types of graph computing, with focus on structural traversal, property processing, and graph editing, respectively
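The MPKI metric mentioned on this slide is the number of misses per 1,000 instructions. A minimal sketch of the arithmetic follows; the counter values are hypothetical and are not measurements from the talk.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: misses normalized to every 1,000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions

# Hypothetical counter readings, for illustration only.
counters = {
    "instructions": 1_500_000_000,
    "llc_misses": 45_000_000,
    "dtlb_misses": 30_000_000,
}

llc_mpki = mpki(counters["llc_misses"], counters["instructions"])
dtlb_mpki = mpki(counters["dtlb_misses"], counters["instructions"])
```

Normalizing by instruction count makes runs of different lengths comparable, which is why the metric suits an execution-time breakdown across many workloads.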
Challenges - Input Sensitivity
● Impact from graph topology
  ● Power-law graph results in imbalanced workload due to dense vertices
● Dense subgraph, sparse backbone
  ● Dense subgraph can be converted into matrices
  ● Iterative update in a subgraph
  ● Road net is easy to decompose
● Property type matters
  ● More time spent on property management
  ● Computing performance can be negatively impacted
Performance is inconsistent across different graph types
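The workload imbalance caused by dense vertices in a power-law graph can be shown with a small sketch. It assumes a synthetic, deterministic heavy-tailed degree sequence and treats a worker's load as the sum of the degrees of its vertices; both are simplifications, and all names are hypothetical.

```python
def zipf_like_degrees(num_vertices, max_degree):
    """Heavy-tailed degree sequence: vertex i gets about max_degree / (i + 1) edges."""
    return [max(1, max_degree // (i + 1)) for i in range(num_vertices)]

def range_partition_loads(degrees, num_parts):
    """Give each worker an equal-sized contiguous vertex range; return edge work per worker."""
    size = -(-len(degrees) // num_parts)  # ceiling division
    return [sum(degrees[i:i + size]) for i in range(0, len(degrees), size)]

def greedy_loads(degrees, num_parts):
    """Place vertices heaviest-first on the currently least-loaded worker."""
    loads = [0] * num_parts
    for d in sorted(degrees, reverse=True):
        loads[loads.index(min(loads))] += d
    return loads

def imbalance(loads):
    """Ratio of the busiest worker's load to the average load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))

degrees = zipf_like_degrees(num_vertices=1000, max_degree=1000)
by_range = range_partition_loads(degrees, num_parts=4)
by_degree = greedy_loads(degrees, num_parts=4)
```

With equal vertex counts per worker, the worker that owns the few dense vertices carries most of the edges; balancing by degree instead of by vertex count removes most of that skew in this toy setting.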
Challenges - Impact of H/W Accelerator
● GPU can be helpful
  ● Sufficient acceleration by GPU
  ● Requires re-design of the algorithms
● Challenges
  ● Data must be transferred to GPU
  ● Cost of Host to Device data transfer
  ● Difficulty in putting large graph into GPU (Double buffering)
  ● Sensitive to input graph data
Speedup of NVIDIA Tesla K40 over 16-core Intel Xeon E5-2670
Memory divergence shows higher sensitivity for graph computing on GPU
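The host-to-device transfer cost and the double-buffering point can be put into a back-of-the-envelope model. The sketch below assumes transfer and kernel execution do not overlap, and every number in it (timings, graph size, link bandwidth, device memory) is hypothetical rather than taken from the talk.

```python
GIB = 2 ** 30

def end_to_end_speedup(cpu_seconds, gpu_kernel_seconds, graph_bytes, link_bytes_per_second):
    """Speedup over the CPU once the host-to-device copy is charged to the GPU run."""
    transfer_seconds = graph_bytes / link_bytes_per_second
    return cpu_seconds / (transfer_seconds + gpu_kernel_seconds)

def double_buffer_chunks(graph_bytes, device_bytes):
    """Chunks needed when device memory is split into two equal buffers,
    one being processed while the other is being filled."""
    buffer_bytes = device_bytes // 2
    return -(-graph_bytes // buffer_bytes)  # ceiling division

# Hypothetical scenario: 10 s on the CPU, 1 s GPU kernel, 8 GiB graph, 8 GiB/s link.
kernel_only = 10.0 / 1.0
with_transfer = end_to_end_speedup(10.0, 1.0, 8 * GIB, 8 * GIB)

# Hypothetical scenario: a 30 GiB graph streamed through a device with 12 GiB of memory.
chunks = double_buffer_chunks(30 * GIB, 12 * GIB)
```

In this scenario the copy takes as long as the kernel, so the apparent 10x kernel speedup halves once the transfer is included.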
Challenges - Scale-out Issue
● Poor data locality and difficult partitioning result in challenges in scaling out the computing
● Scale-out challenges can be seen in Graph500 analysis
  ● Single machine with big memory can help
  ● Must be cautious to use many computing nodes
Degraded performance when the number of cores is 100~1000
Analysis of data from Graph500
*from Peter Kogge
Breakthrough
Graph Platform for Smart Big Data
Infrastructure: Single Machine, Cluster, GPU Server, Cloud
Data Management: Structure Management, Property Management, Metadata Management, Permission Control
Graph engines: Basic Engine, Streaming Graph, Graphical Model, Hyper Graph
Visualization: Dynamic Graph Vis, Property Vis, Large Graph Vis
Analytics: Bayes Net, Community, Label propagation, Centrality, Anomaly detection, Matching, Ego Feature, Max Flow
Incremental Update
Unified Graph Data Access Patterns
[Figure: an example graph with vertices 1-6 and weighted edges, shown next to its equivalent sharded edge-list representation; the graph and shards are repeated for step 1, step 2, and step 3 of iteration i]

shard 1 (1, 2)         shard 2 (3, 4)         shard 3 (5, 6)
src  dst  value        src  dst  value        src  dst  value
1    2    0.3          1    3    0.4          2    5    0.6
3    2    0.2          2    3    0.3          3    5    0.9
4    1    1.4          3    4    0.8          3    6    1.2
5    1    0.5          5    3    0.2          4    5    0.3
5    2    0.6          6    4    1.9          5    6    1.1
6    2    0.8

Observation on PSW data access patterns inspires highly efficient sharding representation
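The shard tables on this slide group edges by destination interval and sort them by source, which is the layout used by the parallel sliding windows (PSW) method of GraphChi. The sketch below rebuilds the three shards from the example edge list and shows which edges are touched at each step. It illustrates the PSW-style access pattern only, not the author's own sharding representation, and the function names are hypothetical.

```python
# (src, dst, value) triples transcribed from the example shards on this slide.
EDGES = [
    (1, 2, 0.3), (3, 2, 0.2), (4, 1, 1.4), (5, 1, 0.5), (5, 2, 0.6), (6, 2, 0.8),
    (1, 3, 0.4), (2, 3, 0.3), (3, 4, 0.8), (5, 3, 0.2), (6, 4, 1.9),
    (2, 5, 0.6), (3, 5, 0.9), (3, 6, 1.2), (4, 5, 0.3), (5, 6, 1.1),
]
INTERVALS = [(1, 2), (3, 4), (5, 6)]  # vertex interval owned by each shard

def build_shards(edges, intervals):
    """Shard k holds every edge whose destination lies in interval k, sorted by source."""
    shards = []
    for lo, hi in intervals:
        shard = [e for e in edges if lo <= e[1] <= hi]
        shard.sort(key=lambda e: (e[0], e[1]))
        shards.append(shard)
    return shards

def sliding_window(shard, interval):
    """Edges of a shard whose source lies in `interval`; contiguous because of the sort."""
    lo, hi = interval
    return [e for e in shard if lo <= e[0] <= hi]

def load_subgraph(shards, intervals, step):
    """Edges read when processing interval `step`: its own shard (all in-edges)
    plus one window from every other shard (the remaining out-edges)."""
    in_edges = shards[step]
    out_edges = [e for k, shard in enumerate(shards) if k != step
                 for e in sliding_window(shard, intervals[step])]
    return in_edges, out_edges

shards = build_shards(EDGES, INTERVALS)
step1_in, step1_out = load_subgraph(shards, INTERVALS, step=0)
```

Because each shard is sorted by source, the window for an interval is one contiguous block, so each step needs one full sequential read plus one block read per remaining shard.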
Experiments
Performance improvement of SSSP against GraphChi including data ingestion
Execution time breakdown of PageRank running on Twitter-2010
Disk read bandwidth over time for GraphChi and Edge-Set. Edge-Set showed up to 2x aggregate bandwidth and more constant I/O usage.
Opportunities in Graph Technology for Big Data
● Develop high-performance graph computing kernels and primitives
  • Architecture-aware graph computing based on Graph500 techniques
  • Heterogeneous computing and near-data computing technology
● Reinvent graph technology to support cognitive computing
  • One open platform with multiple graph and graph-related technologies
  • Integrated consideration of graphical models, streaming graphs, etc. for AI/IoT
● Offer vertical solutions to break through the separation among technology stacks
  • Holistic solution for rapidly building industry-level graph analytics solutions
  • Alignment with market segments such as security, finance, etc.
● Collaboration and standardization
  • Foster collaboration with relevant professional communities to educate the market
  • Develop domain or cross-domain standards