Posted on 07-Jan-2017
GRAPH ANALYTICS AND MACHINE LEARNING
STANLEY WANG SOLUTION ARCHITECT, TECH LEAD @SWANG68 http://www.linkedin.com/in/stanley-wang-a2b143b
Mathematics on Graphs
• An abstract representation of a set of entities where some pairs are connected by links;
Entity (Vertex, Node)
Link (Edge, Relationship)
What is a Graph?
Constructing a Graph
Graph Affinity Matrix
Graph Laplacian Matrix
Update Function on Graph
The Magic Properties of the Laplacian Matrix
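As a hedged illustration of the affinity and Laplacian matrices, the sketch below assumes a Gaussian (RBF) affinity w_ij = exp(-||p_i - p_j||^2 / (2 sigma^2)) and the unnormalized Laplacian L = D - W, where D is the diagonal degree matrix (the kernel and the sigma value are assumptions, not taken from the slides). It checks two commonly cited Laplacian properties: every row sums to zero, and x^T L x = 1/2 * sum_ij w_ij (x_i - x_j)^2, which is never negative and is small when x varies smoothly across strongly connected nodes.

```python
import math

def affinity_matrix(points, sigma=1.0):
    """Gaussian (RBF) affinity: w_ij = exp(-||p_i - p_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def laplacian(w):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    degrees = [sum(row) for row in w]
    return [[(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

def quadratic_form(m, x):
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))

def smoothness(w, x):
    """Compute 1/2 * sum_ij w_ij (x_i - x_j)^2, which equals x^T L x for symmetric W."""
    n = len(x)
    return 0.5 * sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
    W = affinity_matrix(pts)
    L = laplacian(W)
    x = [1.0, 2.0, -1.0]
    print(all(abs(sum(row)) < 1e-12 for row in L))  # each row of L sums to zero
    print(abs(quadratic_form(L, x) - smoothness(W, x)) < 1e-12)  # quadratic-form identity
```

A constant vector x gives x^T L x = 0, which is why the all-ones vector is always an eigenvector of L with eigenvalue 0.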
What is a Graph Database?
• A Database with an Explicit Graph Structure;
• Each Node Knows its Adjacent Nodes;
• As the Number of Nodes Increases, the Cost of a Local Step (Hop) Remains the Same, O(1);
• An Index for Lookups;
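A minimal sketch of "each node knows its adjacent nodes": every node object holds direct references to its neighbors, and the index is used only to find the starting node. After that, each hop reads a neighbor list directly, so the cost of a local step does not grow with the total number of nodes. The classes `Node` and `GraphStore` are hypothetical stand-ins for a graph store; no specific database API is implied.

```python
class Node:
    """A node that holds direct references to its adjacent nodes."""
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references, no global lookup needed per hop

class GraphStore:
    """Hypothetical in-memory store: an index for lookups plus node-level adjacency."""
    def __init__(self):
        self.index = {}  # name -> Node, used only to locate start nodes

    def node(self, name):
        if name not in self.index:
            self.index[name] = Node(name)
        return self.index[name]

    def connect(self, a, b):
        """Add an undirected relationship between nodes a and b."""
        na, nb = self.node(a), self.node(b)
        na.neighbors.append(nb)
        nb.neighbors.append(na)

    def friends_of_friends(self, name):
        """Nodes exactly two hops away that are not already direct friends."""
        start = self.index[name]                 # single index lookup
        direct = {n.name for n in start.neighbors}
        result = set()
        for friend in start.neighbors:           # local step: follow stored references
            for fof in friend.neighbors:
                if fof.name != name and fof.name not in direct:
                    result.add(fof.name)
        return result

if __name__ == "__main__":
    g = GraphStore()
    for a, b in [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
                 ("alice", "erin"), ("erin", "carol")]:
        g.connect(a, b)
    print(g.friends_of_friends("alice"))
```

In this example only "carol" is two hops from "alice"; "dave" is three hops away and is not returned.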
Relational Model vs Graph Model
• Relational Model (RDBMS): Optimized for Aggregation;
• Graph Model: Optimized for Connections;
SQL vs NoSQL
[Figure: data models plotted by Complexity vs. Size: Key-Value Store, Big Table Column Family, Document Databases, Graph Databases, Relational Databases; annotation: "90% of Use Cases"]
Performance Comparison
Why Graph Databases?
[Figure: data models ordered by value in relationships, from low to high: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph]
NoSQL and Big Data
• Traditional databases handle big data sets, too, but mostly structured data;
• NoSQL databases have poor analytics capabilities;
• HDFS and MapReduce often work from text files;
• NoSQL targets high throughput, typically favoring AP over CP in terms of the CAP theorem;
• In practice, Big Data is likely to be a mix of text files, NoSQL, and SQL RDBMS;
Graph Terminology
• Graph Computation (Analytics):
o The whole graph is processed, typically over several iterations of vertex-centric computation.
o Examples: Belief Propagation, PageRank, Community Detection, Triangle Counting, Matrix Factorization, Machine Learning…
• Graph Database (Queries):
o Selective graph queries (comparable to SQL queries)
o Traversals: shortest-path, friends-of-friends,…
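To make "vertex-centric computation" concrete, here is a minimal sketch in the style of Pregel-like systems (an assumed model; the slides name no framework). In each superstep, every vertex that received messages updates its state and sends messages along its out-edges, and the run stops when no messages remain. The vertex program computes single-source shortest paths, the same answer a shortest-path traversal query would return, but computed as whole-graph iterations. It assumes non-negative edge weights; `pregel_sssp` is a hypothetical name.

```python
import math

def pregel_sssp(edges, source):
    """Vertex-centric single-source shortest paths on a weighted directed graph.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs, weights >= 0.
    Returns a dict of shortest distances from source (math.inf if unreachable).
    """
    vertices = set(edges) | {v for outs in edges.values() for v, _ in outs} | {source}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # superstep 0: only the source receives a message
    while inbox:
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:               # vertex program: keep the smallest distance
                dist[v] = best
                for u, w in edges.get(v, []):
                    outbox.setdefault(u, []).append(best + w)
        inbox = outbox                      # messages are delivered in the next superstep
    return dist

if __name__ == "__main__":
    g = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0), ("d", 6.0)], "c": [("d", 1.0)]}
    print(pregel_sssp(g, "a"))
```

A vertex only sends messages when its distance improves, so computation naturally concentrates on the part of the graph that is still changing.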
GRAPH ANALYTICS
What Can Graphs Model?
Graphs are Essential to ML
• Identify influential people and information;
• Discover communities;
• Understand people’s shared interests;
• Model complex real life data dependencies;
It’s all about GRAPH: The Value of Data is Proportional to the Number of Meaningful Relationships!
Complex Big Data Graph ML Algorithms
Graph Social Network Model
The model can easily be used in real-life applications for customer classification, profiling, segmentation, and product recommendations.
Identifying Key People
Social Network Tie Recommendation
Full Stack Graph ML Algorithms
Typical Graph Analytics
Graph Analytics - Page Rank
• PageRank measures the importance of nodes in a graph (link analysis). It is defined as the probability of landing on a node, which depends on:
o The probability of landing on one of the node’s neighbors;
o The probability of crossing the link from that neighbor to the node;
• Use case: identifying influential leaders;
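The definition above is the random-surfer model: PR(v) = (1 - d)/N + d * sum over links u -> v of PR(u)/outdeg(u), where PR(u) is the probability of being at a neighbor u and 1/outdeg(u) is the probability of crossing the link from u to v. A minimal power-iteration sketch, assuming the commonly used damping factor d = 0.85 (not stated on the slide) and redistributing the rank of dangling nodes (nodes with no out-links) uniformly:

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    links: dict mapping node -> list of nodes it links to.
    d: damping factor, the probability of following a link instead of jumping randomly.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is spread uniformly so total rank stays 1.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u, outs in links.items():
            if outs:
                share = d * rank[u] / len(outs)  # probability mass crossing each out-link
                for v in outs:
                    new[v] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

if __name__ == "__main__":
    print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

In the usage example, "c" receives links from both "a" and "b" and ends up with the highest rank, while "b" has no in-links and keeps only the random-jump share.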
Graph Analytics - Triangle Count
• The clustering coefficient (CC) is a measure of the degree to which nodes in a graph tend to cluster together;
• Calculating the CC reduces to counting the number of triangles around a particular node in the graph;
• CC indicates the degree to which a node’s neighbors are themselves neighbors;
• CC of a graph is closely related to the transitivity of a graph;
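For an undirected graph, the local clustering coefficient of a node v with degree k_v is C(v) = 2T(v) / (k_v(k_v - 1)), where T(v) is the number of triangles through v, i.e. the number of edges among v's neighbors. Transitivity is the global counterpart: 3 x (number of triangles) / (number of connected triples). A minimal sketch with adjacency sets (function names are illustrative):

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles(adj, v):
    """Number of triangles through v = number of edges between v's neighbors."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

def clustering_coefficient(adj, v):
    """Local CC: fraction of v's neighbor pairs that are themselves neighbors."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    return 2.0 * triangles(adj, v) / (k * (k - 1))

def total_triangles(adj):
    """Each triangle is counted once at each of its three corners."""
    return sum(triangles(adj, v) for v in adj) // 3

def transitivity(adj):
    """Global clustering: 3 x triangles / connected triples (paths of length two)."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3.0 * total_triangles(adj) / triples if triples else 0.0

if __name__ == "__main__":
    g = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
    print({v: clustering_coefficient(g, v) for v in g}, transitivity(g))
```

In the usage graph, node 3 has neighbors 1, 2 and 4, but only the pair (1, 2) is connected, so its CC is 1/3.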
Graph Analytics - Connected Components
• A connected component is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph;
• A directed graph is strongly connected if every vertex is reachable from every other vertex. The strongly connected components form a partition into subgraphs that are themselves strongly connected;
• A spanning tree is a subgraph of the original graph that is a tree and connects all the vertices that were originally connected;
• A minimum spanning tree (MST) is a spanning tree such that the sum of the weights of its edges is not greater than the sum of the edge weights of any other spanning tree;
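Connected components of an undirected graph and a minimum spanning tree can both be computed with a disjoint-set (union-find) structure. The sketch below uses Kruskal's algorithm, one standard approach and not necessarily what the slides' system uses; on a disconnected graph it returns a minimum spanning forest, one tree per connected component. Strongly connected components of directed graphs need a different algorithm (e.g. Tarjan's) and are not covered here.

```python
class DisjointSet:
    """Union-find with path compression and union by size."""
    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression: point nodes straight at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def connected_components(vertices, edges):
    """Group vertices by their union-find root after merging every edge."""
    ds = DisjointSet(vertices)
    for u, v, *_ in edges:
        ds.union(u, v)
    groups = {}
    for v in vertices:
        groups.setdefault(ds.find(v), set()).add(v)
    return list(groups.values())

def kruskal_mst(vertices, weighted_edges):
    """Minimum spanning forest: scan edges by weight, keep those joining two trees."""
    ds = DisjointSet(vertices)
    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        if ds.union(u, v):
            tree.append((u, v, w))
    return tree

if __name__ == "__main__":
    V = ["a", "b", "c", "d", "e"]
    E = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("d", "e", 4)]
    print(connected_components(V, E), kruskal_mst(V, E))
```

Kruskal's rule skips ("a", "c", 3) because "a" and "c" are already joined through "b", which would otherwise close a cycle.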
Graph Analytics - Betweenness Centrality
• Betweenness centrality is an indicator of a node's centrality in a network: for every pair of other vertices, it takes the fraction of shortest paths between them that pass through the node, and sums these fractions over all pairs;
• A node with high betweenness centrality has a large influence on the transfer of items through the network;
• Betweenness centrality is related to a network's connectivity;
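Formally, C_B(v) = sum over s != v != t of sigma_st(v) / sigma_st, where sigma_st is the number of shortest s-t paths and sigma_st(v) the number of those passing through v. A minimal sketch of Brandes' algorithm for unweighted graphs (an algorithm choice not named on the slide): one breadth-first search per source counts shortest paths, then dependencies are accumulated in reverse BFS order.

```python
from collections import deque

def betweenness_centrality(adj, directed=False):
    """Brandes' algorithm for unweighted graphs.

    adj: dict mapping vertex -> iterable of neighbors (every vertex must be a key).
    Returns unnormalized scores; for undirected graphs each pair is counted once.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                   # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                   # accumulate dependencies, farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if not directed:
        for v in cb:
            cb[v] /= 2.0               # each undirected pair was counted from both ends
    return cb

if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(betweenness_centrality(path))
```

On the path a-b-c, "b" lies on the only shortest path between "a" and "c", so it scores 1 while the endpoints score 0.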
Graph Social Media Recommendation
Graph Computing Opportunity
Combined with leading tools such as Graph Databases, Machine Learning, High Performance Computing, Clustering and Streaming, Graph Computing Technology is ready to take off in the Big Data Era!
Distributed Graph Analytics System
How to Construct a Graph?
Graph ETL Data Flow
Graph ETL Example
Graph ETL Architecture
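As a hedged example of the general graph ETL idea (extract records, transform them into vertices and edges, load them into a graph), the sketch below turns hypothetical purchase rows (`user_id`, `product_id`, `amount`) into a deduplicated vertex set and a weighted user-product edge list. The field names, the choice to sum repeated purchases into one edge weight, and the in-memory "load" target are all assumptions for illustration; no specific graph database API is implied.

```python
import csv
import io
from collections import defaultdict

def extract(csv_text):
    """Extract: parse raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: build typed vertices and aggregate repeated purchases into weighted edges."""
    vertices = {}
    edge_weight = defaultdict(float)
    for row in rows:
        user = ("user", row["user_id"].strip())
        product = ("product", row["product_id"].strip())
        vertices[user] = {"type": "user"}        # dict keys deduplicate vertices
        vertices[product] = {"type": "product"}
        edge_weight[(user, product)] += float(row["amount"])
    edges = [(u, p, w) for (u, p), w in edge_weight.items()]
    return vertices, edges

def load(vertices, edges):
    """Load: an in-memory adjacency dict standing in for a graph store."""
    graph = {v: [] for v in vertices}
    for u, p, w in edges:
        graph[u].append((p, w))
        graph[p].append((u, w))
    return graph

if __name__ == "__main__":
    data = "user_id,product_id,amount\nu1,p1,10\nu1,p1,5\nu2,p1,3\nu2,p2,7\n"
    print(load(*transform(extract(data))))
```

Keeping the three stages as separate functions mirrors the ETL data flow, so each stage could be swapped out (for example, reading from a database instead of CSV) without touching the others.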