Graph Analytics and Machine Learning

Post on 07-Jan-2017


GRAPH ANALYTICS AND MACHINE LEARNING

STANLEY WANG SOLUTION ARCHITECT, TECH LEAD @SWANG68 http://www.linkedin.com/in/stanley-wang-a2b143b

Mathematics on Graphs

What is a Graph?

• An abstract representation of a set of entities where some pairs are connected by links;

• Entity (Vertex, Node)

• Link (Edge, Relationship)

Constructing a Graph

Graph Affinity Matrix

Graph Laplacian Matrix

Update Function on Graph

The Magic Properties of the Laplacian Matrix
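
The slide bodies for the affinity and Laplacian matrices did not survive in this transcript, so the following is only a minimal sketch under a stated assumption: it uses the standard unnormalized Laplacian L = D - W, where W is a symmetric affinity matrix and D is the diagonal degree matrix. The function names and the sample edge weights are hypothetical and not taken from the slides.

```python
def affinity_matrix(n, weighted_edges):
    """Symmetric affinity matrix W for an undirected graph on n vertices."""
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        W[u][v] = w
        W[v][u] = w
    return W


def laplacian(W):
    """Unnormalized Laplacian L = D - W (assumed form, see lead-in)."""
    n = len(W)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry: degree of vertex i minus any self-loop weight.
        L[i][i] = sum(W[i]) - W[i][i]
    return L


def quadratic_form(L, f):
    """f^T L f, which equals the sum over edges of w_uv * (f_u - f_v)^2."""
    n = len(L)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical 3-vertex example: (u, v, weight)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5)]
W = affinity_matrix(3, edges)
L = laplacian(W)
row_sums = [sum(row) for row in L]
```

Two well-known properties are visible here: every row of L sums to zero, and the quadratic form is zero for a constant vector, which is why smooth functions on a graph have small f^T L f.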

What is a Graph Database?

• A Database with an Explicit Graph Structure;

• Each Node Knows its Adjacent Nodes;

• As the Number of Nodes Increases, the Cost of a Local Step Remains the Same, O(1);

• An Index for Lookups;
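
The points above can be illustrated with a toy in-memory store. This is only a sketch, not how any particular graph database is implemented: the `Node` and `TinyGraphStore` classes and the sample names are hypothetical. The index is used once to find a starting node; after that, each hop follows direct references, so its cost does not depend on the total number of nodes.

```python
class Node:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # adjacent Node objects; no global lookup per hop


class TinyGraphStore:
    def __init__(self):
        self.index = {}  # name -> Node, used only to find a starting point

    def add_node(self, name):
        if name not in self.index:
            self.index[name] = Node(name)
        return self.index[name]

    def add_edge(self, a, b):
        na, nb = self.add_node(a), self.add_node(b)
        na.neighbors.append(nb)
        nb.neighbors.append(na)

    def friends_of_friends(self, name):
        start = self.index[name]  # single index lookup
        direct = {n.name for n in start.neighbors}
        result = set()
        for friend in start.neighbors:      # local step 1
            for fof in friend.neighbors:    # local step 2
                if fof is not start and fof.name not in direct:
                    result.add(fof.name)
        return result


store = TinyGraphStore()
for a, b in [("alice", "bob"), ("bob", "carol"),
             ("carol", "dave"), ("alice", "erin")]:
    store.add_edge(a, b)
fof = store.friends_of_friends("alice")
```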

Relational Model vs Graph Model

• Relational Model: Optimized for Aggregation

• Graph Model: Optimized for Connections

RDBMS

SQL vs NoSQL

[Figure: database families plotted by Size against Complexity. Labels: Key-Value Store, Big Table / Column Family, Document Databases, Graph Databases, Relational Databases; annotation: 90% of Use Cases.]

Performance Comparison

Why Graph Databases?

[Figure: data models placed along a Value in Relationships axis running from Low to High. Labels: Key-Value (K, V), BigTable (K, V V V V), Document, Relational, Graph.]

NoSQL and Big Data


• Traditional databases handle big data sets too, but focus more on structured data;

• NoSQL databases have poor analytics support;

• HDFS and MapReduce often work from text files;

• NoSQL is geared more toward high throughput: in CAP theorem terms, it typically favors AP (availability, partition tolerance) over CP (consistency, partition tolerance);

• In practice, Big Data is likely to be a mix of text files, NoSQL, and SQL RDBMS;

Graph Terminology

• Graph Computation (Analytics):

o The whole graph is processed, typically over several iterations of vertex-centric computation.

o Examples: Belief Propagation, PageRank, Community Detection, Triangle Counting, Matrix Factorization, Machine Learning…

• Graph Database (Queries):

o Selective graph queries (compare to SQL queries)

o Traversals: shortest-path, friends-of-friends,…


GRAPH ANALYTICS

What Can Graphs Model?

Graphs are Essential to ML

• Identify influential people and information;

• Discover communities;

• Understand people’s interests in common;

• Model complex real life data dependencies;

It’s all about GRAPH: The Value of Data is Proportional to the Number of Meaningful Relationships!

Complex Big Data Graph ML Algorithms

Graph Social Network Model

The model can easily be used in real-life applications for customer classification, profiling, segmentation, and product recommendations.

Identifying Key People

Social Network Tie Recommendation

Full Stack Graph ML Algorithms

Typical Graph Analytics

Graph Analytics - Page Rank

• PageRank (link analysis) measures the importance of nodes in a graph. A node's rank is defined as the probability of landing on that node, which depends on:

o The probability of landing on one of the node's neighbors;

o The probability of crossing the link from that neighbor to the node;

• Used to identify influential leaders;
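
The probabilistic definition above can be sketched as a simple power iteration. This is an illustrative sketch rather than the deck's own implementation: the damping factor of 0.85, the fixed iteration count, the uniform handling of nodes without out-links, and the sample graph are all assumptions. Every link target is assumed to appear as a key of `out_links`.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict: node -> list of nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Probability of jumping to a random node (assumed teleport term).
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Probability of being at u, times probability of
                # crossing one particular link out of u.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node: spread its rank uniformly (assumption).
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical four-node graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
most_influential = max(ranks, key=ranks.get)
```

In this sample, node c collects links from three other nodes and ends up with the highest rank, which is the "influential leader" reading of the score.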

Graph Analytics - Triangle Count

• Clustering coefficient (CC) is a measure of the degree to which nodes in a graph tend to cluster together;

• Calculating the CC can be reduced to counting the number of triangles around one particular node in the graph;

• CC indicates the degree to which a node's neighbors are themselves neighbors;

• CC of a graph is closely related to the transitivity of a graph;
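
As a small illustration of the relationship between triangles and the clustering coefficient, the sketch below counts the triangles through one node and divides by the number of neighbor pairs. The function name and the sample graph are hypothetical; the graph is assumed to be undirected and stored as a dict of neighbor sets.

```python
from itertools import combinations


def triangles_and_cc(adj, v):
    """Return (triangle count, local clustering coefficient) for node v."""
    neighbors = adj[v]
    k = len(neighbors)
    # A triangle through v is a pair of v's neighbors that are also adjacent.
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    possible = k * (k - 1) // 2
    cc = triangles / possible if possible else 0.0
    return triangles, cc


# Hypothetical undirected graph: node -> set of neighbors.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
t1, cc1 = triangles_and_cc(adj, 1)
t2, cc2 = triangles_and_cc(adj, 2)
t4, cc4 = triangles_and_cc(adj, 4)
```

Node 1 has three neighbors but only one connected pair among them, while both neighbors of node 2 are connected to each other.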

Graph Analytics - Connected Components

• A connected component is a subgraph in which any two vertices are connected to each other by a path, and which is connected to no additional vertices in the supergraph;

• A directed graph is strongly connected if every vertex is reachable from every other vertex. The strongly connected components of a directed graph form a partition into subgraphs that are themselves strongly connected;

• A spanning tree is a subgraph of the original graph that is a tree and connects all the vertices that were originally connected;

• A minimum spanning tree (MST) is a spanning tree whose total edge weight is no greater than that of any other spanning tree;
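
The definitions above can be made concrete with two short routines. The slides do not name any algorithm, so the choices here are assumptions: breadth-first search for the connected components of an undirected graph, and Kruskal's algorithm with a small union-find for the minimum spanning tree. All names and sample data are hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Connected components of an undirected graph (node -> set of neighbors)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


def minimum_spanning_forest(vertices, weighted_edges):
    """Kruskal's algorithm over (weight, u, v) edges; returns the chosen edges."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the set representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different trees, so no cycle
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
components = sorted(sorted(c) for c in connected_components(adj))

edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d")]
mst = minimum_spanning_forest("abcd", edges)
mst_weight = sum(w for w, _, _ in mst)
```

On a disconnected graph the second routine returns one minimum spanning tree per component, which is why it is named a forest here.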

Graph Analytics - Betweenness centrality

• Betweenness centrality is an indicator of a node's centrality in a network, which is equal to the number of shortest paths from all vertices to all others that pass through that node;

• A node with high betweenness centrality has a large influence on the transfer of items through the network;

• Betweenness centrality is related to a network's connectivity;
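
One way to compute this measure is Brandes' algorithm, sketched below for unweighted, undirected graphs. The slides do not specify an algorithm, so this choice is an assumption, as are the function name and the sample graph. When several shortest paths join the same pair of nodes, this version credits each intermediate node with the fraction of those paths that pass through it, rather than a raw count.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is visited from both ends.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = betweenness_centrality(path)
```

In the path graph, the two interior nodes each lie on two shortest paths between other nodes, while the end nodes lie on none.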

Graph Social Media Recommendation

Graph Computing Opportunity

Combined with leading technologies such as graph databases, machine learning, high-performance computing, clustering, and streaming, graph computing technology is ready to take off in the Big Data era!

Distributed Graph Analytics System

How to Construct a Graph?

Graph ETL Data Flow

Graph ETL Example

Graph ETL Architecture