Massivegraph telecom ppt

Analyzing the Structure and Evolution of Massive Telecom Graphs Amit Nanavati, Rahul Singh, Anupam Joshi, Gautam Das CIKM 2006 Presented by Harshavardhan Achrekar University of Massachusetts Lowell

Page 1: Massivegraph telecom ppt

Analyzing the Structure and Evolution of Massive Telecom Graphs

Amit Nanavati, Rahul Singh, Anupam Joshi, Gautam Das
CIKM 2006

Presented by Harshavardhan Achrekar
University of Massachusetts Lowell

Page 2: Massivegraph telecom ppt

Introduction

• Mobile Telecom Focus / Challenges

Customer Acquisition

Customer Retention

• Avoid churn

• Strategy: offer the right incentives (loyalty programs), use suitable marketing strategies (family plans), and place network assets appropriately.

• Goal / Critical Requirements

Optimizing Marketing Expenditure

Improved Targeting

Churn: the loss of subscribers who switch from one carrier / service provider to another.

Page 3: Massivegraph telecom ppt

Call Detail Record (CDR) / CALL GRAPH

• Graph theory explains user behaviour patterns

If the CALL GRAPH is disconnected: prefer blanket advertising over word-of-mouth spreading

Presence of cliques

Presence of communities: enables effective group targeting and retention

[Figure: directed edge from Caller to Receiver, labelled {Time stamp, Duration}]

A clique in an undirected graph G = (V, E) is a subset of the vertex set, C ⊆ V, such that for every two vertices in C there exists an edge connecting the two.
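
The clique definition can be checked directly on a small edge list. The following is a minimal sketch; the function name `is_clique` and the toy edges are illustrative assumptions, not part of the original study.

```python
from itertools import combinations


def is_clique(nodes, edges):
    """Return True if every pair of distinct nodes is joined by an edge.

    `edges` is an iterable of pairs; direction is ignored, matching the
    undirected definition.
    """
    undirected = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in undirected
               for pair in combinations(set(nodes), 2))


# Toy graph (made-up data): a triangle a-b-c plus a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(is_clique({"a", "b", "c"}, edges))
print(is_clique({"a", "b", "c", "d"}, edges))
```

The first call checks the triangle, the second adds a node that is linked only to c.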

Page 4: Massivegraph telecom ppt

Authors' Contributions

• Structural analysis of the CDRs of one of the largest mobile telecom operators in the world.

• Topological properties of massive call graphs, such as shape, degree distribution, cliques and connected components; power laws as in scale-free networks.

• Model built on the edge distribution, as opposed to the node distribution, of the components of the graph.

• Temporal analysis performed, as opposed to analysis of a static snapshot of the network.

• Short Messaging Service (SMS) graph analyzed (skipped in this presentation).

Page 5: Massivegraph telecom ppt

Data Sources

• The study was done on a single mobile operator in India.

• It analyzed the calling patterns of four regions: two metropolitan cities for a week, and two states with a mixture of rural and urban population for one month.

• CDRs are stored at the base station in a data warehouse.

Page 6: Massivegraph telecom ppt

• A CALL GRAPH G is a pair <V(G), E(G)>, where V(G) is a nonempty finite set of vertices and E(G) is a finite set of vertex pairs from V(G). If u and v are vertices of G, then the edge <u,v> means that u calls v.

• Multiple calls between two users (nodes) are treated as a single edge.

• Short-duration calls (less than 10 seconds) and long-distance or international calls are ignored.
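
The construction rules above can be sketched as follows. The row layout `(caller, receiver, timestamp, duration_seconds)` and the name `build_call_graph` are assumptions made for illustration; the filter for long-distance and international calls is left out because the transcript does not say how such calls are flagged.

```python
def build_call_graph(cdrs, min_duration=10):
    """Collapse CDR rows into a set of directed edges (caller, receiver).

    Calls shorter than `min_duration` seconds are dropped, and repeated
    calls between the same ordered pair become a single edge.
    """
    edges = set()
    for caller, receiver, _timestamp, duration in cdrs:
        if duration < min_duration:
            continue  # short call: ignored
        edges.add((caller, receiver))
    return edges


# Made-up records for illustration only.
cdrs = [
    ("u1", "u2", 1000, 42),
    ("u1", "u2", 2000, 65),  # repeat call: same edge
    ("u2", "u1", 3000, 30),  # opposite direction: a different edge
    ("u3", "u1", 4000, 5),   # under 10 seconds: ignored
]
graph = build_call_graph(cdrs)
print(sorted(graph))
```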

Page 7: Massivegraph telecom ppt

Structural properties of CALL GRAPHS

• Node degree distribution: gives the number of nodes n(d) of each degree d in the graph, as P(d) = n(d)/n, where n is the total number of nodes.

• For directed networks, the degree distribution P(d) splits into the in-degree distribution P(d_in) and the out-degree distribution P(d_out), measured separately as the probabilities of having d_in incoming links and d_out outgoing links, respectively.

• See Figs. 1 and 2. The in-degree distribution follows that of the WWW (exponent).
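
As a rough illustration of P(d) = n(d)/n for the directed case, the sketch below computes both distributions from an edge set. The function name and the toy edges are assumptions; nodes that never receive (or never make) a call are counted with degree 0.

```python
from collections import Counter


def degree_distributions(edges):
    """Return (P_in, P_out), each mapping a degree d to n(d)/n."""
    nodes = {node for edge in edges for node in edge}
    in_degree = Counter(receiver for _, receiver in edges)
    out_degree = Counter(caller for caller, _ in edges)
    n = len(nodes)

    def distribution(degree):
        counts = Counter(degree.get(node, 0) for node in nodes)
        return {d: count / n for d, count in sorted(counts.items())}

    return distribution(in_degree), distribution(out_degree)


# Made-up call edges (caller, receiver).
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")}
p_in, p_out = degree_distributions(edges)
print(p_in)
print(p_out)
```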

Page 8: Massivegraph telecom ppt

• The heavy-tailed form fits power-law behavior.

• A few nodes have very high in-degree or out-degree and may be suitable for individual targeting by the telecom service provider.

Page 9: Massivegraph telecom ppt

• Neighbourhood Distribution / Hop Plot N(h) for a graph is the number of pairs of nodes within a specified distance, for all distances h.

• The individual neighbourhood function for u at h is number of nodes at distance h or less from u.

• The neighbourhood function N(h) is the number of pairs of nodes within distance h.

• H is the hop exponent: the slope of N(h) against h on a log-log plot, i.e. N(h) ∝ h^H. {Use the ANF tool.}
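
For a small graph, N(h) can be computed exactly with one BFS per node; the ANF tool exists because this is too expensive on massive graphs. The sketch below assumes that N(h) counts ordered pairs of distinct nodes and that every node appears as a key of the adjacency dictionary. Both are conventions chosen here, not taken from the slides.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def neighbourhood_function(adj, max_h):
    """N(h) for h = 1..max_h: ordered pairs (u, v), u != v, within h hops."""
    at_distance = [0] * (max_h + 1)
    for u in adj:
        for d in bfs_distances(adj, u).values():
            if 0 < d <= max_h:
                at_distance[d] += 1
    hop_plot, running = {}, 0
    for h in range(1, max_h + 1):
        running += at_distance[h]  # accumulate "within h" from "exactly h"
        hop_plot[h] = running
    return hop_plot


# Toy undirected path a - b - c - d, stored with edges in both directions.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
hop_plot = neighbourhood_function(adj, 3)
print(hop_plot)
```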

Page 10: Massivegraph telecom ppt

Two graphs with different hop exponents are structurally different.

The authors computed the hop exponent using a linear fit on the N(h) distribution and found it close to 4 and 5 in the telecom network, like the dense WWW.

A grid has hop exponent 2.

Regions A and B have the same H.
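
A linear fit for the hop exponent amounts to a least-squares slope on log-log axes. The sketch below uses synthetic data that grows as h^2 (the grid-like case), not values from the study. In practice the fit is usually restricted to hop counts well below the diameter, before N(h) saturates.

```python
import math


def hop_exponent(hop_plot):
    """Least-squares slope of log N(h) against log h."""
    xs = [math.log(h) for h in hop_plot]
    ys = [math.log(pairs) for pairs in hop_plot.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic hop plot with N(h) = 5 * h^2, chosen so the slope is known.
grid_like = {h: 5 * h ** 2 for h in range(1, 9)}
print(round(hop_exponent(grid_like), 6))
```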

Page 11: Massivegraph telecom ppt

• Effective diameter of the network: for a call graph of N nodes with E edges, the effective diameter δ_eff is the number of hops within which any two nodes lie from each other with high probability.

• See Table II: the maximum value is 13.

• Small-World phenomenon exists in mobile call graphs since most pairs of nodes (phone numbers) are separated by a handful of edges (calls).

• Helps identify social communities (cf. Milgram's experiment).
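
The effective-diameter formula itself is not present in this transcript. A definition commonly paired with the hop exponent H, from the Faloutsos et al. power-law study of the Internet topology, is (N^2 / (N + 2E))^(1/H). The sketch below assumes the slide used that definition; the input numbers are invented for illustration.

```python
def effective_diameter(num_nodes, num_edges, hop_exponent):
    """Effective diameter under the assumed definition (N^2 / (N + 2E))^(1/H)."""
    ratio = num_nodes ** 2 / (num_nodes + 2 * num_edges)
    return ratio ** (1.0 / hop_exponent)


# Invented values: N = 1000, E = 4500, H = 2.
print(effective_diameter(1000, 4500, 2))
```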

Page 12: Massivegraph telecom ppt

• Cliques: useful for defining closed user groups, where discounts are given for all calls made within the group (family plan).

• The number and sizes of such groups also give an idea of the right incentives to offer.

• See Fig. 4: many cliques of size 3-4; the maximum size is 17.
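
One way to obtain clique counts by size is to enumerate maximal cliques with the basic Bron-Kerbosch recursion, sketched below on a toy undirected graph. The slides do not say which algorithm the authors used or how call direction was handled, so both are assumptions here: the adjacency sets are symmetric, as if an edge means the two users called each other.

```python
from collections import Counter


def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique
            return
        for v in list(candidates):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # v is done as a candidate
            excluded = excluded | {v}      # and blocks duplicates from now on

    yield from expand(set(), set(adj), set())


# Toy graph: triangle a-b-c, then a chain c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
sizes = Counter(len(clique) for clique in maximal_cliques(adj))
print(dict(sizes))
```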

Page 13: Massivegraph telecom ppt

• PageRank p(i) of page i (here, a subscriber)

• A measure of the social importance of an individual: it grows with the number of people calling the individual and with the social importance of the callers.

• See Fig. 5: it follows a power-law distribution.
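
The idea that importance flows from callers to the people they call can be sketched with a plain power iteration. The damping factor 0.85, the fixed iteration count and the uniform handling of nodes with no outgoing calls are conventional choices assumed here; the slides do not state the authors' settings.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping caller -> list of callees."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing calls is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy call graph: a and b both call c, c calls d, d calls nobody.
calls = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
ranks = pagerank(calls)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph c outranks a and b because it is called by both of them.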

Page 14: Massivegraph telecom ppt

Strongly Connected Component (SCC)

Scale-free networks exhibit the presence of an SCC.

The largest SCC is significantly larger than the second-largest SCC. (Also observed in the WWW graph.)
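
Strongly connected components can be found with Kosaraju's two-pass algorithm, written iteratively below so that deep graphs do not hit Python's recursion limit. This is a generic sketch on a toy graph; the slides do not say which SCC algorithm the authors ran.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; returns components sorted largest first."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    reverse = {u: [] for u in nodes}
    for u, targets in adj.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 1: record nodes in order of DFS finishing time.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(adj.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adj.get(nxt, ()))))
                    break
            else:  # all neighbours explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy graph: cycle a->b->c->a, cycle d<->e fed from c, and f calling a.
adj = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"], "f": ["a"]}
sccs = strongly_connected_components(adj)
print([sorted(c) for c in sccs])
```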

Page 15: Massivegraph telecom ppt

Shape of CALL Graphs

• Build a generative model to study and predict usage growth in a new region.

• Reach experiments are used to obtain the shape of the network.

• Structure based on node distribution

• Spot all the connected components, place them spatially with their interconnections, and identify the shape.

• Use random-start breadth-first search (BFS).

• The experiment collected a set of random sample nodes and computed the reach of each of these nodes.

Page 16: Massivegraph telecom ppt

• While ‘reachability’ of a node v refers to whether v is reachable from another node u, ‘reach’ of v means the set of nodes (or its cardinality) reachable from v. This is a nodal analysis.

• See Table IV.

• Reach (R) of a node u is the number of nodes reached in a BFS starting from u.

• Percentage reach (P = R/N) is the number of nodes reached as a percentage of the total number of nodes N in the graph.

• Reach probability (pR) is the probability, as a percentage, that a given node has reach R.
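
The three quantities can be sketched with a plain BFS. Two conventions below are assumptions: the start node is not counted in its own reach, and every node appears as a key of the adjacency dictionary. The study sampled random start nodes; the toy example uses every node so the output is deterministic.

```python
from collections import Counter, deque


def reach(adj, start):
    """Number of nodes reachable from `start` along out-edges (start excluded)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) - 1


def reach_statistics(adj, sample):
    """Return (R per node, percentage reach per node, reach probability per R)."""
    n = len(adj)
    reaches = {u: reach(adj, u) for u in sample}
    percentage = {u: 100.0 * r / n for u, r in reaches.items()}
    counts = Counter(reaches.values())
    probability = {r: 100.0 * c / len(reaches) for r, c in counts.items()}
    return reaches, percentage, probability


# Toy graph: cycle a->b->c->a, d calling into it, and an isolated node e.
adj = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"], "e": []}
reaches, percentage, probability = reach_statistics(adj, sorted(adj))
print(reaches)
print(probability)
```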

Page 17: Massivegraph telecom ppt

Bow-Tie Network (like the WWW)

• Reach is split between two ranges: 1-6 or 1022575-1022586

• Massively connected component, CC (nodes with reach exactly equal to 1022575)

• Entry component (nodes with reach more than 1022575)

• Exit component (nodes with reach less than 6)

• Disconnected components

Page 18: Massivegraph telecom ppt

• See Table V.

• The sizes of IN, SCC and OUT for the WWW are of nearly the same order (44 million, 56 million and 44 million, respectively).

• For our graphs, the SCC is often an order of magnitude larger than IN, and OUT is often nearly twice that of IN (124801, 755592, 266984 respectively).

• Bow-Tie model does not characterize our graphs.

Page 19: Massivegraph telecom ppt

Structure based on Edge Density

• Examine the number of vertices in the various regions (IN, OUT, etc.).

•From the BFS experiment, we know that starting from a particular node, the reach is either huge (>1022575) or very low (< 6).

•We collected the nodes whose reach is very high. These are the nodes of SCC and IN region.

•Starting from nodes with high reach, we collected the nodes that are reachable. These nodes belong to the SCC and OUT regions.

•We intersected these two sets to isolate SCC, IN and OUT, and extracted several edge-induced subgraphs, which are defined in Table VI.
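
The intersection step can be illustrated with the textbook forward/backward reachability variant, seeded from one node assumed to lie in the large SCC. This is a sketch of the general technique, not necessarily the authors' exact procedure, and the helper names are hypothetical. `induced_edges` shows how the edges between two regions (for example IN to SCC) could then be pulled out.

```python
from collections import deque


def reachable(adj, start):
    """Set of nodes reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie_regions(adj, seed):
    """Split nodes into IN, SCC, OUT and OTHER relative to the seed's SCC."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    reverse = {u: [] for u in nodes}
    for u, targets in adj.items():
        for v in targets:
            reverse[v].append(u)
    forward = reachable(adj, seed)       # SCC and OUT
    backward = reachable(reverse, seed)  # SCC and IN
    scc = forward & backward
    return {"IN": backward - scc, "SCC": scc, "OUT": forward - scc,
            "OTHER": nodes - forward - backward}


def induced_edges(adj, sources, targets):
    """Edges that start in `sources` and end in `targets`."""
    return {(u, v) for u in sources for v in adj.get(u, ()) if v in targets}


# Toy graph: i -> (a <-> b) -> o, plus a disconnected pair x -> y.
adj = {"i": ["a"], "a": ["b"], "b": ["a", "o"], "o": [], "x": ["y"], "y": []}
regions = bow_tie_regions(adj, "a")
print(regions)
print(induced_edges(adj, regions["IN"], regions["SCC"]))
```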

Page 20: Massivegraph telecom ppt
Page 21: Massivegraph telecom ppt

Left part and Right part give the number of nodes in the two vertex sets of each bipartite graph. The Edge ratio column reports the ratio of the edges in a particular component to the edges of the IN-SCC region.

Page 22: Massivegraph telecom ppt

CALL Graph as a Treasure Hunt Model

Page 23: Massivegraph telecom ppt

Temporal Analysis

• How some of the structural properties of these call graphs vary with time.

• For regions B and C, which had one week's data, the researchers looked at the cumulative CDRs at each of the seven days.

• For regions A and D, which had one month's data, they looked at seven time points at intervals of four days each.

•no noise...celebration...

Page 24: Massivegraph telecom ppt

Degree Distribution (degree increases with time)

Page 25: Massivegraph telecom ppt

Preferential Attachment: in the network, nodes with higher degree have a stronger ability to grab new links.

• The authors first found the in-degrees and out-degrees of nodes on the first day and then, for the same set of nodes, the average of their in-degrees and out-degrees on the seventh day.
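
This measurement can be sketched by grouping nodes on their day-one degree and averaging the day-seven degree within each group. The function name and the degree values are made up for illustration. Because the snapshots are cumulative, every day-one node is assumed to be present on day seven.

```python
from collections import defaultdict


def mean_later_degree(first_day, seventh_day):
    """Map each day-one degree to the mean day-seven degree of those nodes."""
    groups = defaultdict(list)
    for node, degree in first_day.items():
        groups[degree].append(seventh_day[node])
    return {degree: sum(later) / len(later)
            for degree, later in sorted(groups.items())}


# Invented degrees; node e joins after day one and is therefore not grouped.
first = {"a": 1, "b": 1, "c": 3, "d": 3}
seventh = {"a": 2, "b": 4, "c": 9, "d": 11, "e": 1}
growth = mean_later_degree(first, seventh)
print(growth)
```

If nodes with a higher starting degree show a disproportionately higher later degree, that is consistent with preferential attachment.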

Page 26: Massivegraph telecom ppt

Neighborhood Distribution

• Gives an indication of the effective diameter of a graph.

• The plot gives insights into how the diameter of the call graphs changes with time.

• The maximum distance between any pair of nodes in the graph decreases with time.

• This decreasing-diameter phenomenon is also observed in the WWW.

Page 27: Massivegraph telecom ppt

Cliques

• On the first day itself, the largest clique has size 7. By the fifth day, a clique of size 12 has formed.

• Linear increase in the number of cliques of smaller sizes each day.

• No cliques of size greater than 12 are formed in the last two days.

Page 28: Massivegraph telecom ppt

Strongly Connected Component

•Fraction of nodes present in the largest SCC increases rapidly with time.

•Fraction of nodes present in SCCs of the smallest sizes is decreasing with time.

•CALL graphs show a tendency of greater accumulation into a single SCC over time by taking in nodes from the smaller components.

Page 29: Massivegraph telecom ppt

Treasure Hunt Model

• The number of edges in the maze increases very rapidly. This is expected, given the observation that the percentage of nodes in the SCC was increasing.

• The sizes of the in-tunnel and out-tunnel are also increasing, but not as rapidly as the maze.

• The sizes of the treasure and entry components are decreasing. Shortcuts remain constant with time.

Page 30: Massivegraph telecom ppt

•The maze is getting bulkier by sucking in edges from the side components i.e. the entry on the one end and the treasure on the other end.

Page 31: Massivegraph telecom ppt

•The ratios of edges in the various components with respect to the edges in the in-tunnel component were similar for the 4 regions.

•Fraction of edges in the maze is increasing.

•Fraction of edges in entry & treasure are decreasing

•Fraction of edges in shortcuts and out-tunnel remain almost constant.

Page 32: Massivegraph telecom ppt

Will components collapse into a single large maze?

•Densification -- No.

•New people who join the network initially make or receive a few calls and hence are part of the IN or the OUT region.

•Over time they make and receive more calls thus pulling them into the SCC.

•The constantly high influx of new nodes into the IN and the OUT regions suggests against the total vanishing of the treasure and entry regions.

Page 33: Massivegraph telecom ppt

Conclusion

•Systematic approach to analyzing network topologies

•The shape of the CALL graph follows the Treasure Hunt Model

•Evolution of the CALL graph over time

•The SMS graph is social (more reciprocal, with larger clique sizes) (skipped in this presentation)

Thank you

Page 34: Massivegraph telecom ppt

SMS Graph Analysis (Appendix)
