
Statistical Inference for Large Directed Graphs with

Communities of Interest

Deepak Agarwal

Outline

• Communities of Interest : overview

• Why a probabilistic model?

• Bayesian Stochastic Blockmodels

• Example

• Ongoing work

Communities of interest

• Goal: understand the calling behavior of every TN on the AT&T LD network: a massive graph

• Corinna, Daryl and Chris invented COIs to scale computation, using Hancock (Anne Rogers and Kathleen Fisher)

• Definition: the COI of TN X is a subgraph centered around X
– Top k TNs called by X, plus "other"
– Top k TNs calling X, plus "other"
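The top-k signature above can be sketched in a few lines. This is an illustrative helper, not the Hancock implementation; the `calls` dict of (caller, callee) -> count is an assumed input format:

```python
from collections import Counter

def coi_signature(calls, x, k):
    """Top-k COI signature of seed TN x (illustrative sketch).

    calls: dict mapping (caller, callee) -> call count.
    Returns (top k TNs called by x, top k TNs calling x); all
    remaining traffic would fall into the 'other' bins.
    """
    out_counts = Counter({b: n for (a, b), n in calls.items() if a == x})
    in_counts = Counter({a: n for (a, b), n in calls.items() if b == x})
    top_out = [tn for tn, _ in out_counts.most_common(k)]
    top_in = [tn for tn, _ in in_counts.most_common(k)]
    return top_out, top_in

calls = {("X", "A"): 9, ("X", "B"): 5, ("X", "C"): 1,
         ("B", "X"): 4, ("D", "X"): 7}
top_out, top_in = coi_signature(calls, "X", 2)  # (["A", "B"], ["D", "B"])
```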

COI signature

[Figure: COI signature — seed TN X with its top inbound and outbound neighbors, plus the "other outbound" and "other inbound" bins]

• The entire graph is a union of COIs

• Extend a COI by recursively growing the spider
– Captures calling behavior more accurately

• Definition for this work:
– Grow the spider to depth 3; only retain depth-3 edges that are between depth-2 nodes.
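The extension rule can be sketched as a depth-2 BFS that keeps every edge internal to the visited node set — this retains exactly the depth-3 edges running between two depth-2 nodes while dropping edges that lead out to new depth-3 nodes. A sketch, assuming an undirected neighbor dict; not the Hancock implementation:

```python
def extended_coi(adj, seed):
    """Grow the COI spider from the seed to depth 2; keep every edge
    whose endpoints both lie within depth 2. 'adj' maps each node
    to a list of neighbors (an assumed input format)."""
    depth = {seed: 0}
    frontier = [seed]
    for d in (1, 2):
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in depth:
                    depth[v] = d
                    nxt.append(v)
        frontier = nxt
    # edges internal to the depth-<=2 node set (includes depth-3
    # edges between two depth-2 nodes)
    edges = {frozenset((u, v))
             for u in depth for v in adj.get(u, ()) if v in depth}
    return depth, edges

adj = {"s": ["a"], "a": ["s", "b", "c"], "b": ["a", "c", "d"],
       "c": ["a", "b"], "d": ["b"]}
depth, edges = extended_coi(adj, "s")  # "d" is at depth 3, so it is dropped
```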

Extended COI

[Figure: extended COI — the seed's spider grown to depth 3, with "other" bins attached to each node]

Enhancing a COI

• Missed calls:
– Local calls where the TNs are not AT&T-local
– Outbound OCC calls
– Calls to/from the bin "other"

• Big outbound and inbound TNs
– Dominate the COI; a lot of clutter
– Need to down-weight their calls

• Other issues: want to quantify, for every TN, things like the tendency to call, the tendency of being called, and the tendency of returning calls.

Our approach so far

• COI -> social network

• Want a statistical model that estimates missing edges, adds desired ones, and removes (or down-weights) undesired ones.

[Figure: COI rebuilt from the top-probability edges of a statistical model — the model adds new edges (brown arrows) and removes undesired ones]

Getting a sense of the data

Some descriptive statistics, based on a random sample of 500 residential COIs.

density = 100 * ne / (g(g-1)), where ne = number of edges and g = number of nodes.
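The density definition above is direct to compute; a minimal sketch:

```python
def coi_density(ne, g):
    """Percent density as defined above: 100 * ne / (g * (g - 1)),
    the fraction of possible directed edges (no self-loops)."""
    return 100.0 * ne / (g * (g - 1))

# e.g. the example COI later in the talk: 117 nodes, 172 edges
print(round(coi_density(172, 117), 2))  # -> 1.27
```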

[Plots: observed statistics vs. their expectations under a random graph — unconditional, conditional on out-degrees, and conditional on in-degrees]

Distribution of “Other”

Representing the Data

• Collection of all edges with activity

• Matrix with no diagonal entries

• Collection of several 2x2 contingency tables

COI: g×g matrix without diagonal entries

COI: collection of 2x2 tables.

• The data matrix is a collection of g(g-1)/2 2x2 tables (called dyads). For the dyad on nodes i and j:

               j->i present   j->i absent   Row total
i->j present   m_ij           a_ij          p_ij
i->j absent    a_ji           n_ij          1 - p_ij
Column total   p_ji           1 - p_ji      1
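The dyad view above can be tabulated directly from an adjacency matrix. A minimal sketch counting mutual, asymmetric, and null dyads:

```python
def dyad_census(y):
    """Count mutual, asymmetric, and null dyads in a 0/1 adjacency
    matrix y (list of lists, no self-loops) -- the 2x2-table view
    of the data, one table per unordered pair (i, j)."""
    g = len(y)
    mutual = asym = null = 0
    for i in range(g):
        for j in range(i + 1, g):
            if y[i][j] and y[j][i]:
                mutual += 1      # both i->j and j->i present
            elif y[i][j] or y[j][i]:
                asym += 1        # exactly one direction present
            else:
                null += 1        # no edge either way
    return mutual, asym, null
```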

More probabilities than edges.

Need to express them in terms of fewer parameters that can be learned from data.

likelihood(w) ∝ exp( θ Σ_{i≠j} w_ij + ρ M + Σ_i α_i w_{i+} + Σ_j β_j w_{+j} + λ_s Σ_i s_i w_{i+} + λ_r Σ_j r_j w_{+j} + γ Σ_{i≠j} z_ij w_ij )

where M = Σ_{i<j} w_ij w_ji counts reciprocated dyads, w_{i+} and w_{+j} are row and column totals, s_i and r_j are caller and callee covariates, and z_ij is an edge covariate.

All Greek letters to be estimated from data
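A minimal sketch of evaluating the exponent of this likelihood for a 0/1 adjacency matrix. The function and argument names are illustrative only — the actual implementation (likelihood, gradient, and Hessian in C, optimizer in R) is the authors':

```python
def p1_kernel(y, theta, rho, alpha, beta, lam_s, lam_r, gamma, s, r, z):
    """Exponent of the likelihood above for 0/1 adjacency matrix y:
    theta*sum(y) + rho*M + sum_i alpha_i y_{i+} + sum_j beta_j y_{+j}
    + lam_s * sum_i s_i y_{i+} + lam_r * sum_j r_j y_{+j}
    + gamma * sum z_ij y_ij, with M = sum_{i<j} y_ij y_ji."""
    g = len(y)
    total = 0.0
    M = 0.0
    for i in range(g):
        for j in range(g):
            if i == j:
                continue
            total += y[i][j] * (theta + alpha[i] + beta[j]
                                + lam_s * s[i] + lam_r * r[j]
                                + gamma * z[i][j])
            if i < j:
                M += y[i][j] * y[j][i]   # reciprocated dyad
    return total + rho * M
```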

Computation: 2 minutes for a typical COI on fry.

Likelihood, gradient, and Hessian computed in C; optimizer in R.

The optimizer misbehaves due to the presence of so many zero-degree nodes.

Regularize — known as "shrinkage estimation" in statistics: incur bias for small-degree nodes in exchange for a reduction in variance.

Meaning of parameters

• Node i:
– αi: expansiveness (tendency to call)
– βi: attractiveness (tendency of being called)

• Global parameters:
– θ: density of COI (decreases with increasing sparseness)
– ρ: reciprocity of COI (tendency to return calls)
– λs: "caller"-specific effect
– λr: "callee"-specific effect
– γ: "call"-specific effect

Differential reciprocity

• Different reciprocity for each node:
– Add another parameter ηi to node i

– Replace ρM by ρM + Σi ηi Mi in the likelihood

– Called the "differential reciprocity" model

– Computationally challenging; we have implemented it.
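The replaced term is cheap to compute once the mutual dyads are counted per node. A sketch (illustrative names, not the authors' code):

```python
def reciprocity_terms(y, rho, eta):
    """Differential-reciprocity term rho*M + sum_i eta_i * M_i, where
    M is the number of mutual dyads and M_i counts mutual dyads that
    touch node i, for 0/1 adjacency matrix y."""
    g = len(y)
    M = 0
    Mi = [0] * g
    for i in range(g):
        for j in range(i + 1, g):
            if y[i][j] and y[j][i]:    # reciprocated dyad (i, j)
                M += 1
                Mi[i] += 1
                Mi[j] += 1
    return rho * M + sum(eta[i] * Mi[i] for i in range(g))
```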

Missing edges?

• Can estimate all parameters as long as we have some observed edges in the data matrix
– for each row (to estimate expansiveness)
– for each column (to estimate attractiveness)

• Missing local calls -> o.k.

• OCC -> problem: the entire row is missing.
– Impute data m times under reasonable assumptions (typically m = 3 is o.k.) and combine the results. Working on it.
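The impute-m-times-and-combine step can be sketched generically: complete the data m times, fit each completed copy, and average the point estimates (the standard combining rule). `fit` and `impute` are hypothetical callables, not the authors' code:

```python
import random

def impute_and_combine(fit, impute, data, m=3, seed=0):
    """Multiple-imputation sketch: 'impute' fills in missing parts of
    'data' (possibly using randomness from rng), 'fit' returns a list
    of parameter estimates; the m estimates are averaged."""
    rng = random.Random(seed)
    estimates = [fit(impute(data, rng)) for _ in range(m)]
    k = len(estimates[0])
    return [sum(e[i] for e in estimates) / m for i in range(k)]
```

With a deterministic `impute`, the combined estimate equals any single fit, which makes the rule easy to sanity-check.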

Incorporating edge weights

• Edge weights are binned into k bins using a random sample of 500 COIs; weights in the i-th bin are assigned a score of i.

tij is unknown; the w's are the weights on dyad (i, j). tij is imputed using a Hypergeometric distribution with the margins below:

               reciprocated     not reciprocated         Row total
i->j weight    tij              wij - tij                wij
remainder      wji - tij        k - wij - wji + tij      k - wij
Column total   wji              k - wji                  k
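A Hypergeometric draw with margins wij, wji and total k can be simulated with the standard library by shuffling wij "successes" among k slots and counting successes in the first wji slots. An illustrative sketch of the imputation step:

```python
import random

def impute_tij(w_ij, w_ji, k, rng=None):
    """Impute the unobserved cell t_ij of the dyad weight table by a
    Hypergeometric(k, w_ij, w_ji) draw: t_ij is the number of the
    w_ji column 'draws' that land on one of the w_ij row 'successes'."""
    rng = rng or random.Random(0)
    slots = [1] * w_ij + [0] * (k - w_ij)
    rng.shuffle(slots)
    return sum(slots[:w_ji])
```

The draw always satisfies the table's bounds, max(0, w_ij + w_ji - k) <= t_ij <= min(w_ij, w_ji), and averages to w_ij * w_ji / k over repeated imputations.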

Example

• COI with 117 nodes, 172 edges.

• 14 missing edges: local calls from 14 non-AT&T local customers to the seed node (local list provided by Gus).

• One edge attribute: number of common "buddies" between TN i and TN j.

• Tried Bizocity and "localness to seed" for the caller and callee effects; eventually settled on one caller effect, viz. localness to seed, and no callee effect.

Parameter estimates

• θ = -6.28; ρ = 2.76 (on the higher side)

• λs = .29 (TNs local to the seed have a higher tendency to call)

• γ = .41 (common acquaintances between two TNs increase their tendency to call each other)

Pruning the big (red) nodes

• Down-weight expansiveness/attractiveness based on the proportion of volume going to "other"; higher values get down-weighted more, by adding an "offset".
– Renormalize the new probability matrix to have the same mass as the original one.

• Offset function used:

f(other) = log(1 + tan(.5·other) · tan(.5·a))   if other > a
f(other) = 0                                    if other ≤ a

where "other" is the node's proportion of volume going to the "other" bin and a is a threshold.
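The down-weight-then-renormalize step can be sketched as follows. The tan-based offset form and the threshold `a` are only a guess at the garbled slide formula; the scaffold being illustrated is applying an offset to both endpoints of each edge and rescaling to preserve total mass:

```python
import math

def prune_big_nodes(p, other_prop, a=0.5):
    """Down-weight edge probabilities of nodes whose proportion of
    volume going to "other" exceeds a, then renormalize the matrix
    to its original total mass. Offset form is illustrative only."""
    def offset(o):
        if o <= a:
            return 0.0
        return math.log(1 + math.tan(0.5 * o) * math.tan(0.5 * a))
    g = len(p)
    total = sum(p[i][j] for i in range(g) for j in range(g) if i != j)
    # apply the offset on the log scale for both sender and receiver
    q = [[p[i][j] * math.exp(-(offset(other_prop[i]) + offset(other_prop[j])))
          if i != j else 0.0
          for j in range(g)] for i in range(g)]
    mass = sum(q[i][j] for i in range(g) for j in range(g) if i != j)
    scale = total / mass if mass else 0.0
    return [[q[i][j] * scale for j in range(g)] for i in range(g)]
```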

Matrix obtained by taking the union of the top 50 data edges, the top 50 edges from the original model, and the top 50 edges from the pruned model.

Where to from here?

• Estimate missing OCC calls: multiple imputation.

• Scale the algorithm to get parameter estimates for every TN, maybe on a weekly basis; enrich the customer signature.

• Can compute the Hellinger distance between two COIs in closed form. Could be useful in supervised learning tasks like tracking repetitive debtors.
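For reference, the Hellinger distance between two discrete distributions over the same support is straightforward; this generic sketch is not the slide's closed form for model-based COIs, which is not shown here:

```python
import math

def hellinger(p, q):
    """Hellinger distance between discrete distributions p and q on
    a common support: sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2).
    Ranges from 0 (identical) to 1 (disjoint supports)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))
```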