
Statistical Models of (Social) Networks

Andrew McCallum

Computer Science Department

University of Massachusetts Amherst

Joint work with

Xuerui Wang, Natasha Mohanty, Andres Corrada


Workplace effectiveness ~ Ability to leverage network of acquaintances

But filling Contacts DB by hand is tedious, and incomplete.

[Figure labels: Email Inbox, WWW, Contacts DB, “Automatically”]

Managing and Understanding Connections of People in our Email World


System Overview

[Figure: pipeline diagram. Component labels: Email, Person Name Extraction, Name Coreference, Homepage Retrieval, Contact Info and Person Name Extraction, Social Network Analysis, Keyword Extraction. Other labels: CRF, WWW, names.]


An Example

To: “Andrew McCallum” [email protected]
Subject ...

First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…

Search for new people


Summary of Results

Contact info and name extraction performance (25 fields)

       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76

Example keywords extracted

Person              Keywords
William Cohen       Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller       Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuiness   Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell        Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
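The Field F1 figure in the extraction-performance results is the harmonic mean of field precision and field recall. A minimal sketch that reproduces the reported value from the other two numbers (the function name `f1_score` is an illustrative choice, not something from the talk):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Field-level precision and recall reported for the CRF, in percent.
crf_f1 = f1_score(85.73, 76.33)
print(round(crf_f1, 2))
```

The harmonic mean penalizes the gap between the two scores, which is why F1 sits closer to the lower recall figure than a plain average would.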

1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large organizations by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)

2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.


Outline

• Social Network Analysis with (Language) Attributes

– Roles and Topics (Author-Recipient-Topic Model)

– Groups and Topics (Group-Topic Model)

• Demo: Rexa, a Web portal for researchers


Clustering words into topics with Latent Dirichlet Allocation

[Blei, Ng, Jordan 2003]

Generative Process:

For each document:
  Sample a distribution over topics, θ
  For each word in doc:
    Sample a topic, z
    Sample a word from the topic, w

Example:

Distribution over topics: 70% Iraq war, 30% US election
Sampled topic: Iraq war
Sampled word: “bombing”
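The generative process on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the speaker's code: the vocabulary, the two topic-word distributions, and the Dirichlet parameter are hypothetical values chosen to echo the "Iraq war" / "US election" example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topic_word, vocab, rng):
    """Generate one document: theta once per document, then (z, w) per word."""
    theta = sample_dirichlet(alpha, rng)  # distribution over topics
    topic_ids = list(range(len(topic_word)))
    doc = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]      # sample a topic
        w = rng.choices(vocab, weights=topic_word[z])[0]  # sample a word from it
        doc.append((z, w))
    return theta, doc


rng = random.Random(0)
vocab = ["bombing", "troops", "ballot", "campaign"]  # hypothetical vocabulary
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: "Iraq war" (illustrative probabilities)
    [0.0, 0.0, 0.5, 0.5],  # topic 1: "US election"
]
theta, doc = generate_document(8, [1.0, 1.0], topic_word, vocab, rng)
```

Fitting LDA runs this story in reverse: given only the words, infer the per-document topic mixtures and the per-topic word distributions.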


Example topics induced from a large collection of text

STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL

MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE

WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER

DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN

FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED

SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES

BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY

JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE

[Tenenbaum et al]


From LDA to Author-Recipient-Topic (ART)
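Only the title of this slide survives (the graphical models were images). As a hedged sketch of the step from LDA to ART, following the published description of the ART model: each word of a message is generated by picking one of the message's recipients uniformly, then a topic from a distribution specific to that author-recipient pair, then a word from that topic. All names and numbers below (`generate_message`, `theta`, `phi`, the people, the vocabulary) are hypothetical.

```python
import random


def generate_message(author, recipients, n_words, theta, phi, vocab, rng):
    """Generate the words of one message under an ART-style process.

    theta[(author, recipient)] is a distribution over topics for that pair;
    phi[topic] is a distribution over the vocabulary.
    """
    topic_ids = list(range(len(phi)))
    words = []
    for _ in range(n_words):
        x = rng.choice(recipients)  # one recipient, chosen uniformly
        z = rng.choices(topic_ids, weights=theta[(author, x)])[0]  # pair-specific topic
        w = rng.choices(vocab, weights=phi[z])[0]  # word from that topic
        words.append((x, z, w))
    return words


rng = random.Random(0)
vocab = ["contract", "draft", "dinner", "weekend"]
phi = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: work-like words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: social words
]
theta = {
    ("alice", "bob"): [0.9, 0.1],
    ("alice", "carol"): [0.1, 0.9],
}
message = generate_message("alice", ["bob", "carol"], 6, theta, phi, vocab, rng)
```

The contrast with LDA is that the topic mixture is indexed by who is writing to whom rather than by document, which is what lets the model relate topics to roles.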


Inference and Estimation

Gibbs Sampling:
- Easy to implement
- Reasonably fast
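To give a sense of why Gibbs sampling is described as easy to implement, here is a minimal collapsed Gibbs sampler for plain LDA. It is a sketch under assumptions, not the ART sampler from the talk: the ART version would additionally resample a recipient for each word, and the hyperparameter values and function name here are illustrative.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    topic_ids = list(range(n_topics))
    doc_topic = [[0] * n_topics for _ in docs]                # n(d, t)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(t, w)
    topic_total = [0] * n_topics                              # n(t)
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every word
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # take the word's current topic out of the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional of z given all other assignments (up to a constant).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topic_ids
                ]
                z = rng.choices(topic_ids, weights=weights)[0]
                assign[d][i] = z  # put the word back under its new topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assign, doc_topic, topic_word


docs = [[0, 1, 0, 1], [2, 3, 2, 3]]  # toy corpus of word ids
assign, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4, n_iters=20)
```

The whole sampler is count bookkeeping plus one weighted draw per word, because the topic and word distributions have been integrated out.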


Enron Email Corpus

• 250k email messages

• 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.

DP

Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas
[email protected]


Topics, and prominent senders / receivers discovered by ART

Topic names, by hand


Topics, and prominent senders / receivers discovered by ART

Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”


Comparing Role Discovery

connection strength (A,B) =

Traditional SNA: distribution over recipients

Author-Topic: distribution over authored topics

ART: distribution over authored topics
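Each method on this slide represents a person as a probability distribution, so comparing the roles of two people comes down to comparing two distributions. The slide does not say which measure is used; the sketch below assumes Jensen-Shannon divergence, a common symmetric choice, and the example distributions are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and 0 only when p equals q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical per-person distributions over four topics.
person_a = [0.70, 0.20, 0.05, 0.05]
person_b = [0.65, 0.25, 0.05, 0.05]
person_c = [0.05, 0.05, 0.20, 0.70]

similar = js_divergence(person_a, person_b)    # small: similar roles
different = js_divergence(person_a, person_c)  # large: different roles
```

A smaller divergence means a stronger role similarity; swapping in distributions over recipients gives the traditional SNA variant of the same comparison.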


Comparing Role Discovery: Tracy Geaconne vs. Dan McCarty

Traditional SNA   Author-Topic      ART
Similar roles     Different roles   Different roles

Geaconne = “Secretary”
McCarty = “Vice President”


Comparing Role Discovery: Tracy Geaconne vs. Rod Hayslett

Traditional SNA   Author-Topic   ART
Different roles   Very similar   Not very similar

Geaconne = “Secretary”
Hayslett = “Vice President & CTO”


Comparing Role Discovery: Lynn Blair vs. Kimberly Watson

Traditional SNA   Author-Topic     ART
Different roles   Very different   Very similar

Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”


McCallum Email Corpus 2004

• January - October 2004

• 23k email messages

• 825 people

From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]

There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:

NIPS registration receipt.
CALO registration receipt.

Thanks,
Kate


McCallum Email Blockstructure


Four most prominent topics in discussions with ____?


Two most prominent topics in discussions with ____?

Words       Prob
love        0.030514
house       0.015402
            0.013659
time        0.012351
great       0.011334
hope        0.011043
dinner      0.00959
saturday    0.009154
left        0.009154
ll          0.009009
            0.008282
visit       0.008137
evening     0.008137
stay        0.007847
bring       0.007701
weekend     0.007411
road        0.00712
sunday      0.006829
kids        0.006539
flight      0.006539

Words       Prob
today       0.051152
tomorrow    0.045393
time        0.041289
ll          0.039145
meeting     0.033877
week        0.025484
talk        0.024626
meet        0.023279
morning     0.022789
monday      0.020767
back        0.019358
call        0.016418
free        0.015621
home        0.013967
won         0.013783
day         0.01311
hope        0.012987
leave       0.012987
office      0.012742
tuesday     0.012558


Pairs with highest rank difference between ART & SNA

5 other professors
3 other ML researchers


Role-Author-Recipient-Topic Models


Results with RART: People in “Role #3” in Academic Email

• olc: lead Linux sysadmin

• gauthier: sysadmin for CIIR group

• irsystem: mailing list CIIR sysadmins

• system: mailing list for dept. sysadmins

• allan: Prof., chair of “computing committee”

• valerie: second Linux sysadmin

• tech: mailing list for dept. hardware

• steve: head of dept. I.T. support


Roles for allan (James Allan)

• Role #3: I.T. support

• Role #2: Natural Language researcher

Roles for pereira (Fernando Pereira)

• Role #2: Natural Language researcher

• Role #4: SRI CALO project participant

• Role #6: Grant proposal writer

• Role #10: Grant proposal coordinator

• Role #8: Guests at McCallum’s house


ART: Roles but not Groups

Traditional SNA    Author-Topic   ART
Block structured   Not            Not

Enron TransWestern Division


Outline

• Social Network Analysis with (Language) Attributes

– Roles and Topics (Author-Recipient-Topic Model)

– Groups and Topics (Group-Topic Model)

• Demo: Rexa, a Web portal for researchers


Groups and Topics

• Input:

– Observed relations between people

– Attributes on those relations (text, or categorical)

• Output:

– Attributes clustered into “topics”

– Groups of people, varying depending on topic


Discovering Groups from Observed Set of Relations

Admiration relations among six high school students.

Student Roster

Adams
Bennett
Carter
Davis
Edwards
Frederking

Academic Admiration

Acad(A, B) Acad(C, B)
Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E)
Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A)
Acad(E, C) Acad(F, C)


Adjacency Matrix Representing Relations

[Figure: 6×6 adjacency matrix of the Acad relation, shown twice. First with rows and columns in roster order A B C D E F (groups G1 G2 G1 G2 G3 G3), then reordered by group as A C B D E F (groups G1 G1 G2 G2 G3 G3).]
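The adjacency matrices can be rebuilt from the Acad relations listed for the six students. The sketch below constructs the matrix in roster order and then reorders rows and columns by group, which is what makes the block structure visible. The helper name `adjacency` is illustrative; the group labels are the ones shown on the slide.

```python
roster = ["A", "B", "C", "D", "E", "F"]
acad = [
    ("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
    ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
    ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C"),
]
groups = {"A": "G1", "B": "G2", "C": "G1", "D": "G2", "E": "G3", "F": "G3"}


def adjacency(order, relations):
    """Binary matrix with a 1 at [i][j] when order[i] admires order[j]."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for src, dst in relations:
        matrix[index[src]][index[dst]] = 1
    return matrix


roster_matrix = adjacency(roster, acad)

# Stable sort by group label gives the order A C B D E F.
by_group = sorted(roster, key=lambda name: groups[name])
block_matrix = adjacency(by_group, acad)

for name, row in zip(by_group, block_matrix):
    print(name, groups[name], row)
```

In group order, every member of G1 admires exactly the members of G2, G2 admires G3, and G3 admires G1, so the ones fall into three solid 2×2 blocks.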


Group Model: Partitioning Entities into Groups

Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]

[Figure: graphical model. Variables: g (plate S), v (plate S²), γ (plate G²); hyperparameters α, β. Distribution labels: Beta, Dirichlet, Binomial, Multinomial.]

S: number of entities

G: number of groups

Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]
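The group model can be read as a generative recipe. The sketch below is one plausible reading of the diagram labels, stated as an assumption: group proportions drawn from a Dirichlet with parameter α, a group g for each of the S entities, a link probability γ for each of the G² group pairs drawn from a Beta with parameter β, and a binary relation v for each of the S² entity pairs. The function name and parameter values are hypothetical.

```python
import random


def sample_blockmodel(n_entities, n_groups, alpha, beta, rng):
    """Sample group labels and a binary relation matrix from a blockmodel."""
    # Group proportions ~ Dirichlet(alpha), via normalized Gamma draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_groups)]
    total = sum(draws)
    proportions = [d / total for d in draws]
    # g[i]: one group per entity (S of them).
    g = rng.choices(range(n_groups), weights=proportions, k=n_entities)
    # gamma[h][k]: link probability from group h to group k (G^2 of them).
    gamma = [
        [rng.betavariate(beta, beta) for _ in range(n_groups)]
        for _ in range(n_groups)
    ]
    # v[i][j]: relation between entities i and j (S^2 of them).
    v = [
        [int(rng.random() < gamma[g[i]][g[j]]) for j in range(n_entities)]
        for i in range(n_entities)
    ]
    return g, gamma, v


g, gamma, v = sample_blockmodel(6, 3, alpha=1.0, beta=1.0, rng=random.Random(0))
```

Whether two entities are related depends only on which groups they belong to, which is why sorting the adjacency matrix by group produces blocks.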


Two Relations with Different Attributes

[Figure: two adjacency matrices. Academic Admiration with entities ordered A C B D E F (groups G1 G1 G2 G2 G3 G3); Social Admiration with entities ordered A C E B D F (groups G1 G1 G1 G2 G2 G2).]

Social Admiration

Soci(A, B) Soci(A, D) Soci(A, F)
Soci(B, A) Soci(B, C) Soci(B, E)
Soci(C, B) Soci(C, D) Soci(C, F)
Soci(D, A) Soci(D, C) Soci(D, E)
Soci(E, B) Soci(E, D) Soci(E, F)
Soci(F, A) Soci(F, C) Soci(F, E)


Goal: Model relations and their (textual) attributes simultaneously to obtain better groups and more meaningful topics.

budget, funding, annual, cash

document, corrections, review, annual


The Group-Topic Model: Discovering Groups and Topics Simultaneously

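[Figure: graphical model of the Group-Topic model. Variables: w (plate N_b), t (plate B), φ (plate T), η, g (plate S), v (plate S²), γ (plate G²), α, β, with a further plate T. Distribution labels: Dirichlet, Multinomial, Uniform, Beta, Dirichlet, Binomial, Multinomial.]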
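As a hedged sketch of how the Group-Topic model combines the two pieces: each event (for example a bill) gets a topic drawn uniformly, its words come from that topic, and the relations among entities for that event depend on topic-specific group assignments. Details below are assumptions made for illustration: the per-topic group proportions `theta`, drawing γ separately for each event, and all names and sizes.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_gt(n_events, n_entities, n_topics, n_groups, vocab, words_per_event,
              alpha=1.0, beta=1.0, eta=1.0, seed=0):
    """Sample topic-specific groups, then words and relations for each event."""
    rng = random.Random(seed)
    group_ids = list(range(n_groups))
    phi = [sample_dirichlet(eta, len(vocab), rng) for _ in range(n_topics)]
    theta = [sample_dirichlet(alpha, n_groups, rng) for _ in range(n_topics)]
    # g[t][i]: group of entity i under topic t (groups vary by topic).
    g = [rng.choices(group_ids, weights=theta[t], k=n_entities)
         for t in range(n_topics)]
    events = []
    for _ in range(n_events):
        t = rng.randrange(n_topics)  # topic of this event, uniform
        words = rng.choices(vocab, weights=phi[t], k=words_per_event)
        gamma = [[rng.betavariate(beta, beta) for _ in group_ids]
                 for _ in group_ids]
        v = [[0] * n_entities for _ in range(n_entities)]
        for i in range(n_entities):
            for j in range(i + 1, n_entities):
                a, b = sorted((g[t][i], g[t][j]))
                # Symmetric relation: the same value is stored for (i, j) and (j, i).
                v[i][j] = v[j][i] = int(rng.random() < gamma[a][b])
        events.append({"topic": t, "words": words, "relations": v})
    return g, events


vocab = ["budget", "funding", "review", "corrections"]
g, events = sample_gt(n_events=3, n_entities=4, n_topics=2, n_groups=2,
                      vocab=vocab, words_per_event=5)
```

Because the group assignment is indexed by topic, the same entity can land in different coalitions for different topics.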


Inference and Estimation

Gibbs Sampling:
- Many r.v.s can be integrated out
- Easy to implement
- Reasonably fast

We assume the relationship is symmetric.


Dataset #1: U.S. Senate

• 16 years of voting records in the US Senate (1989 – 2005)

• a Senator may respond Yea or Nay to a resolution

• 3423 resolutions with text attributes (index terms)

• 191 Senators in total across 16 years

S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991)
Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No: 102-242.
Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas, and other 110 terms

Adams (D-WA), Nay Akaka (D-HI), Yea Bentsen (D-TX), Yea Biden (D-DE), Yea Bond (R-MO), Yea Bradley (D-NJ), Nay Conrad (D-ND), Nay ……


Topics Discovered (U.S. Senate)

Mixture of Unigrams

Education     Energy      Military Misc.   Economic
education     energy      government       federal
school        power       military         labor
aid           water       foreign          insurance
children      nuclear     tax              aid
drug          gas         congress         tax
students      petrol      aid              business
elementary    research    law              employee
prevention    pollution   policy           care

Group-Topic Model

Education + Domestic   Foreign        Economic    Social Security + Medicare
education              foreign        labor       social
school                 trade          insurance   security
federal                chemicals      tax         insurance
aid                    tariff         congress    medical
government             congress       income      care
tax                    drugs          minimum     medicare
energy                 communicable   wage        disability
research               diseases       business    assistance


Groups Discovered (US Senate)

Groups from topic Education + Domestic


Senators Who Change Coalition the most Dependent on Topic

e.g. Senator Shelby (D-AL) votes
with the Republicans on Economic,
with the Democrats on Education + Domestic,
with a small group of maverick Republicans on Social Security + Medicare.


Dataset #2: The UN General Assembly

• Voting records of the UN General Assembly (1990 - 2003)

• A country may choose to vote Yes, No or Abstain

• 931 resolutions with text attributes (titles)

• 192 countries in total

• Also experiments later with resolutions from 1960-2003

Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting

The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions:

In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.


Topics Discovered (UN)

Mixture of Unigrams

Everything Nuclear   Human Rights   Security in Middle East
nuclear              rights         occupied
weapons              human          israel
use                  palestine      syria
implementation       situation      security
countries            israel         calls

Group-Topic Model

Nuclear Non-proliferation   Nuclear Arms Race   Human Rights
nuclear                     nuclear             rights
states                      arms                human
united                      prevention          palestine
weapons                     race                occupied
nations                     space               israel


Groups Discovered (UN)

The countries listed for each group are ordered by their 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.


Do We Get Better Groups with the GT Model?

Baseline Model

1. Cluster bills into topics using mixture of unigrams;

2. Apply group model on topic-specific subsets of bills.

GT Model

1. Jointly cluster topics and groups at the same time using the GT model.

Agreement Index (AI) measures group cohesion. Higher is better.

Datasets   Avg. AI for Baseline   Avg. AI for GT   p-value
Senate     0.8198                 0.8294           <.01
UN         0.8548                 0.8664           <.01
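The slide does not give the formula for the Agreement Index. One definition used in roll-call voting analysis, assumed here, scores a group's votes on a single resolution as 1 when everyone votes the same way and 0 when Yes, No and Abstain are evenly split. The function name and the example counts are illustrative.

```python
def agreement_index(yes: int, no: int, abstain: int = 0) -> float:
    """Cohesion of one group's votes on one resolution, between 0 and 1."""
    total = yes + no + abstain
    if total == 0:
        raise ValueError("no votes cast")
    largest = max(yes, no, abstain)
    # Size of the plurality, penalized by half of everyone outside it.
    return (largest - 0.5 * (total - largest)) / total


unanimous = agreement_index(10, 0)        # everyone votes Yes
mostly_agree = agreement_index(8, 2)      # two dissenters out of ten
evenly_split = agreement_index(5, 5, 5)   # three-way tie
```

Averaging this score over the resolutions in a topic and over the discovered groups would give a per-dataset figure of the kind reported in the table.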


Groups and Topics, Trends over Time (UN)


Outline

• Social Network Analysis with (Language) Attributes

– Roles and Topics (Author-Recipient-Topic Model)

– Groups and Topics (Group-Topic Model)

• Demo: Rexa, a Web portal for researchers


Previous Systems


Previous Systems

[Figure: entity-relation diagram. Entity: Research Paper. Relation: Cites.]


More Entities and Relations

[Figure: entity-relation diagram. Entities: Research Paper, Person, University, Venue, Grant, Groups, Expertise. Relation shown: Cites.]


Outline

• Examples of IE and Data Mining.

• Brief introduction of Conditional Random Fields

• Joint inference: Motivation and examples

– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling for Transfer Learning (Piecewise Training & BP)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Sparse BP)

• Joint Topic Discovery and Social Network Analysis

– Roles and Topics (Author-Recipient-Topic Model)

– Groups and Topics (Group-Topic Model)

• Demo: Rexa, a Web portal for researchers


End of Talk


Summary

• Traditionally, SNA examines links, but not the language content on those links.

• Presented ART, a Bayesian network for messages sent in a social network: captures topics and role-similarity.

• RART explicitly represents roles.

• Additional work

– Group-Topic model discovers groups and clusters attributes of relations. [Wang, Mohanty, McCallum, LinkKDD 2005]