Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach...

62
Machine Learning for Complex Social Processes Hanna Wallach University of Massachusetts Amherst [email protected]

Transcript of Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach...

Page 1: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Machine Learning for Complex Social Processes

Hanna WallachUniversity of Massachusetts Amherst

[email protected]

Page 2: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 3

Complex Social Processes

Page 3: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 4

National Institutes of Health

Page 4: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 5

United States Patent System

Page 5: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 6

Representatives and Constituents

Page 6: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 7

Social Processes: Structure

Page 7: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 8

Social Processes: Content

$$$

$$$

Page 8: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 9

Social Processes: Dynamics

$$$

$$$

Page 9: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 10

Social Processes: Dynamics

$$$

$$$

Page 10: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 11

Social Processes: Dynamics

$$$

$$$

Page 11: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 12

Social Processes: Dynamics

$$$

$$$

Page 12: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 13

Social Processes: Dynamics

$$$

$$$

Page 13: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 14

Modeling Social Processes

“Policy-makers or computer scientists may be interested in finding the needle in the haystack (such as a potential terrorist threat or the right web page to display from a search), but social scientists are more commonly interested in characterizing the haystack.”

― King & Hopkins, 2010

Page 14: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 15

Predictive Analyses

$$$ ???

Page 15: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 16

Explanatory Analyses

$$$???

Page 16: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 17

Exploratory Analyses

$$$ = ???

Page 17: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 18

Bayesian Latent Variable Models

● Modeling challenges:

– Aggregating and representing large data sets

– Handling data from sources with disparate emphases

– Efficiently reasoning under uncertain information

● Bayesian latent (i.e., hidden) variable models:

– Appropriate for prediction, explanation, and exploration

– Interpretable structure, not “black-box” models

– Powerful, flexible, widely applicable...

Page 18: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 19

This Talk

Page 19: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 20

This Talk

Page 20: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 21

Communication Networks

Page 21: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 22

Communication Networks

Page 22: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 23

Communication Networks

Page 23: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 24

Observing Communication Networks

Page 24: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 25

Structure and Content

Subject: New Hanover County Public Safety Talk GroupsFrom: “Lee, Warren” <[email protected]>To: “Pope, Troy W.” <[email protected]>Cc: …

Troy,

I wanted to give you anupdate on our progress inmoving towards a fullydigital public safety radiosystem in New HanoverCounty...

Page 25: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 26

New Hanover County, NC

Page 26: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 27

NHC Email Network

Page 27: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 28

Levels of Granularity

= ???

Page 28: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 29

Levels of Granularity

= ???

Page 29: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 30

Levels of Granularity

= ??? = ??? +

Page 30: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 31

Principled Visualization

● Common workflow:

– Construct a statistical model of observed data

– Perform post-hoc visualization to draw conclusions about the model and its relationship to the data

● Problem: visualization algorithms can produce visual artifacts that may be misleading

● Solution: visualizations should be directly interpretable in terms of the model and its relationship to the data

Page 31: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 32

Exploring Structure and Content

● Facilitate exploratory analysis of topic-specific communication patterns by learning

– Topics of communication

– Topic-specific communication subnetworks

– Principled visualizations of topic-sepcific subnetwork

● Draw upon ideas from two well-known frameworks:

– Statistical topic modeling

– Latent space network modeling

Page 32: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 33

Topics and Words

gene ncbi computer patent

genome national modeling patenting

dna information data claims

genetic technology algorithm intellectual

genes database analyses property

sequence molecular method rights

human biology model ip

protein genbank information innovation

rna pubmed efficient claim

genomic references complexity claiming

... ... ... ...

pro

babili

ty

Page 33: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 34

Documents and Topics

Page 34: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 35

Latent Dirichlet Allocation

probability

...

??

?? ?

[Blei, Ng & Jordan, '03]

Page 35: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 36

Individuals and Latent Spaces

every individual is associated

with a position in latent space

Page 36: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 37

Latent Space Network Model

probability of communication

depends on distance in

latent space

[Hoff et al., '02]

Page 37: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 38

Topics and Spaces

gene ncbi computer patent

genome national modeling patenting

dna information data claims

genetic technology algorithm intellectual

... ... ... ...

9

456

12

3

710

8 2

109

8

43

65

7

1210

9

84

3

65

71

2

98 10

14

3

56

7

Page 38: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 39

A New Model...

● Model email content using LDA

● Model recipients using topic-specific latent spaces

● Generative process:

– Generate topics and topic-specific latent spaces

– Generate document-specific topic distributions

– Generate recipients using latent spaces

– Generate words using topics

[Krafft et al., '12]

Page 39: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 40

Graphical Model

topicinference

subnetworkinference

subnetworkvisualization

Page 40: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 41

Experimental Evaluation

● Quantitative model validation:

– Link prediction performance vs. baselines

– Posterior predictive checks

– Topic coherence vs. LDA

● Exploratory analysis:

– Modularity: disconnected components

– Assortativity: components of a single “type”

Page 41: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 42

Link Prediction

Page 42: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 43

Posterior Predictive Checks

degree geodesic distance

transitivity dyadic intensity

Page 43: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 44

Topic Coherence

[Mimno et al., '11]

Page 44: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 45

Organization Structure

Page 45: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 46

High Modularity, High Assortativity

Meeting Schedulingmeeting march board agenda week

Page 46: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 47

High Modularity, Low Assortativity

Public Signagechange signs sign process ordinance

Page 47: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 48

Low Modularity, Low Assortativity

Public Relationscity breakdown information give

Page 48: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 49

Low Modularity, High Assortativity

Broadcast Messagesfw fyi bulletin summary week

Page 49: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 50

Take Away Message

● Explanatory and exploratory analyses matter

● Communication networks are important:

– Critical to all kinds of collaborative problem solving

– … but can be hard to directly observe

● Topic-partitioned multinetwork embedding:

– Good model of structure and content

– Emphasizes principled visualization

Page 50: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 51

This Talk

Page 51: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 52

Transparency in the US

● 52.8 million pages reviewed for declassification

● 26.7 million pages declassified

● $11.36 billion spent on administration of the US government classification system

[ISOO, 2011]

Page 52: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 53

● Date issued

● Date declassified

● Document type

● Source institution

● Classification level

● Document text

Declassified Documents

[Gale, 2012]

Page 53: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 54

Inferred Topics

% o

f avera

ge d

ocu

men

t

church

catholic

pope

vatican

religious

bishop

cardinal

archbishop

priests

paul

...

Page 54: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 55

draft

service

manpower

volunteer

selective

age

calls

volunteers

deferments

pay

...

Inferred Topics

% o

f avera

ge d

ocu

men

t

Page 55: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 56

atomic

weapon

bomb

bombs

weapon

energy

thermonuclear

development

hydrogen

stockpile

...

Inferred Topics

% o

f avera

ge d

ocu

men

t

Page 56: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 57

Predictive Analyses

$$$ ???

Page 57: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 58

Classification Duration

declassification date4/6/93

creation date2/21/67 26 years

Page 58: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 59

Survival Analysis

● Statistical methods for modeling durations:

– Biology/medicine: organism death

– Engineering: component failure

– Social sciences: event durations (e.g., recidivism)

● Goal: model effect on survival time of covariates, e.g.,

– Vaccine treatments

– Temperature differences

– Job placement or education programs

Page 59: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 60

Duration and Content

14 years 57 years

Page 60: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 61

● Topics provide information about classification durations

● Goal: incorporate durations into the probabilistic model

● Infer latent topics using both textual and temporal information

Modeling Text and Duration

Page 61: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Hanna Wallach :: UMass Amherst :: 62

To Conclude...

Page 62: Machine Learning for Complex Social Processeswallach/talks/2013-07-03_MSR_NE.pdf · Hanna Wallach :: UMass Amherst :: 18 Bayesian Latent Variable Models Modeling challenges: – Aggregating

Thanks!

Acknowledgements: P. Krafft, J. Moore, B. Desmarais, R. Shorey

[email protected] http://www.cs.umass.edu/~wallach/