Bayesian Poisson Tensor Factorization for Inferring ...as5530/ScheinPaisleyBleiWallach2015...Modern...



[Figure: one inferred component shown as four panels: time steps, senders, receivers, action types. Time steps run from Apr 1990 to Oct 2008. Senders, in order: Russia, Tajikistan, Iran, Afghanistan, Uzbekistan, Kyrgyzstan, Kazakhstan, Pakistan, Turkmenistan, China. Receivers, in order: Afghanistan, Russia, Tajikistan, Iran, Uzbekistan, Kyrgyzstan, Kazakhstan, Turkmenistan, Pakistan, China. Action types, in order: Consult, Cooperate (Diplomatic), Intend to Cooperate, Make Statement, Fight, Appeal, Cooperate (Material), Coerce, Aid, Yield.]

Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts

Aaron Schein (UMass Amherst), John Paisley (Columbia University), David M. Blei (Columbia University), Hanna Wallach (Microsoft Research)

[Figure: properties of the Gamma distribution with shape $a$ and rate $b$. The panels annotate the following quantities:]

$\mathbb{E}[\theta] = a/b$

$\mathbb{G}[\theta] = \exp(\Psi(a))/b$

$\mathrm{Mode}(\theta) = (a - 1)/b$

[Panels: all three quantities for different values of shape $a \geq 1$; $\mathbb{E}[\theta]$ and $\mathbb{G}[\theta]$ for different values of shape $a \in (0, 1]$; one example density annotated with $\mathbb{E}[\theta] = 20.00$, $\mathbb{G}[\theta] = 19.01$, $\mathrm{Mode}(\theta) = 18.00$; one example density annotated with $\mathbb{E}[\theta] = 0.60$, $\mathbb{G}[\theta] = 0.06$; densities with legend a=2, b=1; a=0.1, b=1; a=20, b=4.]

Dyadic event data

Picture © Kalev Leetaru, available on the GDELT blog

Poisson tensor factorization

Fitting this model is a form of nonnegative CP-Decomposition / PARAFAC.

Bayesian PTF

• Add Gamma priors over factors
• Sparsity-inducing with $\alpha < 1$
• Posterior inference, not point estimation
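The model equation did not survive in this transcript. As a minimal sketch, assuming the usual form of the model (each count is Poisson with a rate equal to a sum over K components of the product of one factor per mode) and assuming a Gamma(shape = alpha, rate = alpha * beta) prior on every factor entry, the generative process could look like the following. The function name `sample_bptf`, the dimensions, and the values of `alpha` and `beta` are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_bptf(dims=(5, 5, 4, 6), K=3, alpha=0.1, beta=1.0, seed=0):
    """Draw factors and counts from a Gamma-Poisson CP model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # One nonnegative factor matrix per mode: senders, receivers, action types, time steps.
    # Assumed prior: Gamma(shape=alpha, rate=alpha*beta). NumPy expects scale = 1/rate.
    factors = [rng.gamma(alpha, 1.0 / (alpha * beta), size=(d, K)) for d in dims]
    # CP / PARAFAC rate: for each cell, sum over k of the product of one entry per mode.
    rate = np.einsum('ik,jk,ak,tk->ijat', *factors)
    # Observed event counts are Poisson draws around that rate.
    Y = rng.poisson(rate)
    return factors, rate, Y

factors, rate, Y = sample_bptf()
print(Y.shape, "fraction of zero counts:", (Y == 0).mean())
```

With a shape parameter below 1 the Gamma prior puts most of its mass near zero, so most factor entries are tiny and only a few are large, which is the sense in which the prior is sparsity-inducing.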

Variational inference

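$$Q(\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}, \Theta^{(4)}) = \prod_{i,k} Q(\theta^{(1)}_{ik}) \prod_{j,k} Q(\theta^{(2)}_{jk}) \prod_{a,k} Q(\theta^{(3)}_{ak}) \prod_{t,k} Q(\theta^{(4)}_{tk})$$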

Optimize variational parameters to minimize the KL-divergence of the exact posterior from Q!
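The poster does not list the update equations. The sketch below shows one way the optimization could be carried out, assuming each factor in Q is a Gamma distribution and using the standard coordinate-ascent updates for Gamma-Poisson factorization (each count is split across components in proportion to the geometric expectations). The names `fit_bptf` and `digamma`, the fixed hyperparameters `alpha` and `beta`, and the initialization are assumptions, not the authors' implementation.

```python
import numpy as np

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    x = np.asarray(x, dtype=float)
    shift = np.zeros_like(x)
    for n in range(6):  # psi(x) = psi(x + 6) - sum_{n<6} 1 / (x + n)
        shift += 1.0 / (x + n)
    z = x + 6.0
    inv2 = 1.0 / (z * z)
    return np.log(z) - 0.5 / z - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)) - shift

def fit_bptf(Y, K, alpha=0.1, beta=1.0, n_iters=50, seed=0):
    """Mean-field coordinate ascent for a 4-mode count tensor Y (illustrative)."""
    rng = np.random.default_rng(seed)
    modes = 'ijat'
    # Q(theta) = Gamma(shape, rate) for every factor entry; random positive start.
    shape = [alpha + rng.gamma(1.0, 1.0, size=(d, K)) for d in Y.shape]
    rate = [np.full((d, K), alpha * beta + 1.0) for d in Y.shape]
    for _ in range(n_iters):
        for m in range(4):
            G = [np.exp(digamma(s)) / r for s, r in zip(shape, rate)]  # geometric
            E = [s / r for s, r in zip(shape, rate)]                   # arithmetic
            others = [n for n in range(4) if n != m]
            # Each count divided by the sum over components of geometric products.
            ratio = Y / np.einsum('ik,jk,ak,tk->ijat', *G)
            spec = ('ijat,' + ','.join(modes[n] + 'k' for n in others)
                    + '->' + modes[m] + 'k')
            # Shape: prior shape plus expected counts allocated to each component.
            shape[m] = alpha + G[m] * np.einsum(spec, ratio, *[G[n] for n in others])
            # Rate: prior rate plus the product of the other modes' column sums.
            col_sums = np.prod([E[n].sum(axis=0) for n in others], axis=0)
            rate[m] = alpha * beta + np.tile(col_sums, (Y.shape[m], 1))
    return shape, rate
```

Because the four groups of factors are independent under Q, each update only needs the current expectations of the other three modes, which is what keeps the per-iteration cost to a few tensor contractions.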

Point estimates under the Q-distribution:

arithmetic expectation

geometric expectation
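As a small worked example, the two point estimates can be computed for a single Gamma(a, b) factor with shape a and rate b, using E = a/b and G = exp(Psi(a))/b with Psi the digamma function. The parameter pairs below are assumptions: the poster does not state them, and they were chosen only because they reproduce the annotated values 20.00 / 19.01 / 18.00 and 0.60 / 0.06. The helper names are hypothetical.

```python
import math

def digamma(x: float) -> float:
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        shift += 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return series - shift

def gamma_point_estimates(a: float, b: float):
    """Return (arithmetic mean, geometric mean, mode) of Gamma(shape=a, rate=b)."""
    arithmetic = a / b
    geometric = math.exp(digamma(a)) / b      # exp(E[log theta])
    mode = (a - 1.0) / b if a >= 1.0 else None  # formula only applies for a >= 1
    return arithmetic, geometric, mode

for a, b in [(10.0, 0.5), (0.3, 0.5)]:  # assumed parameters, see note above
    e, g, m = gamma_point_estimates(a, b)
    print(f"a={a}, b={b}: E={e:.2f}, G={g:.2f}, mode={m}")
```

The geometric expectation never exceeds the arithmetic one, and the gap widens sharply as the shape drops below 1, so the choice of point estimate matters most for sparse factors.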

Motivation

• International relations: the field of political science that studies how countries interact
• Modeling goal: infer multilateral relations from dyadic event data

Event data are extracted from newspapers (example from Schrodt (1993), Event Data in Foreign Policy Analysis).

Modern datasets are bigger, e.g., GDELT.

Sparse event counts

Multilateral relation

A coherent thread of international events, characterized by:

1. sender countries
2. receiver countries
3. action types
4. time steps

Predictive results

We compare Bayesian PTF (B-PTF) to PTF and NTF-LS (NTF with Euclidean distance). Models are either fit to a dense region of the tensor and tested on a sparse region, or vice versa.

When models are fit to sparse data, B-PTF generalizes much better.

Exploratory results

[Figure: one inferred component shown as four panels: time steps, senders, receivers, action types. Time steps run from Jan 2011 to Dec 2012. Senders, in order: Ecuador, United Kingdom, United States, Sweden, Australia, Spain, Russia, Japan, Belarus, Cuba. Receivers, in order: Ecuador, United Kingdom, United States, Sweden, Australia, Russia, Belarus, Spain, Paraguay, Europe. Action types, in order: Consult, Aid, Appeal, Make Statement, Coerce, Disapprove, Intend to Cooperate, Cooperate (Diplomatic), Reject, Cooperate (Material).]

European debt crisis: “Europe” is the EU. Germany and Greece are top actors and all diplomatic actions (e.g., Aid) are represented.

1999 East Timorese Crisis: Violence erupts after East Timorese independence claim from Indonesia. Parts of Timor were controlled by Portugal (former colonies). Australia and US served as mediators.

2011 Libyan intervention: NATO, US, France, Italy, UK bomb targets in Libya to stop attacks on civilians by the Qaddafi regime.

US-led War in Afghanistan: Peaks immediately following 9/11 (and invasion). Senders, receivers and action types corroborate. Notice the “blip” in 1998 when Clinton bombed al-Qaeda camps in Afghanistan.

Julian Assange seeks asylum: The Wikileaks founder, an Australian national sought by the US and Sweden for separate crimes, requests asylum from Ecuador at its embassy in the UK.

The “Stans”: Central Asian relations