Analysis of Multiview Legislative Networks with Structured Matrix Factorization: Does Twitter Influence Translate to the Real World?

Shawn Mankad

The University of Maryland

Joint work with: George Michailidis

Motivation

There is a growing literature that attempts to understand and exploit social networking platforms for resource optimization and marketing.

We develop new methodology for identifying important accounts by studying networks generated from Twitter, which had over 270 million active accounts each month as of September 2014.

Twitter platform

Twitter allows accounts to broadcast short messages, referred to as “tweets”.

- A tweet that is a copy of another account’s tweet is called a “retweet”.

- Within a tweet, an account can “mention” another account by referring to its account name with the @ symbol as a prefix.

- Accounts also declare the other accounts they are interested in “following”, which means the follower receives a notification whenever a new tweet is posted by the followed account.

Each of the three actions (retweeting, mentioning, following) defines a network. Collectively, they define a “multiview network”.
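As a minimal sketch of how the three actions give rise to three networks over the same set of accounts, the Python snippet below builds one adjacency matrix per action from a handful of made-up events. The account names, the event tuples, and the `build_views` helper are hypothetical and are used only for illustration.

```python
import numpy as np

# Hypothetical accounts and events, invented for illustration only.
accounts = ["alice", "bob", "carol"]
events = [
    ("retweet", "alice", "bob"),   # alice retweets bob
    ("mention", "bob", "carol"),   # bob mentions @carol
    ("follow", "carol", "alice"),  # carol follows alice
    ("retweet", "alice", "bob"),   # repeated actions accumulate as edge weight
]

def build_views(accounts, events):
    """Return one n x n non-negative adjacency matrix per action type."""
    index = {name: i for i, name in enumerate(accounts)}
    n = len(accounts)
    views = {kind: np.zeros((n, n)) for kind in ("retweet", "mention", "follow")}
    for kind, source, target in events:
        views[kind][index[source], index[target]] += 1
    return views

views = build_views(accounts, events)
print(views["retweet"])
```

All three matrices share the same row and column ordering, which is what allows them to be analyzed jointly as views of one network.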

Example of Multiview Networks

Twitter networks from 418 Members of Parliament (MPs) in the United Kingdom

[Figure: three panels showing the Retweet Network, the Mentions Network, and the Follows Network]

- 172 Conservative MPs
- 187 Labour
- 43 Liberal Democrats
- 5 MPs representing the Scottish National Party (SNP)
- 11 MPs belonging to other parties

Motivating Question

Can we use the network structures in Twitter to create an influence measure that is a surrogate for “real-life” MP influence?

There are many ways to combine network structure (communities) with network statistics to identify influential nodes (e.g., MPs), but it remains unclear which method is preferred.

- PageRank, HITS, etc.

We integrate both steps through matrix factorization to address this issue.

Outline

Motivation

Non-negative Matrix Factorization for Network Analysis

Structured NMF for Network Analysis

Extension to Multiview Networks

Application to the Data

Non-negative Matrix Factorization for Network Analysis

Non-negative Matrix Factorization

Let $Y$ be an observed $n \times p$ non-negative matrix. NMF expresses

$$Y \approx U V^T,$$

where $U \in \mathbb{R}_+^{n \times K}$ and $V \in \mathbb{R}_+^{p \times K}$.
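To make the factorization concrete, here is a minimal sketch of NMF using the well-known multiplicative update rules of Lee and Seung. This is a standard textbook algorithm chosen for illustration; the talk does not say which solver was used, and the function name `nmf`, the toy matrix, and the iteration count are assumptions.

```python
import numpy as np

def nmf(Y, K, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative Y (n x p) by U @ V.T with U, V >= 0."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = rng.random((n, K)) + eps
    V = rng.random((p, K)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (Y @ V) / (U @ (V.T @ V) + eps)
        V *= (Y.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Toy matrix with two obvious blocks (illustrative data only).
Y = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
U, V = nmf(Y, K=2)
error = np.linalg.norm(Y - U @ V.T)
print(np.round(U @ V.T, 2), error)
```

Because the updates only multiply by non-negative ratios, no explicit projection step is needed to keep $U$ and $V$ non-negative.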

Why NMF? [1]

- Better interpretability:

[Figure: side-by-side panels comparing NMF and SVD]

- Networks and other data from the social sciences are typically non-negative.

[1] Images modified from Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 267-273). ACM.

Interpretations of NMF

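$$Y = \sum_{k=1}^{K} U_k V_k^T \quad \text{s.t.} \quad \sum_k V_{jk} = 1,$$

where each term in the sum has the form

$$\begin{bmatrix} \text{Mean of} \\ \text{Cluster } k \\ \text{in } \mathbb{R}_+^p \end{bmatrix} \times \left[ P(\text{Obs. } 1 \in \text{group } k), \ldots, P(\text{Obs. } n \in \text{group } k) \right].$$

Ding et al (2009) show NMF equivalence with relaxed K-means.

$$Y_{ij} = (U D V^T)_{ij} \quad \text{s.t.} \quad \sum_{i,j} Y_{ij} = 1, \quad \sum_k V_{kj} = \sum_k U_{ik} = 1$$

$$P(w_i, d_j) = \sum_k P(w_i \mid z_k) \times P(z_k) \times P(d_j \mid z_k)$$

Ding et al (2008) show NMF equivalence with PLSI.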

Edge Assignment and Overlapping Communities

$$Y_{ij} = U_{i1} V_{j1} + \ldots + U_{iK} V_{jK},$$

where $U_{ik} V_{jk}$ measures the contribution of community $k$ to edge $Y_{ij}$.
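The decomposition above can be sketched in a few lines of Python: each edge is split into $K$ per-community contributions, and an edge can then be assigned to the community that contributes most. The factor matrices below are small made-up values, and the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical non-negative factors for 3 nodes and K = 2 communities.
U = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])
V = U.copy()

# contrib[i, j, k] = U[i, k] * V[j, k], the share of edge (i, j) due to community k.
contrib = np.einsum("ik,jk->ijk", U, V)

# Summing over communities recovers the fitted edge weight (U V^T)_ij.
fitted = contrib.sum(axis=2)

# Assign each edge to its largest contributor; a node whose edges fall in
# several communities belongs to overlapping communities.
dominant = contrib.argmax(axis=2)
print(dominant)
```

Here node 1 has edges assigned to both communities, which illustrates how overlapping membership arises naturally from the factorization.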

[Figure: two panels comparing a rank 3 NMF with the SVD (spectral clustering)]

Structured NMF for Network Analysis

Structured Semi-NMF

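We propose

$$\min_{\Lambda;\, V \ge 0} \| Y - S \Lambda V^T \|_F^2,$$

where $S \in \mathbb{R}^{n \times d}$, $\Lambda \in \mathbb{R}^{d \times K}$, and $V \in \mathbb{R}_+^{n \times K}$.

Each column of $S$ is a node-level network statistic that is calculated a priori, e.g.,

$$S = \begin{bmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{bmatrix}.$$

The columns of $S$ are covariates that guide the matrix factorization to more interpretable solutions.

Then $V$ can be used to rank nodes within each community.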
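A rough sketch of how this objective might be minimized is given below, using alternating least squares with a projection that keeps $V$ non-negative. This is a single-view analogue of the update scheme listed under "Approximate Alternating Least Squares" later in the talk, not the authors' implementation. The function name, the use of a pseudo-inverse, the toy graph, and the two placeholder statistics in `S` (degree and two-step walk counts) are all assumptions.

```python
import numpy as np

def structured_semi_nmf(Y, S, K, n_iter=100, eps=1e-8, seed=0):
    """Alternate least-squares updates for Lam and a projected update for V."""
    rng = np.random.default_rng(seed)
    V = rng.random((Y.shape[0], K)) + eps
    S_pinv = np.linalg.pinv(S)  # equals (S^T S)^{-1} S^T when S has full column rank
    for _ in range(n_iter):
        # Least-squares solution for Lam with V held fixed.
        Lam = S_pinv @ Y @ V @ np.linalg.pinv(V.T @ V)
        # Least-squares solution for V with Lam held fixed, then projection.
        B = S @ Lam
        V = np.maximum(Y.T @ B @ np.linalg.pinv(B.T @ B), eps)
    return Lam, V

# Toy undirected graph: two triangles joined by one edge (illustrative only).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Placeholder node statistics: degree and number of two-step walks.
S = np.column_stack([A.sum(axis=1), (A @ A).sum(axis=1)])
Lam, V = structured_semi_nmf(A, S, K=2)
print(np.round(V, 3))
```

Only $V$ is constrained to be non-negative, so $\Lambda$ may take either sign, which is why the model is called "semi" NMF.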

Centrality Measures

Depending on how $S$ is specified, nodes with different types of local topology are emphasized in the factorization.

For instance, in each of the following networks, X has higher centrality than Y according to a particular measure.

Analysis Procedure

1. Specify S (node-level statistics), K (number of communities).

2. Perform the matrix factorization.

3. Node $i$ has importance $I_i = \sum_k V_{ik}$.

4. Rank nodes according to I.
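Steps 3 and 4 of the procedure amount to a row sum followed by a sort. The sketch below assumes a factor matrix `V` is already available from step 2; the values shown are made up for illustration.

```python
import numpy as np

# Hypothetical factor matrix for 3 nodes and K = 2 communities.
V = np.array([[0.1, 0.2],
              [0.9, 0.4],
              [0.0, 0.5]])

# Step 3: importance of node i is the sum of row i of V.
importance = V.sum(axis=1)

# Step 4: rank nodes from most to least important.
ranking = np.argsort(-importance, kind="stable")
print(importance, ranking)
```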

Semi-NMF

If $S = I$, then the problem becomes

$$\min_{\Lambda;\, V \ge 0} \| Y - \Lambda V^T \|_F^2,$$

which is similar to the standard NMF model.

Thus, if $S$ is not specified, we obtain the usual results.

[Figure: four panels showing node rankings on an example network under PageRank; Structured Semi-NMF with S = I; Structured Semi-NMF with S = [Clustering Coefficient]; and Structured Semi-NMF with S = [Clustering Coefficient, Betweenness, Closeness, Degree]]

Extension to Multiview Networks

New Objective Function

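Each column of $S_m$ is a node-level network statistic, e.g.,

$$S_m = \begin{bmatrix} c_1 & b_1 \\ c_2 & b_2 \\ \vdots & \vdots \\ c_n & b_n \end{bmatrix}.$$

Then we propose

$$\min_{\Lambda_m,\, \Theta \ge 0,\, V_m \ge 0} \sum_m \| Y_m - S_m \Lambda_m (\Theta + V_m)^T \|_F^2,$$

where $S_m \in \mathbb{R}^{n \times d}$, $\Lambda_m \in \mathbb{R}^{d \times K}$, and $\Theta, V_m \in \mathbb{R}_+^{n \times K}$.

Rows of $\Theta$ reveal the overall importance of a node to each community.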
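To make the roles of the shared factor $\Theta$ and the view-specific factors $V_m$ explicit, the following sketch simply evaluates the multiview objective for given factors. The function name `multiview_objective` and the random test data are assumptions for illustration.

```python
import numpy as np

def multiview_objective(Ys, Ss, Lams, Theta, Vs):
    """Sum over views m of ||Y_m - S_m Lam_m (Theta + V_m)^T||_F^2."""
    total = 0.0
    for Y, S, Lam, V in zip(Ys, Ss, Lams, Vs):
        residual = Y - S @ Lam @ (Theta + V).T
        total += np.sum(residual ** 2)
    return total

# Random factors for M = 3 views, n = 6 nodes, d = 2 statistics, K = 2 communities.
rng = np.random.default_rng(0)
n, d, K, M = 6, 2, 2, 3
Theta = rng.random((n, K))                     # shared across all views
Ss = [rng.random((n, d)) for _ in range(M)]
Lams = [rng.random((d, K)) for _ in range(M)]
Vs = [rng.random((n, K)) for _ in range(M)]    # one per view

# Views generated exactly from the model give zero loss.
Ys = [S @ Lam @ (Theta + V).T for S, Lam, V in zip(Ss, Lams, Vs)]
exact = multiview_objective(Ys, Ss, Lams, Theta, Vs)

# Adding 1 to every entry of every view adds n * n * M to the loss.
shifted = multiview_objective([Y + 1.0 for Y in Ys], Ss, Lams, Theta, Vs)
print(exact, shifted)
```

The same $\Theta$ enters every term of the sum, so it can only capture structure common to all views, while each $V_m$ absorbs what is specific to view $m$.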

Analysis Procedure

1. Specify Sm (node-level statistics), K (number of communities).

2. Perform the matrix factorization.

3. Node $i$ has importance $I_i = \sum_k \Theta_{ik}$.

4. Rank nodes according to I.

Approximate Alternating Least Squares

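$$\Lambda_m = (S_m^T S_m)^{-1} S_m^T A_m (\Theta + V_m) \left( (\Theta + V_m)^T (\Theta + V_m) \right)^{-1}$$

$$V_m = A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1}$$

$$\Theta = \sum_m A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1}$$

Here $A_m$ denotes the observed matrix for view $m$, written $Y_m$ in the objective function.

To overcome numerical instabilities that occur when too many elements are exactly zero, and to maintain non-negativity of $\Theta$ and $V_m$, we project entries that would otherwise be zero or negative to a small positive constant.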
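The three updates can be transcribed almost literally into NumPy. The sketch below performs a single pass of them; it is not the authors' code. The talk does not specify initialization, normalization, or a stopping rule, so none are included, and the function name `als_sweep`, the use of pseudo-inverses in place of inverses, the value of `eps`, and the random test data are assumptions.

```python
import numpy as np

def als_sweep(As, Ss, Theta, Vs, eps=1e-8):
    """One pass of the Lam_m, V_m, and Theta updates over all views."""
    Lams, fits = [], []
    for A, S, V in zip(As, Ss, Vs):
        W = Theta + V                                   # combined factor, n x K
        # Lam_m = (S^T S)^{-1} S^T A W (W^T W)^{-1}
        Lam = np.linalg.pinv(S.T @ S) @ S.T @ A @ W @ np.linalg.pinv(W.T @ W)
        # A^T S Lam (Lam^T S^T S Lam)^{-1}, shared by the V_m and Theta updates
        B = S @ Lam
        fits.append(A.T @ B @ np.linalg.pinv(B.T @ B))
        Lams.append(Lam)
    # Project to a small positive constant to keep the factors non-negative.
    new_Vs = [np.maximum(fit, eps) for fit in fits]
    new_Theta = np.maximum(sum(fits), eps)
    return Lams, new_Theta, new_Vs

# Random symmetric 0/1 views and random statistics (illustrative data only).
rng = np.random.default_rng(0)
n, d, K, M = 8, 2, 2, 3
As = []
for _ in range(M):
    upper = np.triu(rng.integers(0, 2, (n, n)), 1).astype(float)
    As.append(upper + upper.T)
Ss = [rng.random((n, d)) for _ in range(M)]
Theta = rng.random((n, K))
Vs = [rng.random((n, K)) for _ in range(M)]

Lams, Theta, Vs = als_sweep(As, Ss, Theta, Vs)
print(np.round(Theta, 3))
```

The pseudo-inverse is used so that the sketch still runs when $S_m^T S_m$ or $\Lambda_m^T S_m^T S_m \Lambda_m$ is singular, for example when $K > d$.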

Application to the Data

Specifying Sm

$S_m$ = (Betweenness, Clustering Coefficient, Closeness, Degree)

- Clustering coefficient for a given node quantifies how close its neighbors are to being a complete graph. A higher clustering coefficient could result from an MP “creating buzz”.

- Betweenness quantifies the control of a node on the communication between other nodes in a social network, and is computed as the number of shortest paths going through a given node.

- Closeness is a related centrality measure that quantifies the length of time it would take for information to spread from a given node to all other nodes.

- Degree, the number of connections a node has obtained, ensures that active MPs are emphasized in the factorization.
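As an illustration of how such a matrix of node statistics could be assembled, the sketch below computes all four columns for a small undirected, unweighted, connected graph. Betweenness is computed with Brandes' algorithm, and closeness is taken as $(n-1)$ divided by the sum of shortest-path distances. These are common definitions, but the talk does not state which variants or normalizations were used, so they are assumptions, as are the function name `node_statistics` and the toy graph.

```python
from collections import deque
import numpy as np

def node_statistics(A):
    """Columns: betweenness, clustering coefficient, closeness, degree."""
    n = A.shape[0]
    neighbors = [np.flatnonzero(A[i]) for i in range(n)]
    degree = A.sum(axis=1)

    # Clustering coefficient: closed triads over possible triads at each node.
    triangles = np.diag(A @ A @ A) / 2
    possible = degree * (degree - 1) / 2
    clustering = np.divide(triangles, possible, out=np.zeros(n), where=possible > 0)

    betweenness = np.zeros(n)
    closeness = np.zeros(n)
    for s in range(n):
        # Breadth-first search from s, counting shortest paths.
        dist = np.full(n, -1)
        dist[s] = 0
        sigma = np.zeros(n)
        sigma[s] = 1
        preds = [[] for _ in range(n)]
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        closeness[s] = (n - 1) / dist.sum()  # assumes every node is reachable from s
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = np.zeros(n)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    betweenness /= 2  # each unordered pair was counted from both endpoints
    return np.column_stack([betweenness, clustering, closeness, degree])

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
S = node_statistics(A)
print(np.round(S, 3))
```

In this toy graph node 2 is the only node lying between other pairs, so it alone has non-zero betweenness, while nodes 0 and 1 have the highest clustering coefficient.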

[Figure: three panels, one each for rank 2, rank 3, and rank 4 $S_m$, plotting % variance explained (vertical axis) against the estimated rank of Θ and $V_m$ (horizontal axis, 1 to 9)]

We set K = 6 and rank of Sm = 4.

Results: Ranking by Twitter influence

Rank | Structured Semi-NMF | Semi-NMF | PageRank | HITS
1 | Ed Miliband (L, 2478) | Ed Miliband (L, 2478) | Ian Austin (L, 3) | Michael Dugher (L, 120)
2 | Ed Balls (L, 580) | Ed Balls (L, 580) | William Hague (C, 771) | Ed Miliband (L, 2478)
3 | Tom Watson (L, 253) | Michael Dugher (L, 120) | Hugo Swire (C, 57) | Ed Balls (L, 580)
4 | Michael Dugher (L, 120) | Tom Watson (L, 253) | Tom Watson (L, 253) | Chuka Umunna (L, 203)
5 | Chuka Umunna (L, 203) | Chuka Umunna (L, 203) | Ed Balls (L, 580) | Andy Burnham (L, 125)
6 | Rachel Reeves (L, 54) | Rachel Reeves (L, 54) | Michael Dugher (L, 120) | Tom Watson (L, 253)
7 | Stella Creasy (L, 178) | Chris Bryant (L, 164) | Pat McFadden (L, 1) | Rachel Reeves (L, 54)
8 | Chris Bryant (L, 164) | Stella Creasy (L, 178) | Ed Miliband (L, 2478) | Chris Bryant (L, 164)
9 | Tom Harris (L, 113) | Luciana Berger (L, 133) | Stella Creasy (L, 178) | Diana Johnson (L, 105)
10 | David Miliband (L, 489) | Andy Burnham (L, 125) | Matthew Hancock (C, 32) | Tom Harris (L, 113)

Results: Twitter influence does translate to the real world

Predicting future newspaper coverage with Poisson regression and various influence measures I:

HeadlineCount = F(α + βI + γControls),

where Controls includes

- Age
- Gender
- Constituency Size
- Political Party
- Indicator variable denoting whether each MP represents a constituency within the city of London.
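A Poisson regression of this form can be sketched with a few lines of Newton's method. The example below assumes the usual log link, so that the expected headline count is $\exp(\alpha + \beta I + \gamma\,\text{Controls})$, and fits it to synthetic data. The data, the single binary control, the coefficient values, and the function name `fit_poisson` are all invented for illustration and are not results from the talk.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Maximum-likelihood Poisson regression (log link) via Newton's method."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start the intercept at the log of the mean count
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)             # score vector
        hess = X.T @ (X * mu[:, None])    # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic data: an intercept, an influence score, and one binary control.
rng = np.random.default_rng(0)
n = 300
influence = rng.random(n)
control = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), influence, control])
true_coef = np.array([0.5, 1.0, 0.3])     # hypothetical alpha, beta, gamma
headline_count = rng.poisson(np.exp(X @ true_coef))

coef = fit_poisson(X, headline_count)
score = X.T @ (headline_count - np.exp(X @ coef))
print(np.round(coef, 3))
```

Different influence measures can then be compared by swapping the `influence` column and comparing predictive error on held-out counts.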

[Figure: RMSE (vertical axis) by method (None, PageRank, HITS, Semi-NMF, Structured Semi-NMF) in three panels: UK, UK without D. Cameron, and Irish]

Using Θ and Vm to identify interesting substructure:

[Figure: three panels showing (a) the Retweet Network, (b) the Mentions Network, and (c) the Follows Network]

Wrap up

Key idea: Use network statistics to guide the factorization to better solutions.

1. If we can identify the right local topology, then we can overcome not having dynamic data for certain tasks.

2. The data is exclusively link “meta-data”.
   - Content analysis can potentially be avoided with network analysis tools for identifying influential users.
   - Important for applications in marketing and intelligence gathering.

Thank you!

Betweenness Centrality

In marketing theory, these are the types of nodes:

1. Bridge Node
2. Gateway Node
3. Creation Node
4. Consumption Node

Viral marketing depends heavily on high-betweenness bridge nodes!

Clustering Coefficient

The clustering coefficient for node B asks: if A–B and B–C are connected, is A–C connected?

The clustering coefficient for a given node is defined as the ratio of closed triads to the total number of possible closed triads.
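As a small worked example under assumed connections (not taken from the talk): suppose node B has three neighbors A, C, and D, and among those neighbors only A and C are connected to each other. There are $\binom{3}{2} = 3$ possible closed triads at B, of which one is closed, so

$$C_B = \frac{\text{closed triads at } B}{\text{possible closed triads at } B} = \frac{1}{\binom{3}{2}} = \frac{1}{3}.$$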
