Quantifying and Bursting the Online Filter Bubble

Post on 21-Feb-2017

1

QUANTIFYING AND BURSTING THE ONLINE FILTER BUBBLE

KIRAN GARIMELLA

KCL, 13 FEB 2017

2

HELLO!

2011, 2013, 2014

BACHELORS & MASTERS IN COMPUTER SCIENCE

BARCELONA

DOHA, QATAR

HYDERABAD, INDIA

HELSINKI, FINLAND

RESEARCH ENGINEER

RESEARCH ASSOCIATE

PHD

ADVISOR: ARISTIDES GIONIS

EXPECTED: SEPT 2017

3

OVERVIEW

▸ Motivation
▸ Summary of the thesis
▸ Shallow dive into one sub-topic

4

SOCIAL MEDIA BUBBLE

5

FILTER BUBBLE

6

ECHO CHAMBERS

7

THE POLARIZATION CYCLE

USER HOMOPHILY

ALGORITHMIC PERSONALIZATION

Increased Polarization

8

POLARIZATION - TWITTER

9

BLOGS

10

INSTAGRAM

11

US SENATE VOTES

12

HOW CAN WE DEAL WITH THE POLARIZATION ON SOCIAL MEDIA?

THIS THESIS

13

RESEARCH QUESTIONS

1. Identify polarized discussions on social media and quantify their severity.
2. Track evolution of polarized discussions and understand their properties.
3. Design ways to reduce the polarization.

14

1. IDENTIFYING AND QUANTIFYING POLARIZED DISCUSSIONS

▸ Using different types of user interactions
  ▸ A. Retweet network
  ▸ B. Reply network

RESEARCH QUESTION I

15

1 A. QUANTIFYING CONTROVERSY ON SOCIAL MEDIA [WSDM’16, CSCW’16]

IDENTIFYING AND QUANTIFYING POLARIZED DISCUSSIONS

16

QUANTIFYING CONTROVERSY

▸ In the wild
▸ Not necessarily political controversies
▸ Compare across controversies
▸ Language independent

17

NEED NOT BE POLITICAL …

18

NEED NOT BE POLITICAL …

19

COMPARING ACROSS CONTROVERSIES

20

SOLUTION

▸ Graph-based formulation
▸ Model conversations using a retweet graph
▸ Nodes: users, Edges: retweets
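The graph formulation on this slide can be illustrated with a minimal sketch. It assumes the input is a list of (retweeter, original author) pairs; the function name `build_retweet_graph` and the choice to keep retweet counts as edge weights are illustrative assumptions, not details taken from the talk.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build an undirected retweet graph from (retweeter, author) pairs.

    Returns (adj, weights): adj maps each user to the set of users they are
    linked to by at least one retweet; weights counts retweets per directed
    (retweeter, author) pair. Both structures are assumptions of this sketch.
    """
    weights = defaultdict(int)
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # ignore self-retweets
        weights[(retweeter, author)] += 1

    adj = defaultdict(set)
    for retweeter, author in weights:
        # Nodes are users, edges are retweets (stored in both directions).
        adj[retweeter].add(author)
        adj[author].add(retweeter)
    return dict(adj), dict(weights)


if __name__ == "__main__":
    sample = [("ann", "bob"), ("ann", "bob"), ("cat", "bob")]
    adj, weights = build_retweet_graph(sample)
    print(sorted(adj["bob"]), weights[("ann", "bob")])
```

Keeping the adjacency as plain sets makes it easy to hand the graph to any clustering routine that splits the users into two sides.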

21

EXAMPLE

[Figure: retweet graphs for controversial topics (#beefban, #russia_march) and non-controversial topics (#sxsw, #germanwings)]

22

EXAMPLE

[Figure: retweet graphs and follow graphs for controversial and non-controversial topics]

23

PIPELINE

▸ Graph from: Retweets, Mentions, Social network, Content
▸ Partition: any clustering algorithm
▸ Measure: Random walk, Edge-betweenness, 2d-embedding, Sentiment variance
▸ Output: controversy score

24

SENTIMENT VARIANCE

▸ Controversy = intensified sentiments
▸ Positive and negative sentiments on each side are higher compared to non-controversial issues
▸ Language dependent
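A small sketch of the sentiment-variance idea follows. It assumes each tweet on a topic already carries a sentiment score in [-1, 1] from some language-specific classifier (which is where the language dependence comes in); the function name and the sample scores are made up for illustration.

```python
from statistics import pvariance


def sentiment_variance(scores):
    """Population variance of per-tweet sentiment scores for one topic.

    scores: iterable of floats, assumed to lie in [-1, 1]. A topic where
    both strongly positive and strongly negative tweets appear gets a
    larger value than a topic with uniformly mild sentiment.
    """
    return pvariance(list(scores))


if __name__ == "__main__":
    heated_topic = [-0.9, 0.8, -0.7, 0.9]   # hypothetical scores
    calm_topic = [0.1, 0.0, -0.1, 0.05]     # hypothetical scores
    print(sentiment_variance(heated_topic) > sentiment_variance(calm_topic))
```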

25

IDENTIFYING AND QUANTIFYING POLARIZED DISCUSSIONS

1 B. A MOTIF-BASED APPROACH FOR IDENTIFYING CONTROVERSY [UNDER REVIEW]

▸ Use motifs defined on the reply networks

26

REPLY NETWORKS

27

CONTROVERSIAL NON-CONTROVERSIAL

REPLY NETWORKS

28

MOTIFS

29

RESEARCH QUESTION II

2. POLARIZATION OVER TIME

▸ A. How do polarized debates change with interest?
▸ B. Has polarization on Twitter increased over the years?

30

POLARIZATION OVER TIME

2 A. HOW DO POLARIZED DEBATES CHANGE WITH INTEREST [UNDER REVIEW]

▸ Polarization increases with interest
▸ Most retweeting activity occurs within a side
▸ Endorsement network becomes more hierarchical and a large fraction of edges go from periphery to core
▸ Content becomes more similar between the two sides

31

POLARIZATION OVER TIME

2 B. HAS POLARIZATION INCREASED OVER THE YEARS? [UNDER REVIEW]

▸ Are Twitter users less likely to follow/retweet users from both sides?
▸ Are users less likely to use biased content?
▸ Large-scale study: 700,000 users, 2B tweets, 8 years

32

RESEARCH QUESTION III

3. REDUCING POLARIZATION

▸ A. Reducing Controversy by Connecting Opposing Views
▸ B. Balancing Information Exposure in Social Networks

33

REDUCING POLARIZATION

3 A. REDUCING CONTROVERSY BY CONNECTING OPPOSING VIEWS [WSDM’17]

34

POLARIZATION - TWITTER

35

HOW CAN WE BRIDGE THE DIVIDE?

THIS PAPER

36

REDUCING CONTROVERSY BY CONNECTING OPPOSING VIEWS

HOW CAN WE BRIDGE THE DIVIDE?

▸ Connect the two sides
▸ Model interactions as a graph
  ▸ Retweet graph. Nodes: users, Edges: retweets

37

MEASURE OF POLARIZATION

▸ Quantify degree of polarization in a network
▸ How well does information flow between the two sides?

38

RANDOM WALK CONTROVERSY SCORE

▸ Authoritative users exist on both sides of the controversy
▸ How likely a random user on either side is to be exposed to authoritative content from the opposing side
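The two bullets above can be turned into a rough computation. The sketch below is an illustration under stated assumptions, not the implementation behind the scores reported in the talk: it treats the `k` highest-degree users on each side as the authoritative ones, lets a walk end as soon as it reaches any of them, assumes walks start on either side with equal probability, and combines the results as RWC = P_XX · P_YY − P_XY · P_YX, where P_AB is the probability that a walk started on side A given that it ended on side B. That combination is how the score is understood here from the cited WSDM’16 work; the function name and parameters are hypothetical.

```python
def rwc_score(adj, side_x, side_y, k=1, sweeps=200):
    """Sketch of a random-walk controversy score on an undirected graph.

    adj: dict mapping each user to the set of users they are linked to.
    side_x, side_y: the two sides (disjoint sets that cover all users).
    k: number of top-degree users per side treated as authoritative
       (an assumption of this sketch).
    sweeps: number of update passes used to approximate the probabilities.
    """
    def top(side):
        # Highest-degree users first; ties broken by name for determinism.
        return set(sorted(side, key=lambda u: (-len(adj[u]), u))[:k])

    auth_x, auth_y = top(side_x), top(side_y)

    # reach_x[u]: probability that a walk from u reaches an X authority
    # before a Y authority. Authorities act as absorbing states.
    reach_x = {u: 0.5 for u in adj}
    reach_x.update({u: 1.0 for u in auth_x})
    reach_x.update({u: 0.0 for u in auth_y})
    for _ in range(sweeps):
        for u, nbrs in adj.items():
            if u in auth_x or u in auth_y or not nbrs:
                continue
            reach_x[u] = sum(reach_x[v] for v in nbrs) / len(nbrs)

    # Pr[end on X | start on X] and Pr[end on X | start on Y].
    end_x_from_x = sum(reach_x[u] for u in side_x) / len(side_x)
    end_x_from_y = sum(reach_x[u] for u in side_y) / len(side_y)
    end_y_from_x = 1.0 - end_x_from_x
    end_y_from_y = 1.0 - end_x_from_y

    # Bayes' rule with equal priors on the starting side:
    # p_ab = Pr[started on side a | ended on side b].
    p_xx = end_x_from_x / (end_x_from_x + end_x_from_y)
    p_yx = end_x_from_y / (end_x_from_x + end_x_from_y)
    p_yy = end_y_from_y / (end_y_from_x + end_y_from_y)
    p_xy = end_y_from_x / (end_y_from_x + end_y_from_y)
    return p_xx * p_yy - p_xy * p_yx
```

On a graph where each side mostly links to its own hub the value is close to 1; the more often users sit next to the other side's hub, the closer it gets to 0. The sketch assumes every user can reach at least one authoritative user.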

39

RANDOM WALK CONTROVERSY SCORE (RWC)

X Y

40

RANDOM WALK CONTROVERSY SCORE (RWC)

X Y

41

RANDOM WALK CONTROVERSY SCORE (RWC)

X Y

42

RANDOM WALK CONTROVERSY SCORE (RWC)

43

RWC SCORE: 0.95

RWC SCORE: 0.12

44

PROBLEM

▸ Given a graph
▸ Two sides
▸ RWC score

45

FIND THE k BEST EDGES TO ADD TO THE GRAPH THAT MAXIMIZE THE REDUCTION IN RWC SCORE

46

REDUCING POLARIZATION

Side 1 Side 2

REDUCING CONTROVERSY BY CONNECTING OPPOSING VIEWS

47

ALGORITHMS

▸ Greedy
  ▸ Look for all pairs of nodes
  ▸ Find the k pairs that give the highest reduction in RWC
  ▸ O(n²), n: number of nodes
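The exhaustive approach on this slide can be sketched as follows: score every candidate edge by how much it lowers a controversy score when added on its own, then keep the k best. To stay self-contained, the sketch uses a simple stand-in score, `isolation_score` (the fraction of users with no neighbour on the other side), which is NOT the RWC score; any function with the same signature can be passed as `score`. Restricting candidates to pairs that span the two sides is also an assumption, and all names are hypothetical.

```python
from itertools import product


def isolation_score(adj, side_x, side_y):
    """Stand-in score (not RWC): share of users with no cross-side neighbour."""
    isolated = sum(1 for u in side_x if not (adj[u] & side_y))
    isolated += sum(1 for v in side_y if not (adj[v] & side_x))
    return isolated / (len(side_x) + len(side_y))


def top_k_edges(adj, side_x, side_y, k, score=isolation_score):
    """Return the k cross-side edges whose addition lowers `score` the most.

    Every missing pair (u in side_x, v in side_y) is tried once, so the
    number of score evaluations grows quadratically with the number of users.
    """
    base = score(adj, side_x, side_y)
    gains = []
    for u, v in product(sorted(side_x), sorted(side_y)):
        if v in adj[u]:
            continue  # edge already exists
        trial = {w: set(nbrs) for w, nbrs in adj.items()}  # copy the graph
        trial[u].add(v)
        trial[v].add(u)
        gains.append((base - score(trial, side_x, side_y), u, v))
    # Largest reduction first; ties broken by node names for determinism.
    gains.sort(key=lambda g: (-g[0], g[1], g[2]))
    return [(u, v) for _, u, v in gains[:k]]


if __name__ == "__main__":
    graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"d", "a"}, "d": {"c"}}
    print(top_k_edges(graph, {"a", "b"}, {"c", "d"}, k=1))
```

Copying the graph for every candidate keeps the sketch easy to follow; a practical version would update the score incrementally instead.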

48

REDUCING CONTROVERSY BY CONNECTING OPPOSING VIEWS

Side 1 Side 2

OUR ALGORITHM

The best edges are between the highest degree nodes

49

REDUCING CONTROVERSY BY CONNECTING OPPOSING VIEWS

Side 1 Side 2

OUR ALGORITHM

The best edges are between the highest degree nodes

O(p²), p << n
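One way to read the O(p²) bound is that only the p highest-degree users on each side are considered as endpoints. The sketch below generates that reduced candidate set; the function name and the tie-breaking rule are assumptions made for illustration.

```python
from itertools import product


def top_degree_candidates(adj, side_x, side_y, p):
    """Candidate edges between the p highest-degree users of each side.

    adj: dict mapping each user to the set of users they are linked to.
    Returns at most p * p pairs (u, v), skipping edges that already exist.
    """
    def top(side):
        # Highest degree first; ties broken by name for determinism.
        return sorted(side, key=lambda u: (-len(adj[u]), u))[:p]

    return [(u, v) for u, v in product(top(side_x), top(side_y))
            if v not in adj[u]]


if __name__ == "__main__":
    graph = {
        "a": {"b", "c", "d"}, "b": {"a", "f"}, "c": {"a"}, "d": {"a"},
        "e": {"f", "g", "h"}, "f": {"e", "b"}, "g": {"e"}, "h": {"e"},
    }
    print(top_degree_candidates(graph, {"a", "b", "c", "d"},
                                {"e", "f", "g", "h"}, p=2))
```

Each returned pair would then be scored as before, so the number of score evaluations depends on p rather than on the total number of users.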

50

NOT PRACTICAL

▸ High-degree users are highly retweeted users
▸ We cannot recommend @realDonaldTrump to follow @BarackObama
▸ Not likely to materialize

51

ACCEPTANCE PROBABILITY

▸ Take into account the probability of the user liking the recommendation
▸ Not all users are the same
  ▸ Popular users
  ▸ Highly polarized users
▸ Compute polarity scores for users
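A minimal sketch of how an acceptance probability could enter the ranking: weight each candidate edge's score reduction by the probability that the recommendation is accepted, and rank by the product. The acceptance probabilities are treated as given inputs here, since the slides do not spell out how p(u, v) is computed; the function name, the dictionaries, and the sample numbers are all hypothetical.

```python
def rank_by_expected_reduction(reductions, accept_prob, k):
    """Rank candidate edges by expected reduction in the controversy score.

    reductions: dict mapping (u, v) to the score reduction if the edge
                were actually added.
    accept_prob: dict mapping (u, v) to the probability that u accepts
                 the recommendation to connect to v (assumed given).
    Pairs without a known probability are treated as never accepted.
    """
    expected = {pair: accept_prob.get(pair, 0.0) * gain
                for pair, gain in reductions.items()}
    ranked = sorted(expected, key=lambda pair: (-expected[pair], pair))
    return ranked[:k]


if __name__ == "__main__":
    # A hub-to-hub edge helps most if accepted, but is rarely accepted.
    reductions = {("hub_x", "hub_y"): 0.30, ("ann", "bob"): 0.10}
    accept_prob = {("hub_x", "hub_y"): 0.01, ("ann", "bob"): 0.50}
    print(rank_by_expected_reduction(reductions, accept_prob, k=1))
```

With these made-up numbers the less dramatic edge wins, which mirrors the point of the slide: an edge that will not materialize does not reduce polarization.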

52

ACCEPTANCE PROBABILITY

POLARITY SCORE: -0.99

POLARITY SCORE: 0.95

53

based on connections

based on retweets

p(u, v) =

ACCEPTANCE PROBABILITY

▸ Learn probabilities from data

54

DEMO

55

REDUCING POLARIZATION

Side 1 Side 2

3 B. BALANCING INFORMATION EXPOSURE IN SOCIAL NETWORKS

▸ Find a set of seed nodes that can balance the exposure of information

56

THANK YOU!

@gvrkiran

kiran.garimella@aalto.fi