Partitioning Social Networks for Time-dependent Queries

Partitioning Social Networks for Time-dependent Queries. Berenice Carrasco, Yi Lu and Joana M. F. da Trindade, University of Illinois. EuroSys11 – Workshop on Social Network Systems


Page 1: Partitioning Social Networks for Time-dependent Queries

Partitioning Social Networks for Time-dependent Queries

Berenice Carrasco, Yi Lu and Joana M. F. da Trindade

University of Illinois

EuroSys11 – Workshop on Social Network Systems

Page 2: Partitioning Social Networks for Time-dependent Queries

My colleague’s Facebook home page!

Page 3: Partitioning Social Networks for Time-dependent Queries

My colleague’s Facebook home page!

[Figure: network of users Adarsh, Jona, Nandana, Joana and Naseer]

• What is visible to Joana?
– Messages in a two-hop network

Page 4: Partitioning Social Networks for Time-dependent Queries

Why is partitioning important?

• Different types of queries in social networks
– Photo tags, marketplace, news feed

• Retrieve small records (personalized content)
• Multiple records from different users
• Time-dependent
– Home page refresh at Facebook

Most common query

Page 5: Partitioning Social Networks for Time-dependent Queries

Existing approaches

• Partition based solely on friendship (1-hop network)
– Power-law degree distribution
  • Highly interconnected data
  • Small fraction of nodes with very large degrees
– General approach: horizontal partitioning + replication

Page 6: Partitioning Social Networks for Time-dependent Queries

Existing approaches

• Hash-based horizontal partitioning

[Figure: users Adarsh, Jona, Nandana, Joana and Naseer hashed across three partitions. p1: Jona, Joana. p2: Adarsh, Nandana. p3: Naseer]

Multiple records in different servers

Bad response time

Inefficient network usage

High packet overhead for such small data

Key: User name
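The scattering described on this slide can be illustrated with a minimal sketch. It assumes a stable MD5-based hash of the user name modulo the number of partitions; the function names `partition_of` and `partitions_touched` are hypothetical and are not taken from the talk.

```python
import hashlib

def partition_of(user: str, num_partitions: int) -> int:
    """Map a user name to a partition using a stable hash (assumed scheme)."""
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def partitions_touched(users, num_partitions):
    """Set of partitions that must be contacted to read one record per user."""
    return {partition_of(u, num_partitions) for u in users}

# The five users from the slide, spread over three partitions.
users = ["Adarsh", "Jona", "Nandana", "Joana", "Naseer"]
placement = {u: partition_of(u, 3) for u in users}
touched = partitions_touched(users, 3)
print(placement)
print(len(touched), "partition(s) contacted for one home page refresh")
```

Because the hash ignores who interacts with whom, friends are placed independently of each other, so a single home page query tends to fan out over several servers.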

Page 7: Partitioning Social Networks for Time-dependent Queries

Existing approaches

• Replication

Requires a large amount of extra storage

Page 8: Partitioning Social Networks for Time-dependent Queries

Existing approaches

• Query-based partitioning

Assumes queries do not change with time

Curino et al., “SCHISM: A workload-driven approach to database replication and partitioning”, 2010

Page 9: Partitioning Social Networks for Time-dependent Queries

The challenge for Social Networks

• Friendship-based and query-based partitioning do not work well
• Underlying network varies over time
– Added/deleted friends
– Interaction level changes

Only 30% of Facebook user pairs interact consistently from one month to the next

Page 10: Partitioning Social Networks for Time-dependent Queries

Our approach

• Partition not only the friendship network but also along the time dimension
– Interaction: activity network
  • Weighted links: strong vs. weak
  • Power-law with much lighter tail
  • Maximal degree around 100
– This partitioning results in:
  • Fewer cross-edges
  • Reduced need for replication
– Goal: provide frequent users with high data locality
  • Faster response to queries
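To make "fewer cross-edges" concrete, here is a small sketch that sums the weight of activity links whose endpoints land on different partitions. The edge weights and both placements are made-up illustration values, and `cut_weight` is a hypothetical helper, not the cost function from the talk.

```python
def cut_weight(edges, assignment):
    """Total weight of links whose two endpoints sit on different partitions."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])

# Hypothetical activity network: a larger weight means a stronger interaction.
edges = {
    ("Joana", "Jona"): 5.0,
    ("Jona", "Adarsh"): 1.0,
    ("Adarsh", "Naseer"): 4.0,
}

# Placement that keeps strongly interacting pairs together.
activity_aware = {"Joana": 0, "Jona": 0, "Adarsh": 1, "Naseer": 1}
# Placement that ignores interaction strength.
activity_blind = {"Joana": 0, "Jona": 1, "Adarsh": 0, "Naseer": 1}

print(cut_weight(edges, activity_aware))
print(cut_weight(edges, activity_blind))
```

Under these assumed weights, the activity-aware placement cuts only the weak link, while the activity-blind placement cuts every link.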

Page 11: Partitioning Social Networks for Time-dependent Queries

Our algorithm

1. Construct an Activity Prediction Graph (APG)
2. Compute cost of local partitions
3. Partition the APG with KMETIS
4. Greedy algorithm for partitioning the current period

• Differentiate between: 1) the period used for prediction and 2) the current period to partition
• Look at the interaction and predict the strength of the relationship
• Then, look at this strength and determine what data can be accessed together

Identifies links from past traces and captures relationships with strong activity

Assigns a cost that determines how costly it would be to cut one edge or another

Page 12: Partitioning Social Networks for Time-dependent Queries

Our algorithm

• We propose a way to compute weights in this APG

• User nodes
• Message nodes
• Two-hop network

Page 13: Partitioning Social Networks for Time-dependent Queries

Our algorithm

• We propose a way to compute weights in this APG

• Message node weights

• User node weights

• Decay factor
• Number of messages exchanged
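The slide names a decay factor and the number of messages exchanged as inputs to the weights, but the transcript does not preserve the formula. The sketch below is therefore only an assumption: it uses an exponential per-month decay, and `decayed_weight` and its `decay` parameter are hypothetical.

```python
def decayed_weight(monthly_counts, decay=0.5):
    """Weight a pair's interaction history so that recent months count more.

    monthly_counts lists messages exchanged per month, oldest month first.
    The exponential form is an assumption, not the formula from the talk.
    """
    n = len(monthly_counts)
    return sum(count * decay ** (n - 1 - i) for i, count in enumerate(monthly_counts))

recent = decayed_weight([0, 0, 8])  # 8 messages, all in the latest month
stale = decayed_weight([8, 0, 0])   # 8 messages, all two months earlier
print(recent, stale)
```

With the same total message count, the pair that interacted recently gets the larger weight, which is the behaviour a decay factor is meant to provide.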

Page 14: Partitioning Social Networks for Time-dependent Queries

Our algorithm

• Cost of local partitions

• Message node weights

• User node weights
• Edge weights

• Messages accessible to user X

• Remote message weights

[Figure labels: Partition 1, Partition 2]

Page 15: Partitioning Social Networks for Time-dependent Queries

Evaluation: Graph Partitioning

• Data set:
– Facebook New Orleans network
  • Jan 2005 to Dec 2006
  • 8643 users and 69836 wall posts
  • APG: Jan 2005 to Nov 2006
  • Fixed period: Dec 2006, with 13948 wall posts

Page 16: Partitioning Social Networks for Time-dependent Queries

Evaluation of Data Locality

• We mimic real Facebook page downloads for all wall posts in Dec 2006
– Query requests the 6 most recent wall posts in the user’s two-hop network
• We compare our algorithm to two hash-based horizontal partitioning algorithms
– Hash_p1
– Hash_p1_p2
• Number of partitions used: up to 20
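The query and the locality metric can be sketched as follows. The sketch assumes that each wall post is stored on its author's partition and that the user's own posts are visible; the toy friendship graph, posts and placement are invented, and all function names are hypothetical.

```python
def two_hop(friends, user):
    """Friends and friends-of-friends of user, excluding user."""
    one_hop = set(friends.get(user, ()))
    reach = set(one_hop)
    for friend in one_hop:
        reach.update(friends.get(friend, ()))
    reach.discard(user)
    return reach

def partitions_for_query(friends, posts, assignment, user, k=6):
    """Partitions holding the k most recent posts visible to user."""
    visible = two_hop(friends, user) | {user}
    candidates = [p for p in posts if p[1] in visible]
    latest = sorted(candidates, key=lambda p: p[0], reverse=True)[:k]
    return {assignment[author] for _, author in latest}

def locality(friends, posts, assignment, users, max_partitions=1):
    """Fraction of users whose query touches at most max_partitions partitions."""
    hits = sum(
        1 for u in users
        if len(partitions_for_query(friends, posts, assignment, u)) <= max_partitions
    )
    return hits / len(users)

# Toy data: a chain Joana - Jona - Adarsh - Naseer.
friends = {
    "Joana": ["Jona"],
    "Jona": ["Joana", "Adarsh"],
    "Adarsh": ["Jona", "Naseer"],
    "Naseer": ["Adarsh"],
}
posts = [(1, "Jona"), (2, "Adarsh"), (3, "Naseer")]  # (timestamp, author)
assignment = {"Joana": 0, "Jona": 0, "Adarsh": 1, "Naseer": 2}

print(partitions_for_query(friends, posts, assignment, "Joana"))
print(locality(friends, posts, assignment, ["Joana", "Naseer"], max_partitions=2))
```

Calling `locality` with `max_partitions=1` or `max_partitions=3` gives the two proportions reported on the following slides, under the storage assumption stated above.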

Page 17: Partitioning Social Networks for Time-dependent Queries

Evaluation of Data Locality

• Proportion of queries that access only 1 partition

Page 18: Partitioning Social Networks for Time-dependent Queries

Evaluation of Data Locality

• Proportion of queries that access at most 3 partitions

Page 19: Partitioning Social Networks for Time-dependent Queries

Conclusion and Future Work

• Our algorithm partitions social network data according to interaction levels at different times

• Our activity prediction graph significantly improved data locality compared to hashing

• Future work: placement of data across different periods

Page 20: Partitioning Social Networks for Time-dependent Queries

Backup Slides

Page 21: Partitioning Social Networks for Time-dependent Queries

Existing approaches

• Hash-based horizontal partitioning

– Gizzard: range partitioning
– Cassandra: consistent hashing
– Dynamo: modified consistent hashing

Page 22: Partitioning Social Networks for Time-dependent Queries

Our approach

• Replication with time-dependency

Page 23: Partitioning Social Networks for Time-dependent Queries

Our approach

• Replication with time-dependency

Page 24: Partitioning Social Networks for Time-dependent Queries

Greedy Algorithm

• Use a greedy algorithm for messages in the non-predicted month (Dec 2006), which fall into three cases:
– The initiator and receiver of the message both exist in the APG but have no previous interaction
– Exactly one of the initiator and receiver of the message exists in the APG
– Neither the initiator nor the receiver exists in the APG
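The three cases can be told apart with a simple membership check, sketched below. The slide does not say what the greedy step does in each case, so the `place` rule (reuse a known endpoint's partition, otherwise pick the least-loaded partition) is purely an assumption; all names are hypothetical.

```python
def classify(initiator, receiver, apg_nodes, apg_edges):
    """Return which case a message from the current period falls into."""
    known = (initiator in apg_nodes) + (receiver in apg_nodes)
    if known == 2:
        if frozenset((initiator, receiver)) in apg_edges:
            return "predicted"  # the pair already interacts in the APG
        return "both_known_no_interaction"
    if known == 1:
        return "one_known"
    return "neither_known"

def place(initiator, receiver, assignment, load):
    """Assumed greedy rule: follow a placed endpoint, else the least-loaded partition."""
    for user in (receiver, initiator):
        if user in assignment:
            part = assignment[user]
            break
    else:
        part = min(load, key=load.get)
    load[part] += 1
    return part

apg_nodes = {"Joana", "Jona", "Adarsh"}
apg_edges = {frozenset(("Joana", "Jona"))}
print(classify("Joana", "Adarsh", apg_nodes, apg_edges))
print(classify("Joana", "Zed", apg_nodes, apg_edges))
print(classify("Zed", "Yan", apg_nodes, apg_edges))
```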