Inferring Peer Centrality in Socially-Informed P2P Systems
Nicolas Kourtellis, Adriana Iamnitchi
Department of Computer Science & Engineering, University of South Florida
Tampa, USA
11th IEEE International Conference on Peer-to-Peer Computing, Kyoto, Japan, 2011
Socially-aware Applications
Applications collect and use social information:
• Location, collocation, history of interactions, etc.
• Build (implicit/explicit) social network of users
• Use: reduce spam, provide recommendations, etc.
Wide range of system architectures
How does the social network of users affect the load in a P2P architecture?
Decentralization of user social data
• Company Servers
• P2P Networks: PeerSoN, LifeSocial.KOM, Safebook, Prometheus, …
• Mobile Devices: MobiClique, Yarta, ...
Social Graphs & P2P Networks
• Users connected with application-specific edges
• User-contributed peers form a P2P network
• User social graph is partitioned into subgraphs & stored on peers
Questions:
• How do applications traverse a distributed social graph?
• What does it mean for the P2P routing?
Application Example
Invite user G’s 2-hop hiking contacts to a trip => 1-hop = {B, C, E}, 2-hop = {A, D, F, I}
• Social graph traversals => many P2P lookups
• Application performance is affected by the projection of the social graph on peers
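The traversal in this example can be sketched as a breadth-first search over the social graph, followed by a count of the peers that store the users reached. This is only an illustration: the edge list and the user-to-peer placement below are hypothetical, chosen so that G's neighborhoods match the sets on the slide, and "peers touched" is an assumed stand-in for the P2P lookups the slide mentions.

```python
from collections import deque

# Hypothetical "hiking" edges, picked only so that G's 1-hop and 2-hop
# neighborhoods match the slide ({B, C, E} and {A, D, F, I}).
SOCIAL_EDGES = [("G", "B"), ("G", "C"), ("G", "E"), ("B", "A"),
                ("B", "D"), ("C", "F"), ("E", "I"), ("I", "H")]

# Hypothetical placement of each user's data on a peer.
USER_TO_PEER = {"G": "P1", "B": "P1", "C": "P2", "E": "P2", "F": "P2",
                "A": "P3", "D": "P3", "I": "P4", "H": "P4"}


def build_adjacency(edges):
    """Turn an undirected edge list into {user: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_hop_contacts(adj, source, k):
    """Return {hop: users first reached at exactly that hop} for hops 1..k."""
    dist = {source: 0}
    hops = {h: set() for h in range(1, k + 1)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:  # do not expand beyond k hops
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                hops[dist[v]].add(v)
                queue.append(v)
    return hops


def peers_touched(hops, user_to_peer, source):
    """Remote peers that store at least one user reached by the traversal."""
    reached = set().union(*hops.values())
    return {user_to_peer[u] for u in reached} - {user_to_peer[source]}


adj = build_adjacency(SOCIAL_EDGES)
hops = k_hop_contacts(adj, "G", 2)
remote = peers_touched(hops, USER_TO_PEER, "G")
print(sorted(hops[1]), sorted(hops[2]), sorted(remote))
```

With a different placement of users on peers, the same social traversal touches a different set of peers, which is the effect the projection graph is meant to capture.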
Projection Graph
How do the properties of the projection graph compare with the properties of the social graph projected?
[Figure: Social Graph (SG), P2P Overlay, Projection Graph (PG)]
Projection Graph Model
Uses:
• Study properties of peers such as centrality
• Study how the social graph topology affects P2P routing & system performance
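The transcript does not preserve the model's formal definition, so the sketch below rests on an assumed reading: peers are the nodes of the projection graph, two peers are connected when at least one social edge links a user stored on one to a user stored on the other, and the edge weight counts those social edges. The function name `project` and the sample data are hypothetical.

```python
from collections import Counter


def project(social_edges, user_to_peer):
    """Build a weighted projection graph from a social graph.

    Returns {(peer_a, peer_b): weight}, with peer_a < peer_b, where weight is
    the number of social edges whose endpoints are stored on the two peers.
    """
    weights = Counter()
    for u, v in social_edges:
        pu, pv = user_to_peer[u], user_to_peer[v]
        if pu != pv:  # social edges inside one peer create no projection edge
            weights[tuple(sorted((pu, pv)))] += 1
    return dict(weights)


# Hypothetical social graph and user placement, for illustration only.
social_edges = [("G", "B"), ("G", "C"), ("G", "E"), ("B", "A"),
                ("B", "D"), ("C", "F"), ("E", "I"), ("I", "H")]
user_to_peer = {"G": "P1", "B": "P1", "C": "P2", "E": "P2", "F": "P2",
                "A": "P3", "D": "P3", "I": "P4", "H": "P4"}

pg = project(social_edges, user_to_peer)
print(pg)
```

Under this reading, placing more users on each peer moves more social edges inside peers, so they drop out of the projection graph.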
Outline
• Motivation
• Projection Graph Model
• Social Network Centrality Metrics: Degree Centrality, Node Betweenness Centrality, Edge Betweenness Centrality
• Centrality Calculation: Limitations
• Experimental Questions
• Experimental Methodology
• Experimental Results
• Impacts on Applications & Systems
Degree Centrality
Number of edges of a node
High degree centrality peers: Network Hubs
• Can be targeted to directly influence many other peers with a message broadcast, or to distribute a search query
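As a minimal sketch of this definition, degree centrality can be computed by counting incident edges, assuming an undirected edge list without duplicates. The peer names and the helper `hubs` are hypothetical.

```python
from collections import Counter


def degree_centrality(edges):
    """Map each node to its number of incident edges (undirected graph)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def hubs(degree, top_n=1):
    """Return the top_n nodes by degree; ties are broken by node name."""
    return sorted(degree, key=lambda node: (-degree[node], node))[:top_n]


# Hypothetical projection graph edges between peers.
pg_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
degree = degree_centrality(pg_edges)
print(degree, hubs(degree))
```

Here P1 has the most edges, so it is the natural target for a broadcast or a search query.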
Node Betweenness Centrality
Measures the extent to which a node lies on the shortest path between two other nodes
High betweenness centrality peers:
• Control communication between distant peers
• Can host data caches for reduced latency to locate data
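One common way to compute this metric is Brandes' algorithm, sketched below for an unweighted, undirected graph. The slides do not say which algorithm the authors used, so this choice is an assumption, and the five-peer chain is a hypothetical example. Scores are unnormalized counts of node pairs.

```python
from collections import deque


def node_betweenness(adj):
    """Unnormalized node betweenness via Brandes' algorithm.

    adj maps each node to an iterable of its neighbors (undirected graph).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical chain of peers: P1 - P2 - P3 - P4 - P5.
chain = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"],
         "P4": ["P3", "P5"], "P5": ["P4"]}
scores = node_betweenness(chain)
print(scores)
```

The middle peer of the chain lies between the most pairs, which makes it the kind of peer the slide suggests for hosting a data cache.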
Edge Betweenness Centrality
Measures the extent to which an edge lies on the shortest path between two nodes
High betweenness centrality edges:
• Connect distant parts of P2P network
• Can be monitored to block malware traffic
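The edge variant can be sketched with the same shortest-path counting idea, crediting each edge instead of each node. As before, the use of a Brandes-style accumulation is an assumption rather than the authors' stated method, and the graph (two triangles joined by one bridge) is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Unnormalized edge betweenness for an unweighted, undirected graph.

    Returns {(u, v): score} with u < v.
    """
    eb = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge it crosses.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    # Each unordered pair was counted once from each endpoint.
    return {edge: score / 2 for edge, score in eb.items()}


# Hypothetical graph: triangles A-B-C and D-E-F joined by the bridge C-D.
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
         "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"]}
eb = edge_betweenness(graph)
bridge = max(eb, key=eb.get)
print(bridge, eb[bridge])
```

The bridge carries every shortest path between the two triangles, so it is the edge one would monitor in the malware scenario above.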
Calculating Peer Centrality
Challenging because of:
• Limited access to user data (e.g., privacy settings)
• P2P network scale
• Peer churn
Through experimental analysis of the social and projection graphs, we investigate how to circumvent these limitations
Experimental Questions
Can we approximate the centrality of peers using the centrality scores of their users?
How does the number of users storing data per peer affect the centrality scores of their peers?
• Social graph is less dynamic than the P2P network
• Calculate users’ centrality scores infrequently & use them to estimate their peer’s centrality
Spoiler Alert!
• [1, ~150] users/peer: Can estimate degree & betweenness centrality of peers with good accuracy
• Above 150 users/peer: The projection graph becomes highly connected => peers are no longer differentiated by centrality
Experimental Methodology
• Naturally-formed communities offer incentives for resource sharing: 1 community subgraph mapped per peer
• Projection graphs generated from 5 real social graphs
• Communities detected via recursive Louvain algorithm*
• Varied average community size: 5, 10, 20, …, 1000 users/peer
• Calculate correlation of centralities of users and their peers
• Compare average centralities of users and their peers
• Identify top centrality peers from their users’ scores
Social Network     Users      Edges
gnutella04        10,876     39,994
gnutella31        62,561    147,878
enron             33,696    180,811
epinions          75,877    405,739
slashdot          82,168    504,230
*V. D. Blondel et al., “Fast unfolding of communities in large networks”, Journal of Statistical Mechanics: Theory and Experiment, vol. 10, 2008.
Correlation of Centrality Scores
[1-150] users/peer:
• Projection graph closely resembles social graph
• Highest correlation of social & projection graph metrics
• Degree & node betweenness estimated from local information (cumulative scores)
After 150 users/peer:
• Projection graph topology loses social properties
• Highly connected network
• Peers participate equally in graph traversal
[Figure panels: Users/Peer vs. Degree; Users/Peer vs. Node Betweenness; Users/Peer vs. Edge Betweenness]
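The "cumulative scores" estimate can be sketched as follows: sum the centrality scores of the users stored on each peer, then correlate those sums with the peers' own scores in the projection graph. The slides do not name the correlation coefficient, so Pearson's is assumed here, and all scores below are made-up illustrative values, not results from the study.

```python
import math


def cumulative_scores(user_scores, user_to_peer):
    """Sum the centrality scores of the users stored on each peer."""
    totals = {}
    for user, score in user_scores.items():
        peer = user_to_peer[user]
        totals[peer] = totals.get(peer, 0.0) + score
    return totals


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up user degree scores and placement (illustrative only).
user_degree = {"u1": 3, "u2": 1, "u3": 2, "u4": 1, "u5": 1, "u6": 1}
user_to_peer = {"u1": "P1", "u2": "P1", "u3": "P2",
                "u4": "P3", "u5": "P3", "u6": "P3"}
# Made-up peer scores, as if measured on the projection graph.
peer_degree = {"P1": 8, "P2": 4, "P3": 6}

estimate = cumulative_scores(user_degree, user_to_peer)
peers = sorted(peer_degree)
r = pearson([estimate[p] for p in peers], [peer_degree[p] for p in peers])
print(estimate, r)
```

A coefficient near 1 would mean the locally available user scores rank peers almost the same way the projection graph does.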
Comparison of Centrality Scores
Increase number of users/peer => turning point in projection graph
• More connections with other peers => increase peer degree & betweenness to maximum
• More social edges within peers => decrease edge betweenness to minimum
[Figure panels: Users/Peer vs. Degree; Users/Peer vs. Node Betweenness; Users/Peer vs. Edge Betweenness]
Finding High Betweenness Peers
Placing data caches on high betweenness peers can reduce latency to locate data
Can we identify such peers, knowing the top betweenness users or communities?
Top 5% betweenness centrality users => top betweenness centrality peers with 80–90% accuracy
[Figure panels, each plotted against Users/Peer: With Top-N% users; With Top-N% communities]
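One way to check such a prediction is sketched below: take the top fraction of users by betweenness, treat the peers that store them as the predicted top peers, and measure how many of the actual top peers were found. The slides do not define "accuracy" precisely, so this overlap measure is an assumption, and every score and name in the example is made up.

```python
import math


def top_fraction(scores, fraction):
    """Return the top `fraction` of keys by score (at least one key)."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def predicted_top_peers(user_scores, user_to_peer, fraction):
    """Peers that store at least one top-`fraction` user."""
    return {user_to_peer[u] for u in top_fraction(user_scores, fraction)}


def accuracy(predicted, actual):
    """Share of the actual top peers that appear in the prediction."""
    return len(predicted & actual) / len(actual)


# Made-up betweenness scores and placement (illustrative only).
user_bc = {"u1": 9, "u2": 7, "u3": 1, "u4": 0, "u5": 0, "u6": 2}
user_to_peer = {"u1": "P1", "u2": "P2", "u3": "P1",
                "u4": "P3", "u5": "P3", "u6": "P4"}
peer_bc = {"P1": 5, "P2": 1, "P3": 4, "P4": 0}

predicted = predicted_top_peers(user_bc, user_to_peer, 0.5)
actual = top_fraction(peer_bc, 0.5)
print(sorted(predicted), sorted(actual), accuracy(predicted, actual))
```

The same functions work for the community-based variant if community scores replace user scores.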
Summary of Findings
[1, ~150] users/peer:
• Projection graph closely resembles social graph
• Highest correlation of social & projection graph metrics
• Degree & node betweenness can be estimated from local information (cumulative scores of users)
• Edge betweenness cannot be estimated well
Above 150 users/peer:
• Projection graph topology loses social properties
• A highly connected projection graph
• No differentiation in peer centrality
Top betweenness centrality users can pinpoint the top betweenness centrality peers with good accuracy
Overall:
• Applications can calculate users’ centrality scores infrequently to estimate peer centrality
• Social graph changes slowly compared to P2P network
Impact on Applications & Systems
Target high degree peers to:
• Decrease search time
• Increase breadth of search and diversity of results
Target high betweenness peers to:
• Monitor information flow and collect traces
• Place data caches and indexes of data location
• Quarantine malware outbursts
• Disseminate software patches
Tackle P2P churn
• Predict centrality of peers to allocate resources
Reduce overlay overhead
• Enhance routing tables with P2P edges for faster & more secure peer discovery
Thank you!
This work was supported by NSF Grants: CNS 0952420 and CNS 0831785
http://www.cse.usf.edu/dsg/
[email protected]