
FeedEx: Collaborative Exchange of News Feeds

Seung Jun and Mustaque Ahamad

Georgia Tech Information Security Center

Georgia Institute of Technology


Motivation

• RSS/Atom feeds have become increasingly popular
  – Published by most traditional media and blogs
• Scalability of feed servers
  – Frequent pull requests create high load
  – Infrequent requests increase latency and may lead to missed items
• Our Approach
  – Use resources at peer nodes to deliver feed items
  – Scalable growth in resources with service demand
• Challenges
  – Peers may not fully cooperate and execute the agreed protocols


FeedEx Overview

• Feeds have different update and usage patterns.
  – A new hybrid transport mechanism
  – Pull from servers
  – Push among peer nodes
• Peers in FeedEx
  – Form a distribution mesh,
  – Fetch feeds from web servers occasionally, and
  – Exchange new entries among each other
  – Peer incentives for exchanging entries


RSS/Atom Primer

• Feed format

<feed>
  <title>NYT Technology</title>
  <!-- other elements -->
  <entry>
    <title>Basics: Going Wireless on ...</title>
    <link>http://www.nytimes.com/2006/05/18/...</link>
    <summary>Wi-Fi has revolutionized the...</summary>
    <!-- other elements -->
  </entry>
  <!-- more entries -->
</feed>

• Current way of reading feeds
  – Stand-alone applications (e.g., Mozilla Thunderbird)
  – Web-based service (e.g., Bloglines and My Yahoo!)
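
The feed format above can be read with a few lines of Python. This is a minimal sketch using the standard library's xml.etree.ElementTree rather than the Universal Feed Parser used by the prototype. It assumes the simplified, namespace-free layout shown on the slide (real Atom documents declare a namespace and carry the URL in a link attribute), and the sample feed, its example.org URLs, and the parse_feed name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the simplified layout of the slide (not a real feed).
SAMPLE_FEED = """\
<feed>
  <title>Example Feed</title>
  <entry>
    <title>First item</title>
    <link>http://example.org/first</link>
    <summary>Summary of the first item.</summary>
  </entry>
  <entry>
    <title>Second item</title>
    <link>http://example.org/second</link>
    <summary>Summary of the second item.</summary>
  </entry>
</feed>
"""


def parse_feed(xml_text):
    """Return (feed_title, entries); each entry is a dict of text fields."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        entries.append({
            "title": entry.findtext("title", default=""),
            "link": entry.findtext("link", default=""),
            "summary": entry.findtext("summary", default=""),
        })
    return root.findtext("title", default=""), entries


feed_title, feed_entries = parse_feed(SAMPLE_FEED)
for item in feed_entries:
    print(item["title"], item["link"])
```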


Analysis of Feed Publishing

• Purpose
  – Interesting by itself and helpful in designing FeedEx
• Methodology
  – 245 popular feeds monitored for 10 days
  – Feeds fetched every 2 minutes


Publishing Rate by Rank

[Figure: publishing rate of the monitored feeds, plotted by rank]


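Entry Count

[Figure: two panels with the number of feeds on the y-axis (0 to 40)]
• Panel "Mean of entry count": x-axis 0 to 120; annotated value 79; labeled feeds: Rotten Tomatoes, MSDN, EurekAlert, Techbargains.com, Slate
• Panel "Range of entry count": x-axis 0 to 100; annotated value 159; labeled feeds: Techbargains.com, EurekAlert, Washington Post, MSNBC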


Publishing Rate by Time

[Figure: entries published per hour (y-axis) over time in days (x-axis, 0 to 7, with Sat and Sun marked); one panel each for Reuters, Yahoo(M), Motley Fool, and NPR]


Entry Lifetime

[Figure: cumulative probability (y-axis, 0.0 to 1.0) of entry lifetime in hours (x-axis, 0 to 80); curves for CNN, FOX News, Techbargains.com, and Beta News]


Architecture of FeedEx

[Figure: architecture diagram. Component labels: Feed Fetch Scheduler, Connector, Neighbor, Server, RPC. Link labels: To News Feed Servers, To Neighbors, From Neighbors, To List Server]


Bootstrapping

• Obtain a list of peers
  – Dedicated list server (Gnutella and BitTorrent)
  – Embedding (Pseudoserving [Kong and Ghosal 1999] and CoopNet [Padmanabhan and Sripanidkulchai 2002])
  – Local cache
• Connect to peers
  1. Establish connection
  2. Exchange subscription sets: {(url,hop),...}


Neighbor Selection

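• Metrics for good neighbors
  – Subscription set match
  – Topological proximity
  – Duration of relationship

[Equation: utility u(Q) of a neighbor Q, written in terms of the subscription sets S_P and S_Q, a per-feed index i, and the symbols w, d, and h_i; the exact form did not survive extraction]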


Adaptive Fetching from Servers

• Coordinated fetching by peers
  – High coordination overhead
  – Lots of nodes with high churn rate
• Solution: Adaptive fetching
  – Freshness rate f: Fraction of new entries in a fetched document
  – Set a target freshness rate f_t
  – Fetching interval is doubled or halved, bounded by T_min and T_max
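
A minimal sketch of the interval update described above. The slide does not say which direction applies when, so the sketch assumes the interval is halved when the observed freshness rate f exceeds the target f_t (new entries appear faster than the node fetches) and doubled when f falls below it. The function names, the use of entry identifiers, and the minutes unit are illustrative and not taken from the prototype.

```python
def freshness_rate(fetched_ids, seen_ids):
    """Fraction of entries in a fetched document that were not seen before."""
    if not fetched_ids:
        return 0.0
    new_count = sum(1 for entry_id in fetched_ids if entry_id not in seen_ids)
    return new_count / len(fetched_ids)


def next_interval(interval, f, f_target, t_min, t_max):
    """Double or halve the fetching interval, then clamp it to [t_min, t_max].

    Assumed policy: too many new entries (f > f_target) means the node
    fetches too rarely, so halve; too few means it fetches too often,
    so double.
    """
    if f > f_target:
        interval = interval / 2
    elif f < f_target:
        interval = interval * 2
    return min(max(interval, t_min), t_max)


# Example: 2 of 4 fetched entries are new, so f = 0.5 exceeds the target 0.2
# and the 60-minute interval is halved to 30 minutes.
f = freshness_rate(["a", "b", "c", "d"], seen_ids={"a", "b"})
interval = next_interval(60, f, f_target=0.2, t_min=30, t_max=480)
```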


Entry Exchange Among Peers

• New entries obtained
  – By fetching from web servers
  – From neighbors
• Entry bundle
  – A set of new entries
  – Document identifier (did): Assigned by SHA-1 digest
  – Flooded to matching neighbors
• Two-phase flooding
  – check_did(did) call: 344 bytes including HTTP request header
  – put_entries(bundle)
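
The two-phase flooding above can be sketched in memory, without the XML-RPC transport. The assumptions here are loud ones: the did is taken as the SHA-1 digest of a JSON serialization of the entries (the slide does not say what exactly is digested), a "matching" neighbor is one subscribed to the feed URL, and the Peer class, make_bundle helper, and example.org URLs are hypothetical. Only check_did and put_entries are names from the slide.

```python
import hashlib
import json


def make_bundle(feed_url, entries):
    """Build an entry bundle; the did is a SHA-1 digest (assumed input: JSON)."""
    payload = json.dumps(entries, sort_keys=True).encode("utf-8")
    did = hashlib.sha1(payload).hexdigest()
    return {"url": feed_url, "did": did, "entries": entries}


class Peer:
    def __init__(self, name, subscriptions):
        self.name = name
        self.subscriptions = set(subscriptions)
        self.neighbors = []
        self.seen_dids = set()
        self.entries = []
        self.calls = {"check_did": 0, "put_entries": 0}

    def check_did(self, did):
        """Phase 1: cheap probe; True means this peer still needs the bundle."""
        self.calls["check_did"] += 1
        return did not in self.seen_dids

    def put_entries(self, bundle, sender=None):
        """Phase 2: accept the bundle, then flood it onward."""
        self.calls["put_entries"] += 1
        if bundle["did"] in self.seen_dids:
            return
        self.seen_dids.add(bundle["did"])
        self.entries.extend(bundle["entries"])
        self.flood(bundle, exclude=sender)

    def flood(self, bundle, exclude=None):
        for neighbor in self.neighbors:
            if neighbor is exclude or bundle["url"] not in neighbor.subscriptions:
                continue
            # Send the full bundle only if the probe says it is needed.
            if neighbor.check_did(bundle["did"]):
                neighbor.put_entries(bundle, sender=self)

    def publish(self, bundle):
        """Called after this peer obtains new entries from a feed server."""
        self.seen_dids.add(bundle["did"])
        self.entries.extend(bundle["entries"])
        self.flood(bundle)


# Three fully connected peers subscribed to the same (hypothetical) feed.
url = "http://example.org/feed"
a, b, c = Peer("a", [url]), Peer("b", [url]), Peer("c", [url])
a.neighbors, b.neighbors, c.neighbors = [b, c], [a, c], [a, b]
bundle = make_bundle(url, [{"title": "First item"}])
a.publish(bundle)
```

In this run peer c is probed twice (by b and by a) but receives the bundle only once, which is the point of the small check_did call.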


Incentive Mechanism

• Pairwise fairness is simple and effective
  – Uses local information only
  – Easy to implement and enforce the mechanism
• Contribution metric c_{j,i}: c_{j,i} += w_f − h_f
• Deficit of contribution d_{i,j}: d_{i,j} = c_{i,j} − c_{j,i}
• Node i ensures d_{i,j} < D for every neighbor j and a parameter D.
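
A minimal sketch of the deficit check from node i's point of view. It assumes c_{i,j} is what node i has contributed to neighbor j and c_{j,i} is what j has contributed to i, so node i withholds a bundle when sending it would push d_{i,j} to D or beyond. The per-bundle increment is passed in as a plain number rather than computed from w_f and h_f, and the FairnessLedger class and its method names are hypothetical.

```python
from collections import defaultdict


class FairnessLedger:
    """Per-neighbor contribution bookkeeping kept locally by one node."""

    def __init__(self, max_deficit):
        self.max_deficit = max_deficit        # the parameter D
        self.given = defaultdict(float)       # c_{i,j}: our contribution to j
        self.received = defaultdict(float)    # c_{j,i}: j's contribution to us

    def deficit(self, neighbor):
        """d_{i,j} = c_{i,j} - c_{j,i}."""
        return self.given[neighbor] - self.received[neighbor]

    def may_send(self, neighbor, value):
        """True if sending a bundle worth `value` keeps d_{i,j} below D."""
        return self.deficit(neighbor) + value < self.max_deficit

    def record_sent(self, neighbor, value):
        self.given[neighbor] += value

    def record_received(self, neighbor, value):
        self.received[neighbor] += value


ledger = FairnessLedger(max_deficit=3)
ledger.record_sent("j", 1)
ledger.record_sent("j", 1)
blocked = not ledger.may_send("j", 1)   # a third bundle would reach D
ledger.record_received("j", 1)          # neighbor j reciprocates
allowed = ledger.may_send("j", 1)       # sending is permitted again
```

Because the ledger uses only what the node itself has sent and received, it matches the "local information only" property listed above.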


Prototype Implementation

• Python: python.org
• XML-RPC: xmlrpc.com/spec
• Twisted: twistedmatrix.com
• SQLite: sqlite.org
• Universal Feed Parser: feedparser.org


Experimental Setup

• Two modes
  – Stand-alone applications: sln
  – FeedEx: xch
• Metrics
  – Time lag
  – Missing entries
  – Communication cost
• Experiments
  – Use 189 PlanetLab nodes
  – Run 22 hours on a weekday
  – Primary factor: 6 fetching intervals
  – Let each node subscribe to 20 out of 70 feeds


Results: Time Lag

[Figure: time lag in hours (y-axis, 0 to 8) against fetching interval in hours (x-axis, 0 to 15)]


Results: Missing Entries

[Figure: missing entries in percent (y-axis, 0 to 100) against fetching interval in hours (x-axis: .5, 1, 2, 4, 8, 16); legend includes "XCH miss"]


Results: Communication Cost

[Figure: received calls per minute (y-axis, 0 to 16) against fetching interval in hours (x-axis: .5, 1, 2, 4, 8, 16); legend includes "check_did"]


Advantages

• Server scalability
• Archivability
• Controllability
• Filtering and recommendation
• Privacy


Related Work

• News feed delivery
  – Corona (Cornell)
  – FeedTree (Rice)
• Web caching and CDN [Freedman et al. 2004, Wang et al. 2004]
• Gossip-based protocols [Birman et al. 1999, Ganesh et al. 2003, Eugster et al. 2003]


Conclusions

• A new transport mechanism for news feeds
  – Pull by and exchange among peers
• FeedEx encourages cooperation by enforcing pair-wise fairness, while achieving
  – Reduced feed server load
  – Low latency
  – High coverage
  – Low communication overhead