
High Performance P2P Web Caching
Erik Garrison, Jared Friedman
CS264 Presentation, May 2, 2006


SETI@Home
• Basic idea: people donate computer time to look for aliens
• Delivered more than 9 million CPU-years
• Guinness Book of World Records: largest computation ever
• Many other successful projects (BOINC, Google Compute)
• The point: many people are willing to donate computer resources for a good cause


Wikipedia
• About 200 servers required to keep the site live
• Hosting & hardware costs over $1M per year
• All revenue from donations
• Hard to make ends meet
• Other not-for-profit websites are in a similar situation


Help Wikipedia@Home
• What if people could donate idle computer resources to help host not-for-profit websites?
• They probably would!
• This is the goal of our project


Prior Work
• This doesn't exist yet, but some things are similar
• Content Distribution Networks (Akamai): distributed web hosting for big companies
• CoralCDN/CoDeeN: P2P web caching, like our idea, but a very different design
• Both have some problems


Akamai, the opportunity
• Internet traffic is 'bursty': expensive to build infrastructure to handle flash crowds
• International audience, local servers: sites run slowly in other countries


Akamai, how it works
• Akamai put >10,000 servers around the globe
• Companies subscribe as Akamai clients
• Client content (mostly images and other media) is cached on Akamai's servers
• Tricks with DNS make viewers download content from nearby Akamai servers
• Result: website runs fast everywhere, no worries about flash crowds
• But VERY expensive!


CoralCDN
• P2P web caching
• Probably the closest system to our goal
• Currently in late-stage testing on PlanetLab
• Uses an overlay and a 'distributed sloppy hash table'
• Very easy to use: just append '.nyud.net' to a URL and Coral handles it (e.g., http://example.com/page becomes http://example.com.nyud.net/page)
• Unfortunately ...


Coral: Problems
• Currently very slow; this might improve in later versions, or it might be due to the overlay structure
• Security: volunteer nodes can respond with fake data
• Any site can use Coral to help reduce load: just append .nyud.net to their internal links
• Decentralization makes optimization hard (more on this later)


Our Design Goals
• Fast: Akamai-level performance
• Secure: pages served are always genuine
• Fast updates possible
• Must greatly reduce demands on the main site, but this cannot compromise the first 3


Our Design
• Node/supernode structure
• Take advantage of extremely heterogeneous performance characteristics
• Custom DNS server redirects incoming requests to a nearby supernode
• Supernode forwards the request to a nearby ordinary node
• Node replies to the user


Our Design
1. User goes to wikipedia.org
2. DNS server resolves wikipedia.org to a supernode
3. Supernode forwards the request to an ordinary node that has the requested document
4. Node retrieves the document and sends it to the user (sketched in code below)
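A minimal sketch of this two-hop flow, written in Python purely for illustration (the presentation contains no code). The Node and Supernode classes, the region-based "nearby" test, and the random forwarding policy are all assumptions made for the example, not details from the deck.

```python
import random

class Node:
    """A volunteer machine that caches documents for the site."""
    def __init__(self, name, region):
        self.name = name
        self.region = region
        self.documents = set()      # URLs currently cached on this node

class Supernode:
    """A well-connected volunteer that routes requests to ordinary nodes."""
    def __init__(self, name, region, nodes):
        self.name = name
        self.region = region
        self.nodes = nodes

    def handle(self, url):
        # Forward to a node that already holds the requested document.
        holders = [n for n in self.nodes if url in n.documents]
        if holders:
            node = random.choice(holders)   # placeholder selection policy
            return f"{node.name} serves {url}"
        return f"miss: fetch {url} from the origin server"

def dns_resolve(user_region, supernodes):
    """Custom DNS step: map the site's hostname to a close supernode."""
    # Toy distance model: same region counts as 'close'.
    return min(supernodes, key=lambda s: 0 if s.region == user_region else 1)

# Example: a European user is routed to the European supernode.
node = Node("node-a", "eu")
node.documents.add("http://wikipedia.org/wiki/P2P")
supernode = Supernode("super-eu", "eu", [node])
chosen = dns_resolve("eu", [supernode])
print(chosen.handle("http://wikipedia.org/wiki/P2P"))
```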


Performance
• Requests are answered in only 2 hops
• DNS server resolves to a geographically close supernode
• Supernode avoids sending requests to slow or overloaded nodes
• All parts of a page (e.g., HTML and images) should be served by a single node (see the selection sketch below)
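One hypothetical policy a supernode might apply to satisfy these goals. The PeerNode record, the load threshold, and the latency field are invented for this sketch; the presentation does not specify how nodes are scored.

```python
from dataclasses import dataclass, field

@dataclass
class PeerNode:
    name: str
    latency_ms: float    # measured latency from the supernode
    load: float          # fraction of serving capacity in use
    documents: set = field(default_factory=set)

def pick_node(nodes, page_urls, max_load=0.8):
    """Pick one node to serve an entire page (HTML plus images):
    it must hold every object of the page, must not be overloaded,
    and among the remaining candidates we take the lowest latency."""
    wanted = set(page_urls)
    candidates = [n for n in nodes
                  if wanted <= n.documents and n.load < max_load]
    if not candidates:
        return None   # fall back to the origin, or replicate the page first
    return min(candidates, key=lambda n: n.latency_ms)

# Example: the overloaded node is skipped even though it is faster.
fast_busy = PeerNode("busy", 10.0, 0.95, {"/a.html", "/a.png"})
ok = PeerNode("ok", 40.0, 0.30, {"/a.html", "/a.png"})
print(pick_node([fast_busy, ok], ["/a.html", "/a.png"]).name)   # "ok"
```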


Security
• Have to check nodes' accuracy
• First line of defense: encrypt local content
• May delay attacks, but won't stop them


Security
• More serious defense: let users check the volunteer nodes!
• Add a JavaScript wrapper to the website that requests the pages using AJAX
• With some probability, the AJAX script will compute the MD5 of the page it got and send it to a trusted central node
• Central node kicks out nodes that frequently return invalid MD5 sums (see the sketch below)
• Offload processing not just to nodes, but to users, with zero install
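A sketch of the central node's side of this check, in Python for illustration. The report format, the VerificationServer class, and the three-strikes eviction threshold are assumptions for the example, not parameters given in the presentation.

```python
import hashlib
from collections import Counter

class VerificationServer:
    """Central node that receives spot-check MD5 reports from users'
    AJAX wrappers and evicts volunteer nodes that serve fake pages."""
    def __init__(self, known_pages, max_bad_reports=3):
        # known_pages maps URL -> genuine page bytes held by the main site.
        self.good_md5 = {url: hashlib.md5(body).hexdigest()
                         for url, body in known_pages.items()}
        self.bad_reports = Counter()
        self.max_bad_reports = max_bad_reports
        self.banned = set()

    def report(self, node_id, url, reported_md5):
        """Handle one probabilistic spot-check sent by a user."""
        if reported_md5 != self.good_md5.get(url):
            self.bad_reports[node_id] += 1
            if self.bad_reports[node_id] >= self.max_bad_reports:
                self.banned.add(node_id)   # kick the node out of the system

# Example: a node that keeps serving a tampered page gets banned.
server = VerificationServer({"/wiki/P2P": b"genuine content"})
for _ in range(3):
    server.report("node-x", "/wiki/P2P",
                  hashlib.md5(b"fake content").hexdigest())
print(server.banned)   # {'node-x'}
```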


A Tricky Part
• Supernodes get requests and have to decide which node should answer which request
• Have to load-balance nodes: no overloading
• Popular documents should be replicated across many nodes
• But we don't want to replicate unpopular documents much: conserve storage space
• Lots of conflicting goals!


On the plus side...
• Unlike Coral & CoDeeN, supernodes know a lot of nodes (maybe 100-1000?)
• They can track the performance characteristics of each node
• Make object placement decisions from a central point
• Lots of opportunity to make really intelligent decisions: better use of resources, higher total system capacity, faster response times


Object Placement Problem
• This kind of problem is known as an object placement problem: which nodes do we put which files on?
• Also related to the request routing problem: given the files currently on the nodes, which node do we send this particular request to?
• These problems are basically unsolved for our scenario
• Analytical solutions have been done for very simplified, somewhat different cases
• We suspect a useful analytic solution is impossible here (a toy heuristic is sketched below)
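To make the placement problem concrete, here is one simple popularity-proportional heuristic, in Python. This is emphatically not the heuristic from the presentation (which is never specified); the replica-count formula, the slot budget, and the minimum-replica floor are all invented for the example.

```python
import math

def replica_counts(request_counts, total_slots, min_replicas=1):
    """Assign each document a replica count proportional to its share of
    requests, with a floor of min_replicas so every document stays
    available. May slightly exceed the slot budget; a real policy
    would rebalance against actual node capacities."""
    total_requests = sum(request_counts.values())
    replicas = {}
    for url, count in request_counts.items():
        share = count / total_requests
        replicas[url] = max(min_replicas, math.floor(share * total_slots))
    return replicas

# Example: the hot page gets many copies, cold pages get one each.
print(replica_counts({"/Main_Page": 900, "/Obscure": 5, "/Rare": 5},
                     total_slots=20))
# {'/Main_Page': 19, '/Obscure': 1, '/Rare': 1}
```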


Simulation
• Too hard to solve analytically, so do a simulation
• Goal is to explore different object placement algorithms under realistic scenarios
• Also want to model the performance of the whole system:
  - What cache hit ratios can we get?
  - How does the number/quality of peers affect cache hit ratios?
  - How is user latency affected?
• Built a pretty involved simulation in Erlang (a toy version of the measurement is sketched below)
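The authors' simulation was built in Erlang; the toy Python skeleton below only illustrates the shape of such a measurement. The Zipf-like request distribution, the cache size, and the random-eviction placement policy are assumptions for the sketch, not the deck's actual model.

```python
import random

def simulate(num_docs=1000, cache_slots=100, requests=10_000, zipf_s=1.0):
    """Measure the cache hit ratio of a single node's document cache
    under a Zipf-like request stream with a naive placement policy."""
    # Zipf-like popularity: document i is requested with weight 1/(i+1)^s.
    weights = [1.0 / (i + 1) ** zipf_s for i in range(num_docs)]
    cached = set(random.sample(range(num_docs), cache_slots))
    hits = 0
    for _ in range(requests):
        doc = random.choices(range(num_docs), weights=weights)[0]
        if doc in cached:
            hits += 1
        else:
            # Miss: fetch from the central server and cache the document,
            # evicting a random resident one (placeholder policy).
            cached.remove(random.choice(list(cached)))
            cached.add(doc)
    return hits / requests

print(f"cache hit ratio: {simulate():.2%}")
```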


Simulation Results
• So far, encouraging!
• Main results use a heuristic object placement algorithm
• Can load-balance without creating hotspots up to about 90% of theoretical capacity
• Documents are rarely requested more than once from the central server: close to the theoretical optimum


Next Steps
• Add more detail to the simulation: node churn, a better Internet topology
• Explore update strategies
• Obviously, an actual implementation would be nice, but not likely to happen this week
• What do you think?