Rethinking Internet Bulk Data Transfers
Krishna P. Gummadi
Networked Systems Group
Max-Planck Institute for Software Systems
Saarbruecken, Germany
Why rethink Internet bulk data transfers?
• The demand for bulk content in the Internet is rapidly rising
  – e.g., movies, home videos, software, backups, scientific data
• Yet, bulk transfers are expensive and inefficient
  – postal mail is often preferred for delivering bulk content
    • e.g., Netflix mails DVDs, even Google uses sneaker-nets
• The problem is inherent to the current Internet design
  – it was designed for connecting end hosts, not content delivery
How to design a future network for bulk content delivery?
Lots of demand for bulk data transfers
• Bulk transfers comprise a large fraction of Internet traffic
  – they are few in number, but account for a lot of bytes
• Lots more bulk content is transferred using postal mail
  – Netflix ships more DVD bytes than the Internet carries in the U.S.
• Growing demand to transfer all bulk data over the Internet
Share of bulk (> 100 MB) TCP flows:

  Trace     % flows   % bytes
  Toronto   0.008%    36%
  Munich    0.003%    44%
  Abilene   0.08%     50%
Internet bulk transfers are expensive and inefficient
• Attempted 1.2 TB transfer between U.S. and Germany
  – between Univ. of Washington and Max-Planck Institute
• MPI's happy hours were UW's unhappy hours
  – network transfer would have lasted several weeks
• Ultimately, used physical disks and DHL to transfer data
• Other examples:
  – Amazon's on-demand Simple Storage Service (S3)
    • storage: $0.15/GB, data transfer: $0.28/GB
  – ISPs routinely rate-limit bulk transfers
    • even academic networks rate-limit YouTube and file-sharing
The Internet is not designed for bulk content delivery
• For many bulk transfers, users primarily care about completion time
  – i.e., when the last data packet was delivered
• But, Internet routing / transport focus on individual packets
  – such as their latency, jitter, loss, and ordered delivery
• What would an Internet for bulk transfers focus on?
  – optimizing routes & transfer schedules for efficiency & cost
    • finding routes with the most bandwidth, not minimum delay
    • delaying transfers to save cost, if deadlines can be met (see the sketch below)
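The delay decision itself is simple arithmetic. A minimal sketch in Python; the rates, windows, and function name are illustrative assumptions, not from the talk:

    # Sketch: decide whether a bulk transfer can wait for off-peak
    # hours and still meet its deadline. All parameters are invented.

    def can_defer(size_bytes, spare_bps, hours_until_offpeak, hours_until_deadline):
        """True if starting at the next off-peak window still meets the deadline."""
        transfer_hours = size_bytes * 8 / spare_bps / 3600
        return hours_until_offpeak + transfer_hours <= hours_until_deadline

    # A 1.2 TB transfer over a 100 Mbps spare-capacity path takes ~27 hours,
    # so it can wait 12 hours for the off-peak window if the deadline is
    # 48 hours away.
    print(can_defer(1.2e12, 100e6, hours_until_offpeak=12, hours_until_deadline=48))  # True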
Rest of the talk
• Identify opportunities for inexpensive and efficient data transfers
• Explore network designs that exploit these opportunities
Opportunity 1: Spare network capacity
• Average utilization of most Internet links is very low
  – backbone links typically run at 30% utilization
  – peering links are used more, but nowhere near capacity
  – residential access links are unused most of the time
Opportunity 1: Spare network capacity
• Why do ISPs over-provision?
  – we do not know how to manage congested networks
  – TCP reacts badly to congestion
• Over-provisioning avoids congestion & guarantees QoS!
  – backbone ISPs offer strict SLAs on reliability, jitter, and loss
– customers buy bigger access pipes than they need
• Why not use spare link capacities for bulk transfers?
Opportunity 2: Large diurnal variation in network load
• Internet traffic shows large diurnal variation
  – much of it driven by human activity
Opportunity 2: Large diurnal variation in network load
• ISPs charge customers based on their peak load
  – 95th percentile of load averaged over 5-minute intervals (sketched below)
• Because networks have to be built to withstand peak load
• Why not delay bulk transfers till periods of low load?
  – reduces peak load and evens out load variations
  – predictable load is easier for ISPs to handle
  – lower costs for customers
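One common way to compute such a bill is to sort a month of 5-minute samples and charge on the sample at the 95% mark; a sketch under that assumption, with made-up sample values:

    # Sketch of 95th-percentile billing: the top 5% of bursts are free,
    # so bytes moved in the idle troughs cost nothing extra.

    def billable_mbps(five_min_samples):
        s = sorted(five_min_samples)
        return s[int(0.95 * len(s)) - 1]   # the 95th-percentile sample

    # A link idling at 100 Mbps that bursts to 900 Mbps 4% of the time
    # is billed near its trough, not its burst.
    samples = [100] * 96 + [900] * 4
    print(billable_mbps(samples))  # -> 100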
Opportunity 2: Large diurnal variation in network load
• Can significantly reduce peak load by delaying bulk traffic
  – allowing a 2-hour delay can decrease peak load by 50%
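The 50% figure comes from the talk's trace analysis; the sketch below only illustrates the mechanism with made-up hourly loads, greedily pushing each chunk of bulk demand into the least-loaded hour within the allowed delay window:

    # Sketch: delay bulk chunks into the least-loaded hour within a
    # bounded window and compare peaks. Hourly loads (Gbps) are invented.

    def peak_after_delay(interactive, bulk, max_delay, chunk=1):
        load = interactive[:]
        for hour, pending in enumerate(bulk):
            while pending > 0:
                window = range(hour, min(hour + max_delay + 1, len(load)))
                best = min(window, key=lambda h: load[h])  # least-loaded slot
                amount = min(chunk, pending)
                load[best] += amount
                pending -= amount
        return max(load)

    interactive = [1, 2, 9, 9, 2, 1]   # diurnal interactive traffic
    bulk        = [0, 0, 6, 6, 0, 0]   # bulk demand arriving at the peak
    print(max(i + b for i, b in zip(interactive, bulk)))     # -> 15, no delay
    print(peak_after_delay(interactive, bulk, max_delay=2))  # ->  9, 2-hour delay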
Opportunity 3: Non-shortest paths with the most bandwidth
• The Internet picks a single short route between end hosts
• Why not use non-shortest, multiple paths with the most bandwidth? (see the sketch below)
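As a sketch of this routing objective, here is "widest path" selection: a Dijkstra variant that maximizes a route's bottleneck bandwidth instead of minimizing hop count. The topology and spare-capacity numbers are invented:

    # Sketch: pick the route whose bottleneck link has the most spare
    # bandwidth, rather than the fewest hops.
    import heapq

    def widest_path(graph, src, dst):
        """graph: {node: {neighbor: spare_bandwidth_gbps}}. Returns the
        largest bottleneck bandwidth achievable from src to dst."""
        best = {src: float('inf')}
        heap = [(-best[src], src)]
        while heap:
            neg_width, node = heapq.heappop(heap)
            width = -neg_width
            if node == dst:
                return width
            for nbr, bw in graph[node].items():
                w = min(width, bw)       # a path is as wide as its bottleneck
                if w > best.get(nbr, 0):
                    best[nbr] = w
                    heapq.heappush(heap, (-w, nbr))
        return 0.0

    # The direct 2-hop path is shorter but narrower than the 3-hop detour.
    g = {'UW':  {'A': 1.0, 'B': 4.0},
         'A':   {'MPI': 1.0},
         'B':   {'C': 4.0},
         'C':   {'MPI': 3.0},
         'MPI': {}}
    print(widest_path(g, 'UW', 'MPI'))  # -> 3.0, via the longer UW-B-C-MPI path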
Rest of the talk
• Identify opportunities for inexpensive and efficient data transfers
• Explore network designs that exploit these opportunities
Many potential network designs
• How to exploit spare bandwidth for bulk transfers?
  – network-level differentiation using router queues?
  – transport-level separation using TCP Nice?
• How and where to delay bulk data for scheduling?
  – at every router? at selected routers at peering points?
  – storage-enabled routers? data centers attached to routers?
• How to select non-shortest multiple paths?
  – separately within each ISP? or across different ISPs?
• But, all designs share a few key architectural features
Key architectural features
• Isolation: separate bulk traffic from the rest
• Staging: break end-to-end transfers into multiple stages (sketched below)

[Figure: a UW → MPI bulk transfer staged through AT&T by a data transfer service (DTS); UW → MPI : S1, S2, S3, S4, S5, S6]
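A toy sketch of what staging buys: each hop is a local store-and-forward transfer that can wait for its own link's idle window, so no single end-to-end path needs to be free all at once. Hop timings and rates are invented:

    # Sketch: staging turns one end-to-end transfer into a chain of
    # locally ack'ed store-and-forward hops, each scheduled when its
    # own link is idle.

    def staged_completion(size_gb, hops):
        """hops: list of (offpeak_start_hour, gb_per_hour). Each staging
        node stores the whole object, then forwards it during its link's
        next idle window. Returns completion time in hours."""
        t = 0.0
        for offpeak_start, rate in hops:
            t = max(t, offpeak_start)   # wait for this link's idle period
            t += size_gb / rate         # local store-and-forward hop
        return t

    # UW -> staging node -> MPI, with staggered idle windows:
    print(staged_completion(1200, [(0, 300), (6, 300), (10, 300)]))  # -> 14.0 hours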
How to guarantee end-to-end reliability?
• How to recover when an intermediate stage fails?
  – option 1: client times out and requests the sender again
  – option 2: client directly queries the network to locate data (sketched below)
    • analogous to FedEx or DHL tracking service
[Figure: the UW → MPI transfer through AT&T's DTS; stage S5 fails (X) and delivery completes via replacement stages S'5 and S'6, located through the Content Addressed Tracking (CAT) service]
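Option 2 might look like the following: a content-addressed lookup mapping a hash of the data to the staging nodes currently holding copies, which the receiver queries like a package-tracking number. Class and method names are invented:

    # Sketch: content-addressed tracking maps content hash -> staging
    # nodes holding a copy, so a receiver can resume after a failure.
    import hashlib

    class CAT:
        def __init__(self):
            self.locations = {}                    # content hash -> set of nodes

        def register(self, data, node):
            # a staging node reports that it now holds a copy of `data`
            h = hashlib.sha256(data).hexdigest()
            self.locations.setdefault(h, set()).add(node)
            return h

        def track(self, h):
            # the receiver asks: which nodes hold my data right now?
            return self.locations.get(h, set())

    cat = CAT()
    h = cat.register(b"bulk object", "S3")
    cat.register(b"bulk object", "S5")
    cat.locations[h].discard("S5")    # stage S5 fails mid-transfer
    print(cat.track(h))               # -> {'S3'}: resume from the surviving copy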
CAT allows opportunistic multiplexing of transfer stages
• Eliminates redundant data transfers independent of end hosts (sketched below)
• Huge potential benefits for popular data transfers
– e.g., 80% of P2P file-sharing traffic is repeated bytes
[Figure: CAT lets a second transfer, UW → Fraunhofer, reuse the stages of the UW → MPI transfer; UW → MPI : S1, S2, S3, S4, S5, S6 and UW → Fraunhofer : S1, S2, S3, S4, S5, S7]
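The multiplexing itself can be sketched as a hash check before each stage: if the next staging node already holds the content, the transfer is skipped. This is how one UW upload could serve both MPI and Fraunhofer; all names are invented:

    # Sketch: skip a stage transfer when the next staging node already
    # holds the same content hash.
    import hashlib

    def forward(data, node, node_store):
        """node_store: node -> set of content hashes already held there."""
        h = hashlib.sha256(data).hexdigest()
        held = node_store.setdefault(node, set())
        if h in held:
            return "skipped (duplicate)"     # redundant transfer eliminated
        held.add(h)
        return "transferred %d bytes" % len(data)

    store = {}
    print(forward(b"shared blocks", "S3", store))  # first path: sent
    print(forward(b"shared blocks", "S3", store))  # second path: reuses the copy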
Service model of the new architecture
• Our architecture has 3 key features
  – isolation, staging, and content addressed tracking
• Data transfers in our architecture are not end-to-end
  – packets are not individually ack'ed by end hosts
• Our architecture provides an offline data transfer service (sketched below)
  – the sender pushes the data into the network
  – the network delivers the data to the receiver
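A sketch of that service model as an interface: the sender's only interaction is a push that returns a tracking handle, and delivery is the network's job. All names here are illustrative, not a proposed API:

    # Sketch: offline service model -- sender pushes, network delivers,
    # no end-to-end per-packet acks.
    import hashlib

    class BulkTransferService:
        def __init__(self):
            self.pending = {}

        def push(self, data, receiver, deadline_hours):
            # the sender's only call: hand the data to the network
            h = hashlib.sha256(data).hexdigest()
            self.pending[h] = (data, receiver, deadline_hours)
            return h                        # a tracking handle, not an ack

        def deliver(self, handle):
            # invoked by the network once staging reaches the receiver
            data, receiver, _ = self.pending.pop(handle)
            return receiver, data

    svc = BulkTransferService()
    receipt = svc.push(b"dataset (stand-in)", "MPI", deadline_hours=48)
    print(svc.deliver(receipt)[0])          # -> MPI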
Evaluating the architecture
• CERN's LHC workload
  – 15 petabytes of data per year -- 41 TB/day (see the arithmetic below)
  – transfer data from the Tier 1 site at FNAL to 6 Tier 2 sites
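The slide's numbers check out with quick arithmetic: 15 PB/year is about 41 TB/day, and sustaining 41 TB/day needs roughly 3.8 Gbps around the clock, a large slice of a 10 Gbps backbone shared with all other traffic.

    # Quick arithmetic behind the slide's workload numbers.
    per_day_tb = 15e15 / 365 / 1e12       # 15 PB/year in TB/day
    avg_gbps = 41e12 * 8 / 86400 / 1e9    # 41 TB/day as a sustained rate
    print(round(per_day_tb, 1), round(avg_gbps, 1))   # -> 41.1 3.8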
Benefits of our architecture
• Abilene (10 Gbps) backbone was deemed insufficient
  – plan to use NSF's new TeraGrid (40 Gbps) network
• Could our new architecture help?
Max. time (days) required to transfer 41 TB over Abilene:

  Isolation              1.52 (6.1)
  + Multi-path routing   0.95 (6.1)
  + CAT                  0.38 (6.1)
  + Multi-path + CAT     0.19 (6.1)
• With our architecture, the CERN data could be transferred using less than 20% of the spare bandwidth in Abilene!
Are bulk transfers worth a new network architecture?
• The economics of distributed computing
  – computation costs: falling at Moore's law
  – storage costs: falling faster than Moore's law
  – networking costs: falling slower than Moore's law
• Wide-area bandwidth is the most expensive resource today
  – strong incentive to put computation near data
• Lowering bulk transfer costs enables new distributed systems
  – encourages data replication near computation
    • e.g., the Web on my disk; Google might become PC software
Conclusion
• A large and growing demand for bulk data in the Internet
• Yet, bulk transfers are expensive and inefficient
• The Internet is not designed for bulk content transfers
• Proposed an offline data transfer architecture
  – to exploit spare resources, to schedule transfers, to route efficiently, and to eliminate redundant transfers
• It relies on isolation, staging, & content-addressed tracking
  – preliminary evaluation shows lots of promise
Acknowledgements
• Students at MPI-SWS
  – Massimiliano Marcon
  – Andreas Haeberlen
  – Marcel Dischinger
• Faculty
  – Stefan Savage, UCSD
  – Amin Vahdat, UCSD
  – Peter Druschel, MPI-SWS