Rethinking Internet Bulk Data Transfers
Krishna P. Gummadi
Networked Systems Group
Max-Planck Institute for Software Systems
Saarbruecken, Germany
Why rethink Internet bulk data transfers?
• The demand for bulk content in the Internet is rapidly rising
  – e.g., movies, home videos, software, backups, scientific data
• Yet, bulk transfers are expensive and inefficient
  – postal mail is often preferred for delivering bulk content
    • e.g., Netflix mails DVDs, even Google uses sneaker-nets
• The problem is inherent to the current Internet design
  – it was designed for connecting end hosts, not content delivery
How to design a future network for bulk content delivery?
Lots of demand for bulk data transfers
• Bulk transfers comprise a large fraction of Internet traffic
  – they are few in number, but account for a lot of bytes
• Lots more bulk content is transferred using postal mail
  – Netflix ships more DVD bytes than the Internet carries in the U.S.
• Growing demand to transfer all bulk data over the Internet
Share of bulk (> 100 MB) TCP flows:

  Trace     % flows   % bytes
  Toronto   0.008%    36%
  Munich    0.003%    44%
  Abilene   0.08%     50%
Internet bulk transfers are expensive and inefficient
• Attempted 1.2 TB transfer between U.S. and Germany
  – between Univ. of Washington and Max-Planck Institute
• MPI's happy hours were UW's unhappy hours
  – network transfer would have lasted several weeks
• Ultimately, used physical disks and DHL to transfer data
• Other examples:
  – Amazon's on-demand Simple Storage Service (S3)
    • storage: $0.15/GB, data transfer: $0.28/GB
  – ISPs routinely rate-limit bulk transfers
    • even academic networks rate-limit YouTube and file-sharing
The Internet is not designed for bulk content delivery
• For many bulk transfers, users primarily care about completion time
  – i.e., when the last data packet was delivered
• But, Internet routing / transport focus on individual packets
  – such as their latency, jitter, loss, and ordered delivery
• What would an Internet for bulk transfers focus on?
  – optimizing routes & transfer schedules for efficiency & cost
    • finding routes with the most bandwidth, not minimum delay
    • delaying transfers to save cost, if deadlines can be met (see the sketch below)
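The delay decision itself is simple arithmetic. A minimal sketch in Python; the rates, windows, and function name are illustrative assumptions, not from the talk:

    # Sketch: decide whether a bulk transfer can wait for off-peak
    # hours and still meet its deadline. All parameters are invented.

    def can_defer(size_bytes, spare_bps, hours_until_offpeak, hours_until_deadline):
        """True if starting at the next off-peak window still meets the deadline."""
        transfer_hours = size_bytes * 8 / spare_bps / 3600
        return hours_until_offpeak + transfer_hours <= hours_until_deadline

    # A 1.2 TB transfer over a 100 Mbps spare-capacity path takes ~27 hours,
    # so it can wait 12 hours for the off-peak window if the deadline is
    # 48 hours away.
    print(can_defer(1.2e12, 100e6, hours_until_offpeak=12, hours_until_deadline=48))  # True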
Rest of the talk
• Identify opportunities for inexpensive and efficient data transfers
• Explore network designs that exploit these opportunities
Opportunity 1: Spare network capacity
• Average utilization of most Internet links is very low
  – backbone links typically run at 30% utilization
  – peering links are used more, but nowhere near capacity
  – residential access links are unused most of the time
Opportunity 1: Spare network capacity
• Why do ISPs over-provision?
  – we do not know how to manage congested networks
  – TCP reacts badly to congestion
• Over-provisioning avoids congestion & guarantees QoS!
  – backbone ISPs offer strict SLAs on reliability, jitter, and loss
– customers buy bigger access pipes than they need
• Why not use spare link capacities for bulk transfers?
Opportunity 2: Large diurnal variation in network load
• Internet traffic shows large diurnal variation
  – much of it driven by human activity
Opportunity 2: Large diurnal variation in network load
• ISPs charge customers based on their peak load
  – 95th percentile of load averaged over 5-minute intervals (sketched below)
• Because networks have to be built to withstand peak load
• Why not delay bulk transfers till periods of low load?
  – reduces peak load and evens out load variations
  – predictable load is easier for ISPs to handle
  – lower costs for customers
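One common way to compute such a bill is to sort a month of 5-minute samples and charge on the sample at the 95% mark; a sketch under that assumption, with made-up sample values:

    # Sketch of 95th-percentile billing: the top 5% of bursts are free,
    # so bytes moved in the idle troughs cost nothing extra.

    def billable_mbps(five_min_samples):
        s = sorted(five_min_samples)
        return s[int(0.95 * len(s)) - 1]   # the 95th-percentile sample

    # A link idling at 100 Mbps that bursts to 900 Mbps 4% of the time
    # is billed near its trough, not its burst.
    samples = [100] * 96 + [900] * 4
    print(billable_mbps(samples))  # -> 100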
Opportunity 2: Large diurnal variation in network load
• Can significantly reduce peak load by delaying bulk traffic
  – allowing a 2-hour delay can decrease peak load by 50%
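The 50% figure comes from the talk's trace analysis; the sketch below only illustrates the mechanism with made-up hourly loads, greedily pushing each chunk of bulk demand into the least-loaded hour within the allowed delay window:

    # Sketch: delay bulk chunks into the least-loaded hour within a
    # bounded window and compare peaks. Hourly loads (Gbps) are invented.

    def peak_after_delay(interactive, bulk, max_delay, chunk=1):
        load = interactive[:]
        for hour, pending in enumerate(bulk):
            while pending > 0:
                window = range(hour, min(hour + max_delay + 1, len(load)))
                best = min(window, key=lambda h: load[h])  # least-loaded slot
                amount = min(chunk, pending)
                load[best] += amount
                pending -= amount
        return max(load)

    interactive = [1, 2, 9, 9, 2, 1]   # diurnal interactive traffic
    bulk        = [0, 0, 6, 6, 0, 0]   # bulk demand arriving at the peak
    print(max(i + b for i, b in zip(interactive, bulk)))     # -> 15, no delay
    print(peak_after_delay(interactive, bulk, max_delay=2))  # ->  9, 2-hour delay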
Opportunity 3: Non-shortest paths with the most bandwidth
• The Internet picks a single short route between end hosts
• Why not use non-shortest, multiple paths with the most bandwidth? (see the sketch below)
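As a sketch of this routing objective, here is "widest path" selection: a Dijkstra variant that maximizes a route's bottleneck bandwidth instead of minimizing hop count. The topology and spare-capacity numbers are invented:

    # Sketch: pick the route whose bottleneck link has the most spare
    # bandwidth, rather than the fewest hops.
    import heapq

    def widest_path(graph, src, dst):
        """graph: {node: {neighbor: spare_bandwidth_gbps}}. Returns the
        largest bottleneck bandwidth achievable from src to dst."""
        best = {src: float('inf')}
        heap = [(-best[src], src)]
        while heap:
            neg_width, node = heapq.heappop(heap)
            width = -neg_width
            if node == dst:
                return width
            for nbr, bw in graph[node].items():
                w = min(width, bw)       # a path is as wide as its bottleneck
                if w > best.get(nbr, 0):
                    best[nbr] = w
                    heapq.heappush(heap, (-w, nbr))
        return 0.0

    # The direct 2-hop path is shorter but narrower than the 3-hop detour.
    g = {'UW':  {'A': 1.0, 'B': 4.0},
         'A':   {'MPI': 1.0},
         'B':   {'C': 4.0},
         'C':   {'MPI': 3.0},
         'MPI': {}}
    print(widest_path(g, 'UW', 'MPI'))  # -> 3.0, via the longer UW-B-C-MPI path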
Rest of the talk
• Identify opportunities for inexpensive and efficient data transfers
• Explore network designs that exploit these opportunities
Many potential network designs
• How to exploit spare bandwidth for bulk transfers?
  – network-level differentiation using router queues?
  – transport-level separation using TCP Nice?
• How and where to delay bulk data for scheduling?
  – at every router? at selected routers at peering points?
  – storage-enabled routers? data centers attached to routers?
• How to select non-shortest multiple paths?
  – separately within each ISP? or across different ISPs?
• But, all designs share a few key architectural features
Key architectural features
• Isolation: separate bulk traffic from the rest
• Staging: break end-to-end transfers into multiple stages (sketched below)

[Figure: a UW → MPI bulk transfer staged through AT&T by a data transfer service (DTS); UW → MPI : S1, S2, S3, S4, S5, S6]
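A toy sketch of what staging buys: each hop is a local store-and-forward transfer that can wait for its own link's idle window, so no single end-to-end path needs to be free all at once. Hop timings and rates are invented:

    # Sketch: staging turns one end-to-end transfer into a chain of
    # locally ack'ed store-and-forward hops, each scheduled when its
    # own link is idle.

    def staged_completion(size_gb, hops):
        """hops: list of (offpeak_start_hour, gb_per_hour). Each staging
        node stores the whole object, then forwards it during its link's
        next idle window. Returns completion time in hours."""
        t = 0.0
        for offpeak_start, rate in hops:
            t = max(t, offpeak_start)   # wait for this link's idle period
            t += size_gb / rate         # local store-and-forward hop
        return t

    # UW -> staging node -> MPI, with staggered idle windows:
    print(staged_completion(1200, [(0, 300), (6, 300), (10, 300)]))  # -> 14.0 hours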
How to guarantee end-to-end reliability?
• How to recover when an intermediate stage fails?
  – option 1: client times out and requests the sender again
  – option 2: client directly queries the network to locate data (sketched below)
    • analogous to FedEx or DHL tracking service
[Figure: the UW → MPI transfer through AT&T's DTS; stage S5 fails (X) and delivery completes via replacement stages S'5 and S'6, located through the Content Addressed Tracking (CAT) service]
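Option 2 might look like the following: a content-addressed lookup mapping a hash of the data to the staging nodes currently holding copies, which the receiver queries like a package-tracking number. Class and method names are invented:

    # Sketch: content-addressed tracking maps content hash -> staging
    # nodes holding a copy, so a receiver can resume after a failure.
    import hashlib

    class CAT:
        def __init__(self):
            self.locations = {}                    # content hash -> set of nodes

        def register(self, data, node):
            # a staging node reports that it now holds a copy of `data`
            h = hashlib.sha256(data).hexdigest()
            self.locations.setdefault(h, set()).add(node)
            return h

        def track(self, h):
            # the receiver asks: which nodes hold my data right now?
            return self.locations.get(h, set())

    cat = CAT()
    h = cat.register(b"bulk object", "S3")
    cat.register(b"bulk object", "S5")
    cat.locations[h].discard("S5")    # stage S5 fails mid-transfer
    print(cat.track(h))               # -> {'S3'}: resume from the surviving copy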
CAT allows opportunistic multiplexing of transfer stages
• Eliminates redundant data transfers independent of end hosts (sketched below)
• Huge potential benefits for popular data transfers
– e.g., 80% of P2P file-sharing traffic is repeated bytes
[Figure: CAT lets a second transfer, UW → Fraunhofer, reuse the stages of the UW → MPI transfer; UW → MPI : S1, S2, S3, S4, S5, S6 and UW → Fraunhofer : S1, S2, S3, S4, S5, S7]
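The multiplexing itself can be sketched as a hash check before each stage: if the next staging node already holds the content, the transfer is skipped. This is how one UW upload could serve both MPI and Fraunhofer; all names are invented:

    # Sketch: skip a stage transfer when the next staging node already
    # holds the same content hash.
    import hashlib

    def forward(data, node, node_store):
        """node_store: node -> set of content hashes already held there."""
        h = hashlib.sha256(data).hexdigest()
        held = node_store.setdefault(node, set())
        if h in held:
            return "skipped (duplicate)"     # redundant transfer eliminated
        held.add(h)
        return "transferred %d bytes" % len(data)

    store = {}
    print(forward(b"shared blocks", "S3", store))  # first path: sent
    print(forward(b"shared blocks", "S3", store))  # second path: reuses the copy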
Service model of the new architecture
• Our architecture has 3 key features
  – isolation, staging, and content addressed tracking
• Data transfers in our architecture are not end-to-end
  – packets are not individually ack'ed by end hosts
• Our architecture provides an offline data transfer service (sketched below)
  – the sender pushes the data into the network
  – the network delivers the data to the receiver
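A sketch of that service model as an interface: the sender's only interaction is a push that returns a tracking handle, and delivery is the network's job. All names here are illustrative, not a proposed API:

    # Sketch: offline service model -- sender pushes, network delivers,
    # no end-to-end per-packet acks.
    import hashlib

    class BulkTransferService:
        def __init__(self):
            self.pending = {}

        def push(self, data, receiver, deadline_hours):
            # the sender's only call: hand the data to the network
            h = hashlib.sha256(data).hexdigest()
            self.pending[h] = (data, receiver, deadline_hours)
            return h                        # a tracking handle, not an ack

        def deliver(self, handle):
            # invoked by the network once staging reaches the receiver
            data, receiver, _ = self.pending.pop(handle)
            return receiver, data

    svc = BulkTransferService()
    receipt = svc.push(b"dataset (stand-in)", "MPI", deadline_hours=48)
    print(svc.deliver(receipt)[0])          # -> MPI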
Evaluating the architecture
• CERN's LHC workload
  – 15 petabytes of data per year -- 41 TB/day (see the arithmetic below)
  – transfer data from the Tier 1 site at FNAL to 6 Tier 2 sites
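The slide's numbers check out with quick arithmetic: 15 PB/year is about 41 TB/day, and sustaining 41 TB/day needs roughly 3.8 Gbps around the clock, a large slice of a 10 Gbps backbone shared with all other traffic.

    # Quick arithmetic behind the slide's workload numbers.
    per_day_tb = 15e15 / 365 / 1e12       # 15 PB/year in TB/day
    avg_gbps = 41e12 * 8 / 86400 / 1e9    # 41 TB/day as a sustained rate
    print(round(per_day_tb, 1), round(avg_gbps, 1))   # -> 41.1 3.8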
Benefits of our architecture
• Abilene (10 Gbps) backbone was deemed insufficient
  – plan to use NSF's new TeraGrid (40 Gbps) network
• Could our new architecture help?
Max. time (days) required to transfer 41 TB over Abilene:

  Isolation              1.52 (6.1)
  + Multi-path routing   0.95 (6.1)
  + CAT                  0.38 (6.1)
  + Multi-path + CAT     0.19 (6.1)
• With our architecture, the CERN data could be transferred using less than 20% of the spare bandwidth in Abilene!
Are bulk transfers worth a new network architecture?
• The economics of distributed computing
  – computation costs: falling at Moore's law
  – storage costs: falling faster than Moore's law
  – networking costs: falling slower than Moore's law
• Wide-area bandwidth is the most expensive resource today
  – strong incentive to put computation near data
• Lowering bulk transfer costs enables new distributed systems
  – encourages data replication near computation
    • e.g., the Web on my disk; Google might become PC software
Conclusion
• A large and growing demand for bulk data in the Internet
• Yet, bulk transfers are expensive and inefficient
• The Internet is not designed for bulk content transfers
• Proposed an offline data transfer architecture
  – to exploit spare resources, to schedule transfers, to route efficiently, and to eliminate redundant transfers
• It relies on isolation, staging, & content-addressed tracking
  – preliminary evaluation shows lots of promise
Acknowledgements
• Students at MPI-SWS
  – Massimiliano Marcon
  – Andreas Haeberlen
  – Marcel Dischinger
• Faculty
  – Stefan Savage, UCSD
  – Amin Vahdat, UCSD
  – Peter Druschel, MPI-SWS