New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks
Brian Cho, Indranil Gupta
University of Illinois at Urbana-Champaign
Motivation: Ad-hoc Data Processing
• Data-intensive research on OpenCirrus
  – Federated cloud: diverse geographic locations
  – Data scale of TBs
• Limited wide-area bandwidth is a big bottleneck: transfers over the internet can take days or weeks [Garfinkel 07]
• Success story: Washington Post
  – Hillary Clinton White House schedule
    • Released as 17,481 pages of non-searchable PDF images
    • Convert to searchable text and deliver to newsroom within the same news cycle
  – Done within 26 hours with Amazon AWS
    • Pay for bandwidth and computer usage
• Pandora (People and networks moving data around)
  – First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
  – Produce optimal transfer plans that obey time deadlines and minimize dollar cost
    • Better than internet-only and shipping-only strategies
Bulk Transfer Options
• Internet Transfer
  – Grid: [GridFTP]
  – PlanetLab: [CoBlitz 06]
• Disk Shipping Transfer
  – [Jim Gray 03]
  – [PostManet 04]
  – [DOT 06]
  – Amazon AWS Import/Export
Option 1: Internet Transfer
[Figure: Data Source (Illinois) and Data Source (CMU) send data over the internet to the Computation Provider (Amazon). Labels: 5-20 Mbps; 1 TB: 5-20 days; $0.10 per GB; No Cost.]
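The "1 TB: 5-20 days" label follows from the 5-20 Mbps bandwidth range. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully used link; the function name is illustrative and not from the slides:

```python
def transfer_days(size_tb: float, rate_mbps: float) -> float:
    """Days needed to move size_tb terabytes at rate_mbps megabits per second."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 86_400              # seconds -> days


for rate in (5, 20):
    print(f"1 TB at {rate} Mbps: {transfer_days(1, rate):.1f} days")
```

This gives about 18.5 days at 5 Mbps and about 4.6 days at 20 Mbps, in line with the rounded range on the slide.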
Option 2: Disk Shipping Transfer
[Figure: Data Source (Illinois), Data Source (CMU), and Computation Provider (Amazon), connected by disk shipments. Labels: Disk Interface 40 MB/s; $0.02 per GB; $80 per Disk.]
Three sets of per-disk shipping rates appear in the figure:
Overnight   Two-Day   Ground
$60         $30       $10
$50         $25       $5
$40         $15       $5
Cooperative Transfer Solutions
• Good solutions
  – Meet deadlines
  – Minimize dollar cost
• Complexity
  – Global scale
  – Many strategies
  – Collaboration helps
• How to find the best solution?
[Figure: Open Cirrus Sites]
Example: Minimize Dollar Cost
[Figure: Data Source A and Data Source B transfer 0.8 TB and 1.2 TB to the Cloud Service Provider. Labels: 15 Days, No Cost; 5 Days, Ground: $5; 14 hours; Loading: $40; Handling: $80.]
Total Cost: $125
Total Time: 20 Days
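The $125 total can be reproduced from the per-unit rates on the disk shipping slide. The sketch below assumes one reading of the figure: all 2 TB end up on a single disk that is shipped by ground, so the provider charges loading per GB and handling per disk. The function and constant names are illustrative.

```python
PER_GB_LOADING = 0.02    # $/GB, "$0.02 per GB" on the disk shipping slide
HANDLING_PER_DISK = 80   # $/disk, "$80 per Disk" on the disk shipping slide
GROUND_SHIPPING = 5      # $/disk, the "Ground: $5" label in this example


def shipping_plan_cost(total_tb, disks, shipping_per_disk):
    """Dollar cost of shipping total_tb terabytes on the given number of disks."""
    loading = total_tb * 1000 * PER_GB_LOADING   # decimal TB -> GB
    handling = disks * HANDLING_PER_DISK
    shipping = disks * shipping_per_disk
    return round(loading + handling + shipping, 2)


cost = shipping_plan_cost(total_tb=0.8 + 1.2, disks=1,
                          shipping_per_disk=GROUND_SHIPPING)
print(cost)
```

Under these assumptions the loading term is $40 and the handling term is $80, matching the "Loading" and "Handling" labels, and the sum with the $5 ground shipment is the slide's $125.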
Example: Meet Deadline (3 days) while Minimizing Dollar Cost
[Figure: Data Source A and Data Source B transfer 0.8 TB and 1.2 TB to the Cloud Service Provider. Labels: 1 Day, Overnight: $40; 1 Day, Overnight: $50; 14 hours; 6 hours; Loading: $40; Handling: $80.]
Total Cost: $210
Total Time: 3 Days
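The cost and the hour labels in this example can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 6-hour and 14-hour labels are read as the time to move 0.8 TB and 2 TB through the 40 MB/s disk interface (an inference, not stated on the slide), and decimal units are used.

```python
DISK_MB_PER_S = 40  # disk interface rate from the disk shipping slide


def load_hours(size_tb: float) -> float:
    """Hours to copy size_tb terabytes through the disk interface."""
    seconds = size_tb * 1e12 / (DISK_MB_PER_S * 1e6)
    return seconds / 3600


# Two overnight shipments plus loading and handling, as labeled in the figure.
cost = 40 + 50 + 40 + 80

print(cost)
print(f"0.8 TB: {load_hours(0.8):.1f} h, 2.0 TB: {load_hours(2.0):.1f} h")
```

The loading times come out at roughly 5.6 and 13.9 hours, which round to the 6 and 14 hours in the figure.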
Outline
• Motivation
• Problem Formulation
  – Graph Model
  – Flow Over Time
• Solution: Pandora
• Experimental Results
• Conclusion
Graph Model: Internet Links
[Figure: Site A and Site B, each with inet_out and inet_in vertices.]
Edge labels:
• Incoming/Outgoing BW
• Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
Graph Model: Shipment Links
[Figure: Site A and Site B, each with inet_out, inet_in, and ship_in vertices.]
Edge labels:
• Incoming/Outgoing BW
• Disk Interface BW, e.g., 40 MB/s; Cost: Loading ($/GB)
• Capacity (Mb/s), Cost ($/GB), Transit time (almost instantaneous)
• Capacity (almost infinite), Cost: Shipping and Handling ($/Disk), Transit time (Hrs)
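The graph model can be sketched as a small data structure. Everything below is an illustration, not the authors' implementation: the `Edge` type, the helper functions, the exact wiring between a site and its inet_out, inet_in, and ship_in vertices, and the numeric values (4.5 GB/hr is about 10 Mbps, 144 GB/hr is 40 MB/s) are assumptions chosen to mirror the edge labels.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity_gb_per_hr: float   # math.inf models "almost infinite"
    cost_per_gb: float = 0.0    # linear cost in $/GB
    fixed_cost: float = 0.0     # per-disk shipping and handling in $
    transit_hrs: int = 0        # 0 models "almost instantaneous"


def site_edges(site, inet_gb_per_hr, disk_gb_per_hr, loading_per_gb):
    """Edges inside one site: outgoing BW, incoming BW, and disk interface."""
    return [
        Edge(site, f"{site}.inet_out", inet_gb_per_hr),
        Edge(f"{site}.inet_in", site, inet_gb_per_hr),
        Edge(f"{site}.ship_in", site, disk_gb_per_hr, cost_per_gb=loading_per_gb),
    ]


def internet_link(a, b, gb_per_hr, cost_per_gb):
    return Edge(f"{a}.inet_out", f"{b}.inet_in", gb_per_hr, cost_per_gb=cost_per_gb)


def shipment_link(a, b, per_disk_cost, transit_hrs):
    return Edge(a, f"{b}.ship_in", math.inf,
                fixed_cost=per_disk_cost, transit_hrs=transit_hrs)


edges = (
    site_edges("A", 4.5, 144.0, 0.02)
    + site_edges("B", 4.5, 144.0, 0.02)
    + [internet_link("A", "B", 4.5, 0.10), shipment_link("A", "B", 50.0, 24)]
)
for e in edges:
    print(e)
```

Keeping the linear cost and the per-disk fixed cost as separate fields reflects the distinction the slides draw between internet links ($/GB) and shipment links ($/Disk).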
Data Transfer Over Time
• Goal: Meet time deadline T while minimizing dollar cost C
• Hard problem on graph with both Internet and Shipment links
  – NP-Hard
  – Formal problem and proof in paper
• Solution: Pandora computes optimal and approximate solutions
Solution: Pandora Overview
• Transform into static time-expanded network
  – Decomposition of shipping edges
• Solve min-cost flow on static network
  – Mixed Integer Program
  – Optimizations to reduce computation time
Time-expanded Network
• Intuitively, incorporate time into graph to create an extended graph representation
• Make T (= deadline) copies of each vertex
• Draw edges according to transit time
• Draw holdover edges
• [Ford Fulkerson 58]
• Disk shipment represented as time-expanded network
[Figure: time-expanded network along a time axis, with transit times τ = 1 and τ = 3 and T = 5]
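The construction steps above can be sketched in a few lines. This is a minimal illustration, assuming integer time steps 0..T-1 (T copies of each vertex) and a hypothetical three-vertex path with transit times 1 and 3; the function name and the example graph are not from the slides.

```python
def time_expand(vertices, edges, horizon):
    """Build the arcs of a time-expanded network.

    edges is an iterable of (u, v, transit_time). Each vertex v gets copies
    (v, 0) .. (v, horizon - 1). Returns a list of ((u, t), (v, t2)) arcs.
    """
    arcs = []
    for t in range(horizon):
        # Transit arcs: leaving u at time t reaches v at time t + tau.
        for u, v, tau in edges:
            if t + tau < horizon:
                arcs.append(((u, t), (v, t + tau)))
        # Holdover arcs: data may wait at a vertex for one time step.
        if t + 1 < horizon:
            for v in vertices:
                arcs.append(((v, t), (v, t + 1)))
    return arcs


arcs = time_expand(["a", "b", "c"], [("a", "b", 1), ("b", "c", 3)], horizon=5)
print(len(arcs))
```

Arcs that would arrive after the last time step are dropped, which is how the deadline is enforced in the static network.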
Decomposed Shipping Edges
• Decompose shipping edges to fixed-cost edges
  1. Transit time
  2. Fixed cost
  3. Capacity
[Figure: decomposed edges labeled cost = $130, capacity = 2 TB; cost = $110, capacity = 2 TB; cost = $100, capacity = 2 TB]
Solution: Min-cost Flow Calculation using Mixed-Integer Program
• Fixed-cost edges make min-cost flow calculation NP-Hard
• Mixed-Integer Program (MIP)
  – Binary variable y_e defined on fixed-cost edges
• Goal: Minimize dollar cost
• Subject to
  – Capacity constraints (flow_e ≤ capacity_e ∙ y_e)
  – Conservation of flow
  – Demands of sources and sink
• Proof of NP-Hardness and formal MIP in paper
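The slides defer the formal MIP to the paper. As a rough guide, a generic fixed-charge min-cost flow program consistent with the bullets above might read as follows. The notation is an assumption made here: E is the edge set of the time-expanded network, E_F ⊆ E the fixed-cost (shipment) edges, f_e the flow, c_e the per-unit cost, k_e the fixed cost, u_e the capacity, and b_v the net demand at vertex v (negative at sources, positive at the sink, zero elsewhere).

```latex
\begin{align*}
\min \quad & \sum_{e \in E} c_e f_e + \sum_{e \in E_F} k_e y_e \\
\text{s.t.} \quad
  & 0 \le f_e \le u_e && \forall e \in E \setminus E_F \\
  & 0 \le f_e \le u_e \, y_e, \quad y_e \in \{0, 1\} && \forall e \in E_F \\
  & \sum_{e \in \delta^{-}(v)} f_e - \sum_{e \in \delta^{+}(v)} f_e = b_v && \forall v \in V
\end{align*}
```

Setting y_e = 0 forces the flow on a fixed-cost edge to zero, and setting y_e = 1 pays k_e once and opens the full capacity u_e, which is the role of the binary variable in the capacity constraint on the slide.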
Optimizations: Overview
• Size of MIP grows linearly with deadline T
  – Worst-case running time grows exponentially with T
• Reduce size of the MIP
  – Reduce number of shipment edges
  – Δ-condensed time-expanded networks
• More optimizations in paper
Optimizations: Reduce number of shipment edges
• Can remove redundant shipment edges
• Example:
  – Overnight shipment sent anytime before 4pm will arrive at destination at 8am
[Figure: timeline with candidate send times 7am, noon, 1pm, 2pm, 3pm, 4pm and arrival at 8am]
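One way to picture the pruning: among shipment edges with the same arrival time and cost, only the latest departure needs to be kept, because holdover edges let data wait at the sender until then. The sketch below assumes waiting is free and that cost does not depend on the send time; the function name, the hour encoding, and the $50 price are illustrative.

```python
def prune_shipments(shipments):
    """Keep, for each (arrival, cost) pair, only the latest departure.

    shipments is an iterable of (depart_hr, arrive_hr, cost) tuples.
    """
    latest = {}
    for depart, arrive, cost in shipments:
        key = (arrive, cost)
        if key not in latest or depart > latest[key]:
            latest[key] = depart
    return sorted((d, a, c) for (a, c), d in latest.items())


# Hours counted from midnight of day 0, so 8am the next day is hour 32.
candidates = [(h, 32, 50) for h in (7, 12, 13, 14, 15, 16)]
kept = prune_shipments(candidates)
print(kept)
```

In this example the six candidate send times collapse to the single 4pm departure, so the time-expanded network carries one shipment edge instead of six.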
Optimization: Δ-condensed Time-expanded Network
• Each batch of consecutive Δ time units condensed into one virtual time unit
• Solution has
  – Minimum cost
  – Deadline approximation depending on Δ
• More details in paper
• [Fleischer Skutella 07]
[Figure: Δ-condensed time-expanded network with Δ = 2]
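The effect of condensing on problem size can be sketched as follows. Rounding transit times up to whole virtual units is an assumption made for this illustration; the slides do not state the rounding rule, and the paper's exact construction may differ. The edge names and hour values are hypothetical.

```python
import math


def condense(horizon, transit_times, delta):
    """Map a horizon and per-edge transit times onto virtual time units of size delta."""
    virtual_horizon = math.ceil(horizon / delta)
    virtual_transit = {
        edge: math.ceil(tau / delta)   # round up to whole virtual units
        for edge, tau in transit_times.items()
    }
    return virtual_horizon, virtual_transit


vh, vt = condense(96, {"overnight": 16, "internet": 0, "ground": 121}, delta=2)
print(vh, vt)
```

With Δ = 2 the 96-hour horizon shrinks to 48 virtual time units, roughly halving the number of vertex copies. The rounding is also why the deadline is met only approximately, with an error that depends on Δ.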
Experimental Setup
• Trace-driven
  – Wrote scripts to communicate with FedEx web services: queried package rates and destination time
  – Internet BW from PlanetLab measurements
• GNU Linear Programming Kit (GLPK)
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW
• Direct Internet
  – Cost: $200
  – Time: 280 hrs
  – Cannot take advantage of heterogeneous bandwidth
• Direct Overnight
  – Cost: $1,500
  – Time: 38 hrs
  – Cannot fill disks to capacity
[Figure: sources 1-8, each holding 0.25 TB, send directly to sink t; edge width proportional to BW]
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW
[Figure: Pandora transfer plan among sources 1-8 and sink t; edge labels 1.92 TB, 0.14 TB, 0.06 TB, 0.08 TB]
• Pandora Deadline=96hrs
  – Cost: $183
  – Time: < 96 hrs
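The comparison can be restated as the selection rule Pandora optimizes: among plans that meet the deadline, pick the cheapest. A small sketch using only the numbers reported on the slides; Pandora's time is entered as its 96-hour upper bound, and the dictionary layout is illustrative.

```python
# (cost in dollars, time in hours) as reported on the slides.
plans = {
    "Direct Internet": (200, 280),
    "Direct Overnight": (1500, 38),
    "Pandora": (183, 96),   # slide reports "< 96 hrs"; 96 used as an upper bound
}
deadline_hrs = 96

# Keep only plans that finish by the deadline, then take the cheapest.
feasible = {name: cost for name, (cost, hours) in plans.items()
            if hours <= deadline_hrs}
cheapest = min(feasible, key=feasible.get)
print(cheapest, feasible[cheapest])
```

Direct Internet is cheaper than Direct Overnight but misses the 96-hour deadline, so the choice falls between Direct Overnight and Pandora, and Pandora's plan costs less than either baseline.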
Experimental Results: Optimizations
• Reducing shipment edges decreases computation time
• Using Δ-condensed time-expanded networks decreases computation time
  – Deadlines met in our experiments
[Figure panels: 2 sources; 1 source]
Conclusion
• First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges
• Produce optimal transfer plans that obey time deadlines and minimize dollar cost
  – Better than internet-only and shipping-only strategies
• Reasonable computation time by using optimizations