
Evaluating a Defragmented DHT Filesystem

Jeff Pang

Phil Gibbons, Michael Kaminsky, Haifeng Yu, Srinivasan Seshan

Intel Research Pittsburgh, CMU

Problem Summary

TRADITIONAL DISTRIBUTED HASH TABLE (DHT)

• Each server responsible for pseudo-random range of ID space
• Objects are given pseudo-random IDs (sketched below)

[Figure: servers own ID ranges 150-210, 211-400, 401-513, and 800-999; objects are assigned pseudo-random IDs 324, 987, and 160, scattered across the ranges]
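A minimal sketch of the traditional placement in Python (the server names, file paths, and toy 0-999 ID space are illustrative, not from the talk): each object's ID comes from a hash, so files that belong together scatter across unrelated servers.

```python
import hashlib

ID_SPACE = 1000  # toy ID space matching the figure's 0-999 ranges

def object_id(name):
    """Pseudo-random object ID: hash the name into the ID space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % ID_SPACE

def responsible_server(ranges, oid):
    """Return the server whose range contains the object ID."""
    for server, (lo, hi) in ranges.items():
        if lo <= oid <= hi:
            return server
    return None  # the toy ranges do not cover the whole ID space

# Illustrative ranges matching the figure above
ranges = {"s1": (150, 210), "s2": (211, 400), "s3": (401, 513), "s4": (800, 999)}
for name in ["/u/alice/a.txt", "/u/alice/b.txt", "/u/alice/c.txt"]:
    oid = object_id(name)
    print(name, oid, responsible_server(ranges, oid))  # likely different servers
```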

Problem Summary

DEFRAGMENTED DHT

• Each server responsible for dynamically balanced range of ID space
• Objects are given contiguous IDs (sketched below)

[Figure: servers own ID ranges 150-210, 211-400, 401-513, and 800-999; one user's objects get contiguous IDs 320, 321, and 322, all falling within a single server's range]
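For contrast, a minimal sketch of defragmented placement, assuming a per-user counter hands out consecutive IDs (the counter and boundary layout are illustrative; the actual system assigns contiguous IDs and balances ranges via Mercury):

```python
import bisect

class DefragDHT:
    def __init__(self, boundaries, servers):
        self.boundaries = boundaries  # boundaries[i] = first ID owned by servers[i]
        self.servers = servers
        self.next_id = {}             # per-user counter for contiguous IDs

    def assign_id(self, user, base):
        """Hand out consecutive IDs so one user's objects sit together."""
        nid = self.next_id.get(user, base)
        self.next_id[user] = nid + 1
        return nid

    def lookup(self, oid):
        """One range comparison finds the owner of any ID."""
        i = bisect.bisect_right(self.boundaries, oid) - 1
        return self.servers[i]

dht = DefragDHT([150, 211, 401, 800], ["s1", "s2", "s3", "s4"])
ids = [dht.assign_id("alice", base=320) for _ in range(3)]  # -> 320, 321, 322
print([dht.lookup(i) for i in ids])                         # all map to "s2"
```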

Motivation

• Better availability
  • You depend on fewer servers when accessing your files
• Better end-to-end performance
  • You don’t have to perform as many DHT lookups when accessing your files

Availability Setup

• Evaluated via simulation (see the sketch below)
  • ~250 nodes with 1.5Mbps each
• Faultload: PlanetLab failure trace (2003)
  • included one 40-node failure event
• Workload: Harvard NFS trace (2003)
  • primarily home directories used by researchers
• Compare:
  • Traditional DHT: data placed using consistent hashing
  • Defragmented DHT: data placed contiguously and load balanced dynamically (via Mercury)
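A minimal sketch of the failure accounting this setup implies, assuming simple placeholder formats for the trace and placement (a task fails if any node holding data it touches is down at access time):

```python
def task_failed(task_accesses, placement, down_intervals):
    """task_accesses: [(time, object_id), ...] for one task.
    placement: object_id -> node holding that object.
    down_intervals: node -> list of (start, end) downtime windows."""
    for t, oid in task_accesses:
        node = placement[oid]
        if any(lo <= t <= hi for lo, hi in down_intervals.get(node, [])):
            return True  # the task touched an unavailable node
    return False

def failure_rate(all_tasks, placement, down_intervals):
    """Fraction of tasks that fail: the metric reported in the results."""
    failed = sum(task_failed(t, placement, down_intervals) for t in all_tasks)
    return failed / len(all_tasks)
```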

Availability Setup

• Metric: failure rate of user “tasks”
  • Task(i,m) = sequence of accesses with an interarrival threshold of i and max time of m (see the sketch below)
  • Task(1sec,5min) = sequence of accesses that are spaced no more than 1 sec apart and last no more than 5 minutes
• Idea: capture notion of “useful unit of work”
  • Not clear what values are right
  • Therefore we evaluated many variations

[Figure: timeline of a Task(1sec,5min) — accesses separated by <1 sec gaps grouped into one task spanning at most 5 min]
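A minimal sketch of Task(i,m) segmentation as defined above, assuming the trace reduces to a sorted list of access timestamps in seconds:

```python
def tasks(access_times, i, m):
    """Group accesses into tasks: split when the gap exceeds the
    interarrival threshold i, or the task would exceed max duration m."""
    groups, current = [], []
    for t in access_times:
        if current and (t - current[-1] > i or t - current[0] > m):
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return groups

# Task(1sec, 5min): accesses <= 1 sec apart, task lasts <= 300 sec
print(tasks([0.0, 0.5, 1.2, 10.0, 10.4], i=1.0, m=300.0))
# -> [[0.0, 0.5, 1.2], [10.0, 10.4]]
# Task(1sec, infinity), used later for the performance metric: m=float("inf")
```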

Availability Results

• Failure rate over 5 trials
  • Lower is better
  • Note log scale
  • Missing bars have 0 failures
• Explanation
  • User tasks access 10-20x fewer nodes in the defragmented design

Performance Setup

• Deploy real implementation
  • 200-1000 virtual nodes with 1.5Mbps (Emulab)
  • Measured global e2e latencies (MIT King)
  • Workload: Harvard NFS trace
• Compare:
  • Traditional vs. Defragmented
• Implementation
  • Uses Symphony/Mercury DHTs, respectively
  • Both use TCP for data transport
  • Both employ a Lookup Cache: remembers recently contacted nodes and their DHT ranges (sketched below)
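A minimal sketch of such a lookup cache (the interface and FIFO eviction are assumptions, not the implementation's): a hit answers the lookup locally; a miss, or a stale entry, falls back to a normal DHT lookup.

```python
class LookupCache:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = {}  # node -> (lo, hi) range it owned when last contacted

    def record(self, node, lo, hi):
        """Remember a recently contacted node and its DHT range."""
        if node not in self.entries and len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry
        self.entries[node] = (lo, hi)

    def find(self, oid):
        """Return a cached node whose range covers oid, else None
        (None means a full DHT lookup is required)."""
        for node, (lo, hi) in self.entries.items():
            if lo <= oid <= hi:
                return node
        return None
```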

Performance Setup

• Metric: task(1sec,infinity) speedup
  • Task t takes 200 msec in Traditional
  • Task t takes 100 msec in Defragmented
  • speedup(t) = 200/100 = 2
• Idea: capture speedup for each unit of work, independent of user think time
• Note: 1 second interarrival threshold is conservative => tasks are longer
  • Defragmented does better with shorter tasks (next slide)

Performance Setup

• Accesses within a task may or may not be inter-dependent
  • Task = (A,B,…)
  • App may read A, then depending on contents of A, read B
  • App may read A and B regardless of contents
• Replay trace to capture both extremes (see the sketch below)
  • Sequential - each access must complete before starting the next (best for Defragmented)
  • Parallel - all accesses in a task can be submitted in parallel (best for Traditional) [caveat: limited to 15 outstanding]
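A minimal sketch of the two replay extremes, with fetch() standing in for one DHT access (names are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def replay_sequential(task, fetch):
    """Dependent extreme: each access completes before the next starts."""
    return [fetch(obj) for obj in task]

def replay_parallel(task, fetch, max_outstanding=15):
    """Independent extreme: all accesses submitted concurrently,
    capped at 15 outstanding requests as in the evaluation."""
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        return list(pool.map(fetch, task))
```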

Performance Results

Performance Results

• Other factors:
  • TCP slow start
  • Most tasks are small

Overhead

• Defragmented design is not free
  • We want to maintain load balance
  • Dynamic load balance => data migration (see the sketch below)
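A minimal sketch of why load balance implies migration, assuming load is measured in objects and IDs are dense (the threshold and boundary-shift rule are illustrative; Mercury's actual protocol differs):

```python
def rebalance(load, boundaries, ratio=2.0):
    """If a node holds `ratio` times its successor's load, shift the range
    boundary between them; every object in the shifted span must migrate."""
    migrated = 0
    for i in range(len(load) - 1):
        if load[i] > ratio * load[i + 1]:
            moved = (load[i] - load[i + 1]) // 2   # objects to hand off
            load[i] -= moved
            load[i + 1] += moved
            boundaries[i + 1] -= moved             # successor's range grows
            migrated += moved                      # bytes sent over the network
    return migrated

# e.g. load = [400, 100]: 150 objects migrate to even out the pair
```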

Conclusions

• Defragmented DHT Filesystem benefits:
  • Reduces task failures by an order of magnitude
  • Speeds up tasks by 50-100%
  • Overhead might be reasonable: 1 byte written = 1.5 bytes transferred
• Key assumptions:
  • Most tasks are small to medium sized (file systems, web, etc. -- not streaming)
  • Wide-area e2e latencies are tolerable

Tommy Maddox Slides

Load Balance

Lookup Traffic

Availability Breakdown

Performance Breakdown

Performance Breakdown 2

• With parallel playback, Defragmented suffers on the small number of very long tasks, due to topology

Maximum Overhead

Other Workloads