A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et...

Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage: A Study on the Facebook Warehouse Cluster” A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran


A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster

K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran

Outline

• Introduction: Erasure coding in data centers
  – Low storage, high fault-tolerance
  – High download & disk IO during recovery
• Measurements from Facebook warehouse cluster in production
• Proposed alternative: Piggybacked-RS codes
  – Same storage overhead & fault tolerance
  – 30% reduction in download & disk IO


Need for Redundant Storage

• Frequent unavailability in data centers
  – commodity components fail frequently
  – software glitches, maintenance shutdowns, power failures
• Redundancy gives more reliability and availability

Popular approach: Replication

• Multiple copies of data across machines
• E.g., GFS, HDFS store 3 replicas by default
• Typically stored across different racks

[Figure: a, b are data blocks, each stored twice: block 1 = a, block 2 = a, block 3 = b, block 4 = b]

Petabyte Scale data: Replication expensive

• Moderately sized data: storage is cheap ⇒ replication viable
• Multiple tens of PBs ⇒ aggregate storage no longer cheap ⇒ replication is expensive


Erasure Codes

                          Replication                  Reed-Solomon (RS) code
block 1                   a                            a      (data block)
block 2                   a                            b      (data block)
block 3                   b                            a+b    (parity block)
block 4                   b                            a+2b   (parity block)
Redundancy                2x                           2x
First-order comparison    tolerates any one failure    tolerates any two failures

In general:
• Replication: lower MTTDL, high storage requirement
• RS code: order of magnitude higher MTTDL with much less storage

Erasure Codes

Using RS codes instead of 3-replication on less-frequently accessed data has led to savings of multiple Petabytes in the Facebook Warehouse cluster.

Reed-Solomon (RS) Codes

• (#data, #parity) RS code:
  – tolerates failure of any #parity blocks
  – these (#data + #parity) blocks constitute a "stripe"
• Facebook warehouse cluster uses a (10, 4) RS code

Example: (2, 2) RS code
  #data = 2 (data blocks): a, b
  #parity = 2 (parity blocks): a+b, a+2b
  4 blocks in a stripe
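The (2, 2) example can be checked mechanically. The sketch below is an illustration only: it assumes arithmetic modulo the prime 257 (the slides do not name a field, and production systems typically work over GF(2^8)), and the names `P`, `COEFFS`, `encode`, and `decode` are hypothetical helpers introduced here. It shows that any two of the four blocks a, b, a+b, a+2b are enough to recover both data blocks.

```python
from itertools import combinations

P = 257  # assumed prime modulus, chosen only for illustration

# Block i stores c_a * a + c_b * b; rows are blocks 1..4 of the (2, 2) example.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a, b):
    """Return the four stored symbols [a, b, a+b, a+2b] modulo P."""
    return [(ca * a + cb * b) % P for ca, cb in COEFFS]


def decode(survivors):
    """Recover (a, b) from a dict {block_index: value} with two entries."""
    (i, x), (j, y) = sorted(survivors.items())
    (c1, d1), (c2, d2) = COEFFS[i], COEFFS[j]
    det = (c1 * d2 - c2 * d1) % P
    inv = pow(det, -1, P)  # modular inverse (Python 3.8+)
    a = ((x * d2 - y * d1) * inv) % P  # Cramer's rule for the 2x2 system
    b = ((c1 * y - c2 * x) * inv) % P
    return a, b


def tolerates_any_two_failures(a, b):
    """Check every choice of two surviving blocks out of four."""
    stripe = encode(a, b)
    return all(
        decode({i: stripe[i], j: stripe[j]}) == (a, b)
        for i, j in combinations(range(4), 2)
    )


print(encode(3, 5))                         # [3, 5, 8, 13]
print(tolerates_any_two_failures(37, 200))  # True
```

Every pair of rows in `COEFFS` has a nonzero determinant modulo 257, which is what makes any two blocks sufficient.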


Why RS codes?

• Maximum possible fault-tolerance for storage overhead
  – storage-capacity optimal
  – "maximum-distance-separable (MDS)" (in coding theory parlance)
• Flexibility in choice of parameters
  – Supports any #data and #parity

However, RS codes result in increased download and disk IO during data recovery.


Data Recovery: Increased download & disk IO

Recovering a lost block a:

                   Replication    Reed-Solomon code
Stored blocks      a, a, b, b     a, b, a+b, a+2b
Downloaded         a              b, a+b
Download & IO      1x             2x

In general:
Download & IO required = #data x (size of data to be recovered)
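The rule above is simple arithmetic, sketched here for concreteness. The function name `recovery_download` and its `scheme` parameter are hypothetical; the (10, 4) parameters and the 256 Mbyte block size are the values quoted elsewhere in this deck.

```python
def recovery_download(num_data, lost_size, scheme="rs"):
    """Amount read and transferred to rebuild `lost_size` units of data.

    Replication copies the lost data once; an RS code reads #data times
    the size of the data being recovered.
    """
    if scheme == "replication":
        return lost_size
    if scheme == "rs":
        return num_data * lost_size
    raise ValueError(f"unknown scheme: {scheme}")


# One lost 256 Mbyte block under the (10, 4) RS code used in the cluster.
print(recovery_download(10, 256))                        # 2560 (Mbytes)
print(recovery_download(10, 256, scheme="replication"))  # 256 (Mbytes)
```

The (2, 2) toy code gives the 2x shown in the comparison: `recovery_download(2, 1)` is 2.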

Data Recovery: Burden on TOR switches

Burdens the already oversubscribed Top-of-Rack and higher level switches

[Figure: nodes 1 to 4 hold a, b, a+b, and a+2b, each under its own TOR switch; the TOR switches connect through an AS/Router, so recovering a moves blocks across racks]


Brief System Description

• HDFS cluster with multiple thousands of nodes
• Multiple tens of PBs and growing
• Data immutable until deleted
• Uses (10, 4) RS code to reduce storage requirements
  – on less-frequently accessed data
• Multiple PBs of RS coded data

Reducing storage requirements is of high importance


Brief System Description

[Figure: one stripe of the (10, 4) RS code. Blocks 1 to 10 are data blocks and blocks 11 to 14 are parity blocks; each block is 256 Mbytes, and a 1-byte-wide column across the blocks is marked.]

Machine Unavailability Events

Median of ≈50 machine-unavailability events logged per day

• From HDFS Name-Node logs
• Logged when no heart-beat for > 15 min
• Blocks marked unavailable, periodic recovery process

[Figure: number of machine-unavailability events logged, by day]

Dominant scenario: Single block recovery

Missing blocks per stripe

# blocks missing in stripe    % of stripes with missing blocks
1                             98.08
2                             1.87
3                             0.036
4                             9 x 10^-6
≥ 5                           9 x 10^-9

#Blocks Recovered & Cross-rack Transfers

• Median of 180 TB transferred across racks per day for recovery operations
• Around 5 times that under 3-replication


Piggybacking: Toy Example

Step 1: Take a (2, 2) Reed-Solomon code

           1 byte    1 byte
block 1    a1        b1        (data block)
block 2    a2        b2        (data block)
block 3    a1+a2     b1+b2     (parity block)
block 4    a1+2a2    b1+2b2    (parity block)

Piggybacking: Toy Example

[Figure: recovering block 1 (a1, b1) in the (2, 2) RS code downloads a2 and b2 from block 2, and a1+a2 and b1+b2 from block 3]

(In (2,2) RS code: recovery download & IO = 4 bytes)

Piggybacking: Toy Example

Step 2: Add 'piggybacks' to parity nodes

block 1    a1        b1
block 2    a2        b2
block 3    a1+a2     b1+b2
block 4    a1+2a2    b1+2b2+a1

No additional storage!


Fault-Tolerance (toy example)

Same fault tolerance as RS code: can tolerate failure of any 2 nodes

block 1    a1        b1
block 2    a2        b2
block 3    a1+a2     b1+b2
block 4    a1+2a2    b1+2b2+a1

Decoding from any two surviving blocks:
• recover a1, a2 from the first bytes, as in the RS code
• subtract the piggyback a1 from the second byte of block 4 (if block 4 survived)
• recover b1, b2 from the second bytes, as in the RS code
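The three decoding steps can be exercised over every pair of surviving blocks. This is a sketch under stated assumptions, not the authors' implementation: arithmetic is modulo the prime 257 (the slides do not name a field), and `rs_encode`, `rs_decode`, `pb_encode`, and `pb_decode` are hypothetical helper names.

```python
from itertools import combinations

P = 257  # assumed prime modulus, for illustration only
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]  # x1, x2, x1+x2, x1+2*x2


def rs_encode(x1, x2):
    """One byte column of the (2, 2) RS code."""
    return [(c1 * x1 + c2 * x2) % P for c1, c2 in COEFFS]


def rs_decode(i, u, j, v):
    """Solve for (x1, x2) given symbol u from block i and v from block j."""
    (c1, d1), (c2, d2) = COEFFS[i], COEFFS[j]
    inv = pow((c1 * d2 - c2 * d1) % P, -1, P)
    return ((u * d2 - v * d1) * inv) % P, ((c1 * v - c2 * u) * inv) % P


def pb_encode(a1, a2, b1, b2):
    """Piggybacked stripe: block 4's second byte becomes b1+2*b2+a1."""
    first, second = rs_encode(a1, a2), rs_encode(b1, b2)
    second[3] = (second[3] + a1) % P
    return list(zip(first, second))


def pb_decode(i, blk_i, j, blk_j):
    """Decode all four data symbols from surviving blocks i and j (0-based)."""
    # Step 1: the first byte column is a plain RS code.
    a1, a2 = rs_decode(i, blk_i[0], j, blk_j[0])

    # Step 2: remove the piggyback a1 if block 4 (index 3) is a survivor.
    def strip(idx, byte):
        return (byte - a1) % P if idx == 3 else byte

    # Step 3: the second byte column is now a plain RS code as well.
    b1, b2 = rs_decode(i, strip(i, blk_i[1]), j, strip(j, blk_j[1]))
    return a1, a2, b1, b2


data = (10, 20, 30, 40)
stripe = pb_encode(*data)
ok = all(
    pb_decode(i, stripe[i], j, stripe[j]) == data
    for i, j in combinations(range(4), 2)
)
print(ok)  # True
```

The piggyback can always be removed because it depends only on the a symbols, which are decoded first from a column that was left unmodified.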


Recovery (toy example)

block 1    a1        b1
block 2    a2        b2
block 3    a1+a2     b1+b2
block 4    a1+2a2    b1+2b2+a1

Recovering block 1 (a1, b1):
• download b2 from block 2, b1+b2 from block 3, and b1+2b2+a1 from block 4
• subtract b2 from b1+b2 to obtain b1
• subtract b1+2b2 from b1+2b2+a1 to obtain a1

Download & IO only 3 bytes (instead of 4 bytes as in RS)
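A minimal sketch of this three-byte recovery follows. It assumes arithmetic modulo the prime 257, which the slides do not specify, and uses hypothetical helper names `pb_encode` and `recover_block1`.

```python
P = 257  # assumed prime modulus, for illustration only


def pb_encode(a1, a2, b1, b2):
    """Toy piggybacked (2, 2) stripe: one (first byte, second byte) pair per block."""
    return [
        (a1 % P, b1 % P),
        (a2 % P, b2 % P),
        ((a1 + a2) % P, (b1 + b2) % P),
        ((a1 + 2 * a2) % P, (b1 + 2 * b2 + a1) % P),  # piggyback a1 added
    ]


def recover_block1(stripe):
    """Rebuild block 1 from the second bytes of blocks 2, 3, and 4 only."""
    downloaded = [stripe[1][1], stripe[2][1], stripe[3][1]]
    b2, b_sum, piggybacked = downloaded
    b1 = (b_sum - b2) % P                  # (b1+b2) - b2
    a1 = (piggybacked - b1 - 2 * b2) % P   # (b1+2*b2+a1) - (b1+2*b2)
    return (a1, b1), len(downloaded)


stripe = pb_encode(10, 20, 30, 40)
print(recover_block1(stripe))  # ((10, 30), 3)
```

None of the first-byte symbols is read, which is where the saving over the 4-byte RS recovery comes from.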

General Piggybacking Recipe

To construct a Piggybacked-RS code:

• Step 1: Take RS code with identical parameters
• Step 2: Add carefully designed functions from one byte stripe on to another
  – retains same fault-tolerance and storage overhead
  – piggyback functions designed to reduce amount of download and IO for recovery

General theory and algorithms: K. V. Rashmi, Nihar Shah, K. Ramchandran, "A Piggybacking Design Framework for Read- and Download-efficient Distributed Storage Codes", in IEEE International Symposium on Information Theory (ISIT) 2013.

(10,4) Piggybacked-RS

alternative to (10,4) RS currently used in HDFS

(10,4) Piggybacked-RS code

Step 1: Take a (10, 4) Reed-Solomon code

            1 byte            1 byte
block 1     a1                b1
...         ...               ...
block 10    a10               b10
block 11    f1(a1,...,a10)    f1(b1,...,b10)
block 12    f2(a1,...,a10)    f2(b1,...,b10)
block 13    f3(a1,...,a10)    f3(b1,...,b10)
block 14    f4(a1,...,a10)    f4(b1,...,b10)

(10,4) Piggybacked-RS code

Step 2: Add 'piggybacks'

            1 byte            1 byte
block 1     a1                b1
...         ...               ...
block 10    a10               b10
block 11    f1(a1,...,a10)    f1(b1,...,b10)
block 12    f2(a1,...,a10)    f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)    f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)    f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)


(10,4) Piggybacked-RS code

Tolerates any 4 block failures:

• recover a1,...,a10 like in RS
• subtract piggybacks (functions of a1,...,a10)
• recover b1,...,b10 like in RS

(10,4) Piggybacked-RS code

Efficient data-recovery

block 1     a1                b1
block 2     a2                b2
block 3     a3                b3
...         ...               ...
block 10    a10               b10
block 11    f1(a1,...,a10)    f1(b1,...,b10)
block 12    f2(a1,...,a10)    f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)    f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)    f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)

Page 51: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
block 2     a2                 b2
block 3     a3                 b3
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)

Efficient data-recovery

Page 52: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
block 2     a2                 b2
block 3     a3                 b3
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)

Efficient data-recovery
  –  recover b1,...,b10 like in RS

Page 54: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
block 2     a2                 b2
block 3     a3                 b3
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)

Efficient data-recovery
  –  recover b1,...,b10 like in RS
  –  subtract f2(b1,...,b10)

Page 56: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
block 2     a2                 b2
block 3     a3                 b3
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)

Efficient data-recovery
  –  recover b1,...,b10 like in RS
  –  subtract f2(b1,...,b10)
  –  remove effect of a2 and a3 to get a1
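
As a rough illustration of the repair of block 1 walked through on this slide, the sketch below follows the same three steps and counts every symbol it reads. Everything not on the slide is an assumption made for the example: the prime field GF(257), the Cauchy-style coefficients 1/(i+j) standing in for f1..f4, and the names `coeff`, `f`, `encode`, `repair_block_1` and `fetch`. The piggyback on block 12 uses f4, as on this slide. It needs Python 3.8+ for `pow(x, -1, P)`.

```python
import random

P = 257  # assumption: small prime field instead of the GF(2^8) used in practice
K = 10   # data blocks per stripe

def coeff(j, i):
    """Hypothetical coefficient of data symbol i (1..10) in parity function f_j."""
    return pow(i + j, -1, P)

def f(j, xs):
    """Parity function f_j applied to a length-10 vector."""
    return sum(coeff(j, i + 1) * x for i, x in enumerate(xs)) % P

def encode(a, b):
    """Blocks 1..12 only, as drawn on the slide: {block: (first symbol, second symbol)}."""
    blocks = {i + 1: (a[i], b[i]) for i in range(K)}
    blocks[11] = (f(1, a), f(1, b))
    blocks[12] = (f(2, a), (f(2, b) + f(4, a[:3] + [0] * 7)) % P)
    return blocks

def repair_block_1(blocks):
    """Rebuild (a1, b1) without reading block 1; also return the number of symbols read."""
    downloads = 0

    def fetch(n, s):
        nonlocal downloads
        downloads += 1
        return blocks[n][s]

    # Recover b1 like in RS: read b2..b10 and f1(b), 10 symbols in total.
    b_rest = [fetch(n, 1) for n in range(2, 11)]
    f1_b = fetch(11, 1)
    known = sum(coeff(1, i) * x for i, x in zip(range(2, 11), b_rest))
    b1 = (f1_b - known) * pow(coeff(1, 1), -1, P) % P
    b = [b1] + b_rest

    # Subtract f2(b) from the piggybacked symbol of block 12: 1 more symbol.
    pig = (fetch(12, 1) - f(2, b)) % P

    # Remove the effect of a2 and a3 to get a1: 2 more symbols.
    a2, a3 = fetch(2, 0), fetch(3, 0)
    a1 = (pig - coeff(4, 2) * a2 - coeff(4, 3) * a3) * pow(coeff(4, 1), -1, P) % P
    return (a1, b1), downloads

random.seed(0)
a = [random.randrange(P) for _ in range(K)]
b = [random.randrange(P) for _ in range(K)]
blocks = encode(a, b)
del blocks[1]  # block 1 has failed, so the repair cannot read it
repaired, downloads = repair_block_1(blocks)
assert repaired == (a[0], b[0])
```

Note that f2(b) is never downloaded; it is recomputed locally from the recovered b1,...,b10, which is why only one symbol of block 12 is needed.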

Page 57: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
block 2     a2                 b2
block 3     a3                 b3
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f1(a1,a2,a3,0,...,0)

  –  recover b1,...,b10 like in RS
  –  subtract f2(b1,...,b10)
  –  remove effect of a2 and a3 to get a1

Download & IO: 20 in RS, 13 in Piggybacked-RS
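
The two totals on this slide can be reproduced by counting symbols, with each block contributing two symbols (its a side and its b side). The short sketch below does that count for the repair of block 1; the function and constant names are made up for the example, and the breakdown into three terms is a reading of the steps listed above rather than something the slide states explicitly.

```python
K = 10          # data blocks per stripe
SUBSTRIPES = 2  # symbols per block in the two-substripe picture (a and b)
GROUP = 3       # data blocks sharing one piggyback (a1, a2, a3)

def rs_download():
    """Plain RS repair reads K whole blocks, i.e. both symbols of each."""
    return K * SUBSTRIPES

def piggybacked_download():
    """Symbols read to repair block 1 with the piggybacked code."""
    b_side = K               # b2..b10 and f1(b): recover b1 like in RS
    piggybacked_parity = 1   # second symbol of block 12
    group_mates = GROUP - 1  # a2 and a3
    return b_side + piggybacked_parity + group_mates

rs, pb = rs_download(), piggybacked_download()
print(rs, pb, f"{(rs - pb) / rs:.0%}")  # prints: 20 13 35%
```

The 35% printed here applies to this particular block only; the 30% figure quoted elsewhere in the talk is an aggregate that this sketch does not attempt to reproduce.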

Page 58: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)     f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)     f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)

Efficient data-recovery
Repair of blocks 1,2,3

Page 59: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)     f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)     f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)

Efficient data-recovery
Repair of blocks 4,5,6

Page 60: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)     f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)     f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)

Efficient data-recovery
Repair of blocks 7,8,9
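
The repair slides for blocks 1,2,3, blocks 4,5,6 and blocks 7,8,9 differ only in which parity block carries the relevant piggyback. A minimal sketch of that lookup is given below. It assumes that the procedure shown earlier for block 1 carries over unchanged to the other two groups, which the slides suggest but do not spell out; the names `PIGGYBACK_PARITY` and `repair_plan` are hypothetical, and block 10 is deliberately left out because its repair is not described in the text.

```python
# Parity block whose second symbol carries the piggyback for each group of data blocks,
# read off the diagram: block 12 covers a1..a3, block 13 a4..a6, block 14 a7..a9.
PIGGYBACK_PARITY = {12: (1, 2, 3), 13: (4, 5, 6), 14: (7, 8, 9)}

def repair_plan(failed):
    """Symbols to read to repair data block 1..9, as a list of (block, 'a' or 'b')."""
    for parity, group in PIGGYBACK_PARITY.items():
        if failed in group:
            break
    else:
        raise ValueError("only data blocks 1..9 are covered by this sketch")
    plan = [(n, "b") for n in range(1, 11) if n != failed]  # the 9 surviving b symbols
    plan.append((11, "b"))                                  # f1(b1,...,b10)
    plan.append((parity, "b"))                              # piggybacked parity symbol
    plan += [(n, "a") for n in group if n != failed]        # the 2 group mates
    return plan

for block in range(1, 10):
    assert len(repair_plan(block)) == 13
```

For example, `repair_plan(5)` reads the b symbols of every data block except block 5, the b symbols of blocks 11 and 13, and the a symbols of blocks 4 and 6.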

Page 61: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

(10,4) Piggybacked-RS code

block 1     a1                 b1
...         ...                ...
block 10    a10                b10
block 11    f1(a1,...,a10)     f1(b1,...,b10)
block 12    f2(a1,...,a10)     f2(b1,...,b10) + f4(a1,a2,a3,0,...,0)
block 13    f3(a1,...,a10)     f3(b1,...,b10) + f4(0,...,0,a4,a5,a6,0,...,0)
block 14    f4(a1,...,a10)     f4(b1,...,b10) + f4(0,...,0,a7,a8,a9,0)

Efficient data-recovery
Repair of block 10

Page 62: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Expected Performance
•  Storage efficiency and reliability
  –  no additional storage vs RS
  –  same fault-tolerance vs RS

Page 63: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Expected Performance
•  Storage efficiency and reliability
  –  no additional storage vs RS
  –  same fault-tolerance vs RS
•  Reduced recovery download & disk IO
  –  30% less for single block recoveries in stripe
  –  potential reduction >50TB cross-rack traffic per day

Page 64: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Expected Performance
•  Storage efficiency and reliability
  –  no additional storage vs RS
  –  same fault-tolerance vs RS
•  Reduced recovery download & disk IO
  –  30% less for single block recoveries in stripe
  –  potential reduction >50TB cross-rack traffic per day
•  Recovery time: expect faster recovery
  –  need to connect to more nodes
  –  system limited by disk and network bandwidth
  –  corroborated by preliminary experiments
  –  hence, expect higher MTTDL

Page 65: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Related Work: Measurements
•  Existing Studies
  –  Availability studies: Schroeder & Gibson 2007, Jiang et al. 2008, Ford et al. 2010 etc.
  –  Comparisons between replication and erasure codes: Rodrigues & Liskov 2005, Weatherspoon & Kubiatowicz 2002 etc.
•  Our focus
  –  Increased network traffic due to increased downloads during recovery of erasure-coded data
  –  Measurements from Facebook warehouse cluster in production

Page 66: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Related Work: Codes for Efficient Data Recovery
•  Huang et al. (Windows Azure) 2012, Sathiamoorthy et al. (Xorbas) 2013
  –  add additional parities: need extra storage
•  Hu et al. (NCFS) 2011
  –  network file system using ‘repair-by-transfer’ codes (Shah et al.): need extra storage
•  Khan et al. (Rotated-RS) 2012
  –  #parity ≤ 3 (also, #data ≤ 36)
•  Xiang et al., Wang et al. (Optimized RDP & EVENODD) 2010
  –  #parity ≤ 2
•  Our solution: Piggybacked-RS
  –  no additional storage: storage-capacity optimal
  –  any #data & #parity
  –  as good as or better than Rotated-RS, optimized RDP & EVENODD

Page 67: A Solution to the Network Challenges of Data Recovery in Erasure … · 2019-12-18 · Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage:

Summary and Future Work
•  Erasure codes require higher download & IO for recovery
•  Measurements from Facebook warehouse cluster in production
•  Piggybacked-RS: alternative to RS
  –  no additional storage required; same fault-tolerance as RS
  –  30% reduction in download & disk IO for recovery
•  Future Work
  –  implementation in HDFS (in progress at UC Berkeley)
  –  empirical evaluation