MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce...

13
MRemu: An Emula-onbased Framework for Datacenter Network Experimenta-on using Realis-c MapReduce Traffic Marcelo Veiga Neves 1 , Cesar A. F. De Rose 1 , Kostas Katrinis 2 [email protected] 1 PUCRS, Porto Alegre, Brazil 2 IBM Research, Dublin, Ireland Oct, 2015

Transcript of MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce...

Page 1: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

MRemu:  An  Emula-on-­‐based  Framework  for  Datacenter  Network  Experimenta-on  using  Realis-c  

MapReduce  Traffic    

Marcelo  Veiga  Neves1,  Cesar  A.  F.  De  Rose1,    Kostas  Katrinis2  

[email protected]      

1  PUCRS,  Porto  Alegre,  Brazil  2  IBM  Research,  Dublin,  Ireland  

Oct,  2015  

Page 2: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Context  

•  Big  Data  &  MapReduce  analy-cs  frameworks  – Scale  out  to  hundred  or  even  thousands  of  commodity  servers  

–  Increased  network  traffic  volumes  and  mul-plicity  of  traffic  paWerns  

•  Data  center  networks  for  Big  Data  – Scale-­‐out  topologies  (e.g.,  fat-­‐tree,  leaf-­‐spine)  – Network  control  soZware  (e.g,  SDN  –  IPDPS’15)    

Page 3: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Problem  

•  The  need  for  a  real  hardware  infrastructure  –  is  oZen  not  a  valid  op-on    – even  when  datacenter  resources  are  available    •  access  is  not  con-nuous  •  not  prac-cal  to  reconfigure  them  in  order  to  evaluate  different  network  topologies  and  characteris-cs  (e.g.,  bandwidth  and  latency)  

•  Alterna-ves:  – Simula-on  &  Emula-on  

Page 4: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Problem  •  Most  research  on  data  center  networks  do  not  use  realis-c  Big  Data  traffic  –  synthe-c  traffic  paWerns  

•  Simplified  shuffle-­‐like  traffic  paWerns  –  e.g.,  all-­‐to-­‐all  –  not  consider  transfer  scheduling  decisions,  number  of  parallel  transfers,  etc.  

–  overlap  communica-on  with  computa-on  

•  How  the  reported  results  translate  to  performance  improvement  for  actual  analy-cs  run-mes?  

4  

Page 5: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Network  Traffic  in  real  Hadoop  Applica-ons  

•  a  

5  

Network  transfers  

Network  transfers  

Page 6: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Proposed  solu-on:  MRemu  

•  Emula-on-­‐based  framework  for  data  center  network  experimenta-on  

•  Highlights:  – Ability  to  run  a  complete  data  center  in  a  single  server  – Use  of  realis-c  network  traffic  

•  replay  or  extrapolate  from  execu-ons  of  real  applica-ons  in  produc-on  datacenters  

– Mimics  framework  internals  (e.g,  transfer  scheduling,  phases  overlaps,  etc.)  

– Unmodified  code  also  run  in  real  hardware  

Page 7: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

MRemu  Architecture  

7  

Job Trace

HadoopJob Tracing

SynthecticJob Generator

TopologyDescription

Mininet-HiFI

TopologyBuilder

ApplicationLauncher

NetworkMonitor

Data center emulator

TaskTracker

Job TraceParser

TrafficGenerator

Hadoop MapReduce emulator

Logger

JobTracker

*Mininet  can  be  replaced  with  real  hardware  (e.g.,  run  in  legacy  clusters)  

*  

Page 8: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Evalua-on  •  Mininet-­‐HiFi  has  already  been  validated  and  is  widely  used  to  reproduce  networking  research  experiments  

•  Accuracy  when  reproducing  MapReduce  workloads.    –  Comparison  with  traces  extracted  from  real  job  execu-ons  –  Two  opera-ons  modes:  replay  mode  and  hadoop  mode  

•  Execu-on  environment:  –  Shamrock  datacenter,  IBM  Research  –  HiBench  Benchmark  Suite:    Sort,  Nutch,  PageRank  and  Bayes  

8  

Handigol,  N.;  Heller,  B.;  Jeyakumar,  V.;  Lantz,  B.;  McKeown,  N.  “Reproducible  Network  Experiments  Using  Container-­‐Based  Emula-on”.  In:  Proceedings  of  the  8th  Interna-onal  Conference  on  Emerging  Networking  Experiments  and  Technologies,  2012,  pp.  253–264.  

Page 9: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Accuracy  Evalua-on  

9  

Job  Comple-on  Time  Accuracy   Individual  Flow  Comple-on  Time  Accuracy  

Page 10: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Nutch  applica-on  with  background  traffic  

Sort  applica-on  with  background  traffic  

Par--on  skew  problem  

Impact  of  the  network  topology  

Other  experiments  

Page 11: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

Conclusion  and  Future  Work  •  MRemu,  an  emula-on-­‐based  framework  that  enables  conduc-ng  datacenter  network  research  –  without  requiring  expensive  and  con-nuous  access  to  large-­‐scale  datacenter  hardware  resources    

•  Available  as  open  source:  –  hWps://github.com/mvneves/mremu  

•  Future  work:  –  Extend  it  to  other  frameworks  and  traffic  paWerns  –  Integrate  it  with  Mininet-­‐HiFI  cluster  edi-on  –  Support  to  migra-on  of  “virtual  machines”  

Page 12: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

References  •  NEVES,  M.  V.,:  Applica-on-­‐aware  networking  to  Accelerate  

MapReduce  Applica-ons  (Ph.D.  Disserta-on),  2015  

•  NEVES,  M.  V.;  KATRINIS,  M.  K.;  FRANKE,  H.;  DE  ROSE,  C.  A.  F.;  Pythia:  Faster  Big  Data  in  Mo-on  through  Predic-ve  SoZware-­‐Defined  Network  Op-miza-on  at  Run-me.  In:  IPDPS  2014,  Phoenix,  USA,  2014  

 •  NEVES,  M.  V.,  DE  ROSE,  C.  A.  F.,  KATRINIS,  K.  MRemu:  An  

Emula-on-­‐based  Framework  for  Datacenter  Network  Experimenta-on  using  Realis-c  MapReduce  Traffic,  MASCOTS  2015,  Atlanta,  USA,  2015.  

Page 13: MRemu: An Emulation-based Framework for Datacenter Network Experimentation using Realistic MapReduce Traffic

MRemu:  An  Emula-on-­‐based  Framework  for  Datacenter  Network  Experimenta-on  using  Realis-c  

MapReduce  Traffic    

Marcelo  Veiga  Neves1,  Cesar  A.  F.  De  Rose1,    Kostas  Katrinis2  

[email protected]      

1  PUCRS,  Porto  Alegre,  Brazil  2  IBM  Research,  Dublin,  Ireland  

Oct,  2015