Source: downloads.deusm.com/allanalytics/academy/0312-A2-3-Hadoop-Big-…

© Avalon Consulting, LLC 2014

Hadoop & Big Data: Crafting the Enterprise Strategy

Sriram Mohan, Senior Consultant, Avalon Consulting, LLC

Associate Professor of CSSE, Rose-Hulman Institute of Technology

Presenter Overview

Who we are
• Consultants providing expert technical integrations for enterprise-scale Internet, intranet, and extranet sites
• 50+ staff, mostly senior-level consultants
• Offices in Dallas and Washington, D.C.

Best known for our work in:
• Enterprise search
• Hadoop, big data
• Enterprise content management
• Websites & portals
• E-learning
• Unstructured, semi-structured content

Presenter Overview

Who we are
• Private engineering school in Terre Haute, Indiana
• 2,000 students, 12:1 student-to-faculty ratio
• 10 math, science, and engineering majors

Best known for:
• Best undergraduate engineering school in the country
  – 14 years in a row

Overview

• Hadoop ecosystem
• Enterprise strategy
• Lambda architecture

What Is Big Data?

• Volume – Large quantities (think gigabytes or terabytes of information daily)
• Velocity – Needs to be processed very quickly
• Variety – Data may be structured or unstructured, and may come from varied sources

Overview

• Hadoop ecosystem
• Enterprise strategy
• Lambda architecture

Apache Hadoop

• Distributed platform for data processing
• Scalable
• Runs on commodity hardware
• Data & analysis colocation (computation runs on the nodes that store the data)
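The bullets above describe Hadoop's processing model only in outline. As a rough illustration, the sketch below runs a word count in the style of a Hadoop Streaming job, where the mapper and reducer exchange tab-separated `key<TAB>value` lines and the reducer relies on its input arriving sorted by key. The function names and the local `run_job` driver are hypothetical; on a real cluster the framework itself splits the input, performs the sort-and-shuffle between the phases, and schedules mappers on the nodes that hold each block of data.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts for each word (input must be sorted by key)."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical local driver: map, shuffle (sort by key), then reduce."""
    shuffled = sorted(mapper(lines), key=lambda record: record.split("\t", 1)[0])
    counts: Dict[str, int] = {}
    for record in reducer(shuffled):
        word, total = record.split("\t")
        counts[word] = int(total)
    return counts


if __name__ == "__main__":
    print(run_job(["the quick fox", "The lazy dog", "the end"]))
```

Keeping the mapper and reducer as plain line-in, line-out functions mirrors how streaming jobs read stdin and write stdout, so the same logic can be tested locally before it is submitted to a cluster.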

Typical Hadoop Stack

Overview

• Hadoop ecosystem
• Enterprise strategy
• Lambda architecture

How Do You Introduce Hadoop?

Some Tips

• Start small
• Build a proof of concept (POC)
• Replicate an existing system in Hadoop (enterprise data warehouse, or EDW, offload)

What Would an EDW Offload Look Like?

• How do you bring your data into HDFS?
• How do you analyze the data in HDFS?
• How do you verify the results of the analysis?
• How do you expose the results of your analysis?

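How Do You Bring In Your Data?

• Flume (streaming log and event data into HDFS)
• Sqoop (bulk transfer between relational databases and HDFS)
• Most major databases and data warehouses offer a Hadoop connector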
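Sqoop's incremental import (`--incremental append` together with `--check-column` and `--last-value`) pulls only the rows added since the previous run, which keeps repeated offloads cheap. The sketch below imitates that idea in plain Python rather than calling Sqoop: an in-memory SQLite table stands in for the source warehouse, and CSV lines stand in for files written to HDFS. The `orders` table, its columns, and both function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_source() -> sqlite3.Connection:
    """Build a hypothetical in-memory 'warehouse' table to import from."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 10.0), (2, "globex", 25.5)],
    )
    return conn


def import_new_rows(conn: sqlite3.Connection, last_id: int) -> Tuple[List[str], int]:
    """Return rows with id > last_id as CSV lines, plus the new high-water mark.

    The caller persists the returned mark and passes it in on the next run,
    analogous to the role of Sqoop's --last-value.
    """
    rows = conn.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    lines = [",".join(str(value) for value in row) for row in rows]
    new_last_id = rows[-1][0] if rows else last_id
    return lines, new_last_id


if __name__ == "__main__":
    source = make_source()
    first_batch, mark = import_new_rows(source, 0)      # initial load: ids 1 and 2
    source.execute("INSERT INTO orders VALUES (3, 'initech', 7.25)")
    second_batch, mark = import_new_rows(source, mark)  # picks up only id 3
    print(first_batch, second_batch, mark)
```

Append mode with a monotonically increasing key only catches new rows; rows updated in place would need a timestamp-based check column instead.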

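How Do You Analyze the Data?

• Pig (dataflow scripts written in Pig Latin)
• Hive (SQL-like queries written in HiveQL)
• HBase (random, real-time read/write access to large tables)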

How Do You Verify?
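The slide poses the question without a recorded answer. One common approach, assumed here rather than taken from the presentation, is to run the same aggregate query on the warehouse and on Hadoop and reconcile the two result sets key by key. The sketch below flags keys whose totals disagree or that appear on only one side; the `reconcile` function, the tolerance value, and the sample regions are hypothetical.

```python
from typing import Dict, List


def reconcile(warehouse: Dict[str, float], hadoop: Dict[str, float],
              tolerance: float = 1e-6) -> List[str]:
    """Return the keys whose totals disagree or that appear on only one side.

    A small tolerance absorbs floating-point differences between engines.
    """
    mismatches = []
    for key in warehouse.keys() | hadoop.keys():
        expected, actual = warehouse.get(key), hadoop.get(key)
        if expected is None or actual is None or abs(expected - actual) > tolerance:
            mismatches.append(key)
    return sorted(mismatches)


if __name__ == "__main__":
    # Hypothetical per-region revenue totals from the same query on each system.
    edw_totals = {"east": 100.0, "west": 250.5, "north": 75.0}
    hadoop_totals = {"east": 100.0, "west": 250.0, "south": 12.0}
    print(reconcile(edw_totals, hadoop_totals))
```

An empty result means the offloaded job reproduces the warehouse's numbers for that query; any returned key points directly at the data to investigate.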

How Do You Expose Your Results?

• BI tools
• Export the data back to a data warehouse

Dealing With Semi-Structured Data

• Navigating the world of NoSQL with Hadoop
• Sample use cases
  – Batch processing
  – Search
• How does Hadoop fit in?
  – Use Solr/Elasticsearch
  – Use HBase
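For the search use case, Solr and Elasticsearch (both built on Apache Lucene) answer keyword queries from an inverted index that maps each term to the documents containing it. The toy sketch below shows only that core idea over semi-structured records whose fields vary from record to record; real analyzers, relevance scoring, and sharding are far more involved, and all names and sample records here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Crude analyzer: lowercase, then split on runs of non-alphanumerics."""
    return [term for term in re.split(r"[^a-z0-9]+", text.lower()) if term]


def build_index(records: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each term to the ids of records whose string fields contain it.

    Fields may differ between records; non-string fields are skipped here.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, record in records.items():
        for value in record.values():
            if isinstance(value, str):
                for term in tokenize(value):
                    index[term].add(record_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """AND semantics: ids of records containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


RECORDS = {
    "e1": {"level": "ERROR", "msg": "Disk full on node7"},
    "e2": {"level": "WARN", "msg": "Slow disk on node3", "latency_ms": 900},
    "e3": {"level": "ERROR", "msg": "Timeout contacting node7"},
}

if __name__ == "__main__":
    idx = build_index(RECORDS)
    print(search(idx, "error node7"))
```

Because the index is built term by term per record, it is naturally produced by a batch job over data in HDFS and then loaded into the search engine, which is one way Hadoop and Solr/Elasticsearch can divide the work.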

Natural Language Processing

• What is NLP?
• Sample use cases
  – Adding metadata to emails
  – Predictive models
  – Forecasts
• How does Hadoop fit in?
  – Mahout
  – Using R with Hadoop

Overview

• Hadoop ecosystem
• Enterprise strategy
• Lambda architecture

Resource

• Big Data: Principles and best practices of scalable realtime data systems – Nathan Marz and James Warren

Why Do We Need This?

• Computing arbitrary functions on an arbitrary dataset in real time is a daunting problem
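The book listed under Resource states this problem compactly: the most general query is a function over the entire dataset. The Lambda Architecture then splits that one expensive function into pieces that are each tractable. The equations below paraphrase the book's formulation; each "function" is a different computation, not the same one reused.

```latex
\begin{aligned}
\text{query} &= \text{function}(\text{all data}) \\[6pt]
\text{batch view} &= \text{function}(\text{all data}) \\
\text{realtime view} &= \text{function}(\text{realtime view},\ \text{new data}) \\
\text{query} &= \text{function}(\text{batch view},\ \text{realtime view})
\end{aligned}
```

Only the batch view touches all data, so it is slow to recompute; the realtime view covers just the data that arrived after the last batch run, which keeps it small enough to update with low latency.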

Batch Layer

Serving Layer

Speed Layer

Handling Queries

Lambda Architecture
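The batch, serving, and speed layers can be sketched end to end with a toy pageview counter, loosely following the book listed under Resource. Everything here is a hypothetical, single-process illustration: a Python list stands in for the immutable, append-only master dataset that would normally live in HDFS, and plain `Counter` objects stand in for the serving layer's indexed batch views and the speed layer's realtime views.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event shape: (timestamp, url). The master dataset is append-only.
Event = Tuple[int, str]


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute pageviews per URL from scratch for events before cutoff."""
    return Counter(url for ts, url in master if ts < cutoff)


def update_realtime_view(view: Counter, event: Event) -> None:
    """Speed layer: fold one new event into the realtime view incrementally."""
    _, url = event
    view[url] += 1


def query(batch: Counter, realtime: Counter, url: str) -> int:
    """Query time: merge the precomputed batch view with the realtime view."""
    return batch[url] + realtime[url]


def demo() -> Tuple[int, int]:
    master: List[Event] = [(1, "/home"), (2, "/about"), (3, "/home")]
    batch = batch_view(master, cutoff=4)  # the last batch run covered ts < 4
    realtime: Counter = Counter()
    for event in [(4, "/home"), (5, "/pricing")]:
        master.append(event)  # new data still lands in the master dataset
        update_realtime_view(realtime, event)
    before = query(batch, realtime, "/home")  # 2 from batch + 1 from realtime
    # Once the next batch run absorbs ts < 6, the realtime view is discarded.
    batch, realtime = batch_view(master, cutoff=6), Counter()
    after = query(batch, realtime, "/home")  # 3 from batch alone
    return before, after


if __name__ == "__main__":
    print(demo())
```

The realtime view only has to be correct for data the batch layer has not yet absorbed, so it can be thrown away after each batch run; any error it introduces is temporary, while the batch view can always be rebuilt from the master dataset.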

Questions?