© Avalon Consulting, LLC 2014
Hadoop & Big Data: Crafting the Enterprise Strategy
Sriram Mohan, Senior Consultant, Avalon Consulting, LLC
Associate Professor of CSSE, Rose-Hulman Institute of Technology
Presenter Overview
Who we are
• Consultants providing expert technical integrations for enterprise-scale Internet, intranet, and extranet sites
• 50+ staff, mostly senior-level consultants
• Offices in Dallas and Washington, D.C.
Best known for our work in:
• Enterprise search
• Hadoop, big data
• Enterprise content management
• Websites & portals
• E-learning
• Unstructured, semi-structured content
Presenter Overview
Who we are
• Private engineering school in Terre Haute, Indiana
• 2,000 students, 12:1 student-to-faculty ratio
• 10 math, science, and engineering majors
Best known for:
• Best undergraduate engineering school in the country
• 14 years in a row
Overview
• Hadoop ecosystem • Enterprise strategy • Lambda architecture
What Is Big Data?
• Volume – Large quantities (think gigabytes or terabytes of information daily)
• Velocity – Needs to be processed very quickly
• Variety – Data might be structured or unstructured and come from varying sources
Overview
• Hadoop ecosystem • Enterprise Strategy • Lambda Architecture
Apache Hadoop
• Distributed platform for data processing
• Scalable • Runs on commodity hardware • Data & analysis colocation
Typical Hadoop Stack
Overview
• Hadoop Ecosystem • Enterprise strategy • Lambda Architecture
How Do You Introduce Hadoop?
Some Tips
• Start small • Build a POC • Replicate an existing system in Hadoop (EDW offload)
What Would an EDW Offload Look Like?
• How do you bring your data into HDFS? • How do you analyze the data in HDFS? • How do you verify the results of the analysis? • How do you expose the results of your analysis?
How Do You Bring Your Data?
• Flume • Sqoop • Every database/data warehouse has a Hadoop connector
How Do You Analyze the Data?
• Pig • Hive • HBase
How Do You Verify?
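The slide does not name a verification method. One common approach for an EDW offload, assumed here purely for illustration, is to reconcile the Hadoop results against the existing warehouse by comparing row counts and per-key totals. The function names, the `region` and `sales` fields, and the sample rows below are all hypothetical.

```python
def summarize(rows, key, measure):
    """Return the row count and per-key totals for one result set."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[measure]
    return len(rows), totals


def reconcile(warehouse_rows, hadoop_rows, key, measure, tolerance=1e-9):
    """List every difference between the warehouse and Hadoop results."""
    wh_count, wh_totals = summarize(warehouse_rows, key, measure)
    hd_count, hd_totals = summarize(hadoop_rows, key, measure)

    mismatches = []
    if wh_count != hd_count:
        mismatches.append(("row_count", wh_count, hd_count))

    # Compare totals for every key seen on either side.
    for k in sorted(set(wh_totals) | set(hd_totals)):
        a, b = wh_totals.get(k), hd_totals.get(k)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches.append((k, a, b))
    return mismatches


# Hypothetical sample data: the same report produced by both systems.
warehouse = [{"region": "east", "sales": 100.0}, {"region": "west", "sales": 250.0}]
hadoop = [{"region": "east", "sales": 100.0}, {"region": "west", "sales": 245.0}]

for name, expected, actual in reconcile(warehouse, hadoop, "region", "sales"):
    print(f"{name}: warehouse={expected} hadoop={actual}")
```

An empty result means the two systems agree on the checked measures. In practice the two row sets would come from queries against each system rather than from in-memory lists.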
How Do You Expose Your Results?
• BI tools • Export the data back to a data warehouse
Dealing With Semi-Structured Data
• Navigating the world of NoSQL with Hadoop • Sample use cases – Batch processing – Search
• How does Hadoop fit in? – Use Solr/Elasticsearch – Use HBase
Natural Language Processing
• What is NLP? • Sample use cases – Adding metadata to emails – Predictive models – Forecasts
• How does Hadoop fit in? – Mahout – Using R with Hadoop
Overview
• Hadoop Ecosystem • Enterprise Strategy • Lambda architecture
Resource
• Big Data: Principles and best practices of scalable realtime data systems – Nathan Marz and James Warren
Why Do We Need This?
• Computing arbitrary functions on an arbitrary dataset in real time is a daunting problem
Batch Layer
Batch Layer
Serving Layer
Speed Layer
Handling Queries
Lambda Architecture
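The three layers named on the preceding slides can be sketched in a few lines. This is a minimal in-memory illustration of the idea described by Marz and Warren, not the presenter's implementation: the batch layer recomputes a view from all data, the speed layer incrementally updates a view from recent data, and a query merges the two. The page-view counting use case, the function names, and the sample events are assumptions chosen for illustration.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over the full dataset."""
    return Counter(event["url"] for event in master_dataset)


def update_realtime_view(realtime_view, new_event):
    """Speed layer: fold one recent event into the existing realtime view."""
    realtime_view[new_event["url"]] += 1
    return realtime_view


def query(url, batch, realtime):
    """Query handling: merge the batch view and the realtime view."""
    return batch.get(url, 0) + realtime.get(url, 0)


# Hypothetical data: events already absorbed by the last batch run,
# and events that arrived after that run started.
master_dataset = [{"url": "/home"}, {"url": "/docs"}, {"url": "/home"}]
recent_events = [{"url": "/home"}, {"url": "/pricing"}]

batch = batch_view(master_dataset)
realtime = Counter()
for event in recent_events:
    realtime = update_realtime_view(realtime, event)

print(query("/home", batch, realtime))
```

In a Hadoop deployment the batch view would typically be produced by a batch job over data in HDFS and loaded into a serving-layer store, while the realtime view would be maintained by a stream processor. Once a batch run has absorbed the recent events, the corresponding realtime entries can be discarded.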
Questions?