Running Hadoop as a Service on the Altiscale Platform


Experiences in Running Hadoop as a Service • chaiken@altiscale.com • #HadoopSherpa

DAVID CHAIKEN • 21 NOVEMBER 2014

Talk Outline

Altiscale Company Introduction and Perspective

Altiscale Architecture

Use Cases: Performance, Job Analysis, Scheduling

Infinite Hadoop

Challenges to the Hadoop Community


Corporate Background

Hadoop-as-a-Service (HaaS) innovator

Company founded in 2012 (Palo Alto & Chennai)

Founding team from Yahoo
• Raymie Stata, CEO, Former CTO
• David Chaiken, CTO, Former Chief Architect
• Charles Wimmer, Head of Operations, Former SRE

Employees from Yahoo, Google, Netflix, LinkedIn, VMware, and others

Top-tier investors

Altiscale Chennai

Long-term colleagues from Yahoo and before

IIT Madras Research Park (back gate of IIT-M)

Architecture, Core Development, Test (Apache Bigtop)

Control Plane agile development, 2-week sprints

Next: Test++, Customer Support, Operations


Everybody Loves Hadoop But…

Significant capital expenditure on infrastructure
• Complex to manage and maintain

Time to get a cluster up and running is long

Capacity planning is difficult

The skill set is difficult to recruit, train, and retain

What about the cloud?

True Hadoop-as-a-Service

Altiscale is the industry's first purpose-built, petabyte-scale Hadoop cloud
• Altiscale operates Hadoop for you
• Infrastructure optimized to run Hadoop fast and reliably
• Pay for the Hadoop service, not the infrastructure

We Team With You To Help Deliver Insights

Potential insights from a flood of data generated by the connected world

Our Operations Team and Hadoop Cloud help realize those insights

Customer + Altiscale

Customers


How We Do It

Virtual Hadoop Cluster: YARN Service, HDFS Service, More Apps

Data Connect: File Transfer, Kafka, Flume

Pre-configured Apps: Hive, Pig, Oozie

• Your data is migrated to HDFS and a virtual Hadoop cluster in our cloud
• We optimize the job to complete fast and cost-effectively
• Our Hadoop Helpdesk gives you access to Hadoop experts
• Our Hadoop Operations Team maintains the cluster and plans the job
• Our team monitors and manages the job through to completion
• We provide an uptime SLA so our Hadoop cloud is always available
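Getting the data into the HDFS service is usually a bulk copy. As a hedged sketch only (the bucket, credentials, and target path below are made-up placeholders, not Altiscale's actual transfer tooling), a customer-side copy from S3 into the virtual cluster might look like:

  # Illustrative only: copy a dataset from S3 into the virtual cluster's HDFS.
  hadoop distcp \
    -Dfs.s3n.awsAccessKeyId="$AWS_ACCESS_KEY" \
    -Dfs.s3n.awsSecretAccessKey="$AWS_SECRET_KEY" \
    s3n://example-customer-bucket/clickstream/2014/ \
    hdfs:///user/customer/clickstream/2014/

  # Confirm the copy landed where expected.
  hdfs dfs -du -h /user/customer/clickstream/2014/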

Altiscale Architecture: Data and Control Planes


Altiscale Architecture: Customer Environments


Altiscale Architecture: O&O Hadoop Cluster


Altiscale Architecture: Host Components


Altiscale Architecture: Workbenches


Altiscale Architecture: Data Transfer


Altiscale Architecture: Portal and REST API


Altiscale Architecture: Control Plane Databases


Altiscale Architecture: Control Plane Services


Altiscale Architecture: Hadoop-Based Analysis

Hadoop as a Service Offering

1. Data is migrated to our HDFS service (HDFS Service, Data Connectors)

2. Terminal access to the Hadoop cluster and associated apps
   Core Apps: Apache Hive, Apache Pig, Apache Oozie, Apache HCatalog, Apache Flume, R, JDK/JRE, Python, HttpFS, FUSE, LZOP, Snappy, gzip
   Foundry Apps: Apache Mahout, Cascading, Revolution R, Kafka/Camus, Avro, Pentaho Kettle, Matlab, Spark, Sqoop, H2O

3. Portal provides job status, billing, and support information
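As a hedged illustration of the terminal access in step 2 (the workbench hostname and paths are placeholders, not actual Altiscale endpoints), a session might look like:

  # Hypothetical workbench host; actual hostnames are assigned per customer.
  ssh analyst@workbench.example.altiscale.com

  # The remaining commands run on the workbench, where the clients are pre-installed.
  hdfs dfs -ls /user/analyst        # browse data in the HDFS service
  hive -e 'SHOW TABLES;'            # run Hive from the pre-configured apps
  pig -version                      # Pig, Oozie, etc. are also on the PATH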

Challenges…


Performance Challenges…

Disks: Configuration, Controllers, Density, Cost

Network: Jumbo Packet MTU

Memory: echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

Network: When does locality matter?

Flash: When to use SSD?
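The transparent huge page setting above does not survive a reboot. A minimal sketch of making it persistent, assuming a RHEL/CentOS 6-era image where the knob lives under redhat_transparent_hugepage (the path differs on other kernels):

  # Disable transparent huge pages now and on every boot (illustrative only).
  THP=/sys/kernel/mm/redhat_transparent_hugepage/enabled
  echo never > "$THP"
  grep -q "$THP" /etc/rc.local || echo "echo never > $THP" >> /etc/rc.local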

Customer Case Study: Analyze Query

Customer provided a Hive query and data sets (100s of GB to ~5 TB)
Needed help optimizing the query
Didn't rewrite the query immediately
Wanted to characterize query performance and isolate bottlenecks first

Analyze and Tune Execution

Ran the original query on the datasets in our environment:
• Two M/R stages: Stage-1, Stage-2

Long-running reducers run out of memory
• set mapreduce.reduce.memory.mb=5120
• Reduces the number of slots and extends reduce time

Query fails to launch Stage-2 with an out-of-memory error
• set HADOOP_HEAPSIZE=1024 on the client machine

Query has 250,000 mappers in Stage-2, which causes failures
• set mapred.max.split.size=5368709120 to reduce the number of mappers
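Collected in one place, a hedged sketch of the tuned run (query.hql is a placeholder for the customer's query, and the values are the ones from this case study, not general recommendations):

  # Larger client-side heap so the Stage-2 job can be planned and launched.
  export HADOOP_HEAPSIZE=1024

  # Apply the reducer-memory and split-size tuning for this run only.
  hive \
    --hiveconf mapreduce.reduce.memory.mb=5120 \
    --hiveconf mapred.max.split.size=5368709120 \
    -f query.hql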

Analysis: Job Execution Characteristics

Next challenge: how to visualize job execution?
Existing Hadoop/Hive logs were not sufficient for this task
Wrote internal tools to:
• parse job history files
• plot mapper and reducer execution
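A hedged sketch of the raw material those internal tools consume (Hadoop 2.x assumed; the job ID and the history server's "done" directory layout are illustrative):

  # Locate the finished job's history file under the Job History Server's done directory.
  JHIST=$(hdfs dfs -ls -R /mr-history/done | grep 'job_1416500000000_0042.*\.jhist' | awk '{print $NF}')

  # Dump per-task details; the map and reduce attempt start/finish times in this
  # output are what gets parsed and plotted as execution timelines.
  mapred job -history all "$JHIST"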

Analysis: Map (Stage-1)

Analysis: Reduce (Stage-1): Long Tail (single reduce task)

Analysis: Map (Stage-2)

Analysis: Reduce (Stage-2)

Analysis Execution: Findings

Lone, long-running reducer in the first stage of the query

Analyzed the input data:
• Query split input data by userId
• Bucketized input data by userId
• One very large bucket: "invalid" userId
• Discussed the "invalid" userId with the customer

An error value is a common pattern!
• Need to differentiate between "don't know and don't care" and "don't know and do care"
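A hedged sketch of the kind of check that surfaces such a bucket (the events table and userId column stand in for the customer's actual schema):

  # Find heavily skewed userId values before running the main query (names are illustrative).
  hive -e "
    SELECT userId, COUNT(*) AS rows_per_user
    FROM events
    GROUP BY userId
    ORDER BY rows_per_user DESC
    LIMIT 20;
  "
  # A single 'invalid' userId dominating this list explains the lone long-running
  # reducer: every row with that key is shuffled to one reduce task.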

Interactive (DRAM-centric) Processing Systems

Loading data into DRAM makes processing fast!
Examples: Spark, Impala, 0xdata, …, [SAP HANA], …
Streaming systems (Storm, DataTorrent) may be similar
Need to increase the YARN container memory size

Hive + Interactive: Watch Out for Container Size

Caution: larger YARN container settings for interactive jobs may not be right for batch systems like Hive
Container size needs to combine vcores and memory:
• yarn.scheduler.maximum-allocation-vcores
• yarn.nodemanager.resource.cpu-vcores
• …
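A small, hedged sketch of inspecting the knobs both kinds of tenants share (assuming a host with the cluster's client configuration installed; hdfs getconf simply reads the local config files, so it works for YARN keys as well):

  # Print the container-sizing settings that batch and interactive workloads contend over.
  for key in \
      yarn.scheduler.maximum-allocation-mb \
      yarn.scheduler.maximum-allocation-vcores \
      yarn.nodemanager.resource.memory-mb \
      yarn.nodemanager.resource.cpu-vcores
  do
    printf '%-45s %s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done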

Hive + Interactive: Watch Out for Fragmentation

Attempting to schedule interactive systems and batch systems like Hive together may result in fragmentation
Interactive systems may require all-or-nothing scheduling
Batch jobs with small tasks may starve interactive jobs

Solutions for fragmentation:
• Reserve interactive nodes before starting batch jobs
• Reduce the interactive container size (if the algorithm permits)
• Node labels (YARN-726) and gang scheduling (YARN-624)
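Node labels are one way to reserve interactive nodes up front. Treat the commands below as a hedged sketch only: node labels shipped in later Hadoop 2.x releases than the one discussed here, the rmadmin syntax has changed across versions, and the hostname is a placeholder:

  # Define an 'interactive' label and pin a node to it (exact syntax varies by release).
  yarn rmadmin -addToClusterNodeLabels interactive
  yarn rmadmin -replaceLabelsOnNode "node17.example.com=interactive"

  # Capacity-scheduler queues can then be restricted to that label, so batch jobs
  # never fragment the nodes reserved for the interactive system.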

Altiscale: Hadoop Storage and Compute

Altiscale's point of view on Hadoop as a Service:
• Sell HDFS in increments of 10 TB
• Sell compute in increments of 10K TaskHours/Month

We market "Infinite Hadoop" and provide services so that customers need not worry about cluster nodes.

But Apache Hadoop user interfaces provide a node-oriented view of clusters…

ResourceManager User Interface


NameNode User Interface


Feedback from Customers

Storage plan is normally easy to estimate

Compute plan is hard to estimate
• Customer pain point: achieving necessary computation sometimes requires more peak compute capacity than the number of nodes required for storage provides
• Opportunity: average compute often requires fewer nodes than storage does

Solution: Change Altiscale's Product!

Make "Infinite" computation available to customers

Multitenancy implementation phases, each of which includes a milestone with production deliverables:
0. Automation for burn/add/remove nodes
1. Deploy Linux containers using Docker (sketched below)
2. Decouple compute/storage + manual bursting
3. Automation: orchestrate add/remove nodes according to the allocation plan from the capacity team
4. Optimized: predictive allocation, economic incentives
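For phase 1, the general shape is to run the NodeManager (and DataNode) as containers on the physical hosts. Purely as a hedged sketch: the image name, mounts, and launch command below are hypothetical, not Altiscale's actual packaging:

  # Illustrative only: run a NodeManager in a Docker container on a cluster host.
  # 'example/hadoop-nodemanager:2.4' is a made-up image; the mounts carry the cluster
  # configuration plus the local disks used for shuffle spill and container logs.
  docker run -d --net=host \
    --name nodemanager \
    -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
    -v /data:/data \
    example/hadoop-nodemanager:2.4 \
    yarn nodemanager

  # With compute in a container, it can be added, removed, or resized per customer
  # without reimaging the host or disturbing a co-located DataNode.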


Physical Cluster per Customer


NM and DN in Docker Containers


Decouple Compute/Storage


What Customers Get

On-demand access to "Infinite" computation

Ability to handle unexpected needs without contacting Altiscale

"Access to a $10M cluster for just $1M"

Future…

Ability to package the Hadoop job environment using Docker (YARN-1964)

Challenges to the Hadoop Community

Hive + Hadoop debugging can get very complex
• Sifting through many logs and screens
• Automatic transmission versus manual transmission

Static partitioning induced by the Java Virtual Machine has benefits but also induces challenges. Where there are difficulties, there's opportunity:
• Better tooling, instrumentation, and integration of logs/metrics

YARN is still evolving into an operating system

Just starting to build real multitenancy into Hadoop

Hadoop as a Service: aggregate and share expertise