Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

description

This is the presentation from the "Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS" webinar on May 28, 2014. Rohit Bakhshi, a senior product manager at Hortonworks, and Vinod Vavilapalli, a PMC member for Apache Hadoop, give an overview of YARN and HDFS and discuss the new features in HDP 2.1. Those new features include HDFS extended ACLs, HTTPS wire encryption, HDFS DataNode caching, ResourceManager high availability, the Application Timeline Server, and Capacity Scheduler preemption.

Transcript of Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Page 1: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Page 1 © Hortonworks Inc. 2014

Discover HDP 2.1 Apache Hadoop 2.4.0, YARN & HDFS

Hortonworks. We do Hadoop.

Page 2: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Speakers

Justin Sears

Hortonworks Product Marketing Manager

Rohit Bakhshi

Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform

Vinod Vavilapalli

Foundational Hadoop Architect, Hortonworks Engineer, PMC for Apache Hadoop & Leads YARN Development at Hortonworks

Page 3: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Agenda

•  Overview of YARN and HDFS

•  New YARN & HDFS Features in HDP 2.1

•  Q & A

Page 4: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

A Modern Data Architecture

[Diagram: layers of a modern data architecture]

•  Applications: Business Analytics, Custom Applications, Packaged Applications

•  Dev & Data Tools: Build & Test

•  Operations Tools: Provision, Manage & Monitor

•  Data System: Repositories (RDBMS, EDW, MPP) and Enterprise Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)

•  Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 5: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

HDP 2.1: Enterprise Hadoop

HDP 2.1 Hortonworks Data Platform

[Diagram: the HDP 2.1 stack]

•  Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)

•  Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)

•  Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N

•  Security: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox

•  Operations: Provision, Manage & Monitor (Ambari, ZooKeeper); Scheduling (Oozie)

Page 6: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

HDP 2.1: Data Management

HDP 2.1 Hortonworks Data Platform

[Diagram: the HDP 2.1 stack]

•  Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)

•  Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)

•  Security: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox

•  Operations: Provision, Manage & Monitor (Ambari, ZooKeeper); Scheduling (Oozie)

•  Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N

Page 7: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Agenda

Overview Features Q & A

Page 8: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Apache Hadoop YARN and HDFS

The Data Operating System for Hadoop 2.0

•  Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming

•  Efficient: Double processing in Hadoop on the same hardware while providing predictable performance & quality of service

•  Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads

Data Processing Engines Run Natively in Hadoop

•  Batch: MapReduce

•  Interactive: Tez

•  Streaming: Storm

•  In-Memory: Spark

•  Graph: Giraph

•  SAS: LASR, HPA

•  Online: HBase, Accumulo

•  Others

HDFS: Redundant, Reliable Storage

YARN: Cluster Resource Management

Page 9: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Agenda

Overview Features Q & A

Page 10: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

HDP 2.1 HDFS: What’s New

•  HDFS Extended ACLs (theme: Security): Provides granular access control to datasets in HDFS

•  HTTPS Wire Encryption (theme: Security): swebhdfs (HTTPS support for WebHDFS); HTTPS support for the Hadoop web UI

•  HDFS DataNode Caching (theme: Performance): Enhanced read performance via in-memory caching of files

Page 11: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

HDFS Coordinated DataNode Caching

•  In-memory cache for HDFS files: enhanced read performance

•  Identify files to be cached through centralized management controls

•  Manage caching through pools and directives
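
The pool and directive model in the bullets above can be illustrated with a toy in-memory sketch. It is not the HDFS implementation or its API: the class names (`CachePool`, `CacheDirective`, `CacheManager`), the per-pool byte limit, and the paths and sizes are hypothetical, chosen only to show how a centrally managed pool could bound which files are admitted to the cache.

```python
from dataclasses import dataclass, field


@dataclass
class CacheDirective:
    """A request to cache one path, charged against a pool (hypothetical model)."""
    path: str
    size_bytes: int


@dataclass
class CachePool:
    """An administrative grouping of directives with a byte limit (hypothetical model)."""
    name: str
    limit_bytes: int
    directives: list = field(default_factory=list)

    def used_bytes(self) -> int:
        return sum(d.size_bytes for d in self.directives)


class CacheManager:
    """Central place where pools are defined and directives are accepted or rejected."""

    def __init__(self):
        self.pools = {}

    def add_pool(self, name: str, limit_bytes: int) -> CachePool:
        pool = CachePool(name, limit_bytes)
        self.pools[name] = pool
        return pool

    def add_directive(self, pool_name: str, path: str, size_bytes: int) -> bool:
        """Accept the directive only if it fits within the pool's remaining limit."""
        pool = self.pools[pool_name]
        if pool.used_bytes() + size_bytes > pool.limit_bytes:
            return False
        pool.directives.append(CacheDirective(path, size_bytes))
        return True

    def is_cached(self, path: str) -> bool:
        return any(d.path == path for p in self.pools.values() for d in p.directives)


manager = CacheManager()
manager.add_pool("reports", limit_bytes=100)
accepted = manager.add_directive("reports", "/data/daily", 60)    # fits in the pool
rejected = manager.add_directive("reports", "/data/archive", 60)  # would exceed the limit
```

Keeping the limit on the pool rather than on each directive means an administrator sets one budget per group of users or workloads, and individual caching requests are checked against it.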

Page 12: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

HDP 2.1 YARN: What’s New

•  ResourceManager High Availability (theme: Reliability): No service disruption in YARN

•  Application Timeline Server (theme: Monitoring): Operational monitoring across all YARN applications

•  Capacity Scheduler Preemption (theme: Scheduling): Enforce SLAs across applications and organizations

Page 13: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

YARN Resource Manager (RM) HA

•  Automated failover: HDP detects and reacts to ResourceManager host & process failures

•  Active/Standby: Standby ResourceManager with access to shared state store

•  Fencing: Protection against split brain

•  Full stack resiliency: Entire HDP stack certified with ResourceManager HA; RM restart enables application recovery

•  Integrated into HDP stack: No external HA frameworks; no external storage needed

Page 14: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

YARN RM HA: Architecture

[Diagram: Client, Active RM, Standby RM, ZooKeeper Service Cluster, and three NodeManagers]

•  Active RM: Monitor and maintain active lock; store state

•  Standby RM: Monitor and try to take active lock
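
The lock handoff shown in the architecture slide can be sketched as a single-process simulation. This is an illustration under stated assumptions, not YARN or ZooKeeper code: `LockService` is a hypothetical in-memory stand-in for the ZooKeeper cluster, and `ResourceManager.tick`, `store_state`, and `fail` are invented names for one monitoring round, a state write, and a host or process failure.

```python
class LockService:
    """In-memory stand-in for the ZooKeeper cluster (hypothetical, not a ZooKeeper API)."""

    def __init__(self):
        self.holder = None   # name of the RM that holds the active lock
        self.state = {}      # shared state store written by the active RM

    def try_acquire(self, name: str) -> bool:
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name: str) -> None:
        # Models the lock disappearing when its holder fails.
        if self.holder == name:
            self.holder = None


class ResourceManager:
    def __init__(self, name: str, locks: LockService):
        self.name = name
        self.locks = locks
        self.alive = True

    def tick(self) -> str:
        """One monitoring round: try to take (or keep) the active lock."""
        if not self.alive:
            return "down"
        return "active" if self.locks.try_acquire(self.name) else "standby"

    def store_state(self, key: str, value: str) -> None:
        # Fencing in miniature: only the current lock holder may write state.
        if self.locks.holder != self.name:
            raise PermissionError(f"{self.name} is not active")
        self.locks.state[key] = value

    def fail(self) -> None:
        self.alive = False
        self.locks.release(self.name)


locks = LockService()
rm1, rm2 = ResourceManager("rm1", locks), ResourceManager("rm2", locks)

before = (rm1.tick(), rm2.tick())      # rm1 takes the lock first
rm1.store_state("app_1", "RUNNING")
rm1.fail()                             # active RM host or process failure
after = (rm1.tick(), rm2.tick())       # rm2 takes over on its next round
recovered = locks.state["app_1"]       # state written before the failure is still readable
```

Because the state lives with the lock service rather than inside either ResourceManager, the new active instance can read what the old one wrote, which is the property the slide attributes to the shared state store.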

Page 15: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Application Timeline Server

•  Entity and event collection: Applications of all types can create entities and send events

•  Pluggable store: Depending on site requirements

•  REST APIs: Applications and user interfaces can access information via REST

•  Visualizations: Users can build tools and visualizations using the APIs

•  Users and admins: Applications as well as the system entities/events
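
As a rough illustration of what "create entities and send events" might look like from a client, the sketch below only builds a JSON body and a URL; it makes no network call. The field names (`entity`, `entitytype`, `starttime`, `events`, `timestamp`, `eventtype`, `eventinfo`), the `/ws/v1/timeline` path, and port 8188 are assumed to match the Timeline Server REST documentation and should be verified against the deployed Hadoop version. The host name, entity type, IDs, and helper function names are hypothetical.

```python
import json


def make_event(timestamp_ms: int, event_type: str, info: dict) -> dict:
    """One event attached to an entity (field names are assumptions, see above)."""
    return {"timestamp": timestamp_ms, "eventtype": event_type, "eventinfo": info}


def make_entity(entity_id: str, entity_type: str, start_ms: int, events: list) -> dict:
    """An application-defined entity together with its events."""
    return {
        "entity": entity_id,
        "entitytype": entity_type,
        "starttime": start_ms,
        "events": events,
    }


def timeline_url(host: str, port: int = 8188) -> str:
    """Assumed REST endpoint to which entities would be posted."""
    return f"http://{host}:{port}/ws/v1/timeline"


# Hypothetical custom application reporting the start and end of one job.
entity = make_entity(
    "job_0001",
    "MY_APP_JOB",
    1401235200000,
    [
        make_event(1401235200000, "JOB_STARTED", {"user": "alice"}),
        make_event(1401235260000, "JOB_FINISHED", {"status": "SUCCEEDED"}),
    ],
)
body = json.dumps({"entities": [entity]})
url = timeline_url("ats.example.com")
```

A monitoring client or visualization tool would read the same entities back over REST, which is why the payload is kept as plain JSON here.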

Page 16: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Application Timeline Server

[Diagram: App Timeline Server together with Ambari, a Custom App, and a Monitoring Client]

Page 17: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Capacity Scheduler Preemption

•  Enforce SLAs

•  Preempt across queues

Step 1: Gather queue state

1.  Current Capacity

2.  Guaranteed Capacity

3.  Pending Requests

Step 2: Identify set of preemptions

1.  Figure out what is needed to achieve capacity balance

2.  Select applications to preempt: over-capacity queues and FIFO order

3.  Respect bounds on amount of preemption allowed for each round

Step 3: Preempt application(s)

1.  Remove reservations from the most recently assigned app

2.  Issue preemptions for containers of same app (reverse chronological order, last assigned container first)

3.  App Master preemption is last resort

Step 4: Kill containers

1.  Track containers that have been issued but not yet executed preemption

2.  After a set of execution periods, forcibly kill these containers
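
The four steps above can be sketched in simplified form. This is not the Capacity Scheduler's code: capacity is counted in whole containers rather than memory, reservations are ignored, and the names `Queue`, `Container`, `identify_preemptions`, `containers_to_kill`, `max_per_round`, and `max_wait` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    app: str
    assigned_at: int              # larger means assigned more recently
    is_app_master: bool = False


@dataclass
class Queue:
    name: str
    guaranteed: int               # guaranteed capacity, in containers
    pending: int                  # pending container requests
    containers: list = field(default_factory=list)

    @property
    def current(self) -> int:
        return len(self.containers)


def identify_preemptions(queues, max_per_round):
    """Steps 1 to 3: read queue state, size the preemption, choose containers."""
    # Steps 1 and 2: unmet guaranteed demand across under-served queues.
    needed = sum(min(q.pending, max(q.guaranteed - q.current, 0)) for q in queues)
    # Step 2: respect the bound on preemption per round.
    budget = min(needed, max_per_round)
    victims = []
    for q in queues:
        over = q.current - q.guaranteed
        if over <= 0 or budget <= 0:
            continue
        # Step 3: last assigned container first; App Masters sort to the end.
        order = sorted(q.containers, key=lambda c: (c.is_app_master, -c.assigned_at))
        take = order[:min(over, budget)]
        victims.extend(take)
        budget -= len(take)
    return victims


def containers_to_kill(issued, now, max_wait):
    """Step 4: containers whose preemption was issued at least max_wait ago."""
    return [c for c, issued_at in issued.items() if now - issued_at >= max_wait]


prod = Queue("prod", guaranteed=3, pending=2, containers=[Container("etl", 1, True)])
dev = Queue("dev", guaranteed=1, pending=0, containers=[
    Container("adhoc", 2, True),
    Container("adhoc", 3),
    Container("adhoc", 4),
    Container("adhoc", 5),
])

victims = identify_preemptions([prod, dev], max_per_round=5)
issued = {victims[0]: 10, victims[1]: 12}       # time each preemption was issued
overdue = containers_to_kill(issued, now=25, max_wait=15)
```

In this example `prod` is two containers short of its guarantee and `dev` is three over, so only two containers are selected from `dev`, newest first, and its App Master is left alone.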

Page 18: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Agenda

Overview Features Q & A

Page 19: Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS

Learn More About the Hadoop Operating System

Hortonworks.com/labs/yarn/

Register for the remaining 3 Discover HDP 2.1 Webinars

Hortonworks.com/webinars

Next Webinar:

Apache Solr for Hadoop Search

Thursday, June 12, 10am Pacific