Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Transcript of Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Page 1 © Hortonworks Inc. 2014
Discover HDP 2.1 Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks. We do Hadoop.
Page 2 © Hortonworks Inc. 2014
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Rohit Bakhshi
Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform
Vinod Vavilapalli
Foundational Hadoop Architect, Hortonworks Engineer, PMC for Apache Hadoop & Leads YARN Development at Hortonworks
Page 3 © Hortonworks Inc. 2014
Agenda
• Overview of YARN and HDFS
• New YARN & HDFS Features in HDP 2.1
• Q & A
Page 4 © Hortonworks Inc. 2014
A Modern Data Architecture
[Diagram]
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System: Repositories (RDBMS, EDW, MPP) alongside Enterprise Hadoop (Governance & Integration, Security, Operations, Data Access, Data Management)
• Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
• Operations Tools: Provision, Manage & Monitor
• Dev & Data Tools: Build & Test
Page 5 © Hortonworks Inc. 2014
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
[Diagram]
• Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
• Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)
• Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N
• Security: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
• Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Page 6 © Hortonworks Inc. 2014
HDP 2.1: Data Management
HDP 2.1 Hortonworks Data Platform
[Diagram]
• Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 to N
• Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)
• Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
• Security: Authentication, Authorization, Accounting, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
• Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Page 7 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 8 © Hortonworks Inc. 2014
Apache Hadoop YARN and HDFS
The Data Operating System for Hadoop 2.0
• Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
• Efficient: Double processing in Hadoop on the same hardware while providing predictable performance & quality of service
• Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads
Data Processing Engines Run Natively in Hadoop
• Batch: MapReduce
• Interactive: Tez
• Streaming: Storm
• In-Memory: Spark
• Graph: Giraph
• SAS: LASR, HPA
• Online: HBase, Accumulo
• Others
HDFS: Redundant, Reliable Storage
YARN: Cluster Resource Management
Page 9 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 10 © Hortonworks Inc. 2014
HDP 2.1 HDFS: What’s New
• HDFS Extended ACLs (Theme: Security): Provides granular access control to datasets in HDFS
• HTTPS Wire Encryption (Theme: Security): swebhdfs: HTTPS support for WebHDFS; HTTPS support for Hadoop Web UI
• HDFS DataNode Caching (Theme: Performance): Enhanced read performance via in-memory caching of files
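As an illustration of what granular access control looks like in practice, the sketch below builds (but does not run) an `hdfs dfs -setfacl -m` command line in Python. The user, group, and path are hypothetical, and the helper names are made up for this example. ACL support also has to be switched on at the NameNode (`dfs.namenode.acls.enabled`), so treat this as a sketch to check against your cluster's documentation.

```python
from typing import List, Tuple

# (scope, name, permissions), e.g. ("user", "alice", "r-x").
AclEntry = Tuple[str, str, str]

VALID_SCOPES = {"user", "group", "other", "mask"}


def acl_spec(entries: List[AclEntry]) -> str:
    """Join entries into the comma-separated spec that setfacl expects."""
    parts = []
    for scope, name, perms in entries:
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown ACL scope: {scope}")
        # Permissions are three characters: read, write, execute, or '-'.
        if (len(perms) != 3 or perms[0] not in "r-"
                or perms[1] not in "w-" or perms[2] not in "x-"):
            raise ValueError(f"bad permission string: {perms}")
        parts.append(f"{scope}:{name}:{perms}")
    return ",".join(parts)


def setfacl_command(path: str, entries: List[AclEntry]) -> List[str]:
    """Build an argv list that would modify the ACL of `path`."""
    return ["hdfs", "dfs", "-setfacl", "-m", acl_spec(entries), path]


# Hypothetical users, groups, and path, for illustration only.
cmd = setfacl_command(
    "/data/sales",
    [("user", "alice", "r-x"), ("group", "analysts", "r--")],
)
print(" ".join(cmd))
# prints: hdfs dfs -setfacl -m user:alice:r-x,group:analysts:r-- /data/sales
```

Returning an argv list rather than a shell string keeps the example safe to hand to `subprocess.run` later without quoting concerns.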
Page 11 © Hortonworks Inc. 2014
HDFS Coordinated DataNode Caching
• In-memory cache for HDFS files: enhanced read performance
• Identify files to be cached through centralized management controls
• Manage caching through pools and directives
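The pools and directives mentioned above are managed with the `hdfs cacheadmin` tool. The Python sketch below only assembles the command lines for creating a pool and adding a directive to it. The pool name, path, and helper function names are hypothetical, and the flags shown are a subset of what the tool accepts.

```python
from typing import List, Optional


def add_pool_command(pool: str) -> List[str]:
    """Argv for creating a cache pool (a named group of directives)."""
    return ["hdfs", "cacheadmin", "-addPool", pool]


def add_directive_command(
    path: str, pool: str, replication: Optional[int] = None
) -> List[str]:
    """Argv for asking DataNodes to cache `path` under `pool`."""
    cmd = ["hdfs", "cacheadmin", "-addDirective", "-path", path, "-pool", pool]
    if replication is not None:
        if replication < 1:
            raise ValueError("replication must be at least 1")
        cmd += ["-replication", str(replication)]
    return cmd


# Hypothetical pool and path, for illustration only.
print(" ".join(add_pool_command("reports")))
print(" ".join(add_directive_command("/data/sales/2014", "reports", replication=2)))
```

A pool is created once and then shared by many directives, which is what makes the central management described on the slide possible.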
Page 12 © Hortonworks Inc. 2014
HDP 2.1 YARN: What’s New
• Resource Manager High Availability (Theme: Reliability): No service disruption in YARN
• Application Timeline Server (Theme: Monitoring): Operational monitoring across all YARN applications
• Capacity Scheduler Preemption (Theme: Scheduling): Enforce SLAs across applications and organizations
Page 13 © Hortonworks Inc. 2014
YARN Resource Manager (RM) HA
• Automated failover: HDP detects and reacts to ResourceManager host & process failures
• Active/Standby: Standby ResourceManager with access to shared state store
• Fencing: Protection against split brain
• Full stack resiliency: Entire HDP stack certified with ResourceManager HA; RM Restart enables application recovery
• Integrated into HDP stack: No external HA frameworks; no external storage needed
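For readers who configure this by hand rather than through Ambari, the sketch below generates a `yarn-site.xml` fragment for an active/standby pair backed by ZooKeeper. The property names are given as they appear in the Apache Hadoop 2.x ResourceManager HA documentation to the best of my recollection, so verify them against your version. The cluster id, RM ids, and hostnames are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def ha_properties(cluster_id: str, rm_hosts: Dict[str, str], zk_quorum: str) -> Dict[str, str]:
    """Collect RM HA settings; rm_hosts maps an RM id to its hostname."""
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.ha.automatic-failover.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_hosts),
        "yarn.resourcemanager.zk-address": zk_quorum,
        # Restart/recovery with state kept in ZooKeeper (assumed store class).
        "yarn.resourcemanager.recovery.enabled": "true",
        "yarn.resourcemanager.store.class":
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
    }
    for rm_id, host in rm_hosts.items():
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


def to_configuration_xml(props: Dict[str, str]) -> str:
    """Render properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical cluster id, RM ids, and hosts, for illustration only.
props = ha_properties(
    "yarn-cluster",
    {"rm1": "master1.example.com", "rm2": "master2.example.com"},
    "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
)
print(to_configuration_xml(props))
```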
Page 14 © Hortonworks Inc. 2014
YARN RM HA: Architecture
[Diagram]
• Client
• Active RM: Monitor and maintain active lock; store state
• Standby RM: Monitor and try to take active lock
• ZooKeeper Service Cluster
• NodeManagers
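The lock-based handover in this architecture can be sketched as a toy Python model. `LockService` is an in-memory stand-in for the ZooKeeper cluster, not a ZooKeeper client, and every class and method name here is invented for the example. Fencing and real failure detection are not modeled.

```python
from typing import Dict, Optional


class LockService:
    """In-memory stand-in for ZooKeeper: one active lock plus shared state."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.state: Dict[str, str] = {}

    def try_acquire(self, rm_id: str) -> bool:
        if self.holder is None:
            self.holder = rm_id
        return self.holder == rm_id

    def release(self, rm_id: str) -> None:
        if self.holder == rm_id:
            self.holder = None


class ResourceManager:
    def __init__(self, rm_id: str, locks: LockService) -> None:
        self.rm_id = rm_id
        self.locks = locks
        self.alive = True

    def tick(self) -> str:
        """One monitoring round: keep, or try to take, the active lock."""
        if not self.alive:
            return "down"
        return "active" if self.locks.try_acquire(self.rm_id) else "standby"

    def submit(self, app_id: str, status: str) -> None:
        if self.tick() != "active":
            raise RuntimeError(f"{self.rm_id} is not active")
        self.locks.state[app_id] = status  # "Store State" in the diagram

    def crash(self) -> None:
        self.alive = False
        self.locks.release(self.rm_id)  # models the lock expiring with the process


zk = LockService()
rm1, rm2 = ResourceManager("rm1", zk), ResourceManager("rm2", zk)
print(rm1.tick(), rm2.tick())  # prints: active standby
rm1.submit("app_1", "RUNNING")
rm1.crash()
print(rm1.tick(), rm2.tick())  # prints: down active
print(zk.state)                # prints: {'app_1': 'RUNNING'}
```

Because the application state lives in the shared store rather than in either RM, the new active RM can see `app_1` after the failover.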
Page 15 © Hortonworks Inc. 2014
Application Timeline Server
• Entity and Event collection: Applications of all types can create entities and send events
• Pluggable store: Depending on site requirements
• REST APIs: Applications and user interfaces can access information via REST
• Visualizations: Users can build tools and visualizations using the APIs
• Users and Admins: Applications as well as the system entities/events
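To make the entity and event model concrete, the Python sketch below assembles the JSON body an application might send to the Timeline Server's REST endpoint. The field names and the `/ws/v1/timeline` path follow the v1 timeline API as I recall it, so check them against your Hadoop version. The host, port, entity type, and helper names are hypothetical, and nothing is sent over the network.

```python
import json
import time
from typing import Any, Dict, List, Optional, Tuple


def timeline_event(
    event_type: str,
    info: Optional[Dict[str, Any]] = None,
    timestamp_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """One event on an entity; the timestamp defaults to now, in milliseconds."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {"timestamp": timestamp_ms, "eventtype": event_type, "eventinfo": info or {}}


def timeline_entity(
    entity_id: str, entity_type: str, start_ms: int, events: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """An application-defined entity together with its events."""
    return {
        "entity": entity_id,
        "entitytype": entity_type,
        "starttime": start_ms,
        "events": events,
    }


def put_request(host: str, port: int, entities: List[Dict[str, Any]]) -> Tuple[str, str]:
    """Return (url, json_body) for posting entities; the caller does the HTTP call."""
    url = f"http://{host}:{port}/ws/v1/timeline"
    return url, json.dumps({"entities": entities})


# Hypothetical host, port, and entity names, for illustration only.
event = timeline_event("JOB_STARTED", {"user": "alice"}, timestamp_ms=1400000000000)
entity = timeline_entity("job_001", "CUSTOM_JOB", 1400000000000, [event])
url, body = put_request("timeline.example.com", 8188, [entity])
print(url)
print(body)
```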
Page 16 © Hortonworks Inc. 2014
Application Timeline Server
[Diagram components: App Timeline Server, Ambari, Custom App, Monitoring Client]
Page 17 © Hortonworks Inc. 2014
Capacity Scheduler Preemption
• Enforce SLAs
• Preempt across queues
Step 1: Gather queue state
1. Current capacity
2. Guaranteed capacity
3. Pending requests
Step 2: Identify set of preemptions
1. Figure out what is needed to achieve capacity balance
2. Select applications to preempt: over-capacity queues and FIFO order
3. Respect bounds on amount of preemption allowed for each round
Step 3: Preempt application(s)
1. Remove reservations from the most recently assigned app
2. Issue preemptions for containers of the same app (reverse chronological order, last assigned container first)
3. App Master preemption is a last resort
Step 4: Kill containers
1. Track containers that have been issued but have not yet executed preemption
2. After a set of execution periods, forcibly kill these containers
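The four steps above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not the scheduler's actual policy code: capacity is counted in whole containers, reservations are not modeled, and all class and function names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: str
    app_id: str
    assigned_at: int            # logical assignment time; larger means more recent
    is_app_master: bool = False


@dataclass
class Queue:
    name: str
    guaranteed: int             # guaranteed capacity, counted in containers
    pending: int                # outstanding container requests
    containers: List[Container] = field(default_factory=list)

    @property
    def current(self) -> int:
        return len(self.containers)


def select_preemptions(queues: List[Queue], max_per_round: int) -> List[Container]:
    # Steps 1-2: from current, guaranteed, and pending, work out how many
    # containers under-served queues are owed, capped by the per-round bound.
    owed = sum(min(q.pending, max(q.guaranteed - q.current, 0)) for q in queues)
    budget = min(owed, max_per_round)

    victims: List[Container] = []
    for q in queues:
        surplus = q.current - q.guaranteed
        if surplus <= 0 or budget <= 0:
            continue
        # Step 3: most recently assigned app first, newest container first,
        # Application Master containers only as a last resort.
        latest: Dict[str, int] = {}
        for c in q.containers:
            latest[c.app_id] = max(latest.get(c.app_id, 0), c.assigned_at)
        ordered = sorted(
            q.containers,
            key=lambda c: (c.is_app_master, -latest[c.app_id], -c.assigned_at),
        )
        chosen = ordered[: min(surplus, budget)]
        victims.extend(chosen)
        budget -= len(chosen)
    return victims


def containers_to_kill(issued_round: Dict[str, int], current_round: int,
                       max_wait_rounds: int) -> List[str]:
    # Step 4: kill containers whose preemption has been outstanding too long.
    return sorted(cid for cid, r in issued_round.items()
                  if current_round - r >= max_wait_rounds)


# Hypothetical queues and applications, for illustration only.
etl = Queue("etl", guaranteed=2, pending=0, containers=[
    Container("c1", "app_a", 1, is_app_master=True),
    Container("c2", "app_a", 2),
    Container("c3", "app_b", 3, is_app_master=True),
    Container("c4", "app_b", 4),
    Container("c5", "app_b", 5),
])
adhoc = Queue("adhoc", guaranteed=4, pending=3,
              containers=[Container("d1", "app_c", 6, is_app_master=True)])

picked = select_preemptions([etl, adhoc], max_per_round=2)
print([c.container_id for c in picked])                        # prints: ['c5', 'c4']
print(containers_to_kill({"c5": 1, "c4": 2}, current_round=4,
                         max_wait_rounds=3))                   # prints: ['c5']
```

In this run the `adhoc` queue is owed three containers, but the per-round bound of two limits the first round to the two newest non-AM containers of the most recently assigned application in the over-capacity `etl` queue.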
Page 18 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 19 © Hortonworks Inc. 2014
Learn More About the Hadoop Operating System
Hortonworks.com/labs/yarn/
Register for the remaining 3 Discover HDP 2.1 Webinars
Hortonworks.com/webinars
Next Webinar:
Apache Solr for Hadoop Search
Thursday, June 12, 10am Pacific