Hadoop in the Enterprise - README

Hadoop in the Enterprise

Jeff Markham, Technical Director, APAC, Hortonworks

Modern Architecture with Hadoop 2

Hadoop Wave ONE: Web-scale Batch Apps

Customers want solutions & convenience

Customers want technology & performance

Source: Geoffrey Moore - Crossing the Chasm

2006 to 2012: Web-Scale Batch Applications

Innovators, technology enthusiasts

Early adopters, visionaries

Early majority, pragmatists

Late majority, conservatives

Laggards, skeptics

Customers want solutions & convenience

Customers want technology & performance

Hadoop Wave TWO: Broad Enterprise Apps

Source: Geoffrey Moore - Crossing the Chasm

Innovators, technology enthusiasts

Early adopters, visionaries

Early majority, pragmatists

Late majority, conservatives

Laggards, skeptics

2013 & Beyond: Batch, Interactive, Online, Streaming, etc.

2.0 Architected for the Broad Enterprise

Hadoop 2.0 Key Highlights

Rolling Upgrades

Disaster Recovery

Snapshots

Full Stack HA

Hive on Tez

HDP 2.0 Features

Single Cluster, Many Workloads

INTERACTIVE

ONLINE

STREAMING

ZERO downtime

Multi Data Center

Point in time Recovery

Reliability

Interactive Query

Mixed workloads

Enterprise Requirements

The 1st Generation of Hadoop: Batch

HADOOP 1.0 Built for Web-Scale Batch Apps

Single App

INTERACTIVE

Single App

•  All other usage patterns must leverage that same infrastructure

•  Forces the creation of silos for managing mixed workloads

Single App

ONLINE

A Transition From Hadoop 1 to 2

HADOOP 1.0

HDFS (redundant, reliable storage)

MapReduce (cluster resource management & data processing)

A Transition From Hadoop 1 to 2

HADOOP 1.0

MapReduce (cluster resource management & data processing)

YARN (cluster resource management)

MapReduce (data processing)

Others (data processing)

HADOOP 2.0

The Enterprise Requirement: Beyond Batch

For Hadoop to become an enterprise-viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS, simultaneously and with predictable levels of service

HDFS (Redundant, Reliable Storage)

BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER

YARN: Taking Hadoop Beyond Batch

• Created to manage resource needs across all uses

• Ensures predictable performance & QoS for all apps

• Enables apps to run “IN” Hadoop rather than “ON”

– Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
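The idea of one resource manager serving every workload type can be illustrated with a toy model. This is only a sketch under stated assumptions, not YARN's actual API or scheduling policy: the class names (`Node`, `ToyResourceManager`), the "most free memory first" placement rule, the application names, and the memory figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed memory capacity (hypothetical units: MB)."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


@dataclass
class ToyResourceManager:
    """Grants containers to any application type from one shared pool."""
    nodes: list
    grants: list = field(default_factory=list)

    def request(self, app: str, memory_mb: int):
        """Place one container on the node with the most free memory.

        Returns the node name, or None when no node can fit the request.
        """
        candidates = [n for n in self.nodes if n.free_mb() >= memory_mb]
        if not candidates:
            return None
        node = max(candidates, key=Node.free_mb)
        node.used_mb += memory_mb
        self.grants.append((app, node.name, memory_mb))
        return node.name


# Batch, interactive, and online workloads all draw on the same two nodes.
rm = ToyResourceManager([Node("node1", 4096), Node("node2", 4096)])
batch = rm.request("mapreduce-job", 3072)
interactive = rm.request("tez-query", 2048)
online = rm.request("hbase-region", 2048)
too_big = rm.request("storm-topology", 2048)  # cluster is full by now
```

The point of the sketch is that the manager knows nothing about what each application does; it only tracks capacity, which is what lets different processing models share one cluster.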

Applications Run Natively IN Hadoop

HDFS2 (Redundant, Reliable Storage)

YARN (Cluster Resource Management)

BATCH (MapReduce)

INTERACTIVE (Tez)

STREAMING (Storm, S4,…)

GRAPH (Giraph)

IN-MEMORY (Spark)

HPC MPI (OpenMPI)

ONLINE (HBase)

OTHER (Search) (Weave…)

Old School Hadoop: MapReduce

[Diagram: Clients, a ResourceManager, and three NodeManagers hosting Containers and App Mstr (application master) processes; arrows labeled Job Submission, MapReduce Status, Node Status, and Resource Request]

New School Hadoop with YARN

5 Key Benefits of YARN

1.  Scale!

2.  Compatibility with MapReduce.

3.  Improved cluster utilization.

4.  New Programming Models

5.  Agility

Apache Tez

• An alternate data processing framework to MapReduce

•  Improves performance of low-latency applications

SQL-IN-Hadoop with Apache Hive

• Apache Hive: First application to use YARN

• Hive on Tez optimizes resources for Hive queries to improve performance

– Apache Hive is the standard for SQL interaction in Hadoop (most applications claim Hive compatibility today)

– Apache Tez: optimized for YARN, general-purpose processing framework for existing Hadoop applications

Stinger Initiative Simple Focus

MAP REDUCE TEZ

Business Analytics

Custom Apps

Stinger Phase 3 •  Vector Query •  Buffer Cache •  Query Planner

Stinger Phase 2 •  YARN Resource Mgmnt •  Hive on Apache Tez •  Query Service (always on)

Stinger Phase 1 •  Base Optimizations •  SQL Analytics •  ORCFile Format

1.  Improve existing tools & preserve investments

2.  Enable Hive to support interactive workloads

Increased SQL Compatibility

100x Performance Improvement

SQL Compliance Highlights

Hive: More SQL & 100X Faster

Stinger Phase 3 •  Vector Query •  Buffer Cache •  Query Planner

Stinger Phase 2 •  YARN Resource Mgmnt •  Hive on Apache Tez •  Query Service

Stinger Phase 1 •  Base Optimizations •  SQL Analytics •  ORCFile Format

We Are Here

Done in Hive 0.11

VARCHAR

DECIMAL

Sub-queries for IN/NOT IN, HAVING

EXISTS / NOT EXISTS

INTERSECT, EXCEPT

UNION DISTINCT and UNION outside of subquery

ROLLUP and CUBE

Windowing functions (OVER, RANK, etc.)

Work Started
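To make one item from the list above concrete, the following sketch mimics in plain Python what a SQL `RANK() OVER (PARTITION BY ... ORDER BY ... DESC)` window function computes. It is an illustration of the SQL semantics only, not Hive code; the helper name `rank_over` and the sample `sales` rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_key, order_key):
    """Mimic RANK() OVER (PARTITION BY partition_key ORDER BY order_key DESC).

    Ties share a rank, and the next rank skips ahead (1, 1, 3, ...).
    """
    ranked = []
    # groupby needs its input sorted by the grouping key.
    by_partition = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(by_partition, key=itemgetter(partition_key)):
        ordered = sorted(group, key=itemgetter(order_key), reverse=True)
        rank = 0
        previous = object()  # sentinel that equals no real value
        for position, row in enumerate(ordered, start=1):
            if row[order_key] != previous:
                rank = position
                previous = row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


# Hypothetical sample data: sales amounts per rep, partitioned by region.
sales = [
    {"region": "APAC", "rep": "a", "amount": 300},
    {"region": "APAC", "rep": "b", "amount": 300},
    {"region": "APAC", "rep": "c", "amount": 100},
    {"region": "EMEA", "rep": "d", "amount": 200},
]
result = rank_over(sales, "region", "amount")
for row in result:
    print(row["region"], row["rep"], row["amount"], row["rank"])
```

Unlike a `GROUP BY`, the window function keeps every input row and adds a computed column, which is why it is useful for "top N per group" style analytics.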

Hive’s Performance Trajectory

http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/

Making Hadoop Enterprise Ready

Thank You!

http://hortonworks.com/sandbox
