BigData Meetup - OpenStack Sahara

Sergey Lukjanov and Andrew Lazarev, The State of OpenStack Data Processing: Sahara


Big Data Meetup - http://www.meetup.com/Big-Data-Science/events/162471852/

Transcript of BigData Meetup - OpenStack Sahara

Page 1: BigData Meetup - OpenStack Sahara

Sergey Lukjanov, Andrew Lazarev

The State of OpenStack Data Processing: Sahara

Page 2: BigData Meetup - OpenStack Sahara

Agenda

• Sahara overview
• Status & Roadmap
• EDP Technical Concepts
• Live demo

Page 3: BigData Meetup - OpenStack Sahara

Agenda

• Sahara overview
• Status & Roadmap
• EDP Technical Concepts
• Live demo

Page 4: BigData Meetup - OpenStack Sahara

What is OpenStack?

https://www.openstack.org/software/

Page 5: BigData Meetup - OpenStack Sahara

OpenStack Data Processing: Sahara

Mission: To provide a scalable data processing stack and associated management interfaces.

• provision and operate data processing clusters
• schedule and operate data processing jobs

Data processing in Sahara == Hadoop, Spark, etc.

Page 6: BigData Meetup - OpenStack Sahara

Hadoop - Big Data Platform

© http://hortonworks.com/hadoop/yarn/

Page 7: BigData Meetup - OpenStack Sahara

Trends

http://www.google.com/trends/

Page 8: BigData Meetup - OpenStack Sahara

Use cases

• Self-service provisioning of Hadoop clusters
• Utilization of unused compute capacity for bursty workloads
• Dev -> Stage -> Prod lifecycle
• Run Hadoop workloads in a few clicks without expertise in Hadoop ops

Page 9: BigData Meetup - OpenStack Sahara

Contributors

Page 10: BigData Meetup - OpenStack Sahara

Architecture overview

(Architecture diagram. Components shown: REST API; Savanna Python Client; Savanna Pages in Horizon (Savanna was Sahara's earlier project name); Keystone Auth; Cluster Configuration Manager; Vendor Plugins; Resources Orchestration Manager; Job Sources; Job Manager; Data Access Layer; Data Sources; Hadoop VMs; OpenStack services Heat, Nova, Glance, Cinder, Neutron, Swift and Trove DB.)
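The "Vendor Plugins" box is where distribution-specific provisioning logic lives. A minimal sketch of such a plugin registry, assuming a hypothetical `ProvisioningPlugin` interface with only two hooks (this is not Sahara's actual plugin SPI); the version lists come from the Current Status slide:

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface; a real plugin SPI has many more hooks."""

    name = ""

    @abstractmethod
    def get_versions(self) -> list[str]:
        """Return the Hadoop versions this plugin can deploy."""

    @abstractmethod
    def configure_cluster(self, cluster: dict) -> None:
        """Apply distribution-specific configuration to the cluster description."""


# Registry of plugin instances keyed by plugin name.
PLUGINS: dict[str, ProvisioningPlugin] = {}


def register(plugin_cls):
    """Class decorator: instantiate the plugin and add it to the registry."""
    PLUGINS[plugin_cls.name] = plugin_cls()
    return plugin_cls


@register
class VanillaPlugin(ProvisioningPlugin):
    name = "vanilla"

    def get_versions(self):
        return ["1.2.1", "2.3.0", "2.4.1"]

    def configure_cluster(self, cluster):
        cluster["configured_by"] = f"{self.name} {cluster['hadoop_version']}"


@register
class HDPPlugin(ProvisioningPlugin):
    name = "hdp"

    def get_versions(self):
        return ["1.3.2", "2.0.6"]

    def configure_cluster(self, cluster):
        cluster["configured_by"] = f"{self.name} {cluster['hadoop_version']}"


def provision(cluster: dict) -> dict:
    """Look up the requested plugin, validate the version, then configure."""
    plugin = PLUGINS.get(cluster["plugin_name"])
    if plugin is None:
        raise ValueError(f"unknown plugin: {cluster['plugin_name']}")
    if cluster["hadoop_version"] not in plugin.get_versions():
        raise ValueError(f"{plugin.name} does not support {cluster['hadoop_version']}")
    plugin.configure_cluster(cluster)
    return cluster
```

Adding a distribution then means registering one more class; the core provisioning path never changes.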

Page 11: BigData Meetup - OpenStack Sahara

Agenda

• Sahara overview
• Status & Roadmap
• EDP Technical Concepts
• Live demo

Page 12: BigData Meetup - OpenStack Sahara

Current Status

● Part of Mirantis OpenStack
● Part of the OpenStack integrated release since Juno
● Launchpad home page: https://launchpad.net/sahara
● Integrated with OpenStack CI/CD
  ○ https://github.com/openstack/sahara
● Features: cluster provisioning and basic EDP
● Active contributors: Red Hat and Hortonworks
● Supported Hadoop distros:
  ○ Vanilla Apache Hadoop 1.2.1, 2.3.0 and 2.4.1
  ○ Hortonworks Data Platform 1.3.2 and 2.0.6
  ○ Cloudera CDH5
  ○ Spark 0.9.1 and 1.0.0

Page 13: BigData Meetup - OpenStack Sahara

Features - Cluster Ops

● Hadoop cluster provisioning and operation
  ○ Templates for Hadoop cluster configuration
  ○ REST API for cluster startup and operations
  ○ Manual cluster scaling (add/remove nodes)
  ○ Data node anti-affinity
  ○ Swift integration
● UI integrated into Horizon
● Plugin mechanism for integration with different Hadoop distributions: Vanilla Apache, Hortonworks, Cloudera, Spark
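Templates, manual scaling and data node anti-affinity fit together naturally. The sketch below assumes hypothetical `NodeGroup` and `ClusterTemplate` classes and a naive `place` function; in a real cloud the compute scheduler places VMs, so this only illustrates the constraint: instances that run an anti-affinity process must land on distinct hosts.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class NodeGroup:
    """Hypothetical node group: identical VMs running the same processes."""
    name: str
    processes: list[str]
    count: int


@dataclass
class ClusterTemplate:
    name: str
    node_groups: list[NodeGroup]
    # Processes whose instances must never share a physical host.
    anti_affinity: list[str] = field(default_factory=list)


def scale(template: ClusterTemplate, group_name: str, new_count: int) -> None:
    """Manual scaling: set the instance count of one node group."""
    for ng in template.node_groups:
        if ng.name == group_name:
            ng.count = new_count
            return
    raise KeyError(group_name)


def place(template: ClusterTemplate, hosts: list[str]) -> dict[str, str]:
    """Map instance names to hosts, honouring per-process anti-affinity."""
    placement: dict[str, str] = {}
    used = defaultdict(set)  # process -> hosts already running it
    round_robin = cycle(hosts)
    for ng in template.node_groups:
        aa = [p for p in ng.processes if p in template.anti_affinity]
        for i in range(ng.count):
            instance = f"{ng.name}-{i}"
            if aa:
                free = [h for h in hosts if all(h not in used[p] for p in aa)]
                if not free:
                    raise RuntimeError(f"not enough hosts to place {instance}")
                host = free[0]
                for p in aa:
                    used[p].add(host)
            else:
                host = next(round_robin)
            placement[instance] = host
    return placement
```

With four hosts, three data nodes fit; scaling the worker group to five makes placement fail, which is the trade-off anti-affinity imposes.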

Page 14: BigData Meetup - OpenStack Sahara

Features - Jobs Ops

● EDP: API to execute MapReduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
  ○ Pluggable workflow engine: Oozie, Spark
  ○ Pluggable data sources: Swift, HDFS, Ceph
  ○ Supported job types: Jar, Pig, Hive
● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
● Creation of transient clusters for a single job
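"Pluggable workflow engine" means the engine is chosen per cluster rather than hard-coded. A minimal sketch, assuming a hypothetical plugin-to-engine table and a hypothetical "Spark" job type for Spark clusters (the slide itself lists only Jar, Pig and Hive):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical workflow engine descriptor."""
    name: str
    job_types: frozenset[str]


OOZIE = Engine("oozie", frozenset({"Jar", "Pig", "Hive"}))
# Assumption: a Spark cluster runs Spark applications only.
SPARK = Engine("spark", frozenset({"Spark"}))

# Hypothetical mapping from provisioning plugin to its workflow engine.
ENGINE_BY_PLUGIN = {
    "vanilla": OOZIE,
    "hdp": OOZIE,
    "cdh": OOZIE,
    "spark": SPARK,
}


def select_engine(plugin_name: str, job_type: str) -> Engine:
    """Pick the engine for the cluster's plugin and check the job type."""
    engine = ENGINE_BY_PLUGIN.get(plugin_name)
    if engine is None:
        raise ValueError(f"no workflow engine for plugin {plugin_name!r}")
    if job_type not in engine.job_types:
        raise ValueError(f"{engine.name} cannot run {job_type} jobs")
    return engine
```

Rejecting an unsupported job type up front keeps the error on the API side instead of surfacing later as a failed workflow on the cluster.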

Page 15: BigData Meetup - OpenStack Sahara

Features - OpenStack Integration

● Neutron and Nova networking support
● Keystone trust model for asynchronous operations
● Full support for data locality: rack and 4-level awareness for HDFS and Swift
● Python client
● Integration with the OpenStack ecosystem: Heat, Tempest, DevStack, Ceilometer
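Hadoop learns where nodes sit from a topology script (configured through `net.topology.script.file.name` in Hadoop 2) that prints one network path per node address it is given. In a 4-level setup the physical compute host is added as a layer between the rack and the VM, so two VMs on the same host count as closer than two VMs in the same rack. A sketch under those assumptions; the inventory dictionaries and the fallback path are hypothetical:

```python
import sys

# Hypothetical inventory: the compute host each VM runs on, and the rack
# each compute host sits in. A real deployment would generate this data.
VM_TO_HOST = {"10.0.0.11": "compute-1", "10.0.0.12": "compute-1", "10.0.0.13": "compute-2"}
HOST_TO_RACK = {"compute-1": "rack-a", "compute-2": "rack-b"}
DEFAULT_PATH = "/default-rack/default-host"  # assumption: fallback for unknown nodes


def resolve(nodes: list[str]) -> list[str]:
    """Return a /rack/host network path for every VM address given."""
    paths = []
    for node in nodes:
        host = VM_TO_HOST.get(node)
        rack = HOST_TO_RACK.get(host) if host else None
        paths.append(f"/{rack}/{host}" if rack else DEFAULT_PATH)
    return paths


if __name__ == "__main__":
    # Hadoop invokes the script with one or more node addresses and
    # reads back one whitespace-separated network path per address.
    print(" ".join(resolve(sys.argv[1:])))
```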

Page 16: BigData Meetup - OpenStack Sahara

Kilo Tentative Plans

● Support for more distributions
  ○ MapR plugin (under review)
  ○ Storm plugin (work in progress)
● Native Ceph support
● Ironic integration (bare metal provisioning)
● Complete the work on the distributed Sahara engine

Page 17: BigData Meetup - OpenStack Sahara

Agenda

• Sahara overview
• Status & Roadmap
• EDP Technical Concepts
• Live demo

Page 18: BigData Meetup - OpenStack Sahara

Elastic Data Processing

● EDP: API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR)
  ○ Supported data sources: Swift, HDFS, Ceph
  ○ Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive
  ○ Pluggable workflow management engine: Oozie, Spark
● Supports both Hadoop 1 and Hadoop 2
● Job execution on transient clusters
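A transient cluster lives exactly as long as one job. The sketch below shows that lifecycle against a hypothetical `DataProcessingClient` protocol (not the python-saharaclient API); the essential point is the `finally`, which tears the cluster down even when the job fails. `FakeClient` is an in-memory stand-in for trying it without a cloud.

```python
from typing import Protocol


class DataProcessingClient(Protocol):
    """Hypothetical client interface for illustration only."""

    def create_cluster(self, template: str) -> str: ...
    def run_job(self, cluster_id: str, job: str) -> str: ...
    def wait_for_job(self, job_id: str) -> str: ...
    def delete_cluster(self, cluster_id: str) -> None: ...


def run_on_transient_cluster(client: DataProcessingClient, template: str, job: str) -> str:
    """Create a cluster, run one job on it, and always delete the cluster."""
    cluster_id = client.create_cluster(template)
    try:
        job_id = client.run_job(cluster_id, job)
        return client.wait_for_job(job_id)
    finally:
        # The cluster exists only for this job, so remove it even on failure.
        client.delete_cluster(cluster_id)


class FakeClient:
    """In-memory stand-in that records the calls it receives."""

    def __init__(self):
        self.calls: list[str] = []

    def create_cluster(self, template):
        self.calls.append(f"create {template}")
        return "cluster-1"

    def run_job(self, cluster_id, job):
        self.calls.append(f"run {job} on {cluster_id}")
        return "job-1"

    def wait_for_job(self, job_id):
        self.calls.append(f"wait {job_id}")
        return "SUCCEEDED"

    def delete_cluster(self, cluster_id):
        self.calls.append(f"delete {cluster_id}")
```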

Page 19: BigData Meetup - OpenStack Sahara

EDP Use Cases

● Simplified task execution. You don't need to know Hadoop!

● Bursty workloads: ad-hoc queries that need significant resources only for a short period

● Utilization of free IaaS capacity for Hadoop tasks

Page 20: BigData Meetup - OpenStack Sahara

EDP - Data Sources

(Diagram: Sahara EDP; Hadoop cluster of four VMs; Swift holding INPUT and OUTPUT)

swift://some_container/INPUT

swift://some_container/OUTPUT
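Data sources are identified by URLs like the two above. A minimal sketch of splitting such a URL into its parts with the standard `urllib.parse.urlparse`; the `DataSource` class and the set of accepted schemes are assumptions of this sketch:

```python
from dataclasses import dataclass
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"swift", "hdfs"}  # assumption: schemes handled by this sketch


@dataclass(frozen=True)
class DataSource:
    scheme: str
    container: str  # Swift container, or HDFS namenode authority
    path: str       # object name or file path inside it, without a leading "/"


def parse_data_source(url: str) -> DataSource:
    """Split a data source URL such as swift://some_container/INPUT."""
    parts = urlparse(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported data source scheme: {parts.scheme!r}")
    path = parts.path.lstrip("/")
    if not parts.netloc or not path:
        raise ValueError(f"URL must name a container and an object: {url!r}")
    return DataSource(parts.scheme, parts.netloc, path)
```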

Page 21: BigData Meetup - OpenStack Sahara

EDP - Job Binaries

(Diagram: Sahara EDP loading job binaries from Swift and from the Sahara DB)

internal-db://script.pig

swift://some_container/mapreduce.jar

1. Pig, Hive scripts
2. Executable Jar files
3. Pluggable binaries and libraries
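The URL scheme tells EDP where a binary lives: `internal-db://` points into Sahara's own database, `swift://` into object storage. A sketch of that dispatch, with a plain dictionary standing in for the database and a hypothetical `swift_get(container, object)` callable standing in for a Swift client:

```python
from typing import Callable
from urllib.parse import urlparse


def load_job_binary(url: str,
                    internal_db: dict[str, bytes],
                    swift_get: Callable[[str, str], bytes]) -> bytes:
    """Fetch a job binary from the internal DB or Swift, depending on the scheme.

    `internal_db` and `swift_get` are hypothetical stand-ins for the Sahara
    database and a Swift client call (container, object) -> bytes.
    """
    parts = urlparse(url)
    if parts.scheme == "internal-db":
        # internal-db://script.pig: the whole locator is the stored key.
        key = parts.netloc + parts.path
        if key not in internal_db:
            raise KeyError(f"no binary {key!r} in the internal DB")
        return internal_db[key]
    if parts.scheme == "swift":
        # swift://some_container/mapreduce.jar -> container and object name.
        return swift_get(parts.netloc, parts.path.lstrip("/"))
    raise ValueError(f"unsupported job binary scheme: {parts.scheme!r}")
```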

Page 22: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 1

(Diagram: Sahara EDP with job binaries Jar, Pig; Sahara DB: Jar, Pig; Swift: INPUT)

Page 23: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 2

(Diagram: Sahara EDP; Sahara DB: Jar, Pig; Swift: INPUT; job binaries Jar, Pig; Hadoop cluster of three VMs with JobTracker and Oozie)

Page 24: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 3

(Diagram: Sahara EDP; Sahara DB: Jar, Pig; Swift: INPUT; Hadoop cluster of three VMs with JobTracker and Oozie; arrow to Oozie labeled "Execute a job")
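Step 3 hands the job to Oozie. A minimal sketch of that submission, assuming Oozie's REST endpoint `POST <oozie>/v1/jobs?action=start`, which takes a Hadoop-style XML configuration. The request is built but not sent; host names, ports and HDFS paths in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def job_configuration(props: dict[str, str]) -> bytes:
    """Serialize properties as a Hadoop-style <configuration> XML document."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf)


def oozie_start_request(oozie_url: str, app_path: str, user: str,
                        name_node: str, job_tracker: str) -> Request:
    """Build (but do not send) the Oozie call that submits and starts a workflow."""
    body = job_configuration({
        "user.name": user,
        # HDFS directory holding workflow.xml and the uploaded job binaries.
        "oozie.wf.application.path": app_path,
        # Values substituted for ${nameNode} and ${jobTracker} in workflow.xml.
        "nameNode": name_node,
        "jobTracker": job_tracker,
    })
    return Request(
        f"{oozie_url}/v1/jobs?action=start",
        data=body,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should return a JSON document whose `id` field identifies the new workflow job.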

Page 25: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 4

(Diagram: Sahara EDP; Sahara DB: Jar, Pig; Swift: INPUT; Hadoop cluster of three VMs with JobTracker and Oozie)

Page 26: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 5

(Diagram: Sahara EDP; Sahara DB: Jar, Pig; Swift: INPUT; Hadoop cluster of three VMs with JobTracker and Oozie)

workflow.xml contains:

1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
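The four items above map onto parts of an Oozie workflow definition. A sketch that renders a one-action workflow.xml for a Pig script with the standard `xml.etree.ElementTree` module, following Oozie's `uri:oozie:workflow:0.2` schema. The Swift credential property names are an assumption (the hadoop-openstack `fs.swift.service.<provider>.*` keys with a provider called "sahara"), as are the workflow and node names:

```python
import xml.etree.ElementTree as ET

OOZIE_WF_NS = "uri:oozie:workflow:0.2"


def pig_workflow(script: str, input_url: str, output_url: str,
                 configs: dict[str, str], swift_user: str, swift_password: str) -> str:
    """Render a one-action Oozie workflow.xml that runs a Pig script."""
    app = ET.Element("workflow-app", {"name": "edp-job", "xmlns": OOZIE_WF_NS})
    ET.SubElement(app, "start", {"to": "job-node"})

    action = ET.SubElement(app, "action", {"name": "job-node"})
    pig = ET.SubElement(action, "pig")
    # Resolved by Oozie from the properties supplied at submission time.
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"

    # 1. Job-specific configurations and 4. credentials, as Hadoop properties.
    # Assumption: hadoop-openstack Swift keys with a provider named "sahara".
    props = dict(configs)
    props["fs.swift.service.sahara.username"] = swift_user
    props["fs.swift.service.sahara.password"] = swift_password
    conf = ET.SubElement(pig, "configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # 2. The binary (relative to the workflow directory) and 3. data source URLs.
    ET.SubElement(pig, "script").text = script
    ET.SubElement(pig, "param").text = f"INPUT={input_url}"
    ET.SubElement(pig, "param").text = f"OUTPUT={output_url}"

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Pig job failed: ${wf:errorMessage(wf:lastErrorNode())}"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")
```

Inside the Pig script, `$INPUT` and `$OUTPUT` then resolve to the Swift URLs, so the same script can run against different data sources.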

Page 27: BigData Meetup - OpenStack Sahara

EDP - Job Execution. Step 6

(Diagram: Sahara EDP; Sahara DB: Jar, Pig; Swift: INPUT and OUTPUT; Hadoop cluster of three VMs with JobTracker and Oozie performing Data Processing; workflow.xml with job-specific configurations, URLs to binaries, URLs for data sources and credentials)
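While the data is processed, a client has to learn when the job finishes. One way to do that is to poll Oozie's job-info endpoint (`GET <oozie>/v1/job/<id>?show=info`, whose JSON reply carries a `status` field) until the job reaches a terminal state. A sketch under that assumption; `sleep` and `clock` are injectable so the loop can be exercised without waiting:

```python
import json
import time
import urllib.request
from typing import Callable

# Oozie workflow job states after which the job will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "KILLED", "FAILED"}


def oozie_status_fetcher(oozie_url: str, job_id: str) -> Callable[[], str]:
    """Return a callable that asks Oozie's REST API for the job's status."""
    def fetch() -> str:
        url = f"{oozie_url}/v1/job/{job_id}?show=info"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["status"]
    return fetch


def wait_for_job(get_status: Callable[[], str], interval: float = 5.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the job reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} s")
        sleep(interval)
```

Only a SUCCEEDED result means the OUTPUT object in Swift is complete; KILLED and FAILED should be reported back to the user.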

Page 28: BigData Meetup - OpenStack Sahara

Agenda

• Sahara overview
• Status & Roadmap
• EDP Technical Concepts
• Live demo

Page 29: BigData Meetup - OpenStack Sahara

Q&A