Posted on 17-Aug-2015
Big Data Technologies
Sahara Intro & Future Plan
Weiting Chen
weiting.chen@intel.com
SSG / STO / BDT
Legal Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
© 2015 Intel Corporation.
WHO WE ARE
Bring the Cloudera CDH 5.3 Plugin into OpenStack Sahara
• Add all the services in Cloudera CDH 5.3 and integrate them into the Sahara CDH plugin
Provide Complete Integration Tests for a Better User Experience
• Complete integration testing in OpenStack Sahara helps deliver a good user experience with the Sahara CDH plugin
Rank #3 Company by Commits to Sahara
• Ranked after #1 Mirantis and #2 Red Hat
OPENSTACK HISTORY
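[Figure: timeline of OpenStack releases and projects]
Releases: Austin (2010); Bexar, Cactus, Diablo (2011); Essex, Folsom (2012); Grizzly, Havana (2013); Icehouse, Juno (2014); Kilo (2015)
Projects, in order of appearance: Nova, Swift, Glance, Horizon, Keystone, Quantum, Cinder, Ceilometer, Trove, Sahara, Ironic
Incubation: Zaqar, Manila, Designate, Barbican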
Move Focus from IaaS to PaaS and SaaS
More and more applications (xxx-as-a-service) are built on OpenStack infrastructure
FROM THE MARKET
Big Data: ~25.9% CAGR
• The big data market is expected to grow from 16.5 billion (2014) to 41.5 billion (2018). This includes the cloud infrastructure segment, which grows from 1.2 billion (2014) to 4.7 billion (2018).
Cloud Market: 200 Billion
• The cloud market will hit 118 billion in 2015 and 200 billion by 2018, up from 95.8 billion in 2014.
X-as-a-Service: Trend
• Cloud-based solutions will shape IT spending for years. IDC estimates that cloud services spending will continue to grow at double-digit rates for the next few years.
Source: IDC, 2014
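The ~25.9% CAGR quoted for the big data market can be checked from the two endpoint figures. The following is a minimal sketch, assuming the standard compound-annual-growth-rate formula and a four-year span (2014 to 2018). The function name `cagr` is illustrative and does not come from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.259 means 25.9%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the slide, in billions; 2014 -> 2018 is assumed to be 4 years.
big_data = cagr(16.5, 41.5, 4)
cloud_infra = cagr(1.2, 4.7, 4)

print(f"Big data market CAGR: {big_data:.1%}")
print(f"Cloud infrastructure segment CAGR: {cloud_infra:.1%}")
```

Under these assumptions the first figure reproduces the ~25.9% on the slide, and the cloud infrastructure segment grows faster than the market as a whole.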
THE VISION
Internet of Things
• Data will come from a wide variety of devices and sources.
Big Data
• Data processing models turn the collected data into high-value information.
Cloud Computing
• A shared-resource infrastructure supports a flexible IT environment and fulfills requirements on demand.
OpenStack vs Hadoop
Most companies that run an OpenStack cluster in their IT environment also maintain a separate Hadoop cluster for big data analytics.
Sahara is a solution to bring Hadoop and OpenStack together.
SAHARA BACKGROUND
The basic idea comes from Amazon Elastic MapReduce (EMR)
Lets users provision Hadoop clusters easily by specifying a few parameters
Provides Analytics as a Service for data scientists and analysts
Sahara Key Features - Provision Cluster
Create/Terminate Cluster
• Heat API/Nova Direct API
• Neutron/Nova Network
• Floating IP Management
• Anti-affinity
Cluster Scaling
• Add Node/Remove Node
Supported Plugins
• Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR
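The provisioning features above (node groups, anti-affinity, and scaling) can be illustrated with a small in-memory model. This is only a sketch: `NodeGroup`, `ClusterSpec`, and their fields are hypothetical names, not Sahara's real schema or API. It also assumes that instances running the same anti-affinity process must be placed on different hosts.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeGroup:
    """Hypothetical stand-in for a group of identical VMs in a cluster."""
    name: str
    processes: List[str]
    count: int


@dataclass
class ClusterSpec:
    """Hypothetical cluster description; not Sahara's real schema."""
    name: str
    plugin: str  # e.g. "vanilla", "cdh", "spark"
    node_groups: Dict[str, NodeGroup] = field(default_factory=dict)
    anti_affinity: List[str] = field(default_factory=list)

    def add_group(self, group: NodeGroup) -> None:
        self.node_groups[group.name] = group

    def scale(self, group_name: str, new_count: int) -> None:
        """Cluster scaling: grow or shrink one node group."""
        if new_count < 0:
            raise ValueError("node count cannot be negative")
        self.node_groups[group_name].count = new_count

    def min_hosts_for_anti_affinity(self) -> int:
        """Assumption: instances running the same listed process need
        distinct hosts, so the largest such set bounds the hosts needed."""
        needed = 0
        for process in self.anti_affinity:
            instances = sum(g.count for g in self.node_groups.values()
                            if process in g.processes)
            needed = max(needed, instances)
        return needed


spec = ClusterSpec(name="demo", plugin="vanilla", anti_affinity=["datanode"])
spec.add_group(NodeGroup("master", ["namenode", "resourcemanager"], 1))
spec.add_group(NodeGroup("worker", ["datanode", "nodemanager"], 3))
spec.scale("worker", 5)  # add two worker nodes
print(spec.min_hosts_for_anti_affinity())
```

The point of the model is that scaling a node group that runs an anti-affinity process also raises the number of physical hosts the cloud must have available.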
Sahara Key Features - Elastic Data Processing
Supported Job Types
• Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
Data Locality Support
• Rack/Hypervisor/Swift
Data Sources
• Internal: Ephemeral Disk/Cinder
• External: Swift
Run Jobs in a Transient Cluster
*Different plugins provide different capabilities
WORKING FLOW
Fast Cluster Provisioning
Select Hadoop Version
Select Base Image w/ Hadoop
Define Cluster Configuration
Provision Cluster
Operate Cluster
Terminate Cluster
• Provide the detailed Hadoop configuration, such as size, topology, and others
• Sahara will provision VMs, then install and configure Hadoop
• Scale the cluster by adding or removing nodes
Analytics as a Service using Elastic Data Processing
Select Hadoop Version
Configure Jobs
Set Limit for Cluster
Execute Jobs
Get the Result
• Choose the type of job: pig, hive, jar-file, etc.
• Select input and output data locations (Swift support)
• The cluster will be removed automatically after job completion
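In the Elastic Data Processing flow, the cluster exists only for the duration of the job and is removed automatically afterwards. That lifecycle can be sketched with a context manager. Everything here is hypothetical: `transient_cluster`, `run_job`, and the `swift://demo/...` locations are placeholders for illustration, not Sahara calls or real paths.

```python
from contextlib import contextmanager
from typing import Iterator, List

events: List[str] = []  # records lifecycle steps so the order can be inspected


@contextmanager
def transient_cluster(name: str, node_limit: int) -> Iterator[str]:
    """Hypothetical helper: provision a cluster and always terminate it."""
    events.append(f"provision {name} (limit {node_limit} nodes)")
    try:
        yield name
    finally:
        # Runs whether the job succeeded or raised, so nothing is left behind.
        events.append(f"terminate {name}")


def run_job(job_type: str, input_url: str, output_url: str,
            node_limit: int) -> str:
    """Execute one job on a throwaway cluster and return the output location."""
    with transient_cluster(f"edp-{job_type}", node_limit) as cluster:
        events.append(
            f"execute {job_type} on {cluster}: {input_url} -> {output_url}")
        return output_url


result = run_job("pig", "swift://demo/input", "swift://demo/output",
                 node_limit=4)
print("\n".join(events))
```

Placing termination in a `finally` clause is the design choice that matters: a failed job still releases its VMs.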
CLOUDERA CDH PLUGIN
[Figure: CDH plugin architecture]
Components: End Customer, Horizon (OpenStack Dashboard), Sahara Service, CDH Plugin, Cloudera Manager API Python Client (migrated from the CM-API client), CDH Cluster
Hosts: Controller; Computing Node1 with VM1 - Master and VM2 - Slave
Cluster services: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node, Data Node, Node Manager
Step 1: Create the VMs via Heat using a cluster template. Cloudera Manager (CM) must be included on one master machine.
Step 2: Use the CM API client to connect to CM and provision the other services in the cluster.
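The rule that Cloudera Manager must be included on one master machine can be checked before any VM is created. The following is a minimal sketch with several assumptions: the template shape, the process label `CLOUDERA_MANAGER`, and both function names are hypothetical, and the rule is read as "exactly one instance". It does not call Heat or the CM API.

```python
from typing import Dict, List, Tuple

# Assumed process label for Cloudera Manager; the real plugin may differ.
MANAGER = "CLOUDERA_MANAGER"

# (node processes, instance count) per node group; a hypothetical shape.
Template = Dict[str, Tuple[List[str], int]]


def manager_instances(template: Template) -> int:
    """Count the VMs that would run Cloudera Manager."""
    return sum(count for processes, count in template.values()
               if MANAGER in processes)


def provisioning_plan(template: Template) -> List[str]:
    """Return the two provisioning steps, or raise if CM is not on one VM."""
    found = manager_instances(template)
    if found != 1:
        raise ValueError(
            f"expected exactly 1 {MANAGER} instance, found {found}")
    services = sorted({p for processes, _ in template.values()
                       for p in processes if p != MANAGER})
    return [
        "Step 1: create VMs via Heat from the cluster template",
        "Step 2: connect to CM and provision " + ", ".join(services),
    ]


template: Template = {
    "master": ([MANAGER, "NAMENODE", "RESOURCEMANAGER"], 1),
    "slave": (["DATANODE", "NODEMANAGER"], 2),
}
plan = provisioning_plan(template)
for step in plan:
    print(step)
```

Validating the template first means a missing or duplicated manager is reported before Step 1 spends time creating VMs.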
DATA PROCESSING MODEL
[Figure: for each pattern, a Collector Agent collects data (a data stream) into the storage layer, and MapReduce runs on OpenStack virtual clusters. Pattern 1 stores data in HDFS on Cinder or Ephemeral Disk; Pattern 2 stores data in Swift.]
Pattern 1: Internal - HDFS Only
OpenStack supports creating HDFS on Cinder or Ephemeral Disk. Ephemeral Disk gives better data processing performance, while Cinder persists the data at lower performance.
Pattern 2: External - Swift
OpenStack uses Swift as a data source to store input and output data. The benefit is that the data can be processed directly and persisted via Swift.
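The two patterns can be told apart by where a job's data lives. As a rough sketch, assuming data locations are written as URLs with an `hdfs` or `swift` scheme, the function below maps a location to its pattern. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

PATTERNS = {
    "hdfs": "Pattern 1: Internal - HDFS Only",
    "swift": "Pattern 2: External - Swift",
}


def storage_pattern(url: str) -> str:
    """Map a data location URL to the data processing pattern it implies."""
    scheme = urlparse(url).scheme
    if scheme not in PATTERNS:
        raise ValueError(f"unsupported data source scheme: {scheme!r}")
    return PATTERNS[scheme]


# Hypothetical locations, used only to exercise both branches.
print(storage_pattern("hdfs://namenode:8020/logs/input"))
print(storage_pattern("swift://demo-container/input"))
```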
Current Issue
~30% Performance Loss
We used Sahara with KVM to create a Hadoop cluster (HDFS on Ephemeral Disk) and compared it with bare-metal Hadoop on the same servers.
Different workloads (HiBench) may show different results.
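The slides do not say how the ~30% figure was computed. One common definition, assumed here, is the fraction of bare-metal throughput lost when the same workload runs on VMs. The run times below are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from this comparison.

```python
def performance_loss(bare_metal_seconds: float,
                     virtual_seconds: float) -> float:
    """Fraction of bare-metal throughput lost when the same job runs on VMs."""
    if bare_metal_seconds <= 0 or virtual_seconds <= 0:
        raise ValueError("run times must be positive")
    # Throughput is proportional to 1 / run time for a fixed workload.
    return 1.0 - bare_metal_seconds / virtual_seconds


# Hypothetical run times, not measured values.
loss = performance_loss(bare_metal_seconds=70.0, virtual_seconds=100.0)
print(f"{loss:.0%}")
```

Because results vary by workload, the same function would be applied per benchmark rather than to a single aggregate run.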
Beyond Performance…
Performance may always be an issue when comparing a hypervisor with bare metal
IT Integration
Sahara must provide an elastic platform that fulfills customer requests and adopts big data infrastructure. Supporting more technologies helps Sahara integrate seamlessly into the customer's IT environment.
Workload-based EDP
EDP should provide a simple interface so that data scientists can focus on their own expertise without worrying about how to deploy clusters. Analytics-as-a-Service is a future trend.
MORE …
Bare Metal Support
• OpenStack Ironic
Docker Support
• Nova-docker driver, OpenStack Magnum
Support More Storage Backends
• OpenStack Manila, External HDFS
Support More Data Processing Models
• Hadoop, Spark, etc.
WHAT’S NEW IN KILO
• Vanilla plugin supports Hadoop v1.2.1 and Hadoop v2.6
• Spark Plugin
• Cloudera CDH Plugin
• MapR Plugin
• Storm Plugin
• New Horizon UI with New Guide Panel
• Default Template Support