Kerberos and Health Checks and Bare Metal, Oh My!
Updates to OpenStack Sahara in Newton
Vitaly Gridnev, Sahara PTL (Mirantis)
Elise Gafford, Sahara Core (Red Hat)
Nikita Konovalov, Sahara Core (Mirantis)
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
Sahara: The Use Cases
● Data Processing Cluster Management
○ On-demand, scalable, configurable, persistent clusters
○ Supports multiple plugins (Apache, Ambari, CDH, MapR...)
○ Integrates with Heat, Glance, Nova, Neutron, and Cinder
● EDP (Elastic Data Processing)
○ Supports multiple job types (Java, MR, Hive, Pig, Spark, Storm...)
○ Supports transient clusters (spin up, process, shut down) or persistent clusters
○ Integrates with Swift and/or Manila (optionally)
Sahara: The API
Sahara: The Project
● Cluster provisioning plugins:
○ Cloudera Distribution of Hadoop (using Cloudera Manager)
○ Hortonworks Data Platform (using Apache Ambari)
○ MapR
○ “Vanilla” Apache Hadoop, Spark, and Storm
● EDP job types:
○ MapReduce, Java, Hive, and Pig jobs (using Apache Oozie)
○ Spark, Spark Streaming, and Storm jobs (using Apache Spark and Apache Storm)
● Image packing repository (sahara-image-elements)
● Framework to validate Sahara installation (sahara-tests)
● UI plugin
● OpenStackClient plugin
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
Event log for clusters
● Provisioning events for clusters: make it possible to see the current status of cluster provisioning and the reasons for any failure
● Available since Newton for clusters created with the Ambari plugin
● Supported in the CLI since Newton, with a full dump of all steps and events
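The value of the event log is that a failure points at a concrete step rather than a generic ERROR status. A minimal sketch of consuming such a log (the event/step shapes below are hypothetical, not Sahara's exact API payload):

```python
# Illustrative sketch only: the step/event shapes here are hypothetical
# stand-ins for the provisioning event log, not Sahara's real payload.

def summarize_provisioning(steps):
    """Report progress and surface the first failed step, if any."""
    done = sum(1 for s in steps if s["successful"] is True)
    failed = next((s for s in steps if s["successful"] is False), None)
    summary = {"progress": "%d/%d steps completed" % (done, len(steps))}
    if failed:
        # Expose the failed step and its error events, so the user sees
        # the reason for failure instead of just an ERROR cluster status.
        summary["failed_step"] = failed["step_name"]
        summary["events"] = [e["event_info"] for e in failed.get("events", [])]
    return summary

steps = [
    {"step_name": "Wait for instances to become active", "successful": True},
    {"step_name": "Configure instances", "successful": False,
     "events": [{"event_info": "SSH to node-2 timed out"}]},
]
print(summarize_provisioning(steps))
```

This mirrors what the CLI's full dump of steps and events enables: going straight from "the cluster failed" to "which step failed, and why".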
Health checks for clusters
● Users want to monitor cluster state after provisioning: vital for long-lived clusters
● Sahara in Liberty had no monitoring of the health of cluster processes: a cluster could be broken or unavailable while Sahara still reported it as ACTIVE
● Cluster health checks have been implemented since Mitaka
● Available for clusters deployed using Ambari and Cloudera Manager; fewer checks are available for vanilla clusters
● Since Newton, checks are also available for the MapR plugin
● Health results can be reported to Ceilometer
● Health is easy to recheck on demand
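Health checks roll up per-check results into one cluster verdict. A sketch of that rollup, using Sahara-style GREEN/YELLOW/RED statuses (the check names and the "worst status wins" rule here are illustrative assumptions, not Sahara's exact implementation):

```python
# Illustrative rollup of individual health checks into a cluster verdict.
# Check names and the aggregation rule are assumptions for this sketch.
GREEN, YELLOW, RED = "GREEN", "YELLOW", "RED"
SEVERITY = {GREEN: 0, YELLOW: 1, RED: 2}

def cluster_health(checks):
    """Overall verdict is the worst individual check result."""
    if not checks:
        return GREEN
    return max((status for _, status in checks), key=SEVERITY.__getitem__)

checks = [
    ("HDFS availability", GREEN),
    ("YARN ResourceManager", GREEN),
    ("Datanode disk space", YELLOW),
]
print(cluster_health(checks))  # one degraded check taints the whole cluster
```

A "worst status wins" rule is the conservative choice: a single RED check should never be hidden behind many GREEN ones.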
Health checks for clusters
Next steps:
● More detailed health checks
○ Detecting a particular datanode/slave failure
○ Not enough space in HDFS
● Suggestions/actions to repair health:
○ Datanode replacement
○ Adding new nodes
○ Restarting services
● More flexible configuration of health checks (advanced checks, enabling/disabling individual checks)
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
Security improvements
● Security is an important part of provisioned clusters
● Previously, security could be enabled only by calling Ambari or Cloudera Manager directly; in that case Sahara performs no auth operations, and EDP does not work
● Security is important not just for clusters, but for Sahara itself
In Newton the following Kerberos security features were implemented:
● An MIT KDC can be preconfigured (or an existing KDC can be used)
● The Oozie client was re-implemented to support auth operations with Kerberos
● Spark job execution is also supported
● Keys are distributed to nodes for system users (hdfs, hadoop, spark)
● Supported for clusters deployed using Ambari and Cloudera Manager
● Note: make sure the latest hadoop-swift jars are in place for Swift data sources!
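As a rough illustration of what "preconfigured KDC vs. existing KDC" looks like from the user's side, a cluster template might carry settings along these lines. The keys below are hypothetical, invented for this sketch; they are NOT Sahara's real configuration names, so consult the Sahara docs for the actual options:

```yaml
# Hypothetical illustration only -- these keys are NOT Sahara's real
# configuration names.
cluster_template:
  plugin_name: ambari
  hadoop_version: "2.4"
  cluster_configs:
    Kerberos:
      enable_kerberos: true      # have Sahara preconfigure an MIT KDC
      use_existing_kdc: false    # or point at an already-running KDC instead
```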
Security improvements
● Bandit security tests run per commit
● Improved secret storage (using Barbican and Castellan) was implemented in the previous release
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
Where we were
Sahara had 2 flows that were relevant to image manipulation:
● Pre-Nova-spawn image packing
○ Used the sahara-image-elements repository to generate images (stored in Glance)
● Post-Nova-spawn cluster generation from “clean” (OS-only) images
○ Logic maintained in the Sahara process within plugins
● Pre-configuration validation of images by plugins
○ Remember how I said we had 2 flows relevant to image manipulation?
○ We didn’t do this at all.
Where We Were: Problems
● Duplication of logic
○ Steps required for packing images and for “clean” image clusters were often identical, but had to be expressed separately (in DIB and in Python).
● Poor validation
○ Plugins did not validate that images provided to them met their needs.
○ Failures due to image contents were late and sometimes difficult to understand.
● Poor encapsulation
○ Image generation and cluster provisioning logic for any one plugin are really one application
○ Maintaining them in two places allows versionitis and dependency problems
○ Having one monolithic repo for all plugins makes them less pluggable
Our Dream Implementation
● All flows share common logic:
○ Image packing
○ Image validation
○ Clean-image cluster generation
● Image manipulation is stored and versioned within plugins
● The user can still generate images with a CLI...
● But they can also use an API to generate images in clean build environments
● ... And both dev test cycles and user retries are as quick and painless as possible
The plan
1. Build a validation engine that ensures that images meet a specification
a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
a. Spawn a clean tenant-plane image build environment
b. Download a base image from Glance and modify it to spec
c. Push the new image back to Glance and register it for use by Sahara
Where we are
1. Build a validation engine that ensures that images meet a specification
a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
a. Spawn a clean tenant-plane image build environment
b. Download a base image from Glance and modify it to spec
c. Push the new image back to Glance and register it for use by Sahara
What it looks like: the specs
● YAML-based definitions
● Argument definitions for configurability
● Idempotent resource declarations
○ Scripts must be written idempotently, as always in resource declarations
● Logical control operators (any, all, os_case, etc.)
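A simplified sketch of what such a spec can look like, combining arguments, package/script resources, and an os_case operator. The argument, package, and script names are illustrative, not taken from a real plugin spec:

```yaml
# Simplified illustration of a YAML image spec; names are invented.
arguments:
  java_distro:
    description: The Java distribution to install.
    default: openjdk
    choices:
      - openjdk
      - oracle-java
validators:
  - os_case:
      - ubuntu:
          - package: openjdk-8-jre-headless
      - centos:
          - package: java-1.8.0-openjdk
  - script: common/install_extras.sh   # must itself be idempotent
```

The declarative form is what lets one spec drive validation (check only), clean-image cluster generation, and offline image packing.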
What it looks like: the CLI
Command structure:
sahara-image-pack --image ./image.qcow2 PLUGIN VERSION [plugin arguments]
Features:
● Auto-generates help text from arguments
● Idempotent and modifies images in-place
○ Very fast test cycles and retries
● Allows freeform bash scripts and more structured resources
○ Though it’s on you to make your scripts idempotent
● Test-only mode to validate without change
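The idempotency and test-only behavior above can be sketched as a check-then-apply loop. The resource class and converge function here are illustrative, not the images module's real API:

```python
# Illustrative check-then-apply model of idempotent resources: each
# resource tests whether it is already satisfied before acting, so
# re-running the whole sequence on a half-packed image is safe.
class PackageResource:
    def __init__(self, name, installed):
        self.name = name
        self._installed = installed  # stands in for querying the image

    def satisfied(self):
        return self.name in self._installed

    def apply(self):
        self._installed.add(self.name)

def converge(resources, test_only=False):
    """Apply unsatisfied resources; in test-only mode just report them."""
    pending = [r for r in resources if not r.satisfied()]
    if not test_only:
        for r in pending:
            r.apply()
    return [r.name for r in pending]

installed = {"openjdk-8-jre-headless"}
resources = [PackageResource("openjdk-8-jre-headless", installed),
             PackageResource("hadoop", installed)]
print(converge(resources, test_only=True))  # reports pending, changes nothing
print(converge(resources))                  # applies the pending resource
print(converge(resources))                  # second run: nothing left to do
```

Because already-satisfied resources are skipped, a retry after a failure resumes where the previous run stopped instead of starting over, which is what makes the test cycles fast.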
What it’s doing
The images module runs a sequence of steps against a remote machine:
● Validation uses the Sahara SSH remote in read-only mode
● Clean image generation uses the SSH remote
● Image packing uses a libguestfs Python API image handle
All three use the same logic, contained in the appropriate plugin
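Sharing one routine across the three flows can be sketched as a common "remote" interface with interchangeable backends. The class and method names below are illustrative, not Sahara's real images-module API:

```python
# Illustrative: one shared step runs against different remote backends.
class ReadOnlyError(Exception):
    pass

class SSHRemote:
    """Stands in for a remote that runs commands on a live instance."""
    def __init__(self, packages, read_only=False):
        self.packages = packages
        self.read_only = read_only  # validation mode: never mutate

    def has_package(self, name):
        return name in self.packages

    def install_package(self, name):
        if self.read_only:
            raise ReadOnlyError("validation mode: %s is missing" % name)
        self.packages.add(name)

class GuestfsRemote(SSHRemote):
    """Same interface, but would wrap a libguestfs image handle."""

def ensure_package(remote, name):
    # The single shared step: validation, clean-image cluster generation,
    # and image packing differ only in which remote is passed in.
    if not remote.has_package(name):
        remote.install_package(name)
```

In read-only mode a missing requirement surfaces as an error instead of a mutation, which is exactly how validation reuses the same logic as generation and packing.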
Plugin implementation targeting O!
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
Ironic integration
Why should you run Bare Metal in OpenStack?
● Big Data workloads originate from bare-metal installations
● Quick cluster scalability may have lower priority than long-running stability and persistence
● Best performance by design: no virtualization overhead
● The ability to manage a bare-metal cluster with the OpenStack API
Bare Metal compared to Virtualized
Bare metal (Ironic) vs. Virtual Machines:
● Cluster size flexibility: bare metal dedicates whole nodes; VMs use flavor-based scheduling
● Resource utilization: a bare-metal host is 100% utilized; KVM has memory overhead, and other VMs may abuse the host’s resources
● Data locality: bare metal accesses data directly from local disks; VMs may achieve locality through proper resource scheduling
● Live migration: on bare metal a host may be lost completely; VMs support it for some target daemons
Some tips before running Bare Metal
● Scheduling is not trivial: the cloud operator may need to specify additional flavors, availability zones, or other metadata
● Storage is not backed by Cinder for bare metal
○ Sahara does disk discovery on its own
○ Disks other than the one with the root mount are dedicated to HDFS
● Non-standard hardware will require drivers built into the provisioning image
● Network tenant isolation is achievable through manual hardware switch configuration
Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
What is NEW in NEWton
● Designate integration
● API improvements: pagination for list operations, an API to manage/enable/disable plugins
● New plugin versions
○ HDP 2.4 supported
○ MapR 5.2.0
○ CDH 5.7.x
○ Vanilla + Spark on YARN
● Sahara tests framework to validate environment readiness for Sahara clusters
○ Sahara Tempest plugin with more tests (CLI, API)
○ Sahara scenario framework with a bunch of templates
○ Published on PyPI: https://pypi.python.org/pypi/sahara-tests
Q&A
Useful links and materials
● Sahara wiki: https://wiki.openstack.org/wiki/Sahara
● Sahara specs: https://specs.openstack.org/openstack/sahara-specs/
● Sahara docs: http://docs.openstack.org/developer/sahara/
● Sahara images: http://sahara-files.mirantis.com/images/upstream/newton/