Bigdata design doc.pptx

Develop Big Data Solution with Juju

Transcript of Bigdata design doc.pptx

Page 1: Bigdata design doc.pptx

Develop Big Data Solution with Juju

Page 2: Bigdata design doc.pptx

From complexity to simplicity

Page 3: Bigdata design doc.pptx

Agenda

● What is Canonical?
● Challenges of building big data solutions
● What is Juju?
● Apache Hadoop pluggable model
  ○ Plugging big data services into Hadoop
● Demo

Page 4: Bigdata design doc.pptx

Canonical: the company behind Ubuntu.

Page 5: Bigdata design doc.pptx
Page 6: Bigdata design doc.pptx
Page 7: Bigdata design doc.pptx

Challenges of building big data solutions.

● Many Hadoop distributions
● Many Apache projects to integrate into solutions

Page 8: Bigdata design doc.pptx

Hadoop distributions

● Similar to Linux, Hadoop has many distributions
● Top commercial offerings: Cloudera, MapR, Hortonworks, IBM BigInsights
● Open source distribution: Apache Hadoop
● Issues
  ○ Each distribution has a different packaging style
  ○ Each distribution has different installation blueprints
    ■ e.g., users, install locations, etc.
  ○ Different dependencies
    ■ e.g., IBM BigInsights requires IBM Java
  ○ Different hardware
    ■ e.g., POWER, x86, ARM

Page 9: Bigdata design doc.pptx

Big Data Ecosystem

● Apache Hadoop provides the following services:
  ○ HDFS: Hadoop Distributed File System, manages data
  ○ MapReduce: Hadoop data processing unit (a Hadoop job)
  ○ YARN: Hadoop resource manager and job scheduler, manages jobs
● Apache Spark: in-memory data processing unit, integrated with YARN
● The Hadoop ecosystem includes many additional components that consume big data services:
  ○ Data ingestion: Flume, Hue, Sqoop, etc.
  ○ Data analysis: Spark, Hive, Pig, Impala, etc.
  ○ Data visualization: Hue, Zeppelin, etc.

Page 10: Bigdata design doc.pptx

What is Juju?

● Juju is a modeling language for service-oriented deployment in the cloud.
● Juju allows you to deploy, configure, manage, maintain, and scale big data services quickly on public clouds, as well as on physical servers, OpenStack, and containers.
  ○ Major properties of Juju:
    ■ Deploy, connect, scale
    ■ Reliability
    ■ Open source
    ■ Repeatability
    ■ Speed
    ■ Observability

Page 11: Bigdata design doc.pptx

What is Juju? (continued)

● Juju has two components:
  ○ Charms: a model of how a (unique) micro-service is deployed, scaled, and integrated
    ■ Can be written in any language; big data charms are mostly written in Python
  ○ Bundles: a set of charms/services integrated together, regardless of their individual scale
    ■ e.g., a big data solution

Page 12: Bigdata design doc.pptx

What is Juju? (continued)

Page 13: Bigdata design doc.pptx

How did we use Juju to solve the problem?

● Developed vendor- and release-agnostic installation charms
● All big data services use standard Apache interfaces to connect to big data services (dfs, map-reduce)
● Introduced the Apache Hadoop plugin charm
● Enables diverse solutions regardless of core and surrounding services
● Swappable components mean rapid development at every layer
  ○ Data ingestion
  ○ Data processing
  ○ Data visualization

Page 14: Bigdata design doc.pptx

What does that mean?

● Install time: common installation method for services
  ○ Vendor agnostic
  ○ Release agnostic, except for new features
● Run time: pluggable service interaction
  ○ Lego blocks

Page 15: Bigdata design doc.pptx

Apache Hadoop Pluggable model

YARN Service

HDFS Service

Spark Service

Page 16: Bigdata design doc.pptx

Big Data services accessing Apache Hadoop HDFS and YARN

● Hadoop-Client charm
  ○ Uses the Hadoop command-line component
  ○ Preconfigured to run MapReduce jobs
  ○ Preconfigured to access HDFS
● Hadoop-plugin charm
  ○ For big data services requiring the Hadoop Java API (used by Hive, Pig, etc.)
  ○ Preconfigured to connect to the Hadoop cluster
● Hadoop services relations
  ○ Provide the hostname/port to communicate with HDFS/YARN
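To make the hostname/port hand-off concrete, here is a minimal Python sketch of how a consuming service might turn relation-style data into an HDFS URI. The key names (`hostname`, `port`), the `Endpoint` class, and the sample values are assumptions made for illustration; the slides do not show the actual relation keys used by the charms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Host and port of a remote Hadoop service (hypothetical helper)."""
    host: str
    port: int

    def hdfs_uri(self, path="/"):
        # Build an hdfs:// URI pointing at the given path on this endpoint.
        return f"hdfs://{self.host}:{self.port}{path}"


def endpoint_from_relation(data, host_key="hostname", port_key="port"):
    """Build an Endpoint from a relation-style mapping of strings.

    The key names are assumptions; raises ValueError while data is incomplete.
    """
    missing = [k for k in (host_key, port_key) if not data.get(k)]
    if missing:
        raise ValueError(f"relation data not ready, missing: {missing}")
    return Endpoint(host=data[host_key], port=int(data[port_key]))


# Sample values are illustrative only.
namenode = endpoint_from_relation({"hostname": "namenode-0", "port": "8020"})
print(namenode.hdfs_uri("/user/hive"))
```

Raising an error while the data is incomplete mirrors the idea that a service should wait until the relation has supplied everything it needs before configuring itself.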

Page 17: Bigdata design doc.pptx

Vendor agnostic Installation

● Operating system independence
  ○ Tarballs
  ○ Eliminate OS packaging dependencies
● Architecture independence
  ○ Determine requirements at deployment time
● Example from Hive
  ○ http://bazaar.launchpad.net/~bigdata-dev/charms/trusty/apache-hive/trunk/view/head:/resources.yaml

resources:
  hive-ppc64le:
    url: http://<url>/apache-hive-0.13.0-bin.tar.gz
    hash: 4c835644eb72a08df059b86c45fb159b95df08e831334cb57e24654ef078e7ee
    hash_type: sha256
  hive-x86_64:
    url: http://<url>/apache-hive-1.0.0-bin.tar.gz
    hash: b8e121f435defeb94d810eb6867d2d1c27973e4a3b4099f2716dbffafb274184
    hash_type: sha256
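As a rough illustration of "determine requirements at deployment time", the following Python sketch picks the resource entry that matches the machine architecture and checks a downloaded tarball against the recorded hash. The function names are hypothetical and this is not the charm's actual code; the `<url>` placeholder is kept from the slide and nothing is downloaded here.

```python
import hashlib
import platform

# In-memory copy of the resources.yaml entries shown on the slide.
RESOURCES = {
    "hive-ppc64le": {
        "url": "http://<url>/apache-hive-0.13.0-bin.tar.gz",
        "hash": "4c835644eb72a08df059b86c45fb159b95df08e831334cb57e24654ef078e7ee",
        "hash_type": "sha256",
    },
    "hive-x86_64": {
        "url": "http://<url>/apache-hive-1.0.0-bin.tar.gz",
        "hash": "b8e121f435defeb94d810eb6867d2d1c27973e4a3b4099f2716dbffafb274184",
        "hash_type": "sha256",
    },
}


def select_resource(resources, name, arch=None):
    """Return the entry keyed "<name>-<arch>", detecting arch if not given."""
    arch = arch or platform.machine()
    key = f"{name}-{arch}"
    if key not in resources:
        raise KeyError(f"no resource defined for {key}")
    return resources[key]


def verify_checksum(data, resource):
    """Check downloaded bytes against the hash recorded for the resource."""
    digest = hashlib.new(resource["hash_type"], data).hexdigest()
    return digest == resource["hash"]
```

Keying the entries by architecture lets one charm serve several platforms, and the hash check guards against a corrupted or substituted tarball.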

Page 18: Bigdata design doc.pptx

Vendor agnostic Installation (continued)

● Vendor properties
  ○ Provide default values and allow fine-tuning
  ○ Allow vendor-specific configuration
● Example from Hive
  ○ http://bazaar.launchpad.net/~bigdata-dev/charms/trusty/apache-hive/trunk/view/head:/dist.yaml

vendor: 'apache'
hadoop_version: '2.4.1'
packages:
  - 'libmysql-java'
  - 'mysql-client'
groups:
  - 'hadoop'
users:
  hive:
    groups: ['hadoop']
dirs:
  hive:
    path: '/usr/lib/hive'
    owner: 'hive'
    group: 'hadoop'
ports:
  hive:
    port: 10000
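One way to picture "default values plus fine-tuning" is a recursive merge of vendor-specific overrides onto the defaults. The Python sketch below is an assumption about how such layering could work, not a description of how the charms implement it; `merge_config` and the override values are hypothetical.

```python
import copy

# Defaults transcribed from the dist.yaml on the slide (trimmed to a few keys).
DIST_DEFAULTS = {
    "vendor": "apache",
    "hadoop_version": "2.4.1",
    "dirs": {"hive": {"path": "/usr/lib/hive", "owner": "hive", "group": "hadoop"}},
    "ports": {"hive": {"port": 10000}},
}


def merge_config(defaults, overrides):
    """Return a new dict: defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so untouched keys keep their defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical vendor-specific override: a different install path and port.
custom = merge_config(
    DIST_DEFAULTS,
    {"dirs": {"hive": {"path": "/opt/hive"}}, "ports": {"hive": {"port": 10001}}},
)
print(custom["dirs"]["hive"])
```

Only the overridden keys change; the owner and group of the Hive directory keep their default values, and the defaults themselves are left untouched.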

Page 19: Bigdata design doc.pptx

Hadoop Plugin Charm

● Single, simplified connection point to Hadoop HDFS and YARN
● Relating to the plugin installs and manages:
  ○ Java runtime
  ○ Access to interact with the data set
    ■ Hadoop API and CLI
    ■ Hadoop config in /etc/hadoop/conf
    ■ /etc/hosts updates
    ■ Environment updates, e.g. HADOOP_CONF_DIR
● Allows service reusability across Hadoop versions and distributions

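from jujubigdata.relations import HadoopPlugin

# Install and configure Pig only once the plugin reports that HDFS is ready.
# `pig` is an object defined elsewhere in the charm (not shown on the slide).
if HadoopPlugin().hdfs_is_ready():
    pig.install()
    pig.configure()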
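The "environment updates" item can be sketched as a small pure function that sets a variable such as HADOOP_CONF_DIR in the text of an environment file. This is only an illustration under stated assumptions: the `NAME="value"` line format and the `set_env_var` helper are hypothetical, and the plugin charm's real implementation is not shown in the slides.

```python
def set_env_var(env_text, name, value):
    """Return env_text with NAME="value" set, replacing any existing entry."""
    new_line = f'{name}="{value}"'
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(f"{name}="):
            # Overwrite the existing assignment instead of appending a duplicate.
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Point Hadoop clients at the config directory named on the slide.
updated = set_env_var('PATH="/usr/bin"\n', "HADOOP_CONF_DIR", "/etc/hadoop/conf")
print(updated, end="")
```

Working on text rather than on a file keeps the sketch free of side effects; a real charm would also need to write the result back to disk.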

Page 20: Bigdata design doc.pptx

First, the “hard” part:

juju quickstart apache-core-batch-processing

If needed:

juju add-unit -n 10 compute-slave

Ok, now that you have a fully working Big Data deployment with Apache Hadoop, let’s get to the interesting bit.

Page 21: Bigdata design doc.pptx

Let’s add Apache Hive

Data Analytics with MySQL

juju deploy apache-hive hive

juju deploy mysql

juju add-relation plugin hive

juju add-relation hive mysql

Page 22: Bigdata design doc.pptx

And Apache Pig

Data Analysis with Apache Pig language

juju deploy apache-pig pig

juju add-relation plugin pig
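The deploy-then-relate pattern on these demo slides is the same for every service plugged into Hadoop, so it can be captured in a few lines of Python. The `plan_service` helper below is hypothetical; it only builds and prints the command lines and does not invoke juju.

```python
import shlex


def plan_service(charm, service, relate_to=("plugin",)):
    """Return the juju commands, as argument lists, to deploy and relate a service."""
    commands = [["juju", "deploy", charm, service]]
    for other in relate_to:
        # One add-relation per service this one should be connected to.
        commands.append(["juju", "add-relation", other, service])
    return commands


# Reproduce the two Pig commands from the slide.
for cmd in plan_service("apache-pig", "pig"):
    print(shlex.join(cmd))
```

Building argument lists first makes the plan easy to inspect or log before anything is executed.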