Bigdata design doc.pptx
Uploaded by amir-sanjar · Category: Data & Analytics
Developing Big Data Solutions with Juju
From complexity to simplicity
Agenda
● What is Canonical?
● Challenges of building big data solutions
● What is Juju?
● Apache Hadoop pluggable model
  ○ Plugging big data services into Hadoop
● Demo
Canonical, the company behind Ubuntu.
Challenges of building big data solutions.
● Many Hadoop distributions
● Many Apache projects to integrate into solutions
Hadoop distributions
● Similar to Linux, Hadoop has many distributions
● Top commercial offerings: Cloudera, MapR, Hortonworks, IBM BigInsights
● Open source distribution: Apache Hadoop
● Issues:
  ○ Each distribution has a different packaging style
  ○ Each distribution has different installation blueprints (e.g., users, install locations)
  ○ Different dependencies (e.g., IBM BigInsights requires IBM Java)
  ○ Different hardware (e.g., POWER, x86, ARM)
Big Data Ecosystem
● Apache Hadoop provides the following services:
  ○ HDFS - Hadoop Distributed File System; manages data
  ○ MapReduce - Hadoop's data processing unit; a Hadoop job
  ○ YARN - Hadoop's resource manager and job scheduler; manages jobs
● Apache Spark - in-memory data processing unit, integrated with YARN
● The Hadoop ecosystem includes many additional components - big data service consumers:
  ○ Data ingestion: Flume, Hue, Sqoop, etc.
  ○ Data analysis: Spark, Hive, Pig, Impala, etc.
  ○ Data visualization: Hue, Zeppelin, etc.
What is Juju?
● Juju is the modeling language for service-oriented deployment in the cloud.
● Juju allows you to deploy, configure, manage, maintain, and scale big data services quickly on public clouds, as well as on physical servers, OpenStack, and containers.
  ○ Juju's major properties:
    ■ Deploy, connect, scale
    ■ Reliability
    ■ Open source
    ■ Repeatability
    ■ Speed
    ■ Observability
What is Juju?....
● Juju has two components:
  ○ Charms - a model of how a (unique) micro-service shall be deployed, scaled, and integrated
    ■ Can be written in any language; big data charms are mostly coded in Python
  ○ Bundles - a set of charms/services integrated together, regardless of their individual scale
    ■ A big data solution
What is Juju?...
How did we use Juju to solve the problem?
● Developed vendor/release-agnostic installation charms.
● All big data services use Apache standard interfaces (dfs, map-reduce) to connect to big data services.
● Introduced the Apache Hadoop plugin charm.
  ○ Enables diverse solutions regardless of core and surrounding services
● Swappable components mean rapid development at every layer:
  ○ Data ingestion
  ○ Data processing
  ○ Data visualization
What does that mean?
● Install time: common installation method for services
  ○ Vendor agnostic
  ○ Release agnostic (except for new features)
● Run time: pluggable service interactions
  ○ Like Lego blocks
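The "Lego blocks" idea can be sketched as a toy model (plain Python, not Juju code; all service and interface names here are illustrative): services connect through named interfaces, so any component that speaks the same interfaces can be swapped in without touching the rest.

```python
# Toy model of pluggable services: each service provides or requires
# named interfaces (the "studs" on the Lego blocks).
model = {
    "hdfs-master": {"provides": ["dfs"]},
    "yarn-master": {"provides": ["map-reduce"]},
    "hive": {"requires": ["dfs", "map-reduce"]},
}

def can_relate(model, consumer, providers):
    """True if every interface the consumer requires is provided."""
    provided = {i for p in providers for i in model[p]["provides"]}
    return set(model[consumer]["requires"]) <= provided

# Hive plugs into the Hadoop core...
assert can_relate(model, "hive", ["hdfs-master", "yarn-master"])

# ...and Pig can be swapped in because it speaks the same interfaces.
model["pig"] = {"requires": ["dfs", "map-reduce"]}
assert can_relate(model, "pig", ["hdfs-master", "yarn-master"])
```

Run-time pluggability falls out of the same idea: as long as the interfaces match, the core services never need to know which consumer is attached.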
Apache Hadoop Pluggable model
YARN Service
HDFS Service
Spark Service
Big Data services accessing Apache Hadoop HDFS and YARN
● Hadoop-client charm
  ○ Uses the Hadoop command-line component
  ○ Preconfigured to run MapReduce jobs
  ○ Preconfigured to access HDFS
● Hadoop-plugin charm
  ○ For big data services requiring the Hadoop Java API (used by Hive, Pig, etc.)
  ○ Preconfigured to connect to the Hadoop cluster
● Hadoop service relations
  ○ Provide hostname/port to communicate with HDFS/YARN
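The relation data is just a hostname and port; a consuming charm turns that into the HDFS URI it writes into its Hadoop client configuration (core-site.xml's fs.defaultFS). A minimal sketch of that step (the helper name and default port are our assumptions, not the actual jujubigdata API):

```python
def hdfs_url(hostname, port=8020):
    """Build the fs.defaultFS value a client writes into core-site.xml.

    8020 is the conventional HDFS NameNode RPC port; in a real
    deployment both values arrive over the Juju relation.
    """
    return "hdfs://%s:%d" % (hostname, port)

print(hdfs_url("namenode-0"))       # hdfs://namenode-0:8020
print(hdfs_url("namenode-0", 9000)) # hdfs://namenode-0:9000
```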
Vendor-agnostic Installation
● Operating system independence
  ○ Tarballs
  ○ Eliminate OS packaging dependencies
● Architecture independence
  ○ Determine requirements at deployment time
● Example from Hive:
  ○ http://bazaar.launchpad.net/~bigdata-dev/charms/trusty/apache-hive/trunk/view/head:/resources.yaml
resources:
  hive-ppc64le:
    url: http://<url>/apache-hive-0.13.0-bin.tar.gz
    hash: 4c835644eb72a08df059b86c45fb159b95df08e831334cb57e24654ef078e7ee
    hash_type: sha256
  hive-x86_64:
    url: http://<url>/apache-hive-1.0.0-bin.tar.gz
    hash: b8e121f435defeb94d810eb6867d2d1c27973e4a3b4099f2716dbffafb274184
    hash_type: sha256
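Each resources.yaml entry pins its tarball to a sha256 digest, so at deploy time a charm can reject a download whose hash does not match. A minimal sketch of that check (the helper name is ours, not the charm's actual code):

```python
import hashlib

def verify_resource(data, expected_hash, hash_type="sha256"):
    """True if the downloaded bytes match the digest pinned in resources.yaml."""
    digest = hashlib.new(hash_type, data).hexdigest()
    return digest == expected_hash

# Stand-in payload; a real charm hashes the fetched tarball bytes.
payload = b"apache-hive-1.0.0-bin.tar.gz contents"
pinned = hashlib.sha256(payload).hexdigest()
assert verify_resource(payload, pinned)
assert not verify_resource(b"tampered bytes", pinned)
```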
Vendor-agnostic Installation ….
● Vendor properties
  ○ Provide default values and allow fine-tuning
  ○ Allow vendor-specific configuration
● Example from Hive:
  ○ http://bazaar.launchpad.net/~bigdata-dev/charms/trusty/apache-hive/trunk/view/head:/dist.yaml
vendor: 'apache'
hadoop_version: '2.4.1'
packages:
  - 'libmysql-java'
  - 'mysql-client'
groups:
  - 'hadoop'
users:
  hive:
    groups: ['hadoop']
dirs:
  hive:
    path: '/usr/lib/hive'
    owner: 'hive'
    group: 'hadoop'
ports:
  hive:
    port: 10000
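dist.yaml centralizes the vendor-specific values, so a vendor build only has to override the keys that differ from the stock defaults. A toy sketch of that merge, using plain dicts in place of the charm's real YAML loading (function and values are illustrative):

```python
# Stock defaults, mirroring a few keys from the Hive dist.yaml above.
defaults = {
    "vendor": "apache",
    "hadoop_version": "2.4.1",
    "dirs": {"hive": {"path": "/usr/lib/hive", "owner": "hive"}},
}

def with_overrides(defaults, overrides):
    """Shallow-merge vendor overrides onto the stock defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

# A hypothetical vendor build changes only what it must.
cfg = with_overrides(defaults, {"vendor": "ibm", "hadoop_version": "2.4.1-bi"})
assert cfg["vendor"] == "ibm"
assert cfg["dirs"]["hive"]["path"] == "/usr/lib/hive"  # untouched default
```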
Hadoop Plugin Charm
● Single, simplified connection point to Hadoop HDFS and YARN
● Relating to the plugin installs and manages:
  ○ Java runtime
  ○ Access to interact with the data set
    ■ Hadoop API and CLI
    ■ Hadoop config (/etc/hadoop/conf)
    ■ /etc/hosts updates
    ■ Environment updates, e.g. HADOOP_CONF_DIR
● Allows service reusability across Hadoop versions and distributions
from jujubigdata.relations import HadoopPlugin

if HadoopPlugin().hdfs_is_ready():
    pig.install()
    pig.configure()
First, the “hard” part:
juju quickstart apache-core-batch-processing

If needed:
juju add-unit -n 10 compute-slave
Ok, now that you have a fully working Big Data deployment with Apache Hadoop, let’s get to the interesting bit.
Let’s add Apache Hive
Data Analytics with MySQL
juju deploy apache-hive hive
juju deploy mysql
juju add-relation plugin hive
juju add-relation hive mysql
And Apache Pig
Data Analysis with Apache Pig language
juju deploy apache-pig pig
juju add-relation plugin pig
References and Contact Info
● Core bundle technical documentation
● Mailing lists
  ○ [email protected]
  ○ Juju: https://lists.ubuntu.com/mailman/listinfo/juju
● IRC (Freenode)
  ○ #juju
    ■ asanjar, cory_fu, kwmonroe
● Web
  ○ jujucharms.com/big-data
  ○ jujucharms.com/docs