Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Making sense of Apache Bigtop, ODPi and why it all matters to Apache Apex. Roman Shaposhnik, [email protected], @rhatr, Director of Open Source Strategy, Pivotal Inc.

Transcript of Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Page 1: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Making sense of Apache Bigtop, ODPi and why it all matters to Apache Apex

Roman Shaposhnik, [email protected], @rhatr, Director of Open Source Strategy, Pivotal Inc.

Page 2: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

A slide deck built via the “Apache Way”
• Bigtop community contributors
• Roman Shaposhnik
• Konstantin Boudnik
• Nate D'Amico
• Evans Ye & Darren Chen (Trend Micro)

Page 3: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

What is Apache Bigtop?
• Apache Bigtop is to Hadoop what Debian is to Linux
• A 100% open, community-driven distribution of a big data management platform based on Apache Hadoop
• A place where all communities around big data come together
• The thing everybody (Pivotal, Cloudera, Hortonworks, WANDisco, IBM, Amazon, Trend Micro) is building off of
• A cutting-edge, quickly evolving distribution and a set of tools

Page 4: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

GNU Software Linux kernel

Page 5: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Hadoop Ecosystem (Pig, Hive, Spark) Linux kernel

Hadoop (HDFS + YARN + MR)

Page 6: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

ODPi is a nonprofit organization committed to simplification & standardization of the big data ecosystem with a common reference specification called ODPi Core. As a shared industry effort, ODPi is focused on promoting and advancing the state of Apache Hadoop® and Big Data Technologies for the Enterprise.

Page 7: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

February 2015, September 2015, December 2015

Page 8: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Page 9: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

What has ODPi done so far (1.0.1)?
• Runtime specification
  • https://github.com/odpi/specs/blob/master/ODPi-Runtime.md
• Validation testsuite
  • http://repo.odpi.org/ODPi/1.0/acceptance-tests/
• Reference implementation binaries
  • http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}

Page 10: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

What are we working on?
• Operations specification
  • https://github.com/odpi/specs/blob/master/ODPi-Operations.md
• ISV “ODPi compatible” policy
• Expanding ODPi core beyond Apache Hadoop & Ambari
  • Hive
  • ????
• How can you help?
  • Share use cases
  • Test against the reference implementation
  • Contribute to upstream ASF projects

Page 11: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

What’s in Bigtop?

• A set of binary packages
  • just like CDH/PHD/HDP/ODPi/etc.
• Integration code
• Packaging code
• Deployment code
• Orchestration code
• Validation code
• Continuous Integration infrastructure

Page 12: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Integration/packaging

• Linux packages
  • RPM, DEB
  • RHEL/CentOS (Fedora), SLES (openSUSE), Debian, Ubuntu
• VirtualBox, VMware, etc. VM images
• Challenge: Linux packaging is node-centric
  • “smart” tarballs
  • Docker or BOSH images

Page 13: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Integration testing based on iTest

• Clean-room provisioning
  • these ain’t your gramps’ unit tests
• Versioned test artifacts
  • JVM-based test artifacts
• Matching stacks of components and integration tests
• Plug’n’play architecture: Gradle/Groovy, JARs/artifacts

Page 14: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Puppet 3.x deployment

• Master-less Puppet
  $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node

• Cluster topology is kept in Hiera

  bigtop::hadoop_head_node: "hadoopmaster.example.com"
  hadoop::hadoop_storage_dirs:
    - "/mnt"
  hadoop_cluster_node::cluster_components:
    - yarn
    - zookeeper
  bigtop::bigtop_repo_uri: "http://bigtop-repos.s3.amazonaws.com/releases/1.1.0/…"
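With master-less Puppet, every node needs the same Hiera data before `puppet apply` runs. The sketch below shows one way such a file could be generated from a short topology description. It is an illustration only: `build_site_settings` and `render_hiera` are hypothetical helpers, not part of Bigtop, the repository URI is a placeholder, and the tiny emitter handles just the flat string and list-of-strings values that appear on the slide.

```python
from typing import Dict, List, Sequence, Union

HieraValue = Union[str, Sequence[str]]


def build_site_settings(head_node: str,
                        storage_dirs: Sequence[str],
                        components: Sequence[str],
                        repo_uri: str) -> Dict[str, HieraValue]:
    """Collect the four Hiera keys shown on the slide into one mapping."""
    return {
        "bigtop::hadoop_head_node": head_node,
        "hadoop::hadoop_storage_dirs": list(storage_dirs),
        "hadoop_cluster_node::cluster_components": list(components),
        "bigtop::bigtop_repo_uri": repo_uri,
    }


def render_hiera(settings: Dict[str, HieraValue]) -> str:
    """Render flat keys as YAML text.

    Assumption: values contain no double quotes or newlines, so plain
    double-quoting is enough. Anything richer should use a YAML library.
    """
    lines: List[str] = []
    for key, value in settings.items():
        if isinstance(value, str):
            lines.append(f'{key}: "{value}"')
        else:
            lines.append(f"{key}:")
            lines.extend(f'  - "{item}"' for item in value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    settings = build_site_settings(
        head_node="hadoopmaster.example.com",
        storage_dirs=["/mnt"],
        components=["yarn", "zookeeper"],
        repo_uri="http://repo.example.com/bigtop/",  # placeholder, not a real repo
    )
    print(render_hiera(settings), end="")
```

The printed text can be written to whatever Hiera data file the deployment reads and copied to each node before running `puppet apply` there.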

Page 15: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

One click Bigtop provisioning

Page 16: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Who is this for?

• For Hadoop app developers, cluster admins, users
  • Run a Hadoop cluster to test your code on
  • Try & test configurations before applying them to production
  • Play around with the Bigtop Big Data Stack
• For contributors
  • Easy to test your packaging, deployment, testing code
• For vendors
  • CI out of the box -> patching upstream code made easier

Page 17: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Works great, but…

• Need to add the Vagrant public key into Docker images
• Too many issues with the auto-created boot2docker hosting VM
• A bug in the Docker provider has stayed open for almost 2 years
• 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all Docker options are supported in the Vagrantfile
• Does not support Docker Swarm
• Slow

Page 18: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Docker Compose

Page 19: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Implementation

• Create Docker containers:
  • docker-compose scale bigtop=3
• Volumes:
  • Bigtop Puppet configurations
  • Bigtop Puppet code
  • /etc/hosts
• Compatible with Docker Machine and Swarm
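To make the pieces on this slide concrete, here is a rough sketch of how a single `bigtop` service with its three volumes might be described and then scaled. Everything except the service name and the `docker-compose scale bigtop=3` command is an assumption made for illustration: the image name, the host paths, and the container paths are hypothetical, and a real compose file would need more (for example a long-running command).

```python
from typing import Dict, List


def render_compose(image: str, volumes: Dict[str, str],
                   service: str = "bigtop") -> str:
    """Render a minimal single-service compose file as text.

    `volumes` maps host paths to container paths.
    """
    lines = [f"{service}:", f"  image: {image}", "  volumes:"]
    for host_path, container_path in volumes.items():
        lines.append(f"    - {host_path}:{container_path}")
    return "\n".join(lines) + "\n"


def scale_command(service: str, count: int) -> List[str]:
    """Build the argv for scaling the service, as shown on the slide."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return ["docker-compose", "scale", f"{service}={count}"]


if __name__ == "__main__":
    compose_text = render_compose(
        image="example/bigtop-node:latest",              # hypothetical image
        volumes={
            "./config/hieradata": "/etc/puppet/hieradata",  # Puppet configurations (paths assumed)
            "./bigtop-deploy/puppet": "/bigtop-puppet",      # Puppet code (paths assumed)
            "./config/hosts": "/etc/hosts",                  # shared hosts file
        },
    )
    print(compose_text, end="")
    print(" ".join(scale_command("bigtop", 3)))
```

Sharing one hosts file across containers is what lets the nodes resolve each other by name; the sketch only prints the file and the command rather than running anything.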

Page 20: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Docker Machine and Swarm

Page 21: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Juju orchestration

$ juju bootstrap
$ juju deploy hadoop-processing

Page 22: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

https://jujucharms.com/hadoop-processing/

Page 23: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Juju orchestration

$ juju add-unit slave -n 2

Page 24: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Juju orchestration

$ juju action do namenode/0 smoke-test
$ juju action do resourcemanager/0 smoke-test
$ watch -n 0.5 juju action status
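The three Juju slides together form a short sequence: bootstrap, deploy the bundle, add units, run smoke tests. As a sketch, that sequence can be assembled as a dry-run plan before anything is executed. The helper name `juju_plan` is hypothetical, and the command spellings are copied from the slides as-is; newer Juju releases may spell the action commands differently.

```python
import shlex
from typing import List, Sequence


def juju_plan(bundle: str, extra_slaves: int,
              smoke_test_units: Sequence[str]) -> List[List[str]]:
    """Return the Juju commands from the slides as argv lists, in order."""
    plan = [["juju", "bootstrap"], ["juju", "deploy", bundle]]
    if extra_slaves > 0:
        plan.append(["juju", "add-unit", "slave", "-n", str(extra_slaves)])
    for unit in smoke_test_units:
        plan.append(["juju", "action", "do", unit, "smoke-test"])
    return plan


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in juju_plan("hadoop-processing", 2,
                          ["namenode/0", "resourcemanager/0"]):
        print("$ " + shlex.join(argv))
```

Keeping the plan as argv lists means each entry could later be handed to `subprocess.run(argv, check=True)` without any shell quoting concerns.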

Page 25: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Early Mission Accomplished

Foundation for commercial Hadoop distros/services

Leveraged by app providers…

Page 26: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Page 27: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Blueprints for data engineering

• BigPetStore
  • Data Generator
  • Examples using tools in the Hadoop ecosystem to process data
  • Build system and tests for integrating tools and multiple JVM languages
  • Started by Dr. Jay Vyas, principal software engineer at Red Hat, Inc.
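To give a feel for what a synthetic data generator contributes to a blueprint like this, here is a toy sketch that produces reproducible pet-store-style transactions. It is not BigPetStore's generator or data model: the `Transaction` fields, the product list, and the uniform sampling are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical product catalogue for the toy example.
PRODUCTS = ("dog food", "cat litter", "fish flakes", "bird seed")


@dataclass(frozen=True)
class Transaction:
    customer_id: int
    store_id: int
    day: int
    product: str


def generate_transactions(n: int, seed: int, customers: int = 100,
                          stores: int = 5, days: int = 30) -> List[Transaction]:
    """Generate n transactions by uniform sampling from a seeded RNG.

    A fixed seed yields the same data every time, which keeps
    integration tests that consume the data repeatable.
    """
    rng = random.Random(seed)
    return [
        Transaction(
            customer_id=rng.randrange(customers),
            store_id=rng.randrange(stores),
            day=rng.randrange(days),
            product=rng.choice(PRODUCTS),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    for tx in generate_transactions(3, seed=42):
        print(tx)
```

Uniform sampling is the simplest possible choice; a more realistic generator would model customer behaviour, which is the kind of thing the "smarter/realistic test data" goal later in the deck points at.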

Page 28: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Datamodel

Page 29: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Transaction Purchase Model

Page 30: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Lambda/Stream Architectures

HDFS + Zookeeper +

Page 31: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

New focus and target end users

Data engineers vs distro builders

Enhance Operations/Deployment

Reference implementations & tutorials

Page 32: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Data data data…

Smarter/Realistic test data
- bigpetstore
- bigtop-bazaar
- weather data gen

Tutorial/Learning Data sets
- githubarchive.org
- more tbd…

Page 33: Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Thank You, Q&A