Deploying Hadoop-Based Bigdata Environments
Roman [email protected], Cloudera Inc.
“[Tall] Tales From The Frontier”
2
$ whoami
An open source software developer
  Linux kernel, C/C++ compilers, FFmpeg, Plan9
A Hadoop and all-around UNIX guy
  root@cloudera
Member of the “Kitchen” team
Apache Software Foundation
  Incubator PMC [Bigtop], Hadoop Development Tools, Celix, Helix
VP of Apache Bigtop
3
ZooKeeper (coordination)
HUE (web based UI)
HBase YARN/MR1
HDFS (filesystem)
Pig (DQL) Hive (SQL) Impala (SQL)
Oozie
5
It is a jungle out there
ZooKeeper
Hadoop
HDFS
YARN
MR1
HTTPFS
HBase
Pig
Hive
Impala
Sqoop
Oozie
Whirr
Mahout
Flume
Giraph
Hama
Hue
Solr
Crunch
JDK/JRE
Kerberos
Ganglia
Nagios
JSVC
Tomcat
Utils
Postgres
HTTPD
6
And the answer is:
Puppet[forge]
7
One way of using Apache software
$ wget http://apache.org/httpd.tar.gz
$ tar xzvf httpd.tar.gz
$ cd httpd
$ ./configure ; make
$ make install
ERROR: can't write to /usr/local/bin
$ sudo make install
8
A different way
$ sudo apt-get install httpd
Would you like to also upgrade your conf?
9
Is there an apt-get install hadoop?
Hadoop is still in very active development
Hadoop is Java based
Hadoop is a distributed application
Hadoop is way more than HDFS + MR
10
Project-by-project approach
“Passively” maintained code
  Packaging, OS-level (init.d)
Developer-centric view
  Edit-compile-debug cycle vs. deployment
  Lack of integration testing
Differences in distributions/packaging
  Where is this valid: /usr/libexec ?
Combinatorial explosion of dependencies
12
Dependencies Inferno:
Hive 0.8.1
Hadoop (1.0, 0.22, 0.23)
HBase (0.92, 0.90)
A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
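The two log4j jars in the listing above are the kind of conflict that is easy to miss by eye. As a rough sketch (not part of Hive or Bigtop), the hypothetical helper below prints any artifact that ships in more than one version. It assumes jars are named <artifact>-<version>.jar with a purely numeric, dot-separated version, so names carrying suffixes such as -SNAPSHOT are silently skipped.

```shell
# duplicate_jars DIR: print each artifact name that occurs with more than
# one version in DIR. Hypothetical helper; assumes <artifact>-<version>.jar
# naming with a numeric, dot-separated version.
duplicate_jars() {
    ls "$1" \
        | sed -n 's/^\(.*\)-[0-9][0-9.]*\.jar$/\1/p' \
        | sort \
        | uniq -d
}
```

Run as `duplicate_jars hive-0.8.1/lib`, it would flag log4j for the listing on the slide. It does not catch the other problem shown there: a bundled hbase-0.89 jar that matches neither of the listed HBase versions.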
13
Remember what Debian did to Linux?
GNU Software
Linux kernel
14
Bigtop is trying to do it with Hadoop
Hadoop Ecosystem (Pig, Hive, Mahout)
Linux kernel
Hadoop (HDFS + MR)
CDH4 beta 1
15
What's in Bigtop?
Build/Packaging infrastructure
  RPM, DEB, (tarballs, homebrew/MacPorts)
  VirtualBox, VMware and KVM VMs
  Fedora, OpenSUSE, Mageia, CentOS, Ubuntu
Puppet deployment infrastructure
Integration test infrastructure (iTest)
Bigtop Jenkins: http://bigtop01.cloudera.org:8080
16
And the answer is:
Puppet[Bigtop]
17
System software deployment
Packages vs. Puppet code
  package/file/service
What is packaging?
  dependency tracking
  build encapsulation
  java packaging
  file layout
  user creation
  service registration
18
Does it really work?
Java packaging
  maven/ivy integration
file layout
  side-by-side installations of the same package
user creation
  LDAP/AD provisioning
service registration
  start on install vs. start on reboot
19
Petascale distributed systems
Scale
  Yahoo! ~5000 nodes
Deployment orchestration
  Kerberos::Host_keytab <| title == "hdfs" |> ->
    Service["hadoop-hdfs-datanode"]
Highly coordinated distributed system
  It ain't HTTPD/loadbalancer
  Rolling upgrades/asynchronous rollbacks
20
Back to tarballs and shell?
What's better for Puppet: fpm or rpm?
What is the role of Puppet?
  coordinating the entire system: lack of DSL
  converging an isolated node: will it ever work?
  a building block for an agent-based system
One agent to rule them all?
  there's no spoon^H^H^H^H^H agent:
  Whirr
  MCollective
  Cloudera Manager, Ambari
21
Evolution, not perfection!
Minimalistic, highly consistent packages
  /usr/lib/hadoop, /etc/hadoop/conf (alternative)
  fail gracefully: ( .... || : )
  Java packaging is not solved [yet]: symlinks
Minimalistic Puppet code
  package/file/service
  masterless (most of the time)
  integration with Whirr
BoxGrinder
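The “fail gracefully” idiom mentioned above can be sketched as follows. This is an illustration only, not a scriptlet taken from Bigtop: optional_step and run_scriptlet are hypothetical names, and the failing step stands in for whatever optional command a package scriptlet might run.

```shell
# Hypothetical stand-in for an optional step that fails.
optional_step() {
    return 1
}

# Hypothetical scriptlet body, run in a subshell so that 'set -e'
# does not leak into the caller.
run_scriptlet() (
    set -e                 # abort on the first unguarded failure
    optional_step || :     # guarded: ':' always succeeds, so execution continues
    echo "scriptlet finished"
)
```

Here `:` is the shell's built-in no-op, which always returns 0, so `optional_step || :` can never fail. Without the guard, `set -e` would stop the scriptlet at the failing step and the final line would never run.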
22
The road ahead
New kind of configuration management
  /etc/hadoop vs ZooKeeper
New kinds of system packaging
  Parcels (tarballs + metadata)
  HPS (Hadoop Packaging System)
Orchestration: to puppet or not to puppet?
  Cloudera Manager
  Apache Ambari (incubating)
  Reactor 8: http://reactor8.com
23
Java Packaging
Fate of Java
  OpenJDK
OSGi
  Hadoop's view: MAPREDUCE-1700
  https://issues.apache.org/jira/browse/MAPREDUCE-1700
Project Jigsaw
  Language tie-ins? Really?
Linux vendors getting their act together
24
Integration testing
Clean room provisioning
  Those ain't unit tests – they trash the system
Cluster topology and cluster state discovery
  How can puppet help us?
Cluster state manipulation
  Test-driven orchestration
  Chaos Monkey
How to be successful in OS co-opetition
  Make everything pluggable (and subvert ;-))
25
Anatomy of iTest
Versioned, JVM-based test/data artifacts
Dependency between test artifacts
Matching stack of integration tests
Implementation
  Maven artifacts, pom files
  JUnit test-execution entry point
  Groovy for scripting
26
Who's the target audience?
End users
  YOU!
ASF Projects/Bigdata developers
  from Avro to ZooKeeper
Bigdata solutions vendors
  Cloudera, EMC, Hortonworks, Karmasphere
DevOps
  eBay, Yahoo, Facebook, LinkedIn
27
Who's on-board?
Cloudera
  CDH4 is 100% based on Bigtop (hadoop v2)
  Available @cloudera.com
Canonical
  Ubuntu Server: Hadoop and Bigdata blueprint
  https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop
TrendMicro
Hortonworks (partially)
EMC, eBay (early stages of prototyping)
28
What's happening?
A special release: Bigtop 0.3.0-incubating
  Hadoop 1.0.1
Last stable release: Bigtop 0.5.0
  Hadoop 2.0.2-alpha
Next stable release: Bigtop 0.6.0
  End of Mar 2013 release
  Hadoop 2.0.3-beta
  Major focus on developers
29
What does Bigtop need from you?
More of you!
  Meetup: “Silicon Valley Hands-on Programming”
  http://www.meetup.com/HandsOnProgrammingEvents/
More infrastructure for build/test
  EC2, Supercell, EMC magic cluster, CloudStack
More integration tests
Convince your bosses to commit to Bigtop
Validate upstream release using Bigtop
30
Contact
Bigtop home @Apache:
  http://incubator.apache.org/bigtop/
Hangout places:
  {dev,user}@bigtop.apache.org
  #bigtop on Freenode
Roman Shaposhnik
  [email protected], [email protected]