Post on 06-May-2015
Building a Hadoop-Based Big Data Environment
Evans Ye @ TWHUG
2013/12/14
Who am I
• Evans Ye @
• Dumbo Team
• http://dumbointaiwan.blogspot.tw/
04/11/2023 Copyright 2013 Trend Micro Inc.
Agenda
• Building your own Hadoop version
• Hadoop Deployment
• Hadoop release engineering
• The development environment
• Bigtop puppet
Why build our own version
• Add your own patches at any time
  – The community has to take care of backward compatibility, which takes much more time and effort before a change is accepted.
• Backport official patches into the version you currently run
  – You may not upgrade your Hadoop version frequently, but you may have a specific need for one patch.
• Flexibility: features your business needs
As a Beginner
What’s your work? Build Hadoop infrastructure.
I thought you just needed to yum install Hadoop…
Brute force
• git clone
• Make some changes
• Build a binary tarball
How to do version control?
core-site.xml, hdfs-site.xml, mapred-site.xml, …
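Once the *-site.xml files live in version control, a review can compare two revisions property by property instead of line by line. A minimal sketch, assuming only the standard Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout; `load_properties` and `diff_properties` are hypothetical helper names, not part of any Hadoop tool:

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed <property> entries without a <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(old_xml, new_xml):
    """Compare two revisions of a config file property by property.

    Returns (added, removed, changed); `changed` maps name -> (old, new).
    """
    old, new = load_properties(old_xml), load_properties(new_xml)
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


if __name__ == "__main__":
    old = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.block.size</name><value>67108864</value></property>
</configuration>"""
    new = """<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.permissions</name><value>true</value></property>
</configuration>"""
    added, removed, changed = diff_properties(old, new)
    print("added:", added)      # added: {'dfs.permissions': 'true'}
    print("removed:", removed)  # removed: {'dfs.block.size': '67108864'}
    print("changed:", changed)  # changed: {'dfs.replication': ('3', '2')}
```

Comparing parsed properties rather than raw text means reordering or reindenting a file does not show up as a change.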
Bigtop
How Bigtop helps you
• Apache Hadoop app developers:
  – Run a pseudo-distributed Hadoop cluster to test your code on.
• Vendors:
  – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits.
• Packaging, deployment, integration testing
Supported Linux distros
• Ubuntu 10.10
• CentOS 5/6
• Fedora 18
• Mageia 1
• openSUSE 12.2
Build
• Build hadoop-common (see BUILDING.txt)
  – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar
• Put the resulting source tarball where Bigtop expects it
• bigtop$ make hadoop-rpm
Hadoop Deployment
Configuration files
• Hadoop-related config
  – core-site.xml
  – hdfs-site.xml
  – mapred-site.xml
  – log4j.properties
  – hadoop-env.sh
  – fair-scheduler.xml
  – rack-topology
  – hadoop-metrics.properties
  – taskcontroller.cfg
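All of the *-site.xml files above share one `<configuration>`/`<property>` layout, so a deployment tool can generate them from a single data structure instead of hand-editing them on every node. A minimal sketch using only the standard library; `render_site_xml` is a hypothetical helper, and the example values are placeholders:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def render_site_xml(properties):
    """Render a {name: value} dict as a Hadoop *-site.xml string."""
    root = ET.Element("configuration")
    # Sort names so regenerated files produce small, readable diffs in git.
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(properties[name])
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(render_site_xml({
        "fs.default.name": "hdfs://dev1.example.com:8020",
        "io.file.buffer.size": 131072,
    }))
```

The non-XML files in the list (hadoop-env.sh, log4j.properties, taskcontroller.cfg) would need their own templates; this sketch covers only the XML ones.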
Local directories
• Hadoop-related files and directories
  – Namenode metadata: /name/1, /name/2
  – Datanode: /data/1, /data/2, /data/3, /data/4
  – Tasktracker: /mapred/1/local, /mapred/2/local
  – …
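In Hadoop 1 these directories are listed, comma-separated, in the dfs.name.dir, dfs.data.dir and mapred.local.dir properties. A minimal sketch that creates the layout and produces those property values; `prepare_local_dirs` is a hypothetical helper, and the `root` parameter is an assumption added so the demo can run in a scratch directory (use "/" on a real node). Ownership and permissions are not handled here:

```python
import os
import tempfile

# Directory layout from the slide, keyed by the Hadoop 1 property that lists it.
LAYOUT = {
    "dfs.name.dir": ["/name/1", "/name/2"],
    "dfs.data.dir": ["/data/1", "/data/2", "/data/3", "/data/4"],
    "mapred.local.dir": ["/mapred/1/local", "/mapred/2/local"],
}


def prepare_local_dirs(root, layout):
    """Create every directory under `root`; return {property: "dir1,dir2,..."}."""
    props = {}
    for prop_name, dirs in layout.items():
        paths = []
        for d in dirs:
            path = os.path.join(root, d.lstrip("/"))
            os.makedirs(path, exist_ok=True)  # like `mkdir -p`: fine if it exists
            paths.append(path)
        props[prop_name] = ",".join(paths)
    return props


if __name__ == "__main__":
    # Use a scratch directory instead of "/" so the demo needs no root access.
    with tempfile.TemporaryDirectory() as scratch:
        for name, value in prepare_local_dirs(scratch, LAYOUT).items():
            print(name, "=", value)
```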
More hadoop ecosystem
Problems to solve
• Lots of nodes need to be configured
• Less human involvement means fewer mistakes
• Configuration changes quite often
  – adjust the fair scheduler
  – enable/disable short-circuit local reads
  – try more performance-tuning configurations
Hadooppet
What is Puppet?
• An IT automation tool that helps system administrators automate many repetitive tasks
• You only need to define the desired state
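"Define the desired state" means the tool compares what a resource should look like with what it currently is, and acts only when they differ, so running it twice is harmless. A rough Python analogue of that idea for a single file (this is not how Puppet is implemented; `ensure_file` is a hypothetical helper):

```python
import os
import tempfile


def ensure_file(path, content):
    """Converge `path` to `content`; return True only if something changed.

    Mirrors the desired-state idea: describe the end result, and let the
    tool decide whether any action is needed.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state: no action
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp_path, path)  # swap in the new file in one rename
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "hadoop-env.sh")
        print(ensure_file(target, "export HADOOP_HEAPSIZE=2000\n"))  # True: created
        print(ensure_file(target, "export HADOOP_HEAPSIZE=2000\n"))  # False: unchanged
```

The boolean return plays the role of Puppet reporting a change, which is what would trigger a dependent action such as a service restart.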
What is Hadooppet?
• A general Hadoop cluster deployment tool based on Puppet
• Kerberos / LDAP auto-configured
• A set of Hadoop / Kerberos management tools
• A set of sanity check scripts for Trend Micro's Hadoop-related services
• Configuration managed on the puppetmaster
Design
• Abstract environment-specific configurations into a single configuration file
• setup.sh
    namenode_fqdns=("dev1.example.com" "dev2.example.com")
    namenode_dirs=("/name/1" "/name/2")
    namenode_heap=32g
    map_slots=5
    reduce_slots=3
    …
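With every environment-specific value in one setup.sh, the rest of the tooling only has to read that file and derive Hadoop settings from it. A minimal sketch of that step, assuming setup.sh contains only simple one-line `key=value` and bash-array assignments like the ones above; both function names are hypothetical, and the mapping in `to_hadoop_properties` is an illustrative assumption rather than Hadooppet's actual logic:

```python
import shlex

SAMPLE_SETUP_SH = '''
namenode_fqdns=("dev1.example.com" "dev2.example.com")
namenode_dirs=("/name/1" "/name/2")
namenode_heap=32g
map_slots=5
reduce_slots=3
'''


def parse_setup_sh(text):
    """Parse simple `key=value` and `key=("a" "b")` lines into a dict.

    Deliberately limited: one assignment per line, no variable expansion,
    no command substitution.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("(") and value.endswith(")"):
            settings[key] = shlex.split(value[1:-1])  # bash array -> list
        else:
            words = shlex.split(value)
            settings[key] = words[0] if words else ""
    return settings


def to_hadoop_properties(settings):
    """Hypothetical mapping from setup.sh variables to Hadoop 1 properties."""
    return {
        "dfs.name.dir": ",".join(settings["namenode_dirs"]),
        "mapred.tasktracker.map.tasks.maximum": settings["map_slots"],
        "mapred.tasktracker.reduce.tasks.maximum": settings["reduce_slots"],
    }


if __name__ == "__main__":
    settings = parse_setup_sh(SAMPLE_SETUP_SH)
    print(settings["namenode_fqdns"])  # ['dev1.example.com', 'dev2.example.com']
    print(to_hadoop_properties(settings))
```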
Benefits
• Can be used to set up any kind of Hadoop cluster
• Minimizes downtime during a major version upgrade (hadoop1 → hadoop2)
(Diagram: hadoop1 Namenode + Secondarynamenode → hadoop2 Active/Standby Namenode + Journalnodes + ZKFC)
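Moving from the hadoop1 Namenode/Secondarynamenode pair to the hadoop2 Active/Standby Namenode with Journalnodes and ZKFC mostly shows up as a handful of extra HDFS properties, which a tool like this can derive from the host lists in setup.sh. A minimal sketch, assuming the Quorum Journal Manager setup, default ports (NameNode RPC 8020, JournalNode 8485, ZooKeeper 2181) and the hypothetical nameservice ID "mycluster"; fencing (dfs.ha.fencing.methods) and HTTP addresses are deliberately left out:

```python
def ha_hdfs_properties(nameservice, namenode_fqdns, journalnode_fqdns, zk_fqdns):
    """Build the core QJM-based NameNode HA properties for Hadoop 2.

    Assumes default ports (NameNode RPC 8020, JournalNode 8485, ZooKeeper 2181).
    Fencing (dfs.ha.fencing.methods) and HTTP addresses are left out.
    """
    if len(namenode_fqdns) != 2:
        raise ValueError("Hadoop 2 HA expects exactly two NameNodes")
    nn_ids = ["nn1", "nn2"]
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes." + nameservice: ",".join(nn_ids),
        # Shared edit log on a quorum of JournalNodes.
        "dfs.namenode.shared.edits.dir": "qjournal://%s/%s" % (
            ";".join(host + ":8485" for host in journalnode_fqdns), nameservice),
        "dfs.client.failover.proxy.provider." + nameservice:
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        # ZKFC processes use ZooKeeper to elect the active NameNode.
        "dfs.ha.automatic-failover.enabled": "true",
        "ha.zookeeper.quorum": ",".join(host + ":2181" for host in zk_fqdns),  # core-site.xml
    }
    for nn_id, host in zip(nn_ids, namenode_fqdns):
        props["dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)] = host + ":8020"
    return props


if __name__ == "__main__":
    props = ha_hdfs_properties(
        "mycluster",
        ["dev1.example.com", "dev2.example.com"],
        ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
        ["zk1.example.com", "zk2.example.com", "zk3.example.com"],
    )
    for name in sorted(props):
        print(name, "=", props[name])
```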
Release Engineering
Manually
• Build the src tarball in hadoop-common
• Build RPMs in Bigtop
• Submit the build to the release yum repo
• yum update on the Hadoop cluster…
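The manual steps form a strict sequence in which a later step is pointless if an earlier one failed. A minimal sketch of scripting that sequence; `run_pipeline` is a hypothetical helper, the two commands come from the Build slide, and the working directories "hadoop-common" and "bigtop" are assumptions about where the repositories were cloned:

```python
import subprocess
import sys

# The build commands come from the slides; the working directories are
# assumptions about where hadoop-common and bigtop were cloned.
RELEASE_STEPS = [
    ("build src tarball",
     ["mvn", "package", "-Pdist,docs,src,native", "-Dtar"], "hadoop-common"),
    ("build rpms", ["make", "hadoop-rpm"], "bigtop"),
    # ...then submit the RPMs to the release yum repo and run `yum update`
    # on the cluster; those steps depend on local infrastructure.
]


def run_pipeline(steps):
    """Run (name, argv, cwd) steps in order and stop at the first failure.

    Returns the names of the steps that succeeded.
    """
    done = []
    for name, argv, cwd in steps:
        print("==> " + name, flush=True)
        try:
            returncode = subprocess.run(argv, cwd=cwd).returncode
        except OSError as exc:  # e.g. command not installed or cwd missing
            print("step %r could not start: %s" % (name, exc), file=sys.stderr)
            break
        if returncode != 0:
            print("step %r failed (exit code %d)" % (name, returncode),
                  file=sys.stderr)
            break
        done.append(name)
    return done
```

Calling run_pipeline(RELEASE_STEPS) from the directory that contains both checkouts runs the build; comparing the returned list with RELEASE_STEPS tells a CI job whether to mark the build as failed.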
Continuous Integration
• Set up a hadoop-common daily build
• Set up a Bigtop release build
  – should be triggered manually
• Set up a Hadooppet daily build
  – run sanity checks on a REAL CLUSTER
Virtualization
• Build a XenServer cluster
give-me-vm
• PyCon 2012
  – Small Python Tools for Software Release Engineering
• An automation tool to manage the VM lifecycle
• Uses the Python XenAPI
• Create a temporary VM for testing via self-service
• Destroy it when testing is finished
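The "create for testing, destroy when finished" lifecycle maps naturally onto a Python context manager, which guarantees cleanup even when the tests fail. This sketch does not reproduce give-me-vm's code: the `backend` object with `create`/`destroy` methods is a hypothetical interface, and `FakeBackend` is an in-memory stand-in where a real implementation would call XenAPI:

```python
from contextlib import contextmanager


class FakeBackend:
    """In-memory stand-in for a hypervisor API (illustration only)."""

    def __init__(self):
        self.running = set()
        self._counter = 0

    def create(self, template):
        self._counter += 1
        vm_id = "%s-%d" % (template, self._counter)
        self.running.add(vm_id)
        return vm_id

    def destroy(self, vm_id):
        self.running.discard(vm_id)


@contextmanager
def temporary_vm(backend, template):
    """Create a VM from `template`, yield its id, and always destroy it."""
    vm_id = backend.create(template)
    try:
        yield vm_id
    finally:
        backend.destroy(vm_id)  # runs even if the tests inside the block fail


if __name__ == "__main__":
    backend = FakeBackend()
    with temporary_vm(backend, "hadoop-all-in-one") as vm:
        print("testing on", vm, "running:", sorted(backend.running))
    print("after cleanup:", sorted(backend.running))  # after cleanup: []
```

Keeping the hypervisor behind a small interface also lets the lifecycle logic be tested without a XenServer pool.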
Build auto deployment on Hadooppet
• ./give_me_vm.py
• Set up passphraseless SSH between the VMs
• Set hostnames
• Install Hadooppet on the master
• Run the deployment
• Run sanity checks
• ./destroy_vm.py
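The simplest sanity check after a deployment is whether each daemon is listening on its expected port. A minimal sketch of that layer only (Hadooppet's real checks are not shown in the slides and likely go further); the hostnames in `CHECKS` are placeholders, while 50070 and 50030 are the default web UI ports of the Hadoop 1 NameNode and JobTracker:

```python
import socket

# Hypothetical checks: hostnames are placeholders; 50070 and 50030 are the
# default web UI ports of the Hadoop 1 NameNode and JobTracker.
CHECKS = {
    "namenode web UI": ("dev1.example.com", 50070),
    "jobtracker web UI": ("dev1.example.com", 50030),
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolvable
        return False


def sanity_check(checks):
    """Return the descriptions of all checks whose port is not reachable."""
    return [desc for desc, (host, port) in checks.items()
            if not port_open(host, port)]
```

In the pipeline above, a non-empty result from sanity_check(CHECKS) would be the signal to fail the run before ./destroy_vm.py cleans up.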
Development Environment
For Hadoop service developers…
• Not enough Hadoop clients for every developer
• Developers cannot reach the server side while developing Hadoop-related services
• Cannot experiment with new technologies such as Impala, Spark, and Flume
• CI on Hadoop-related services
give-me-vm + Hadoop all-in-one VM
• Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template
• Get a Hadoop all-in-one VM via give-me-vm
• Services integrate their CI tests with the Hadoop all-in-one VM
Bigtop puppet
• Bigtop also has a set of Puppet scripts to deploy the Hadoop ecosystem
Bigtop puppet
• Preparation:
  – A VM with a JDK and Puppet installed
  – mkdir -p /data/{1,2}
  – git clone https://github.com/apache/bigtop.git
Conclusion
• There are many great deployment tools
  – Ambari, CM (Cloudera Manager), ETU appliance
  – Choose a suitable distribution based on your business needs
• If you want to do it yourself
  – Bigtop can do the packaging for you easily
  – Leverage the Bigtop puppet module for your deployment
Questions?
Thank you!