Building hadoop based big data environment

Post on 06-May-2015



Building Hadoop Based Big Data Environment

Evans Ye @ TWHUG

2013/12/14

• Evans Ye @

• Dumbo Team

• http://dumbointaiwan.blogspot.tw/

Who am I

04/11/2023 Copyright 2013 Trend Micro Inc.

• Building your own Hadoop version

• Hadoop Deployment

• Hadoop release engineering

• The development environment

• Bigtop puppet

Agenda


• Add your own patch at any time

– The community has to preserve backward compatibility, which takes much more time and effort.

• Fetch official patches into the currently adopted version

– You may not upgrade your Hadoop version frequently, but you may still need a specific patch.

• Flexibility: features the business needs

Why Build our own version


As a Beginner


What’s your work? Build Hadoop infrastructure


I thought you just needed to yum install Hadoop…

• git clone

• Make some changes

• Build a binary tarball

Brute force


core-site.xml, hdfs-site.xml, mapred-site.xml…

How to do version control?


Bigtop

• Apache Hadoop app developers:

– Run a pseudo-distributed Hadoop cluster to test your code on.

• Vendors:

– Build your own Apache Hadoop distribution, customized from Apache Bigtop bits.

• Packaging, Deployment, Integration Testing

How Bigtop helps you


• Ubuntu 10.10

• CentOS 5/6

• Fedora 18

• Mageia 1

• openSUSE 12.2

Supported Linux distros


• Build hadoop-common (see BUILDING.txt)

– hadoop-common$ mvn package -Pdist,docs,src,native -Dtar

• Prepare your src tar in bigtop

• Bigtop$ make hadoop-rpm

Build
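The build steps above can be scripted. The following is a minimal sketch, assuming the Hadoop and Bigtop checkouts live in directories named `hadoop-common` and `bigtop` (both paths are placeholders) and that `mvn` and `make` are on the PATH. The function names `build_commands` and `run_build` are hypothetical. The step that places the source tarball into Bigtop is not shown because the slides do not spell it out.

```python
import subprocess
from pathlib import Path


def build_commands(hadoop_src, bigtop_src):
    """Return (working directory, command) pairs for the two build steps."""
    return [
        # Step 1: build the Hadoop distribution and source tarballs.
        (Path(hadoop_src), ["mvn", "package", "-Pdist,docs,src,native", "-Dtar"]),
        # Step 2: build the RPMs from the Bigtop checkout. Preparing the
        # source tarball in Bigtop happens in between and is not shown here.
        (Path(bigtop_src), ["make", "hadoop-rpm"]),
    ]


def run_build(hadoop_src, bigtop_src, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, cmd in build_commands(hadoop_src, bigtop_src):
        print(f"{cwd}$ {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the build at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)


# Dry run: prints the commands without executing anything.
run_build("hadoop-common", "bigtop")
```

Calling `run_build(..., dry_run=False)` would execute the commands for real; keeping the dry run as the default makes it harder to start a long build by accident.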


Hadoop Deployment

• Hadoop related config

– core-site.xml

– hdfs-site.xml

– mapred-site.xml

– log4j.properties

– hadoop-env.sh

– fair-scheduler.xml

– rack-topology

– hadoop-metrics.properties

– taskcontroller.cfg

Configuration files
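The `*-site.xml` files in this list share one simple layout: a `configuration` root holding `property` elements, each with a `name` and a `value`. As a rough illustration of why these files are easy to generate rather than edit by hand, here is a small sketch using only the Python standard library. The helper name `render_site_xml` and the example host and port are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props):
    """Render a dict of properties in the Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    # Sort the keys so the generated file is stable and diffs stay small.
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical value; the host and port are placeholders.
print(render_site_xml({"fs.default.name": "hdfs://dev1.example.com:8020"}))
```

Because the output is deterministic, the generated files can be kept under version control and compared between releases.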


• Hadoop related files and directories

– Namenode metadata

• /name/1, /name/2

– Datanode

• /data/1, /data/2, /data/3, /data/4

– Tasktracker

• /mapred/1/local, /mapred/2/local

– …

Local Directories


More hadoop ecosystem


• Lots of nodes need to be configured

• The less human involvement, the fewer mistakes

• Configuration changes quite often

– adjust the fair scheduler

– enable/disable short-circuit reads

– try more performance-tuning configurations

Problems to solve


Hadooppet

• An IT automation tool that helps system administrators automate repetitive tasks

• You only need to define the desired state

What is Puppet?


• A general Hadoop cluster deployment tool based on Puppet

• Kerberos / LDAP configured automatically

• A set of Hadoop / Kerberos management tools

• A set of sanity check scripts for Trend Hadoop-related services

• Manage configuration on the puppetmaster

What is Hadooppet?


• Abstract environment specific configurations in a single configuration file

• setup.sh

– namenode_fqdns=("dev1.example.com" "dev2.example.com")

– namenode_dirs=("/name/1" "/name/2")

– namenode_heap=32g

– map_slots=5

– reduce_slots=3

– …

Design
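To illustrate the single-file design, the sketch below derives per-file Hadoop settings from one settings dictionary that mirrors the `setup.sh` values on this slide. This is not Hadooppet's actual implementation, which the slides do not show. The function name `derive_config` is hypothetical, and the mapping to the Hadoop 1.x property names `dfs.name.dir`, `mapred.tasktracker.map.tasks.maximum`, `mapred.tasktracker.reduce.tasks.maximum`, and the `HADOOP_NAMENODE_OPTS` variable is an assumption about how such values would typically be applied.

```python
# Environment-specific values, mirroring the setup.sh example.
SETTINGS = {
    "namenode_fqdns": ["dev1.example.com", "dev2.example.com"],
    "namenode_dirs": ["/name/1", "/name/2"],
    "namenode_heap": "32g",
    "map_slots": 5,
    "reduce_slots": 3,
}


def derive_config(settings):
    """Map the single settings dict onto per-file Hadoop 1.x settings."""
    return {
        "hdfs-site.xml": {
            # Hadoop expects a comma-separated list of directories.
            "dfs.name.dir": ",".join(settings["namenode_dirs"]),
        },
        "mapred-site.xml": {
            "mapred.tasktracker.map.tasks.maximum": str(settings["map_slots"]),
            "mapred.tasktracker.reduce.tasks.maximum": str(settings["reduce_slots"]),
        },
        "hadoop-env.sh": {
            # The heap size becomes a JVM option for the namenode process.
            "HADOOP_NAMENODE_OPTS": f"-Xmx{settings['namenode_heap']}",
        },
    }


for filename, values in derive_config(SETTINGS).items():
    print(filename, values)
```

The point of the design is that moving to another environment means editing only the settings dictionary; the mapping to individual configuration files stays the same.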


• Can be used to set up any kind of Hadoop cluster

• When doing a major version upgrade, minimize the downtime

– hadoop1: Namenode, Secondarynamenode

– hadoop2: Active/Standby Namenode, Journalnodes, ZKFC

Benefits


Release Engineering


• Build the src tarball in hadoop-common

• Build RPMs in Bigtop

• Submit the build to the release yum repo

• yum update on the Hadoop cluster…

Manually


• Set up a hadoop-common daily build

• Set up a Bigtop release build

– should be manually triggered

• Set up a Hadooppet daily build

– Run sanity checks on a REAL CLUSTER

Continuous Integration


• Build a XenServer cluster

Virtualization


• PyCon 2012

– Small Python Tools for Software Release Engineering

• An automation tool to manage the VM lifecycle

• Uses the Python XenAPI

• Create a temporary VM for testing through self-service

• Destroy it when the testing is finished

give-me-vm


• ./give_me_vm.py

• Set up passphraseless SSH between the VMs

• Set the hostnames

• Install Hadooppet on the master

• Run the deployment

• Run the sanity checks

• ./destroy_vm.py

Build auto deployment on Hadooppet
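One property this workflow needs is that the temporary VMs are destroyed even when deployment or a sanity check fails. The sketch below shows one way to get that with `try`/`finally`. The function name `run_pipeline` is hypothetical, and the steps are passed in as plain callables because the slides do not show the arguments of `give_me_vm.py` or `destroy_vm.py`.

```python
def run_pipeline(steps, teardown):
    """Run named steps in order; always call teardown afterwards.

    steps is a list of (name, callable) pairs. If a step raises, the
    exception propagates to the caller after teardown has run.
    """
    completed = []
    try:
        for name, step in steps:
            step()
            completed.append(name)
    finally:
        # Corresponds to ./destroy_vm.py: free the VMs no matter what.
        teardown()
    return completed


# Stand-in steps that only log; a real job would call the scripts and
# Hadooppet here, for example through subprocess.run(..., check=True).
log = []
steps = [
    ("create VMs", lambda: log.append("give_me_vm")),
    ("deploy", lambda: log.append("deploy")),
    ("sanity checks", lambda: log.append("sanity")),
]
print(run_pipeline(steps, lambda: log.append("destroy_vm")))
print(log)
```

Putting the teardown in `finally` keeps failed daily builds from leaking VMs on the virtualization cluster.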


Development Environment

For hadoop service developers…

• Not enough Hadoop clients for every developer

• Developers cannot reach the server side while developing Hadoop-related services

• Cannot experiment with new technologies like Impala, Spark, and Flume

• CI on Hadoop-related services


give-me-vm + Hadoop all-in-one VM

• Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template

• Get a Hadoop all-in-one VM via give-me-vm

• Services integrate their CI tests with the Hadoop all-in-one VM


Bigtop puppet

Bigtop puppet

• Bigtop also has a set of Puppet scripts to deploy the Hadoop ecosystem


Bigtop puppet

• Preparation:

– A VM with the JDK and Puppet installed

– mkdir -p /data/{1,2}

– git clone https://github.com/apache/bigtop.git
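The preparation steps can also be written as a small idempotent script, which is handy when the VM is recreated often. This is a minimal sketch: the function name `prepare`, the `base` parameter, and the `bigtop` target directory are assumptions for illustration. Passing `/` as `base` yields the `/data/1` and `/data/2` directories from the slide.

```python
import subprocess
from pathlib import Path

BIGTOP_REPO = "https://github.com/apache/bigtop.git"


def prepare(base, clone=False):
    """Create <base>/data/1 and <base>/data/2; optionally clone Bigtop."""
    base = Path(base)
    data_dirs = [base / "data" / str(i) for i in (1, 2)]
    for d in data_dirs:
        # Same effect as `mkdir -p`: no error if the directory exists.
        d.mkdir(parents=True, exist_ok=True)
    if clone:
        # Requires git on the PATH and network access.
        subprocess.run(
            ["git", "clone", BIGTOP_REPO, str(base / "bigtop")], check=True
        )
    return data_dirs
```

Running `prepare("/", clone=True)` as a user with write access to `/` would reproduce the two shell commands; using a scratch directory as `base` lets the same function be tried without touching the root filesystem.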


• Many great deployment tools exist

– Ambari, CM, ETU appliance

– Choose a distribution that suits your business needs

• If you want to do it yourself

– Bigtop can easily do the packaging for you

– Leverage the Bigtop puppet modules for your deployment

Conclusion


Questions?

Thank you!