Bare metal Hadoop provisioning
Transcript of Bare metal Hadoop provisioning
GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
Bare metal Hadoop provisioning
Kris Geusebroek, Big Data Hacker
With Ansible and Cobbler
1
-- Big Data Borat
“Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata”
2
-- Kris Geusebroek
“We’re hiring”
3
Don’t want to...
Manually install everything needed for a Hadoop cluster...
4
Separate layers...
- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raise ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)
5
Want...
- Horizontal scaling: effort for an extra machine is minimal
- Commodity, industry-standard hardware
  - So we have to cope with errors, malfunctioning, re-installation
- Multiple clusters
- Experiment first to find the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes
6
Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (masternode, datanode etc.)
7
Possible with...
Vendor-specific tools
The problem here is that they can do only a subset of all tasks
8
What we have done here...
Nothing new, just another possibility
Nothing tool-specific
- The demo installs Cloudera Manager, but it also works with Hortonworks Data Platform.
Most important is:
9
Stack...
10
-- Big Data Borat
“Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)”
12
Cobbler...
Cobbler used for
- CMS
- DHCP server
- OS image hosting
- OS kickstart
cobblerd.org
13
Ansible...
Ansible used for
- Tying it all together
- Initial setup of network config
- One-time push of SSH key
- Full software install
ansible.cc
14
Cloudera Manager...
Cloudera Manager used for
- Cluster install software
- Currently manual labour, but can be automated using the API
cloudera.com
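As a rough idea of what "automated using the API" could look like, here is a minimal sketch that only builds (and does not send) a request to list the clusters known to Cloudera Manager. The port 7180, the `/api/v1/clusters` path, HTTP basic auth, and the `admin`/`admin` credentials are assumptions about a typical default install, not something shown in the talk; `build_clusters_request` is a hypothetical helper name.

```python
import base64
import urllib.request


def build_clusters_request(host, username, password, port=7180, api_version="v1"):
    """Build (but do not send) a GET request listing clusters in Cloudera Manager.

    Assumptions: the REST API listens on `port`, exposes /api/<version>/clusters,
    and accepts HTTP basic authentication.
    """
    url = f"http://{host}:{port}/api/{api_version}/clusters"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    # node01 hosts Cloudera Manager in the demo; the credentials are placeholders.
    req = build_clusters_request("node01", "admin", "admin")
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would only work against a running Cloudera Manager, so the sketch stops at building it.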
15
Show me the code...
Add node information to the Cobbler CMS
First make the install DVD known to Cobbler:
    mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
    cobbler import --path=/mnt/dvd --name=CentOS64
Next make the node information known:
    sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
If needed, re-enable the netboot flag:
    sudo cobbler system edit --name=node01 --netboot-enabled=True
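With more than a handful of nodes, typing the `cobbler system add` command per machine gets tedious. A minimal sketch, assuming a hand-maintained node table: it renders one command per node using only the flags shown above. The `NODES` list, its MAC addresses, and the helper names are hypothetical placeholders.

```python
import shlex

# Hypothetical node table: (name, MAC address, IP address).
# The MAC addresses are placeholders, not real hardware.
NODES = [
    ("node01", "00:00:00:00:00:01", "10.20.0.101"),
    ("node02", "00:00:00:00:00:02", "10.20.0.102"),
    ("node03", "00:00:00:00:00:03", "10.20.0.103"),
]


def cobbler_system_add(name, mac, ip_address, profile="CentOS64-x86_64"):
    """Return the argument list that registers one node with Cobbler."""
    return [
        "sudo", "cobbler", "system", "add",
        f"--name={name}",
        f"--profile={profile}",
        f"--hostname={name}",
        f"--mac={mac}",
        f"--ip-address={ip_address}",
        "--static=True",
    ]


def render_commands(nodes):
    """Render one shell-quoted command line per node."""
    return [shlex.join(cobbler_system_add(*node)) for node in nodes]


if __name__ == "__main__":
    for line in render_commands(NODES):
        print(line)
```

The sketch only prints the commands, so they can be reviewed before being run on the Cobbler host.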
16
Show me the code...
Ansible needs to know what goes where
    [cluster]
    node01
    node02
    node03

    [cobbler]
    cobbler

    [proxy]
    cobbler

    [ganglia-master]
    node01

    [ganglia-nodes:children]
    cluster

    [cloudera-manager]
    node01
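Since multiple clusters are a stated goal, the inventory itself can be generated instead of written by hand. A minimal sketch, assuming the group-to-hosts mapping is kept in a plain dictionary: `render_inventory` and `GROUPS` are hypothetical names, and the groups simply mirror the inventory above.

```python
# Group-to-members mapping mirroring the inventory on this slide.
GROUPS = {
    "cluster": ["node01", "node02", "node03"],
    "cobbler": ["cobbler"],
    "proxy": ["cobbler"],
    "ganglia-master": ["node01"],
    "ganglia-nodes:children": ["cluster"],
    "cloudera-manager": ["node01"],
}


def render_inventory(groups):
    """Render an INI-style Ansible inventory from {group: [members]}."""
    sections = []
    for group, members in groups.items():
        # One "[group]" header followed by one member per line.
        sections.append("\n".join([f"[{group}]", *members]))
    # Blank line between sections, single trailing newline at the end.
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(render_inventory(GROUPS), end="")
```

A second cluster then only needs a second dictionary, not a second hand-edited file.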
17
Show me the code...
For the rest it’s just a DSL thingy with extras
    - hosts:
      - cloudera-manager
      - cluster
      user: root
      sudo: yes
      vars_files:
      - vars/common.yml
      tasks:
      - include: cloudera-manager/tasks/common.yml
      handlers:
      - include: cloudera-manager/handlers/main.yml

    - name: Configure CM4 Repo
      copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root

    - name: Install CM4 common stuff
      yum: name=$item state=installed
18
Demo...
19
Shared problems...
- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and different RAID settings are not handled (yet)
- Large amount of initial network traffic with large clusters (N times downloading the same software packages from yum repositories) => repo mirroring to the rescue
- MAC addresses of all nodes must be known
20
Takeaways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy
21
We’re hiring / Questions? / Thank you!
Kris Geusebroek, Big Data Hacker
22