Hadoop operations


Transcript of Hadoop operations

Page 1: Hadoop Operations: Starting Out Small

So Your Cluster Isn't Yahoo-sized (yet)

Michael Arnold, Principal Systems Engineer
14 June 2012

Page 2: Agenda

Who

What (Definitions)

Decisions for Now

Decisions for Later

Lessons Learned


Page 3: Who

Page 4: Who is Apollo?

Apollo Group is a leading provider of higher education programs for working adults.

Page 5: Who is Michael Arnold?

Systems Administrator

Automation geek

13 years in IT

I deal with:

–Server hardware specification/configuration

–Server firmware

–Server operating system

–Hadoop application health

–Monitoring all the above


Page 6: What (Definitions)

Page 7: Definitions

Q: What is a tiny/small/medium/large cluster?

A:

–Tiny: 1-9 nodes

–Small: 10-99 nodes

–Medium: 100-999 nodes

–Large: 1000+ nodes

–Yahoo-sized: 4000 nodes


Page 8: Definitions

Q: What is a “headnode”?

A: A server that runs one or more of the following Hadoop processes:

–NameNode

–JobTracker

–Secondary NameNode

–ZooKeeper

–HBase Master


Page 9: Decisions for Now

What decisions should you make now, and which can you postpone until later?

Page 10: Which Hadoop distribution?

Amazon

Apache

Cloudera

Greenplum

Hortonworks

IBM

MapR

Platform Computing


Page 11: Should you virtualize?

Can be OK for small clusters, BUT:

–virtualization adds overhead

–can cause performance degradation

–cannot take advantage of Hadoop rack locality

Virtualization can be good for:

–functional testing of M/R job or workflow changes

–evaluation of Hadoop upgrades


Page 12: What sort of hardware should you be considering?

Inexpensive

Not “enterprisey” hardware

–No RAID*

–No redundant power*

Low power consumption

No optical drives

–get systems that can boot off the network

* except in headnodes


Page 13: Plan for capacity expansion

Start at the bottom and work your way up

Leave room in your cabinets for more machines


Page 14: Plan for capacity expansion (cont.)

Deploy your initial cluster in two cabinets

–One headnode, one switch, and several (five) datanodes per cabinet


Page 15: Plan for capacity expansion (cont.)

Install a second cluster in the empty space in the upper half of the cabinets


Page 16: Decisions for Later

What decisions should you make now, and which can you postpone until later?

Page 17: What size cluster?

Depends upon your:

Budget

Data size

Workload characteristics

SLA


Page 18: What size cluster? (cont.)

Are your MapReduce jobs:

compute-intensive?

reading lots of data?

http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
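As a rough, purely illustrative calculation (all numbers hypothetical): 10 TB of raw data at 3x HDFS replication, plus roughly 25% headroom for intermediate and scratch data, needs about 37.5 TB of raw disk; at 12 TB of disk per datanode, that is about four datanodes before you account for growth.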


Page 19: Should you implement rack awareness?

If more than one switch in the cluster:

YES
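A minimal sketch of a topology script (the map-file format, paths, and hostnames here are illustrative): CDH3-era Hadoop points at the script via the topology.script.file.name property in core-site.xml, passes it IPs/hostnames as arguments, and expects one rack path per line of output.

#!/bin/sh
# Look each argument up in a flat host-to-rack map; print one rack per line.
MAP=/etc/hadoop/conf/topology.map   # illustrative; lines like: 10.1.1.21 /rack1
for host in "$@"; do
  rack=$(awk -v h="$host" '$1 == h { print $2 }' "$MAP")
  echo "${rack:-/default-rack}"     # unknown hosts fall back to a default rack
done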


Page 20: Should you use automation?

If not in the beginning, then as soon as possible.

Boot disks will fail.

Automated OS and application installs:

–Save time

–Reduce errors

•Cobbler/Spacewalk/Foreman/xCat/etc

•Puppet/Chef/Cfengine/shell scripts/etc
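For example, a sketch of registering a new datanode in Cobbler so it kickstarts itself over PXE (the system name, profile name, and MAC address are placeholders):

cobbler system add --name=dn11 --profile=rhel5-hadoop-datanode --mac=00:11:22:33:44:55
cobbler sync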


Page 21: Lessons Learned


Page 22: Keep It Simple

Don't add redundancy and features (server/network) that will make things more complicated and expensive.

Hadoop has built-in redundancies. Don't overlook them.


Page 23: Automate the Hardware

Twelve hours of manual work in the datacenter is not fun.

Make sure all server firmware is configured identically.

–HP SmartStart Scripting Toolkit

–Dell OpenManage Deployment Toolkit

–IBM ServerGuide Scripting Toolkit


Page 24: Rolling upgrades are possible

(Just not of the Hadoop software.)

Datanodes can be decommissioned, patched, and added back into the cluster without service downtime.
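A sketch of the HDFS side with CDH3-era commands (the exclude-file path depends on your dfs.hosts.exclude setting; the hostname is illustrative):

# 1. Add the node to the HDFS exclude file and tell the NameNode
echo 'dn01.delta.hadoop.apollogrp.edu' >> /etc/hadoop/conf/hosts.exclude
hadoop dfsadmin -refreshNodes
# 2. Wait for the node to report "Decommissioned", then patch and reboot it
# 3. Remove it from the exclude file and refresh again to rejoin the cluster
hadoop dfsadmin -refreshNodes
# (MapReduce has an analogous exclude via mapred.hosts.exclude and
#  hadoop mradmin -refreshNodes)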


Page 25: The smallest thing can have a big impact on the cluster

Bad NIC/switchport can cause cluster slowness.

Slow disks can cause intermittent job slowdowns.


Page 26: HDFS blocks are weird

On ext3/ext4:

–Blocks smaller than the HDFS block size are not padded out to the full block size; they occupy only the actual size of the data.

–Each HDFS block is actually two files on the datanode's filesystem:

•The actual data and

•A metadata/checksum file


# ls -l blk_1058778885645824207*
-rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
-rw-r--r-- 1 hdfs hdfs   283 May 14 01:26 blk_1058778885645824207_19155994.meta

Page 27: Do not prematurely optimize

Be careful tuning your datanode filesystems. (-T largefile4 allocates very few inodes, and each HDFS block consumes two files, so you can run out of inodes long before you run out of disk space.)

• mkfs -t ext4 -T largefile4 ... (probably bad)

• mkfs -t ext4 -i 131072 -m 0 ... (better)


/etc/mke2fs.conf:

[fs_types]
    hadoop = {
        features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
        inode_ratio = 131072
        blocksize = -1
        reserved_ratio = 0
        default_mntopts = acl,user_xattr
    }
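With a stanza like that in /etc/mke2fs.conf, every datanode filesystem can then be built consistently with one short command (the device name here is illustrative):

mkfs -t ext4 -T hadoop /dev/sdb1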

Page 28: Use DNS-friendly names for services

hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/

mapred.delta.hadoop.apollogrp.edu:8021

http://oozie.delta.hadoop.apollogrp.edu:11000/

hiveserver.delta.hadoop.apollogrp.edu:10000

Yes, the names are long, but I bet you can figure out how to connect to Bravo Cluster.
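(Presumably, for example, hdfs://hdfs.bravo.hadoop.apollogrp.edu:8020/ — only the cluster name changes.)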


Page 29: Use a parallel, remote execution tool

pdsh/Cluster SSH/mussh/etc

SSH in a for loop is so 2010

FUNC/MCollective
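For example, checking every datanode in one shot with pdsh (the hostnames and range are illustrative):

pdsh -w 'dn[01-10].delta.hadoop.apollogrp.edu' uptime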


Page 30: Make your log directories as large as you can

20-100GB /var/log

–Implement log-purging cron jobs (see the sketch below) or your log directories will fill up.

Beware: M/R jobs can fill up /tmp as well.
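A minimal purging sketch as a cron.d entry (paths, file pattern, and retention are illustrative; tune them to your cluster):

# /etc/cron.d/hadoop-log-purge
0 3 * * * root find /var/log/hadoop -type f -name '*.log.*' -mtime +14 -delete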


Page 31: Insist on IPMI 2.0 for out-of-band management of server hardware

Serial Over LAN is awesome when booting a system.

Standardized hardware/temperature monitoring.

Simple remote power control.
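All three points map to ipmitool one-liners (the BMC hostname and credentials are placeholders):

ipmitool -I lanplus -H dn01-bmc -U admin -P secret sol activate          # serial console
ipmitool -I lanplus -H dn01-bmc -U admin -P secret sdr type Temperature  # sensors
ipmitool -I lanplus -H dn01-bmc -U admin -P secret chassis power cycle   # remote power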


Page 32: Spanning-tree is the devil

Enable portfast on your server switch ports or the BMCs may never get a DHCP lease.
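On a Cisco IOS switch, for instance, that is one config-mode command per port (the interface range is illustrative; other vendors have equivalents):

interface range GigabitEthernet0/1 - 24
 spanning-tree portfast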


Page 33: Apollo has rebuilt its cluster four times

You may end up doing so as well.

Page 34: Apollo Timeline

First build

Cloudera Professional Services helped install CDH

Four nodes

Manually built the OS via USB CD-ROM.

CDH2


Page 35: Apollo Timeline

Second build

Cobbler

All software deployment is via Kickstart; very little is in Puppet. Config files are deployed via wget.

CDH2


Page 36: Apollo Timeline

Third build

OS filesystem partitioning needed to change.

Most software deployment still via kickstart.

CDH3b2


Page 37: Apollo Timeline

Fourth build

HDFS filesystem inodes needed to be increased.

Full puppet automation.

Added redundant/hotswap enterprise hardware for headnodes.

CDH3u1


Page 38: Cluster failures at Apollo

Hardware

–disk failures (40+)

–disk cabling (6)

–RAM (2)

–switch port (1)

Software

–Cluster

•NFS (NN -> 2NN metadata)

–Job

•TT java heap

•Running out of /tmp or /var/log/hadoop

•Running out of HDFS space


Page 39: Know your workload

You can spend all the time in the world trying to get the best CPU/RAM/HDD/switch/cabinet configuration, but you are running on pure luck until you understand your cluster's workload.


Page 40: Questions?

© 2012 Apollo Group