Hadoop - Lessons Learned


@tcurdt

github.com/tcurdt

yourdailygeekery.com

hiring

Agenda

· hadoop? really? cloud?

· integration

· mapreduce

· operations

· community and outlook

Why Hadoop?

“It is a new and improved version of enterprise tape drive”

20 machines

20 files, 1.5 GB each

[Bar chart comparing grep “needle” file with hadoop job grep.jar; axis from 0 to 70, unit not recoverable]

unfair

Map Reduce

Run your own?

http://bit.ly/elastic-mr-pig



Integration

black box

· hadoop-cat

· hadoop-grep

· hadoop-range --prefix /logs --from 2012-05-15 --until 2012-05-22 --postfix /*play*.seq | xargs hadoop jar

· streaming jobs
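The hadoop-range helper above appears to turn a date window into a list of input paths for a job. A minimal sketch of that idea, assuming one directory per day named yyyy-MM-dd under the prefix and an inclusive end date (both are assumptions, and DateRangePaths is a hypothetical name, not the tool from the talk):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class DateRangePaths {
    /**
     * Expands a date window into one path (or glob) per day.
     * Assumed layout: prefix/yyyy-MM-dd followed by postfix, end date inclusive.
     */
    static List<String> expand(String prefix, LocalDate from, LocalDate until, String postfix) {
        List<String> paths = new ArrayList<>();
        for (LocalDate day = from; !day.isAfter(until); day = day.plusDays(1)) {
            // LocalDate.toString() yields the ISO form, e.g. 2012-05-15
            paths.add(prefix + "/" + day + postfix);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = expand("/logs",
                LocalDate.parse("2012-05-15"), LocalDate.parse("2012-05-22"), "/*play*.seq");
        // One path per line, ready to be piped into xargs
        paths.forEach(System.out::println);
    }
}
```

Printing one path per line keeps the output easy to feed into xargs, as in the command above.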

Engineers

· mount hdfs

· pig / hive

· data dumps

Non-Engineering Folks

Map Reduce

[Diagram: MapReduce data flow. HDFS files are read through an InputFormat into Splits; each Split passes through Map, Combiner and Sort; a Partitioner routes the map output; after Copy and Merge (with a further Combiner pass) the Reducers write through an OutputFormat.]

MAPREDUCE-346 (since 2009)

12/05/25 01:27:38 INFO mapred.JobClient: Reduce input records=106
..
12/05/25 01:27:38 INFO mapred.JobClient: Combine output records=409
12/05/25 01:27:38 INFO mapred.JobClient: Map input records=112705844
12/05/25 01:27:38 INFO mapred.JobClient: Reduce output records=4
12/05/25 01:27:38 INFO mapred.JobClient: Combine input records=64842079
..
12/05/25 01:27:38 INFO mapred.JobClient: Map output records=64841776

map in      : 112705844 *********************************
map out     : 64841776 *****************
combine in  : 64842079 *****************
combine out : 409 |
reduce in   : 106 |
reduce out  : 4 |

Job Counters
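The bar view of the counters above can be produced from the JobClient log with very little code. A minimal sketch, assuming the log lines have the "Map input records=N" shape shown above; JobCounterBars and its scaling rule are hypothetical and not the script used for the slides:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobCounterBars {
    // Matches the tail of a JobClient line such as "Map input records=112705844"
    private static final Pattern COUNTER =
            Pattern.compile("(Map|Combine|Reduce) (input|output) records=(\\d+)");

    /** Extracts the record counters, keyed like "map in" or "combine out". */
    static Map<String, Long> parse(String log) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(log);
        while (m.find()) {
            String stage = m.group(1).toLowerCase();
            String direction = m.group(2).equals("input") ? " in" : " out";
            counters.put(stage + direction, Long.parseLong(m.group(3)));
        }
        return counters;
    }

    /** Renders one line per stage in pipeline order; the largest counter gets width stars. */
    static String render(Map<String, Long> counters, int width) {
        String[] order = {"map in", "map out", "combine in", "combine out", "reduce in", "reduce out"};
        long max = 1;
        for (long value : counters.values()) max = Math.max(max, value);
        StringBuilder out = new StringBuilder();
        for (String key : order) {
            Long value = counters.get(key);
            if (value == null) continue;            // counter missing from the log
            char[] stars = new char[(int) (value * width / max)];
            Arrays.fill(stars, '*');
            String bar = stars.length == 0 ? "|" : new String(stars);
            out.append(String.format("%-11s : %d %s%n", key, value, bar));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String log = "INFO mapred.JobClient: Map input records=112705844\n"
                + "INFO mapred.JobClient: Map output records=64841776\n"
                + "INFO mapred.JobClient: Combine output records=409\n";
        System.out.print(render(parse(log), 33));
    }
}
```

A large drop between "combine in" and "combine out" suggests the combiner is doing useful work; identical numbers suggest it is not.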

map in      : 20000 **************
map out     : 40000 ******************************
combine in  : 40000 ******************************
combine out : 10001 ********
reduce in   : 10001 ********
reduce out  : 10001 ********

Job Counters

mapred.reduce.tasks = 0

Map-only

public class EofSafeSequenceFileInputFormat<K,V> extends SequenceFileInputFormat<K,V> {
  ...
}

public class EofSafeRecordReader<K,V> extends RecordReader<K,V> {
  ...
  public boolean nextKeyValue() throws IOException, InterruptedException {
    try {
      return this.delegate.nextKeyValue();
    } catch(EOFException e) {
      return false;
    }
  }
  ...
}

EOF on append

Serialization

before: ASN.1, custom Java serialization, Thrift

now: protobuf

public static class Play extends CustomWritable {
  public final LongWritable time = new LongWritable();
  public final LongWritable owner_id = new LongWritable();
  public final LongWritable track_id = new LongWritable();

  public Play() {
    fields = new WritableComparable[] { owner_id, track_id, time };
  }
}

Custom Writables

BytesWritable bytes = new BytesWritable();
...
byte[] buffer = bytes.getBytes();

Fear the State
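The trap with getBytes() on a reused buffer is that it hands back the whole backing array, of which only the first getLength() bytes are valid. A minimal sketch of the effect, using a hypothetical stand-in class (ReusableBytes is not the Hadoop BytesWritable, it only imitates the grow-but-never-shrink behaviour so the example runs without Hadoop):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BackingBufferDemo {
    /** Hypothetical stand-in for a reusable buffer; not the Hadoop class. */
    static final class ReusableBytes {
        private byte[] data = new byte[0];
        private int length = 0;

        void set(byte[] src) {
            if (src.length > data.length) data = new byte[src.length]; // grows, never shrinks
            System.arraycopy(src, 0, data, 0, src.length);
            length = src.length;
        }

        byte[] getBytes() { return data; }    // whole backing array, may hold stale bytes
        int getLength() { return length; }    // number of valid bytes
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] validBytes(ReusableBytes bytes) {
        return Arrays.copyOf(bytes.getBytes(), bytes.getLength());
    }

    public static void main(String[] args) {
        ReusableBytes bytes = new ReusableBytes();
        bytes.set("needle-in-haystack".getBytes(StandardCharsets.US_ASCII));
        bytes.set("hay".getBytes(StandardCharsets.US_ASCII));
        // prints "haydle-in-haystack": the tail of the previous record is still there
        System.out.println(new String(bytes.getBytes(), StandardCharsets.US_ASCII));
        // prints "hay"
        System.out.println(new String(validBytes(bytes), StandardCharsets.US_ASCII));
    }
}
```

With the real class the same rule applies: read at most getLength() bytes from the array returned by getBytes().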

public void reduce(LongTriple key, Iterable<LongWritable> values, Context ctx) {
  for(LongWritable v : values) { }   // first pass consumes the values
  for(LongWritable v : values) { }   // second pass: nothing left to iterate
}

public void reduce(LongTriple key, Iterable<LongWritable> values, Context ctx) {
  buffer.clear();
  for(LongWritable v : values) { buffer.add(v); }   // single pass, keep the values
  for(LongWritable v : buffer.values()) { }         // re-iterate over the buffer
}

Re-Iterate

HADOOP-5266 (applied to 0.21.0)
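One detail worth spelling out when buffering reducer values: Hadoop reuses the value object between iteration steps, so a buffer has to copy the payload rather than keep the reference. A minimal sketch of the difference, using hypothetical stand-ins (MutableLong and reusing() imitate the reuse so the example runs without Hadoop; how the buffer in the slide copies is not shown in the talk):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ReusedValueDemo {
    /** Hypothetical mutable holder, standing in for a Writable such as LongWritable. */
    static final class MutableLong {
        long value;
    }

    /** One-shot Iterable that hands out the same instance on every step. */
    static Iterable<MutableLong> reusing(long... values) {
        MutableLong shared = new MutableLong();
        Iterator<MutableLong> it = new Iterator<MutableLong>() {
            private int next = 0;
            @Override public boolean hasNext() { return next < values.length; }
            @Override public MutableLong next() { shared.value = values[next++]; return shared; }
        };
        return () -> it;   // always the same iterator, so a second loop sees nothing
    }

    /** Buggy: stores the same reference n times. */
    static List<Long> bufferByReference(Iterable<MutableLong> values) {
        List<MutableLong> buffer = new ArrayList<>();
        for (MutableLong v : values) buffer.add(v);
        List<Long> out = new ArrayList<>();
        for (MutableLong v : buffer) out.add(v.value);
        return out;
    }

    /** Safe: copies the payload out of the reused object. */
    static List<Long> bufferByCopy(Iterable<MutableLong> values) {
        List<Long> buffer = new ArrayList<>();
        for (MutableLong v : values) buffer.add(v.value);
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(bufferByReference(reusing(1, 2, 3)));  // prints [3, 3, 3]
        System.out.println(bufferByCopy(reusing(1, 2, 3)));       // prints [1, 2, 3]
    }
}
```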

long min = 1;
long max = 10000000;

FastBitSet set = new FastBitSet(min, max);

for(long i = min; i<max; i++) {
  set.set(i);
}

BitSets

org.apache.lucene.util.*BitSet
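For readers without the FastBitSet class from the slide, the same idea can be tried with the JDK alone. A minimal sketch, assuming ids fall into a known [min, max] range that fits into int-sized indexes; OffsetBitSet is a hypothetical name and java.util.BitSet is swapped in for the Lucene and custom bit sets mentioned above:

```java
import java.util.BitSet;

final class OffsetBitSet {
    private final long min;
    private final long max;
    private final BitSet bits;

    /** Covers ids in [min, max]; assumes max - min does not overflow a long. */
    OffsetBitSet(long min, long max) {
        if (max < min || max - min >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("range must fit into int-sized indexes");
        }
        this.min = min;
        this.max = max;
        this.bits = new BitSet((int) (max - min + 1));
    }

    void set(long id) {
        if (id < min || id > max) throw new IndexOutOfBoundsException("id out of range: " + id);
        bits.set((int) (id - min));   // shift so that min maps to bit 0
    }

    boolean get(long id) {
        return id >= min && id <= max && bits.get((int) (id - min));
    }

    int cardinality() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        long min = 1;
        long max = 10000000;
        OffsetBitSet set = new OffsetBitSet(min, max);
        for (long i = min; i < max; i++) {
            set.set(i);
        }
        System.out.println(set.cardinality());
    }
}
```

Ten million ids need about 1.25 MB as bits, which is why a bit set can be attractive for membership tests inside a task.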

Data Structures

http://bit.ly/data-structures

http://bit.ly/bloom-filters

http://bit.ly/stream-lib
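As a companion to the Bloom filter link above, here is a minimal sketch of the data structure itself. It is an illustration only, assuming long ids and double hashing over a 64-bit mixing function; SimpleBloomFilter is a hypothetical class and not taken from the linked libraries:

```java
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;     // number of bits
    private final int hashes;   // number of probes per id

    SimpleBloomFilter(int size, int hashes) {
        if (size <= 0 || hashes <= 0) throw new IllegalArgumentException("size and hashes must be positive");
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    void add(long id) {
        for (int i = 0; i < hashes; i++) bits.set(position(id, i));
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(long id) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(id, i))) return false;
        }
        return true;
    }

    // Double hashing: the i-th probe is h1 + i * h2, reduced to [0, size)
    private int position(long id, int i) {
        long h1 = mix(id);
        long h2 = mix(h1 ^ 0x9E3779B97F4A7C15L) | 1L;
        return (int) Math.floorMod(h1 + i * h2, (long) size);
    }

    // 64-bit mixing function in the style of the SplitMix64 finalizer
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 4);
        for (long id = 0; id < 1000; id++) filter.add(id);
        System.out.println(filter.mightContain(42));   // prints true, added ids are never missed
    }
}
```

The filter can report false positives but never false negatives, which makes it suitable for cheaply discarding records that cannot match.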

General Tips

· test on small datasets, test on your machine

· many reducers

· always consider a combiner and partitioner

· pig / streaming for one-time jobs, java / scala for recurring
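To make the combiner and partitioner tip concrete, here is a minimal sketch of what the two steps do, written against plain collections so it runs without Hadoop. CombinerSketch is a hypothetical name; the partition formula (non-negative hash modulo the number of reducers) is the usual hash-partitioning rule, and the combine step assumes a simple count per key:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class CombinerSketch {
    /** Hash partitioning: non-negative hash code modulo the number of reducers. */
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Map-side pre-aggregation: collapses one record per occurrence into one record per key. */
    static Map<String, Long> combine(Iterable<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) counts.merge(key, 1L, Long::sum);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> combined = combine(Arrays.asList("play", "skip", "play", "play"));
        // 4 map output records shrink to 2 records before the shuffle
        System.out.println(combined);   // prints {play=3, skip=1}
        for (String key : combined.keySet()) {
            System.out.println(key + " -> reducer " + partition(key, 4));
        }
    }
}
```

The fewer records leave the combiner, the less data has to be copied and merged on the reduce side.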

http://bit.ly/map-reduce-book



Operations

pdsh -w "hdd[001-019]" \
  "sudo sv restart /etc/sv/hadoop-tasktracker"

runit / init.d

pdsh / dsh

use chef / puppet

Hardware

· 2x name nodes raid 1

· 12 cores, 48GB RAM, xfs, 2x1TB

· n x data nodes no raid

· 12 cores, 16GB RAM, xfs, 4x2TB

Monitoring

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=...

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=...

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=...

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=...

# ignore
ugi.class=org.apache.hadoop.metrics.spi.NullContext

Monitoring

[Chart: total capacity vs. capacity used]

Compression

# of 64MB blocks
# of bytes needed
# of bytes used
# bytes reclaimed

bzip2 / gzip / lzo / snappy

io.seqfile.compression.type = BLOCK
io.seqfile.compression.blocksize = 512000

Janitor

hadoop-expire -url namenode.here -path /tmp -mtime 7d -delete

“The last block of an HDFS file only occupies the required space. So a 4k file only consumes 4k on disk.” -- Owen

BUSTED

# the patterns are alternatives, so they are grouped with -o;
# -daystart has to precede -mtime to affect it
find /var/log/hadoop \
  \( -wholename "/var/log/hadoop/hadoop-*" \
  -o -wholename "/var/log/hadoop/job_*.xml" \
  -o -wholename "/var/log/hadoop/history/*" \
  -o -wholename "/var/log/hadoop/history/\\.*.crc" \
  -o -wholename "/var/log/hadoop/history/done/*" \
  -o -wholename "/var/log/hadoop/history/done/\\.*.crc" \
  -o -wholename "/var/log/hadoop/userlogs/attempt_*" \) \
  -daystart \
  -mtime +7 \
  -delete

Logfiles

Limits

limits.conf:

hdfs hard nofile 128000
hdfs soft nofile 64000
mapred hard nofile 128000
mapred soft nofile 64000

sysctl.conf:

fs.file-max = 128000

Localhost

before:

127.0.0.1 localhost localhost.localdomain
127.0.1.1 hdd01

hadoop:

127.0.0.1 localhost localhost.localdomain
127.0.1.1 hdd01.some.net hdd01

Rackaware

site config:

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/script/location-from-ip</value>
  <final>true</final>
</property>

topology script:

#!/usr/bin/ruby
location = {
  'hdd001.some.net' => '/ams/1', '10.20.2.1' => '/ams/1',
  'hdd002.some.net' => '/ams/2', '10.20.2.2' => '/ams/2',
}

# look up each argument on its own, not only the first one
puts ARGV.map { |ip| location[ip] || '/default-rack' }.join(' ')

for f in `hdfs hadoop fsck / | grep "Replica placement policy is violated" | awk -F: '{print $1}' | sort | uniq | head -n1000`; do
  hadoop fs -setrep -w 4 $f
  hadoop fs -setrep 3 $f
done

Fix the Policy

hadoop fsck / -openforwrite -files |
  grep -i "OPENFORWRITE: MISSING 1 blocks of total size" |
  awk '{print $1}' |
  xargs -L 1 -i hadoop dfs -mv {} /lost+notfound

Fsck

Community

[Chart: "hadoop"; data from markmail.org]

Community

The Enterprise Effect

“The Community Effect” (in 2011)

Community

[Chart: "mapreduce" and "core"; data from markmail.org]

The Future

· real time

· incremental

· flexible pipelines

· refined API

· refined implementation

Real Time Datamining and Aggregation at Scale (Ted Dunning)

Eventually Consistent Data Structures (Sean Cribbs)

Real-time Analytics with HBase (Alex Baranau)

Profiling and performance-tuning your Hadoop pipelines (Aaron Beppu)

From Batch to Realtime with Hadoop (Lars George)

Event-Stream Processing with Kafka (Tim Lossen)

Real-/Neartime analysis with Hadoop & VoltDB (Ralf Neeb)

Take Aways

· use hadoop only if you must

· really understand the pipeline

· unbox the black box

@tcurdt

github.com/tcurdt

yourdailygeekery.com

That’s it folks!
