Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Presented By: Sanjay Sharma Advanced Hadoop Tuning and Optimizations Download the Whitepaper: Deriving Intelligence from Large Data Using Hadoop and Applying Analytics at http://bit.ly/cNCCGj

Transcript of Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Page 1: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Presented By:

Sanjay Sharma

Advanced Hadoop Tuning and Optimizations

Download the Whitepaper: Deriving Intelligence from Large Data Using Hadoop and Applying Analytics at http://bit.ly/cNCCGj

Page 2: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Hadoop- The Good/Bad/Ugly

Hadoop is GOOD: that is why we are all here

Hadoop is not BAD: else we would not be here

Hadoop is sometimes UGLY. Why?
  Out-of-the-box configuration is not friendly
  Difficult to debug
  Performance tuning/optimization is a black art


Page 3: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Configuration parameters

Page 4: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Compression

mapred.compress.map.output: Map output compression

Default: false
Pros: Faster disk writes, lower disk space usage, and less time spent transferring data from mappers to reducers.
Cons: CPU overhead for compression at the mappers and decompression at the reducers.
Suggestions: Set this property to true for large clusters and large jobs. The compression codec can be set through the property mapred.map.output.compression.codec (default: org.apache.hadoop.io.compress.DefaultCodec).
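
As a rough illustration, the two properties above could be written out as configuration entries. The sketch below is a small Python helper that renders a dictionary as Hadoop-style <property> elements. The helper name render_properties is hypothetical, and the sketch assumes the output is pasted inside the <configuration> element of the relevant site or job configuration file.

```python
from xml.sax.saxutils import escape

def render_properties(props):
    """Render a dict of property names/values as Hadoop <property> elements.

    Hypothetical helper for illustration; it only builds text and does not
    talk to a cluster.
    """
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(str(value))}</value>")
        lines.append("</property>")
    return "\n".join(lines)

# Property names and the default codec are taken from the slide.
map_output_compression = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec":
        "org.apache.hadoop.io.compress.DefaultCodec",
}

print(render_properties(map_output_compression))
```

A different codec class can be substituted for DefaultCodec if it is installed on every node of the cluster.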


Page 5: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Speculative Execution

mapred.map.tasks.speculative.execution / mapred.reduce.tasks.speculative.execution: Enable or disable speculative execution of map/reduce tasks

Default: true
Pros: Reduces job time when a task progresses slowly because of memory unavailability or hardware degradation.
Cons: Increases job time when a task progresses slowly because of complex and large calculations. On a busy cluster, speculative execution can reduce overall throughput, since redundant tasks are executed in an attempt to bring down the execution time of a single job.
Suggestions: For large jobs where the average task completion time is significant (> 1 hr) because of complex and large calculations, and where high throughput is required, set speculative execution to false.


Page 6: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Number of Maps/Reducers

mapred.tasktracker.map.tasks.maximum / mapred.tasktracker.reduce.tasks.maximum: Maximum number of map/reduce tasks run simultaneously by a TaskTracker

Default: 2
Suggestions:
  Recommended range: (cores_per_node)/2 to 2 x (cores_per_node), especially for large clusters.
  Set this value according to the hardware specification of the cluster nodes and the resource requirements of the tasks (map/reduce).
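
The recommended range is simple arithmetic, sketched below. The function name recommended_slot_range is hypothetical, and the slide does not say whether the range applies to map and reduce slots separately or combined, so the sketch only computes the two bounds.

```python
def recommended_slot_range(cores_per_node):
    """Return (low, high) bounds for task slots on one node.

    Implements the rule of thumb cores/2 to 2 x cores from the slide.
    The lower bound is clamped to 1 so a single-core node still gets a slot.
    """
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be at least 1")
    low = max(1, cores_per_node // 2)
    high = 2 * cores_per_node
    return low, high

for cores in (4, 8, 16):
    low, high = recommended_slot_range(cores)
    print(f"{cores} cores: {low} to {high} task slots per TaskTracker")
```

Where a node lands inside this range depends on how memory-hungry and CPU-bound the tasks are.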


Page 7: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

File block size

dfs.block.size: File system block size

Default: 67108864 (bytes)
Suggestions:
  Small cluster and large data set: the default block size will create a large number of map tasks. For example, with input data size = 160 GB:
    dfs.block.size = 64 MB: minimum no. of maps = (160*1024)/64 = 2560
    dfs.block.size = 128 MB: minimum no. of maps = (160*1024)/128 = 1280
    dfs.block.size = 256 MB: minimum no. of maps = (160*1024)/256 = 640
  In a small cluster (6-10 nodes) the map task creation overhead is considerable, so dfs.block.size should be large in this case, but small enough to utilize all the cluster resources.
  Set the block size according to the size of the cluster, map task complexity, map task capacity of the cluster and average size of the input files.
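
The map-count arithmetic can be reproduced with a few lines of Python. This is a sketch that assumes one map task per block of a single large, splittable input; many small files would produce more maps than this minimum. The function names are hypothetical.

```python
import math

def min_map_tasks(input_size_gb, block_size_mb):
    """Minimum number of map tasks, assuming one map per HDFS block."""
    return math.ceil(input_size_gb * 1024 / block_size_mb)

def block_size_bytes(block_size_mb):
    """Value to use for dfs.block.size, which is expressed in bytes."""
    return block_size_mb * 1024 * 1024

for mb in (64, 128, 256):
    print(f"dfs.block.size = {block_size_bytes(mb)} bytes ({mb} MB): "
          f"{min_map_tasks(160, mb)} maps for 160 GB of input")
```

For the 160 GB example this gives 2560, 1280 and 640 maps, matching the figures on the slide.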


Page 8: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Sort size

io.sort.mb: Buffer size (in MB) for sorting

Default: 100
Suggestions:
  For large jobs (jobs in which the map output is very large), increase this value, keeping in mind that it increases the memory required by each map task. Size the increase according to the memory available on the node.
  The greater the value of io.sort.mb, the fewer the spills to disk, which saves disk writes.
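
To see why a larger buffer means fewer spills, here is a deliberately simplified model. It assumes the buffer spills each time it fills to a threshold fraction (the role of io.sort.spill.percent; the value 0.80 used here is an assumption to check against your version's defaults) and it ignores the part of the buffer reserved for record accounting. The function name is hypothetical.

```python
import math

def estimated_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80):
    """Rough number of spill files written by one map task.

    Simplified model: every spill_percent * io_sort_mb of map output
    triggers one spill. Real behaviour also depends on record counts.
    """
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))

# One map task producing 1000 MB of output, with different buffer sizes.
for buffer_mb in (100, 200, 400):
    print(f"io.sort.mb = {buffer_mb}: "
          f"about {estimated_spills(1000, buffer_mb)} spills")
```

The trade-off is that io.sort.mb is allocated inside each map task's heap, so it has to fit within the child JVM's maximum heap size.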


Page 9: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Sort factor

io.sort.factor: Stream merge factor

Default: 10
Suggestions:
  For large jobs (jobs in which the map output is very large and the number of maps is also large) that have a large number of spills to disk, increase the value of this property.
  The number of input streams (files) merged at once in the map/reduce tasks, as specified by io.sort.factor, should be set to a sufficiently large value (for example, 100) to minimize disk accesses.
  Increasing io.sort.factor also benefits merging at the reducers: the last batch of streams (up to io.sort.factor of them) is sent to the reduce function without a separate merge step, which saves merge time.
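
The effect of the merge factor can be sketched with a simple multi-pass merge model: each pass merges up to io.sort.factor streams into one, and passes repeat until a single stream remains. This is an approximation for illustration (Hadoop's actual merger schedules segments differently), and the function name is hypothetical.

```python
import math

def merge_passes(num_streams, io_sort_factor=10):
    """Number of merge passes needed to reduce num_streams to one stream."""
    if io_sort_factor < 2:
        raise ValueError("io_sort_factor must be at least 2")
    passes = 0
    while num_streams > 1:
        # Each pass merges groups of up to io_sort_factor streams.
        num_streams = math.ceil(num_streams / io_sort_factor)
        passes += 1
    return passes

for streams in (50, 1000):
    for factor in (10, 100):
        print(f"{streams} streams, io.sort.factor = {factor}: "
              f"{merge_passes(streams, factor)} pass(es)")
```

Every extra pass rereads and rewrites the intermediate data, which is why a larger factor reduces disk accesses for jobs with many spills.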


Page 10: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

JVM reuse

mapred.job.reuse.jvm.num.tasks: Number of tasks run per JVM

Default: 1
Suggestions: The minimum overhead of creating a JVM for each task is around 1 second. For tasks that live only for seconds or a few minutes and have lengthy initialization, increase this value to gain performance.
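
A back-of-the-envelope estimate of the JVM startup cost is sketched below. It assumes the roughly 1 second per JVM quoted above, considers only tasks of one job that run back-to-back in a single task slot, and treats -1 as "no limit" on reuse. The function names are hypothetical.

```python
import math

def jvm_launches(num_tasks, reuse_num_tasks=1):
    """JVMs started for num_tasks tasks run sequentially in one slot."""
    if num_tasks <= 0:
        return 0
    if reuse_num_tasks == -1:
        # -1 means a JVM can be reused for an unlimited number of tasks.
        return 1
    return math.ceil(num_tasks / reuse_num_tasks)

def startup_overhead_seconds(num_tasks, reuse_num_tasks=1,
                             jvm_start_seconds=1.0):
    """Total JVM startup time under the assumptions above."""
    return jvm_launches(num_tasks, reuse_num_tasks) * jvm_start_seconds

for reuse in (1, 10, -1):
    print(f"reuse = {reuse}: "
          f"{startup_overhead_seconds(2560, reuse):.0f} s of JVM startup")
```

The saving matters most when tasks are short, since a 1 second startup is a large share of a task that runs for only a few seconds.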


Page 11: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Reduce parallel copies

mapred.reduce.parallel.copies: Threads for parallel copy at the reducer

Default: 5
Description: The number of threads used to copy map outputs to the reducer.
Suggestions: For large jobs (jobs in which the map output is very large), this value can be increased, keeping in mind that it increases total CPU usage.


Page 12: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

The Other Threads

dfs.namenode.handler.count / mapred.job.tracker.handler.count: Server threads that handle remote procedure calls (RPCs)
  Default: 10
  Suggestions: This can be increased for larger servers (50-64).

dfs.datanode.handler.count: Server threads that handle remote procedure calls (RPCs)
  Default: 3
  Suggestions: This can be increased for a larger number of HDFS clients (6-8).

tasktracker.http.threads: Number of worker threads for the HTTP server on each TaskTracker
  Default: 40
  Suggestions: This can be increased for larger clusters (50).


Page 13: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Other hotspots

Page 14: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Revelation-Temporary space

Temporary space allocation:

Jobs that generate large intermediate data (map output) should have enough temporary space, controlled by the property mapred.local.dir. This property specifies the list of directories where MapReduce stores intermediate data for jobs. The data is cleaned up after the job completes.

By default, the replication factor for file storage on HDFS is 3, which means that every file has three replicas. As a rule of thumb, at least 25% of the total hard disk should be allocated for intermediate temporary output. With 25% reserved for temporary data and the remaining 75% holding three replicas of every block, only about ¼ of the raw hard disk space is effectively available for business data.

The default value for mapred.local.dir is ${hadoop.tmp.dir}/mapred/local. So if mapred.local.dir is not set, hadoop.tmp.dir must have enough space to hold the job's intermediate data. If the node does not have enough temporary space, the task attempt fails and a new attempt is started, which hurts performance.
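
The ¼ figure follows from the two numbers in the rule of thumb, as the short sketch below shows. The function names and the 8 TB example node are hypothetical.

```python
def usable_fraction(temp_fraction=0.25, replication=3):
    """Fraction of raw disk left for unique (non-replicated) data."""
    if not 0 <= temp_fraction < 1:
        raise ValueError("temp_fraction must be in [0, 1)")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # Space left after temporary output, shared among all replicas.
    return (1.0 - temp_fraction) / replication

def usable_capacity_tb(raw_tb, temp_fraction=0.25, replication=3):
    """Unique data capacity in TB for a given raw capacity in TB."""
    return raw_tb * usable_fraction(temp_fraction, replication)

print(f"Usable fraction: {usable_fraction():.2f}")
print(f"8 TB raw per node: {usable_capacity_tb(8):.1f} TB of unique data")
```

Lowering the replication factor or the temporary-space reservation raises the usable share, at the cost of durability or of headroom for large map outputs.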


Page 15: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Java- JVM

JVM tuning:

Besides normal Java code optimizations, the JVM settings for each child task also affect processing time. On each slave node, the TaskTracker and DataNode use 1 GB of RAM each. Effective use of the remaining RAM, as well as choosing the right GC mechanism for each map or reduce task, is very important for maximum utilization of hardware resources. The default maximum RAM for child tasks is 200 MB, which might be insufficient for many production-grade jobs. The JVM settings for child tasks are governed by the mapred.child.java.opts property.

Use 64-bit JDK 1.6 with -XX:+UseCompressedOops, which helps in dealing with OOM errors.

Remember to raise the Linux open file descriptor limit.

Set java.net.preferIPv4Stack to true to avoid timeouts in cases where the OS/JVM picks up an IPv6 address and must resolve the hostname.
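
One way to size the child heap is to divide the RAM left after the two daemons among the configured task slots. The sketch below does that and formats the result as an -Xmx option for mapred.child.java.opts. It is an upper bound under stated assumptions: it reserves only the 1 GB each for the TaskTracker and DataNode mentioned above and ignores the OS and any other processes. The function names and the 16 GB example node are hypothetical.

```python
def child_heap_mb(node_ram_mb, map_slots, reduce_slots, daemon_mb=2 * 1024):
    """Upper bound on heap per child task, in MB.

    daemon_mb defaults to 2 GB: 1 GB each for the TaskTracker and DataNode.
    """
    slots = map_slots + reduce_slots
    if slots <= 0:
        raise ValueError("at least one task slot is required")
    available_mb = node_ram_mb - daemon_mb
    if available_mb <= 0:
        raise ValueError("node RAM does not cover the daemons")
    return available_mb // slots

def child_java_opts(heap_mb):
    """Format the heap size as a JVM option string."""
    return f"-Xmx{heap_mb}m"

heap = child_heap_mb(node_ram_mb=16 * 1024, map_slots=8, reduce_slots=4)
print(f"mapred.child.java.opts = {child_java_opts(heap)}")
```

In practice it is safer to leave some of this budget unallocated so the node does not start swapping when every slot is busy.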


Page 16: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Logging

A friend to developers, a foe in production

Default: INFO level

dfs.namenode.logging.level
hadoop.job.history
hadoop.logfile.size/count


Page 17: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Static Data strategies

Available approaches:
  JobConf.set("key", "value")
  Distributed cache
  HDFS shared file

Suggested approaches if the above are not efficient:
  Memcached
  Tokyo Cabinet/Tokyo Tyrant
  Berkeley DB
  HBase


Page 18: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Debugging and profiling- Arun C Murthy

From Arun C Murthy's presentation "Hadoop Map-Reduce: Tuning and Debugging"

Debugging
  Log files/UI view
  Local runner
  Single machine mode
  Set keep.failed.task.files to true and use the IsolationRunner

Profiling
  Set mapred.task.profile to true
  Use mapred.task.profile.{maps|reduces}
  hprof support is built in
  Use mapred.task.profile.params to set options for the debugger
  Possibly DistributedCache for the profiler's agent


Page 19: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Tuning - Arun C Murthy

From Arun C Murthy's presentation "Hadoop Map-Reduce: Tuning and Debugging"

Tuning
  Tell HDFS and Map-Reduce about your network!
    Rack locality script: topology.script.file.name
  Number of maps
    Data locality
  Number of reduces
    You don't need a single output file!
  Amount of data processed per map
    Consider fatter maps, custom input format
  Combiner
    Multi-level combiners at both map and reduce
    Check to ensure the combiner is useful!
  Map-side sort
    io.sort.mb, io.sort.factor, io.sort.record.percent, io.sort.spill.percent
  Shuffle
    Compression for map outputs: mapred.compress.map.output, mapred.map.output.compression.codec, lzo via libhadoop.so
    tasktracker.http.threads
    mapred.reduce.parallel.copies, mapred.reduce.copy.backoff, mapred.job.shuffle.input.buffer.percent, mapred.job.shuffle.merge.percent, mapred.inmem.merge.threshold, mapred.job.reduce.input.buffer.percent
  Compress the job output
  Miscellaneous
    Speculative execution, heap size for the child, re-use JVM for maps/reduces, raw comparators

Page 20: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Next steps

Hadoop Vaidya (since 0.20.0)

Job configuration analyzer (WIP, to be contributed back to Hadoop)
  Part of the Analyze Job web UI
  Analyze and suggest config parameters from job.xml
  Smart suggestion engine/auto-correction


Page 21: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Conclusion

Performance of Hadoop MapReduce jobs can be improved without increasing the hardware costs, by tuning several key configuration parameters for cluster specifications, input data size and processing complexity.


Page 22: Advanced Hadoop Tuning and Optimization - Hadoop Consulting

References

Hadoop.apache.org
Hadoop-performance tuning--white paper v1 1.pdf - Arun C Murthy
Intel_White_Paper_Optimizing_Hadoop_Deployments.pdf
