Hadoop HP Day2

Transcript of Hadoop HP Day2

  • hPot-Tech

    1 Map reduce Programming

    The Configuration API

    Components in Hadoop are configured using Hadoop's own configuration API.

    org.apache.hadoop.conf.Configuration represents a collection of configuration properties and their values. Each property is named by a String, and the type of a value may be one of several types:

    o Java primitives such as boolean, int, long, and float
    o other useful types such as String, Class, and java.io.File, and collections of Strings
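
    A minimal sketch of reading typed properties with this API (the resource file name and the color/size/debug property names are illustrative assumptions, not taken from the slides):

    import org.apache.hadoop.conf.Configuration;

    public class ConfigurationExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.addResource("configuration-1.xml");       // hypothetical XML resource on the classpath
            String color = conf.get("color", "unknown");   // String property with a default
            int size = conf.getInt("size", 0);             // primitive property with a default
            boolean debug = conf.getBoolean("debug", false);
            System.out.printf("color=%s size=%d debug=%b%n", color, size, debug);
        }
    }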

  • hPot-Tech

    2 Map reduce Programming

    The Configuration API..

  • hPot-Tech

    3 Map reduce Programming

    Tool implementation :
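
    A representative sketch of a Tool implementation, run through ToolRunner so that standard options such as -conf and -D are handled by the framework (this ConfigurationPrinter class is an illustrative example, not necessarily the slide's exact code):

    import java.util.Map.Entry;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ConfigurationPrinter extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();                 // configuration injected by ToolRunner
            for (Entry<String, String> entry : conf) {
                System.out.printf("%s=%s%n", entry.getKey(), entry.getValue());
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new ConfigurationPrinter(), args);
            System.exit(exitCode);
        }
    }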

  • hPot-Tech

    4 Map reduce Programming

    Packaging a Job

    A job's classes must be packaged into a job JAR file to send to the cluster. Any dependent JAR files can be packaged in a lib subdirectory in the job JAR file.

    The client classpath

    The user's client-side classpath set by hadoop jar is made up of:

    o the job JAR file
    o any JAR files in the lib directory of the job JAR file, and the classes directory
    o the classpath defined by HADOOP_CLASSPATH, if set

  • hPot-Tech

    5 Map reduce Programming

    Launching a Job

    To launch the job, we need to run the driver, specifying the cluster that we want to run the job on with the -conf option

  • hPot-Tech

    6 Map reduce Programming

    The Job output..

  • hPot-Tech

    7 Map reduce Programming

    The Job output..

  • hPot-Tech

    8 Map reduce Programming

    The MapReduce Web UI.

    A web UI for viewing information about your jobs, useful for:

    o following a job's progress while it is running
    o finding job statistics and logs after the job has completed

    http://jobtracker-host:50030/

  • hPot-Tech

    9 Map reduce Programming

    The jobtracker page

  • hPot-Tech

    10 Map reduce Programming

    The jobtracker page

  • hPot-Tech

    11 Map reduce Programming

    The job page

  • hPot-Tech

    12 Map reduce Programming

    The job page

  • hPot-Tech

    13 Map reduce Programming

    Map Reduce Programming

  • hPot-Tech

    14 Map reduce Programming

    The MapReduce Approach

    Shared memory approach (OpenMP, MPI, ...)

    o Developer needs to take care of (almost) everything
    o Synchronization, concurrency
    o Resource allocation

    MapReduce: a shared nothing approach

    o Most of the above issues are taken care of
    o Problem decomposition and sharing partial results need particular attention
    o Optimizations (memory and network consumption) are tricky

  • hPot-Tech

    15 Map reduce Programming

    Functional Programming Roots

    Key feature: higher-order functions

    o Functions that accept other functions as arguments
    o Map and Fold

    Figure: Illustration of map and fold.

  • hPot-Tech

    16 Map reduce Programming

    Functional Programming Roots

    map phase: Given a list, map takes as an argument a function f (that takes a single argument) and applies it to all elements in the list.

    fold phase: Given a list, fold takes as arguments a function g (that takes two arguments) and an initial value:

    o g is first applied to the initial value and the first item in the list
    o the result is stored in an intermediate variable, which is used as an input together with the next item to a second application of g
    o the process is repeated until all items in the list have been consumed
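
    A minimal sketch of map and fold in Java (the squaring function f and the addition g with initial value 0 are illustrative choices):

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class MapFoldDemo {
        public static void main(String[] args) {
            List<Integer> xs = Arrays.asList(1, 2, 3, 4);

            // map: apply f (here, squaring) to every element, each application in isolation
            List<Integer> squares = xs.stream().map(x -> x * x).collect(Collectors.toList());

            // fold: apply g (here, addition) to an initial value (0) and each element in turn
            int sum = xs.stream().reduce(0, (acc, x) -> acc + x);

            System.out.println(squares + ", sum = " + sum);   // prints [1, 4, 9, 16], sum = 10
        }
    }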

  • hPot-Tech

    17 Map reduce Programming

    Functional Programming Roots

    We can view map as a transformation over a dataset:

    o the transformation is specified by the function f
    o each functional application happens in isolation
    o the application of f to each element of a dataset can be parallelized in a straightforward manner

    We can view fold as an aggregation operation:

    o the aggregation is defined by the function g
    o data locality: elements in the list must be brought together
    o if we can group elements of the list, the fold phase can also proceed in parallel

    Associative and commutative operations allow performance gains through local aggregation and reordering.

  • hPot-Tech

    18 Map reduce Programming

    Functional Programming and MapReduce

    Equivalence of MapReduce and functional programming:

    o the map of MapReduce corresponds to the map operation
    o the reduce of MapReduce corresponds to the fold operation

    The framework coordinates the map and reduce phases, including how intermediate results are grouped so that the reduce can happen in parallel.

    In practice:

    o a user-specified computation is applied (in parallel) to all input records of a dataset
    o intermediate results are aggregated by another user-specified computation

  • hPot-Tech

    19 Map reduce Programming

    Mappers and Reducers

  • hPot-Tech

    20 Map reduce Programming

    Data Structures

    Key-value pairs are the basic data structure in MapReduce

    Keys and values can be integers, floats, strings, or raw bytes; they can also be arbitrary data structures.

    The design of MapReduce algorithms involves imposing the key-value structure on arbitrary datasets

    o E.g.: for a collection of Web pages, input keys may be URLs and values may be the HTML content

    In some algorithms, input keys are not used, in others they uniquely identify a record

    Keys can be combined in complex ways to design various algorithms

  • hPot-Tech

    21 Map reduce Programming

    A MapReduce job

    The programmer defines a mapper and a reducer as follows:

    o map: (k1, v1) → [(k2, v2)]
    o reduce: (k2, [v2]) → [(k3, v3)]

    A MapReduce job consists of:

    o A dataset stored on the underlying distributed filesystem, which is split in a number of files across machines

    o The mapper is applied to every input key-value pair to generate intermediate key-value pairs

    o The reducer is applied to all values associated with the same intermediate key to generate output key-value pairs

  • hPot-Tech

    22 Map reduce Programming

    Where the magic happens

    Implicit between the map and reduce phases is a distributed group by operation on intermediate keys

    Intermediate data arrive at each reducer in order, sorted by the key. No ordering is guaranteed across reducers.

    Output keys from reducers are written back to the distributed filesystem

    The output may consist of r distinct files, where r is the number of reducers. Such output may be the input to a subsequent MapReduce phase.

    Intermediate keys are transient:

    They are not stored on the distributed filesystem; they are spilled to the local disk of each machine in the cluster

  • hPot-Tech

    23 Map reduce Programming

    A Simplified view of MapReduce

    Figure: Mappers are applied to all input key-value pairs, to generate an arbitrary number of intermediate pairs. Reducers are applied to all intermediate values associated with the same intermediate key. Between the map and reduce phase lies a barrier that involves a large distributed sort and group by

  • hPot-Tech

    24 Map reduce Programming

  • hPot-Tech

    25 Map reduce Programming

    Hello World in MapReduce

    Input: key-value pairs (docid, doc) stored on the distributed filesystem

    o docid: unique identifier of a document
    o doc: the text of the document itself

    Mapper:

    o takes an input key-value pair and tokenizes the document
    o emits intermediate key-value pairs: the word is the key and an integer count (1) is the value

    The framework guarantees that all values associated with the same key (the word) are brought to the same reducer.

    The reducer:

    o receives all values associated with a key
    o sums the values and writes output key-value pairs: the key is the word and the value is the number of occurrences
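
    A sketch of this word count in Java, written against the newer org.apache.hadoop.mapreduce API (class names are illustrative, not the slides' exact code):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: tokenizes each document line and emits (word, 1)
        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts brought together for each word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }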

  • hPot-Tech

    26 Map reduce Programming

    Implementation and Execution Details

    The partitioner is in charge of assigning intermediate keys (words) to reducers

    Note that the partitioner can be customized

    How many map and reduce tasks?

    The framework essentially takes care of the number of map tasks; the designer/developer sets the number of reduce tasks.

  • hPot-Tech

    27 Map reduce Programming

    Restrictions

    Using external resources

    o E.g.: data stores other than the distributed filesystem
    o Beware of concurrent access by many map/reduce tasks

    Side effects

    o Not allowed in functional programming
    o E.g.: preserving state across multiple inputs
    o State is kept internal

    I/O and execution

    o External side effects using distributed data stores (e.g. BigTable)
    o A job may have no input (e.g. computing π) or no reducers, but never no mappers

  • hPot-Tech

    28 Map reduce Programming

    The Execution Framework

  • hPot-Tech

    29 Map reduce Programming

    The Execution Framework

    MapReduce program, a.k.a. a job:

    o Code of mappers and reducers
    o Code for combiners and partitioners (optional)
    o Configuration parameters
    o All packaged together

    A MapReduce job is submitted to the cluster

    The framework takes care of everything else.

  • hPot-Tech

    30 Map reduce Programming

    Tutorial: Map Reduce

  • hPot-Tech

    31 Map reduce Programming

  • hPot-Tech

    32 Map reduce Programming

    Debugging a Job

    o The web UI
    o Debug statements logged to standard error
    o Custom counters

  • hPot-Tech

    33 Map reduce Programming

    Add debugging to the mapper:
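
    A hedged sketch of what such a mapper might look like: it logs suspect records to standard error (which ends up in the task logs visible from the web UI) and increments a custom counter; the Counters enum and the idea that an empty line counts as "suspect" are illustrative assumptions:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class DebuggingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        enum Counters { MALFORMED_RECORDS }   // hypothetical custom counter

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.getLength() == 0) {
                // debug statement: ends up in the task's stderr log
                System.err.println("Empty input line at byte offset " + key.get());
                context.getCounter(Counters.MALFORMED_RECORDS).increment(1);
                return;
            }
            context.write(new Text(value.toString()), new IntWritable(1));
        }
    }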

  • hPot-Tech

    34 Map reduce Programming

    The tasks page

  • hPot-Tech

    35 Map reduce Programming

    The task details page

  • hPot-Tech

    36 Map reduce Programming

    Hadoop Logs

  • hPot-Tech

    37 Map reduce Programming

    Anything written to standard output or standard error is directed to the relevant log file.

  • hPot-Tech

    38 Map reduce Programming

    Remote Debugging

    A debugger is hard to arrange when running the job on a cluster. Options:

    o Reproduce the failure locally
    o Use JVM debugging options
    o Use task profiling
    o Use IsolationRunner

    Set keep.failed.task.files to true to keep a failed task's files.

  • hPot-Tech

    39 Map reduce Programming

    Tuning a Job

  • hPot-Tech

    40 Map reduce Programming

    Tuning a Job

  • hPot-Tech

    41 Map reduce Programming

    Job Submission

    JobClient class

    The runJob() method creates a new instance of JobClient and then calls submitJob() on it.

    Simple verifications on the Job

    o Is there an output directory?
    o Are there any input splits?
    o Can I copy the JAR of the job to HDFS?

    NOTE: the JAR of the job is replicated 10 times

  • hPot-Tech

    42 Map reduce Programming

    MapReduce Workflows

    o When the processing gets more complex, as a rule of thumb, think about adding more jobs rather than adding complexity to jobs.

    o For more complex problems, consider a higher-level language than MapReduce, such as Pig, Hive, Cascading, Cascalog, or Crunch.

    o One immediate benefit is that it frees you from the translation into MapReduce jobs, allowing you to concentrate on the analysis you are performing.

  • hPot-Tech

    43 Map reduce Programming

    JobControl:

    When there is more than one job in a MapReduce workflow: for a linear chain, the simplest approach is to run each job one after another.

    For anything more complex than a linear chain, use org.apache.hadoop.mapreduce.jobcontrol.JobControl:

    o represents a graph of jobs to be run
    o add the job configurations, then tell the JobControl instance the dependencies between jobs
    o run the JobControl in a thread, and it runs the jobs in dependency order
    o you can poll for progress, and when the jobs have finished, you can query for all the jobs' statuses and the associated errors for any failures
    o if a job fails, the jobs that depend on it won't be run
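
    A sketch of a two-job linear dependency expressed with JobControl; package names follow the Hadoop 2 new API (org.apache.hadoop.mapreduce.lib.jobcontrol), and the job names and polling loop are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class WorkflowDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job1 = Job.getInstance(conf, "step-1");   // mappers/reducers/paths configured elsewhere
            Job job2 = Job.getInstance(conf, "step-2");

            ControlledJob cjob1 = new ControlledJob(conf);
            cjob1.setJob(job1);
            ControlledJob cjob2 = new ControlledJob(conf);
            cjob2.setJob(job2);
            cjob2.addDependingJob(cjob1);                 // job2 runs only after job1 succeeds

            JobControl control = new JobControl("workflow");
            control.addJob(cjob1);
            control.addJob(cjob2);

            Thread thread = new Thread(control);          // run the JobControl in its own thread
            thread.start();
            while (!control.allFinished()) {
                Thread.sleep(1000);                       // poll for progress
            }
            control.stop();
        }
    }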

  • hPot-Tech

    44 Map reduce Programming

    Advanced MapReduce: How does MapReduce work?

  • hPot-Tech

    45 Map reduce Programming

    Classic MapReduce

  • hPot-Tech

    46 Map reduce Programming

    Failures

    One of the major benefits of using Hadoop is its ability to handle failures and allow jobs to complete.

    Task failure:

    o occurs when user code in the map or reduce task throws a runtime exception; the error ultimately makes it into the user logs
    o hanging tasks are dealt with differently: mapred.task.timeout
    o when the jobtracker is notified of a task attempt that has failed (by the tasktracker's heartbeat call), it reschedules execution of the task
    o the jobtracker tries to avoid rescheduling the task on a tasktracker where it has previously failed

  • hPot-Tech

    47 Map reduce Programming

    Failures

    Tasktracker failure:

    The jobtracker will notice a tasktracker that has stopped sending heartbeats if it hasn't received one for 10 minutes (configured via the mapred.tasktracker.expiry.interval property, in milliseconds) and will remove it from its pool of tasktrackers to schedule tasks on.

    Jobtracker failure:

    Failure of the jobtracker is the most serious failure mode. Hadoop has no mechanism for dealing with jobtracker failure; it is a single point of failure, so in this case all running jobs fail. After restarting a jobtracker, any jobs that were running at the time it was stopped will need to be resubmitted.

  • hPot-Tech

    48 Map reduce Programming

    Partitioners and Combiners

  • hPot-Tech

    49 Map reduce Programming

    Partitioners

    Partitioners are responsible for:

    o dividing up the intermediate key space
    o assigning intermediate key-value pairs to reducers
    o i.e., specifying the task to which an intermediate key-value pair must be copied

    Hash-based partitioner

    o computes the hash of the key modulo the number of reducers r
    o this ensures a roughly even partitioning of the key space

    o However, it ignores values: this can cause imbalance in the data processed by each reducer

    When dealing with complex keys, even the base partitioner may need customization

  • hPot-Tech

    50 Map reduce Programming

    Combiners

    Combiners are an (optional) optimization:

    o they allow local aggregation before the shuffle and sort phase
    o each combiner operates in isolation

    Essentially, combiners are used to save bandwidth

    E.g.: the word count program

    Combiner-like behavior can also be implemented inside the mapper using local data structures ("in-mapper combining"):

    o e.g., an associative array keeps intermediate computations and their aggregation
    o the map function only emits once all input records (even all input splits) are processed
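
    A sketch of a driver that wires in a combiner, reusing the word count classes sketched earlier; because word count's reduce is associative and commutative, the reducer can double as the combiner (paths and class names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCount.TokenizerMapper.class);
            job.setCombinerClass(WordCount.IntSumReducer.class);  // local aggregation on the map side
            job.setReducerClass(WordCount.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }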

  • hPot-Tech

    51 Map reduce Programming

    Partitioners and Combiners, an Illustration

    Note: in Hadoop, partitioners are executed before combiners

  • hPot-Tech

    52 Map reduce Programming

  • hPot-Tech

    53 Map reduce Programming

    Lab : Combiner & Partitioners

  • hPot-Tech

    54 Map reduce Programming

    MRUnit: MapReduce unit testing. The map and reduce functions in MapReduce are easy to test in isolation.

    MRUnit:

    a testing library that makes it easy to pass known inputs to a mapper or a reducer and check that the outputs are as expected.

    used in conjunction with a standard test execution framework, such as JUnit.
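
    A sketch of MRUnit tests for the word count mapper and reducer sketched earlier, using the new-API drivers from org.apache.hadoop.mrunit.mapreduce (test names are illustrative):

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Test;

    public class WordCountTest {

        @Test
        public void mapperEmitsOnePerWord() throws IOException {
            new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new WordCount.TokenizerMapper())
                .withInput(new LongWritable(0), new Text("hello hello world"))
                .withOutput(new Text("hello"), new IntWritable(1))
                .withOutput(new Text("hello"), new IntWritable(1))
                .withOutput(new Text("world"), new IntWritable(1))
                .runTest();
        }

        @Test
        public void reducerSumsCounts() throws IOException {
            new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                .withReducer(new WordCount.IntSumReducer())
                .withInput(new Text("hello"), Arrays.asList(new IntWritable(1), new IntWritable(2)))
                .withOutput(new Text("hello"), new IntWritable(3))
                .runTest();
        }
    }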

  • hPot-Tech

    55 Map reduce Programming

    Mapper

  • hPot-Tech

    56 Map reduce Programming

    Reducer

  • hPot-Tech

    57 Map reduce Programming

    Tutorial : MRUnit.

  • hPot-Tech

    58 Map reduce Programming

  • hPot-Tech

    59 Map reduce Programming

    Hadoop MapReduce Types and Formats

  • hPot-Tech

    60 Map reduce Programming

    MapReduce Types: input/output to mappers and reducers

    a. map: (k1, v1) → [(k2, v2)]
    b. reduce: (k2, [v2]) → [(k3, v3)]

    In Hadoop, a mapper (old API) is created as follows:

    a. void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)

    Types:

    a. K types implement WritableComparable
    b. V types implement Writable

  • hPot-Tech

    61 Map reduce Programming

    What is a Writable

    Hadoop defines its own classes for strings (Text), integers (IntWritable), etc.

    All keys are instances of WritableComparable

    o Why comparable? Because intermediate keys are sorted before they reach the reducers.

    All values are instances of Writable
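
    A hedged sketch of a custom key type implementing WritableComparable (the IntPairWritable name and field layout are illustrative):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // A pair of ints usable as a MapReduce key: serializable (Writable) and sortable (Comparable).
    public class IntPairWritable implements WritableComparable<IntPairWritable> {
        private int first;
        private int second;

        public IntPairWritable() { }                        // no-arg constructor required for deserialization
        public IntPairWritable(int first, int second) { this.first = first; this.second = second; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public int compareTo(IntPairWritable o) {           // keys must be comparable for the sort
            int cmp = Integer.compare(first, o.first);
            return cmp != 0 ? cmp : Integer.compare(second, o.second);
        }

        @Override
        public int hashCode() { return 31 * first + second; }   // used by the default HashPartitioner

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof IntPairWritable)) return false;
            IntPairWritable p = (IntPairWritable) obj;
            return first == p.first && second == p.second;
        }
    }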

  • hPot-Tech

    62 Map reduce Programming

  • hPot-Tech

    63 Map reduce Programming

    Reading Data

    Datasets are specified by InputFormats

    o An InputFormat defines the input data (e.g. a file, a directory)
    o An InputFormat is a factory for RecordReader objects that extract key-value records from the input source

    InputFormats identify partitions of the data that form an InputSplit

    An InputSplit is a (reference to a) chunk of the input processed by a single map

    o The largest split is processed first

    Each split is divided into records, and the map processes each record (a key-value pair) in turn

    Splits and records are logical; they are not physically bound to a file

  • hPot-Tech

    64 Map reduce Programming

    The relationship between InputSplit and HDFS blocks

  • hPot-Tech

    65 Map reduce Programming

    FileInputFormat and Friends

    TextInputFormat: treats each newline-terminated line of a file as a value

    KeyValueTextInputFormat: maps newline-terminated text lines of "key SEPARATOR value"

    SequenceFileInputFormat: binary file of key-value pairs with some additional metadata

    SequenceFileAsTextInputFormat: same as before, but maps to (k.toString(), v.toString())

  • hPot-Tech

    66 Map reduce Programming

    Filtering File Inputs

    FileInputFormat reads all files out of a specified directory and sends them to the mapper

    Delegates filtering this file list to a method subclasses may override

    Example: create your own xyzFileInputFormat to read *.xyz from a directory list
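
    One lightweight way to get this behaviour, as an alternative sketch to subclassing FileInputFormat, is a custom PathFilter registered with FileInputFormat.setInputPathFilter (the XyzPathFilter name is illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // Accept only files whose names end in ".xyz"
    public class XyzPathFilter implements PathFilter {
        @Override
        public boolean accept(Path path) {
            return path.getName().endsWith(".xyz");
        }
    }

    // In the driver (new API):
    //   org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPathFilter(job, XyzPathFilter.class);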

  • hPot-Tech

    67 Map reduce Programming

    Record Readers

    Each InputFormat provides its own RecordReader implementation

    LineRecordReader

    Reads a line from a text file

    KeyValueRecordReader

    Used by KeyValueTextInputFormat

  • hPot-Tech

    68 Map reduce Programming

    Input Split Size

  • hPot-Tech

    69 Map reduce Programming

    Sending Data to Reducers

    Map function receives OutputCollector object

    OutputCollector.collect() receives key-value elements

    Any (WritableComparable, Writable) pair can be used. By default, the mapper output types are assumed to be the same as the reducer output types.

  • hPot-Tech

    70 Map reduce Programming

    WritableComparator

    Compares WritableComparable data

    Will call the WritableComparable.compare() method Can provide fast path for serialized data

    Configured through:

    JobConf.setOutputValueGroupingComparator()

  • hPot-Tech

    71 Map reduce Programming

    Partitioner

    int getPartition(key, value, numPartitions)

    o Outputs the partition number for a given key
    o One partition == all values sent to a single reduce task

    HashPartitioner is used by default: it uses key.hashCode() to compute the partition number

    JobConf used to set Partitioner implementation
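
    A hedged sketch of a custom partitioner, written against the newer org.apache.hadoop.mapreduce API rather than the JobConf shown here (the first-letter routing rule is an illustrative choice):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Route all keys that start with the same (lower-cased) first character to the same reducer.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            if (key.getLength() == 0) {
                return 0;
            }
            // mask the sign bit so the result is non-negative, then take the modulo
            return (Character.toLowerCase(key.charAt(0)) & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // In the driver (new API): job.setPartitionerClass(FirstLetterPartitioner.class);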

  • hPot-Tech

    72 Map reduce Programming

    The Reducer

    void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)

    Keys and values sent to one partition all go to the same reduce task

    Calls are sorted by key; early keys are reduced and output before late keys

  • hPot-Tech

    73 Map reduce Programming

    Writing the Output

  • hPot-Tech

    74 Map reduce Programming

    Writing the Output

    Analogous to InputFormat

    TextOutputFormat writes key-value strings to the output file

    SequenceFileOutputFormat uses a binary format to pack key-value pairs

    NullOutputFormat discards output

  • hPot-Tech

    75 Map reduce Programming

    Lab :- Input and Output

  • hPot-Tech

    76 Map reduce Programming

    Map Side and Reduce Side Joins

  • hPot-Tech

    77 Map reduce Programming

    Joins

    MapReduce can perform joins between large datasets

  • hPot-Tech

    78 Map reduce Programming

    Join:

    o if performed by the mapper, it is called a map-side join
    o if performed by the reducer, it is called a reduce-side join

  • hPot-Tech

    79 Map reduce Programming

    Map-Side Joins

    A map-side join between large inputs works by performing the join before the data reaches the map function.

    The inputs to each map must be partitioned and sorted in a particular way.

    Each input dataset must be divided into the same number of partitions, and it must be sorted by the same key (the join key) in each source.

    All the records for a particular key must reside in the same partition.

  • hPot-Tech

    80 Map reduce Programming

    Reduce-Side Joins

    A reduce-side join is more general than a map-side join:

    o the input datasets don't have to be structured in any particular way
    o the mapper tags each record with its source and uses the join key as the map output key, so that the records with the same key are brought together in the reducer
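
    A hedged sketch of the mapper side of a reduce-side join: each mapper tags its records with a source marker and emits the join key as the map output key. The CUST/ORDER tags, the comma-separated record layout, and the join-key column positions are illustrative assumptions:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ReduceSideJoinMappers {

        // Customers file: the join key (customer id) is assumed to be the first column.
        public static class CustomerMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                context.write(new Text(fields[0]), new Text("CUST\t" + value));
            }
        }

        // Orders file: the join key (customer id) is assumed to be the second column.
        public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length < 2) {
                    return;                               // skip malformed records
                }
                context.write(new Text(fields[1]), new Text("ORDER\t" + value));
            }
        }
    }

    // The reducer then receives, for each join key, the tagged records from both sources and can
    // combine them; one mapper per input path can be assigned with MultipleInputs in the driver.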

  • hPot-Tech

    81 Map reduce Programming

    Lab : Map Side Join.

  • Managing a Hadoop Cluster

  • Hadoop Cluster Component

    NameNode: Manages the namespace, file system metadata, and access control. There is exactly one NameNode in each cluster.

    SecondaryNameNode: Downloads periodic checkpoints from the NameNode for fault-tolerance. There is exactly one SecondaryNameNode in each cluster.

  • Hadoop Cluster Component

    JobTracker: Hands out tasks to the slave nodes. There is exactly one JobTracker in each cluster.

    DataNode: Holds file system data; each data node manages its own locally-attached storage (i.e., the node's hard disk) and stores a copy of some or all blocks in the file system. There are one or more DataNodes in each cluster. If your cluster has only one DataNode then file system data cannot be replicated.

  • Hadoop Cluster Component

    TaskTracker: Slaves that carry out map and reduce tasks. There are one or more TaskTrackers in each cluster.

  • HDFS Architecture

    [Figure: HDFS architecture. Clients issue metadata operations (name, replicas, e.g. /home/foo/data) to the NameNode and read/write blocks directly to/from DataNodes; the NameNode issues block operations and manages block replication across racks (Rack 1, Rack 2).]

  • Platform requirements for Hadoop

    Java Requirements

    Hadoop is a Java-based system. Recent versions of Hadoop require Sun Java 1.6.

    Operating System

    As Hadoop is written in Java, it is mostly portable between different operating systems.

    Downloading and Installing Hadoop

  • Topology of a typical Hadoop cluster .

  • Installation Steps

    Install Java

    ssh and sshd

    gunzip hadoop-0.18.0.tar.gz

    or tar vxf hadoop-0.18.0.tar

    Set JAVA_HOME in conf/hadoop-env.sh

    Modify hadoop-site.xml

  • Hadoop Installation Flavors

    Standalone

    Pseudo-distributed

    Hadoop clusters of multiple nodes

  • Additional Configuration

    conf/masters

    contains the hostname of the SecondaryNameNode; it should be a fully-qualified domain name

    conf/slaves

    contains the hostname of every machine in the cluster which should start TaskTracker and DataNode daemons

    Ex: slave01

    slave02

    slave03

  • Advance Configuration

    enable passwordless ssh

    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

    The ~/.ssh/id_dsa.pub and authorized_keys

    files should be replicated on all machines in

    the cluster.

  • Advance Configuration

    Various directories should be created on each node.

    The NameNode requires the NameNode metadata directory:

    $ mkdir -p /home/hadoop/dfs/name

    Every node needs the Hadoop tmp directory and

    DataNode directory created

  • Advance Configuration..

    bin/slaves.sh allows a command to be executed on all nodes in the slaves file, e.g. mkdir -p /tmp/hadoop:

    $ export HADOOP_CONF_DIR=${HADOOP_HOME}/conf

    $ export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves

    $ ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /tmp/hadoop"

    $ ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/hadoop/dfs/data"

    Format HDFS

    $ bin/hadoop namenode -format

    start the cluster:

    $ bin/start-all.sh

  • Important Directories

    Directory | Description | Default location | Suggested location

    HADOOP_LOG_DIR | Output location for log files from daemons | ${HADOOP_HOME}/logs | /var/log/hadoop

    hadoop.tmp.dir | A base for other temporary directories | /tmp/hadoop-${user.name} | /tmp/hadoop

    dfs.name.dir | Where the NameNode metadata should be stored | ${hadoop.tmp.dir}/dfs/name | /home/hadoop/dfs/name

    dfs.data.dir | Where DataNodes store their blocks | ${hadoop.tmp.dir}/dfs/data | /home/hadoop/dfs/data

    mapred.system.dir | The in-HDFS path to shared MapReduce system files | ${hadoop.tmp.dir}/mapred/system | /hadoop/mapred/system

  • Recommended configuration

    dfs.name.dir and dfs.data.dir should be moved out of hadoop.tmp.dir.

    Adjust mapred.system.dir

  • Selecting Machines

    Hadoop is designed to take advantage of whatever hardware is available.

    Hadoop jobs written in Java can consume between 1 and 2 GB of RAM per core.

    If you use Hadoop Streaming to write your jobs in a scripting language such as Python, more memory may be advisable.

  • Cluster Configurations

    Small Clusters: 2-10 Nodes

    Medium Clusters: 10-40 Nodes

    Large Clusters: Multiple Racks

  • Small Clusters: 2-10 Nodes

    With two nodes:

    one node: NameNode/JobTracker and a DataNode/TaskTracker;

    the other node: DataNode/TaskTracker.

    Clusters of three or more machines typically use a dedicated NameNode/JobTracker, and all other nodes are workers.

  • configuration in conf/hadoop-site.xml

    <property>
      <name>mapred.job.tracker</name>
      <value>head.server.node.com:9001</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://head.server.node.com:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hadoop</value>
      <final>true</final>
    </property>
    <property>
      <name>mapred.system.dir</name>
      <value>/hadoop/mapred/system</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/home/hadoop/dfs/data</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/dfs/name</value>
      <final>true</final>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>

  • Medium Clusters: 10-40 Nodes

    The single point of failure in a Hadoop cluster is the NameNode. Hence, back up the NameNode metadata.

    One machine in the cluster should be designated as the NameNode's backup:

    o it does not run the normal Hadoop daemons
    o it exposes a directory via NFS which is only mounted on the NameNode

  • NameNode's backup

    The cluster's hadoop-site.xml file should then instruct the NameNode to write to this directory as well:

    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/dfs/name,/mnt/namenode-backup</value>
      <final>true</final>
    </property>

  • Backup NameNode

    Another use for the backup machine is to serve as the SecondaryNameNode.

    Note that this is not a failover NameNode process; it only takes periodic snapshots of the NameNode's metadata.

  • conf/hadoop-site.xml

    Nodes must be decommissioned on a schedule that permits replication of blocks being decommissioned.

    conf/hadoop-site.xml

    <property>
      <name>dfs.hosts.exclude</name>
      <value>/home/hadoop/excludes</value>
      <final>true</final>
    </property>
    <property>
      <name>mapred.hosts.exclude</name>
      <value>/home/hadoop/excludes</value>
      <final>true</final>
    </property>

    Create an empty file with this name:

    $ touch /home/hadoop/excludes

  • Replication Setting

    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

  • Disk & heap

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
      <final>true</final>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>

  • Using multiple drives per machine

    DataNodes can be configured to write blocks out to multiple disks via the dfs.data.dir property:

    <property>
      <name>dfs.data.dir</name>
      <value>/d1/dfs/data,/d2/dfs/data,/d3/dfs/data,/d4/dfs/data</value>
      <final>true</final>
    </property>

  • Using multiple drives per machine..

    <property>
      <name>mapred.local.dir</name>
      <value>/d1/mapred/local,/d2/mapred/local,/d3/mapred/local,/d4/mapred/local</value>
      <final>true</final>
    </property>

  • Tutorial

    Configure Hadoop Cluster in two nodes.

    Tutorial-Installed Hadoop in Cluster.docx

  • Large Clusters: Multiple Racks

    The possibility of rack failure now exists.

    Operational racks should be able to continue even if entire other racks are disabled.

    The amount of metadata under the care of the NameNode increases.

  • Large Clusters: Multiple Racks

    The NameNode is responsible for managing metadata associated with each block in the HDFS

    The amount of information in the rack scales into the 10's or 100's of TB.

    <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>

  • Large Clusters: Multiple Racks

    The NFS-mounted write-through backup should be placed in a different rack from the NameNode.

    The SecondaryNameNode should be instantiated on a separate rack.

  • Large Clusters: Multiple Racks

    <property>
      <name>dfs.namenode.handler.count</name>
      <value>40</value>
    </property>
    <property>
      <name>mapred.job.tracker.handler.count</name>
      <value>40</value>
    </property>

  • Large Clusters: Multiple Racks

    Property | Range | Description

    io.file.buffer.size | 32768-131072 | Read/write buffer size used in SequenceFiles (should be in multiples of the hardware page size)

    io.sort.factor | 50-200 | Number of streams to merge concurrently when sorting files during shuffling

    io.sort.mb | 50-200 | Amount of memory to use while sorting data

    mapred.reduce.parallel.copies | 20-50 | Number of concurrent connections a reducer should use when fetching its input from mappers

    tasktracker.http.threads | 40-50 | Number of threads each TaskTracker uses to provide intermediate map output to reducers

    mapred.tasktracker.map.tasks.maximum | 1/2 * (cores/node) to 2 * (cores/node) | Number of map tasks to deploy on each machine

    mapred.tasktracker.reduce.tasks.maximum | 1/2 * (cores/node) to 2 * (cores/node) | Number of reduce tasks to deploy on each machine