
DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINES
dream-lab.in | Indian Institute of Science, Bangalore
DREAM:Lab

©DREAM:Lab, 2014. This work is licensed under a Creative Commons Attribution 4.0 International License.

SE252: Lecture 13-14, Feb 24/25
ILO3: Algorithms and Programming

Patterns for Cloud Applications (Hadoop)


ILO 3


Patterns & Technologies


MapReduce


MapReduce Design Pattern


MapReduce: Data-parallel Programming Model

map(ki, vi) → List<km, vm>[]

reduce(km, List<vm>[]) → List<kr, vr>[]


MR Borrows from Functional Programming


MapReduce & MPI Scatter-Gather


MapReduce: Programming Model

Input (two lines of text):
  "How now brown cow"
  "How does it work now"

Map output:
  <How,1> <now,1> <brown,1> <cow,1>
  <How,1> <does,1> <it,1> <work,1> <now,1>

Grouped by key (shuffle):
  <How,[1,1]> <now,[1,1]> <brown,[1]> <cow,[1]> <does,[1]> <it,[1]> <work,[1]>

Reduce output:
  brown 1, cow 1, does 1, How 2, it 1, now 2, work 1

Map(k1, v1) → list(k2, v2)
Reduce(k2, list(v2)) → list(v2)



Map

map(String input_key, String input_value):
  // input_key: line number
  // input_value: line of text
  for each word w in input_value.tokenize():
    EmitIntermediate(w, "1");

(0, “How now brown cow”) → [(“How”, 1), (“now”, 1), (“brown”, 1), (“cow”, 1)]
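The same map step written against Hadoop's classic Java API (org.apache.hadoop.mapred, the API also used in the Anagram example later) could look as follows; this is a minimal sketch, not the exact code from the lecture, and the class name WordCountMapper is illustrative.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // key: byte offset of the line; value: the line of text
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, ONE);   // emit <word, 1>
    }
  }
}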


let map(k, v) = emit(k.toUpper(), v.toUpper())
  (“foo”, “bar”) → (“FOO”, “BAR”)
  (“Foo”, “other”) → (“FOO”, “OTHER”)
  (“key2”, “data”) → (“KEY2”, “DATA”)

let map(k, v) = if (isPrime(v)) then emit(k, v)
  (“foo”, 7) → (“foo”, 7)
  (“test”, 10) → (nothing)



Reduce

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int sum = 0;
  for each v in intermediate_values:
    sum += ParseInt(v);
  Emit(output_key, AsString(sum));

(“A”, [1, 1, 1]) → (“A”, 3)
(“B”, [1, 1]) → (“B”, 2)
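The matching reduce step in the same classic Hadoop API; again a minimal sketch with an illustrative class name.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Sum all the 1s emitted for this word
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));   // emit <word, total count>
  }
}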

Figure: end-to-end word-count dataflow.
  Map tasks run "for each w in value do emit(w, 1)" over the input lines "How now brown cow" and "How does it work now", emitting <How,1> <now,1> <brown,1> <cow,1> and <How,1> <does,1> <it,1> <work,1> <now,1>.
  The shuffle groups the pairs by key across reducers: <How,[1,1]> <now,[1,1]>; <brown,[1]> <cow,[1]>; <does,[1]> <it,[1]> <work,[1]>.
  Each reduce task runs "sum = sum + value; emit(key, sum)", producing How 2, now 2; brown 1, cow 1; does 1, it 1, work 1.


Anagram Example

public class AnagramMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Text sortedText = new Text();
  private Text originalText = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> outputCollector, Reporter reporter)
      throws IOException {
    String word = value.toString();
    char[] wordChars = word.toCharArray();
    Arrays.sort(wordChars);
    String sortedWord = new String(wordChars);
    sortedText.set(sortedWord);
    originalText.set(word);
    // Sort the word's characters and emit <sorted word, original word>
    outputCollector.collect(sortedText, originalText);
  }
}

Anagram Example…

// Reducer member fields for the emitted key and value
private Text outputKey = new Text();
private Text outputValue = new Text();

public void reduce(Text anagramKey, Iterator<Text> anagramValues,
    OutputCollector<Text, Text> results, Reporter reporter)
    throws IOException {
  String output = "";
  while (anagramValues.hasNext()) {
    Text anagram = anagramValues.next();
    output = output + anagram.toString() + "~";
  }
  StringTokenizer outputTokenizer = new StringTokenizer(output, "~");
  // If the key maps to more than one word, we have spotted an anagram group
  if (outputTokenizer.countTokens() >= 2) {
    output = output.replace("~", ",");
    outputKey.set(anagramKey.toString());
    outputValue.set(output);
    results.collect(outputKey, outputValue);
  }
}
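The slides do not show the driver that wires these together; a minimal sketch using the same classic (org.apache.hadoop.mapred) API follows. The class name AnagramReducer (assumed to wrap the reduce method above) and the command-line input/output paths are assumptions, not part of the lecture.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class AnagramJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(AnagramJob.class);
    conf.setJobName("anagram");

    conf.setMapperClass(AnagramMapper.class);
    conf.setReducerClass(AnagramReducer.class);   // assumed class wrapping the reduce() shown above

    // Both map and reduce emit <Text, Text> pairs
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // e.g. a word list in HDFS
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}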


5-min Assignment


MapReduce for Histogram

int bucketWidth = 4

Map(k, v) {
  emit(v / bucketWidth, 1)
}

Reduce(k, v[]) {
  sum = 0;
  foreach (w in v[]) sum++;
  emit(k, sum)
}

Worked example (bucketWidth = 4):
  Map inputs (one split per map task): [7, 2, 9, 6, 0, 2, 5]; [2, 1, 10, 3, 5, 4, 0]; [11, 11, 6, 2, 1, 8, 1]; [2, 4, 6, 8, 10, 11, 0]
  Map output (<bucket, 1> per value):
    <1,1> <0,1> <2,1> <1,1> <0,1> <0,1> <1,1>
    <0,1> <0,1> <2,1> <0,1> <1,1> <1,1> <0,1>
    <2,1> <2,1> <1,1> <0,1> <0,1> <2,1> <0,1>
    <0,1> <1,1> <1,1> <2,1> <2,1> <2,1> <0,1>
  After shuffle: <2,[1 x 8]>, <1,[1 x 8]>, <0,[1 x 12]>
  Reduce output: <2,8> <0,12> <1,8>
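A Hadoop version of this histogram job is not given on the slides; a minimal sketch with the classic API might look as follows, assuming the input arrives as one integer per line of text, and summing the grouped values (which also lets the reducer double as the combiner on the next slide). Class names are illustrative, and each class would sit in its own .java file.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class HistogramMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

  private static final int BUCKET_WIDTH = 4;
  private static final IntWritable ONE = new IntWritable(1);
  private final IntWritable bucket = new IntWritable();

  public void map(LongWritable key, Text value,
      OutputCollector<IntWritable, IntWritable> output, Reporter reporter)
      throws IOException {
    int v = Integer.parseInt(value.toString().trim());  // assumption: one integer per input line
    bucket.set(v / BUCKET_WIDTH);
    output.collect(bucket, ONE);                        // emit <bucket id, 1>
  }
}

public class HistogramReducer extends MapReduceBase
    implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

  public void reduce(IntWritable bucket, Iterator<IntWritable> counts,
      OutputCollector<IntWritable, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (counts.hasNext()) {
      sum += counts.next().get();   // sum the (possibly pre-aggregated) counts
    }
    output.collect(bucket, new IntWritable(sum));
  }
}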


MapReduce for Histogram (with Combiner)

int bucketWidth = 4

Map(k, v) {
  emit(v / bucketWidth, 1)
}

Combine(k, v[]) {
  // same code as the reducer
}

Reduce(k, v[]) {
  sum = 0;
  foreach (w in v[]) sum += w;
  emit(k, sum)
}

Worked example (bucketWidth = 4, with combiner):
  Map inputs (one split per map task): [8, 2, 9, 6, 0, 2, 5]; [2, 1, 10, 3, 5, 4, 0]; [11, 11, 6, 2, 1, 8, 1]; [2, 4, 6, 8, 10, 11, 0]
  Each map task emits <v / bucketWidth, 1> pairs as before, but the combiner pre-aggregates them on the map side into partial counts, e.g. <2,6> <1,4> <0,7> and <2,3> <1,3> <0,5>.
  The reducers then only sum the partial counts: <2, 6+3>, <1, 4+3>, <0, 7+5>.
  Reduce output: <2,9> <1,7> <0,12>
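Registering the combiner is then a single extra line in the job driver. A minimal sketch against the classic API, reusing the HistogramMapper and HistogramReducer sketched earlier (the reducer is safe to reuse as a combiner because it sums its input values); the path arguments are illustrative.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class HistogramJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(HistogramJob.class);
    conf.setJobName("histogram");

    conf.setMapperClass(HistogramMapper.class);
    conf.setCombinerClass(HistogramReducer.class);  // reuse the reducer as the combiner
    conf.setReducerClass(HistogramReducer.class);

    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(IntWritable.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}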


Hadoop Execution Model


Hadoop MapReduce & HDFS


HDFS Read/Write


Scheduling a MR Job


MapReduce w/ 1 & N Reducers


Map only job
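In Hadoop, a map-only job is obtained by setting the number of reduce tasks to zero, in which case the shuffle and sort phases are skipped and each map task writes its output directly to the job's output path. A minimal driver sketch with the classic API, reusing the AnagramMapper from earlier as an example mapper (the driver class name and paths are assumptions):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    conf.setJobName("map-only");

    conf.setMapperClass(AnagramMapper.class);   // any mapper; AnagramMapper emits <Text, Text>
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setNumReduceTasks(0);                  // zero reducers: map output is written directly to the output path

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}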


Pipelining during Shuffle & Sort


Sorting using MapReduce (Map Only)


Sorting using MapReduce

MapReduce Execution Overview

MapReduce Resources

Locality


Fault Tolerance


Optimizations


Reminder