2014 International Software Testing Conference in Seoul



Testing Big Data: Unit Test in Hadoop (Part II)

Jongwook Woo (PhD)

High-Performance Internet Computing Center (HiPIC)
Educational Partner with Cloudera and Grants Awardee of Amazon AWS

Computer Information Systems Department
California State University, Los Angeles


Contents

Test in General
Use Cases: Big Data in Hadoop and Ecosystems
Unit Test in Hadoop


Test in general

Quality Assurance

TDD (Test Driven Development)

Unit Test: tests the functional units of the S/W

BDD (Behavior Driven Development)

Based on TDD; tests the behavior of the S/W

Integration Test: tests integrated components

A group of unit tests

CI (Continuous Integration) Server

Hudson, Jenkins, etc.
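To make the idea concrete (this example is not from the talk; the Calculator class is a made-up unit under test), a plain JUnit 4 unit test exercises one functional unit in isolation:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CalculatorTest {
  // Hypothetical unit under test, defined inline so the example is self-contained.
  static class Calculator {
    int add(int a, int b) { return a + b; }
  }

  @Test
  public void addSumsTwoNumbers() {
    Calculator calc = new Calculator();
    assertEquals(5, calc.add(2, 3));
  }
}

A CI server such as Hudson or Jenkins typically runs tests like this on every commit and reports any failure to the team.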


CI Server

Continuous Integration Server

TDD (Test Driven Development) based

All developers commit their updates every day. The CI server compiles the code and runs the unit tests. If a test fails, everyone receives a failure email.

So the team knows who committed the bad code

Hudson, Jenkins, etc.

Supports SCM version control tools

CVS, Subversion, Git


Test in Hadoop

Much harder: JUnit alone cannot be used in Hadoop, which runs on a cluster of servers with parallel computing


Use Cases: Shopzilla

Hadoop’s Elephant In The Room

Hadoop testing for Quality Assurance

Unit Test: functional units of the S/W
Integration Test: integrated components
BDD Test: behavior of the S/W

Augmented Development

Use a dev cluster?

Takes too long each day

Hadoop-In-A-Box


Use Cases: Shopzilla

Hadoop-In-A-Box

Fully compatible mock environment

Works without a cluster; mocks the cluster state

Test locally: a single-node pseudo cluster or MiniMRCluster => can test HDFS, Pig (a minimal local-mode sketch follows below)
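As a rough sketch of the same idea (not from the talk), a MapReduce job can be pointed at Hadoop's LocalJobRunner and the local file system so it runs in a single JVM without any cluster. The class name and the "local-input"/"local-output" paths below are placeholders, and the configuration keys are the Hadoop 2.x names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalJobRunnerExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Run with the LocalJobRunner in a single JVM instead of submitting to a cluster
    conf.set("mapreduce.framework.name", "local");
    // Use the local file system instead of HDFS
    conf.set("fs.defaultFS", "file:///");

    Job job = Job.getInstance(conf, "local-smoke-test");
    job.setJarByClass(LocalJobRunnerExample.class);
    // A real test would also set its Mapper/Reducer classes here,
    // e.g. the WordCount classes shown later in this talk.
    FileInputFormat.setInputPaths(job, new Path("local-input"));
    FileOutputFormat.setOutputPath(job, new Path("local-output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}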


Use Cases: Yahoo

Developers want to run Hadoop code on the local machine

and do not want to run it on the Hadoop cluster

Yahoo HIT (Hadoop Integration Test): run Hadoop tests across the Hadoop ecosystem

Deploy HIT on a single Hadoop node or a cluster; run tests in Hadoop, Pig, Hive, Oozie, …


Unit Test in Hadoop

MRUnit testing framework

is based on JUnit; Cloudera donated it to Apache; it can test MapReduce programs

written for the 0.20, 0.23.x, 1.0.x, and 2.x versions of Hadoop

Can test the Mapper, the Reducer, and the combined Mapper/Reducer pipeline


Unit Test in Hadoop

WordCount Example

reads text files and counts how often words occur.

The input and the output are text files.

Need three classes

WordCount.java

Driver class with the main function

WordMapper.java

Mapper class with the map method

SumReducer.java

Reducer class with the reduce method


WordCount Example

WordMapper.java

Mapper class with the map function. For the given sample input (here the two lines "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop"),

assuming two map nodes,

the sample input is distributed to the maps;

the first map emits:

<Hello, 1> <World, 1> <Bye, 1> <World, 1>

The second map emits:

<Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>


WordCount Example

SumReducer.java
Reducer class with the reduce function
For the input from the two Mappers,

the reduce method just sums up the values,

which are the occurrence counts for each key.

Thus the output of the job is:

<Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>


WordCount.java (Driver)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }
    Job job = Job.getInstance(new Configuration());
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(WordCount.class);
    job.submit();
  }
}

Key steps in the driver:

Check the input and output arguments
Set the output (key, value) types
Set the Mapper and Reducer classes
Set the input/output format classes
Set the input/output paths
Set the driver class (setJarByClass)
Submit the job to the master node

WordMapper.java (Mapper class)

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Break the line into words for processing
    StringTokenizer wordList = new StringTokenizer(value.toString());
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}

Key points in the Mapper:

Extends the Mapper class with the input/output key and value types
Output (key, value) types: (Text, IntWritable)
Input (key, value) types; output is written through the Context
Reads the words from each line of the input file
Counts each word by emitting (word, 1)

Shuffler/Sorter

Maps emit (key, value) pairs. The Shuffler/Sorter of the Hadoop framework

sorts the (key, value) pairs by key, then appends the values to make (key, list of values) pairs. For example,

the first and second maps emit:

<Hello, 1> <World, 1> <Bye, 1> <World, 1> <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>

the Shuffler produces the following, which becomes the input of the reducer:

<Bye, 1>, <Goodbye, 1>, <Hadoop, <1, 1>>, <Hello, <1, 1>>, <World, <1, 1>>


SumReducer.java (Reducer class)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable totalWordCount = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}

Key points in the Reducer:

Extends the Reducer class with the input/output key and value types
The output value type is IntWritable
The input is (key, list of values); the output is written through the Context
For each word, sums the number of values in the list
For each word, the total count becomes the output value

SumReducer

Reducer input: what the Shuffler produces becomes the input of the reducer

<Bye, 1>, <Goodbye, 1>, <Hadoop, <1, 1>>, <Hello, <1, 1>>, <World, <1, 1>>

Output: <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2>


MRUnit Test

How to unit test in Hadoop

Extend a JUnit test

with the org.apache.hadoop.mrunit.* API

Needs to test the Driver, Mapper, and Reducer

MapReduceDriver, MapDriver, ReduceDriver: add input with the expected output


MRUnit Test

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// WordMapper/SumReducer use the new (mapreduce) API, so the drivers
// come from the org.apache.hadoop.mrunit.mapreduce package.
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
  // The input key type is Object to match WordMapper's declared key type.
  MapReduceDriver<Object, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<Object, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();
    mapDriver = new MapDriver<Object, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
    mapReduceDriver = new MapReduceDriver<Object, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }


MRUnit Test

  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}

Key points in setUp():

Using the MRUnit API, declare the MapReduce, Mapper, and Reducer drivers with their input/output (key, value) types
@Before runs setUp() before each test method is executed
Instantiate the WordCount Mapper and Reducer
Instantiate and set the Mapper driver with its input/output (key, value) types
Instantiate and set the Reducer driver with its input/output (key, value) types
Instantiate and set the Mapper/Reducer driver with its input/output (key, value) types

Key points in the test methods:

Mapper test: define sample input with the expected output
Reducer test: define sample input with the expected output
Mapper/Reducer test: define sample input with the expected output

MRUnit Test in Practice

Need to implement unit tests

How many? For all Map, Reduce, and Driver classes

Problems? They mostly work,

but MRUnit does not support complicated Map and Reduce APIs

How many problems you can detect

depends on how well you implement the MRUnit code


Conclusion

MRUnit for Hadoop unit tests during development
Integrate with the QA site through a CI server
You need to use it


Questions?


References

1. Hadoop WordCount example with the new MapReduce API (http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html)
2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount)
3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html)
4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count)
5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial)
6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-suite)
7. Hadoop's Elephant in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hadoops-elephant-in-the-room/)
8. Facebook TestMapReduceLocal (https://github.com/facebook/hadoop-20/blob/master/src/test/org/apache/hadoop/mapreduce/TestMapReduceLocal.java)
9. Yahoo HIT, Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1)
