2014 International Software Testing Conference in Seoul

Seoul Software Testing Conference. Testing Big Data: Unit Test in Hadoop (Part II). Jongwook Woo (PhD), High-Performance Internet Computing Center (HiPIC), Educational Partner with Cloudera and Grants Awardee of Amazon AWS, Computer Information Systems Department, California State University, Los Angeles.

Transcript of 2014 International Software Testing Conference in Seoul

Page 1: 2014 International Software Testing Conference in Seoul

Seoul Software Testing Conference

Testing Big Data: Unit Test in Hadoop (Part II)

Jongwook Woo (PhD)

High-Performance Internet Computing Center (HiPIC)

Educational Partner with Cloudera and Grants Awardee of Amazon AWS

Computer Information Systems Department

California State University, Los Angeles

Page 2: 2014 International Software Testing Conference in Seoul

Contents

Test in General

Use Cases: Big Data in Hadoop and Ecosystems

Unit Test in Hadoop

Page 3: 2014 International Software Testing Conference in Seoul

Test in general

Quality Assurance

TDD (Test Driven Development)

Unit Test: tests functional units of the S/W

BDD (Behavior Driven Development)

Based on TDD

Tests the behavior of the S/W

Integration Test: tests integrated components

Group of unit tests

CI (Continuous Integration) Server

Hudson, Jenkins, etc.

Page 4: 2014 International Software Testing Conference in Seoul

CI Server

Continuous Integration Server

Based on TDD (Test Driven Development)

All developers commit their updates every day

The CI server compiles the code and runs the unit tests

If a test fails, all developers receive the failure email

Everyone knows who committed the bad code

Hudson, Jenkins, etc.

Supports SCM (version control) tools

CVS, Subversion, Git

Page 5: 2014 International Software Testing Conference in Seoul

Test in Hadoop

Much harder

JUnit cannot be used directly in Hadoop

Cluster

Server

Parallel Computing

Page 6: 2014 International Software Testing Conference in Seoul

Use Cases: Shopzilla

Hadoop’s Elephant In The Room

Hadoop testing: Quality Assurance

Unit Test: functional units of the S/W

Integration Test: integrated components

BDD Test: behavior of the S/W

Augmented Development

Use a dev cluster?

Takes too much time per day

Hadoop-In-A-Box

Page 7: 2014 International Software Testing Conference in Seoul

Use Cases: Shopzilla

Hadoop-In-A-Box

Fully compatible mock environment

Without a cluster

Mock cluster state

Test locally

Single-node pseudo cluster

MiniMRCluster

=> can test HDFS, Pig

Page 8: 2014 International Software Testing Conference in Seoul

Use Cases: Yahoo

Developer

Wants to run Hadoop code on the local machine

Does not want to run Hadoop code on the Hadoop cluster

Yahoo HIT: Hadoop Integration Test

Runs Hadoop tests in the Hadoop ecosystem

Deploy HIT on a single Hadoop node or on a cluster

Run tests in Hadoop, Pig, Hive, Oozie,…

Page 9: 2014 International Software Testing Conference in Seoul

Unit Test in Hadoop

MRUnit testing framework

Is based on JUnit

Cloudera donated it to Apache

Can test MapReduce programs

written on the 0.20, 0.23.x, 1.0.x, and 2.x versions of Hadoop

Can test Mapper, Reducer, and MapperReducer

Page 10: 2014 International Software Testing Conference in Seoul

Unit Test in Hadoop

WordCount Example

Reads text files and counts how often words occur.

The input and the output are text files.

Needs three classes

WordCount.java

Driver class with main function

WordMapper.java

Mapper class with map method

SumReducer.java

Reducer class with reduce method

Page 11: 2014 International Software Testing Conference in Seoul

WordCount Example

WordMapper.java

Mapper class with map function

For the given sample input

assuming two map nodes

The sample input is distributed to the maps

The first map emits:

<Hello, 1> <World, 1> <Bye, 1> <World, 1>

The second map emits:

<Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
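The map step on this slide can be imitated in plain Java without a Hadoop cluster. The following is a minimal sketch, not the Hadoop API: the class name `MapPhaseSketch` is hypothetical, the two input lines are assumptions inferred from the emitted pairs listed above, and each <word, 1> pair is represented as a `Map.Entry`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java imitation of the map step; no Hadoop classes are used.
class MapPhaseSketch {

    // Emits one <word, 1> pair per whitespace-separated token, in input order.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer words = new StringTokenizer(line);
        while (words.hasMoreTokens()) {
            emitted.add(new SimpleEntry<>(words.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumed input split: one line per map node.
        System.out.println(map("Hello World Bye World"));
        System.out.println(map("Hello Hadoop Goodbye Hadoop"));
    }
}
```

Each call stands in for one map node, so the two printed lists correspond to the output of the first and the second map.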

Page 12: 2014 International Software Testing Conference in Seoul

WordCount Example

SumReducer.java

Reducer class with reduce function

For the input from the two Mappers

the reduce method just sums up the values,

which are the occurrence counts for each key

Thus the output of the job is:

<Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>

Page 13: 2014 International Software Testing Conference in Seoul

WordCount.java (Driver)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }

    Job job = Job.getInstance(new Configuration());
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setJarByClass(WordCount.class);
    job.submit();
  }
}

Page 14: 2014 International Software Testing Conference in Seoul

WordCount.java

Check Input and Output files

if (args.length != 2) {
  System.out.println("usage: [input] [output]");
  System.exit(-1);
}

Page 15: 2014 International Software Testing Conference in Seoul

WordCount.java

Set output (key, value) types

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

Page 16: 2014 International Software Testing Conference in Seoul

WordCount.java

Set Mapper/Reducer classes

job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);

Page 17: 2014 International Software Testing Conference in Seoul

WordCount.java

Set Input/Output format classes

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

Page 18: 2014 International Software Testing Conference in Seoul

WordCount.java

Set Input/Output paths

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Page 19: 2014 International Software Testing Conference in Seoul

WordCount.java

Set Driver class

job.setJarByClass(WordCount.class);

Page 20: 2014 International Software Testing Conference in Seoul

WordCount.java

Submit the job to the master node

job.submit();

Page 21: 2014 International Software Testing Conference in Seoul

WordMapper.java (Mapper class)

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The input key is the byte offset of the line (LongWritable), as supplied by
// TextInputFormat; this also matches the key type used by the MRUnit drivers.
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Break line into words for processing
    StringTokenizer wordList = new StringTokenizer(value.toString());
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}

Page 22: 2014 International Software Testing Conference in Seoul

WordMapper.java

Extends the Mapper class with input/output key and value types

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable>

Page 23: 2014 International Software Testing Conference in Seoul

WordMapper.java

Output (key, value) types

private Text word = new Text();
private final static IntWritable one = new IntWritable(1);

Page 24: 2014 International Software Testing Conference in Seoul

WordMapper.java

Input (key, value) types; output goes through the Context type

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException

Page 25: 2014 International Software Testing Conference in Seoul

WordMapper.java

Read words from each line of the input file

// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
  word.set(wordList.nextToken());
  context.write(word, one);
}

Page 26: 2014 International Software Testing Conference in Seoul

WordMapper.java

Count each word: emit (word, 1)

context.write(word, one);

Page 27: 2014 International Software Testing Conference in Seoul

Shuffler/Sorter

Maps emit (key, value) pairs

Shuffler/Sorter of the Hadoop framework

Sorts the (key, value) pairs by key

Then appends the values to make a (key, list of values) pair

For example,

The first and second maps emit:

<Hello, 1> <World, 1> <Bye, 1> <World, 1> <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>

The Shuffler produces the following, which becomes the input of the reducer:

<Bye, 1>, <Goodbye, 1>, <Hadoop, <1, 1>>, <Hello, <1, 1>>, <World, <1, 1>>
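The grouping described on this slide can be sketched in plain Java. This is only an illustration under stated assumptions, not how Hadoop implements its shuffle: the class name `ShuffleSketch` is hypothetical, a `TreeMap` stands in for the sort by key, and because every emitted value is 1 only the words are passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the shuffle/sort step; no Hadoop classes are used.
class ShuffleSketch {

    // Groups the emitted words by key; the TreeMap keeps the keys sorted.
    // Every emitted pair carries the value 1, so a 1 is appended per occurrence.
    static Map<String, List<Integer>> shuffle(List<String> emittedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : emittedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList(
                "Hello", "World", "Bye", "World",
                "Hello", "Hadoop", "Goodbye", "Hadoop");
        // prints {Bye=[1], Goodbye=[1], Hadoop=[1, 1], Hello=[1, 1], World=[1, 1]}
        System.out.println(shuffle(emitted));
    }
}
```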

Page 28: 2014 International Software Testing Conference in Seoul

SumReducer.java (Reducer class)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable totalWordCount = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}

Page 29: 2014 International Software Testing Conference in Seoul

SumReducer.java

Extends the Reducer class with input/output key and value types

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable>

Page 30: 2014 International Software Testing Conference in Seoul

SumReducer.java

Set output value type

private IntWritable totalWordCount = new IntWritable();

Page 31: 2014 International Software Testing Conference in Seoul

SumReducer.java

Set input (key, list of values) type and output as Context class

public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException

Page 32: 2014 International Software Testing Conference in Seoul

SumReducer.java

For each word, sum up the values

int wordCount = 0;
Iterator<IntWritable> it = values.iterator();
while (it.hasNext()) {
  wordCount += it.next().get();
}

Page 33: 2014 International Software Testing Conference in Seoul

SumReducer.java

For each word, the total count becomes the value

totalWordCount.set(wordCount);
context.write(key, totalWordCount);

Page 34: 2014 International Software Testing Conference in Seoul

SumReducer

Reducer

Input: the output of the Shuffler becomes the input of the reducer

<Bye, 1>, <Goodbye, 1>, <Hadoop, <1, 1>>, <Hello, <1, 1>>, <World, <1, 1>>

Output

<Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2>
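The step from the reducer input to the reducer output can be checked with a small plain-Java sketch. It mirrors the summing loop of SumReducer but uses no Hadoop classes; the class name `ReduceSketch` and the helper `reduceAll` are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the reduce step; no Hadoop classes are used.
class ReduceSketch {

    // Mirrors the loop in SumReducer: add up the list of counts for one key.
    static int reduce(List<Integer> values) {
        int wordCount = 0;
        for (int value : values) {
            wordCount += value;
        }
        return wordCount;
    }

    // Applies reduce to every (key, list of values) pair, preserving key order.
    static Map<String, Integer> reduceAll(Map<String, List<Integer>> shuffled) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffled.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> shuffled = new LinkedHashMap<>();
        shuffled.put("Bye", Arrays.asList(1));
        shuffled.put("Goodbye", Arrays.asList(1));
        shuffled.put("Hadoop", Arrays.asList(1, 1));
        shuffled.put("Hello", Arrays.asList(1, 1));
        shuffled.put("World", Arrays.asList(1, 1));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(reduceAll(shuffled));
    }
}
```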

Page 35: 2014 International Software Testing Conference in Seoul

MRUnit Test

How to unit test in Hadoop

Extend JUnit tests

With the org.apache.hadoop.mrunit.* API

Need to test Driver, Mapper, Reducer

MapReduceDriver, MapDriver, ReduceDriver

Add input with expected output
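The "add input with expected output" pattern can be illustrated with a tiny stand-in driver written in plain Java. This is a hedged sketch of the idea only: `TinyMapDriver` and its methods are hypothetical and are not the MRUnit API, and the mapper is modeled as a simple function from an input line to a list of output records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a test driver; this is NOT the MRUnit API.
class TinyMapDriver {
    private final Function<String, List<String>> mapper;
    private final List<String> expected = new ArrayList<>();
    private String input;

    TinyMapDriver(Function<String, List<String>> mapper) {
        this.mapper = mapper;
    }

    // Registers the single input line to feed to the mapper.
    TinyMapDriver withInput(String line) {
        this.input = line;
        return this;
    }

    // Registers one expected output record; the order of calls matters.
    TinyMapDriver withOutput(String record) {
        expected.add(record);
        return this;
    }

    // Runs the mapper and fails if the actual output differs from the expected output.
    void runTest() {
        List<String> actual = mapper.apply(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Mapper under test: emits "word,1" for each whitespace-separated word.
        Function<String, List<String>> wordMapper = line -> {
            List<String> out = new ArrayList<>();
            for (String word : line.trim().split("\\s+")) {
                out.add(word + ",1");
            }
            return out;
        };
        new TinyMapDriver(wordMapper)
                .withInput("cat cat dog")
                .withOutput("cat,1").withOutput("cat,1").withOutput("dog,1")
                .runTest();
        System.out.println("mapper test passed");
    }
}
```

The test never touches a cluster: the unit under test is called directly with a known input, and the driver compares what it emits against the declared expectation.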

Page 36: 2014 International Software Testing Conference in Seoul

MRUnit Test

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// WordMapper and SumReducer use the org.apache.hadoop.mapreduce API,
// so the drivers come from the matching mrunit.mapreduce package.
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();

    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);

    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);

    mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }

Page 37: 2014 International Software Testing Conference in Seoul

MRUnit Test

  // Each test declares "throws Exception" so that it compiles whether or not
  // runTest() declares a checked exception in the MRUnit version in use.
  @Test
  public void testMapper() throws Exception {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  @Test
  public void testReducer() throws Exception {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  @Test
  public void testMapReduce() throws Exception {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}

Page 38: 2014 International Software Testing Conference in Seoul

MRUnit Test

Using the MRUnit API, declare the MapReduce, Mapper, and Reducer drivers with input/output (key, value) types

MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

Page 39: 2014 International Software Testing Conference in Seoul

MRUnit Test

Run setUp() before executing each test method

@Before
public void setUp()

Page 40: 2014 International Software Testing Conference in Seoul

MRUnit Test

Instantiate the WordCount Mapper and Reducer

WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();

Page 41: 2014 International Software Testing Conference in Seoul

MRUnit Test

Instantiate and set the Mapper driver with input/output (key, value) types

mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);

Page 42: 2014 International Software Testing Conference in Seoul

MRUnit Test

Instantiate and set the Reducer driver with input/output (key, value) types

reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);

Page 43: 2014 International Software Testing Conference in Seoul

MRUnit Test

Instantiate and set the MapperReducer driver with input/output (key, value) types

mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);

Page 44: 2014 International Software Testing Conference in Seoul

MRUnit Test

Mapper test: Define sample input with expected output

@Test
public void testMapper() throws Exception {
  mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
  mapDriver.withOutput(new Text("cat"), new IntWritable(1));
  mapDriver.withOutput(new Text("cat"), new IntWritable(1));
  mapDriver.withOutput(new Text("dog"), new IntWritable(1));
  mapDriver.runTest();
}

Page 45: 2014 International Software Testing Conference in Seoul

MRUnit Test

Reducer test: Define sample input with expected output

@Test
public void testReducer() throws Exception {
  List<IntWritable> values = new ArrayList<IntWritable>();
  values.add(new IntWritable(1));
  values.add(new IntWritable(1));
  reduceDriver.withInput(new Text("cat"), values);
  reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
  reduceDriver.runTest();
}

Page 46: 2014 International Software Testing Conference in Seoul

MRUnit Test

MapperReducer test: Define sample input with expected output

@Test
public void testMapReduce() throws Exception {
  mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
  mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
  mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
  mapReduceDriver.runTest();
}

Page 47: 2014 International Software Testing Conference in Seoul

MRUnit Test in Practice

Need to implement unit tests

How many? For all Map, Reduce, and Driver classes

Problems?

Mostly works

But it does not support complicated Map and Reduce APIs

How many problems can you detect?

Depends on how well you implement the MRUnit code
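How much a unit test detects depends on the inputs it covers. As a minimal plain-Java sketch (no Hadoop or MRUnit classes; the class name `TokenEdgeCases` and the sample line are assumptions), the following shows that the `StringTokenizer` splitting used in WordMapper separates words on whitespace only, so punctuation and letter case produce distinct keys. A test that only feeds "cat cat dog" would not reveal this behavior.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java check of the tokenizing behavior; no Hadoop classes are used.
class TokenEdgeCases {

    // Counts tokens the same way WordMapper splits a line: on whitespace only.
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer words = new StringTokenizer(line);
        while (words.hasMoreTokens()) {
            counts.merge(words.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "cat", "Cat" and "cat," are counted as three different words.
        // prints {Cat=1, cat=1, cat,=1, dog=1}
        System.out.println(count("cat Cat cat, dog"));
    }
}
```

Whether such keys should be merged is a design decision for the job; the point is that a test input with mixed case and punctuation makes the current behavior visible.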

Page 48: 2014 International Software Testing Conference in Seoul

Conclusion

MRUnit for Hadoop unit tests

Development

Integrate with the QA site with a CI server

Need to use it

Page 49: 2014 International Software Testing Conference in Seoul

Questions?

Page 50: 2014 International Software Testing Conference in Seoul

References

1. Hadoop WordCount example with new map reduce api (http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html)
2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount)
3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html)
4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count)
5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial)
6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-suite)
7. Hadoop’s Elephant in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hadoops-elephant-in-the-room/)
8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/org/apache/hadoop/mapreduce/TestMapReduceLocal.java)
9. Yahoo HIT Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1)
