Adaptive Algorithms and Parallel Computing Multicore and Multimachine Computing with MATLAB (Part 2) Academic year 2014-2015 Simone Scardapane


Page 1: Adaptive Algorithms and Parallel Computing - ISPAMMispac.diet.uniroma1.it/.../uploads/2015/04/MapReduce-in-MATLAB.pdf · Adaptive Algorithms and Parallel Computing ... MapReduce (MR)

Adaptive Algorithms and Parallel Computing

Multicore and Multimachine Computing with MATLAB (Part 2)

Academic year 2014-2015

Simone Scardapane


Content of the Lecture

Main points:
• The MapReduce paradigm.
• Programming MapReduce in MATLAB.
• Hadoop interfacing.

In the next lecture(s):
• Introduction to CUDA.
• GPU acceleration in MATLAB.


Overview

1 MAPREDUCE
    Commodity computing platforms
    Hadoop overview

2 MAPREDUCE IN MATLAB
    Overview
    Example: counting words

3 EXAMPLE: LINEAR REGRESSION
    Linear regression concepts
    Linear regression in MapReduce

4 REFERENCES


Commodity computing platforms

Commodity Computing

Commodity computing (CC) is the use of a large number of “commodity” (i.e. low performance) components for the purpose of parallelization.

A CC cluster is made of standard computing elements, using classical operating systems and protocols. It is driven by decreasing costs from the vendors.

It is opposed to other parallel paradigms such as supercomputers, high performance clusters and grid computing.


Challenges of CC clusters

Faults: Components might fail while performing their tasks. Data duplication is needed to prevent loss of information.

Changing topology: New computers can connect/disconnect at any time.

Performance: Each component has its own performance (e.g. high-end against low-end computers).

Distributed data: The data for the problem is distributed throughout the network.

... ...


Hadoop overview

Google and MapReduce

MapReduce (MR) is a programming framework developed by Google to address the previous problems. An MR program requires (at least) two components:

1 A mapper is used to filter the input data.
2 A reducer performs a summary of the information provided by the mapper.

The MR framework takes charge of running multiple mappers/reducers in parallel, and handles data redundancy, faults, etc.


A preliminary example

Figure: Example of running MapReduce for counting words. Taken from: [Image source]


Formal definition

Input data to the problem must be composed of key/value pairs $(k, v)$, belonging to two generic domains $k \in M_{in}$ and $v \in V_{in}$. The data is initially filtered according to a function:

$$\mathrm{MAP}(k, v) = \mathrm{list}(k_2, v_2) , \tag{1}$$

where the output data can belong to different domains $k_2 \in M_{map}$ and $v_2 \in V_{map}$. The results from the map operations can be shuffled and collected, and finally reduced using a different function:

$$\mathrm{REDUCE}(k_2, \mathrm{list}(v_2)) = (k_2, \mathrm{list}(v_3)) , \tag{2}$$

with $v_3 \in V_{out}$.
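The map, shuffle and reduce steps of the formal definition can be imitated in plain MATLAB on in-memory data. The following is a minimal sketch only: it runs serially, uses `containers.Map` as a stand-in for the framework's key/value storage, and the document names and texts are invented for illustration. It is not how Hadoop or the MATLAB `mapreduce` function are implemented.

```matlab
% Toy in-memory MapReduce for word counting (illustration only, no parallelism).
% Input pairs (k, v): k is a hypothetical document name, v is its text.
docs = containers.Map( ...
    {'doc1', 'doc2'}, ...
    {'this is an example', 'another example'});

% MAP: each (k, v) produces a list of (k2, v2) = (word, 1) pairs.
pairs = {};                                  % rows of {k2, v2}
docNames = keys(docs);
for d = 1:numel(docNames)
    words = strsplit(docs(docNames{d}), ' ');
    for w = 1:numel(words)
        pairs(end+1, :) = {words{w}, 1};     %#ok<SAGROW>
    end
end

% SHUFFLE: group all v2 sharing the same k2 into list(v2).
grouped = containers.Map('KeyType', 'char', 'ValueType', 'any');
for p = 1:size(pairs, 1)
    k2 = pairs{p, 1};
    if isKey(grouped, k2)
        grouped(k2) = [grouped(k2), pairs{p, 2}];
    else
        grouped(k2) = pairs{p, 2};
    end
end

% REDUCE: (k2, list(v2)) -> (k2, v3), where v3 is the sum of the list.
counts = containers.Map('KeyType', 'char', 'ValueType', 'double');
wordKeys = keys(grouped);
for g = 1:numel(wordKeys)
    counts(wordKeys{g}) = sum(grouped(wordKeys{g}));
end
```

In a real MR framework the map calls and the reduce calls run on different machines, and the shuffle step moves data across the network; here all three steps share one workspace.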


History of Hadoop

• A white paper on MR by Google was published in 2004.
• This inspired Doug Cutting (a Yahoo employee) and Mike Cafarella to develop an open source version in Java under the Apache license.
• Version 1.0 of Hadoop was released in 2011 (earlier 0.x releases date back to 2006).
• Today, the use of Hadoop is widespread (more than half the Fortune 50 companies use it).
• Hadoop clusters can be bought from most cloud vendors.
• Additionally, several modules can be installed on top of Hadoop (e.g. Pig, Spark, etc.).


Hadoop layers

Figure: Layered architecture of Hadoop. Taken from: [Image source]


Overview

MapReduce and MATLAB

It is possible to easily build a CC cluster using the MATLAB Distributed Computing Server (see previous lecture). The only requirement is for each component to run the same version of MATLAB.

The widespread use of Hadoop pushed MathWorks to release a custom implementation of the MR framework in MATLAB R2014b, via the mapreduce function.

MR programs implemented in MATLAB can be run from the desktop on a pre-existing Hadoop cluster, by changing the default job scheduler (more on this next).
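As a rough sketch of what changing the execution environment looks like, the `mapreducer` function selects where subsequent `mapreduce` calls run. The Hadoop installation folder below is a placeholder, not a real path, and the parallel pool and Hadoop variants assume the Parallel Computing Toolbox is available; they are left commented out for that reason.

```matlab
% Sketch: selecting where mapreduce runs.

% Run in the local MATLAB session (serial execution).
mapreducer(0);

% Run on the current parallel pool instead (uncomment if a pool is available).
% mapreducer(gcp);

% Run on an existing Hadoop cluster (uncomment and adapt the placeholder path).
% cluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/path/to/hadoop');
% mapreducer(cluster);
```

The map and reduce functions themselves do not change between these configurations; only the execution environment does.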


Overview of MR in MATLAB

Figure: Overview of MR components in MATLAB. Taken from: [Image source]


MR workflow in MATLAB

1 The input data is saved in a particular object called datastore, which handles data distribution and partitioning into chunks.

2 Each data chunk is processed by a separate call to the map function, and the result is stored in an intermediate object of class KeyValueStore.

3 The intermediate outputs are grouped by key (i.e. by k2 in our formal definition).

4 Each group of KeyValueStore elements is processed by a call to the reduce function.

5 Final results are saved in an output datastore object.


Example: counting words

Input data definition

Our input data is given by different text files, each containing one word per line:

Input1.txt:

This
is
an
example

Input2.txt:

Another
example

Input3.txt:

Yet
Another
example

The objective is to write an MR program that counts the occurrences of each word across all the files.


Loading data in the datastore

As a first step, we load the data in the datastore object:

ds = datastore({'input1.txt', 'input2.txt', 'input3.txt'}, ...
    'ReadVariableNames', false, 'VariableNames', {'Word'});

As of R2014b, MATLAB provides two different datastores: a TabularTextDatastore for text files, and a KeyValueDatastore for key/value pairs. MATLAB automatically recognizes the correct type of datastore to use:

>> whos
  Name      Size            Bytes  Class                                       Attributes

  ds        1x1               112  matlab.io.datastore.TabularTextDatastore
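Before running a job, it can help to read the datastore by hand and check that the chunks look as expected. The sketch below assumes one word per line in each file and writes its own small files to a temporary folder so that it is self-contained; the folder and file contents are placeholders for illustration.

```matlab
% Create three small one-word-per-line files in a temporary folder
% (names and contents are placeholders for illustration).
folder = tempname;
mkdir(folder);
contents = {{'this', 'is', 'an', 'example'}, ...
            {'another', 'example'}, ...
            {'yet', 'another', 'example'}};
files = cell(1, numel(contents));
for f = 1:numel(contents)
    files{f} = fullfile(folder, sprintf('input%d.txt', f));
    fid = fopen(files{f}, 'w');
    fprintf(fid, '%s\n', contents{f}{:});
    fclose(fid);
end

ds = datastore(files, 'ReadVariableNames', false, 'VariableNames', {'Word'});

% Read the datastore one chunk at a time.
numWords = 0;
while hasdata(ds)
    chunk = read(ds);              % table with a single variable, Word
    numWords = numWords + height(chunk);
end

reset(ds);                         % rewind before reading again
allWords = readall(ds);            % all files in a single table
```

Each table returned by `read` has the same layout as the `data` argument that a map function receives.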


Writing a map function

A map function must take three arguments:

1 data: this contains a chunk of data saved in the datastore, which can be accessed with the standard dot notation.
2 info: additional information on the data.
3 intermKVStore: intermediate KeyValueStore object that will store the output of the mapper.

The KeyValueStore has only two possible methods:

1 add to add a single key/value pair.
2 addmulti to add multiple key/value pairs.


Our map function

function mapCountWords(data, info, intermKVStore)

    % Get unique words
    words = unique(data.Word);

    % Get, for each word in the chunk, its index in the list of unique words
    [~, idxs] = ismember(data.Word, words);

    % Count the occurrences of each unique word
    words_count = arrayfun(@(x) sum(idxs == x), 1:numel(words));

    % Add one (word, count) pair per unique word to the KeyValueStore
    addmulti(intermKVStore, words, num2cell(words_count));

end
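The counting logic of the map function can be checked locally on a small table, without calling mapreduce. The sketch below is an alternative formulation, not the code above: it replaces the `ismember`/`arrayfun` pair with the third output of `unique` and `accumarray`. The sample words are made up.

```matlab
% Local check of the counting step on a small table, outside mapreduce.
data = table({'yet'; 'another'; 'example'; 'another'}, ...
    'VariableNames', {'Word'});

% The third output of unique is the index of each word in the unique list.
[words, ~, idxs] = unique(data.Word);

% accumarray adds 1 for every occurrence of each index.
words_count = accumarray(idxs, 1);

% Display word/count pairs.
for w = 1:numel(words)
    fprintf('%s: %d\n', words{w}, words_count(w));
end
```

Both formulations produce one count per unique word in the chunk; `accumarray` does so in a single pass over the indices.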


Writing a reduce function

The reduce function is similar to the map function and takes three arguments:

1 intermKey is the key associated with the values.
2 intermValIter is a ValueIterator object for accessing the values associated with the key.
3 outKVStore is the store for saving the output results.

The ValueIterator has only two functions:

1 hasnext returns a boolean indicating whether there is another value available.
2 getnext accesses the next value.


Our reduce function

function reduceCountWords(intermKey, intermValIter, outKVStore)

    % Initialize the sum
    sum_occurences = 0;

    % Accumulate the partial counts emitted by the mappers for this key
    while (hasnext(intermValIter))
        sum_occurences = sum_occurences + getnext(intermValIter);
    end

    % Add the total to the output KeyValueStore
    add(outKVStore, intermKey, sum_occurences);

end


Running the MR job

>> outds = mapreduce(ds, @mapCountWords, @reduceCountWords);

Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers.

Parallel mapreduce execution on the parallel pool:

********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map  25% Reduce   0%
Map  50% Reduce   0%
Map  75% Reduce   0%
Map 100% Reduce  50%
Map 100% Reduce 100%


Visualizing the results

>> readall(outds)

ans =

       Key       Value
    _________    _____

    'Example'    [3]
    'Yet'        [1]
    'An'         [1]
    'Another'    [2]
    'Is'         [1]
    'This'       [1]


Linear regression concepts

Linear regression

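We want to learn an unknown function given by:

$$y = w_o^T x , \tag{3}$$

with $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$. We are provided with a set of $N$ examples of it:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \tag{4}$$

The aim is to find a $w$ such that the mean-squared error (equivalently, the sum of squared errors) is minimized:

$$w^* = \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{N} (w^T x_i - y_i)^2 . \tag{5}$$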


Least-square solution

We build the input matrix $X$ by stacking the inputs:

$$X = \begin{bmatrix} x_1^T \\ \vdots \\ x_N^T \end{bmatrix} \tag{6}$$

and the output vector $y = [y_1, \ldots, y_N]^T$. It is easy to show that the solution to the previous optimization problem can be expressed in closed form as:

$$w^* = (X^T X)^{-1} X^T y . \tag{7}$$
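The closed-form solution can be tried on a small synthetic problem. In the sketch below every number (sample size, dimension, true weights, noise level) is invented for illustration. It solves the linear system with the backslash operator rather than forming the inverse explicitly, which is the usual practice in MATLAB.

```matlab
% Synthetic regression problem (all values are made up for illustration).
rng(0);                          % fix the random seed for repeatability
N = 200; d = 3;
w_true = [1; -2; 0.5];
X = randn(N, d);                 % rows are the inputs x_i^T
y = X * w_true + 0.01 * randn(N, 1);

% Closed-form solution: solve (X'X) w = X'y with backslash,
% without computing the inverse of X'X explicitly.
w_star = (X' * X) \ (X' * y);

% MATLAB's least-squares solve applied to X directly, for comparison.
w_ls = X \ y;
```

With low noise, both `w_star` and `w_ls` should be close to `w_true`, and close to each other up to rounding.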


Linear regression in MapReduce

Formulation of the problem

In an MR framework, each mapper has access to a subset of $X$ and $y$, such that:

$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_L \end{bmatrix} \tag{8}$$

where $L$ is the number of mappers, and similarly for $y$. The final solution is given by:

$$w^* = \left( \sum_{k=1}^{L} X_k^T X_k \right)^{-1} \left( \sum_{k=1}^{L} X_k^T y_k \right) . \tag{9}$$

In a straightforward implementation, each mapper will compute its portion of the sum, and the reducer will aggregate them.
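The decomposition into per-block sums can be verified numerically before involving mapreduce at all. The sketch below simulates L mappers with a plain loop over row blocks of random data; the sizes and the contiguous split are arbitrary choices made for illustration.

```matlab
rng(1);
N = 300; d = 4; L = 3;           % L simulated mappers
X = randn(N, d);
y = randn(N, 1);

% Split the rows into L contiguous blocks X_k, y_k.
edges = round(linspace(0, N, L + 1));

% "Map": each block contributes X_k' * X_k and X_k' * y_k.
% "Reduce": the contributions are summed, then the system is solved once.
XX = zeros(d, d);
Xy = zeros(d, 1);
for k = 1:L
    rows = edges(k) + 1 : edges(k + 1);
    XX = XX + X(rows, :)' * X(rows, :);
    Xy = Xy + X(rows, :)' * y(rows);
end
w_blocks = XX \ Xy;

% Reference: solve on the full data at once.
w_full = (X' * X) \ (X' * y);
```

Each mapper only needs to send a d-by-d matrix and a d-by-1 vector to the reducer, regardless of how many rows its block contains.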


Preparing the data

We will use the abalone dataset, which has 8 input variables and a single output variable. The dataset is stored as two variables in a .mat file, so we first need to convert it to a .csv file and load it into the datastore:

% Load the abalone dataset
load abalone_dataset.mat

% Write to csv (one example per row, target in the last column)
% and open with datastore
csvwrite('abalone.csv', [abaloneInputs' abaloneTargets']);
ds = datastore({'abalone.csv'}, 'ReadVariableNames', false);

[Learn more about the Abalone dataset]


Map function

function mapLinearRegression(data, info, intermKVStore)

    % Initialize
    vars = {'Var1', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'Var8'};
    X = zeros(length(data.Var1), 8);
    y = data.Var9;

    % Get each column
    for i = 1:8
        X(:, i) = data.(vars{i});
    end

    % Add to KeyValueStore
    add(intermKVStore, 'key', {X' * X, X' * y});

end


Reduce function

function reduceLinearRegression(intermKey, intermValIter, outKVStore)

    % Initialize the sums
    XX = 0;
    Xy = 0;

    % Accumulate the partial sums computed by the mappers
    while (hasnext(intermValIter))
        curr = getnext(intermValIter);
        XX = XX + curr{1};
        Xy = Xy + curr{2};
    end

    % Solve the linear system and add the result to the output KeyValueStore
    add(outKVStore, intermKey, XX\Xy);

end


Running the job

% Call the mapreduce job
outds = mapreduce(ds, @mapLinearRegression, @reduceLinearRegression);
wmr = readall(outds);
wmr = wmr{1, 2}{1};

% Run a centralized regression (parentheses make the order of operations explicit)
wcent = (abaloneInputs*abaloneInputs') \ (abaloneInputs*abaloneTargets');

% Check equality
assert(all(abs(wmr - wcent) < 10^-6));


The result

Figure: Result of linear regression estimation on 30 selected training samples.


Learn more

1 - Process big data in MATLAB with mapreduce

2 - MATLAB mapreduce APIs

3 - Using mapreduce to fit a logistic regression model

4 - Getting started with mapreduce

5 - Running mapreduce on a Hadoop cluster


Bibliography I

C. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, Map-reduce for machine learning on multicore, Advances in Neural Information Processing Systems 19 (2007), 281.

J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM 51 (2008), no. 1, 107–113.

T. White, Hadoop: the definitive guide, O'Reilly Media, Inc., 2009.