Building a Hadoop Connector

Posted on 19-Jun-2015


This presentation was given at the HUG London meetup "SQL and NoSQL on Hadoop – A Look at Performance". Speakers: Alex Bordei, Techie Product Manager at Bigstep; Calin Burloiu, Big Data Engineer at Avira; and Radu Pastia, Big Data Team Leader at Avira. We worked with Avira to show how much throughput can be squeezed from a Hadoop connector. Together we benchmarked Couchdoop for performance and discussed the behavior you can expect and the tweaks that can improve the performance of your big data setup. If you have any questions, we will be glad to provide additional information.

Transcript of Building a Hadoop Connector

pastiaro.wordpress.com

@rpastia

Building a connector – The Wrong Way

Components shown: Mapper, Reducer

Building a connector – The Right Way

Components shown: Mapper, Reducer, Partitioner, InputSplit, InputFormat, RecordReader, RecordWriter, OutputFormat
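The read-side roles on this slide can be illustrated with a plain-Java sketch that has no Hadoop dependency. The interfaces below are hypothetical, simplified stand-ins, not Hadoop's `org.apache.hadoop.mapreduce` API (where `InputFormat` and `RecordReader` are abstract classes with more methods, such as `getSplits(JobContext)` and `createRecordReader(InputSplit, TaskAttemptContext)`). The sketch only shows the division of labour: the InputFormat decides how the input is cut into splits, one RecordReader per split turns it into key/value pairs, and the Mapper sees nothing but those pairs. The Partitioner and the write side (RecordWriter, OutputFormat) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical, simplified stand-ins for the read-side roles named on the slide.
// These are NOT Hadoop's org.apache.hadoop.mapreduce classes; they only show who does what.
final class ConnectorRoles {

    /** A split describes one slice of the input; here, a list of record keys. */
    static final class InputSplit {
        final List<String> keys;
        InputSplit(List<String> keys) { this.keys = keys; }
    }

    /** Turns one split into key/value pairs. */
    interface RecordReader {
        void readAll(BiConsumer<String, String> sink);
    }

    /** Decides how the input is cut into splits and how each split is read. */
    interface InputFormat {
        List<InputSplit> getSplits();
        RecordReader createRecordReader(InputSplit split);
    }

    /** Toy InputFormat over an in-memory key/value store, keysPerSplit keys per split. */
    static InputFormat inMemory(Map<String, String> store, List<String> orderedKeys, int keysPerSplit) {
        return new InputFormat() {
            @Override
            public List<InputSplit> getSplits() {
                List<InputSplit> splits = new ArrayList<>();
                for (int i = 0; i < orderedKeys.size(); i += keysPerSplit) {
                    int end = Math.min(i + keysPerSplit, orderedKeys.size());
                    splits.add(new InputSplit(new ArrayList<>(orderedKeys.subList(i, end))));
                }
                return splits;
            }

            @Override
            public RecordReader createRecordReader(InputSplit split) {
                // The reader looks up each key of its split and hands the pair on.
                return sink -> split.keys.forEach(k -> sink.accept(k, store.get(k)));
            }
        };
    }

    /** Read side of a job: one reader per split, every pair passed to the mapper. */
    static List<String> runMapPhase(InputFormat format, BiFunction<String, String, String> mapper) {
        List<String> output = new ArrayList<>();
        for (InputSplit split : format.getSplits()) {
            // In Hadoop, each split would normally become its own map task.
            format.createRecordReader(split).readAll((k, v) -> output.add(mapper.apply(k, v)));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> store = Map.of("a", "record A", "b", "record B", "c", "record C");
        InputFormat format = inMemory(store, List.of("a", "b", "c"), 2);
        System.out.println(format.getSplits().size());
        System.out.println(runMapPhase(format, (k, v) -> k + "=" + v));
    }
}
```

With two keys per split, the three keys produce two splits, and the mapper still receives all three pairs. Because the mapper never touches the store directly, the way the input is divided can change without changing the mapper.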

The InputFormat: From Input to Mapper

--range 2014-09-01;2014-09-20

--number_of_mappers 4

Days in the range: 2014-09-01, 2014-09-02, 2014-09-03, 2014-09-04, 2014-09-05, 2014-09-06, …, 2014-09-20

Input Split 1 (days shown: 2014-09-01, 2014-09-02, 2014-09-05, …), as (key; value) pairs:

(2014-09-01-A; record A)
(2014-09-01-B; record B)
(2014-09-01-…; record …)
(2014-09-02-A; record A)
(2014-09-02-B; record B)
(2014-09-02-…; record …)
(2014-09-05-A; record A)
(2014-09-05-B; record B)
(2014-09-05-…; record …)

Record Reader 1 reads these pairs from Input Split 1 and passes them to the Mapper.
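The splitting step on this slide can be sketched as follows: expand `--range` into one entry per day, then group the days into `--number_of_mappers` splits. This is a hypothetical sketch, not Couchdoop's actual code. Several details are assumptions: the range is taken as inclusive at both ends, days are grouped into contiguous blocks (one assignment consistent with Input Split 1 holding 2014-09-01 through 2014-09-05, though the slide does not spell out the policy), and the helper `keyPrefix` assumes the date-prefixed key layout suggested by keys such as `2014-09-01-A`. All class and method names are illustrative.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the splitting step on the slide: expand --range into days,
// then group the days into --number_of_mappers splits. Not Couchdoop's actual code.
final class DateRangeSplitter {

    /** Parses "2014-09-01;2014-09-20" into every day of the range, both ends included (assumed). */
    static List<LocalDate> expandRange(String range) {
        String[] ends = range.split(";");
        if (ends.length != 2) {
            throw new IllegalArgumentException("expected <start>;<end>, got " + range);
        }
        LocalDate start = LocalDate.parse(ends[0].trim());
        LocalDate end = LocalDate.parse(ends[1].trim());
        List<LocalDate> days = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            days.add(d);
        }
        return days;
    }

    /** Groups the days into contiguous blocks, one per mapper (assumed policy). */
    static List<List<LocalDate>> split(List<LocalDate> days, int numberOfMappers) {
        if (numberOfMappers < 1) {
            throw new IllegalArgumentException("numberOfMappers must be at least 1");
        }
        List<List<LocalDate>> splits = new ArrayList<>();
        int n = days.size();
        for (int i = 0; i < numberOfMappers; i++) {
            // Integer division keeps split sizes within one day of each other.
            int from = (int) ((long) i * n / numberOfMappers);
            int to = (int) ((long) (i + 1) * n / numberOfMappers);
            splits.add(new ArrayList<>(days.subList(from, to)));
        }
        return splits;
    }

    /** Key prefix a per-day RecordReader could scan for, e.g. "2014-09-01-" (assumed key layout). */
    static String keyPrefix(LocalDate day) {
        return day + "-";
    }

    public static void main(String[] args) {
        List<LocalDate> days = expandRange("2014-09-01;2014-09-20");
        List<List<LocalDate>> splits = split(days, 4);
        for (int i = 0; i < splits.size(); i++) {
            List<LocalDate> s = splits.get(i);
            System.out.println("Input Split " + (i + 1) + ": " + s.get(0) + " .. " + s.get(s.size() - 1));
        }
    }
}
```

For the slide's arguments this yields four splits of five days each, from 2014-09-01 .. 2014-09-05 for Input Split 1 to 2014-09-16 .. 2014-09-20 for Input Split 4. Splitting by day count only balances the mappers if each day holds a similar number of records. If some days are much larger, their mappers will finish well after the others.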
