Rijksuniversiteit Groningen

Virtual Observations 2012

Map/Reduce & Hadoop

2012/10/25 Hugo Buddelmeijer

Outline

• Why Map/Reduce or Hadoop?

• The Map/Reduce paradigm

• The Hadoop implementation

• Map/Reduce & Astronomy

• Hadoop @ RUG

Why use Map/Reduce

• Best suited to problems that

– require large data sets

– have highly parallelisable algorithms

– produce (relatively) small results

Why learn Map/Reduce

• Declarative: specify what, not how

• Let the system optimize the how

E.g. the input is ‘delivered’ to the program, not ‘retrieved’ by it.

Alan Perlis’ Epigram 19 (1982):

“A language that doesn't affect the way you think about programming, is not worth knowing.”

The Map/Reduce Paradigm

• Mapper: (k1, v1) -> list(k2, v2)

• Reducer: (k2, list(v2)) -> list(k3, v3)
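To make the two signatures concrete, here is a minimal single-process sketch in Python that runs a mapper and a reducer with exactly these types. The driver name `run_mapreduce` and the exposure-time example are hypothetical illustrations, not part of Hadoop. On a real cluster the map calls, the shuffle and the reduce calls are spread over many nodes.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

# Mapper:  (k1, v1)       -> list(k2, v2)
# Reducer: (k2, list(v2)) -> list(k3, v3)
Mapper = Callable[[Any, Any], Iterable[tuple[Any, Any]]]
Reducer = Callable[[Any, list], Iterable[tuple[Any, Any]]]


def run_mapreduce(mapper: Mapper, reducer: Reducer,
                  records: Iterable[tuple[Any, Any]]) -> list[tuple[Any, Any]]:
    """Single-process simulation of one Map/Reduce job (hypothetical helper)."""
    # Map: the framework delivers each (k1, v1) record to the mapper.
    groups: dict[Any, list] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            # Shuffle: gather all intermediate values that share the key k2.
            groups[k2].append(v2)
    # Reduce: one call per distinct k2, in sorted key order
    # (so the intermediate keys must be mutually comparable).
    output: list[tuple[Any, Any]] = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


# Hypothetical example: total exposure time per filter band,
# with records of the form (frame id, (band, exposure in seconds)).
frames = [("f1", ("r", 60.0)), ("f2", ("g", 30.0)), ("f3", ("r", 90.0))]
total_exposure = run_mapreduce(
    lambda frame_id, v: [(v[0], v[1])],        # emit (band, exposure)
    lambda band, times: [(band, sum(times))],  # sum the exposures per band
    frames,
)
print(total_exposure)  # [('g', 30.0), ('r', 150.0)]
```

Note that the mapper never opens a file: the driver hands it one record at a time, which is the "delivered, not retrieved" idea from earlier.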

‘Hello World’: Word Count
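A word count in the style of Hadoop Streaming, sketched in Python under stated assumptions. The mapper emits one `word<TAB>1` line per word (lower-casing is a choice of this sketch). The reducer relies on its input being sorted by key, which Hadoop guarantees between the two phases. The function names are hypothetical. In an actual Streaming job each function would live in its own script that reads `sys.stdin` and prints to `sys.stdout`, and Hadoop would do the sorting.

```python
from itertools import groupby
from typing import Iterable, Iterator


def wc_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated, lower-cased word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def wc_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # groupby only merges *adjacent* equal keys, hence the sorting requirement.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for the `map | sort | reduce` pipeline. Sorting whole lines
# keeps equal words adjacent because the tab sorts before printable characters.
result = list(wc_reducer(sorted(wc_mapper(["Hello World", "hello Hadoop"]))))
print(result)  # ['hadoop\t1', 'hello\t2', 'world\t1']
```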

A Change of Thought

• Separate data handling from processing

• No side effects in algorithms

• No POSIX! (e.g. no seeking in files)
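Side effects matter because Hadoop may execute the same map task more than once, for example when it retries after a node failure or runs a speculative duplicate of a slow task. The hypothetical sketch below runs one task twice. Only the mapper whose output depends solely on its input gives the same answer both times. All names are illustrative.

```python
from typing import Iterable

# Hypothetical state kept outside the mapper's inputs.
seen_total = 0


def impure_mapper(frame_id: str, n_stars: int) -> list[tuple[str, int]]:
    """Side effect: emits a running total that depends on earlier calls."""
    global seen_total
    seen_total += n_stars
    return [("running_total", seen_total)]


def pure_mapper(frame_id: str, n_stars: int) -> list[tuple[str, int]]:
    """No side effects: the output depends only on the input record."""
    return [("stars", n_stars)]


def run_task(mapper, split: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Run one map task over an input split."""
    return [pair for key, value in split for pair in mapper(key, value)]


split = [("f1", 10), ("f2", 5)]
# Simulate the framework re-running the same task, e.g. after a failure.
first = run_task(impure_mapper, split)
second = run_task(impure_mapper, split)
print(first == second)  # False: the running totals keep growing across runs
print(run_task(pure_mapper, split) == run_task(pure_mapper, split))  # True
```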

The Hadoop Implementation

• Open source Map/Reduce implementation from Apache

• HDFS: Hadoop Distributed File System

– Bring processing to the data

• Much additional software, e.g.

– HBase: database built on HDFS, usable as M/R input and output
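“Bring processing to the data” means the scheduler prefers to run a map task on a node that already stores a replica of that task's HDFS block. The greedy Python sketch below illustrates only that preference. The function name, the slot model and the node names are hypothetical, and Hadoop's actual scheduler is more involved (it also considers rack locality, for example).

```python
def assign_map_tasks(block_replicas: dict[str, list[str]],
                     free_slots: dict[str, int]) -> dict[str, tuple[str, bool]]:
    """Greedily place one map task per block; the flag marks data-local runs."""
    slots = dict(free_slots)  # work on a copy, leave the caller's dict intact
    plan: dict[str, tuple[str, bool]] = {}
    for block, replicas in sorted(block_replicas.items()):
        # Prefer a node that already holds a replica of this block.
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Otherwise use any node with capacity; it must read the block remotely.
            candidates = [n for n, free in sorted(slots.items()) if free > 0]
            if not candidates:
                raise RuntimeError("no free map slots")  # a real scheduler would wait
            node, is_local = candidates[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


# Hypothetical cluster: three blocks, four nodes with one free slot each except nodeC.
replicas = {"blk1": ["nodeA", "nodeB"], "blk2": ["nodeA", "nodeC"], "blk3": ["nodeA"]}
free = {"nodeA": 1, "nodeB": 1, "nodeC": 0, "nodeD": 1}
plan = assign_map_tasks(replicas, free)
print(plan)
```

Because the choice is greedy, the plan is not guaranteed to maximise the number of data-local tasks. It only shows why block placement in HDFS and task placement in M/R are designed together.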

Map/Reduce & Astronomy

• Wiley et al. 2011: Coaddition

– Map: select & align frames

– Reduce: stack frames

• Starr et al. 2012: Classifying Transients
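The coaddition decomposition can be sketched as a pure-Python toy. It is not a reproduction of Wiley et al.'s implementation. The mapper selects frames in one (hypothetical) band and aligns them with an integer cyclic pixel shift, which stands in for proper astrometric resampling. The reducer stacks all frames of a sky tile with a per-pixel mean. All field names (`band`, `tile`, `dx`, `pixels`) and the target band are assumptions.

```python
from collections import defaultdict
from statistics import fmean

TARGET_BAND = "r"  # hypothetical query parameter


def shift(pixels: list[list[float]], dx: int) -> list[list[float]]:
    """Toy 'alignment': cyclic shift of every row by dx pixels (|dx| < width)."""
    return [row[-dx:] + row[:-dx] for row in pixels]


def coadd_mapper(frame_id, frame):
    # Map: select frames in the requested band and align them to the tile grid.
    if frame["band"] == TARGET_BAND:
        yield frame["tile"], shift(frame["pixels"], frame["dx"])


def coadd_reducer(tile, aligned_frames):
    # Reduce: stack all aligned frames of one tile with a per-pixel mean.
    # zip(*frames) pairs up corresponding rows, zip(*rows) corresponding pixels.
    stacked = [[fmean(px) for px in zip(*rows)] for rows in zip(*aligned_frames)]
    yield tile, stacked


frames = {
    "f1": {"band": "r", "tile": "T1", "dx": 0, "pixels": [[1.0, 2.0, 3.0]]},
    "f2": {"band": "r", "tile": "T1", "dx": 1, "pixels": [[4.0, 5.0, 3.0]]},
    "f3": {"band": "g", "tile": "T1", "dx": 0, "pixels": [[9.0, 9.0, 9.0]]},
}

# Minimal local driver: map, group by tile (the shuffle), then reduce.
groups = defaultdict(list)
for frame_id, frame in frames.items():
    for tile, aligned in coadd_mapper(frame_id, frame):
        groups[tile].append(aligned)
coadds = dict(kv for tile, fs in groups.items() for kv in coadd_reducer(tile, fs))
print(coadds)  # {'T1': [[2.0, 3.0, 4.0]]}
```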

Future Astronomy

• Query large catalogs

• Perform full sky image processing

• Instrument calibration
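Querying a large catalog maps naturally onto the paradigm: a selection such as `WHERE mag < 20` goes in the mapper, and a `GROUP BY` aggregation goes in the reducer. The sketch below counts sources brighter than a magnitude limit per sky cell. The column layout `(ra, dec, mag)`, the limit and the 10-degree cell size are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Iterator

MAG_LIMIT = 20.0  # hypothetical magnitude cut
CELL_DEG = 10     # hypothetical cell size in degrees


def catalog_mapper(source_id: int,
                   row: tuple[float, float, float]) -> Iterator[tuple[tuple[int, int], int]]:
    ra, dec, mag = row
    if mag < MAG_LIMIT:  # selection, like WHERE mag < 20
        cell = (int(ra // CELL_DEG), int((dec + 90) // CELL_DEG))
        yield cell, 1    # grouping key, like GROUP BY cell


def catalog_reducer(cell, ones):
    yield cell, sum(ones)  # aggregation, like COUNT(*)


# Hypothetical catalog rows: source id -> (ra, dec, mag) in degrees / magnitudes.
catalog = {
    1: (15.0, 5.0, 18.2),
    2: (17.5, 8.0, 19.9),
    3: (200.0, -45.0, 21.3),
    4: (200.0, -45.0, 17.0),
}

groups = defaultdict(list)
for source_id, row in catalog.items():
    for cell, one in catalog_mapper(source_id, row):
        groups[cell].append(one)
cell_counts = dict(kv for cell, ones in groups.items() for kv in catalog_reducer(cell, ones))
print(cell_counts)  # {(1, 9): 2, (20, 4): 1}
```

Only the small per-cell table leaves the cluster, which matches the "relatively small results" profile described at the start.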

More Hadoop Concepts

• Sequence Files

• Java / Pipes (C++) / Streaming (any language via stdin/stdout)

• Pydoop: Python API (self-contained, with pyfits)

• Tadoop = Target (/Astro-WISE) + Hadoop
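Sequence Files are Hadoop's flat binary files of key/value records. A common use is packing many small files (such as individual FITS frames) into one large file, because HDFS is designed for large files. The toy format below, with hypothetical `pack`/`unpack` helpers and made-up file names, only illustrates the idea of length-prefixed key/value records read strictly in sequence. It is not the real SequenceFile layout, which adds a header, sync markers and optional compression.

```python
import struct
from typing import Iterable, Iterator

LEN = struct.Struct(">I")  # 4-byte big-endian unsigned length prefix


def pack(records: Iterable[tuple[str, bytes]]) -> bytes:
    """Concatenate (key, value) records as <klen><key><vlen><value>."""
    out = bytearray()
    for key, value in records:
        key_bytes = key.encode("utf-8")
        out += LEN.pack(len(key_bytes)) + key_bytes
        out += LEN.pack(len(value)) + value
    return bytes(out)


def unpack(blob: bytes) -> Iterator[tuple[str, bytes]]:
    """Stream the records back in order; no seeking is needed."""
    pos = 0
    while pos < len(blob):
        (key_len,) = LEN.unpack_from(blob, pos)
        pos += LEN.size
        key = blob[pos:pos + key_len].decode("utf-8")
        pos += key_len
        (value_len,) = LEN.unpack_from(blob, pos)
        pos += LEN.size
        value = blob[pos:pos + value_len]
        pos += value_len
        yield key, value


# Hypothetical small files packed into one container.
small_files = [("frame_001.fits", b"SIMPLE  =  T"), ("frame_002.fits", b"\x00\x01")]
blob = pack(small_files)
print(list(unpack(blob)) == small_files)  # True
```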

Hadoop @ RUG

• Experimental 6-node cluster @ CIT

• Maintained by Fokke Dijkstra & Bob Droge

• Cloudera CDH3u4 installation

• More information:

– Cloudera.com

– Tom White, Hadoop: The Definitive Guide, O’Reilly