Map/Reduce & Hadoop - Rijksuniversiteit Groningen
Virtual Observations 2012
Map/Reduce & Hadoop
2012/10/25 Hugo Buddelmeijer
Outline
• Why Map/Reduce or Hadoop?
• The Map/Reduce paradigm
• The Hadoop implementation
• Map/Reduce & Astronomy
• Hadoop @ RUG
Why use Map/Reduce
• Suited to problems that:
  – require large data sets
  – have highly parallelisable algorithms
  – result in (relatively) small solutions
Why learn Map/Reduce
• Declarative: specify what, not how
• Let the system optimize the how
e.g. input is ‘delivered’ to the program, not ‘retrieved’ by it.
Alan Perlis’ Epigram 19 (1982):
“A language that doesn't affect the way you think about programming, is not worth knowing.”
The Map/Reduce Paradigm
• Mapper: (k1, v1) -> list(k2, v2)
• Reducer: (k2, list(v2)) -> list(k3, v3)
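The two signatures above can be made concrete with a small, single-process Python model. It is only a sketch under stated assumptions: `run_mapreduce` is a hypothetical helper, not Hadoop's API, and it ignores distribution, partitioning across reducers and fault tolerance. The grouping step in the middle stands in for Hadoop's shuffle and sort.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(
    records: Iterable[Pair],
    mapper: Callable[[Any, Any], Iterable[Pair]],
    reducer: Callable[[Any, List[Any]], Iterable[Pair]],
) -> List[Pair]:
    """Sequential, single-process model of one Map/Reduce job (hypothetical helper)."""
    # Map: each (k1, v1) input pair yields zero or more (k2, v2) pairs.
    # Shuffle: values are grouped by their intermediate key k2.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    # Reduce: each (k2, list(v2)) group yields zero or more (k3, v3) pairs.
    # Keys are visited in sorted order, so the k2 values must be mutually comparable.
    output: List[Pair] = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


# Usage: maximum reading per station; the input keys (record numbers) are ignored.
readings = [(0, ("a", 3)), (1, ("b", 7)), (2, ("a", 5))]
print(run_mapreduce(readings, lambda k, v: [v], lambda k, vs: [(k, max(vs))]))
# [('a', 5), ('b', 7)]
```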
‘Hello World’: Word Count
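The word-count figure from this slide did not survive the transcript, so the following is a hypothetical Python sketch of the usual version, not the code from the lecture. The mapper emits (word, 1) for every word and the reducer sums the ones per word. Lower-casing the words and using the line number instead of Hadoop's byte offset as the input key are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def wc_map(line_no: int, line: str) -> Iterator[Tuple[str, int]]:
    # Key: position of the line (unused); value: the text of the line.
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    # All the 1s emitted for one word arrive together; their sum is the count.
    yield word, sum(counts)


def word_count(lines: List[str]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line_no, line in enumerate(lines):        # map phase
        for word, one in wc_map(line_no, line):
            grouped[word].append(one)             # shuffle: group by word
    result: Dict[str, int] = {}
    for word in sorted(grouped):                  # reduce phase, keys in sorted order
        for key, total in wc_reduce(word, grouped[word]):
            result[key] = total
    return result


print(word_count(["Hello World", "hello Hadoop"]))
# {'hadoop': 1, 'hello': 2, 'world': 1}
```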
A Change of Thought
• Separate data handling from processing
• No side effects in algorithms
• No POSIX-style file access (e.g. no seeking in files)
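Because a side-effect-free mapper depends only on the record it is given, the framework is free to split the input and process the pieces in any order (or re-run a failed piece) without changing the result. A toy Python illustration, assuming hypothetical `mapper` and `run_job` helpers and using word counting as the job:

```python
import random
from collections import Counter
from typing import List


def mapper(line: str) -> Counter:
    # Pure: the result depends only on the record passed in (no globals, no file I/O).
    return Counter(line.split())


def run_job(lines: List[str], n_splits: int, seed: int) -> Counter:
    # The "framework", not the mapper, decides how the input is split
    # and in which order the splits are processed.
    splits = [lines[i::n_splits] for i in range(n_splits)]
    random.Random(seed).shuffle(splits)
    total: Counter = Counter()
    for split in splits:
        for line in split:
            total += mapper(line)  # merge partial results (the reduce step)
    return total


lines = ["a b", "b c", "c a", "a a"]
print(run_job(lines, 1, 0) == run_job(lines, 3, 42))  # True
```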
The Hadoop Implementation
• Open source Map/Reduce implementation from Apache
• HDFS: Hadoop Distributed File System
  – Bring processing to the data
• Many additional projects, e.g.
  – HBase: database built on HDFS, usable from M/R jobs
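"Bring processing to the data" means map tasks are preferably scheduled on nodes that already store a replica of their HDFS block, so the block need not travel over the network. The sketch below is a deliberately simplified, hypothetical greedy model of that idea. It is not Hadoop's actual scheduler, and the names `assign_tasks`, `block_locations` and `slots_per_node` are illustrative only.

```python
from typing import Dict, List


def assign_tasks(
    block_locations: Dict[str, List[str]],
    nodes: List[str],
    slots_per_node: int = 1,
) -> Dict[str, str]:
    """Toy locality-aware placement: one map task per block, greedy (hypothetical)."""
    free = {node: slots_per_node for node in nodes}
    assignment: Dict[str, str] = {}
    remote: List[str] = []
    # Blocks with the fewest replicas have the fewest local options: place them first.
    for block, replicas in sorted(block_locations.items(), key=lambda kv: len(kv[1])):
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node = local[0]
            free[node] -= 1
            assignment[block] = node  # data-local: task runs where the block is stored
        else:
            remote.append(block)
    for block in remote:
        # No free replica holder: run on the node with most free slots, read over network.
        node = max(free, key=free.get)
        free[node] -= 1               # may drop below zero, i.e. the task queues there
        assignment[block] = node
    return assignment


blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"]}
print(assign_tasks(blocks, ["n1", "n2", "n3"]))
# {'b2': 'n1', 'b1': 'n2', 'b3': 'n3'}
```

Here b1 and b2 run data-local, while b3 has to fetch its block because its only replica holder is already busy.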
Map/Reduce & Astronomy
• Wiley et al. 2011: Coaddition
  – Map: select & align frames
  – Reduce: stack frames
• Starr et al. 2012: Classifying Transients
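The coaddition split on this slide (map: select and align, reduce: stack) can be illustrated with a one-dimensional toy in Python. This is not Wiley et al.'s implementation: frames are assumed to be plain lists of pixel values with an integer sky offset, alignment is a pure shift, pixels are keyed individually, and stacking is an unweighted mean.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Frame = Tuple[int, List[float]]  # hypothetical: (sky position of first pixel, pixel values)


def coadd_map(frame: Frame, lo: int, hi: int) -> Iterator[Tuple[int, float]]:
    start, pixels = frame
    for i, value in enumerate(pixels):
        sky = start + i            # align: frame pixel -> common sky grid
        if lo <= sky < hi:         # select: keep only pixels inside the query region
            yield sky, value


def coadd_reduce(sky: int, values: List[float]) -> Tuple[int, float]:
    return sky, sum(values) / len(values)  # stack: mean of all overlapping exposures


def coadd(frames: List[Frame], lo: int, hi: int) -> Dict[int, float]:
    grouped: Dict[int, List[float]] = defaultdict(list)
    for frame in frames:
        for sky, value in coadd_map(frame, lo, hi):
            grouped[sky].append(value)
    return dict(coadd_reduce(s, grouped[s]) for s in sorted(grouped))


frames = [(0, [1.0, 2.0, 3.0]), (1, [4.0, 6.0]), (10, [9.0])]
print(coadd(frames, 1, 3))
# {1: 3.0, 2: 4.5}
```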
Future Astronomy
• Query large catalogs
• Perform full sky image processing
• Instrument calibration
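A catalog query fits the same mould: the map step selects rows and emits a key, and the reduce step aggregates per key. The sketch below builds a magnitude histogram for a rectangular sky region. The row layout `(ra_deg, dec_deg, mag)`, the box selection and the integer binning are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[float, float, float]  # hypothetical row layout: (ra_deg, dec_deg, mag)
Box = Tuple[float, float]         # half-open interval [low, high) in degrees


def query_map(row: Row, ra_box: Box, dec_box: Box) -> Iterator[Tuple[int, int]]:
    ra, dec, mag = row
    # Select: keep sources inside the box (RA wrap-around at 0/360 deg is not handled).
    if ra_box[0] <= ra < ra_box[1] and dec_box[0] <= dec < dec_box[1]:
        yield math.floor(mag), 1  # key: integer magnitude bin


def query_reduce(mag_bin: int, ones: List[int]) -> Tuple[int, int]:
    return mag_bin, sum(ones)     # number of selected sources in this bin


def magnitude_histogram(rows: List[Row], ra_box: Box, dec_box: Box) -> Dict[int, int]:
    grouped: Dict[int, List[int]] = defaultdict(list)
    for row in rows:
        for mag_bin, one in query_map(row, ra_box, dec_box):
            grouped[mag_bin].append(one)
    return dict(query_reduce(b, grouped[b]) for b in sorted(grouped))


catalog = [(10.0, 5.0, 18.3), (10.5, 5.2, 18.9), (11.0, 5.1, 20.1), (200.0, -30.0, 18.5)]
print(magnitude_histogram(catalog, (10.0, 12.0), (5.0, 6.0)))
# {18: 2, 20: 1}
```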
More Hadoop Concepts
• Sequence Files
• Java / Pipes / Streaming
• Pydoop (self-contained, with PyFITS)
• Tadoop = Target (/Astro-WISE) + Hadoop
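With Hadoop Streaming, mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output; the reducer sees its input sorted by key, one pair per line. A minimal word-count sketch in Python under those conventions (the script name `wc_streaming.py` and its `map`/`reduce` argument are hypothetical):

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def streaming_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Streaming mappers read raw text lines and write "key<TAB>value" lines.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def streaming_reducer(lines: Iterable[str]) -> Iterator[str]:
    # Reducer input is sorted by key but arrives one line per pair rather than
    # as (key, list(values)); the script must detect key boundaries itself.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: `python wc_streaming.py map` or `python wc_streaming.py reduce`.
    stage = streaming_mapper if sys.argv[1] == "map" else streaming_reducer
    for out in stage(sys.stdin):
        print(out)
```

Without a cluster, the job can be approximated locally with a pipe in which `sort` stands in for the shuffle: `cat input.txt | python wc_streaming.py map | sort | python wc_streaming.py reduce`.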
Hadoop @ RUG
• Experimental 6-node cluster @ CIT
• Maintained by Fokke Dijkstra & Bob Droge
• Cloudera CDH3u4 installation
• More information:
  – Cloudera.com
  – Tom White, Hadoop: The Definitive Guide (O’Reilly)