Hadoop Overview & Architecture


Page 1: Hadoop Overview & Architecture

Hadoop Overview & Architecture

Milind Bhandarkar, Chief Scientist, Machine Learning Platforms, Greenplum, A Division of EMC (Twitter: @techmilind)

Page 2: Hadoop Overview & Architecture

About Me

• http://www.linkedin.com/in/milindb

•  Founding member of Hadoop team at Yahoo! [2005-2010]

•  Contributor to Apache Hadoop since v0.1

•  Built and led Grid Solutions Team at Yahoo! [2007-2010]

•  Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)

•  Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and EMC-Greenplum

Page 3: Hadoop Overview & Architecture


Agenda

• Motivation

• Hadoop

• Map-Reduce

• Distributed File System

• Hadoop Architecture

• Next Generation MapReduce

• Q & A

Page 4: Hadoop Overview & Architecture


Hadoop At Scale (Some Statistics)

• 40,000+ machines in 20+ clusters

• Largest cluster is 4,000 machines

• 170 Petabytes of storage

• 1000+ users

• 1,000,000+ jobs/month

Page 5: Hadoop Overview & Architecture

EVERY CLICK BEHIND

Page 6: Hadoop Overview & Architecture

Hadoop Workflow

Page 7: Hadoop Overview & Architecture

Who Uses Hadoop?

Page 8: Hadoop Overview & Architecture


Why Hadoop?

Page 9: Hadoop Overview & Architecture

Big Datasets (Data-Rich Computing theme proposal, J. Campbell et al., 2007)

Page 10: Hadoop Overview & Architecture

Cost Per Gigabyte (http://www.mkomo.com/cost-per-gigabyte)

Page 11: Hadoop Overview & Architecture

Storage Trends (Graph by Adam Leventhal, ACM Queue, Dec 2009)

Page 12: Hadoop Overview & Architecture


Motivating Examples

Page 13: Hadoop Overview & Architecture

Yahoo! Search Assist

Page 14: Hadoop Overview & Architecture


Search Assist

• Insight: Related concepts appear close together in a text corpus

• Input: Web pages

• 1 Billion Pages, 10K bytes each

• 10 TB of input data

• Output: List(word, List(related words))

Page 15: Hadoop Overview & Architecture


// Input: List(URL, Text)
foreach URL in Input :
    Tokens = Tokenize(Text(URL));
    foreach word in Tokens :
        Insert (word, Next(word, Tokens)) in Pairs;
        Insert (word, Previous(word, Tokens)) in Pairs;
// Result: Pairs = List (word, RelatedWord)

Group Pairs by word;
// Result: List (word, List(RelatedWords))

foreach word in Pairs :
    Count RelatedWords in GroupedPairs;
// Result: List (word, List(RelatedWords, count))

foreach word in CountedPairs :
    Sort Pairs(word, *) descending by count;
    choose Top 5 Pairs;
// Result: List (word, Top5(RelatedWords))

Search Assist

Page 16: Hadoop Overview & Architecture

People You May Know

Page 17: Hadoop Overview & Architecture


People You May Know

• Insight: You might also know Joe Smith if a lot of folks you know, know Joe Smith

• ... if you don’t know Joe Smith already

• Numbers:

• 100 MM users

• Average connections per user is 100

Page 18: Hadoop Overview & Architecture


// Input: List(UserName, List(Connections))

foreach u in UserList :                  // 100 MM
    foreach x in Connections(u) :        // 100
        foreach y in Connections(x) :    // 100
            if (y not in Connections(u)) :
                Count(u, y)++;           // 1 Trillion Iterations
    Sort (u,y) in descending order of Count(u,y);
    Choose Top 3 y;
    Store (u, {y0, y1, y2}) for serving;

People You May Know

Page 19: Hadoop Overview & Architecture


Performance

• 101 Random accesses for each user

• Assume 1 ms per random access

• 100 ms per user

• 100 MM users

• 100 days on a single machine
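
(Checking the arithmetic: 100 ms × 10^8 users = 10^10 ms = 10^7 seconds, about 116 days, hence "100 days" as a round figure.)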

Page 20: Hadoop Overview & Architecture


MapReduce Paradigm

Page 21: Hadoop Overview & Architecture


Map & Reduce

• Primitives in Lisp (and other functional languages) since the 1970s

• Google Paper 2004

• http://labs.google.com/papers/mapreduce.html

Page 22: Hadoop Overview & Architecture


Output_List = Map (Input_List)

Square (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) = (1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

Map

Page 23: Hadoop Overview & Architecture


Output_Element = Reduce (Input_List)

Sum (1, 4, 9, 16, 25, 36, 49, 64, 81, 100) = 385

Reduce
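
In code, Map and Reduce correspond directly to the familiar map and fold operations. A minimal Java sketch of the Square and Sum examples above (the class name is illustrative):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReducePrimitives {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Map: Square is applied to each element independently
        List<Integer> squares = input.stream()
                .map(x -> x * x)
                .collect(Collectors.toList());

        // Reduce: fold the mapped list into a single element with Sum
        int sum = squares.stream().reduce(0, Integer::sum);

        System.out.println(squares); // [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
        System.out.println(sum);     // 385
    }
}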

Page 24: Hadoop Overview & Architecture


Parallelism

• Map is inherently parallel

• Each list element processed independently

• Reduce is inherently sequential

• Unless processing multiple lists

• Grouping to produce multiple lists

Page 25: Hadoop Overview & Architecture


// Input: http://hadoop.apache.org
Pairs = Tokenize_And_Pair ( Text ( Input ) )

Search Assist Map

Page 26: Hadoop Overview & Architecture


// Input: GroupedList (word, GroupedList(words))
CountedPairs = CountOccurrences (word, RelatedWords)

Output = {
    (hadoop, apache, 7)
    (hadoop, DFS, 3)
    (hadoop, streaming, 4)
    (hadoop, mapreduce, 9)
    ...
}

Search Assist Reduce

Page 27: Hadoop Overview & Architecture


Issues with Large Data

• Map Parallelism: Chunking input data

• Reduce Parallelism: Grouping related data

• Dealing with failures & load imbalance

Page 28: Hadoop Overview & Architecture
Page 29: Hadoop Overview & Architecture


Apache Hadoop

• January 2006: Subproject of Lucene

• January 2008: Top-level Apache project

• Stable Version: 1.0.3

• Latest Version: 2.0.0 (Alpha)

Page 30: Hadoop Overview & Architecture


Apache Hadoop

• Reliable, Performant Distributed file system

• MapReduce Programming framework

• Ecosystem: HBase, Hive, Pig, Howl, Oozie, Zookeeper, Chukwa, Mahout, Cascading, Scribe, Cassandra, Hypertable, Voldemort, Azkaban, Sqoop, Flume, Avro ...

Page 31: Hadoop Overview & Architecture


Problem: Bandwidth to Data

• Scan 100TB Datasets on 1000 node cluster

• Remote storage @ 10MB/s = 165 mins

• Local storage @ 50-200MB/s = 33-8 mins

• Moving computation is more efficient than moving data

• Need visibility into data placement
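
(The arithmetic behind the first two bullets: 100 TB across 1,000 nodes is 100 GB per node; at 10 MB/s that is 10,000 s ≈ 165 minutes, while at 50-200 MB/s it takes 2,000 down to 500 s, i.e., 33 down to 8 minutes.)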

Page 32: Hadoop Overview & Architecture


Problem: Scaling Reliably

• Failure is not an option, it’s a rule!

• 1000 nodes, MTBF < 1 day

• 4000 disks, 8000 cores, 25 switches, 1000 NICs, 2000 DIMMS (16TB RAM)

• Need fault tolerant store with reasonable availability guarantees

• Handle hardware faults transparently

Page 33: Hadoop Overview & Architecture


Hadoop Goals

• Scalable: Petabytes (10^15 bytes) of data on thousands of nodes

• Economical: Commodity components only

• Reliable

• Engineering reliability into every application is expensive

Page 34: Hadoop Overview & Architecture


Hadoop MapReduce

Page 35: Hadoop Overview & Architecture


Think MapReduce

• Record = (Key, Value)

• Key : Comparable, Serializable

• Value: Serializable

• Input, Map, Shuffle, Reduce, Output

Page 36: Hadoop Overview & Architecture


cat /var/log/auth.log* | \
    grep "session opened" | \
    cut -d' ' -f10 | \
    sort | \
    uniq -c > ~/userlist

Seems Familiar?

Page 37: Hadoop Overview & Architecture


Map

• Input: (Key1, Value1)

• Output: List(Key2, Value2)

• Projections, Filtering, Transformation
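
As a concrete illustration of filtering and projection, here is a small hypothetical mapper in the same old-style org.apache.hadoop.mapred API as the WordCount code later in this deck; nothing in it comes from the slides themselves:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// (Key1, Value1) = (byte offset, log line); emits (line, 1) only for
// lines containing "ERROR" (filtering), dropping the offset (projection).
public class ErrorFilterMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    if (value.toString().contains("ERROR")) {
      output.collect(value, ONE);
    }
  }
}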

Page 38: Hadoop Overview & Architecture


Shuffle

• Input: List(Key2, Value2)

• Output

• Sort(Partition(List(Key2, List(Value2))))

• Provided by Hadoop
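
The Partition step is pluggable: the default is hash partitioning, but a job can substitute its own Partitioner (registered via conf.setPartitionerClass). An illustrative sketch, assuming non-empty keys:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Routes all keys sharing a first character to the same reduce, so each
// reducer receives a contiguous alphabetic slice of the key space.
public class FirstCharPartitioner implements Partitioner<Text, IntWritable> {

  public void configure(JobConf job) {}

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    return key.toString().charAt(0) % numPartitions;
  }
}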

Page 39: Hadoop Overview & Architecture


Reduce

• Input: List(Key2, List(Value2))

• Output: List(Key3, Value3)

• Aggregation

Page 40: Hadoop Overview & Architecture


Hadoop Streaming

• Hadoop is written in Java

• Java MapReduce code is “native”

• What about non-Java programmers?

• Perl, Python, Shell, R

• grep, sed, awk, uniq as Mappers/Reducers

• Text Input and Output

Page 41: Hadoop Overview & Architecture


Hadoop Streaming

• Thin Java wrapper for Map & Reduce Tasks

• Forks actual Mapper & Reducer

• IPC via stdin, stdout, stderr

• Key.toString() \t Value.toString() \n

• Slower than Java programs

• Allows for quick prototyping / debugging

Page 42: Hadoop Overview & Architecture


$ bin/hadoop jar hadoop-streaming.jar \
    -input in-files -output out-dir \
    -mapper mapper.sh -reducer reducer.sh

# mapper.sh
sed -e 's/ /\n/g' | grep .

# reducer.sh
uniq -c | awk '{print $2 "\t" $1}'

Hadoop Streaming

Page 43: Hadoop Overview & Architecture


Hadoop Distributed File System (HDFS)

Page 44: Hadoop Overview & Architecture


HDFS

• Data is organized into files and directories

• Files are divided into uniform sized blocks (default 128MB) and distributed across cluster nodes

• HDFS exposes block placement so that computation can be migrated to data
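
That visibility is available through the FileSystem API; a sketch that prints where each block of a given file lives (this is the same metadata the scheduler consults to place map tasks near their data):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    // One entry per block, each listing the DataNodes holding a replica
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset() + " -> "
          + Arrays.toString(block.getHosts()));
    }
  }
}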

Page 45: Hadoop Overview & Architecture


HDFS

• Blocks are replicated (default 3) to handle hardware failure

• Replication for performance and fault tolerance (Rack-Aware placement)

• HDFS keeps checksums of data for corruption detection and recovery
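
The replication factor is also settable per file at runtime, with the NameNode re-replicating in the background; a sketch via the FileSystem API (path and factor are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Ask for 10 replicas of a hot, frequently-read side file
    fs.setReplication(new Path("/data/dictionary.txt"), (short) 10);
  }
}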

Page 46: Hadoop Overview & Architecture


HDFS

• Master-Worker Architecture

• Single NameNode

• Many (Thousands) DataNodes

Page 47: Hadoop Overview & Architecture


HDFS Master (NameNode)

• Manages filesystem namespace

• File metadata (i.e. “inode”)

• Mapping inode to list of blocks + locations

• Authorization & Authentication

• Checkpoint & journal namespace changes

Page 48: Hadoop Overview & Architecture


Namenode

• Mapping of datanode to list of blocks

• Monitor datanode health

• Replicate missing blocks

• Keeps ALL namespace in memory

• 60M objects (File/Block) in 16GB
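
(A rough division: 16 GB / 60 M ≈ 270 bytes of NameNode heap per namespace object, which is why object count, not just raw bytes, bounds cluster size.)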

Page 49: Hadoop Overview & Architecture


Datanodes

• Handle block storage on multiple volumes & block integrity

• Clients access the blocks directly from data nodes

• Periodically send heartbeats and block reports to Namenode

• Blocks are stored as regular files in the underlying OS’s filesystem

Page 50: Hadoop Overview & Architecture

HDFS Architecture

Page 51: Hadoop Overview & Architecture

Example: Unigrams

• Input: Huge text corpus

• Wikipedia Articles (40GB uncompressed)

• Output: List of words sorted in descending order of frequency

Page 52: Hadoop Overview & Architecture

$ cat ~/wikipedia.txt | \
    sed -e 's/ /\n/g' | grep . | \
    sort | \
    uniq -c > ~/frequencies.txt

$ cat ~/frequencies.txt | \
    sort -n -k1,1 -r | \
    cat > ~/unigrams.txt

Unigrams

Page 53: Hadoop Overview & Architecture

mapper (filename, file-contents):
    for each word in file-contents:
        emit (word, 1)

reducer (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)

MR for Unigrams

Page 54: Hadoop Overview & Architecture

mapper (word, frequency):
    emit (frequency, word)

reducer (frequency, words):
    for each word in words:
        emit (word, frequency)

MR for Unigrams

Page 55: Hadoop Overview & Architecture

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      Text word = new Text(itr.nextToken());
      output.collect(word, new IntWritable(1));
    }
  }
}

Unigrams: Java Mapper

Page 56: Hadoop Overview & Architecture

public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

Unigrams: Java Reducer

Page 57: Hadoop Overview & Architecture

public void run(String inputPath, String outputPath) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");
  conf.setMapperClass(MapClass.class);
  conf.setReducerClass(Reduce.class);

  // Declare the output types explicitly: they default to
  // LongWritable/Text, which would not match this reducer
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));
  JobClient.runJob(conf);
}

Unigrams: Driver

Page 58: Hadoop Overview & Architecture

Configuration

• Unified mechanism for:

•  Configuring Daemons

•  Runtime environment for Jobs/Tasks

•  Defaults: *-default.xml

•  Site-Specific: *-site.xml

•  final parameters

Page 59: Hadoop Overview & Architecture

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>head.server.node.com:9001</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://head.server.node.com:9000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <final>true</final>
  </property>
  ....
</configuration>

Example
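
A sketch of how these values surface in job code (property names are the ones from the XML above):

import org.apache.hadoop.mapred.JobConf;

public class ConfDemo {
  public static void main(String[] args) {
    // Loads *-default.xml, then overlays *-site.xml from the classpath
    JobConf conf = new JobConf();

    System.out.println(conf.get("mapred.job.tracker"));
    // head.server.node.com:9001

    // A job may set its own value, but when the cluster re-reads the
    // job's configuration over its site files, a property the site
    // marked <final> keeps the site's value
    conf.set("mapred.child.java.opts", "-Xmx1g");
  }
}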

Page 60: Hadoop Overview & Architecture

Running Hadoop Jobs

Page 61: Hadoop Overview & Architecture

[milindb@gateway ~]$ hadoop jar \
    $HADOOP_HOME/hadoop-examples.jar wordcount \
    /data/newsarchive/20080923 /tmp/newsout
input.FileInputFormat: Total input paths to process : 4
mapred.JobClient: Running job: job_200904270516_5709
mapred.JobClient:  map 0% reduce 0%
mapred.JobClient:  map 3% reduce 0%
mapred.JobClient:  map 7% reduce 0%
....
mapred.JobClient:  map 100% reduce 21%
mapred.JobClient:  map 100% reduce 31%
mapred.JobClient:  map 100% reduce 33%
mapred.JobClient:  map 100% reduce 66%
mapred.JobClient:  map 100% reduce 100%
mapred.JobClient: Job complete: job_200904270516_5709

Running a Job

Page 62: Hadoop Overview & Architecture

mapred.JobClient: Counters: 18
mapred.JobClient:   Job Counters
mapred.JobClient:     Launched reduce tasks=1
mapred.JobClient:     Rack-local map tasks=10
mapred.JobClient:     Launched map tasks=25
mapred.JobClient:     Data-local map tasks=1
mapred.JobClient:   FileSystemCounters
mapred.JobClient:     FILE_BYTES_READ=491145085
mapred.JobClient:     HDFS_BYTES_READ=3068106537
mapred.JobClient:     FILE_BYTES_WRITTEN=724733409
mapred.JobClient:     HDFS_BYTES_WRITTEN=377464307

Running a Job

Page 63: Hadoop Overview & Architecture

mapred.JobClient:   Map-Reduce Framework
mapred.JobClient:     Combine output records=73828180
mapred.JobClient:     Map input records=36079096
mapred.JobClient:     Reduce shuffle bytes=233587524
mapred.JobClient:     Spilled Records=78177976
mapred.JobClient:     Map output bytes=4278663275
mapred.JobClient:     Combine input records=371084796
mapred.JobClient:     Map output records=313041519
mapred.JobClient:     Reduce input records=15784903

Running a Job

Page 64: Hadoop Overview & Architecture


JobTracker WebUI

Page 65: Hadoop Overview & Architecture

JobTracker Status

Page 66: Hadoop Overview & Architecture

Jobs Status

Page 67: Hadoop Overview & Architecture

Job Details

Page 68: Hadoop Overview & Architecture

Job Counters

Page 69: Hadoop Overview & Architecture

Job Progress

Page 70: Hadoop Overview & Architecture

All Tasks

Page 71: Hadoop Overview & Architecture

Task Details

Page 72: Hadoop Overview & Architecture

Task Counters

Page 73: Hadoop Overview & Architecture

Task Logs

Page 74: Hadoop Overview & Architecture

MapReduce Dataflow

Page 75: Hadoop Overview & Architecture

MapReduce

Page 76: Hadoop Overview & Architecture

Job Submission

Page 77: Hadoop Overview & Architecture

Initialization

Page 78: Hadoop Overview & Architecture

Scheduling

Page 79: Hadoop Overview & Architecture

Execution

Page 80: Hadoop Overview & Architecture

Map Task

Page 81: Hadoop Overview & Architecture

Sort Buffer

Page 82: Hadoop Overview & Architecture

Reduce Task

Page 83: Hadoop Overview & Architecture


Next Generation MapReduce

Page 84: Hadoop Overview & Architecture

MapReduce Today (Courtesy: Arun Murthy, Hortonworks)

Page 85: Hadoop Overview & Architecture


Why?

• Scalability Limitations today

• Maximum cluster size: 4000 nodes

• Maximum Concurrent tasks: 40,000

• JobTracker SPOF (single point of failure)

• Fixed map and reduce containers (slots)

• Punishes pleasantly parallel apps

Page 86: Hadoop Overview & Architecture


Why? (contd.)

• MapReduce is not suitable for every application

• Fine-Grained Iterative applications

• HaLoop: Hadoop in a Loop

• Message passing applications

• Graph Processing

Page 87: Hadoop Overview & Architecture


Requirements

• Need scalable cluster resources manager

• Separate scheduling from resource management

• Multi-Lingual Communication Protocols

Page 88: Hadoop Overview & Architecture


Bottom Line

• @techmilind: #mrng (MapReduce, Next Gen) is, in reality, #rmng (Resource Manager, Next Gen)

• Expect different programming paradigms to be implemented

• Including MPI (soon)

Page 89: Hadoop Overview & Architecture

Architecture (Courtesy: Arun Murthy, Hortonworks)

Page 90: Hadoop Overview & Architecture


The New World

• Resource Manager

•  Allocates resources (containers) to applications

•  Node Manager

•  Manages containers on nodes

•  Application Master

• Specific to the paradigm, e.g., MapReduce application master, MPI application master, etc.

Page 91: Hadoop Overview & Architecture


Container

• In current terminology: A Task Slot

• Slice of the node’s hardware resources

• # of cores, virtual memory, disk size, disk and network bandwidth, etc.

• Currently, only memory usage is sliced
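
For a taste of the new API, a sketch of how an ApplicationMaster would describe a container against the 2.0-alpha YARN records (the API was still in flux at this point, so treat the names as indicative):

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class ContainerAsk {
  public static void main(String[] args) {
    // A container request is expressed purely as a resource capability;
    // in 2.0-alpha only the memory dimension is actually enforced
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024); // MB
    System.out.println(capability);
  }
}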