Hadoop 101: North East Wisconsin Code Camp
HADOOP 101
Cluster Computing Made Easy
Show of Hands
Big Data
Volume
Variety
Velocity
Common Types of Analysis
Text mining
Index building
Graph creation and analysis
Pattern recognition
Collaborative filtering
Prediction Models
Sentiment Analysis
Risk Assessment
Hadoop
Hadoop is a cluster storage and computing framework.
Changing of the Guard
“Scale out guarantees that hardware and software will fail”
“I don’t want to see any more 2001 papers about how awesome my IT team was because they could reshard my database on demand.”
Storage
[Diagram: copies of blocks A and B distributed across storage nodes]
Tunneling Through the Cost Barrier
Solutions
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.”
Cluster Computing Complexities
Process management
Communication
Data movement
Task coordination
Partial failures
Scheduling
Tracking
Robustness
Resilience
Performance
Simplicity
Where Do You Fit?
Input Split 1 -> Record Reader -> Mapper -> Partitioner
Input Split 2 -> Record Reader -> Mapper -> Partitioner
Input Split n -> Record Reader -> Mapper -> Partitioner
Shuffle and Sort
Reducer -> Output Format -> Output File
Reducer -> Output Format -> Output File
Where Do You Fit?
Input Split A -> Record Reader -> Mapper -> Partitioner
Input Split B -> Record Reader -> Mapper -> Partitioner
Shuffle and Sort
Reducer -> Output Format -> Output File
Reducer -> Output Format -> Output File
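The Partitioner step in the pipeline above decides which reducer receives each key. Hadoop's default partitioner hashes the key and takes the result modulo the number of reducers. The sketch below shows that idea in plain Python rather than the Hadoop API. The function name partition, the use of crc32 as the hash, and the choice of two reducers are assumptions made for illustration.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key (hypothetical helper, not a Hadoop API)."""
    # crc32 stands in for the key's hash code. It is stable across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Mapper output for the line "the cat sat on the mat".
pairs = [("the", 1), ("cat", 1), ("sat", 1), ("on", 1), ("the", 1), ("mat", 1)]

# Assume two reducers, one bucket per reducer.
buckets = {reducer: [] for reducer in range(2)}
for key, value in pairs:
    buckets[partition(key, 2)].append((key, value))
```

Because the reducer index depends only on the key, both ("the", 1) pairs land in the same bucket, which is what lets a single reducer see every value for a key.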
Mapper Purpose
Sanitize Data
Select Subsets
Convert
Input Split A -> Record Reader -> Mapper -> Partitioner
Mapper
Input:
Key
Value
Context
Output:
Key
Value
Mapper
Word Count Mapper
Input: (Long, Text)
Key: 0
Value: “the cat sat on the mat”
Output: (Text, Long)
Key Value
the 1
cat 1
sat 1
on 1
the 1
mat 1
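The word count mapper above can be sketched as a plain Python generator. This is a simulation of the mapper's contract, not Hadoop code: the name word_count_map is an assumption, and the Context object from the earlier slide is replaced by yield.

```python
def word_count_map(key: int, value: str):
    """Emit (word, 1) for every whitespace-separated word in the line.

    key is the record's position in the input and is ignored here.
    """
    for word in value.split():
        yield (word, 1)


output = list(word_count_map(0, "the cat sat on the mat"))
for word, count in output:
    print(word, count)
```

Note that the mapper does no counting of its own. It emits "the 1" twice and leaves the totals to the reducer.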
Reducer
Input:
Key
Values // This is an iterable
Context
Output:
Key
Value
Reducer
Input:
Key Values
cat 1
mat 1
on 1
sat 1
the 1, 1
Output:
Key Value
cat 1
mat 1
on 1
sat 1
the 2
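The reducer in this example receives each key once, together with an iterable of its values, and emits the sum. A minimal sketch in plain Python, assuming the grouped input has already been produced by shuffle and sort (the name word_count_reduce is hypothetical, and yield again stands in for Context):

```python
def word_count_reduce(key: str, values):
    """Emit (key, total) where total is the sum of the values for that key."""
    yield (key, sum(values))


# Grouped, key-sorted input as the reducer would see it.
grouped = [("cat", [1]), ("mat", [1]), ("on", [1]), ("sat", [1]), ("the", [1, 1])]

results = [
    pair
    for key, values in grouped
    for pair in word_count_reduce(key, values)
]
```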
Reducer
reduce(){
}
part-r-00001
Demo
MRUnit
Mapper
Reducer
Run the whole cycle
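The whole cycle (map, shuffle and sort, reduce) can be imitated on one machine in a few lines of Python. This is only a sketch of the data flow, not the demo code from the talk and not how Hadoop runs a job. All function names are assumptions, and the line index stands in for the input key.

```python
from itertools import groupby
from operator import itemgetter


def map_words(offset, line):
    # Mapper: one (word, 1) pair per word.
    for word in line.split():
        yield (word, 1)


def reduce_counts(word, counts):
    # Reducer: sum the iterable of counts for one word.
    yield (word, sum(counts))


def run_job(lines):
    # Map phase over every input record.
    mapped = [
        pair
        for offset, line in enumerate(lines)
        for pair in map_words(offset, line)
    ]
    # Shuffle and sort: order by key so equal keys are adjacent.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    results = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reduce_counts(word, (count for _, count in group)))
    return results


result = run_job(["the cat sat on the mat"])
```

In a real cluster the map calls run in parallel near the data, and the sort and grouping happen across machines. The user-written parts are still just the two small functions at the top.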
Platform
Bibliography
Rear Admiral Hopper http://www.youtube.com/watch?v=1-vcErOPofQ
Mike Olson talk http://web.archive.org/web/20130729201323id_/http://itc.conversationsnetwork.org/shows/detail4868.html
Large Scale C++ by John Lakos http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
Jim Argeropoulos
@exploremqt
https://github.com/exploremqt