The Hadoop Ecosystem & HBase
Kai Voigt, Cloudera Inc.
Warsaw Hadoop User Group, July 11th, 2012
Friday, July 13, 2012

A Hadoop Cluster

Part 1: Hadoop Ecosystem

Hadoop ecosystem diagram, built up one component per slide (interface in parentheses):

• HDFS (Java)
• MapReduce (Java)
• hadoop fs (CmdLine)
• FUSE (POSIX)
• Sqoop (RDBMS)
• Flume (Events)
• Streaming (Script)
• Hive (SQL)
• Pig (Script)
• Mahout (Java)
• HBase (Java)
• Oozie
• Whirr
• Hue

CDH 4.0

• Cloudera's Distribution Including Hadoop

• http://www.cloudera.com/

• Packages and Virtual Machines

• True Apache

Part 2: Apache HBase

Data Model

Columns: RowID, Col1, Col2, Col3, Col4, Col5

RowID 62: aaa bbb ccc
RowID 89: ddd eee 111
RowID 121: fff 222
RowID 219: ggg hhh
RowID 328: iii jjj kkk lll 333
RowID 342: mmm nnn

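Regions

Region 1 (columns RowID, Col1, Col2, Col3, Col4, Col5):
RowID 62: aaa bbb ccc
RowID 89: ddd eee 111
RowID 121: fff 222

Region 2 (columns RowID, Col1, Col2, Col3, Col4, Col5):
RowID 219: ggg hhh
RowID 328: iii jjj kkk lll 333
RowID 342: mmm nnn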
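
The split into regions can be pictured as a lookup over sorted start keys. The following is a minimal sketch, not HBase code: the boundary value 200 and the function name `region_for` are assumptions made for illustration (the slide only shows which rows end up in which region), and real HBase row keys are byte arrays compared lexicographically rather than integers.

```python
import bisect

# Hypothetical start keys: region 0 starts at 0, region 1 starts at 200.
# The value 200 is an assumption; the slide does not state the split point.
REGION_START_KEYS = [0, 200]

def region_for(row_id: int) -> int:
    """Return the index of the region whose key range contains row_id."""
    # bisect_right counts the start keys <= row_id; the last of those owns the row.
    return bisect.bisect_right(REGION_START_KEYS, row_id) - 1

rows = [62, 89, 121, 219, 328, 342]  # RowIDs from the slide
assignment = {}
for row_id in rows:
    assignment.setdefault(region_for(row_id), []).append(row_id)

print(assignment)
```

Because the rows are kept sorted by key, each region covers one contiguous key range, which is why a binary search over the start keys is enough to find the owner.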

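Column Families

Region 1, columns Col1 and Col2:
RowID 62: aaa bbb
RowID 89: ddd
RowID 121: (no values)

Region 1, columns Col3, Col4 and Col5:
RowID 62: ccc
RowID 89: eee 111
RowID 121: fff 222

Region 2, columns Col1 and Col2:
RowID 219: ggg
RowID 328: iii jjj
RowID 342: mmm

Region 2, columns Col3, Col4 and Col5:
RowID 219: hhh
RowID 328: kkk lll 333
RowID 342: nnn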
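
As a rough illustration of the grouping shown on this slide, the sketch below splits one row into per-family stores. The family names `cf1` and `cf2` and the helper `split_by_family` are hypothetical; the slide only shows that Col1 and Col2 are kept together and Col3 to Col5 are kept together. Row 328 is used because it is the one row with a value in every column.

```python
# Hypothetical family names; only the Col1-Col2 / Col3-Col5 grouping comes from the slide.
FAMILY_OF = {
    "Col1": "cf1",
    "Col2": "cf1",
    "Col3": "cf2",
    "Col4": "cf2",
    "Col5": "cf2",
}

def split_by_family(row: dict) -> dict:
    """Group a row's cells by column family; every cell lands in exactly one family."""
    stores = {}
    for column, value in row.items():
        stores.setdefault(FAMILY_OF[column], {})[column] = value
    return stores

row_328 = {"Col1": "iii", "Col2": "jjj", "Col3": "kkk", "Col4": "lll", "Col5": "333"}
print(split_by_family(row_328))
```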

Multiple Versions

RowID: 627
ColumnName: Col7

• 21:09 Foo
• 22:34 Bar
• 23:12 'DEL'

(RowID, Columnname, Timestamp) -> Value
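
The mapping above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not HBase's implementation: the class `VersionedCells`, its method names, and the `DEL` sentinel are invented here, timestamps are kept as the zero-padded "HH:MM" strings from the slide (HBase itself uses numeric timestamps), and the delete marker is treated as just another version, which simplifies how HBase actually resolves deletes.

```python
DEL = object()  # stand-in for the slide's 'DEL' entry (a delete marker)

class VersionedCells:
    """Toy store mapping (row_id, column, timestamp) to a value."""

    def __init__(self):
        # (row_id, column) -> {timestamp: value or DEL}
        self._cells = {}

    def write(self, row_id, column, timestamp, value):
        """Record a new version; older versions are kept, not overwritten."""
        self._cells.setdefault((row_id, column), {})[timestamp] = value

    def read(self, row_id, column, at=None):
        """Return the newest value at or before `at` (newest overall if `at` is None)."""
        versions = self._cells.get((row_id, column), {})
        visible = [ts for ts in versions if at is None or ts <= at]
        if not visible:
            return None
        newest = versions[max(visible)]
        return None if newest is DEL else newest

cells = VersionedCells()
cells.write(627, "Col7", "21:09", "Foo")
cells.write(627, "Col7", "22:34", "Bar")
cells.write(627, "Col7", "23:12", DEL)

print(cells.read(627, "Col7", at="22:00"))  # version written at 21:09
print(cells.read(627, "Col7"))              # newest entry is the delete marker
```

The point of the sketch is that a write never replaces a cell in place: each write, including a delete, adds one more timestamped entry for the same row and column.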

Simple API

• PUT 'table', 'rowid', 'column', 'value'

• GET 'table', 'rowid', 'column'

• GET 'table', 'rowid'

• DELETE 'table', 'rowid', 'column'

• DELETE 'table', 'rowid'

• SCAN 'table'
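
To make the six operations concrete, here is a minimal in-memory stand-in for a single table. It is a sketch only: `ToyTable` and its method signatures are hypothetical and are not the HBase client API, versions are left out, and the one behaviour borrowed from HBase is that a scan returns rows in sorted row-key order.

```python
class ToyTable:
    """In-memory stand-in for one table; not an HBase client."""

    def __init__(self):
        self._rows = {}  # rowid -> {column: value}

    def put(self, rowid, column, value):
        self._rows.setdefault(rowid, {})[column] = value

    def get(self, rowid, column=None):
        """Return one cell, or a copy of the whole row when column is omitted."""
        row = self._rows.get(rowid, {})
        return dict(row) if column is None else row.get(column)

    def delete(self, rowid, column=None):
        """Delete one cell, or the whole row when column is omitted."""
        if column is None:
            self._rows.pop(rowid, None)
            return
        row = self._rows.get(rowid)
        if row is not None:
            row.pop(column, None)
            if not row:  # drop rows that no longer hold any cell
                del self._rows[rowid]

    def scan(self):
        """Yield (rowid, columns) pairs in sorted rowid order."""
        for rowid in sorted(self._rows):
            yield rowid, dict(self._rows[rowid])

table = ToyTable()
table.put("row2", "col1", "b")
table.put("row1", "col1", "a")
table.put("row1", "col2", "x")
table.delete("row1", "col2")
print(list(table.scan()))
```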

Additional Features

• MapReduce Input/Output Format

• Hive Interface

• Thrift API

• RESTful API

• Sqoop Connector

• Flume Sink

Thank You!
