Original_1431303662_10. Hadoop Stack (1)
Uploaded by praneeth-reddy
7/23/2019 Original_1431303662_10. Hadoop Stack (1)
Copyright 2012 Tata Consultancy Services Limited
Hadoop Stack
Significance
Analyzing large data sets involves many processes
No single component can handle all of that
Ecosystem tools together complete the analysis
Tools run on top of Hadoop
Pig
Is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs
Takes commands written in a language called Pig Latin (Data Flow Language) and converts those commands into MapReduce jobs
Consists of:
Pig Latin: the high-level language
A run-time environment where Pig Latin programs are executed
Users do not have to write complex MapReduce algorithms in a lower-level computer language such as Java
Can execute one Pig Latin command at a time, but it is far more common to write a script of Pig Latin commands that accomplishes a complete task
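To make the idea of turning a data-flow script into MapReduce concrete, here is a minimal Python sketch. It is not Pig's compiler and does not use Pig Latin syntax. It assumes a hypothetical three-step flow (filter, group, count) and runs it as explicit map, shuffle and reduce phases over an in-memory list. The record layout and all function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical input: (user, page, seconds_on_page) records.
RECORDS = [
    ("alice", "home", 12),
    ("bob", "home", 3),
    ("alice", "cart", 40),
    ("carol", "home", 25),
    ("bob", "cart", 8),
]


def map_phase(records, min_seconds):
    """FILTER + projection: emit (page, 1) for visits of at least min_seconds."""
    for _user, page, seconds in records:
        if seconds >= min_seconds:
            yield page, 1


def shuffle_phase(pairs):
    """GROUP: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """COUNT: collapse each group's values into one total per key."""
    return {key: sum(values) for key, values in groups.items()}


def visits_per_page(records, min_seconds=5):
    """Run the whole flow: map, then shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(records, min_seconds)))
```

Calling `visits_per_page(RECORDS)` returns `{"home": 2, "cart": 2}`. In a data-flow language the author writes only the filter, group and count steps, and the phase boundaries are derived automatically.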
Hive
A data warehouse infrastructure built on top of Hadoop
Provides tools to enable:
Easy data summarization
Ad hoc querying
Analysis of large datasets stored in Hadoop files
Uses an SQL-like language called HiveQL (query language) that abstracts the MapReduce programming model and supports typical data warehouse interactions
Enables you to avoid the complexities of writing MapReduce programs in a lower-level computer language such as Java
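The kind of summarization query this slide describes can be sketched as follows. This example uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a Hadoop cluster; it is not Hive. The query is restricted to constructs (`SELECT`, `COUNT`, `SUM`, `GROUP BY`, `ORDER BY`) that HiveQL also supports. The `sales` table and its columns are hypothetical.

```python
import sqlite3


def summarize_sales(rows):
    """Return (region, order_count, total_amount) per region, sorted by region.

    `rows` is an iterable of (region, amount) tuples.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a warehouse table
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = """
            SELECT region, COUNT(*) AS orders, SUM(amount) AS total
            FROM sales
            GROUP BY region
            ORDER BY region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The point of the sketch is the declarative style: the author states what to aggregate, and in Hive the engine would plan the corresponding MapReduce work instead of the user writing it in Java.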
HBase
Column-oriented database management system on top of HDFS
Efficient way of storing large quantities of sparse data
Provides fast lookup of data by row key; recent writes and frequently read blocks are held in memory, while the data itself is persisted on disk in HDFS
Optimized for sequential write operations and is highly efficient for batch inserts, updates and deletes
HBase system comprises a set of tables
Allows for attributes to be grouped together into "column families" [...]
Sqoop
[...] Generates Java classes to allow you to interact with imported data
Imports from SQL databases straight into the Hive data warehouse
Oozie
Is a workflow/coordination service to manage data processing jobs for Apache Hadoop
Supports all types of Hadoop jobs and is integrated with the Hadoop stack
Users can specify execution frequency and can wait for data arrival to trigger an action in the workflow
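The combination of a fixed execution frequency and a wait for data arrival can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not Oozie's API (Oozie workflows and coordinators are defined in XML). The function names and the `arrived` set are assumptions made for this example.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield each scheduled (nominal) run time in the interval [start, end)."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    current = start
    while current < end:
        yield current
        current += frequency


def runnable_actions(start, end, frequency, arrived):
    """Split scheduled runs into those ready to trigger and those still waiting.

    `arrived` is the set of nominal times whose input data has landed.
    """
    ready, waiting = [], []
    for when in nominal_times(start, end, frequency):
        (ready if when in arrived else waiting).append(when)
    return ready, waiting
```

For an hourly schedule over three hours where data has arrived for the first and third hour only, `runnable_actions` reports two runs as ready and keeps the second hour waiting until its input shows up.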
Pentaho
Is an open source suite with integrated reporting, dashboard, data mining, workflow and ETL capabilities
Using it along with Hadoop allows:
Quick, easy analytics against big data
Support for multiple Hadoop distributions
Easier maintenance of solutions
ETL: Extract, Transform and Load. ETL tools extract data from multiple sources, transform it into a new format and load it into target data structures
Lowers technical barriers for Hadoop developers
Rapidly integrate Hadoop data with other data types
Is designed for flexible deployment anywhere
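The extract, transform and load steps defined above can be sketched in plain Python. This does not use Pentaho or any of its interfaces. The two sources, the field names and the target structure are all hypothetical and exist only to show the three stages side by side.

```python
import csv
import io

# Two hypothetical sources with different layouts.
CSV_SOURCE = "id,name,amount\n1,alice,10.50\n2,bob,4.00\n"
DICT_SOURCE = [{"ID": 3, "Name": "carol", "Amount": "7.25"}]


def extract():
    """Extract: read raw records from both sources."""
    yield from csv.DictReader(io.StringIO(CSV_SOURCE))
    yield from DICT_SOURCE


def transform(record):
    """Transform: normalize key case, name casing and numeric types."""
    row = {key.lower(): value for key, value in record.items()}
    return {
        "id": int(row["id"]),
        "name": str(row["name"]).title(),
        "amount": float(row["amount"]),
    }


def load(rows, target):
    """Load: write each transformed row into the target, keyed by id."""
    for row in rows:
        target[row["id"]] = row
    return target


def run_etl():
    """Run extract, transform and load end to end into a fresh dict."""
    return load((transform(record) for record in extract()), {})
```

Keeping the three stages as separate functions mirrors how ETL tools let each stage be swapped independently, for example replacing the CSV source without touching the transform.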
References
http://developer.yahoo.com
Hadoop: The Definitive Guide, O'Reilly
Thank You