Original_1431303662_10. Hadoop Stack (1)

  • 7/23/2019 Original_1431303662_10. Hadoop Stack (1)

    Copyright 2012 Tata Consultancy Services Limited

    Hadoop Stack

    Significance

    Analyzing large data sets involves many processes

    No single component can handle all of that

    Ecosystem tools together complete the analysis

    Tools run on top of Hadoop

    Pig

    Is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs

    Takes commands written in a language called Pig Latin (Data Flow Language) and converts those commands into MapReduce jobs

    Consists of:

    Pig Latin: The high-level language

    A run-time environment where Pig Latin programs are executed

    You do not have to write complex MapReduce algorithms using a lower-level computer language such as Java

    Can execute one Pig Latin command at a time, but it is far more common to write a script of Pig Latin commands that accomplishes a complete task
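
As a rough illustration of the work that Pig takes off your hands, the sketch below writes out the map, shuffle and reduce steps of a simple word count by hand in plain Python. It is not Pig Latin and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and exist only for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three steps into one job (hypothetical helper)."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["pig runs on hadoop", "hive runs on hadoop"]
    print(word_count(sample))
```

In a data-flow language the same task is a handful of load, group and count statements, which is the convenience the slide describes.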

    Hive

    A data warehouse infrastructure built on top of Hadoop

    Provides tools to enable:

    Easy data summarization

    Ad hoc querying

    Analysis of large datasets stored in Hadoop files

    Uses an SQL-like language called Hive QL (query language) that abstracts the MapReduce programming model and supports typical data warehouse interactions

    Enables you to avoid the complexities of writing MapReduce programs in a lower-level computer language such as Java

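To show what an SQL-like layer saves, the sketch below computes the same summary twice: once as a single declarative `GROUP BY` query and once as a hand-written grouping loop. Python's built-in `sqlite3` module is used purely as a stand-in for an SQL-like engine; it is not Hive, and the `sales` table, its columns and the sample rows are assumptions made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (channel, amount)
ROWS = [("web", 120), ("mobile", 80), ("web", 30), ("mobile", 20), ("store", 50)]


def total_by_channel_sql(rows):
    """Declarative version: one query states the desired result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (channel TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT channel, SUM(amount) FROM sales "
        "GROUP BY channel ORDER BY channel"
    ).fetchall()
    conn.close()
    return result


def total_by_channel_manual(rows):
    """Hand-written version: group and sum explicitly, as a job author would."""
    totals = defaultdict(int)
    for channel, amount in rows:
        totals[channel] += amount
    return sorted(totals.items())


if __name__ == "__main__":
    print(total_by_channel_sql(ROWS))
    print(total_by_channel_manual(ROWS))
```

Both functions return the same rows; the query version leaves the grouping strategy to the engine, which is the abstraction the Hive slide refers to.
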
    HBase

    Column-oriented database management system on top of HDFS

    Efficient way of storing large quantities of sparse data

    Provides fast lookup of data because recent writes and frequently read blocks are held in memory, while the data itself is persisted on HDFS

    Optimized for sequential write operations and is highly efficient for batch inserts, updates and deletes

    HBase system comprises a set of tables

    Allows attributes to be grouped together into "column families" [...]

    Generates Java classes to allow you to interact with imported data

    Imports from SQL databases straight to Hive data warehouse
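
Returning to the HBase points on sparse data and column families, the toy model below stores only the cells that actually exist, keyed by row, column family and column qualifier. It is a minimal sketch in plain Python, not the HBase client API; the class `SparseTable`, its methods and the sample keys are hypothetical.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table grouped by column family (not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers inside them are free-form.
        self.families = set(families)
        # row key -> family -> qualifier -> value; absent cells take no space
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier, default=None):
        # .get at every level so that reads never create empty rows
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier, default)

    def cell_count(self):
        """Number of cells actually stored."""
        return sum(
            len(columns)
            for families in self.rows.values()
            for columns in families.values()
        )


if __name__ == "__main__":
    table = SparseTable(["info", "metrics"])
    table.put("user1", "info", "name", "Asha")
    table.put("user1", "metrics", "logins", 3)
    table.put("user2", "info", "email", "b@example.com")
    print(table.get("user2", "info", "name"))
    print(table.cell_count())
```

Rows may carry completely different columns without any placeholder values being stored, which is why this layout suits sparse data.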

    Oozie

    Is a workflow/coordination service to manage data processing jobs for Apache Hadoop

    Supports all types of Hadoop jobs and is integrated with the Hadoop stack

    Users can specify execution frequency and can wait for data arrival to trigger an action in the workflow
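
The idea of a workflow whose actions run in dependency order, and only once their input data has arrived, can be sketched as below. This is plain Python for illustration only: Oozie itself describes workflows in XML, and the function `run_workflow`, its parameters and the action names are hypothetical. Scheduling by execution frequency is left out of the sketch.

```python
def run_workflow(actions, dependencies, available_data, required_data):
    """Run actions in dependency order once all required inputs are present.

    actions: dict mapping action name -> callable taking no arguments
    dependencies: dict mapping action name -> names that must finish first
    Returns the list of executed action names, or [] if data is still missing.
    """
    if not set(required_data) <= set(available_data):
        return []  # input has not arrived yet, so nothing is triggered

    done = []
    pending = list(actions)
    while pending:
        # An action is ready when everything it depends on has finished.
        ready = [
            name for name in pending
            if all(dep in done for dep in dependencies.get(name, []))
        ]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for name in ready:
            actions[name]()
            done.append(name)
            pending.remove(name)
    return done


if __name__ == "__main__":
    log = []
    actions = {
        "load": lambda: log.append("load"),
        "report": lambda: log.append("report"),
        "clean": lambda: log.append("clean"),
    }
    dependencies = {"clean": ["load"], "report": ["clean"]}
    print(run_workflow(actions, dependencies, {"day1"}, {"day1"}))
```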

    Pentaho

    Is an open source suite with integrated reporting, dashboard, data mining, workflow and ETL capabilities

    Using it along with Hadoop allows:

    Quick, easy analytics against big data

    Support for multiple Hadoop distributions

    Easier maintenance of solutions

    ETL: Extract, Transform and Load. ETL tools extract data from multiple sources, transform it into a new format and load it into target data structures

    Lowers technical barriers for Hadoop developers

    Rapidly integrates Hadoop data with other data types

    Is designed for flexible deployment anywhere
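
The extract, transform and load steps described for ETL above can be sketched with nothing but Python's standard library. This is not Pentaho and uses none of its tooling; the two sample sources, the `customers` table and the function names are assumptions invented for the example, with an in-memory SQLite database standing in for the target data structure.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes and conventions
CSV_SOURCE = "id,name,amount\n1,asha,10.5\n2,ravi,20\n"
DICT_SOURCE = [{"id": 3, "name": "MEENA", "amount": "7.25"}]


def extract():
    """Read raw records from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(DICT_SOURCE)
    return rows


def transform(rows):
    """Normalise types and name casing into a single target format."""
    return [
        (int(row["id"]), str(row["name"]).title(), float(row["amount"]))
        for row in rows
    ]


def load(records, conn):
    """Write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the target connection."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn


if __name__ == "__main__":
    connection = run_etl()
    for record in connection.execute("SELECT * FROM customers ORDER BY id"):
        print(record)
```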

    References

    http://developer.yahoo.com/

    Hadoop: The Definitive Guide, O'Reilly

    Thank You