Big Data

By: Andrew B. Osmond

Description

An introduction to Big Data technology, Hadoop, and data.

  • By:

    Andrew B. Osmond

  • About Me

    FB: http://facebook.com/ab.osmond

    Office: Building N202, Faculty of Electrical Engineering, Telkom University

    Gmail: [email protected]

    Tel-U: [email protected]

  • Why Is Data So Big?

  • Data Everywhere

    Big Data refers to massive, often unstructured data that is beyond the processing capabilities of traditional data management tools.

    Big Data can take up terabytes or even petabytes of storage in diverse formats, including text, video, sound, and images.

    Traditional relational database management systems cannot efficiently handle such large volumes of data.

  • What can we do with Big Data?

  • Big Data Architecture

  • Nature of Data

  • Working With Data

    Data Sources

    Data Scrubbing

    Data Formats

  • Data Sources

  • Open Data

    Open data is data that anyone can freely use, reuse, and redistribute for any purpose.

    Examples:

    World Health Organization data is available at http://www.who.int/research/en/

    Machine learning datasets are available at http://bitly.com/bundles/bigmlcom/2

    World Bank data is available at http://data.worldbank.org/

    Hilary Mason's research-quality datasets are available at https://bitly.com/bundles/hmason/1

  • Text Files

    Text files are commonly used to store data because they are easy to transform into other formats, and when part of a file is damaged it is often easier to recover and continue processing the remaining contents than with other formats.
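
The recovery point above can be illustrated with a minimal sketch, assuming a hypothetical line-oriented file of "name,age" records. Because every line is a self-contained record, a parser can skip a malformed line and keep going instead of failing on the whole file. The function name `parse_records` and the sample data are illustrative only.

```python
def parse_records(text):
    """Parse 'name,age' lines; skip malformed lines instead of aborting.

    Returns (records, skipped_line_numbers). The record layout is a
    hypothetical example, not a standard format.
    """
    records, skipped = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split(",")
        if len(fields) != 2:          # wrong number of fields: damaged line
            skipped.append(line_no)
            continue
        name, age = fields
        try:
            records.append((name.strip(), int(age)))
        except ValueError:            # age is not an integer: damaged line
            skipped.append(line_no)
    return records, skipped


sample = "alice,34\nbob,not-a-number\ngarbage-without-comma\ncarol,29\n"
records, skipped = parse_records(sample)
print(records)  # [('alice', 34), ('carol', 29)]
print(skipped)  # [2, 3]
```

Lines 2 and 3 are reported and skipped, and processing continues with line 4.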

  • SQL Database

  • NoSQL Database

    Document Store

    MongoDB (http://www.mongodb.com), CouchDB (http://couchdb.apache.org/)

    Key-value store

    Apache Cassandra, Dynamo, HBase, Amazon SimpleDB

    Graph-based store

    Neo4j, InfoGrid, Horton

  • Document Store
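
A document store such as MongoDB or CouchDB keeps each record as a self-describing document (typically JSON-like), and documents in the same collection need not share the same fields. The toy in-memory sketch below only illustrates that data model; `collection`, `get_path`, and `find` are hypothetical names and are not the MongoDB or CouchDB API.

```python
import json

# A toy "collection": documents with different shapes living side by side.
collection = [
    {"_id": 1, "name": "Alice", "address": {"city": "Bandung"}, "tags": ["student"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Jakarta"}},
    {"_id": 3, "name": "Carol", "email": "carol@example.com"},
]

_MISSING = object()  # sentinel so a missing field never equals a query value


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value


def find(coll, query):
    """Return documents whose fields equal every value in `query`."""
    return [doc for doc in coll
            if all(get_path(doc, path) == expected for path, expected in query.items())]


# Documents serialize naturally to JSON, the usual storage/exchange format.
for doc in find(collection, {"address.city": "Bandung"}):
    print(json.dumps(doc))
```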

  • Key-Value Store
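
A key-value store exposes little more than put, get, and delete on opaque values, which makes it easy to spread keys across many machines. The sketch below is a hypothetical, single-process illustration that assigns each key to one of several in-memory "nodes" by hashing. It uses simple modulo hashing as a stated simplification; systems such as Dynamo and Cassandra use consistent hashing instead, so that adding a node moves only a fraction of the keys.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store spreading keys over several in-memory 'nodes'."""

    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # md5 gives the same result on every run, unlike Python's salted
        # hash() for strings, so a key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = PartitionedKVStore(num_nodes=3)
store.put("user:42:name", "Alice")
store.put("user:42:cart", ["book", "pen"])
print(store.get("user:42:name"))  # Alice
store.delete("user:42:cart")
print(store.get("user:42:cart"))  # None
```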

  • Graph Store
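
A graph store such as Neo4j treats relationships as first-class data, so questions like "how many hops separate two people?" become traversals rather than repeated table joins. The sketch below is a hypothetical in-memory adjacency list with a breadth-first search; it illustrates the data model only and is not Neo4j's API or query language.

```python
from collections import deque

# Nodes are people; each directed edge is a hypothetical "FOLLOWS" relationship.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "erin"),
]

graph = {}
for src, dst in edges:
    graph.setdefault(src, []).append(dst)
    graph.setdefault(dst, [])


def hops(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(hops(graph, "alice", "dave"))  # 3
print(hops(graph, "dave", "alice"))  # None, because edges are directed
```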

  • Leading Technologies

    Relational databases struggle to store and process Big Data efficiently.

    As a result, a new class of big data technology has emerged and is being used in many big data analytics environments.

    These technologies include Hadoop, MapReduce, and NoSQL databases.

  • Hadoop

    Open-source framework

    Java-based programming framework

    Processing and storing large datasets

    Distributed computing environment

    Components: HDFS, MapReduce
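
HDFS stores a large file by splitting it into fixed-size blocks (128 MB by default in Hadoop 2 and later) and copying each block to several DataNodes (three by default), while the NameNode records which nodes hold which blocks. The single-process sketch below simulates that idea with a tiny block size. `write_file`, `read_file`, and the node names are hypothetical, and the round-robin placement is a simplification of HDFS's rack-aware policy.

```python
BLOCK_SIZE = 4   # bytes, tiny for the demo (HDFS default: 128 MB)
REPLICATION = 3  # HDFS's default replication factor


def write_file(data, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split `data` into blocks and copy each block to `replication` distinct nodes.

    Returns (storage, placement): storage maps node -> {block_id: bytes};
    placement maps block_id -> [nodes], like the metadata a NameNode keeps.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in datanodes}
    placement = {}
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for block_id, block in enumerate(blocks):
        # Round-robin: consecutive nodes starting at a per-block offset.
        nodes = [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for node in nodes:
            storage[node][block_id] = block
        placement[block_id] = nodes
    return storage, placement


def read_file(storage, placement, failed=frozenset()):
    """Reassemble the file, reading each block from any replica on a live node."""
    parts = []
    for block_id in sorted(placement):
        live = [n for n in placement[block_id] if n not in failed]
        if not live:
            raise IOError(f"block {block_id}: every replica is on a failed node")
        parts.append(storage[live[0]][block_id])
    return b"".join(parts)


nodes = ["dn1", "dn2", "dn3", "dn4"]
storage, placement = write_file(b"hello big data world", nodes)
print(placement[0])                                   # ['dn1', 'dn2', 'dn3']
print(read_file(storage, placement, failed={"dn1"}))  # b'hello big data world'
```

With three replicas on four nodes, any two nodes can fail and the file is still readable, which is the fault tolerance the next slide refers to.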

  • Hadoop vs. SQL

    Storage: Hadoop stores data as files (often compressed) spread across n commodity servers; an SQL database stores data in tables and columns with relations between them.

    Fault tolerance: Hadoop is fault tolerant, so if one node fails the system keeps working; in an SQL database, if any node crashes it returns an error in order to maintain consistency.
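
The relational side of this comparison ("tables and columns with relations between them") can be sketched with Python's built-in `sqlite3` module. The `customers`/`orders` schema is a hypothetical example: the relation lets the two tables be joined, and a foreign-key constraint makes the database reject a row that refers to a customer that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 25.0), (1, 10.0), (2, 7.5)])

# The relation orders.customer_id -> customers.id lets us join the tables.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Alice', 35.0), ('Bob', 7.5)]

# A row pointing at a non-existent customer violates the relation.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (99, 1.0))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```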

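  • MapReduce

    A programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

    Hadoop MapReduce is an open-source implementation of this model.

    A Hadoop MapReduce job combines two Java components: a Mapper, whose map() method turns input records into intermediate key/value pairs, and a Reducer, whose reduce() method combines all values that share a key.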

  • MapReduce Algorithm
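
The classic illustration of the algorithm is word count. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process. In Hadoop the map and reduce steps would be Java Mapper and Reducer classes running across a cluster; the Python functions `mapper`, `shuffle`, `reducer`, and `map_reduce` here are hypothetical stand-ins. Map tasks are submitted to a thread pool only to show that they are independent, not to claim real CPU parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Shuffle/sort: group all emitted values by key, in key order."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(splits, workers=4):
    # Each map task depends only on its own split, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, splits))
    grouped = shuffle(mapped)
    # Each key is reduced independently of every other key.
    return dict(reducer(k, v) for k, v in grouped.items())


splits = ["big data is big", "data is everywhere", "Big Data"]
print(map_reduce(splits))
# {'big': 3, 'data': 3, 'everywhere': 1, 'is': 2}
```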