HDFS and Oracle

  • 8/10/2019 HDFS and Oracle

    HDFS: Hadoop Distributed File System Introduction

    Johan Louwers Lead Architect Oracle Technology

    Copyright 2014 Capgemini. All rights reserved.

    Hadoop HDFS introduction

    HDFS Hadoop Distributed File System

    The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. The project URL is http://hadoop.apache.org/hdfs/.

    Simple HDFS Cluster Setup

    A) An HDFS cluster consisting of a number of commodity servers.

    B) A single server containing both a name node and a data node.

    C) Multiple servers, each containing a data node.

    HDFS introduction

    HDFS Storage

    A (large) file is chopped into blocks.

    Blocks are written to the different data nodes in the cluster.

    The name node keeps track of which block is written to which node.
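The storage steps above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the `ToyNameNode` class, the 4-byte block size, and the round-robin choice of data node are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import cycle

BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration (assumption)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Chop a file's contents into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Hypothetical name node: remembers which data node holds which block."""

    def __init__(self, data_nodes: list[str]) -> None:
        self.data_nodes = data_nodes
        # file name -> list of (block index, data node name)
        self.block_map: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def write_file(self, name: str, data: bytes) -> None:
        # Round-robin assignment is an assumption of this sketch only.
        nodes = cycle(self.data_nodes)
        for index, _block in enumerate(split_into_blocks(data)):
            self.block_map[name].append((index, next(nodes)))

    def locate(self, name: str) -> list[tuple[int, str]]:
        """Return the recorded block locations for a file."""
        return list(self.block_map[name])
```

Writing a 10-byte file to a `ToyNameNode` with three data nodes yields three blocks, and `locate` returns one `(block index, node)` pair per block.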

    On startup, the NameNode enters a special state called Safemode. Replication of data blocks does not occur when the NameNode is in the Safemode state.
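A minimal sketch of the Safemode behaviour described above, in Python. The class name and the exit rule (leave Safemode once a given fraction of the known blocks has been reported) are assumptions for this illustration. The slide only states that Safemode is entered on startup and that no replication happens while it is active.

```python
class SafemodeNameNode:
    """Hypothetical name node that starts in Safemode and blocks replication there."""

    def __init__(self, total_blocks: int, threshold: float = 0.999) -> None:
        self.total_blocks = total_blocks
        self.threshold = threshold  # assumed exit criterion, not taken from the slides
        self.reported = set()
        self.safemode = True        # always entered on startup

    def block_report(self, block_ids) -> None:
        """Record blocks reported by data nodes; leave Safemode when enough are known."""
        self.reported.update(block_ids)
        if self.total_blocks == 0 or len(self.reported) / self.total_blocks >= self.threshold:
            self.safemode = False

    def schedule_replication(self, under_replicated):
        """Return the blocks to re-replicate; nothing is scheduled during Safemode."""
        if self.safemode:
            return []
        return sorted(under_replicated)
```

The point of the design is that the name node does not start copying blocks around before it has heard from the data nodes, since a block may only look under-replicated because its holders have not reported yet.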

    HDFS Storage

    Data blocks are replicated over different nodes in the cluster to ensure availability when a node fails.

    The replication level is 3 by default. It is configured with the dfs.replication property in the HDFS configuration.
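As an illustration of the setting above, the following Python sketch generates the property entry in the `<configuration>`/`<property>` XML layout used by Hadoop configuration files such as hdfs-site.xml. The helper name `replication_config` is hypothetical, and the slides do not say which file the author edited.

```python
import xml.etree.ElementTree as ET


def replication_config(factor: int = 3) -> str:
    """Return a Hadoop-style configuration document that sets dfs.replication."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(factor)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(configuration, encoding="unicode")
```

Calling `replication_config()` with no argument produces an entry for the default level of 3.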

    HDFS Storage

    When operating a large cluster, ensure that you have enabled the rack-aware option.

    Refer to the HADOOP-692 improvement for more details: http://goo.gl/dQ012n

    Thanks to ChrisDag for the image

    Typically, large Hadoop clusters are arranged in racks, and network traffic between nodes within the same rack is much more desirable than network traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance.
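The trade-off described above (keep traffic inside a rack where possible, but spread replicas over more than one rack) can be sketched as follows. This is a simplified illustration and not Hadoop's actual placement policy. The function name, node names, and rack names are assumptions.

```python
from __future__ import annotations


def place_replicas(writer: str, topology: dict[str, str], replication: int = 3) -> list[str]:
    """Pick data nodes for one block so that replicas span more than one rack.

    topology maps a data node name to its rack name (both hypothetical).
    """
    local_rack = topology[writer]
    same_rack = sorted(n for n, r in topology.items() if r == local_rack and n != writer)
    other_racks = sorted(n for n, r in topology.items() if r != local_rack)

    chosen = [writer]  # first replica stays on the writer's node
    if other_racks and replication > 1:
        chosen.append(other_racks.pop(0))  # one replica goes to a different rack
    # Fill the remainder, preferring cheaper intra-rack traffic.
    for node in same_rack + other_racks:
        if len(chosen) >= replication:
            break
        chosen.append(node)
    return chosen[:replication]
```

With two racks of two nodes each and a writer on `dn1`, the sketch keeps two replicas on the writer's rack and puts one on the other rack, so the block survives the loss of either rack.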

    Image source: https://www.flickr.com/photos/chrisdag/

    HDFS Oracle & Big Data

    Oracle Big Data Appliance Introduction

    Oracle Big Data Appliance is a high-performance, secure platform for running diverse workloads on Hadoop and NoSQL systems.

    Oracle Big Data Appliance includes (almost without the need to say it) an HDFS storage component for storing data.

    Oracle & Hadoop

    Oracle XQuery for Hadoop

    Oracle SQL connector for HDFS

    Oracle Loader for Hadoop

    Online mode

    Offline mode

    Oracle Big Data SQL

    Contact me

    Johan Louwers, Capgemini, Lead Architect Oracle Technology

    Mail: [email protected]
    Twitter: @johanlouwers
    Blog 1: http://www.capgemini.com/blog/capgemini-oracle-blog
    Blog 2: http://johanlouwers.blogspot.com

    The information contained in this presentation is proprietary. 2014 Capgemini. All rights reserved.

    Rightshore is a trademark belonging to Capgemini.

    www.capgemini.com

    About Capgemini

    With almost 140,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2013 global revenues of EUR 10.1 billion.

    Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience, and draws on Rightshore, its worldwide delivery model.

    Learn more about us at www.capgemini.com.
