Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Post on 26-Jun-2015

Apache Hadoop Training Series: Hadoop Introduction

10/23/14

avkash@bigdataperspective.com

https://www.linkedin.com/in/avkashchauhan

Let's Start and Define Big Data

Let's Start and Define Big Data

How Hadoop Fits in this scenario

http://www.packtpub.com/using-cloudera-impala/book

http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx

https://www.linkedin.com/in/avkashchauhan

Hadoop is an open source, Java-based, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.

Flexibility: A single repository for storing and analyzing any kind of data, not bounded by schema

Scalability: Scale-out architecture divides the workload across multiple nodes using a flexible distributed file system

Low Cost: Deployed on commodity hardware and an open source platform

Fault Tolerant: Continues working even if node(s) go down

A system to move computation to where the data is.
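The idea of moving computation to the data, and of continuing to work when nodes go down, can be sketched with a toy scheduler. This is only an illustration and not Hadoop's actual scheduler: the block IDs, node names, and the functions `pick_node` and `schedule` are all hypothetical. The three replicas per block mirror the default HDFS replication factor.

```python
from typing import Dict, List, Optional, Set

# Hypothetical cluster state: the nodes that hold a replica of each block.
BLOCK_REPLICAS: Dict[str, List[str]] = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
}


def pick_node(block_id: str, live_nodes: Set[str]) -> Optional[str]:
    """Return the first live node that already stores the block, else None."""
    for node in BLOCK_REPLICAS.get(block_id, []):
        if node in live_nodes:
            return node
    return None


def schedule(live_nodes: Set[str]) -> Dict[str, Optional[str]]:
    """Assign each block's task to a node that holds the data locally."""
    return {block: pick_node(block, live_nodes) for block in BLOCK_REPLICAS}


if __name__ == "__main__":
    # All nodes healthy: each task runs where its block lives.
    print(schedule({"node-a", "node-b", "node-c", "node-d"}))
    # node-a and node-b are down: the tasks fall back to surviving replicas.
    print(schedule({"node-c", "node-d"}))
```

Because every block has more than one replica, losing a node changes where a task runs but not whether it can run.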

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Data Storage

Data Processing

Hadoop Common

HDFS

MapReduce/YARN
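The MapReduce processing model named above can be sketched in plain Python as three steps: map, shuffle, and reduce. This is an in-memory illustration only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for this example. In a real cluster the map and reduce steps run in parallel on many nodes and the framework performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```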

Cloud

Cloudera Impala

Hortonworks Tez

Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to speed up data processing.

Use cases include user collaborative filtering, user recommendations, clustering and classification.

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

EDW

OLAP

ODS

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

With Hadoop

Data Storage
Amazon: S3
HDInsight: Azure Blobs
Directives: Direct access to compute machines for super-fast data delivery

Processing
Amazon: EC2
HDInsight: Azure Compute
Directives: Dedicated machines ready to run with a specific version of the Hadoop runtime

Processing Libraries
Amazon: Java based, or any other language supported through Hadoop Streaming
HDInsight: .NET-based code
Directives: Users upload their own processing code as binaries/libraries

Results
Amazon: S3
HDInsight: Azure Blobs
Directives: Once the job is completed, the results are stored back to the same data storage used as the source

Visualization
Amazon: Custom
HDInsight: Custom
Directives: A 3rd-party application can connect to the storage to perform visualization
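Hadoop Streaming, mentioned under Processing Libraries, lets a job use any executable as mapper and reducer: each reads lines from standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below shows that contract in Python. The input format (one `station,temperature` record per line) and the task (maximum reading per station) are assumptions chosen only for illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'station<TAB>reading' for each well-formed 'station,reading' line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed records
        station, reading = parts
        try:
            float(reading)
        except ValueError:
            continue  # skip non-numeric readings
        yield f"{station}\t{reading}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Emit the maximum reading per station; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for station, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{station}\t{max(float(value) for _, value in group)}"


if __name__ == "__main__":
    # Hypothetical convention: run as 'script.py map' or 'script.py reduce'.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    stage = {"map": mapper, "reduce": reducer}.get(role)
    if stage is not None:
        for out_line in stage(sys.stdin):
            print(out_line)
```

Locally, the same flow can be imitated by piping a file through the map step, then `sort`, then the reduce step. On a cluster, the two commands would be passed to the streaming jar through its `-mapper` and `-reducer` options.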

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx