Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 1: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Apache Hadoop Training Series: Hadoop Introduction

10/23/14

[email protected]

https://www.linkedin.com/in/avkashchauhan

Let's Start and Define Big Data

Page 2: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 3: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Page 4: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

http://www.packtpub.com/using-cloudera-impala/book

http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx

https://www.linkedin.com/in/avkashchauhan

Page 5: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Hadoop is an open source (Java-based), "scalable", "fault-tolerant" platform for storing and processing large amounts of unstructured data, distributed across machines.

Page 6: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Flexibility: A single repository for storing and analyzing any kind of data, not bounded by schema.

Scalability: Scale-out architecture divides the workload across multiple nodes using a flexible distributed file system.

Low Cost: Deployed on commodity hardware and an open source platform.

Fault Tolerant: Continues working even if node(s) go down.

A system that moves computation to where the data is.
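
The scale-out and fault-tolerance points can be illustrated with a toy model. The sketch below is not HDFS code: the node names, the round-robin placement rule, and the replication factor of 3 are assumptions chosen for illustration, and real HDFS placement is more sophisticated. It only shows why keeping several copies of each block on different nodes lets reads continue when some nodes go down.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy rule)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for block in range(num_blocks):
        # Start each block on a different node so storage spreads across the cluster.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


# Hypothetical four-node cluster holding a file split into six blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# Two nodes fail; every block still has a surviving replica.
survivors = readable_blocks(placement, failed_nodes=["node2", "node3"])
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three replicas per block, any two node failures leave at least one copy of every block available in this model.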

Page 7: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Page 8: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 9: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 10: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 11: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Data Storage

Data Processing

Page 12: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Hadoop Common

HDFS

MapReduce/YARN
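
To make the processing side of the core components more concrete, here is a minimal in-memory sketch of the MapReduce programming model (map, shuffle, reduce). It does not use Hadoop at all: the function names and the "region,amount" record format are assumptions made up for illustration. On a real cluster the framework runs the map and reduce steps in parallel across nodes and handles the shuffle itself.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record ("region,amount") into a (key, value) pair."""
    region, amount = record.split(",")
    yield region.strip(), int(amount)


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases sequentially on a small in-memory data set."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sales records, one per line of input.
totals = run_job(["east,10", "west,5", "east,7"])
print(totals)
```

The user supplies only the map and reduce logic; splitting the input, grouping by key, and scheduling work on nodes is the framework's job.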

Page 13: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Cloud

Cloudera Impala

Hortonworks Tez

Impala uses C++-based, in-memory processing of HDFS data through SQL-like statements to speed up data processing.

Use cases include user collaborative filtering, user recommendations, clustering and classification.

Page 14: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Page 15: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Page 16: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 17: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 18: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Page 19: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 20: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 21: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

Page 22: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

EDW

OLAP

ODS

Page 23: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 24: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

With Hadoop

Page 25: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

| | Amazon | HDInsight | Directives |
| Data Storage | S3 | Azure Blobs | Direct access from compute machines for very fast data delivery |
| Processing | EC2 | Azure Compute | Dedicated machines ready to run a specific version of the Hadoop runtime |
| Processing Libraries | Java-based, or any other language supported through Hadoop Streaming | .NET-based code | User uploads their processing code (binaries/libraries) |
| Results | S3 | Azure Blobs | Once the job is completed, the results are stored back to the data storage used as the source |
| Visualization | Custom | Custom | A 3rd-party application can connect to storage to perform visualization |
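
The mention of Hadoop Streaming in the table can be sketched as follows. Hadoop Streaming lets the mapper and reducer be any executables that read lines from standard input and write lines to standard output, with key and value separated by a tab by default, and with the reducer receiving its input sorted by key. The word-count logic, the function names, and the sample lines below are assumptions for illustration; here the two steps are wired together in-process, with `sorted()` standing in for the framework's sort, rather than submitted as a real streaming job.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical input lines; sorted() plays the role of the shuffle/sort step.
sample = ["hadoop in cloud", "hadoop streaming"]
result = list(reducer(sorted(mapper(sample))))
for line in result:
    print(line)
```

In an actual streaming job, each function would live in its own script that loops over `sys.stdin` and prints its output, and the scripts would be uploaded along with the job.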

Page 26: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 27: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 28: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Let's Start and Define Big Data

How Hadoop Fits in this scenario

Hadoop Landscape

Hadoop Core Components

Applying Hadoop to Save $$

Concept of Data Lake

Hadoop in Cloud

Big Data Analytics

With Hadoop

Page 29: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

Page 30: Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics

http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx