Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Author: avkash-chauhan
Transcript of Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
Apache Hadoop Training Series: Hadoop Introduction
10/23/14
https://www.linkedin.com/in/avkashchauhan
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
http://www.packtpub.com/using-cloudera-impala/book
http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
Hadoop is an open-source, Java-based, scalable and fault-tolerant platform for storing and processing large amounts of unstructured data distributed across machines.
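To make "distributed across machines" concrete, the sketch below splits a file into fixed-size blocks and stores each block on several nodes. It assumes the Hadoop 2.x defaults of 128 MB blocks (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The helper `place_blocks` and the node names are hypothetical. Its round-robin placement is a simplification, since real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # default dfs.replication

def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication` distinct nodes.

    Hypothetical helper: simple round-robin placement, not HDFS's rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size / block_size)  # an empty file has no blocks
    placement = {}
    for block_id in range(num_blocks):
        # Start each block on a different node so storage load spreads across the cluster.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
# A 300 MB file needs ceil(300 / 128) = 3 blocks; the last block is only partially full.
for block_id, replicas in place_blocks(300 * 1024 * 1024, nodes).items():
    print(block_id, replicas)
```

Because every block exists on three different nodes, losing any single node leaves at least two readable copies of each block.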
Flexibility: A single repository for storing and analyzing any kind of data, not bound by a schema
Scalability: A scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
Low Cost: Deployed on commodity hardware; open-source platform
Fault Tolerant: Continues working even if one or more nodes go down
A system that moves computation to where the data is.
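The points about moving computation to the data and about fault tolerance can be sketched together as a toy scheduler. For each block, it prefers a live node that already holds a replica. It falls back to another live node, which would read the block over the network, only when every live replica holder is busy. It reports a block as unavailable only if all of its replica holders are down. The names `schedule_tasks` and `slots_per_node` are hypothetical, and a fixed slot count per node is a simplification. Real Hadoop schedulers also weigh rack locality and container resources.

```python
def schedule_tasks(placement, live_nodes, slots_per_node=1):
    """Assign one task per block, preferring data-local execution.

    placement: dict block_id -> list of nodes holding a replica (hypothetical layout)
    live_nodes: set of nodes currently up
    Returns dict block_id -> (node or None, "local" | "remote" | "unavailable" | "queued").
    """
    free = {n: slots_per_node for n in live_nodes}  # remaining task slots per live node
    assignment = {}
    for block_id, replicas in sorted(placement.items()):
        live_replicas = [n for n in replicas if n in live_nodes]
        if not live_replicas:
            # Every copy sits on a failed node, so the block cannot be read at all.
            assignment[block_id] = (None, "unavailable")
            continue
        local = [n for n in live_replicas if free[n] > 0]
        if local:
            # Data-local: run the task where a copy of the block already lives.
            node, kind = local[0], "local"
        else:
            others = [n for n in sorted(free) if free[n] > 0]
            if not others:
                # The cluster is full; the task would wait for a free slot.
                assignment[block_id] = (None, "queued")
                continue
            # Run elsewhere and read the block remotely from a live replica holder.
            node, kind = others[0], "remote"
        free[node] -= 1
        assignment[block_id] = (node, kind)
    return assignment

placement = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node3", "node4", "node1"],
}
# node1 has failed, but every block still has live replicas, so all tasks stay data-local.
print(schedule_tasks(placement, {"node2", "node3", "node4"}))
```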
Hadoop Landscape
Hadoop Core Components
Data Storage
Data Processing
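Hadoop Common (shared libraries and utilities), HDFS (Hadoop Distributed File System, for storage), MapReduce/YARN (for processing and resource management)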
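The MapReduce processing model can be illustrated with an in-memory simulation of its phases (map, shuffle/sort, reduce) on the classic word-count problem. This is a single-process sketch for intuition only. It does not use the Hadoop Java API, and `run_mapreduce`, `wc_map` and `wc_reduce` are hypothetical names.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate MapReduce in one process: map -> shuffle (group by key) -> reduce in key order."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values emitted for the same key.
            groups[key].append(value)
    results = []
    # Reduce phase: keys are handled in sorted order, mirroring Hadoop's sort of map output.
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results

def wc_map(line):
    for word in line.lower().split():
        yield word, 1

def wc_reduce(word, counts):
    yield word, sum(counts)

lines = ["Hadoop stores data", "Hadoop processes data"]
print(run_mapreduce(lines, wc_map, wc_reduce))
# [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]
```

On a cluster, HDFS supplies the input blocks, many map tasks run in parallel near those blocks, and YARN allocates the resources for the map and reduce tasks.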
Cloud
Cloudera Impala, Hortonworks Tez
Impala uses C++-based, in-memory processing to run SQL-like queries over HDFS data, speeding up data processing.
Use cases include user-based collaborative filtering, user recommendations, clustering and classification.
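Recommendation use cases like those listed are often built on item co-occurrence. Items that appear together in many users' histories are recommended to users who have one item but not the other. The minimal in-memory sketch below uses a small dictionary of hypothetical user histories. At Hadoop scale, the co-occurrence counting is the part that would be distributed as MapReduce jobs. The transcript does not name the library behind these use cases, so this is an illustration of the technique, not that library's implementation.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(histories):
    """Count how often each ordered pair of distinct items appears in the same user's history."""
    counts = Counter()
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            counts[(a, b)] += 1
    return counts

def recommend(user, histories, top_n=2):
    """Score unseen items by summing their co-occurrence counts with items the user already has."""
    counts = cooccurrence(histories)
    seen = set(histories[user])
    scores = Counter()
    for (a, b), c in counts.items():
        if a in seen and b not in seen:
            scores[b] += c
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical usage histories.
histories = {
    "alice": ["hadoop", "hive"],
    "bob": ["hadoop", "hive", "pig"],
    "carol": ["hadoop", "pig"],
    "dave": ["hive", "impala"],
}
print(recommend("alice", histories))  # ['pig', 'impala']
```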
Applying Hadoop to Save $$
Concept of Data Lake
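Only the heading of the data-lake slides survives in this transcript. Assuming they describe the usual data-lake approach, raw data is stored as-is and a schema is applied only when the data is read ("schema-on-read"). This matches the "not bound by a schema" point on the Flexibility slide. The raw records, field names and the `read_with_schema` helper below are hypothetical.

```python
import json

# Raw records land in the lake unchanged; no schema is enforced when they are written.
RAW_LAKE = [
    '{"user": "u1", "action": "click", "ms": "120"}',
    '{"user": "u2", "action": "view"}',
    '{"user": "u3", "action": "click", "ms": "95", "extra": "ignored"}',
]

def read_with_schema(raw_records, schema):
    """Apply a schema at read time: keep only the requested fields, cast them, use None if absent."""
    rows = []
    for line in raw_records:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows

# Two consumers read the same raw data through different schemas.
latency_view = read_with_schema(RAW_LAKE, {"user": str, "ms": int})
action_view = read_with_schema(RAW_LAKE, {"action": str})
print(latency_view)
print(action_view)
```

The design choice is that a new question only needs a new read-time schema, not a reload of the stored data.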
Hadoop in Cloud
Big Data Analytics
EDW (Enterprise Data Warehouse)
OLAP (Online Analytical Processing)
ODS (Operational Data Store)
Big Data Analytics with Hadoop
Component | Amazon | HDInsight | Directives
Data Storage | S3 | Azure Blobs | Direct access from the compute machines for fast data delivery
Processing | EC2 | Azure Compute | Dedicated machines ready to run a specific version of the Hadoop runtime
Processing Libraries | Java-based, or any other language supported through Hadoop Streaming | .NET-based code | Users upload their processing code as binaries/libraries
Results | S3 | Azure Blobs | Once a job completes, the results are stored back in the data storage used as the source
Visualization | Custom | Custom | Third-party applications can connect to the storage to perform visualization
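Hadoop Streaming lets any executable act as the mapper or reducer. The mapper reads input lines on stdin and writes tab-separated key/value lines to stdout. The framework sorts those lines by key, and the reducer reads the sorted lines on stdin. The sketch below mimics that contract in one Python process for a hypothetical "store,amount" sales file. In a real job, the mapper and reducer would be separate scripts passed to the streaming jar through its `-mapper` and `-reducer` options. The jar location and other options depend on the distribution.

```python
from itertools import groupby

def mapper(lines):
    """Streaming-style mapper: 'store,amount' lines in, 'store<TAB>amount' lines out."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        store, amount = line.split(",")
        yield f"{store}\t{amount}"

def reducer(sorted_lines):
    """Streaming-style reducer: input arrives sorted by key, so lines with equal keys are adjacent."""
    parsed = (line.split("\t") for line in sorted_lines)
    for store, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{store}\t{total:.2f}"

def simulate_job(input_lines):
    """Mimic the framework: run the mapper, sort its output by key, then run the reducer."""
    mapped = list(mapper(input_lines))
    shuffled = sorted(mapped, key=lambda line: line.split("\t", 1)[0])
    return list(reducer(shuffled))

sales = ["s2,10.50", "s1,4.00", "s2,1.25", "s1,6.00"]
for out in simulate_job(sales):
    print(out)
```

Keeping the mapper and reducer as plain line-in/line-out functions is what makes the same logic portable to any language Streaming can launch.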
Let's Start and Define Big Data
How Hadoop Fits in This Scenario
Hadoop Landscape
Hadoop Core Components
Applying Hadoop to Save $$
Concept of Data Lake
Hadoop in Cloud
Big Data Analytics with Hadoop