Is Hadoop a necessity for Data Science


Page 1: Is Hadoop a necessity for Data Science

www.edureka.co/r-for-analytics www.edureka.co/big-data-and-hadoop

Is Hadoop a necessity for Data Science?

Page 2

AGENDA

Today we will take you through the following:

What is Big Data & Hadoop?

What is a Data Product?

What is Data Science?

Why Hadoop for Data Science?

Is Hadoop a necessity for Data Science?

Page 3

What is Big Data & Hadoop?

Page 4

BIG DATA

Big data is a popular term used to describe the exponential growth of data.

Big data can be structured, unstructured, or a combination of both.

Page 5

BIG DATA

The 3 V’s (Volume, Variety and Velocity) are the three defining properties, or dimensions, of Big Data.

Page 6

HADOOP

Hadoop is an open-source programming framework that supports the processing of large data sets in a distributed computing environment.

Hadoop was the first widely adopted open-source tool for handling Big Data, and it remains one of the most widely used.
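The MapReduce model behind this processing can be illustrated with word counting, the classic first example. The sketch below is a single-process simulation in plain Python of the map, shuffle and reduce phases, assuming whitespace-separated words; the function names are hypothetical and this is not Hadoop's own Java API. On a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as Hadoop does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all counts for one word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


docs = ["big data needs Hadoop", "Hadoop processes big data"]
counts = word_count(docs)
# -> {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across machines; only the shuffle needs to move data between them.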

Page 7

A BRIEF HISTORY OF HADOOP

Page 8

HADOOP: HDFS & MAPREDUCE

Designed for large-scale storage and processing

HDFS: a distributed, self-healing file system for storing data

MapReduce: a distributed computation framework that handles the complexities of distributed programming
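HDFS's storage model can be sketched as follows: a file is split into fixed-size blocks, each block is copied to several nodes, and when a node fails the under-replicated blocks are copied again from surviving replicas (the "self-healing" property). This is a toy simulation, assuming a 4-byte block size and simple round-robin placement for readability. The replication factor of 3 matches HDFS's default, but real HDFS uses 128 MB blocks by default and a rack-aware placement policy, and all function names here are hypothetical.

```python
import itertools

BLOCK_SIZE = 4    # toy block size in bytes; real HDFS blocks default to 128 MB
REPLICATION = 3   # matches HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    # A file is stored as fixed-size blocks; the last block may be shorter
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, set[str]]:
    # Toy round-robin placement: each block goes to `replication` consecutive nodes.
    # Real HDFS uses a rack-aware placement policy instead.
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    ring = itertools.cycle(nodes)
    return {block_id: {next(ring) for _ in range(replication)}
            for block_id in range(num_blocks)}


def heal(placement: dict[int, set[str]], live_nodes: set[str],
         replication: int = REPLICATION) -> dict[int, set[str]]:
    # Drop replicas that lived on failed nodes, then copy each under-replicated
    # block from a surviving replica onto other live nodes
    for block_id, holders in placement.items():
        holders &= live_nodes
        if not holders:
            raise RuntimeError(f"block {block_id} lost: no surviving replica")
        for node in sorted(live_nodes - holders):
            if len(holders) >= replication:
                break
            holders.add(node)
    return placement


nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"hello hadoop!")             # 13 bytes -> 4 blocks
placement = place_replicas(len(blocks), nodes)
heal(placement, live_nodes={"node1", "node2", "node4"})  # node3 has failed
```

The failure is handled entirely in software: no block is lost as long as at least one replica survives, which is why cheap "commodity" machines are acceptable.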

Page 9

KEY TO HADOOP’S POWER

Computation co-located with data

Data and computation system co-designed and co-developed to work together

Process data in parallel across thousands of “commodity” hardware nodes

Self-healing: failures handled by software

Designed for write-once, read-many access

No random writes

Optimized to minimize seeks on hard drives
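Co-locating computation with data means the scheduler tries to run each task on a node that already stores the task's input block, so the block does not have to travel over the network. The greedy sketch below assumes a hypothetical map of block replicas and free task slots per node, and falls back to any free node when no replica holder is available; actual Hadoop schedulers are more elaborate (for example, they also weigh rack locality).

```python
def schedule(block_locations: dict[int, set[str]],
             free_slots: dict[str, int]) -> dict[int, tuple[str, bool]]:
    """Assign one task per block; the bool records whether the task is data-local."""
    assignment: dict[int, tuple[str, bool]] = {}
    for block_id, holders in sorted(block_locations.items()):
        # Prefer a node that already stores a replica of this block
        local = [n for n in sorted(holders) if free_slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with a free slot; the block must be read remotely
            remote = [n for n in sorted(free_slots) if free_slots[n] > 0]
            if not remote:
                raise RuntimeError("no free task slots left")
            node, is_local = remote[0], False
        free_slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


# Hypothetical cluster: n3 is busy and n4 holds none of the blocks
locations = {0: {"n1", "n2"}, 1: {"n2", "n3"}, 2: {"n1", "n3"}}
slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
plan = schedule(locations, slots)
# -> {0: ('n1', True), 1: ('n2', True), 2: ('n4', False)}
```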

Page 10

What is a Data Product?

“A software system whose core functionality depends on the application of statistical analysis and machine learning to data.”

Page 11

Example #1: People you may know
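A feature like “People you may know” can be approximated with a friend-of-friend heuristic: suggest people who are not yet connected to the user, ranked by the number of mutual friends. The sketch below assumes a small in-memory, undirected graph stored as adjacency sets; it illustrates the idea and is not the algorithm any particular social network uses. At real scale, this counting is the kind of large join that Hadoop distributes across a cluster.

```python
from collections import Counter


def people_you_may_know(graph: dict[str, set[str]], user: str,
                        top_n: int = 3) -> list[tuple[str, int]]:
    friends = graph.get(user, set())
    mutual = Counter()
    # Every friend-of-a-friend who is not the user and not already a friend
    # gets one vote per mutual friend
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in friends:
                mutual[candidate] += 1
    # Rank by mutual-friend count, breaking ties alphabetically
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical undirected graph, stored symmetrically
GRAPH = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
suggestions = people_you_may_know(GRAPH, "alice")
# -> [('dave', 2), ('erin', 1)]
```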

Page 12

Example #2: Spell Correction
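Spell correction is a data product in the sense above: its quality comes from word statistics gathered from a large corpus. The sketch below follows the well-known approach popularized by Peter Norvig: generate every string one edit away (deletion, transposition, replacement, insertion) and pick the most frequent known word. The tiny word-count corpus is a made-up assumption; in practice the counts would come from a large text collection, which is where a Hadoop job would be used.

```python
import string
from collections import Counter


def edits1(word: str) -> set[str]:
    # All strings one deletion, transposition, replacement or insertion away
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter) -> str:
    # Keep the word if it is known; otherwise choose the most frequent known
    # candidate one edit away (ties broken alphabetically); else leave it as-is
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return min(candidates, key=lambda w: (-counts[w], w))


# Hypothetical corpus statistics
COUNTS = Counter("the data the hadoop data science the".split())
print(correct("hadop", COUNTS), correct("teh", COUNTS))  # prints "hadoop the"
```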

Page 13

What is Data Science?

Page 14

DATA SCIENCE

#1: Extracting deep meaning from data (data mining; finding “gems” in data)

Page 15

Common Data Science tasks

Page 16

DATA SCIENCE

#2: Building data products (delivering gems on a regular basis)

Page 17

Why HADOOP for DATA SCIENCE?

Reason #1:

Explore full datasets

Page 18

#1: Exploration of Data sets
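Exploring a full data set rather than a sample usually means computing summaries that can be built per partition in parallel and then merged. The sketch below assumes numeric values split into hypothetical partitions; each partition is summarized with Welford's online algorithm, and partial summaries are combined with the parallel update formula of Chan et al., so the result matches a single pass over all the data (up to floating-point rounding).

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass
class Summary:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf


def summarize(values: Iterable[float]) -> Summary:
    # Per-partition pass (the "map" side), using Welford's online update
    s = Summary()
    for x in values:
        s.count += 1
        delta = x - s.mean
        s.mean += delta / s.count
        s.m2 += delta * (x - s.mean)
        s.minimum = min(s.minimum, x)
        s.maximum = max(s.maximum, x)
    return s


def merge(a: Summary, b: Summary) -> Summary:
    # Combine two partial summaries (the "reduce" side), Chan et al.'s formula
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Summary(n, mean, m2, min(a.minimum, b.minimum), max(a.maximum, b.maximum))


partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
total = reduce(merge, (summarize(p) for p in partitions))
sample_variance = total.m2 / (total.count - 1)
```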

Page 19

Why HADOOP for DATA SCIENCE?

Reason #2:

Mining of larger datasets

Page 20

#2: Mining of larger data sets

More data often leads to better outcomes

Page 21

Why HADOOP for DATA SCIENCE?

Reason #3: Large-scale data preparation

Page 22

#3: Large-Scale Data preparation

Data preparation is often estimated to take up to 80% of data science work
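Much of that preparation is record-level cleaning that parallelizes naturally: parse each raw line, drop malformed rows, normalize fields and remove duplicates. The sketch assumes a hypothetical three-column CSV layout (`user_id,country,amount`) and runs in a single process; its in-memory `seen` set only deduplicates within one process, so on a cluster deduplication would need a separate grouping (reduce) step.

```python
import csv
from typing import Iterable, Iterator


def clean_records(lines: Iterable[str]) -> Iterator[dict]:
    # Hypothetical schema: user_id,country,amount
    seen = set()
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # drop rows with the wrong number of fields
        user_id, country, amount = (field.strip() for field in row)
        try:
            value = float(amount)
        except ValueError:
            continue  # drop rows whose amount is not numeric
        country = country.upper()  # normalize country codes
        key = (user_id, country, value)
        if not user_id or key in seen:
            continue  # drop rows with a missing ID and exact duplicates
        seen.add(key)
        yield {"user_id": user_id, "country": country, "amount": value}


RAW = ["u1, us ,10.5", "u2,DE,abc", "u1,US,10.5", "bad line", ",FR,3", "u3,fr,7"]
cleaned = list(clean_records(RAW))
```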

Page 23

Why HADOOP for DATA SCIENCE?

Reason #4: Accelerate data-driven innovation

Page 24

Speed Barriers of traditional Data Architectures

Page 25

“Schema on read” means faster time-to-innovation
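With “schema on read”, raw data is stored exactly as it arrives and each analysis applies its own schema when it reads the data, instead of forcing every record into one table layout before loading (“schema on write”). The sketch below simulates this in Python with hypothetical JSON event lines: two readers project and type-cast different fields from the same raw records. On Hadoop, tools such as Apache Hive apply table definitions to files in HDFS at query time in a similar way.

```python
import json
from typing import Any, Callable, Iterable, Iterator

# Raw events are stored exactly as they arrive; no schema is enforced at write time
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ts": 1}',
    '{"user": "bob", "action": "buy", "amount": "19.99", "ts": 2}',
    '{"user": "carol", "ts": 3, "device": "mobile"}',
]


def read_with_schema(raw: Iterable[str],
                     schema: dict[str, Callable[[Any], Any]]) -> Iterator[dict]:
    # Apply a schema only when reading: keep the requested fields and cast
    # their types; fields missing from a record become None
    for line in raw:
        record = json.loads(line)
        yield {field: (cast(record[field]) if field in record else None)
               for field, cast in schema.items()}


# Two analyses read the same raw data with different schemas
purchases = [r for r in read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
             if r["amount"] is not None]
devices = list(read_with_schema(RAW_EVENTS, {"user": str, "device": str}))
```

Adding a new field to incoming events requires no reload or migration; only the readers that care about it change their schema, which is the source of the faster time-to-innovation claimed above.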

Page 26

Demo

Page 27

Questions

Page 28

SURVEY

Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!

Please spare a few minutes to take the survey after the webinar.
