Is Hadoop a necessity for Data Science
Uploaded by: edureka
Category: Technology
Transcript of Is Hadoop a necessity for Data Science
![Page 1: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/1.jpg)
www.edureka.co/r-for-analytics
www.edureka.co/big-data-and-hadoop
Is Hadoop a necessity for Data Science?
![Page 2: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/2.jpg)
AGENDA
Today we will take you through the following:
What is Big Data & Hadoop?
What is a Data Product?
What is Data Science?
Why Hadoop for Data Science?
Is Hadoop a necessity for Data Science?
![Page 3: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/3.jpg)
What is Big Data & Hadoop?
![Page 4: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/4.jpg)
BIG DATA
Big data is a popular term for the exponential growth of data and for the very large data sets that result.
Big Data can be structured, unstructured, or a combination of both.
![Page 5: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/5.jpg)
BIG DATA
The 3 V’s (Volume, Variety and Velocity) are the three defining properties, or dimensions, of Big Data.
![Page 6: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/6.jpg)
HADOOP
Hadoop is an open-source software framework that supports the storage and processing of large data sets in a distributed computing environment.
Hadoop was one of the first widely adopted tools for handling Big Data and remains widely used.
![Page 7: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/7.jpg)
A BRIEF HISTORY OF HADOOP
![Page 8: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/8.jpg)
HADOOP: HDFS & MAP-REDUCE
Most efficient for large-scale storage & processing
HDFS: Distributed file system; self-healing data store
MAP-REDUCE: Distributed computation framework that handles the complexities of distributed programming
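To make the Map-Reduce idea on this slide concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the framework would run the map and reduce functions on many nodes and do the shuffle over the network.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data big ideas", "data science"]))
```

The programmer only writes the map and reduce functions; grouping, distribution and retries are the parts the framework is meant to hide.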
![Page 9: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/9.jpg)
KEY TO HADOOP’S POWER
Computation co-located with data
Data and computation system co-designed and co-developed to work together
Process data in parallel across thousands of “commodity” hardware nodes
Self-healing; failure handled by software
Designed for one write and multiple reads
There are no random writes
Optimized for minimum seek on hard drives
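The "self-healing; failure handled by software" point can be illustrated with a toy simulation of block replication. This is a sketch under stated assumptions, not HDFS's actual placement or recovery algorithm: the helpers `place_blocks` and `heal` are hypothetical, placement is simple round-robin, and the replication factor of 3 mirrors the usual HDFS default. It also assumes there are at least as many nodes as replicas.

```python
def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement

def heal(placement, live_nodes, replication=3):
    """Drop replicas on dead nodes and copy blocks onto spare live nodes."""
    live = sorted(live_nodes)
    healed = {}
    for block, holders in placement.items():
        survivors = holders & set(live)
        if not survivors:
            raise RuntimeError(f"block {block} lost: no surviving replica")
        # Live nodes that do not yet hold this block can take a new copy.
        spares = [node for node in live if node not in survivors]
        needed = max(0, replication - len(survivors))
        healed[block] = survivors | set(spares[:needed])
    return healed

if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(["b1", "b2"], nodes)
    # Simulate the loss of node n2 and let the software restore the replica count.
    print(heal(placement, {"n1", "n3", "n4"}))
```

Because every block already lives on several nodes, losing one machine costs no data; the software only has to notice the failure and restore the replica count.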
![Page 10: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/10.jpg)
What is a Data product?
“A software system whose core functionality depends on the application of statistical analysis and machine learning to data.”
![Page 11: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/11.jpg)
Example #1: People you may know
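One simple way such a feature could work is to rank friends-of-friends by the number of mutual friends. The sketch below assumes that approach; it is not taken from the slides or from any real product, and the function name `people_you_may_know` and the sample names are hypothetical.

```python
from collections import Counter

def people_you_may_know(friends, user, top_n=3):
    """Suggest non-friends of `user`, ranked by number of mutual friends."""
    direct = friends.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user and anyone who is already a direct friend.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

if __name__ == "__main__":
    friends = {
        "ana": {"bob", "cy"},
        "bob": {"ana", "dee", "eli"},
        "cy": {"ana", "dee"},
        "dee": {"bob", "cy"},
        "eli": {"bob"},
    }
    print(people_you_may_know(friends, "ana"))
```

On a social graph with many millions of users, the same counting is typically expressed as a distributed job, which is where a platform like Hadoop comes in.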
![Page 12: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/12.jpg)
Example #2: Spell Correction
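A spell corrector is a data product because its suggestions come from word statistics rather than hand-written rules. The sketch below assumes a well-known simple approach: generate every string one edit away from the input and pick the one seen most often in a reference text. The slides do not say which method they have in mind, and the names `edits1` and `correct` and the tiny sample corpus are hypothetical.

```python
from collections import Counter
import string

def edits1(word):
    """Return all strings one delete, swap, replace or insert away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word, counts):
    """Return the most frequent known word within one edit, else the input."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    return max(candidates, key=lambda w: (counts[w], w))

if __name__ == "__main__":
    corpus = "hadoop stores data and hadoop processes data in parallel"
    counts = Counter(corpus.split())
    print(correct("hadop", counts), correct("dta", counts))
```

The quality of the corrections depends almost entirely on the size and relevance of the word counts, which is the link to large-scale data processing.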
![Page 13: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/13.jpg)
What is Data Science?
![Page 14: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/14.jpg)
DATA SCIENCE
#1: Extracting deep meaning from data (data mining; finding “gems” in data)
![Page 15: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/15.jpg)
Common Data Science tasks
![Page 16: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/16.jpg)
DATA SCIENCE
#2: Building Data Products (delivering “gems” on a regular basis)
![Page 17: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/17.jpg)
Why HADOOP for DATA SCIENCE?
Reason #1:
Explore full datasets
![Page 18: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/18.jpg)
#1: Exploration of Data sets
![Page 19: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/19.jpg)
Why HADOOP for DATA SCIENCE?
Reason #2:
Mining of larger datasets
![Page 20: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/20.jpg)
#2: Mining of larger data sets
More Data ---> Better Outcomes
![Page 21: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/21.jpg)
Why HADOOP for DATA SCIENCE?
Reason #3: Large-scale data preparation
![Page 22: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/22.jpg)
#3: Large-Scale Data preparation
A commonly quoted estimate is that about 80% of data science work is data preparation
![Page 23: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/23.jpg)
Why HADOOP for DATA SCIENCE?
Reason #4: Accelerate data-driven innovation
![Page 24: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/24.jpg)
Speed Barriers of traditional Data Architectures
![Page 25: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/25.jpg)
“Schema on read” means faster time-to-innovation
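A small sketch of what "schema on read" can look like: raw records are stored exactly as they arrive, and each analysis applies its own schema only when it reads them. The sample events, the field names and the helper `read_with_schema` are hypothetical and only illustrate the idea; this is not a Hadoop or Hive API.

```python
import json

# Raw events are stored untouched; no table definition was needed up front.
RAW_EVENTS = [
    '{"user": "u1", "action": "click", "ms": "120"}',
    '{"user": "u2", "action": "view"}',
    '{"user": "u1", "action": "click", "ms": "80", "referrer": "home"}',
]

def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) while reading raw lines.

    Fields missing from a record are returned as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }

if __name__ == "__main__":
    # Two analyses read the same raw data through two different schemas.
    latency_schema = {"user": str, "ms": int}
    referrer_schema = {"user": str, "referrer": str}
    print(list(read_with_schema(RAW_EVENTS, latency_schema)))
    print(list(read_with_schema(RAW_EVENTS, referrer_schema)))
```

A new question only needs a new schema at read time, not a reload of the data, which is the sense in which time-to-innovation gets shorter.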
![Page 26: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/26.jpg)
Demo
![Page 27: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/27.jpg)
Questions
![Page 28: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/28.jpg)
SURVEY
Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.
![Page 29: Is Hadoop a necessity for Data Science](https://reader035.fdocuments.in/reader035/viewer/2022062503/58a39ddd1a28abb1348b6553/html5/thumbnails/29.jpg)