Is Hadoop a Necessity for Data Science
Uploaded by: edureka
Category: Technology
Transcript of Is Hadoop a Necessity for Data Science
![Page 1: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/1.jpg)
Is Hadoop a necessity for Data Science?
![Page 2: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/2.jpg)
What will you learn today?
Let us have a quick poll: do you know the following topics?
What is Big Data & Hadoop?
What is a Data Product?
What is Data Science?
Why Hadoop for Data Science?
Is Hadoop a necessity for Data Science?
![Page 3: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/3.jpg)
What is Big Data & Hadoop?
![Page 4: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/4.jpg)
Big data is a popular term used to describe the exponential growth of data.
Big Data can be structured data, unstructured data, or a combination of both.
BIG DATA
![Page 5: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/5.jpg)
BIG DATA
The 3 V’s (Volume, Variety and Velocity) are the three defining properties or dimensions of Big Data.
![Page 6: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/6.jpg)
HADOOP
Hadoop is a programming framework that supports the processing of large
data sets in a distributed computing environment.
Hadoop was one of the first widely adopted tools for handling Big Data and remains in wide use.
![Page 7: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/7.jpg)
A BRIEF HISTORY OF HADOOP
![Page 8: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/8.jpg)
HADOOP: HDFS & MAP-REDUCE
Most efficient for Large-Scale Storage & Processing
HDFS: Distributed file system & a Self-Healing Data store
MAP-REDUCE: Distributed computation framework that handles the complexities of distributed programming
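To make the MAP-REDUCE idea concrete, here is a minimal single-machine sketch of the classic word-count job in plain Python. It is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across nodes and perform the grouping step itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done by the framework between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

The programmer writes only the map and reduce functions; distribution, grouping and failure handling are the parts a framework such as MapReduce takes over.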
![Page 9: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/9.jpg)
KEY TO HADOOP’S POWER
Computation co-located with data
Data and computation system co-designed and co-developed to work together
Process data in parallel across thousands of “commodity” hardware nodes
Self-healing; failure handled by software
Designed for one write and multiple reads
There are no random writes
Optimized for minimum seek on hard drives
![Page 10: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/10.jpg)
What is a Data Product?
![Page 11: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/11.jpg)
Data product?
“A software system whose core functionality depends on the application of statistical
analysis and machine learning to data.”
![Page 12: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/12.jpg)
Example #1: People you may know
![Page 13: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/13.jpg)
Example #2: Spell Correction
![Page 14: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/14.jpg)
What is Data Science?
![Page 15: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/15.jpg)
DATA SCIENCE
#1: Extracting deep meaning from data
(data mining; finding “gems” in data)
![Page 16: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/16.jpg)
Common Data Science tasks
![Page 17: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/17.jpg)
DATA SCIENCE
#2: Building Data Products (delivering “gems” on a regular basis)
![Page 18: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/18.jpg)
Why Hadoop for Data Science?
![Page 19: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/19.jpg)
Reason #1: Explore the entire Dataset
![Page 20: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/20.jpg)
Reason #2: Mining of larger Datasets
More Data ---> Better Outcomes
![Page 21: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/21.jpg)
Reason #3: Large-scale Data-Preparation
Roughly 80% of data science work is data preparation (a commonly cited estimate)
![Page 22: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/22.jpg)
Reason #4: Accelerate Data-driven Innovation
Speed Barriers of traditional Data Architectures
![Page 23: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/23.jpg)
Reason #4: Accelerate Data-driven Innovation
“Schema on read” means faster time-to-innovation
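A rough sketch of what “schema on read” means in practice, in plain Python rather than any Hadoop tool: the raw records are stored untouched, and each analysis applies its own schema only when it reads them. The sample records, the `Schema` alias and the `read_with_schema` helper are all hypothetical and exist only for this illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data is stored as-is; no schema was enforced when it was written.
RAW_EVENTS = [
    '{"user": "alice", "clicks": "3", "country": "IN"}',
    '{"user": "bob", "clicks": "5"}',
]

# A schema here is just a mapping from field name to a conversion function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(raw_lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Parse each raw line and project it onto the schema at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# Two different questions, two different schemas, same untouched raw data.
clicks_view = list(read_with_schema(RAW_EVENTS, {"user": str, "clicks": int}))
geo_view = list(read_with_schema(RAW_EVENTS, {"user": str, "country": str}))

if __name__ == "__main__":
    print(clicks_view)
    print(geo_view)
```

Because nothing has to be remodelled or reloaded before a new question can be asked, a new schema is only a new read, which is the sense in which the slide links schema on read to faster time-to-innovation.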
![Page 24: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/24.jpg)
Demo
![Page 25: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/25.jpg)
Survey
Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!
Please spare a few minutes to take the survey after the webinar.
![Page 26: Is Hadoop a Necessity for Data Science](https://reader034.fdocuments.in/reader034/viewer/2022042906/58a176651a28ab04278b5c49/html5/thumbnails/26.jpg)
Thank You
Questions/Queries/Feedback
The recording and presentation will be made available to you within 24 hours.