Data Mining with Big Data
![Page 1: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/1.jpg)
Data Mining with Big Data
Presented By:
Sandip B. Tipayle Patil
Under the Guidance of
Prof. Y.N.Patil
DEPARTMENT OF COMPUTER ENGINEERING
DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY
Lonere.
![Page 2: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/2.jpg)
Outline
Introduction
What is Big Data?
How Much Data Really Exists?
Literature Review
4Vs of Big Data
Proposed System
System Architecture
Big Data mining Framework
Hadoop Framework
Big Data Challenges and Solutions
Conclusion
![Page 3: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/3.jpg)
Introduction
![Page 4: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/4.jpg)
Interesting Facts
The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years)
Every day, 2,500 quadrillion (2.5 quintillion) bytes of data are produced, and more than 90 percent of all data has been produced within the past two years
An average person today processes more data in a day than a 16th-century individual did in an entire lifetime
In recent years, the cost of storage and processing power has dropped significantly
Bad data or poor data quality costs US businesses $600 billion annually
Facebook processes 10 TB of data every day; Twitter processes 7 TB
In 2012, Google had over 3 million servers processing over 2 trillion searches per year (only 22 million in 2000)
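The doubling times quoted in the first fact above (1.5 years, now 1.2 years) can be converted into a yearly growth factor. The sketch below assumes steady exponential growth, which the slide does not state; the function name is illustrative only.

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Factor by which a quantity grows in one year if it doubles every `doubling_time_years`."""
    return 2 ** (1 / doubling_time_years)


for doubling_time in (1.5, 1.2):
    factor = annual_growth_factor(doubling_time)
    print(f"doubling every {doubling_time} years -> x{factor:.2f} per year")
```

Under that assumption, shortening the doubling time from 1.5 to 1.2 years moves yearly growth from roughly 59% to roughly 78%.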
![Page 5: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/5.jpg)
What is Big Data?
![Page 6: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/6.jpg)
“Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.”
-- Forrester
![Page 7: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/7.jpg)
Boring!
![Page 8: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/8.jpg)
“Big data is the data characterized by 3 attributes: volume, variety and velocity.”
-- IBM
![Page 9: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/9.jpg)
Random words!
![Page 10: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/10.jpg)
Big Data is not about the size of the data; it’s about the value within the data.
![Page 11: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/11.jpg)
What is …… ?
Data Mining
‣ The computational process of discovering patterns in large data sets
Big Data
The term Big Data describes a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.
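To make the data mining definition above concrete, here is a minimal sketch of one kind of pattern discovery: finding pairs of items that frequently occur together. The toy transactions and the `min_support` threshold are hypothetical, not taken from the slides, and real data sets would need a scalable algorithm rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each unordered pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical toy data for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
print(frequent_pairs(baskets, min_support=3))
```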
![Page 12: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/12.jpg)
‘Big Data’ is similar to ‘small data’, but bigger
…but bigger data requires different approaches: techniques, tools, and architecture
…with an aim to solve new problems …or old problems in a better way
![Page 13: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/13.jpg)
How much data exists?
2.5 quintillion bytes of data are created EVERY DAY
IBM: 90 percent of the data in the world today was produced within the past two years
Forms of data?
Examples: Boeing jets, scientific data, sensor data, Internet data
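The daily figure above is easier to grasp in other units. The arithmetic below is a sketch that uses decimal (SI) prefixes and treats the 2.5 quintillion bytes as spread evenly over the day; both are assumptions made only for illustration.

```python
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes, the figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

exabytes_per_day = BYTES_PER_DAY / 1e18                        # 1 EB = 10**18 bytes
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12  # 1 TB = 10**12 bytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

That works out to 2.5 exabytes per day, or about 28.9 terabytes every second.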
![Page 14: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/14.jpg)
![Page 15: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/15.jpg)
Literature Review
Data has grown tremendously.
This volume of data is beyond the ability of conventional software tools to manage.
Exploring the large volume of data and extracting useful information and knowledge is a challenge, and sometimes, it is almost infeasible.
Most people don’t know what to do with all the data that they already have
![Page 16: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/16.jpg)
Giant Elephant
![Page 17: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/17.jpg)
Huge Data with heterogeneous and diverse dimensionality
‣ represents a huge volume of data
Autonomous sources with distributed and decentralized control
‣ a main characteristic of Big Data
Complex and evolving relationships
![Page 18: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/18.jpg)
4 Vs of Big Data
Volume • Data quantity
Velocity • Data speed
Variety • Data types
Veracity • Authenticity
![Page 19: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/19.jpg)
Proposed System:
Identifies relationships between different ideas
Capable of handling huge volumes of data
Uses distributed parallel computing with the help of Hadoop
Provides a platform for processing data in different dimensions and summarizing results.
The system architecture must be flexible enough that the components built on top of it, which express the various kinds of processing tasks, can tune it to run these different workloads efficiently.
The system will process this data within reasonable cost and time limits.
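The distributed parallel computing mentioned above follows the MapReduce programming model that Hadoop popularized. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine; it does not use the Hadoop API, and the function names and sample splits are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    mapped = []
    for split in splits:  # on a real cluster, each split would be mapped on a different node
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input splits for illustration only.
splits = ["big data mining", "mining big data with Hadoop", "data data data"]
print(word_count(splits))
```

Because each split is mapped independently, the map step can be spread across many machines; only the much smaller grouped results need to travel over the network.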
![Page 20: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/20.jpg)
Gap due to Lack of analysis
![Page 21: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/21.jpg)
System Architecture:
![Page 22: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/22.jpg)
Hadoop Framework:
![Page 23: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/23.jpg)
Big Data Mining Framework
Big Data Mining Platform
Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
Big Data Mining Algorithms
I. Local Learning and Model Fusion for Multiple Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
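The idea behind "Local Learning and Model Fusion" can be sketched with a deliberately simple stand-in for a model: each source summarizes its own data, and a coordinator fuses the summaries without ever seeing the raw records. The class, function names, and sample values below are hypothetical; a real system would fuse richer models than a mean and standard deviation.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalSummary:
    count: int
    total: float
    total_sq: float


def learn_locally(values):
    """Runs at one data source: summarize local data without exporting raw records."""
    return LocalSummary(len(values), sum(values), sum(v * v for v in values))


def fuse(summaries):
    """Runs at the coordinator: merge local summaries into a global mean and std dev."""
    n = sum(s.count for s in summaries)
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean * mean  # population variance
    return mean, math.sqrt(max(variance, 0.0))


# Three hypothetical sources, each holding its own values.
sites = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
mean, std = fuse([learn_locally(site) for site in sites])
print(mean, std)
```

The fused result matches what a single pass over all eight values would give, yet each source shares only three numbers.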
![Page 24: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/24.jpg)
Big Data mining Framework
![Page 25: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/25.jpg)
Challenges
Location of Big Data sources: Big Data is commonly stored in different locations
Volume of Big Data: the size of Big Data grows continuously
Hardware resources: RAM capacity
Privacy: medical reports, bank transactions
Having domain knowledge
Getting meaningful information
![Page 26: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/26.jpg)
Solutions
Parallel computing programming
An efficient computing platform will not have centralized data storage; instead, the platform will be distributed across large-scale storage.
Restricting access to the data
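One simple way to restrict access to sensitive data such as medical reports is to filter each record by the requesting role before it leaves the store. This is a minimal sketch, not a complete security design; the roles, field names, and sample record are hypothetical.

```python
# Hypothetical roles and field names, chosen only for illustration.
ALLOWED_FIELDS = {
    "analyst": {"age_group", "diagnosis_code"},
    "physician": {"patient_id", "age_group", "diagnosis_code", "report"},
}


def restricted_view(record, role):
    """Return only the fields that `role` may see; unknown roles see nothing."""
    allowed = ALLOWED_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


record = {
    "patient_id": "P-001",
    "age_group": "40-49",
    "diagnosis_code": "D12",
    "report": "free-text medical report",
}
print(restricted_view(record, "analyst"))
print(restricted_view(record, "guest"))
```

Defaulting unknown roles to an empty set means access must be granted explicitly rather than removed after the fact.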
![Page 27: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/27.jpg)
Advantages:
Fast response
Extract useful information
Prediction of required data from a large amount of data.
Better results presented in the form of visualizations.
![Page 28: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/28.jpg)
Conclusion
We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and for improving the profitability and success of many enterprises by using technologies such as Hadoop, Pig, and so on.
The proposed system will be fully serviceable across a large variety of application domains; the challenges it targets are common to those domains and are therefore not cost-effective to address in the context of one domain alone.
Furthermore, this system will provide fully transformative solutions and will naturally serve the next generation of industrial applications. We must support and encourage this proposed framework in addressing the technical challenges of unstructured data if we are to achieve the promised benefits of Big Data.
![Page 29: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/29.jpg)
![Page 30: Data mining with bigdata](https://reader035.fdocuments.in/reader035/viewer/2022062516/55cf9059550346703ba5188b/html5/thumbnails/30.jpg)