Data mining process

Posted on 26-Jan-2015


description

Data Mining process explained in brief and simple terminology.

Transcript of Data mining process

Data Mining Process

By Prithvi Raj, Vineesha, Varun, Swaroop, Saranya

Content Layout

• Introduction: What is Data Mining?

– Invention: Why Data Mining?

– Evolution of database technologies

• Applications

• Steps Involved

• Data Mining Techniques

• Data Mining Tools

Introduction

From a commercial point of view, a lot of data is being collected and stored:

• Web data, e-commerce

• Grocery store bills

• Bank/credit card transactions

• Railway bookings, etc.

These seemingly minor data entries can be very important at times (for example, in crime investigations or product returns).

Definition

Many definitions:

Extraction of implicit, previously unknown and potentially useful information from data

Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is data mining?

Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.

Alternative names: Knowledge Discovery in Databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

What is not data mining?

What is NOT data mining?

• Looking up phone numbers in a phone directory

• Querying a web search engine for information about “Amazon”

What is data mining?

• Discovering that certain names are more prevalent in certain areas (Srinu, Venkat, Harsha)

• Grouping together similar documents returned by a search engine according to their context (Amazon forest, Amazon.com)
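The contrast between looking up a phone number and finding that certain names are more prevalent in certain areas can be sketched in a few lines of Python. This is only an illustration: the records, area labels, phone numbers and function names are made up for the example and are not from the slides.

```python
from collections import Counter, defaultdict

# Made-up records for illustration only: (name, area, phone number).
records = [
    ("Srinu", "Area A", "555-0101"),
    ("Venkat", "Area A", "555-0102"),
    ("Srinu", "Area A", "555-0103"),
    ("Harsha", "Area B", "555-0104"),
    ("Harsha", "Area B", "555-0105"),
    ("Venkat", "Area B", "555-0106"),
    ("Harsha", "Area B", "555-0107"),
]


def lookup_numbers(name):
    """Not data mining: retrieve facts that are stored explicitly."""
    return [phone for n, _, phone in records if n == name]


def most_common_name_per_area():
    """Data mining in miniature: derive a pattern no single record states."""
    counts = defaultdict(Counter)
    for name, area, _ in records:
        counts[area][name] += 1
    # For each area, keep the name that occurs most often.
    return {area: c.most_common(1)[0][0] for area, c in counts.items()}


print(lookup_numbers("Venkat"))
print(most_common_name_per_area())
```

The lookup only returns what was already stored, while the second function summarises all the records into something new: which name dominates in each area.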

“History of data mining”

Necessity is the Mother of Invention

Data explosion problem

• Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.

“Solution”

Data Mining

– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution of Database Technology

• 1960s:

– Data collection, database creation, IMS and network DBMS

• 1970s:

– Relational data model, relational DBMS implementation

• 1980s:

– RDBMS, advanced data models and application-oriented DBMS

• 1990s – present:

– Data mining and data warehousing, multimedia databases and web databases

Origins of DATA MINING

• Draws ideas from machine learning/AI, pattern recognition, statistics and database systems

• Traditional techniques may be unsuitable due to:

– Enormity of data

– High dimensionality of data

– Heterogeneous, distributed nature of data

Applications of data mining:

Industry: Application

Finance: Card transaction analysis

Insurance: Claims, fraud analysis

Telecommunication: Call recordings

Transport: Logistics management

Consumer goods: Promotion analysis

Scientific research: Image, video, speech

Utilities: Power usage analysis

Business intelligence (BI) refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.

Diagram showing the use of data mining in decision making and business intelligence

BUSINESS INTELLIGENCE

Steps involved in Data Mining

• Data integration

• Data selection

• Data cleaning

• Data transformation

• Data mining

• Pattern evaluation

• Knowledge presentation
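As a rough sketch of how these steps might chain together, the following Python example runs a toy data set through each stage in the order listed. All data, field names, function names and the threshold are assumptions made up for illustration. The "mining" step here is just a revenue total per item, standing in for a real mining algorithm.

```python
from collections import Counter

# Two hypothetical data sources (made-up sales records).
store_sales = [
    {"customer": "c1", "item": "Bread ", "amount": "40"},
    {"customer": "c2", "item": "milk", "amount": "25"},
    {"customer": "c3", "item": "bread", "amount": None},  # missing amount
]
web_sales = [
    {"customer": "c1", "item": "MILK", "amount": "30"},
    {"customer": "c4", "item": "bread", "amount": "45"},
]


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [row for source in sources for row in source]


def select(rows, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: row[f] for f in fields} for row in rows]


def clean(rows):
    """Data cleaning: drop rows that contain missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def transform(rows):
    """Data transformation: normalise item names, convert amounts to numbers."""
    return [
        {"item": r["item"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
    ]


def mine(rows):
    """Data mining (placeholder): total revenue per item."""
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["amount"]
    return totals


def evaluate(totals, threshold):
    """Pattern evaluation: keep only results at or above a threshold."""
    return {item: t for item, t in totals.items() if t >= threshold}


def present(patterns):
    """Knowledge presentation: format the results for a reader."""
    return [f"{item}: {total:.2f}" for item, total in sorted(patterns.items())]


rows = integrate(store_sales, web_sales)
rows = select(rows, ["item", "amount"])
rows = clean(rows)
rows = transform(rows)
totals = mine(rows)
report = present(evaluate(totals, threshold=60))
print(report)
```

Each function does one job, so any stage can be swapped for a more realistic one without touching the others.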

Data Mining Techniques

• Classification and prediction

– Focused hiring

• Cluster analysis

– Market segmentation

• Outlier analysis

– Fraud detection

• Association analysis

– Market basket analysis

• Evolution analysis
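To make one of these techniques concrete, here is a minimal sketch of association analysis applied to market baskets. It computes the two usual measures for a rule "A → B": support (the share of baskets containing both items) and confidence (the share of baskets with A that also contain B). The baskets and the function name `rule_stats` are made up for illustration. Real tools use more scalable algorithms than this direct count.

```python
# Made-up shopping baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def rule_stats(antecedent, consequent, baskets):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(baskets)
    # Baskets that contain both items.
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    # Baskets that contain the antecedent at all.
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    support = both / n
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence


for a, b in [("bread", "milk"), ("butter", "bread")]:
    support, confidence = rule_stats(a, b, baskets)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with butter also contains bread, so the rule "butter → bread" has a confidence of 1.0 even though its support is only 0.5.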

Data Mining Tools

• Microsoft SQL Server 2005

• Microsoft SQL Server 2008

• Oracle Data Mining

• DBMiner
