The Art and Technology of Data Mining

University of Toronto 06/22/22 1 Data Mining The Art and Science of Obtaining Knowledge from Data Dr. Saed Sayad

Transcript of The Art and Technology of Data Mining

Page 1: The Art and Technology of Data Mining

University of Toronto 04/11/23 1

Data Mining

The Art and Science of Obtaining Knowledge from Data

Dr. Saed Sayad

Page 2: The Art and Technology of Data Mining

Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 3: The Art and Technology of Data Mining

Explosion of Data

Data in the world doubles every 20 months!

NASA’s Earth Observing System:

46 megabytes of data per second

4,000,000,000,000 bytes a day

FBI fingerprints image library:

200,000,000,000,000 bytes

In-line image analysis for particle detection:

1 megabyte in one second
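As a quick sanity check on the figures above, a per-second rate can be converted to a daily volume. This is only a sketch: it assumes 1 megabyte = 10^6 bytes, which the slide does not specify, and the function name `bytes_per_day` is introduced here for illustration.

```python
# Convert a data rate in megabytes per second to bytes per day.
# Assumption: 1 megabyte = 10**6 bytes (the slide does not say which convention it uses).
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(megabytes_per_second):
    """Return the number of bytes produced in one day at the given rate."""
    return megabytes_per_second * 10**6 * SECONDS_PER_DAY


nasa_daily = bytes_per_day(46)      # NASA figure from the slide
particle_daily = bytes_per_day(1)   # in-line image analysis figure from the slide

print(f"{nasa_daily:,} bytes per day")      # 3,974,400,000,000 bytes per day
print(f"{particle_daily:,} bytes per day")  # 86,400,000,000 bytes per day
```

Under that assumption, 46 megabytes per second comes to roughly 4,000,000,000,000 bytes a day, which agrees with the figure quoted on the slide.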

Page 4: The Art and Technology of Data Mining

Explosion of Data (cont.)

Page 5: The Art and Technology of Data Mining

Explosion of Data (cont.)

Page 6: The Art and Technology of Data Mining

Explosion of Data (cont.)

Page 7: The Art and Technology of Data Mining

Explosion of Data (cont.)

Page 8: The Art and Technology of Data Mining

What do we need?

Fast, accurate, and scalable data analysis techniques to extract useful knowledge.

The answer is Data Mining.

Page 9: The Art and Technology of Data Mining

What is Data Mining?

“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”

Data → Data Mining → Knowledge

Page 10: The Art and Technology of Data Mining

[Diagram: Data Mining in relation to AI and Machine Learning, Statistics, Data Analysis, Database, Data Warehouse, and OLAP]

Page 11: The Art and Technology of Data Mining

Data Mining

Data Analysis: Statistics, Machine Learning

Database: Data Warehouse, OLAP

Page 12: The Art and Technology of Data Mining

Database

           | Text Files  | Relational Database            | Multi-dimensional Database
Entities   | File        | Table                          | Cube
Attributes | Row and Col | Record, Field, Index           | Dimension, Level, Measurement
Methods    | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language   | -           | SQL                            | MDX
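The relational methods listed on this slide (Select, Insert, Update, Delete) correspond directly to SQL statements. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table name `customers`, its columns, and the values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, income REAL)"
)

# Insert: add records (rows) to the table.
conn.executemany(
    "INSERT INTO customers (age, income) VALUES (?, ?)",
    [(25, 40000.0), (40, 72000.0), (58, 65000.0)],
)

# Update: change a field of an existing record.
conn.execute("UPDATE customers SET income = ? WHERE age = ?", (75000.0, 40))

# Delete: remove the records that match a condition.
conn.execute("DELETE FROM customers WHERE age < ?", (30,))

# Select: read back the remaining records.
rows = conn.execute("SELECT age, income FROM customers ORDER BY age").fetchall()
print(rows)  # [(40, 75000.0), (58, 65000.0)]
conn.close()
```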

Page 13: The Art and Technology of Data Mining

Data Analysis

Classification Regression Clustering Association Sequence Analysis

Page 14: The Art and Technology of Data Mining

Data Analysis

Input variables (attributes): X1, X2

Numeric: age, income, …

Categorical: gender, occupation, …

Model: linear models or decision trees (weights W1, W2)

Output variables (targets): Y1, Y2

Numeric: Regression (0,1)

Categorical: Classification (good, bad)
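To make the regression case with a linear model concrete, the sketch below fits a straight line to one numeric input by ordinary least squares. The data points and the names `fit_line` and `predict` are made up for illustration and do not come from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: age (numeric input) and income in thousands (numeric target).
ages = [20, 30, 40, 50]
incomes = [30.0, 40.0, 50.0, 60.0]

slope, intercept = fit_line(ages, incomes)


def predict(age):
    """Apply the fitted linear model to a new input value."""
    return slope * age + intercept


print(slope, intercept)  # 1.0 10.0
print(predict(35))       # 45.0
```

A classification model would instead map the same kind of inputs to a categorical target such as good or bad.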

Page 15: The Art and Technology of Data Mining

Data Analysis (cont.)

Clustering

[Scatter plot with axes Age and Income]

Association

1: chips, coke, chocolate
2: gum, chips
3: chips, coke
4: …

Probability (chips, coke)?
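One common way to answer the "Probability (chips, coke)" question is to compute the support of the itemset, that is, the fraction of transactions that contain both items. The sketch below uses only the three transactions spelled out on the slide (the fourth is elided there and is left out); the function names `support` and `confidence` are introduced here for illustration.

```python
# The three transactions listed on the slide; the elided fourth one is not guessed at.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# chips and coke appear together in 2 of the 3 transactions.
print(support({"chips", "coke"}, transactions))
# Every transaction contains chips, and 2 of those 3 also contain coke.
print(confidence({"chips"}, {"coke"}, transactions))
```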

Sequence Analysis

…ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…

Xt-1, Xt, T
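Reading Xt-1 and Xt as consecutive symbols in the sequence (an interpretation, since the slide gives only the labels), a simple form of sequence analysis is to estimate how often each symbol follows another. The sketch below does this for the DNA fragment shown on the slide, with the leading and trailing ellipses omitted; the function names are hypothetical.

```python
from collections import Counter

# Fragment shown on the slide; the elided parts before and after are not included.
sequence = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"


def transition_counts(seq):
    """Count how often each symbol X(t-1) is immediately followed by X(t)."""
    return Counter(zip(seq, seq[1:]))


def transition_probabilities(seq):
    """Estimate P(X(t) | X(t-1)) from the observed transition counts."""
    counts = transition_counts(seq)
    totals = Counter()
    for (prev, _), n in counts.items():
        totals[prev] += n
    return {(prev, cur): n / totals[prev] for (prev, cur), n in counts.items()}


probs = transition_probabilities(sequence)
for (prev, cur), p in sorted(probs.items()):
    print(f"P({cur} | {prev}) = {p:.2f}")
```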

Page 16: The Art and Technology of Data Mining

Data Mining in Research Life Cycle

[Diagram of the research life cycle. Labels: Questions, Needs, Search, Research, Experiment, Modeling, Report, Library, Data, Database, Data Analysis]

Page 17: The Art and Technology of Data Mining

Data Mining – Modeling Steps

1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment

Page 18: The Art and Technology of Data Mining

Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 19: The Art and Technology of Data Mining

Examples of data mining in science & engineering

1. Data mining in Biomedical Engineering

“Robotic Arm Control Using Data Mining Techniques”

2. Data mining in Chemical Engineering

“Data Mining for In-line Image Monitoring of Extrusion Processing”

Page 20: The Art and Technology of Data Mining

1. Problem Definition

“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”

Motions: Supination, Pronation, Flexion, Extension

Muscle Contraction

Motion     | Biceps | Triceps
Supination | H      | H
Pronation  | L      | L
Flexion    | H      | L
Extension  | L      | H
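The muscle contraction table can be read as a lookup from a pair of contraction levels to a motion. The sketch below encodes it that way. It assumes H and L stand for high and low contraction, and the `threshold` used to discretize a raw signal is a hypothetical value, not one given in the talk.

```python
# Mapping from (biceps level, triceps level) to motion, as in the table on the slide.
# Assumption: "H" and "L" mean high and low muscle contraction.
MOTIONS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold):
    """Discretize a signal amplitude into 'H' or 'L' (threshold is hypothetical)."""
    return "H" if signal >= threshold else "L"


def motion(biceps, triceps, threshold=0.5):
    """Look up the motion for a pair of biceps and triceps signal amplitudes."""
    return MOTIONS[(level(biceps, threshold), level(triceps, threshold))]


print(motion(0.9, 0.1))  # Flexion
print(motion(0.1, 0.9))  # Extension
```

A fixed threshold like this is the simplest possible rule; the classifiers listed later in the talk learn the decision boundary from the recorded data instead.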

Page 21: The Art and Technology of Data Mining

2. Data Preparation

The dataset includes 80 records.

There are two input variables: biceps signal and triceps signal.

There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.

Page 22: The Art and Technology of Data Mining

3. Exploration

[Scatter plot: triceps signal vs. record number, by class (Flexion, Extension, Supination, Pronation)]

Page 23: The Art and Technology of Data Mining

3. Exploration (cont.)

[Scatter plot: biceps signal vs. record number, by class (Flexion, Extension, Supination, Pronation)]

Page 24: The Art and Technology of Data Mining

4. Modeling

Classification

• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
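As an illustration of one of the listed classifiers, here is a minimal k-nearest-neighbors sketch for the two-input, four-class setup of this example. The training points are hypothetical signal levels invented for the sketch, not the 80 records from the talk, and `knn_predict` is a name introduced here.

```python
from collections import Counter
import math

# Hypothetical (biceps, triceps) signal levels with motion labels.
# These values are made up for illustration; they are not the talk's dataset.
training = [
    ((0.9, 0.8), "Supination"), ((0.8, 0.9), "Supination"),
    ((0.1, 0.2), "Pronation"), ((0.2, 0.1), "Pronation"),
    ((0.9, 0.1), "Flexion"), ((0.8, 0.2), "Flexion"),
    ((0.1, 0.9), "Extension"), ((0.2, 0.8), "Extension"),
]


def knn_predict(point, training, k=3):
    """Return the majority label among the k training points closest to point."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((0.85, 0.15), training))  # Flexion
print(knn_predict((0.15, 0.85), training))  # Extension
```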

Page 25: The Art and Technology of Data Mining

6. Model Deployment

A neural network model was successfully implemented inside the robotic arm.

Page 26: The Art and Technology of Data Mining

Examples of data mining in science & engineering

1. Data mining in Biomedical Engineering

“Robotic Arm Control Using Data Mining Techniques”

2. Data mining in Chemical Engineering

“Data Mining for In-line Image Monitoring of Extrusion Processing”

Page 27: The Art and Technology of Data Mining

Plastics Extrusion

Plastic pellets

Plastic melt

Page 28: The Art and Technology of Data Mining

Film Extrusion

Extruder

Plastic Film

Defect due to particle contaminant

Page 29: The Art and Technology of Data Mining

In-Line Monitoring

Transition Piece

Window Ports

Page 30: The Art and Technology of Data Mining

In-Line Monitoring

Light Source Extruder and Interface

Optical Assembly

Imaging Computer

Light

Page 31: The Art and Technology of Data Mining

Melt Without Contaminant Particles (WO)

Page 32: The Art and Technology of Data Mining

Melt With Contaminant Particles (WP)

Page 33: The Art and Technology of Data Mining

1. Problem Definition

Classify images into those with particles (WP) and those without particles (WO).

WO WP

Page 34: The Art and Technology of Data Mining

2. Data Preparation

2000 images

54 input variables, all numeric

One output variable with two possible values: With Particle (WP) and Without Particle (WO)

Page 35: The Art and Technology of Data Mining

2. Data Preparation (cont.)

Images were pre-processed to remove noise.

Dataset 1 with sharp images: 1350 images including 1257 without particles and 91 with particles

Dataset 2 with sharp and blurry images: 2000 images including 1909 without particles or with blurry particles and 91 with particles

54 Input variables, all numeric

One output variable, with two possible values (WP and WO)

Page 36: The Art and Technology of Data Mining

3. Exploration

Demo!

Page 37: The Art and Technology of Data Mining

4. Modeling

Classification:

• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian
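To give a feel for how a one-rule classifier can be learned, the sketch below searches every attribute for the single threshold that misclassifies the fewest training examples. This is a simplified, OneR-style stump rather than the full OneR algorithm, which discretizes numeric attributes into intervals; the data and the name `best_threshold_rule` are hypothetical.

```python
def best_threshold_rule(rows, labels, positive):
    """Find (attribute index, threshold, errors) for the rule
    'if row[attribute] < threshold then positive, else the other class'
    that makes the fewest mistakes on the training data."""
    best = None
    for attr in range(len(rows[0])):
        # Candidate thresholds are the distinct observed values of the attribute.
        for threshold in sorted({row[attr] for row in rows}):
            errors = sum(
                (row[attr] < threshold) != (label == positive)
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[2]:
                best = (attr, threshold, errors)
    return best


# Hypothetical two-attribute data; not the talk's 54 image features.
rows = [(120, 5.0), (130, 7.0), (150, 6.0), (160, 4.0), (170, 8.0)]
labels = ["WP", "WP", "WO", "WO", "WO"]

print(best_threshold_rule(rows, labels, positive="WP"))  # (0, 150, 0)
```

On this toy data the best rule uses attribute 0 with threshold 150 and makes no training errors.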

Page 38: The Art and Technology of Data Mining

5. Evaluation

Classification accuracy (%), 10-fold cross-validation

Dataset               | Attributes | Classes | OneR | C4.5 | 3-NN | Naïve Bayes
Sharp Images          | 54         | 2       | 99.9 | 99.8 | 99.8 | 95.8
Sharp + Blurry Images | 54         | 2       | 98.5 | 97.8 | 97.8 | 93.3
Sharp + Blurry Images | 54         | 3       | 87   | 87   | 84   | 79

If pixel_density_max < 142 then WP
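The rule above and the 10-fold evaluation can be sketched as follows. The threshold 142 and the attribute name `pixel_density_max` come from the slide; the sample values, the fold-splitting scheme, and the function names are assumptions made for illustration and are not the talk's data or tooling.

```python
import random


def classify(pixel_density_max):
    """The rule from the slide: a value below 142 means 'with particle' (WP)."""
    return "WP" if pixel_density_max < 142 else "WO"


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(values, labels, k=10):
    """Average accuracy over k held-out folds.

    The rule is fixed here, so nothing is trained on the remaining folds;
    a learned model would be refit on them in each round.
    """
    accuracies = []
    for fold in k_fold_indices(len(values), k):
        correct = sum(classify(values[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical pixel_density_max values and labels, for illustration only.
values = [100, 120, 135, 141, 150, 160, 170, 180, 190, 200] * 2
labels = ["WP", "WP", "WP", "WP", "WO", "WO", "WO", "WO", "WO", "WO"] * 2

print(cross_validated_accuracy(values, labels))  # 1.0 on this toy data
```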

Page 39: The Art and Technology of Data Mining

6. Deployment

A Visual Basic program will be developed to implement the model.

Page 40: The Art and Technology of Data Mining

Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 41: The Art and Technology of Data Mining

Challenges and Opportunities

• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering sectors.
• Faster, more accurate, and more scalable techniques.
• Incremental, on-line, and real-time learning algorithms.
• Parallel and distributed data processing techniques.

Page 42: The Art and Technology of Data Mining

Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.

You can be part of the solution!