University of Toronto, 8/30/2015


Data Mining

The Art and Science of Obtaining Knowledge from Data

Dr. Saed Sayad


Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities


Explosion of Data

Data in the world doubles every 20 months!

• NASA’s Earth Orbiting System: 46 megabytes of data per second, or about 4,000,000,000,000 bytes (4 terabytes) a day
• FBI fingerprint image library: 200,000,000,000,000 bytes (200 terabytes)
• In-line image analysis for particle detection: 1 megabyte per second
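The figures above can be cross-checked with simple arithmetic. This Python sketch assumes decimal units (1 megabyte = 10^6 bytes); `growth_factor` is a hypothetical helper, not something from the talk.

```python
# Sanity-check of the data-volume figures on the slide.
# Assumption: 1 megabyte = 10**6 bytes (decimal units).

MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# NASA EOS: 46 MB per second, accumulated over one day.
nasa_bytes_per_day = 46 * MB * SECONDS_PER_DAY
print(f"NASA EOS: {nasa_bytes_per_day:,} bytes/day")  # 3,974,400,000,000

def growth_factor(months: float, doubling_months: float = 20.0) -> float:
    """Hypothetical helper: growth after `months` if data doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

print(f"Growth in 1 year:  x{growth_factor(12):.2f}")  # x1.52
print(f"Growth in 5 years: x{growth_factor(60):.2f}")  # x8.00
```

Under that assumption, 46 megabytes per second comes to 3,974,400,000,000 bytes a day, which rounds to the 4,000,000,000,000 on the slide, and a 20-month doubling time means roughly 1.52 times more data each year and 8 times more over five years.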



What do we need?

Fast, accurate, and scalable data analysis techniques to extract useful knowledge.

The answer is Data Mining.


What is Data Mining?

“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”

Diagram: Data → Data Mining → Knowledge


Diagram: Data Mining and related fields: AI/Machine Learning, Statistics, Database, Data Analysis, and Data Warehouse/OLAP.


Data Mining

Diagram: Data Mining builds on Data Analysis (Statistics, Machine Learning) and Database (Data Warehouse, OLAP).


Database

              Text Files    Relational Database             Multi-dimensional Database
Entities      File          Table                           Cube
Attributes    Row and Col   Record, Field, Index            Dimension, Level, Measurement
Methods       Read, Write   Select, Insert, Update, Delete  Drill down, Drill up, Drill through
Language      -             SQL                             MDX
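The relational column of the table can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch on a hypothetical `sales` table; the drill-down step is only approximated with SQL `GROUP BY` at a coarser and then a finer level, since true cube operations would be written in MDX against a multi-dimensional database.

```python
import sqlite3

# Hypothetical sales table, used only to illustrate the relational "methods".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (year INTEGER, month INTEGER, region TEXT, amount REAL)")

# Insert
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(2014, 1, "East", 100.0), (2014, 2, "East", 150.0),
     (2014, 1, "West", 80.0), (2015, 1, "East", 120.0)],
)
# Update
cur.execute("UPDATE sales SET amount = 90.0 WHERE year = 2014 AND month = 1 AND region = 'West'")
# Delete
cur.execute("DELETE FROM sales WHERE year = 2015")

# Select, summarized at a coarse level (per year) ...
by_year = cur.execute("SELECT year, SUM(amount) FROM sales GROUP BY year").fetchall()
# ... then "drilled down" to a finer level (per year and month).
by_month = cur.execute(
    "SELECT year, month, SUM(amount) FROM sales GROUP BY year, month ORDER BY year, month"
).fetchall()

print(by_year)   # [(2014, 340.0)]
print(by_month)  # [(2014, 1, 190.0), (2014, 2, 150.0)]
conn.close()
```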


Data Analysis

• Classification
• Regression
• Clustering
• Association
• Sequence Analysis


Data Analysis

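Diagram: input variables (attributes) feed a model, which predicts output variables (targets).

• Input variables: X1 numeric (age, income, …); X2 categorical (gender, occupation, …), connected to the model by weights W1 and W2
• Model: linear models or decision trees
• Output variables: Y1 numeric → Regression (0,1); Y2 categorical → Classification (good, bad)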
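For a numeric target, a linear model learns one weight per input, like W1 and W2 in the diagram. A minimal sketch, assuming two numeric inputs, no intercept, and hypothetical data generated with known weights so the fit can be checked; `fit_two_weights` is an illustrative name. A categorical target such as good/bad would instead call for a classifier such as a decision tree.

```python
# Linear model y ≈ W1*x1 + W2*x2 fitted by ordinary least squares.
# Hypothetical data; no intercept term, to keep the 2x2 algebra short.

def fit_two_weights(x1, x2, y):
    """Solve the 2x2 normal equations (X^T X) w = X^T y for w = (W1, W2)."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * t for a, t in zip(x1, y))
    t2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("inputs are collinear; weights are not unique")
    # Cramer's rule for the 2x2 system.
    w1 = (s22 * t1 - s12 * t2) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2

# Hypothetical numeric inputs (e.g. scaled age and income) and a numeric target.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 0.0, 3.0]
y = [0.5 * a + 1.5 * b for a, b in zip(x1, x2)]  # generated with W1=0.5, W2=1.5

w1, w2 = fit_two_weights(x1, x2, y)
print(f"W1 = {w1:.3f}, W2 = {w2:.3f}")  # W1 = 0.500, W2 = 1.500
```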


Data Analysis (cont.)

Clustering: grouping records on a scatter plot of Age vs. Income.
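Clustering has no target variable: records are grouped by similarity alone. The slide does not name an algorithm; k-means is used below as one common choice, on hypothetical (age, income) records.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical (age, income in $1000s) records forming two visible groups.
records = [(22, 25), (25, 30), (24, 28), (55, 90), (60, 95), (58, 85)]
centroids, labels = kmeans(records, k=2)
print(labels)
```

Because k-means uses Euclidean distance, income measured in larger units would dominate age; in practice inputs are usually rescaled before clustering.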

Association: given the transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: …, what is Probability (chips, coke)?
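One common reading of "Probability (chips, coke)" is the support of the itemset {chips, coke}: the fraction of transactions containing both. The sketch computes support and rule confidence over the three transactions shown, leaving out transaction 4, which is elided on the slide.

```python
# The first three transactions from the slide; transaction 4 is elided ("…") and left out.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

print(support({"chips", "coke"}, transactions))       # about 0.667 (2 of 3 baskets)
print(confidence({"chips"}, {"coke"}, transactions))  # about 0.667
print(confidence({"coke"}, {"chips"}, transactions))  # 1.0
```

Chips and coke appear together in 2 of 3 baskets; every basket with coke also has chips, so the rule coke → chips has confidence 1.0.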

Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…, relating X(t-1) to X(t).
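The notation X(t-1) → X(t) suggests predicting each symbol from the one before it. A first-order Markov chain is one simple way to do that (an assumption, not necessarily the method in the talk): count how often each base follows each other base in the fragment shown.

```python
from collections import Counter, defaultdict

# DNA fragment from the slide, without the leading/trailing "…".
seq = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"

def transition_probs(s):
    """Estimate P(X_t = b | X_(t-1) = a) from adjacent-symbol counts (first-order Markov chain)."""
    counts = defaultdict(Counter)
    for prev, cur in zip(s, s[1:]):
        counts[prev][cur] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }

probs = transition_probs(seq)
for a in sorted(probs):
    row = ", ".join(f"{b}:{p:.2f}" for b, p in sorted(probs[a].items()))
    print(f"after {a} -> {row}")
```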


Data Mining in Research Life Cycle

Diagram: Questions/Needs → Search → Research → Experiment → Modeling → Report, also showing Library, Data, Database, and Data Analysis.


Data Mining – Modeling Steps

1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment




Examples of data mining in science & engineering

1. Data mining in Biomedical Engineering

“Robotic Arm Control Using Data Mining Techniques”

2. Data mining in Chemical Engineering

“Data Mining for In-line Image Monitoring of Extrusion Processing”


1. Problem Definition

“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”

Movements: Supination, Pronation, Flexion, Extension

Muscle contraction (H = high, L = low):

Movement      Biceps  Triceps
Supination    H       H
Pronation     L       L
Flexion       H       L
Extension     L       H
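The contraction table already separates the four movements once each signal is reduced to high or low. This sketch encodes it as a lookup; `THRESHOLD` and the [0, 1] signal scaling are hypothetical assumptions, not values from the study.

```python
# The H/L contraction table as a lookup keyed by (biceps level, triceps level).
MOVEMENT_BY_LEVELS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}

THRESHOLD = 0.5  # hypothetical cut-off on an EMG amplitude scaled to [0, 1]

def level(signal: float) -> str:
    """Reduce a scaled signal to "H" (high) or "L" (low)."""
    return "H" if signal >= THRESHOLD else "L"

def movement(biceps: float, triceps: float) -> str:
    return MOVEMENT_BY_LEVELS[(level(biceps), level(triceps))]

print(movement(0.8, 0.2))  # Flexion
print(movement(0.1, 0.9))  # Extension
```

A fixed, hand-picked threshold is fragile with noisy EMG signals; learning the decision boundary from labeled records is what the modeling step does instead.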


2. Data Preparation

The dataset includes 80 records.

There are two input variables: the biceps signal and the triceps signal.

There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.


3. Exploration

Scatter plot: triceps signal vs. record number for Flexion, Extension, Supination, and Pronation.


3. Exploration (cont.)

Scatter plot: biceps signal vs. record number for Flexion, Extension, Supination, and Pronation.


4. Modeling

Classification:

• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
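As an illustration of one listed method, here is a k-nearest neighbors classifier on (biceps, triceps) inputs. The training points are hypothetical and chosen to mirror the H/L pattern above; the model actually deployed in the arm was a neural network, not this one.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (biceps, triceps) readings scaled to [0, 1]; NOT the 80-record dataset.
train = [
    ((0.90, 0.85), "Supination"), ((0.80, 0.90), "Supination"), ((0.85, 0.80), "Supination"),
    ((0.10, 0.15), "Pronation"),  ((0.20, 0.10), "Pronation"),  ((0.15, 0.20), "Pronation"),
    ((0.90, 0.10), "Flexion"),    ((0.85, 0.20), "Flexion"),    ((0.80, 0.15), "Flexion"),
    ((0.10, 0.90), "Extension"),  ((0.20, 0.85), "Extension"),  ((0.15, 0.80), "Extension"),
]

print(knn_predict(train, (0.75, 0.25)))  # Flexion
```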


6. Model Deployment

A neural network model was successfully implemented inside the robotic arm.




Plastics Extrusion

Diagram: plastic pellets → plastic melt


Film Extrusion

Diagram: extruder → plastic film, showing a defect due to a particle contaminant


In-Line Monitoring

Diagram labels: transition piece, window ports


In-Line Monitoring

Diagram: light source, extruder and interface, optical assembly, imaging computer


Melt Without Contaminant Particles (WO)


Melt With Contaminant Particles (WP)


1. Problem Definition

Classify images into those with particles (WP) and those without particles (WO).

Example images: WO and WP


2. Data Preparation

2000 images

54 input variables, all numeric

One output variable with two possible values: With Particle (WP) and Without Particle (WO)


2. Data Preparation (cont.)

Images were pre-processed to remove noise.

Dataset 1 (sharp images): 1350 images, including 1257 without particles and 91 with particles

Dataset 2 (sharp and blurry images): 2000 images, including 1909 without particles or with blurry particles, and 91 with particles

54 input variables, all numeric

One output variable with two possible values (WP and WO)
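With only 91 of 2000 images containing particles, Dataset 2 is heavily imbalanced. A quick calculation gives the accuracy of a trivial classifier that always answers WO, a useful reference point when reading evaluation results.

```python
# Class balance of Dataset 2, using the counts listed above.
wo_count, wp_count = 1909, 91
total = wo_count + wp_count

wp_share = wp_count / total
majority_baseline = wo_count / total  # accuracy of always predicting WO

print(f"total = {total}")                                    # 2000
print(f"WP share = {wp_share:.2%}")                          # 4.55%
print(f"majority-class baseline = {majority_baseline:.2%}")  # 95.45%
```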


3. Exploration

Demo!


4. Modeling

Classification:

• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian
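Naïve Bayesian classification with numeric inputs is often done with a Gaussian likelihood per feature (an assumption here; the talk does not say which variant was used). This minimal sketch uses two hypothetical features instead of the 54 in the study.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior, plus per-feature mean and variance (features treated as independent)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps variances strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior + sum of per-feature Gaussian log likelihoods."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical two-feature image summaries; not the 54 features from the study.
X = [(150, 0.10), (160, 0.12), (155, 0.09), (120, 0.40), (115, 0.35), (125, 0.45)]
y = ["WO", "WO", "WO", "WP", "WP", "WP"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (118, 0.38)))  # WP
```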


5. Evaluation

Dataset                Attributes  Classes  One-R  C4.5  3-NN  Bayes
Sharp images           54          2        99.9   99.8  99.8  95.8
Sharp + blurry images  54          2        98.5   97.8  97.8  93.3
Sharp + blurry images  54          3        87     87    84    79

Accuracy (%) estimated by 10-fold cross-validation
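In 10-fold cross-validation the data are split into ten parts; each part is held out once for testing while the other nine train the model, and the ten accuracies are averaged. A minimal sketch with plain shuffling (no stratification); `fit` and `predict` are hypothetical callables supplied by the caller.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle record indices, then deal them round-robin into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, and average accuracy over all k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(train_X, train_y)
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Usage with a trivial majority-class "model" on hypothetical labels (95 WO, 5 WP).
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

y = ["WO"] * 95 + ["WP"] * 5
X = [[0.0] for _ in range(100)]
acc = cross_val_accuracy(X, y, fit_majority, predict_majority)
print(acc)  # about 0.95
```

With a rare class such as WP, stratified folds that keep the class ratio in every fold are often preferred over plain shuffling.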

If pixel_density_max < 142 then WP
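The rule above has the single-attribute, single-threshold form that OneR produces. The sketch applies that rule and shows a simplified threshold search on hypothetical `pixel_density_max` values. The full OneR algorithm evaluates every attribute and discretizes numeric ones with a minimum bucket size; this version handles one attribute and fixes the rule direction.

```python
def apply_rule(pixel_density_max: float, threshold: float = 142) -> str:
    """The slide's rule: below the threshold -> WP, otherwise WO."""
    return "WP" if pixel_density_max < threshold else "WO"

def best_threshold(values, labels):
    """Simplified OneR-style search on one numeric attribute: try cut points midway
    between sorted distinct values and keep the one with the fewest training errors
    for the rule 'value < cut -> WP, else WO'."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def errors(cut):
        return sum(("WP" if v < cut else "WO") != lab for v, lab in zip(values, labels))

    return min(candidates, key=errors)

# Hypothetical pixel_density_max values; not data from the study.
values = [120, 130, 138, 145, 150, 160, 170]
labels = ["WP", "WP", "WP", "WO", "WO", "WO", "WO"]

cut = best_threshold(values, labels)
print(cut)              # 141.5
print(apply_rule(139))  # WP
```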


6. Deploy Model

A Visual Basic program will be developed to implement the model.




Challenges and Opportunities

• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering fields.
• Faster, more accurate, and more scalable techniques.
• Incremental, online, and real-time learning algorithms.
• Parallel and distributed data processing techniques.


Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.

You can be part of the solution!