Faculty of Computer Science
© 2011
Technology and the Future of Medicine
Promise and Perils of AI, Part II
Osmar R. Zaïane, Professor and Scientific Director, Alberta Innovates Centre for Machine Learning
Continuous Professional Learning Course
O.R. Zaïane © 2011
Promise and Perils of AI
- UofA Edmonton – September 2011
Intellectual growth should commence at birth and cease only at death.
Albert Einstein (Nobel Prize for Physics in 1921)
Intelligence could be measured by what we know and how we use what we know.We all have the same capacity to learn but not the same opportunities to learn.
The greatest virtue of man is perhaps curiosity.
Anatole France (Nobel Prize for Literature in 1921)
Inquisitive behaviour, investigation, exploration represent the main drive for learning.Curiosity is the fuel for knowledge and the seed for intelligence.
What is Artificial Intelligence (Computational Intelligence)?
Intelligence: the ability to
– Reason and plan
– Solve complex problems
– Think abstractly
– Comprehend complex ideas
– Learn
Programs that analyse and interpret data to learn from observations and adapt to changing situations.
Interpret, learn, adapt; understand to solve.
Road Map
Promise and Perils of AI, Part I (September 28)
• Artificial Intelligence and Expert Systems
Promise and Perils of AI, Part II (September 29)
• Machine Learning and Data Mining
Promise and Perils of AI, Part III (October 13)
• Applications: Fiction or Reality; Risks and Potential
What is Data Mining?
The process of extracting patterns from data; a technique for searching large-scale databases for patterns. (Wikipedia)
Data processing using sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large preexisting databases; a way to discover new meaning in data. (WordNet)
What is Data Mining?
The analysis and intelligent interpretation of large data in order to provide actionable knowledge for human decision support or automatic decision making.
Data mining is the process of discovering and extracting potentially useful and previously unknown patterns from large collections of data.
Data mining involves methods from artificial intelligence, statistics and database management.
Data Mining is at the confluence of many disciplines:
– Database Systems: DBMS, query processing, data warehousing, OLAP, …
– Artificial Intelligence: machine learning, neural networks, agents, knowledge representation, …
– Visualization: computer graphics, human–computer interaction, 3D representation, …
– Information Retrieval: indexing, inverted files, …
– Statistics: statistical and mathematical modeling, …
– High Performance Computing: parallel and distributed computing, …
– Other: natural language processing, image processing, …
Data mining tasks fall into two families:
• Descriptive DM tasks: describe general properties of the data
• Predictive DM tasks: infer from available data
Examples: classification, clustering, outlier detection, association, discrimination, contrasting, etc.
Data Mining as opposed to Information Retrieval
We are not trying to find the needle in the haystack; DBMSs already know how to do that, given some good indexes. We are merely trying to understand the consequences of the presence of the needle, if it exists.
Association Rule Mining
Association rule mining searches for relationships between items in a dataset; it aims at discovering associations between items in a transactional database (e.g. a store with transactions over items {a, b, c, d, …}).
• Rule form: "Body → Head [support, confidence]"
  buys(x, "bread" and "apple") → buys(x, "milk") [0.6%, 65%]
The goal is to find combinations of items that typically occur together.
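The support and confidence figures in a rule like the one above come directly from transaction counts. A minimal sketch, with made-up transactions (the items and numbers are illustrative, not from the lecture):

```python
# Sketch: support and confidence for one candidate rule over toy transactions.
transactions = [
    {"bread", "apple", "milk"},
    {"bread", "apple", "milk", "eggs"},
    {"bread", "milk"},
    {"apple", "juice"},
    {"bread", "apple"},
]

body = {"bread", "apple"}   # rule body: buys bread and apple
head = {"milk"}             # rule head: buys milk

n = len(transactions)
body_count = sum(1 for t in transactions if body <= t)
both_count = sum(1 for t in transactions if (body | head) <= t)

support = both_count / n              # fraction of all transactions with body and head
confidence = both_count / body_count  # of those buying the body, how many also buy the head

print(f"support={support:.2f}, confidence={confidence:.2f}")  # → support=0.40, confidence=0.67
```

Support measures how often the whole itemset appears; confidence measures how reliably the head follows the body.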
Outlier (Anomaly) Detection
The goal is to find exceptional data in various datasets and uncover the implicit patterns of rare cases (deviations from the norm). This has long been studied in statistics and is an active area in data mining.
Many applications:
– Identifying network intrusion (hackers, DoS, etc.)
– Monitoring video surveillance
– Fraud detection (credit cards, stocks, financial transactions, communications, voting irregularities, etc.)
– Performance analysis (for scouting athletes, etc.)
– Weather prediction (environmental protection, disaster prevention, …)
– Real-time anomaly detection in various monitoring systems, such as structural health monitoring and transportation
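As a toy illustration of "deviation from the norm", a simple statistical test flags values far from the mean. The readings and the 2-standard-deviation threshold below are made up; real detectors use far richer models:

```python
# Sketch: flag readings more than 2 sample standard deviations from the mean.
# Note: a large outlier inflates the estimated mean and deviation, which is
# why robust methods are preferred in practice.
import statistics

readings = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 25.0, 10.0, 9.9]

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)

outliers = [x for x in readings if abs(x - mu) / sigma > 2]
print(outliers)  # → [25.0]
```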
What is Machine Learning?
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to change behavior based on data. (Wikipedia)
Machine learning provides means to learn from large data, interpret the trends in the data, and adapt to the data, as opposed to static programs.
Learn from experience – adapt to the environment.
What is Machine Learning?
Tasks that call for learning:
• Medical image analysis
• Filtering data for analytics
• Controlling a robot in an unknown environment
Tasks that can be programmed directly:
• An accounting program
• Querying a database
• Controlling a welding robot in manufacturing
Kinds of Machine Learning
Supervised learning – generates a function that maps inputs to desired outputs. Supervised because it learns from labeled data (a training set).
Unsupervised learning – models a set of inputs into groups (clusters). It is based on unlabeled data.
Semi-supervised learning – combines both labeled and unlabeled examples to generate an appropriate function or classifier. The set of labeled data is typically small.
Reinforcement learning – learns how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.
Active learning – learns from a very small training set and continues to learn from what it correctly labels.
The typical process
A model is first created based on the data distribution. The model is then used to classify new data: given the model, a class can be predicted for new data.
With classification, I can predict in which bucket to put the ball, but I can't predict the weight of the ball.
Classification = Learning a Model
[Figure: a labeled training set is used to build a classification model; the model then assigns labels to new unlabeled data (labeling = classification).]
Challenges in Supervised Learning
– What are the relevant features to consider, and how to represent them?
– What algorithm to use, and how to optimize it for effectiveness and efficiency?
– How to effectively represent the learned model (function, data structure)?
– How to effectively use the learned model?
Concrete Applications
Credit Approval
Spam detection
Suspicious credit card transactions (Falcon fraud assessment system)
Supervised vs Unsupervised
Supervised classification = classification: we know the class labels and the number of classes.
Unsupervised classification = clustering: we do not know the class labels and may not know the number of classes.
Clustering (Grouping, Partitioning)
The process of putting similar data together.
– Objects are not labeled, i.e. there is no training data.
– We need a notion of similarity or closeness (what features?)
– Should we know a priori how many clusters exist?
– How do we characterize members of groups?
– How do we label groups?
– How do we deal with high dimensionality?
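A minimal sketch of one clustering algorithm, k-means, on one-dimensional points. The data, the choice of k, and the naive initialisation are all illustrative assumptions:

```python
# Sketch: k-means — repeatedly assign points to the nearest centre,
# then recompute each centre as the mean of its assigned points.
def kmeans_1d(points, k, iters=20):
    centers = points[:k]  # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centre wins
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # update step: move each centre to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 2) for c in centers))  # → [1.0, 8.07]
```

Note that the algorithm converges here to the two obvious groups without ever seeing a label, which is exactly the unsupervised setting described above.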
Framework (Supervised Learning)
Labeled data is split into training data and testing data. The training data is used to derive the classifier (model); the testing data is used to estimate its accuracy before the model is applied to unlabeled new data.
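This framework can be sketched in a few lines. Here the "classifier" is just a majority-class baseline standing in for any real method, and the data and the split are made up for illustration:

```python
# Sketch: split labeled data into training/testing sets, "learn" a model
# (the majority class), and estimate accuracy on the held-out test set.
from collections import Counter

labeled = [({"age": a}, "high" if a > 40 else "low")
           for a in [22, 35, 51, 63, 28, 47, 55, 31, 70, 26]]

train, test = labeled[:7], labeled[7:]  # simple split; usually randomised

# "Derive classifier": the most frequent class in the training data
majority = Counter(label for _, label in train).most_common(1)[0][0]

# "Estimate accuracy": score the model on unseen test examples
correct = sum(1 for _, label in test if label == majority)
accuracy = correct / len(test)
print(majority, accuracy)
```

Any of the classification methods listed on the next slide would slot into the "derive classifier" step; the surrounding split-and-evaluate framework stays the same.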
Classification Methods
– Neural Networks
– k-Nearest Neighbour
– Decision Tree Induction
– Bayesian Classification
– Associative Classifiers
– Support Vector Machines
– Case-Based Reasoning
– Genetic Algorithms
– Rough Set Theory
– Fuzzy Sets
– Etc.
Human Nervous System
• We have only just begun to understand how our neural system operates.
• A huge number of neurons and interconnections between them:
– roughly 100 billion (i.e. 10^11) neurons in the brain
• a full Olympic-sized swimming pool contains about 10^10 raindrops; the number of stars in the Milky Way is of the same order of magnitude
– about 10^4 connections per neuron
• Biological neurons are slower than computers:
– neurons operate in 10^-3 seconds, computers in 10^-9 seconds
– the brain makes up for the slow rate of operation of a single neuron by the large number of neurons and connections (think about the speed of face recognition by a human, for example, and the time it takes fast computers to do the same task).
What is an Artificial Neural Network (NN)?
A neural network is a data structure that supposedly simulates the behaviour of neurons in a biological brain.
A neural network is composed of interconnected layers of units. Messages are passed along the connections from one unit to the other, and can change based on the weight of the connection and the value in the node.
[Figure: a feedforward network mapping an input vector x_i through input nodes, hidden nodes, and output nodes to an output vector.]
What is an Artificial Neural Network (NN)?
A network of many simple units (neurons, nodes). NNs learn from examples and exhibit some capability for generalization beyond the training data.
• Knowledge is acquired by the network from its environment via learning and is stored in the weights of the connections.
• The training (learning) rule is a procedure for modifying the weights of the connections in order to perform a certain task.
• There are also some sophisticated techniques that allow learning by adding and pruning connections (between nodes).
• The units are connected by connections, and each connection has an associated numeric weight.
• Units receive inputs (from the environment or other units) via the connections. They produce output using their weights and the inputs (i.e. they operate locally).
• A NN can be represented as a directed graph.
[Figure: a small directed graph with example connection weights 0.3, 0.2, 0.7, …]
A Neuron
The n-dimensional input vector x is mapped into variable y by means of a scalar product and a nonlinear function mapping.
[Figure: inputs x_0 … x_n are combined with weight vector w (w_0 … w_n) and a bias into a weighted sum, which passes through an activation function f to produce the output y.]
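A minimal sketch of this computation, assuming a sigmoid activation function and made-up weights and inputs:

```python
# Sketch: a single artificial neuron — weighted sum of inputs plus bias,
# passed through a nonlinear activation function (sigmoid here).
import math

def neuron(x, w, bias):
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias  # weighted sum (scalar product)
    return 1.0 / (1.0 + math.exp(-s))                # sigmoid activation f

y = neuron(x=[1.0, 0.5, -1.0], w=[0.3, 0.2, 0.7], bias=0.1)
print(round(y, 3))  # → 0.45
```

Learning in a neural network amounts to adjusting the weights w and bias so that outputs like y match the desired labels.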
Correspondence Between Artificial and Biological Neurons
• How does this artificial neuron relate to the biological one?
– input p (or input vector p) – the input signal (or signals) at the dendrites
– weight w (or weight vector w) – the strength of the synapse (or synapses)
– summer and transfer function – the cell body
– neuron output a – the signal at the axon
Learning Paradigms
(1) Classification: compare the actual output with the label in the training data and adjust the weights using Error = Desired - Actual.
(2) Reinforcement: adjust the weights using a reinforcement (reward) signal.
[Figure: inputs flow through the network to an actual output, which is compared with the desired class label from the training data.]
Advantages (of neural networks)
– Prediction accuracy is generally high.
– Robust: works even when training examples contain errors.
Criticism
– Long training time.
– Difficult to understand the learned function (weights).
– Typically limited to numerical data.
– Not easy to incorporate domain knowledge.
– Network design can be tedious and error-prone (too small: slow learning; too big: instability or poor performance).
The Simple Nearest Neighbour Approach
Nearest neighbour is very simple: the training is nothing more than sorting the training data and storing it in a list.
To classify a new entry, the entry is compared against the list to find the closest record, i.e. the one with values as similar as possible to the entry to classify (its nearest neighbour). The class of this record is simply assigned to the new entry.
Different measures of similarity or distance can be used.
The Nearest Neighbour
[Figure: a distance function compares the new entry against the sorted training data, finds the record with the closest values, and returns that record's class label for the new entry.]
k Nearest Neighbours
[Figure: a distance function finds the k records in the sorted training data with the closest values; the new entry's class label is decided by a "vote" among those k records.]
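The k-nearest-neighbour vote can be sketched as follows. The training points, the Euclidean distance measure, and k = 3 are illustrative assumptions:

```python
# Sketch: k-nearest-neighbour classification by majority vote on toy 2-D points.
from collections import Counter

training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B")]

def classify(point, k=3):
    def dist(p):  # squared Euclidean distance (ranking is unchanged by sqrt)
        return (p[0] - point[0]) ** 2 + (p[1] - point[1]) ** 2
    nearest = sorted(training, key=lambda rec: dist(rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote among the k neighbours

print(classify((1.1, 0.9)))  # → A (near the "A" cluster)
```

With k = 1 this reduces to the simple nearest-neighbour approach of the previous slide; larger k makes the vote more robust to noisy labels.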
A Sample Decision Tree

Outlook?
  sunny → Humidity?
    high → N
    normal → P
  overcast → P
  rain → Windy?
    true → N
    false → P

Outlook  | Temperature | Humidity | Windy | Class
sunny    | hot         | high     | false | N
sunny    | hot         | high     | true  | N
overcast | hot         | high     | false | P
rain     | mild        | high     | false | P
rain     | cool        | normal   | false | P
rain     | cool        | normal   | true  | N
overcast | cool        | normal   | true  | P
sunny    | mild        | high     | false | N
sunny    | cool        | normal   | false | P
rain     | mild        | normal   | false | P
sunny    | mild        | normal   | true  | P
overcast | mild        | high     | true  | P
overcast | hot         | normal   | false | P
rain     | mild        | high     | true  | N
Decision Tree Construction
The tree starts as a single node representing all the data.
If the samples are all of the same class, the node becomes a leaf labeled with that class.
Otherwise, select the attribute that best separates the samples into individual classes, split on it, and recurse on each branch.
Recursion stops when:
– the samples in a node belong to the same class (or a clear majority does);
– there are no remaining attributes on which to split;
– there are no samples left for a given attribute value.
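One common way to pick the attribute that "best separates" the samples is information gain, as in the ID3 algorithm. A sketch on the weather data from the sample decision tree above:

```python
# Sketch: choose the root attribute by information gain (ID3-style).
import math
from collections import Counter

# (outlook, temperature, humidity, windy) -> class
data = [
    (("sunny", "hot", "high", "false"), "N"),
    (("sunny", "hot", "high", "true"), "N"),
    (("overcast", "hot", "high", "false"), "P"),
    (("rain", "mild", "high", "false"), "P"),
    (("rain", "cool", "normal", "false"), "P"),
    (("rain", "cool", "normal", "true"), "N"),
    (("overcast", "cool", "normal", "true"), "P"),
    (("sunny", "mild", "high", "false"), "N"),
    (("sunny", "cool", "normal", "false"), "P"),
    (("rain", "mild", "normal", "false"), "P"),
    (("sunny", "mild", "normal", "true"), "P"),
    (("overcast", "mild", "high", "true"), "P"),
    (("overcast", "hot", "normal", "false"), "P"),
    (("rain", "mild", "high", "true"), "N"),
]
attrs = ["outlook", "temperature", "humidity", "windy"]

def entropy(rows):
    # Impurity of the class distribution in this set of samples
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, i):
    # Entropy reduction achieved by splitting on attribute i
    subsets = {}
    for features, label in rows:
        subsets.setdefault(features[i], []).append((features, label))
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder

best = max(range(len(attrs)), key=lambda i: gain(data, i))
print(attrs[best])  # → outlook, matching the root of the sample tree
```

Applying the same selection recursively inside each branch (sunny, overcast, rain) reproduces the tree shown earlier.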
Breast Cancer Detection
Goal: mammography classification (build a tool that ranks mammograms by priority for a second-screening recommendation).
– Current prototype: limited visual features; classifies malignant, benign, normal; accuracy about 80%.
– Are there better visual features to exploit?
"Screening mammograms can detect breast cancer & early detection increases the chance of successful treatment."
• Regular screening for ages 50 to 69
• Regular if prescribed for ages 40 to 49 and over 70
False-positive and false-negative rates vary among radiologists [3.5% to 21%] (American Cancer Society).
Health Canada reports that only 12% of eligible women in Alberta underwent regular screening in 2002; today only 40% according to ACB (goal: 80%). Most are single readings; 25% get a double reading, selected randomly (not enough staff).
Helping Doctors Interpret MRIs
» Will allow doctors to perform less invasive brain surgery and treatment
» Working with scientists and doctors at the Cross Cancer Institute to improve treatments for tumors through automated tumor segmentation of MRIs
Looking ahead to remote MRIs
Courtesy: National Cancer Institute
Microarray Analysis
[Figure: a biopsy is processed on a DNA microarray (N = 33,000 genes per patient), producing a gene-expression vector g1, g2, g3, …, gN (e.g. 7.3, 2.1, 55.0, …, 1.1) that a classifier maps to a label such as ER Negative.]
SNP: Single Nucleotide Polymorphism
Other Examples in Biomedical Applications
> Breast cancer prognosis – who will relapse within 3 years?
> Kidney transplant prediction – how will a transplant behave over time? Will a patient reject a transplant?
> Prostate cancer – what toxicity level will the patient reach with radiotherapy?
> Cancer cachexia – which cancer patients will "waste away"?
> Breast cancer – predict who would likely develop breast cancer; predict if breast cancer would recur.
Microarray: gene expressions. SNP: Single Nucleotide Polymorphism.
Prostate Cancer
[Figure: past patients who underwent radiotherapy or other treatments, graded from Toxicity 0 through Toxicity 5, are used to learn a classifier that predicts low toxicity or high toxicity for a new patient.]
Why would we need more data?
[Figure: the same classifier pipeline, learned from past patients and applied to a new patient.]
Cancer is not just genetic: there are environmental contributors to cancer.
What about associations with other diseases and/or treatments?