Faculty of Computer Science
© 2011
Technology and the Future of Medicine
Promise and Perils of AI, Part II
Osmar R. Zaïane, Professor and Scientific Director, Alberta Innovates Centre for Machine Learning
Continuous Professional Learning Course
O.R. Zaïane © 2011
Promise and Perils of AI
- UofA Edmonton – September 2011
Intellectual growth should commence at birth and cease only at death.
Albert Einstein (Nobel Prize for Physics in 1921)
Intelligence could be measured by what we know and how we use what we know.We all have the same capacity to learn but not the same opportunities to learn.
The greatest virtue of man is perhaps curiosity.
Anatole France (Nobel Prize for Literature in 1921)
Inquisitive behaviour, investigation, exploration represent the main drive for learning.Curiosity is the fuel for knowledge and the seed for intelligence.
What is Artificial Intelligence (Computational Intelligence)?
Intelligence: the ability to
– Reason and plan
– Solve complex problems
– Think abstractly
– Comprehend complex ideas
– Learn
Programs that analyse and interpret data to learn from observations and adapt to changing situations.
Interpret, learn, adapt; understand to solve.
Road Map
Promise and Perils of AI, Part I (September 28)
• Artificial Intelligence and Expert Systems
Promise and Perils of AI, Part II (September 29)
• Machine Learning and Data Mining
Promise and Perils of AI, Part III (October 13)
• Applications: Fiction or Reality; Risks and Potential
What is Data Mining?
The process of extracting patterns from data; a technique for searching large-scale databases for patterns. (Wikipedia)
Data processing using sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large preexisting databases; a way to discover new meaning in data. (WordNet)
What is Data Mining?
The analysis and intelligent interpretation of large data in order to provide actionable knowledge for human decision support or automatic decision making.
Data mining is the process of discovering and extracting potentially useful and previously unknown patterns from large collections of data.
Data mining involves methods from artificial intelligence, statistics and database management.
Data Mining is at the confluence of many disciplines:
– Database Systems: DBMS, query processing, data warehousing, OLAP, …
– Artificial Intelligence: machine learning, neural networks, agents, knowledge representation, …
– Visualization: computer graphics, human–computer interaction, 3D representation, …
– Information Retrieval: indexing, inverted files, …
– Statistics: statistical and mathematical modeling, …
– High Performance Computing: parallel and distributed computing, …
– Other: natural language processing, image processing, …
Data mining tasks fall into two families:
• Descriptive DM tasks: describe general properties of the data
• Predictive DM tasks: infer from available data
Examples: classification, clustering, outlier detection, association, discrimination, contrasting, etc.
Data Mining as opposed to Information Retrieval
We are not trying to find the needle in the haystack; DBMSs already know how to do that, given some good indexes. We are merely trying to understand the consequences of the presence of the needle, if it exists.
Association Rule Mining
Association rule mining searches for relationships between items in a dataset; it aims at discovering associations between items in a transactional database (e.g. a store with transactions over items {a, b, c, d, …}).
• Rule form: "Body → Head [support, confidence]"
  buys(x, "bread" and "apple") → buys(x, "milk") [0.6%, 65%]
The goal is to find combinations of items that typically occur together.
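The support and confidence figures in a rule like the one above come directly from transaction counts. A minimal sketch, with made-up transactions (the items and numbers are illustrative, not from the lecture):

```python
# Sketch: support and confidence for one candidate rule over toy transactions.
transactions = [
    {"bread", "apple", "milk"},
    {"bread", "apple", "milk", "eggs"},
    {"bread", "milk"},
    {"apple", "juice"},
    {"bread", "apple"},
]

body = {"bread", "apple"}   # rule body: buys bread and apple
head = {"milk"}             # rule head: buys milk

n = len(transactions)
body_count = sum(1 for t in transactions if body <= t)
both_count = sum(1 for t in transactions if (body | head) <= t)

support = both_count / n              # fraction of all transactions with body and head
confidence = both_count / body_count  # of those buying the body, how many also buy the head

print(f"support={support:.2f}, confidence={confidence:.2f}")  # → support=0.40, confidence=0.67
```

Support measures how often the whole itemset appears; confidence measures how reliably the head follows the body.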
Outlier (Anomaly) Detection
The goal is to find exceptional data in various datasets and uncover the implicit patterns of rare cases (deviations from the norm). This has long been studied in statistics and is an active area in data mining.
Many applications:
– Identifying network intrusion (hackers, DoS, etc.)
– Monitoring video surveillance
– Fraud detection (credit cards, stocks, financial transactions, communications, voting irregularities, etc.)
– Performance analysis (for scouting athletes, etc.)
– Weather prediction (environmental protection, disaster prevention, …)
– Real-time anomaly detection in various monitoring systems, such as structural health monitoring and transportation
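As a toy illustration of "deviation from the norm", a simple statistical test flags values far from the mean. The readings and the 2-standard-deviation threshold below are made up; real detectors use far richer models:

```python
# Sketch: flag readings more than 2 sample standard deviations from the mean.
# Note: a large outlier inflates the estimated mean and deviation, which is
# why robust methods are preferred in practice.
import statistics

readings = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 25.0, 10.0, 9.9]

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)

outliers = [x for x in readings if abs(x - mu) / sigma > 2]
print(outliers)  # → [25.0]
```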
What is Machine Learning?
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to change behavior based on data. (Wikipedia)
Machine learning provides means to learn from large data, interpret the trends in the data, and adapt to the data, as opposed to static programs.
Learn from experience – adapt to the environment.
What is Machine Learning?
Tasks that call for learning:
• Medical image analysis
• Filtering data for analytics
• Controlling a robot in an unknown environment
Tasks that can be programmed directly:
• An accounting program
• Querying a database
• Controlling a welding robot in manufacturing
Kinds of Machine Learning
Supervised learning – generates a function that maps inputs to desired outputs. Supervised because it learns from labeled data (a training set).
Unsupervised learning – models a set of inputs into groups (clusters). It is based on unlabeled data.
Semi-supervised learning – combines both labeled and unlabeled examples to generate an appropriate function or classifier. The set of labeled data is typically small.
Reinforcement learning – learns how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.
Active learning – learns from a very small training set and continues to learn from what it correctly labels.
The typical process
A model is first created based on the data distribution. The model is then used to classify new data: given the model, a class can be predicted for new data.
With classification, I can predict in which bucket to put the ball, but I can't predict the weight of the ball.
Classification = Learning a Model
[Figure: a labeled training set is used to build a classification model; the model then assigns labels to new unlabeled data (labeling = classification).]
Challenges in Supervised Learning
– What are the relevant features to consider, and how to represent them?
– What algorithm to use, and how to optimize it for effectiveness and efficiency?
– How to effectively represent the learned model (function, data structure)?
– How to effectively use the learned model?
Concrete Applications
Credit Approval
Spam detection
Suspicious credit card transactions (Falcon fraud assessment system)
Supervised vs Unsupervised
Supervised classification = classification: we know the class labels and the number of classes.
Unsupervised classification = clustering: we do not know the class labels and may not know the number of classes.
Clustering (Grouping, Partitioning)
The process of putting similar data together.
– Objects are not labeled, i.e. there is no training data.
– We need a notion of similarity or closeness (what features?)
– Should we know a priori how many clusters exist?
– How do we characterize members of groups?
– How do we label groups?
– How do we deal with high dimensionality?
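A minimal sketch of one clustering algorithm, k-means, on one-dimensional points. The data, the choice of k, and the naive initialisation are all illustrative assumptions:

```python
# Sketch: k-means — repeatedly assign points to the nearest centre,
# then recompute each centre as the mean of its assigned points.
def kmeans_1d(points, k, iters=20):
    centers = points[:k]  # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centre wins
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # update step: move each centre to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(points, k=2)
print(sorted(round(c, 2) for c in centers))  # → [1.0, 8.07]
```

Note that the algorithm converges here to the two obvious groups without ever seeing a label, which is exactly the unsupervised setting described above.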
Framework (Supervised Learning)
Labeled data is split into training data and testing data. The training data is used to derive the classifier (model); the testing data is used to estimate its accuracy before the model is applied to unlabeled new data.
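This framework can be sketched in a few lines. Here the "classifier" is just a majority-class baseline standing in for any real method, and the data and the split are made up for illustration:

```python
# Sketch: split labeled data into training/testing sets, "learn" a model
# (the majority class), and estimate accuracy on the held-out test set.
from collections import Counter

labeled = [({"age": a}, "high" if a > 40 else "low")
           for a in [22, 35, 51, 63, 28, 47, 55, 31, 70, 26]]

train, test = labeled[:7], labeled[7:]  # simple split; usually randomised

# "Derive classifier": the most frequent class in the training data
majority = Counter(label for _, label in train).most_common(1)[0][0]

# "Estimate accuracy": score the model on unseen test examples
correct = sum(1 for _, label in test if label == majority)
accuracy = correct / len(test)
print(majority, accuracy)
```

Any of the classification methods listed on the next slide would slot into the "derive classifier" step; the surrounding split-and-evaluate framework stays the same.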
Classification Methods
– Neural Networks
– k-Nearest Neighbour
– Decision Tree Induction
– Bayesian Classification
– Associative Classifiers
– Support Vector Machines
– Case-Based Reasoning
– Genetic Algorithms
– Rough Set Theory
– Fuzzy Sets
– Etc.
Human Nervous System
• We have only just begun to understand how our neural system operates.
• A huge number of neurons and interconnections between them:
– roughly 100 billion (i.e. 10^11) neurons in the brain
• a full Olympic-sized swimming pool contains about 10^10 raindrops; the number of stars in the Milky Way is of the same order of magnitude
– about 10^4 connections per neuron
• Biological neurons are slower than computers:
– neurons operate in 10^-3 seconds, computers in 10^-9 seconds
– the brain makes up for the slow rate of operation of a single neuron by the large number of neurons and connections (think about the speed of face recognition by a human, for example, and the time it takes fast computers to do the same task).
What is an Artificial Neural Network (NN)?
A neural network is a data structure that supposedly simulates the behaviour of neurons in a biological brain.
A neural network is composed of interconnected layers of units. Messages are passed along the connections from one unit to the other, and can change based on the weight of the connection and the value in the node.
[Figure: a feedforward network mapping an input vector x_i through input nodes, hidden nodes, and output nodes to an output vector.]
What is an Artificial Neural Network (NN)?
A network of many simple units (neurons, nodes). NNs learn from examples and exhibit some capability for generalization beyond the training data.
• Knowledge is acquired by the network from its environment via learning and is stored in the weights of the connections.
• The training (learning) rule is a procedure for modifying the weights of the connections in order to perform a certain task.
• There are also some sophisticated techniques that allow learning by adding and pruning connections (between nodes).
• The units are connected by connections, and each connection has an associated numeric weight.
• Units receive inputs (from the environment or other units) via the connections. They produce output using their weights and the inputs (i.e. they operate locally).
• A NN can be represented as a directed graph.
[Figure: a small directed graph with example connection weights 0.3, 0.2, 0.7, …]
A Neuron
The n-dimensional input vector x is mapped into variable y by means of a scalar product and a nonlinear function mapping.
[Figure: inputs x_0 … x_n are combined with weight vector w (w_0 … w_n) and a bias into a weighted sum, which passes through an activation function f to produce the output y.]
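A minimal sketch of this computation, assuming a sigmoid activation function and made-up weights and inputs:

```python
# Sketch: a single artificial neuron — weighted sum of inputs plus bias,
# passed through a nonlinear activation function (sigmoid here).
import math

def neuron(x, w, bias):
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias  # weighted sum (scalar product)
    return 1.0 / (1.0 + math.exp(-s))                # sigmoid activation f

y = neuron(x=[1.0, 0.5, -1.0], w=[0.3, 0.2, 0.7], bias=0.1)
print(round(y, 3))  # → 0.45
```

Learning in a neural network amounts to adjusting the weights w and bias so that outputs like y match the desired labels.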
Correspondence Between Artificial and Biological Neurons
• How does this artificial neuron relate to the biological one?
– input p (or input vector p) – the input signal (or signals) at the dendrites
– weight w (or weight vector w) – the strength of the synapse (or synapses)
– summer and transfer function – the cell body
– neuron output a – the signal at the axon
Learning Paradigms
(1) Classification: compare the actual output with the label in the training data and adjust the weights using Error = Desired - Actual.
(2) Reinforcement: adjust the weights using a reinforcement (reward) signal.
[Figure: inputs flow through the network to an actual output, which is compared with the desired class label from the training data.]
Advantages (of neural networks)
– Prediction accuracy is generally high.
– Robust: works even when training examples contain errors.
Criticism
– Long training time.
– Difficult to understand the learned function (weights).
– Typically limited to numerical data.
– Not easy to incorporate domain knowledge.
– Network design can be tedious and error-prone (too small: slow learning; too big: instability or poor performance).
The Simple Nearest Neighbour Approach
Nearest neighbour is very simple: the training is nothing more than sorting the training data and storing it in a list.
To classify a new entry, the entry is compared against the list to find the closest record, i.e. the one with values as similar as possible to the entry to classify (its nearest neighbour). The class of this record is simply assigned to the new entry.
Different measures of similarity or distance can be used.
The Nearest Neighbour
[Figure: a distance function compares the new entry against the sorted training data, finds the record with the closest values, and returns that record's class label for the new entry.]
k Nearest Neighbours
[Figure: a distance function finds the k records in the sorted training data with the closest values; the new entry's class label is decided by a "vote" among those k records.]
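The k-nearest-neighbour vote can be sketched as follows. The training points, the Euclidean distance measure, and k = 3 are illustrative assumptions:

```python
# Sketch: k-nearest-neighbour classification by majority vote on toy 2-D points.
from collections import Counter

training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.3), "B")]

def classify(point, k=3):
    def dist(p):  # squared Euclidean distance (ranking is unchanged by sqrt)
        return (p[0] - point[0]) ** 2 + (p[1] - point[1]) ** 2
    nearest = sorted(training, key=lambda rec: dist(rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote among the k neighbours

print(classify((1.1, 0.9)))  # → A (near the "A" cluster)
```

With k = 1 this reduces to the simple nearest-neighbour approach of the previous slide; larger k makes the vote more robust to noisy labels.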
A Sample Decision Tree

Outlook?
  sunny → Humidity?
    high → N
    normal → P
  overcast → P
  rain → Windy?
    true → N
    false → P

Outlook  | Temperature | Humidity | Windy | Class
sunny    | hot         | high     | false | N
sunny    | hot         | high     | true  | N
overcast | hot         | high     | false | P
rain     | mild        | high     | false | P
rain     | cool        | normal   | false | P
rain     | cool        | normal   | true  | N
overcast | cool        | normal   | true  | P
sunny    | mild        | high     | false | N
sunny    | cool        | normal   | false | P
rain     | mild        | normal   | false | P
sunny    | mild        | normal   | true  | P
overcast | mild        | high     | true  | P
overcast | hot         | normal   | false | P
rain     | mild        | high     | true  | N
Decision Tree Construction
The tree starts as a single node representing all the data.
If the samples are all of the same class, the node becomes a leaf labeled with that class.
Otherwise, select the attribute that best separates the samples into individual classes, split on it, and recurse on each branch.
Recursion stops when:
– the samples in a node belong to the same class (or a clear majority does);
– there are no remaining attributes on which to split;
– there are no samples left for a given attribute value.
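One common way to pick the attribute that "best separates" the samples is information gain, as in the ID3 algorithm. A sketch on the weather data from the sample decision tree above:

```python
# Sketch: choose the root attribute by information gain (ID3-style).
import math
from collections import Counter

# (outlook, temperature, humidity, windy) -> class
data = [
    (("sunny", "hot", "high", "false"), "N"),
    (("sunny", "hot", "high", "true"), "N"),
    (("overcast", "hot", "high", "false"), "P"),
    (("rain", "mild", "high", "false"), "P"),
    (("rain", "cool", "normal", "false"), "P"),
    (("rain", "cool", "normal", "true"), "N"),
    (("overcast", "cool", "normal", "true"), "P"),
    (("sunny", "mild", "high", "false"), "N"),
    (("sunny", "cool", "normal", "false"), "P"),
    (("rain", "mild", "normal", "false"), "P"),
    (("sunny", "mild", "normal", "true"), "P"),
    (("overcast", "mild", "high", "true"), "P"),
    (("overcast", "hot", "normal", "false"), "P"),
    (("rain", "mild", "high", "true"), "N"),
]
attrs = ["outlook", "temperature", "humidity", "windy"]

def entropy(rows):
    # Impurity of the class distribution in this set of samples
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, i):
    # Entropy reduction achieved by splitting on attribute i
    subsets = {}
    for features, label in rows:
        subsets.setdefault(features[i], []).append((features, label))
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder

best = max(range(len(attrs)), key=lambda i: gain(data, i))
print(attrs[best])  # → outlook, matching the root of the sample tree
```

Applying the same selection recursively inside each branch (sunny, overcast, rain) reproduces the tree shown earlier.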
Breast Cancer Detection
Goal: mammography classification (build a tool that ranks mammograms by priority for a second-screening recommendation).
– Current prototype: limited visual features; classifies malignant, benign, normal; accuracy about 80%.
– Are there better visual features to exploit?
"Screening mammograms can detect breast cancer & early detection increases the chance of successful treatment."
• Regular screening for ages 50 to 69
• Regular if prescribed for ages 40 to 49 and over 70
False-positive and false-negative rates vary among radiologists [3.5% to 21%] (American Cancer Society).
Health Canada reports that only 12% of eligible women in Alberta underwent regular screening in 2002; today only 40% according to ACB (goal: 80%). Most are single readings; 25% get a double reading, selected randomly (not enough staff).
Helping Doctors Interpret MRIs
» Will allow doctors to perform less invasive brain surgery and treatment
» Working with scientists and doctors at the Cross Cancer Institute to improve treatments for tumors through automated tumor segmentation of MRIs
Looking ahead to remote MRIs
Courtesy: National Cancer Institute
Microarray Analysis
[Figure: a biopsy is processed on a DNA microarray (N = 33,000 genes per patient), producing a gene-expression vector g1, g2, g3, …, gN (e.g. 7.3, 2.1, 55.0, …, 1.1) that a classifier maps to a label such as ER Negative.]
SNP: Single Nucleotide Polymorphism
Other Examples in Biomedical Applications
> Breast cancer prognosis – who will relapse within 3 years?
> Kidney transplant prediction – how will a transplant behave over time? Will a patient reject a transplant?
> Prostate cancer – what toxicity level will the patient reach with radiotherapy?
> Cancer cachexia – which cancer patients will "waste away"?
> Breast cancer – predict who would likely develop breast cancer; predict if breast cancer would recur.
Microarray: gene expressions. SNP: Single Nucleotide Polymorphism.
Prostate Cancer
[Figure: past patients who underwent radiotherapy or other treatments, graded from Toxicity 0 through Toxicity 5, are used to learn a classifier that predicts low toxicity or high toxicity for a new patient.]
Why would we need more data?
[Figure: the same classifier pipeline, learned from past patients and applied to a new patient.]
Cancer is not just genetic: there are environmental contributors to cancer.
What about associations with other diseases and/or treatments?