Artificial Neural Networks - MIT OpenCourseWare · 2017-12-28
Artificial Neural Networks
Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg
Harvard-MIT Division of Health Sciences and Technology
HST.951J: Medical Decision Support
Knowledge

• textbook knowledge: verbal, rules → rule-based systems
• experience: non-verbal, patterns → pattern recognition
A real-life situation…
…and its abstraction
(f, 30, 1, 0, 67.8, 12.2, …)
(m, 52, 1, 1, 57.4, 8.9, …)
(m, 28, 1, 1, 51.1, 19.2, …)      → Model → (p)
(f, 46, 1, 1, 16.3, 9.5.2, …)
(m, 65, 1, 0, 56.1, 17.4, …)
(m, 38, 1, 0, 22.8, 19.2, …)
Another real-life situation
benign lesion malignant lesion
Example: Logistic regression
y = 1 / (1 + e^−(b1x1 + b2x2 + b0))
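As a sketch in Python, the model above maps two covariates to a probability; the coefficient values below are illustrative assumptions, since the slides give no numbers:

```python
import math

def logistic(x1, x2, b0, b1, b2):
    """Logistic regression output: a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b1 * x1 + b2 * x2 + b0)))

# With all coefficients zero the model is maximally uncertain:
p = logistic(0.0, 0.0, b0=0.0, b1=0.0, b2=0.0)  # p == 0.5
```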
So why use ANNs?
• Human brain is good at pattern recognition
• Mimic structure and processing of the brain:
  – Parallel processing
  – Distributed representation
• Expect:
  – Fault tolerance
  – Good generalization capability
  – More flexible than logistic regression
Overview
• Motivation
• Perceptrons
• Multilayer perceptrons
• Improving generalization
• Bayesian perspective
Terminology
ANN          statistics
input        covariate
output       dependent var.
weights      parameters
learning     estimation
ANN topology
Artificial neurons
Activation functions
Hyperplanes
• A vector w = (w1,…,wn) defines a hyperplane
• The hyperplane divides the n-space of points x = (x1,…,xn) into:
  – w1x1 + … + wnxn > 0
  – w1x1 + … + wnxn = 0 (the plane itself)
  – w1x1 + … + wnxn < 0
• Abbreviation: w · x := w1x1 + … + wnxn
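A minimal sketch of the dot-product test; the weight vector below is a hand-picked illustration, not from the slides:

```python
def dot(w, x):
    """w · x = w1*x1 + ... + wn*xn."""
    return sum(wi * xi for wi, xi in zip(w, x))

def side(w, x):
    """Which side of the hyperplane w · x = 0 the point x lies on."""
    s = dot(w, x)
    return 1 if s > 0 else (-1 if s < 0 else 0)

w = (1.0, -1.0)  # illustrative: the hyperplane x1 - x2 = 0
```

Points with x1 > x2 land on the positive side, points with x1 < x2 on the negative side.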
Linear separability
• Hyperplane through the origin: w · x = 0
• Bias w0 moves the hyperplane away from the origin:
  w · x + w0 = 0
Linear separability
• Convention: w := (w0, w), x := (1, x)
• Class labels ti ∈ {+1, −1}
• Error measure: E = −Σ ti (w · xi), summed over misclassified i
• How to minimize E?
Linear separability
Error measure E = −Σ ti (w · xi) ≥ 0, summed over misclassified i

A correct separation puts the +1 points in { x | w · x > 0 } and the −1 points in { x | w · x < 0 }.
Gradient descent
• Simple function minimization algorithm
• The gradient is the vector of partial derivatives
• The negative gradient is the direction of steepest descent
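The iteration can be sketched on a one-dimensional toy function; the function, step size, and step count are illustrative assumptions:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Iterate w <- w - eta * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Minimize E(w) = (w - 3)^2, whose gradient is 2*(w - 3); minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```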
Perceptron learning
• Find the minimum of E by iterating wk+1 = wk − η grad_w E
• E = −Σ ti (w · xi) over misclassified i ⇒ grad_w E = −Σ ti xi over misclassified i
• "Online" version: pick a misclassified xi and update wk+1 = wk + η ti xi
Perceptron learning
• Update rule wk+1 = wk + η ti xi
• Theorem: perceptron learning converges for linearly separable sets
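The update rule can be sketched as a training loop; the toy data set (an AND-like, linearly separable problem) and the learning rate are illustrative assumptions:

```python
def perceptron_train(data, eta=1.0, max_epochs=100):
    """data: list of (x, t) with x including the bias component 1, t in {+1, -1}."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(max_epochs):
        updated = False
        for x, t in data:
            if t * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # misclassified
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # every point classified correctly: converged
            break
    return w

# Linearly separable toy set (logical AND), x = (1, x1, x2):
data = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), 1)]
w = perceptron_train(data)
```

Because the set is linearly separable, the convergence theorem guarantees the loop terminates with all points on the correct side.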
From perceptrons to multilayer perceptrons
Why?
Multilayer perceptrons
• Sigmoidal hidden layer
• Can represent arbitrary decision regions
• Can be trained similarly to perceptrons
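A forward-pass sketch of such a network; the hand-picked weights are illustrative assumptions that realize XOR, a decision region no single perceptron can represent:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, W_hidden, w_out):
    """One sigmoidal hidden layer; bias units are prepended as constant 1."""
    x = [1.0] + list(x)
    h = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

# Hand-picked (illustrative) weights approximating XOR:
W_h = [[-5, 10, 10],    # hidden unit 1 ~ OR(x1, x2)
       [-15, 10, 10]]   # hidden unit 2 ~ AND(x1, x2)
w_o = [-5, 10, -10]     # output ~ OR AND NOT AND, i.e. XOR
```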
Decision theory
• Pattern recognition is not deterministic
• Needs the language of probability theory
• Given abstraction x, a model outputs the class posteriors P(C1|x) and P(C2|x)

Decide C1 if P(C1|x) > P(C2|x)
Some background math
• Have a data set D = {(xi, ti)} drawn from a probability distribution P(x, t)
• Model P(x, t) given the samples D by an ANN with adjustable parameters w
• Statistics analogy: parameter estimation
Some background math
• Maximize likelihood of data D
• Likelihood L = Π p(xi, ti) = Π p(ti|xi) p(xi)
• Minimize −log L = −Σ log p(ti|xi) − Σ log p(xi)
• Drop the second term: it does not depend on w
• Two cases: regression and classification
Likelihood for regression
• For regression, targets t are real values
• Minimize −Σ log p(ti|xi)
• Assume the targets ti are the network outputs y(xi, w) corrupted by Gaussian noise
• Minimizing −log L is then equivalent to minimizing Σ (y(xi, w) − ti)² (sum-of-squares error)
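The sum-of-squares error can be sketched directly; the fixed toy "network" y(x) = 2x and the data pairs are illustrative assumptions:

```python
def sum_of_squares(pairs, y):
    """E(w) = sum over i of (y(x_i) - t_i)^2."""
    return sum((y(x) - t) ** 2 for x, t in pairs)

# Toy check: (2-2)^2 + (4-3)^2 = 1
pairs = [(1.0, 2.0), (2.0, 3.0)]
E = sum_of_squares(pairs, lambda x: 2.0 * x)  # E == 1.0
```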
Likelihood for classification
• For classification, targets t are class labels, ti ∈ {0, 1}
• Minimize −Σ log p(ti|xi)
• Assume the network output y(xi, w) models P(C1|xi)
• Minimizing −log L is equivalent to minimizing −Σ [ti log y(xi, w) + (1 − ti) log(1 − y(xi, w))] (cross-entropy error)
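The cross-entropy error can be sketched the same way; the toy predictions are illustrative:

```python
import math

def cross_entropy(pairs, y):
    """E(w) = -sum [ t log y(x) + (1 - t) log(1 - y(x)) ], t in {0, 1}."""
    return -sum(t * math.log(y(x)) + (1 - t) * math.log(1 - y(x))
                for x, t in pairs)

# A confident correct prediction gives a small error,
# a confident wrong one a large error:
good = cross_entropy([(None, 1)], lambda x: 0.99)
bad = cross_entropy([(None, 1)], lambda x: 0.01)
```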
Backpropagation algorithm
• Minimize the error function by gradient descent: wk+1 = wk − η grad_w E
• Calculate the gradient iteratively by propagating error signals backwards through the network
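One backpropagation step for a single-hidden-layer network with cross-entropy error can be sketched as follows; the weight shapes, learning rate, and toy training pair are illustrative assumptions:

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_step(x, t, V, w, eta):
    """One gradient-descent step for a net with one sigmoidal hidden layer
    and a sigmoidal output, trained on cross-entropy error.
    x: input vector; t in {0, 1}; V: hidden weight rows (bias first);
    w: output weights (bias first). V and w are updated in place."""
    x = [1.0] + list(x)
    h = [1.0] + [sigmoid(sum(v * xi for v, xi in zip(row, x))) for row in V]
    y = sigmoid(sum(wi * hi for wi, hi in zip(w, h)))
    # For cross-entropy with a sigmoid output, the output error signal is y - t.
    d_out = y - t
    # Propagate the error signal back through the hidden sigmoids:
    d_hid = [d_out * w[j + 1] * h[j + 1] * (1.0 - h[j + 1]) for j in range(len(V))]
    for i in range(len(w)):
        w[i] -= eta * d_out * h[i]
    for j, row in enumerate(V):
        for i in range(len(row)):
            row[i] -= eta * d_hid[j] * x[i]
    return y

# Fit a single (illustrative) training pair; repeated steps drive y toward t = 1.
random.seed(0)
V = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w = [random.uniform(-1, 1) for _ in range(3)]
for _ in range(200):
    y = backprop_step([1.0, 0.0], 1, V, w, eta=0.5)
```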
Backpropagation algorithm
Problem: how to set the learning rate η?

Better: use more advanced minimization algorithms that exploit second-order (curvature) information
Backpropagation algorithm
Classification: cross-entropy error
Regression: sum-of-squares error
ANN output for regression
Mean of p(t|x)
ANN output for classification
P(t = 1|x)
Improving generalization
Problem: memorizing (x, t) combinations ("overtraining")

 x1     x2     t
 0.7    0.5    0
−0.5    0.9    1
−0.2   −1.2    1
 0.3    0.6    1
−0.2    0.5    ?
Improving generalization
• Need a test set to judge performance
• Goal: represent the information in the data set, not the noise
• How to improve generalization?
  – Limit network topology
  – Early stopping
  – Weight decay
Limit network topology
• Idea: fewer weights ⇒ less flexibility
• Analogy to polynomial interpolation: a polynomial with fewer coefficients gives a smoother fit and cannot follow the noise
Early stopping
• Idea: stop training when information (but not noise) is modeled
• Need validation set to determine when to stop training
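The stopping criterion can be sketched as a loop over epochs; the `patience` parameter and the stand-in validation curve are illustrative assumptions, not from the slides:

```python
def early_stopping(train_step, validation_error, max_epochs=1000, patience=10):
    """Train until the validation error stops improving.
    train_step(): one training epoch; validation_error(): current error."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        err = validation_error()
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best

# Stand-in validation curve that decreases, then rises (overtraining):
errs = iter([5.0, 3.0, 2.0, 1.5, 1.6, 1.8, 2.0] + [2.5] * 50)
best = early_stopping(lambda: None, lambda: next(errs))  # best == 1.5
```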
Weight decay
• Idea: control the smoothness of the network output by controlling the size of the weights
• Add the term α‖w‖² to the error function
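The penalized error can be sketched directly; the value of α and the toy weight vectors are illustrative assumptions:

```python
def penalized_error(E, w, alpha):
    """Weight-decay error function: E(w) + alpha * ||w||^2."""
    return E + alpha * sum(wi * wi for wi in w)

# Larger weights incur a larger penalty at the same data error:
small = penalized_error(1.0, [0.1, 0.1], alpha=0.01)  # 1.0002
large = penalized_error(1.0, [3.0, 3.0], alpha=0.01)  # 1.18
```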
Bayesian perspective
• Error function minimization corresponds to the maximum likelihood (ML) estimate: a single best solution wML
• Can lead to overtraining
• Bayesian approach: consider the weight posterior distribution p(w|D)
• Advantage: error bars for regression, averaged estimates for classification
Bayesian perspective
• Posterior = likelihood × prior
• p(w|D) = p(D|w) p(w) / p(D)
• Two approaches to approximating p(w|D):
  – Sampling
  – Gaussian approximation
Sampling from p(w|D)
prior × likelihood ∝ posterior
Gaussian approx. to p(w|D)
• Find the maximum wMAP of p(w|D)
• Approximate p(w|D) by a Gaussian centered at wMAP
• Fit the curvature of −log p(w|D) at wMAP
Gaussian approx. to p(w|D)
• Max p(w|D) = min −log p(w|D) = min [−log p(D|w) − log p(w)]
• Minimizing the first term finds the ML solution
• Minimizing the second term, for a zero-mean Gaussian prior p(w), adds the term α‖w‖²
• Therefore, adding weight decay amounts to finding the MAP solution!
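Written out, for a zero-mean Gaussian prior p(w) ∝ exp(−α‖w‖²) the MAP objective decomposes as:

```latex
-\log p(\mathbf{w}\mid D)
  = \underbrace{-\log p(D\mid \mathbf{w})}_{E(\mathbf{w})}
    \;\underbrace{-\log p(\mathbf{w})}_{\alpha\lVert\mathbf{w}\rVert^{2} + \text{const}}
```

so minimizing the negative log posterior is the same as minimizing E(w) + α‖w‖², i.e., training with weight decay.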
Bayesian example for regression
Bayesian example for classification
Summary
• ANNs are inspired by the functionality of the brain
• Nonlinear data model
• Trained by minimizing an error function
• Goal is to generalize well
• Avoid overtraining
• Distinguish ML and MAP solutions
Pointers to the literature

• Lisboa PJ. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw. 2002 Jan;15(1):11-39.
• Almeida JS. Predictive non-linear modeling of complex data by artificial neural networks. Curr Opin Biotechnol. 2002 Feb;13(1):72-6.
• Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med. 2001 Aug;23(1):89-109.
• Dayhoff JE, DeLeo JM. Artificial neural networks: opening the black box. Cancer. 2001 Apr 15;91(8 Suppl):1615-35.
• Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Methods. 2000 Dec 1;43(1):3-31.
• Bishop, CM. Neural Networks for Pattern Recognition. Oxford University Press 1995.