Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine...
Transcript of Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine...
![Page 1: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/1.jpg)
||
Francesca GrisoniUniversity of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milan, ItalyETH Zurich, Dept. of Chemistry and Applied Biosciences, Zurich, [email protected]
17.05.2017 1
Machine Learning for ChemoinformaticsAn introduction
© F. Grisoni, BigChem online course
![Page 2: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/2.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 2
Presentation Outline
• Introduction• Definition• Elements of Machine Learning
• Additional Considerations• The NFL theorem• Validation• Applicability
• Machine Learning approaches: some examples• Local methods• Tree-like approaches• Neural Networks
![Page 3: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/3.jpg)
||
Machine learning in chemoinformatics
17.05.2017© F. Grisoni, BigChem online course 3
P = f ( )
• Biological activity prediction
• Toxicity
• Physico-chemical properties
• Multi-objective optimization
• Rational drug design
Introduction
![Page 4: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/4.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 4
Machine Learning (ML)
“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”1
1 Samuel, A. L. (1959). “Some studies in machine learning using the game of checkers”. IBM Journal of research and development, 3(3), 210-229.2 Mitchell, T. M. (1997). Machine learning. 1997. Burr Ridge, IL: McGraw Hill, 45(37), 870-877.
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”2
https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer(1) Data (2) Task
(3) Performance
Introduction
![Page 5: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/5.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 5
(1) The data and the “G-I-G-O” principle
X (n x p)n
sam
ples
p variables
X
Machine Learning Elements
f ( )
f ( )0.1, 1, 0, 3, 3.5, 2, …
![Page 6: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/6.jpg)
||
P =
P =
17.05.2017© F. Grisoni, BigChem online course 6
(1) The data and the “G-I-G-O” principle
X (n x p)n
sam
ples
p variables
X
Machine Learning Elements
p'
nsa
mpl
es
Y (n x 1)
Y
f ( )
f ( )0.1, 1, 0, 3, 3.5, 2, …
![Page 7: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/7.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 7
(1) The data and the “G-I-G-O” principle
Garbage In = Garbage OutX (n x p)n
sam
ples
p variables
X
Machine Learning Elements
p'
nsa
mpl
es
Y (n x 1)
Y
• Structures• Experimental Responses
![Page 8: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/8.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 8
(2) Machine Learning Tasks
x1
x2
Unsupervised Learning
X (n x p)
nsa
mpl
es
p variables
X
Machine Learning Elements
![Page 9: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/9.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 9
(2) Machine Learning Tasks
x1
x2
Unsupervised Learning
X (n x p)
nsa
mpl
es
p variables
X
Machine Learning Elements
![Page 10: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/10.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 10
(2) Machine Learning Tasks
x1
x2
X (n x p)
nsa
mpl
es
p variables
X
Supervised Learning
Machine Learning Elements
![Page 11: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/11.jpg)
||
p'
17.05.2017© F. Grisoni, BigChem online course 11
(2) Machine Learning Tasks
x1
x2
Supervised Learning
X (n x p) +
nsa
mpl
es
p variables
Y (n x 1)
nsa
mpl
es
X Y
Machine Learning Elements
![Page 12: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/12.jpg)
||
p'
17.05.2017© F. Grisoni, BigChem online course 12
(2) Machine Learning Tasks
x1
x2
Supervised Learning
X (n x p) +
nsa
mpl
es
p variables
Y (n x 1)
nsa
mpl
es
X Y
Machine Learning Elements
![Page 13: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/13.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 13
(2) Machine Learning Tasks
Machine Learning Elements
![Page 14: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/14.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 14
Classification
Regression
(2) Machine Learning Tasks
Machine Learning Elements
![Page 15: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/15.jpg)
||
• Sensitivity or True Positive rate (TPR)
• Specificity or True Negative rate (TNR)
• Precision
17.05.2017© F. Grisoni, BigChem online course 15
N
P
N P
TN
TP
FP
FN
Single class
(3) PerformanceClassification
Machine Learning Elements
![Page 16: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/16.jpg)
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 16
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
![Page 17: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/17.jpg)
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 17
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
Snp = 0/10 = 0%Snp = 990/990 = 100%
NER = 50%Acc = 99%
![Page 18: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/18.jpg)
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 18
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
![Page 19: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/19.jpg)
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 19
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
![Page 20: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/20.jpg)
||
• Non-Error Rate or Balanced-Accuracy ϵ [0,1]
• Matthews Correlation Coefficient (MCC) ϵ [-1,1]
• Accuracy ϵ [0,1]
17.05.2017© F. Grisoni, BigChem online course 20
Global Performance
(3) PerformanceClassification
N
P
N P
TN
TP
FP
FN
Machine Learning Elements
N = 990; P = 10TN = 990 (100%)TP = 0 (0%)
SnP = 0/10 = 0%SnN = 990/990 = 100%
NER = 50%Acc = 99%
![Page 21: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/21.jpg)
||
• Root Mean Squared Error in Prediction (RMSEP)
17.05.2017© F. Grisoni, BigChem online course 21
𝒚 𝒚#
Real Pred.
(3) PerformanceRegression
Machine Learning Elements
![Page 22: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/22.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 22
Additional Considerations
Considerations on ML
![Page 23: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/23.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 23
Additional Considerations
Considerations on ML
1. Choice of the learner
“For every learner, there exists a task on which it fails, even though that task can be successfully learned by another learner.”1
1 Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.
“No Free Lunch” Theorem:
![Page 24: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/24.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 24
Additional Considerations
Considerations on ML
2. Bias-Variance Trade-off
Complexity
Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)
![Page 25: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/25.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 25
Additional Considerations
Considerations on ML
2. Bias-Variance Trade-off
Complexity
Error • Bias à generalization (underfitting)• Variance à descriptive ability (overfitting)
![Page 26: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/26.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 26
group 1
group 2
group 4
Initial dataset
Considerations on ML
Additional Considerations3. Validation
![Page 27: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/27.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 27
group 1
group 2
group 4
Initial dataset
Training set Test set
Considerations on ML
Additional Considerations3. Validation
![Page 28: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/28.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 28
group 1
group 2
group 4
Initial dataset
Training set Test set
Training set Test setValidation set
Considerations on ML
Additional Considerations3. Validation
![Page 29: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/29.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 29
group 1
group 2
group 3
group 4
group 5
Validation set
Training set
Considerations on ML
Additional Considerations3. Validation
![Page 30: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/30.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 30
Applicability Domain: Chemical space where the property can be reliably predicted
Machine learning models à Reductionist
• Types of chemical structures • Physicochemical properties • Mechanisms of action considered
Considerations on ML
Additional Considerations4. Applicability
The “No Free Dessert (either!)” theorem
![Page 31: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/31.jpg)
||
T -1H=X (X X) X× ×
17.05.2017© F. Grisoni, BigChem online course 31
Man
1
p
xy j jj
D x y=
= -å1 1
2 2
min( ), max( )min( ), max( )
x xx x
Considerations on ML
Additional Considerations4. Applicability
![Page 32: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/32.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 32
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
![Page 33: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/33.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 33
Informationextraction
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
![Page 34: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/34.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 34
Applicability & predictivity Informationextraction
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
![Page 35: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/35.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 35
Applicability & predictivity Informationextraction
Application & new knowledge
Standard Machine Learning workflow (in chemoinformatics)Considerations on ML
![Page 36: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/36.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 36
Machine Learning methods (overview)
1. Decision Tree-based learning
• Decision Trees
• Random Forest
2. Local Methods
• k-means algorithm
• k-NN algorithm
3. Artificial Neural Networks
• Feed-Forward NN
• Kohonen Maps
![Page 37: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/37.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 37
(1) Decision Tree Learning
Root node
Decisionnode(s)
Leaves
![Page 38: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/38.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 38
Root node
Decisionnode(s)
Leaves
1. Easy to interpret
2. No data pretreatment
3. Numerical/categorical variables
4. Classification and regression
5. Non parametric
6. Automatic variable selection
(1) Decision Tree Learning
![Page 39: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/39.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 39
(1) Decision Tree Learning
Bagging (Bootstrap Aggregating) = “the power of the crowd”Random Forest
![Page 40: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/40.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 40
(2) Local approachesk-Means clustering
x1
x2
![Page 41: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/41.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 41
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
![Page 42: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/42.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 42
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
![Page 43: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/43.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 43
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
![Page 44: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/44.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 44
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
![Page 45: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/45.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 45
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
![Page 46: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/46.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 46
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
![Page 47: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/47.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 47
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
![Page 48: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/48.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 48
(2) Local approachesk-Means clustering
x1
x21. Select a k (3)
2. Random assignment
3. Centroid calculation
4. Closest centroid
5. End
![Page 49: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/49.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 49
(2) Local approachesk-Nearest Neighbor (kNN)
?
![Page 50: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/50.jpg)
||
1. Calculate a distance (ntrain times)
17.05.2017© F. Grisoni, BigChem online course 50
(2) Local approachesk-Nearest Neighbor (kNN)
?
![Page 51: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/51.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 51
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
![Page 52: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/52.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 52
(2) Local approachesk-Nearest Neighbor (kNN)
?
k = 11. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
![Page 53: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/53.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 53
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 2
![Page 54: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/54.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 54
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 3
![Page 55: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/55.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 55
(2) Local approachesk-Nearest Neighbor (kNN)
?
1. Calculate a distance (ntrain times)
2. Select a number of neighbors (k) to predict the response
k = 4
![Page 56: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/56.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 56
(2) Local approachesk-Nearest Neighbor (kNN)
1. Good for large training set with “localized” differences
2. Difficult to be interpreted
3. Which k?
4. Which distance measure?
5. Curse of dimensionality à Variable selection
![Page 57: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/57.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 57
(3) Neural NetworksArtificial Neurons
x1
f (x)
x2
x3
x4
xp
y
Inputs
Output
Activation Function
![Page 58: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/58.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 58
(3) Neural NetworksArtificial Neurons
x1
f (x)
x2
x3
x4
xp
y
Inputs
Output
Activation Function Neural networks. Comprehensive Chemometrics: Chemical and Biochemical Data Analysis Vol, 3.
![Page 59: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/59.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 59
(3) Neural Networks
Input Layer
Hidden Layer(s)
Output Layer
1. Untrained Network
2. Compute the outcome
3. Compute the error
4. Back propagation learning
5. Repeat until stop criterion
Feed-Forward NN
![Page 60: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/60.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 60
(3) Neural Networks
Epochs
Error
Training set
Feed-Forward NN
![Page 61: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/61.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 61
(3) Neural Networks
Epochs
ErrorValidation set
Training set
Feed-Forward NN
![Page 62: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/62.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 62
(3) Neural Networks
Epochs
ErrorValidation set
Training set
Feed-Forward NN
![Page 63: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/63.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 63
(3) Neural NetworksKohonen Maps
• Unsupervised non-linear mapping• Topology preserving map
p dimensional
2 dimensional
![Page 64: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/64.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 64
(3) Neural NetworksKohonen Maps
…
Neurons
InputKohonenLayer
Weights
1. Competitive Learning
• Similarity to each neuron
• “Winner takes all”
2. Collaborative Learning• Winning neuron update
• Update of close neurons
![Page 65: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/65.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 65
(3) Neural NetworksKohonen Maps
Top map (compounds)
![Page 66: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/66.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 66
(3) Neural NetworksKohonen Maps
Top map (compounds)
Weight maps (p)
![Page 67: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/67.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 67
(3) Neural NetworksKohonen Maps
Top map (compounds)
Weight maps (p)
![Page 68: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/68.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 68
• Purpose (clustering, regression, classification)
• Performance vs interpretability
• Covered chemical space (e.g., AD)
• Types of included variables
• …
http://scikit-learn.org/stable/tutorial/machine_learning_map/
Which ML algorithm?
![Page 69: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/69.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 69
Summary
• Machines can learn from our data
• No ML algorithm always outperforms the others
• Validation and Applicability Domain assessment are crucial
• Pay attention to what the performance metric is telling you!
![Page 70: Machine Learning for Chemoinformaticsbigchem.eu/sites/default/files/Online14_Grisoni.pdf · Machine Learning (ML) “Machine Learning is the field of study that gives computers the](https://reader036.fdocuments.in/reader036/viewer/2022071219/60541be34bfac31d9f53e97f/html5/thumbnails/70.jpg)
|| 17.05.2017© F. Grisoni, BigChem online course 70
Supplementary reading
Theory and Algorithms
• Shalev-Shwartz, S., & Ben-David, S. (2014). “Understanding machine learning: From theory to algorithms.” Cambridge university press.
• Marini, F. (2009). “Neural networks”. In: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis - Vol, 3.
Online resources
• [Coursera] Ng, A. Machine Learning, Stanford University. https://www.coursera.org/learn/machine-learning• [Online Book] Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/