Machine Learning Lecture for Methodological Foundations of Biomedical Informatics Fall 2015 (BMSC-GA...

Post on 19-Jan-2016

Machine Learning

Lecture for Methodological Foundations of Biomedical Informatics

Fall 2015 (BMSC-GA 4449)

Sisi Ma

NYU Langone Medical Center

CHIBI

What type of problems can machine learning solve?

Real Estate

Artificial Intelligence

Retail Sales

Conservation

Climate

Currently active projects on Kaggle as of Oct. 26, 2015

What type of problems can machine learning solve?

Predominantly:

Classification

How to classify?

Main Ways to Classify:
- Unsupervised
- Supervised

Unsupervised Learning

Group similar items together

Comics credit: http://nlp.cs.berkeley.edu/comics.shtml

Unsupervised Learning

Since the definition of similarity is arbitrary, one can get different labeling solutions.

Unsupervised Learning

The solution depends on both: (1) which variables were used to construct the similarity metric and (2) how the similarity metric was constructed.

Lowe, 2012

Image Credit: https://en.wikipedia.org/wiki/Metric_(mathematics)
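A minimal sketch of how the choice of variables and of metric changes which items count as "similar". The points, the helper name `most_similar`, and the two metrics compared are illustrative assumptions, not taken from the lecture.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def most_similar(query, candidates, distance, variables=None):
    """Return the name of the candidate closest to `query`.

    `variables` optionally restricts the comparison to a subset of
    feature indices (hypothetical helper for illustration).
    """
    def project(p):
        return p if variables is None else tuple(p[i] for i in variables)

    return min(
        candidates,
        key=lambda name: distance(project(query), project(candidates[name])),
    )


query = (0.0, 0.0)
candidates = {"p1": (3.0, 3.0), "p2": (0.0, 5.0)}

# Same data, different metric: the nearest item changes.
by_euclidean = most_similar(query, candidates, euclidean)
by_manhattan = most_similar(query, candidates, manhattan)

# Same metric, different variables: the nearest item changes again.
by_first_var = most_similar(query, candidates, euclidean, variables=[0])
by_second_var = most_similar(query, candidates, euclidean, variables=[1])

print(by_euclidean, by_manhattan, by_first_var, by_second_var)
```

Any clustering built on top of these distances inherits the same sensitivity, which is why two reasonable analysts can arrive at different groupings of the same items.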

Unsupervised Learning

How do we know the solution is good?

It corresponds to something we care about.
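One way to check whether a clustering "corresponds to something we care about" is to compare the cluster labels against an external label that was not used for clustering. The sketch below uses the Rand index (the fraction of item pairs on which two partitions agree) as one possible agreement measure; the labels are made up for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both partitions treat the same way
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical external label, e.g. a clinical outcome per sample.
outcome = ["responder"] * 3 + ["non-responder"] * 3

# Two hypothetical clustering solutions for the same six samples.
clusters_good = [0, 0, 0, 1, 1, 1]
clusters_poor = [0, 1, 0, 1, 0, 1]

good_agreement = rand_index(clusters_good, outcome)
poor_agreement = rand_index(clusters_poor, outcome)
print(good_agreement, poor_agreement)
```

The cluster names themselves do not matter here, only which items are grouped together, so a relabeled but otherwise identical partition scores the same.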

Unsupervised Learning

Supervised Learning

Supervised Learning

Overfitting

Duda, 2ed

Supervised Learning

Overfitting

Image Credit: https://commons.wikimedia.org/wiki/File:Overfitting.svg

Supervised Learning

How do I know if I am overfitting?

Validation Data

Supervised Learning

How do I know if I am overfitting?

Duda, 2ed
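A minimal sketch of detecting overfitting with validation data, assuming a synthetic one-dimensional problem with 20% label noise and a k-nearest-neighbor classifier (both are illustrative choices, not the lecture's example). A 1-nearest-neighbor model memorizes the training set, so its training error is zero; the gap between training and validation error is the warning sign.

```python
import random


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def error_rate(train_x, train_y, xs, ys, k):
    """Fraction of points in (xs, ys) that the k-NN model misclassifies."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


rng = random.Random(0)


def make_data(n):
    """True rule: class 1 when x > 0, with about 20% of labels flipped."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [int(x > 0) ^ int(rng.random() < 0.2) for x in xs]
    return xs, ys


train_x, train_y = make_data(100)
val_x, val_y = make_data(100)

results = {}
for k in (1, 15):  # odd k avoids tied votes
    train_err = error_rate(train_x, train_y, train_x, train_y, k)
    val_err = error_rate(train_x, train_y, val_x, val_y, k)
    results[k] = (train_err, val_err)
    print(f"k={k:2d}  training error={train_err:.2f}  validation error={val_err:.2f}")
```

A training error far below the validation error, as is typical for k=1 on noisy labels, suggests the model has fit noise rather than the underlying rule.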

Supervised Learning

Support Vector Machine

Key Characteristics of SVM
• Maximum gap to prevent overfitting
• QP problems can be solved with standard methods.
• Soft margins to tolerate noise
• Kernel trick for linearly non-separable data

Statnikov et al., 2011
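The kernel trick mentioned above can be illustrated with a small numerical check. The sketch assumes a degree-2 polynomial kernel on two-dimensional inputs (one common choice, not necessarily the one used in the lecture) and shows that the kernel value equals an inner product in a higher-dimensional feature space, without ever constructing that space during training.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose inner product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x = (1.0, 2.0)
z = (3.0, -1.0)

kernel_value = poly2_kernel(x, z)                      # computed in 2-D
explicit_value = dot(feature_map(x), feature_map(z))   # computed in 3-D

print(kernel_value, explicit_value)
```

Because an SVM only needs inner products between samples, swapping the plain inner product for such a kernel lets a linear separator in the feature space act as a non-linear separator in the original space.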

Most modern algorithms have built-in mechanisms to minimize overfitting.

Predictive Modeling: A Simplified General Framework

Validation Data

Predictive Modeling: Cross Validation for error estimation and model selection

Ma et al., 2015 (in preparation)
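A minimal sketch of k-fold cross-validation used for model selection, assuming synthetic noisy data and a k-nearest-neighbor classifier whose neighborhood size is the quantity being selected. All names and settings are illustrative and do not reproduce the cited protocol.

```python
import random


def k_fold_indices(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def cv_error(xs, ys, k, folds):
    """Average held-out error over the folds for neighborhood size k."""
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        wrong = sum(knn_predict(tr_x, tr_y, xs[i], k) != ys[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / len(fold_errors)


rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [int(x > 0) ^ int(rng.random() < 0.2) for x in xs]  # about 20% label noise

folds = k_fold_indices(len(xs), 5, rng)
cv_errors = {k: cv_error(xs, ys, k, folds) for k in (1, 5, 15)}
best_k = min(cv_errors, key=cv_errors.get)
print(cv_errors, "selected k:", best_k)
```

The cross-validated error of the selected model is optimistically biased when the same folds are used both to choose the model and to report its error; an outer loop of held-out folds (nested cross-validation) is one common way to separate the two purposes.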

Machine Learning vs Statistics

Robert Tibshirani

Machine Learning vs Statistics

Robert Tibshirani

Machine Learning vs Statistics

Machine Learning Statistics

One major difference between machine learning and statistics: how is the model evaluated?

Machine Learning vs Statistics

What is a good model? According to most statisticians, especially in practice:

Most commonly evaluated by R-squared.

Breiman, 2001
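For reference, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below fits a least-squares line to a few made-up points and evaluates R-squared on the same data used for fitting, which is the in-sample style of evaluation described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, predictions):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative data, roughly y = 2x with small deviations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
r2 = r_squared(ys, predictions)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

A high in-sample R-squared says the model describes the data it was fitted to; it does not by itself say how well the model predicts new data.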

Machine Learning vs Statistics

Validation Data

What is a good model? According to machine learning researchers.

The Future

What’s the job?

Homework

Research the bias-variance decomposition and answer the following question from “An Introduction to Statistical Learning”.
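As a starting point, the decomposition as stated in “An Introduction to Statistical Learning” writes the expected test error at a point x_0 as the sum of three terms. The notation below assumes the usual setup y = f(x) + ε, with a fitted model written as f-hat and noise ε of mean zero.

```latex
\mathbb{E}\!\left[\big(y_0 - \hat{f}(x_0)\big)^2\right]
  = \operatorname{Var}\!\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\!\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

The first two terms depend on the learning method and can be traded against each other; the last term is the irreducible error.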

Resources