Machine Learning

SCE 5820: Machine Learning

Instructor: Jinbo Bi

Computer Science and Engineering Dept.

Course Information

Instructor: Dr. Jinbo Bi
– Office: ITEB 233
– Phone: 860-486-1458
– Email: jinbo@engr.uconn.edu
– Web: http://www.engr.uconn.edu/~jinbo/
– Time: Tue / Thur. 2:00pm – 3:15pm
– Location: BCH 302
– Office hours: Thur. 3:15-4:15pm

HuskyCT
– http://learn.uconn.edu
– Login with your NetID and password


Introduction of the instructor and TA

Ph.D. in Mathematics
Research interests: machine learning, data mining, optimization, biomedical informatics, bioinformatics

[Figure: research application examples: subtyping (cancer, psychiatric disorders, …), GWAS, color of flowers, EasyBreathing (http://labhealthinfo.uconn.edu/EasyBreathing)]

Course Information

Prerequisite: Basics of linear algebra, calculus, optimization and basics of programming

Course textbook (not required):

– Introduction to Data Mining (2005) by Pang-Ning Tan, Michael Steinbach, Vipin Kumar

– Pattern Recognition and Machine Learning (2006) by Christopher M. Bishop

– Pattern Classification (2nd edition, 2000) by Richard O. Duda, Peter E. Hart and David G. Stork

Additional class notes and copied materials will be given. Reading material links will be provided.

Objectives:

– Introduce students to the basic concepts of machine learning and to state-of-the-art machine learning algorithms

– Focus on some high-demand application domains, with hands-on experience applying data mining/machine learning techniques

Format:

– Lectures, Micro teaching assignment, Quizzes, A term project

Course Information

Grading

– Micro teaching assignment (1): 20%
– In-class/in-lab open-book, open-notes quizzes (4-5): 40%
– Term project (1): 30%
– Participation: 10%

There is one term project for each team. A team can consist of one or two students. Each student in the team needs to specify his/her role in the project.

Term projects can be chosen from a list of pre-defined projects

Policy

– Computers
– Participation in micro-teaching sessions is very important and itself accounts for 50% of the credit for the micro-teaching assignment.
– Quizzes are graded by the instructor.
– Final term projects will be graded by the instructor.
– If you miss two quizzes, there will be a take-home quiz to make up the credit (missing one may be OK for your final grade).

Micro-teaching sessions

Students in our class need to form THREE roughly-even study groups

The instructor will help to balance off the study groups

Each study group will be responsible for teaching one specific topic chosen from the following:
– Support Vector Machines
– Spectral Clustering
– Boosting (PAC learning model)

Term Project

Each team needs to give two presentations: a progress or preparation presentation (10-15min); a final presentation in the last week (15-20min)

Each team needs to submit a project report covering:
– Definition of the problem
– Data mining approaches used to solve the problem
– Computational results
– Conclusion (success or failure)

Machine Learning / Data Mining

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information

– http://www.kdd.org/kdd2013/ ACM SIGKDD conference

The ultimate goal of machine learning is the creation and understanding of machine intelligence

– http://icml.cc/2013/ ICML conference

The main goal of statistical learning theory is to provide a framework for studying the problem of inference, that is of gaining knowledge, making predictions, and decisions from a set of data.

– http://nips.cc/Conferences/2012/ NIPS conference

Traditional Topics in Data Mining /AI

Fuzzy sets and fuzzy logic
– Fuzzy if-then rules

Evolutionary computation
– Genetic algorithms
– Evolutionary strategies

Artificial neural networks
– Back propagation network (supervised learning)
– Self-organization network (unsupervised learning, will not be covered)

These methods lack theoretical analysis of the algorithms' behavior.

Traditional techniques may be unsuitable due to
– Enormity of data
– High dimensionality of data
– Heterogeneous, distributed nature of data

Challenges in traditional techniques

[Figure: Machine Learning/Pattern Recognition; Statistics/AI; Soft Computing]

Recent Topics in Data Mining

Unsupervised learning such as clustering
– K-means
– Gaussian mixture models
– Hierarchical clustering
– Graph-based clustering (spectral clustering)

Dimension reduction
– Feature selection
– Compacting the feature space into a low-dimensional space (principal component analysis)
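Principal component analysis can be sketched with NumPy through the SVD of the mean-centered data matrix: the top right singular vectors are the principal directions, and projecting onto them gives the low-dimensional representation. This is a minimal sketch under stated assumptions, not the course's reference implementation; the function name `pca` and the synthetic, nearly one-dimensional data are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X (n_samples x n_features) onto the top principal components."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, components


# Hypothetical data: 3-D points lying close to the line spanned by [1, 2, 3]
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(100, 3))

Z, components = pca(X, n_components=1)
print(Z.shape)
print(components)
```

Here three features are compacted into one, and the recovered direction should be close to [1, 2, 3] up to sign and scaling.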

Statistical Behavior

There are many perspectives for analyzing how an algorithm handles uncertainty.

Simple examples:
– Consistency analysis
– Learning bounds (an upper bound on the test error of the constructed model or solution)

“Statistical”, not “deterministic”
– With probability p, the upper bound holds:

$P(\text{test error} \le \text{Upper\_bound}) \ge p$

Tasks in Data Mining

Prediction tasks (supervised problems)
– Use some variables to predict unknown or future values of other variables.

Description tasks (unsupervised problems)
– Find human-interpretable patterns that describe the data.

From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996

Classification: Definition

Given a collection of examples (training set)
– Each example contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes.

Goal: previously unseen examples should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Classification Example

Training Set
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Refund and Marital Status are categorical; Taxable Income is continuous; Cheat is the class.)

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set → Learn Model → Classifier, which is then applied to the Test Set]
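The learn-then-apply flow above can be sketched with a k-nearest-neighbor classifier (one of the classification algorithms listed later), trained on the ten labeled records and applied to the six test records. This is a minimal sketch under assumptions: the mixed distance (0/1 mismatch on Refund and Marital Status plus the income difference divided by 100K), the choice k = 3, and the helper names `distance` and `knn_predict` are all hypothetical, not part of the slide.

```python
from collections import Counter

# Training set from the slide: (Refund, Marital Status, Taxable Income in K, Cheat)
TRAIN = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test set from the slide: the Cheat label is unknown
TEST = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def distance(a, b, income_scale=100.0):
    """Assumed mixed distance: 0/1 mismatch on each categorical attribute
    plus the absolute income difference divided by income_scale (in K)."""
    d = float(a[0] != b[0]) + float(a[1] != b[1])
    return d + abs(a[2] - b[2]) / income_scale


def knn_predict(query, train=TRAIN, k=3):
    """Label a (Refund, Marital Status, Income) tuple by majority vote of its k nearest neighbors."""
    nearest = sorted(train, key=lambda row: distance(query, row[:3]))[:k]
    votes = Counter(row[3] for row in nearest)
    return votes.most_common(1)[0][0]


for query in TEST:
    print(query, "->", knn_predict(query))
```

With an odd k and two classes, the majority vote cannot tie; a different distance or scale would change which neighbors are found.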

Classification: Application 1

High-Risk Patient Detection
– Goal: Predict if a patient will suffer a major complication after a surgical procedure
– Approach:
Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.).
Have expert medical professionals monitor the patients and label which patients had complications and which did not.
Learn a model for the class of after-surgery risk.
Use this model to detect potential high-risk patients for a particular surgical procedure.

Face recognition

– Goal: Predict the identity of a face image

– Approach:
Align all images to derive the features.
Model the class (identity) based on these features.

Cancer Detection

– Goal: To predict the class (cancer or normal) of a sample (person), based on microarray gene expression data

– Approach:
Use the expression levels of all genes as the features.
Label each example as cancer or normal.
Learn a model for the class of all samples.

Alzheimer's Disease Detection

– Goal: To predict the class (AD or normal) of a sample (person), based on neuroimaging data such as MRI and PET

– Approach:
Extract features from the neuroimages.
Label each example as AD or normal.
Learn a model for the class of all samples.

Reduced gray matter volume (colored areas) detected by MRI voxel-based morphometry in AD patients compared to normal healthy controls.

Regression

Predict the value of a real-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.

Extensively studied in the statistics and neural network fields. Find a model to predict the dependent variable as a function of the values of the independent variables.

Goal: previously unseen examples should be predicted as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

Regression application 1

Goal: predict the possible loss from a customer

Training Set (past transaction records, labeled)
Tid  Refund  Marital Status  Taxable Income  Loss
1    Yes     Single          125K            100
2    No      Married         100K            120
3    No      Single          70K             -200
4    Yes     Married         120K            -300
5    No      Divorced        95K             -400
6    No      Married         60K             -500
7    Yes     Divorced        220K            -190
8    No      Single          85K             300
9    No      Married         75K             -240
10   No      Single          90K             90

Test Set (current data; we want to use the model to predict the loss)
Refund  Marital Status  Taxable Income  Loss
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

(Refund and Marital Status are categorical; Taxable Income is continuous; Loss is the continuous target.)

[Figure: Training Set → Learn Model → Regressor, which is then applied to the Test Set]
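A minimal least-squares sketch (linear regression is one of the regression methods listed later) that fits the Loss column of the training records and predicts the loss for the test records. The numeric encoding of the categorical attributes and the helper name `encode` are assumptions for illustration; with only ten records, the fitted weights should not be read as meaningful.

```python
import numpy as np

# Training set from the slide: (Refund, Marital Status, Taxable Income in K, Loss)
TRAIN = [
    ("Yes", "Single", 125, 100),
    ("No", "Married", 100, 120),
    ("No", "Single", 70, -200),
    ("Yes", "Married", 120, -300),
    ("No", "Divorced", 95, -400),
    ("No", "Married", 60, -500),
    ("Yes", "Divorced", 220, -190),
    ("No", "Single", 85, 300),
    ("No", "Married", 75, -240),
    ("No", "Single", 90, 90),
]

# Test set from the slide: Loss is unknown
TEST = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def encode(refund, marital, income):
    """Assumed encoding: intercept, Refund indicator, one-hot Marital Status
    (Single is the baseline), and Taxable Income in K."""
    return [1.0, float(refund == "Yes"),
            float(marital == "Married"), float(marital == "Divorced"),
            float(income)]


X = np.array([encode(*row[:3]) for row in TRAIN])
y = np.array([row[3] for row in TRAIN], dtype=float)

# Ordinary least squares: w minimizes ||X w - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)

X_test = np.array([encode(*row) for row in TEST])
predictions = X_test @ w  # predicted Loss for each test customer
print(predictions)
```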

Regression applications

Examples:
– Predicting sales amounts of a new product based on advertising expenditure.
– Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
– Time series prediction of stock market indices.

Clustering Definition

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.

Similarity Measures:
– Euclidean distance if attributes are continuous.
– Other problem-specific measures.

Illustrating Clustering

Euclidean distance based clustering in 3-D space.

Intracluster distances are minimized.

Intercluster distances are maximized.
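K-means (listed under unsupervised learning) is one algorithm that pursues this goal by minimizing the within-cluster sum of squared Euclidean distances. Below is a minimal sketch of plain k-means (Lloyd's algorithm); the toy 3-D data, random initialization, seed, and iteration cap are assumptions for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points (kept if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups of points in 3-D
X = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
    [10.0, 10.0, 10.0], [10.1, 10.0, 10.0], [10.0, 10.1, 10.0],
])
labels, centers = kmeans(X, k=2)
print(labels)
```

K-means only finds a local optimum that depends on the initialization; on clearly separated groups like these it recovers the two groups.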

Clustering: Application 1

High-Risk Patient Detection
– Goal: Predict if a patient will suffer a major complication after a surgical procedure
– Approach:
Use patients' vital signs before and after the surgical operation (heart rate, respiratory rate, etc.).
Find patients whose symptoms are dissimilar from those of most other patients.

Clustering: Application 2

Document Clustering:
– Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.

– Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.

– Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
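The approach above can be sketched as term-frequency vectors compared with cosine similarity. Cosine similarity is one common choice and is assumed here; the stop-word list, the tokenizer, and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list standing in for the word filtering step
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}


def term_frequencies(text):
    """Count lowercase alphabetic terms, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two term-frequency vectors."""
    tf_a, tf_b = term_frequencies(doc_a), term_frequencies(doc_b)
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "The stock market fell as investors sold shares",
    "Investors bought shares and the stock market rose",
    "The team won the game in the final minute",
]
print(cosine_similarity(docs[0], docs[1]))  # financial vs financial: shared terms
print(cosine_similarity(docs[0], docs[2]))  # financial vs sports: no shared terms
```

A clustering algorithm can then group documents whose pairwise similarity is high.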

Illustrating Document Clustering

Clustering points: 3204 articles of the Los Angeles Times.
Similarity measure: how many words are common in these documents (after some word filtering).

Category       Total Articles  Correctly Placed
Financial      555             364
Foreign        341             260
National       273             36
Metro          943             746
Sports         738             573
Entertainment  354             278

Algorithms to solve these problems

Classification algorithms

– K-Nearest-Neighbor classifiers
– Naïve Bayes classifier
– Neural Networks
– Linear Discriminant Analysis (LDA)
– Support Vector Machines (SVM)
– Decision Trees
– Logistic Regression
– Graphical models

Regression methods

– Linear Regression
– Ridge Regression
– LASSO (Least Absolute Shrinkage and Selection Operator)
– Neural Networks

Clustering algorithms

– K-Means
– Hierarchical clustering
– Graph-based clustering (spectral clustering)
– Semi-supervised clustering
– Others

Challenges of Data Mining

– Scalability
– Dimensionality
– Complex and heterogeneous data
– Data quality
– Data ownership and distribution
– Privacy preservation

Basics of probability

An experiment (random variable) is a well-defined process with observable outcomes.

The set or collection of all outcomes of an experiment is called the sample space, S.

An event E is any subset of outcomes from S.

If all outcomes in S are equally likely, the probability of an event E is P(E) = number of outcomes in E / number of outcomes in S.
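The counting definition can be checked by enumerating a sample space. The two-dice experiment below is a hypothetical example, not from the slides; the counting formula applies only because every ordered outcome of two fair dice is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of rolling two fair six-sided dice
S = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7
E = [outcome for outcome in S if sum(outcome) == 7]

# P(E) = |E| / |S|, valid because the outcomes are equally likely
p_E = Fraction(len(E), len(S))
print(p_E)  # 1/6
```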

Probability Theory

Apples and Oranges

Assume P(Y=r) = 40%, P(Y=b) = 60% (prior)
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%

X: identity of the fruit (a: apple, o: orange); Y: identity of the box (r: red, b: blue)

Marginal: P(X=a) = 11/20, P(X=o) = 9/20
Posterior: P(Y=r|X=o) = 2/3, P(Y=b|X=o) = 1/3

Probability Theory

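Marginal Probability: $p(X) = \sum_{Y} p(X, Y)$

Conditional Probability: $p(Y \mid X) = \dfrac{p(X, Y)}{p(X)}$

Joint Probability: $p(X, Y)$, the probability that X and Y take given values together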

Probability Theory

Sum Rule: $p(X) = \sum_{Y} p(X, Y)$

• Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$

The marginal probability of X equals the sum of the joint probability of X and Y over all values of Y.

The joint probability of X and Y equals the product of the conditional probability of Y given X and the probability of X.

Illustration

[Figure: histograms of the joint distribution p(X,Y) and the conditional distribution p(X|Y=1)]

The Rules of Probability

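Sum Rule: $p(X) = \sum_{Y} p(X, Y)$

Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$

Bayes’ Rule: $p(Y \mid X) = \dfrac{p(X \mid Y)\, p(Y)}{p(X)}$, where $p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$

posterior ∝ likelihood × prior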

Application of Prob Rules

p(X=a) = p(X=a,Y=r) + p(X=a,Y=b) = p(X=a|Y=r)p(Y=r) + p(X=a|Y=b)p(Y=b) = 0.25*0.4 + 0.75*0.6 = 11/20

Similarly, p(X=o) = 0.75*0.4 + 0.25*0.6 = 9/20

p(Y=r|X=o) = p(Y=r,X=o)/p(X=o) = p(X=o|Y=r)p(Y=r)/p(X=o) = 0.75*0.4 / (9/20) = 2/3

Assume P(Y=r) = 40%, P(Y=b) = 60%
P(X=a|Y=r) = 2/8 = 25%, P(X=o|Y=r) = 6/8 = 75%
P(X=a|Y=b) = 3/4 = 75%, P(X=o|Y=b) = 1/4 = 25%
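The same calculation can be reproduced with exact fractions, applying the sum rule, product rule, and Bayes' rule to the probabilities on this slide. The dictionary layout and the function names `marginal` and `posterior` are illustrative choices.

```python
from fractions import Fraction as F

# Prior over boxes and per-box fruit likelihoods, from the slide
p_box = {"r": F(4, 10), "b": F(6, 10)}
p_fruit_given_box = {
    "r": {"a": F(2, 8), "o": F(6, 8)},
    "b": {"a": F(3, 4), "o": F(1, 4)},
}


def marginal(fruit):
    """Sum rule + product rule: p(X=fruit) = sum over Y of p(X=fruit | Y) p(Y)."""
    return sum(p_fruit_given_box[box][fruit] * p_box[box] for box in p_box)


def posterior(box, fruit):
    """Bayes' rule: p(Y=box | X=fruit) = p(X=fruit | Y=box) p(Y=box) / p(X=fruit)."""
    return p_fruit_given_box[box][fruit] * p_box[box] / marginal(fruit)


print(marginal("a"), marginal("o"))              # 11/20 9/20
print(posterior("r", "o"), posterior("b", "o"))  # 2/3 1/3
```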


Mean and Variance

The mean of a random variable X is the probability-weighted average of the values X takes: $E[X] = \sum_{x} x\, P(X=x)$.

The variance of X measures how dispersed the values of X are around the mean: $\mathrm{Var}[X] = E[(X - E[X])^2]$.

The standard deviation is simply the square root of the variance.

Simple Example

X ∈ {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2

Mean: 0.8 × 1 + 0.2 × 2 = 1.2

Variance: 0.8 × (1 − 1.2)² + 0.2 × (2 − 1.2)² = 0.032 + 0.128 = 0.16, so the standard deviation is 0.4
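The simple example can be checked numerically by summing over the distribution; this is a short illustrative sketch, and floating-point results may differ from 1.2 and 0.16 in the last digits.

```python
import math

# Distribution from the slide: P(X=1) = 0.8, P(X=2) = 0.2
dist = {1: 0.8, 2: 0.2}

mean = sum(x * p for x, p in dist.items())                  # E[X]
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())  # E[(X - E[X])^2]
std = math.sqrt(variance)
print(mean, variance, std)
```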

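The Gaussian Distribution

$\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$

Gaussian Mean and Variance

$E[x] = \mu$, $\mathrm{var}[x] = \sigma^2$

The Multivariate Gaussian

$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \dfrac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$ for a D-dimensional $\mathbf{x}$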
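The densities above can be sketched in NumPy, with a numerical check that a univariate Gaussian integrates to 1 and has mean μ and variance σ². The function names, the parameters μ = 1 and σ² = 4, and the grid are assumptions for illustration; the check is a simple Riemann sum, not an exact integral.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)


def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma) for a D-dimensional x."""
    D = len(mu)
    diff = x - mu
    # Solve Sigma z = diff rather than forming Sigma^{-1} explicitly
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2.0 * np.pi) ** (D / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const


# Riemann-sum check of normalization, E[x] = mu, and var[x] = sigma^2
mu, sigma2 = 1.0, 4.0
xs = np.linspace(mu - 20.0, mu + 20.0, 200001)
dx = xs[1] - xs[0]
p = gaussian_pdf(xs, mu, sigma2)
total = np.sum(p) * dx
mean_est = np.sum(xs * p) * dx
var_est = np.sum((xs - mu) ** 2 * p) * dx
print(total, mean_est, var_est)
```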

References

SC_prob_basics1.pdf (necessary) SC_prob_basic2.pdf

Loaded to HuskyCT

Basics of Linear Algebra

Matrix Multiplication

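The product of two matrices: for $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, $C = AB \in \mathbb{R}^{m \times p}$ with $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

Special cases: vector-vector product ($x^T y = \sum_i x_i y_i$), matrix-vector product ($(Ax)_i = \sum_j a_{ij} x_j$)

Matrix Multiplication

Rules of Matrix Multiplication
– $(AB)C = A(BC)$
– $A(B + C) = AB + AC$
– $(AB)^T = B^T A^T$
– In general, $AB \ne BA$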
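The definition $c_{ij} = \sum_k a_{ik} b_{kj}$ can be written directly as a triple loop and compared against NumPy's `@` operator, which also lets us check the rules numerically. The function name `matmul` and the random test matrices are illustrative; in practice, use `@`.

```python
import numpy as np


def matmul(A, B):
    """(AB)_ij = sum_k A_ik B_kj for A of size m x n and B of size n x p."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C


rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

print(np.allclose(matmul(A, B), A @ B))                                # agrees with NumPy
print(np.allclose(matmul(matmul(A, B), C), matmul(A, matmul(B, C))))  # (AB)C = A(BC)
print(np.allclose(matmul(A, B).T, matmul(B.T, A.T)))                  # (AB)^T = B^T A^T

# Non-commutativity: a small pair of 2 x 2 matrices with AB != BA
P = np.array([[0.0, 1.0], [0.0, 0.0]])
Q = np.array([[0.0, 0.0], [1.0, 0.0]])
print(matmul(P, Q))
print(matmul(Q, P))
```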

Orthogonal Matrix

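The columns of $V \in \mathbb{R}^{m \times n}$ are orthonormal if and only if $V^T V = I$ (I is the identity matrix).

A square matrix V is orthogonal if and only if $V^T V = V V^T = I$, i.e., $V^{-1} = V^T$.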
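A quick numerical illustration: a 2 × 2 rotation matrix is a standard example of an orthogonal matrix, while a tall matrix with orthonormal columns (taken here from NumPy's reduced QR factorization) satisfies $V^T V = I$ but not $V V^T = I$. The angle, seed, and matrix sizes are arbitrary choices.

```python
import numpy as np

theta = 0.3
# Rotation matrix: square and orthogonal
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
print(np.allclose(V.T @ V, np.eye(2)), np.allclose(np.linalg.inv(V), V.T))

# Tall matrix with orthonormal columns from a reduced QR factorization (5 x 3)
rng = np.random.default_rng(0)
Qm, _ = np.linalg.qr(rng.normal(size=(5, 3)))
print(np.allclose(Qm.T @ Qm, np.eye(3)))  # Q^T Q = I: columns are orthonormal
print(np.allclose(Qm @ Qm.T, np.eye(5)))  # Q Q^T != I: Q is not square, so not orthogonal
```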

Square Matrix – EigenValue, EigenVector

$(\lambda, x)$ is an eigen-pair of A if and only if $Ax = \lambda x$ (with $x \ne 0$)
– λ is the eigenvalue
– x is the eigenvector

Symmetric Matrix – EigenValue EigenVector

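A is symmetric if $A = A^T$.

A is symmetric and positive definite if $x^T A x > 0$ for any nonzero x; equivalently, $\lambda_i > 0$, $i = 1, \ldots, n$.

A is symmetric and positive semi-definite if $x^T A x \ge 0$ for any x; equivalently, $\lambda_i \ge 0$, $i = 1, \ldots, n$.

Eigen-decomposition of A: $A = U \Lambda U^T$, where U is orthogonal (its columns are the eigenvectors of A) and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.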
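These facts can be checked with `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order with orthonormal eigenvectors as columns. The matrix $B^T B + 0.1 I$ is a hypothetical example that is symmetric positive definite by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B.T @ B + 0.1 * np.eye(4)  # symmetric positive definite by construction

lam, U = np.linalg.eigh(A)  # eigenvalues (ascending) and eigenvectors (columns of U)

print(np.allclose(A @ U, U * lam))                # A x = lambda x for every column
print(np.allclose(U @ np.diag(lam) @ U.T, A))     # A = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(4)))            # U is orthogonal
print(np.all(lam > 0))                            # positive definite: all eigenvalues > 0

x = rng.normal(size=4)
print(x @ A @ x > 0)                              # x^T A x > 0 for a nonzero x
```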

Matrix Norms and Trace

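Matrix norm:
– 1-norm: $\|A\|_1 = \max_j \sum_i |a_{ij}|$
– 2-norm: $\|A\|_2$ is the square root of the largest eigenvalue of $A^T A$
– F-norm (Frobenius norm): $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\mathrm{trace}(A^T A)}$

Trace: $\mathrm{trace}(A) = \sum_{i=1}^{n} a_{ii}$ for a square matrix A of size n by n.
– $\mathrm{trace}(A) = \mathrm{trace}(A^T)$, $\mathrm{trace}(AB) = \mathrm{trace}(BA)$
– $\|UA\|_2 = \|A\|_2$ and $\|UA\|_F = \|A\|_F$ if U has orthonormal columns.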
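The norm and trace definitions can be computed directly and compared with `numpy.linalg.norm`, whose `ord=1` is the maximum absolute column sum, `ord=2` the largest singular value, and `"fro"` the Frobenius norm. The example matrices are arbitrary, and the orthonormal-column matrix comes from a QR factorization.

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0], [0.0, 5.0]])  # 3 x 2

one_norm = np.abs(A).sum(axis=0).max()                  # max absolute column sum
two_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())  # sqrt of largest eigenvalue of A^T A
fro_norm = np.sqrt((A ** 2).sum())                      # sqrt of the sum of squared entries

print(one_norm, np.linalg.norm(A, 1))
print(two_norm, np.linalg.norm(A, 2))
print(fro_norm, np.linalg.norm(A, "fro"), np.sqrt(np.trace(A.T @ A)))

# trace(AB) = trace(BA), even when AB (3 x 3) and BA (2 x 2) differ in size
B = np.array([[2.0, 0.0, 1.0], [1.0, -1.0, 3.0]])  # 2 x 3
print(np.trace(A @ B), np.trace(B @ A))

# Multiplying by U with orthonormal columns leaves the 2-norm and F-norm unchanged
U, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(5, 3)))  # 5 x 3
print(np.linalg.norm(U @ A, 2), np.linalg.norm(U @ A, "fro"))
```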

Singular Value Decomposition

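Singular Value Decomposition (SVD): $A = U \Sigma V^T$, where $A \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{R}^{m \times n}$ is diagonal with $r = \min(m, n)$.
– $A A^T = U \Sigma \Sigma^T U^T$: U forms the eigenvectors of $A A^T$.
– $A^T A = V \Sigma^T \Sigma V^T$: V forms the eigenvectors of $A^T A$.

[Figure: A (m × n) = U (m × m, orthogonal) × Σ (m × n, diagonal) × V^T (n × n, orthogonal)]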
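The factorization and its link to the eigenvectors of $A^T A$ and $A A^T$ can be verified with `numpy.linalg.svd`, which returns U, the singular values, and $V^T$. The matrix size and seed are arbitrary; Σ is rebuilt as an m × n array from the returned singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))

# Full SVD: U is m x m, Vt is n x n, s holds the r = min(m, n) singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = min(m, n)
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(s)
V = Vt.T

print(np.allclose(U @ Sigma @ Vt, A))                   # A = U Sigma V^T
print(np.allclose(A.T @ A @ V, V @ (Sigma.T @ Sigma)))  # columns of V: eigenvectors of A^T A
print(np.allclose(A @ A.T @ U, U @ (Sigma @ Sigma.T)))  # columns of U: eigenvectors of A A^T
```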

References

SC_linearAlg_basics.pdf (necessary) SVD_basics.pdf

loaded to HuskyCT

Summary

This is the end of the FIRST chapter of this course

Next Class

Cluster analysis
– General topics
– K-means

Slides after this one are backup slides; you can also check them to learn more.
