
Part I: Classifier Performance

Mahesan Niranjan

Department of Computer Science, The University of Sheffield

[email protected]

& Cambridge Bioinformatics Limited

[email protected]


Relevant Reading

• Bishop, Neural Networks for Pattern Recognition

• http://www.ncrg.aston.ac.uk/netlab

• David Hand, Construction and Assessment of Classification Rules

• Lovell et al., CUED/F-INFENG/TR.299

• Scott et al., CUED/F-INFENG/TR.323

reports linked from http://www.dcs.shef.ac.uk/~niranjan


Pattern Recognition Framework


Two Approaches to Pattern Recognition

• Probabilistic: explicitly model the probabilities encountered in Bayes’ formula

• Discriminant-based: assume a parametric form for the class boundary and optimise it

• In some specific cases (often not) both reduce to the same answer
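For a class $C_k$ and feature vector $x$, the probabilities in question combine through Bayes’ formula:

$$P(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{\sum_j p(x \mid C_j)\, P(C_j)}$$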


Pattern Recognition: Simple case

Gaussian class-conditional distributions, isotropic with equal variances.

Optimal classifier:

• Distance to mean

• Linear class boundary
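Concretely, for means $\mu_1, \mu_2$ and a common isotropic variance, assigning $x$ to the nearer mean gives a boundary that is linear in $x$:

$$\|x - \mu_1\|^2 < \|x - \mu_2\|^2 \;\Longleftrightarrow\; (\mu_1 - \mu_2)^t x > \tfrac{1}{2}\left(\|\mu_1\|^2 - \|\mu_2\|^2\right)$$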


Distance can be misleading


Mahalanobis Distance

The optimal classifier for this case is the Fisher linear discriminant.
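In standard form, with $\Sigma$ the covariance and $\Sigma_W$ the within-class covariance, the Mahalanobis distance and the Fisher discriminant direction are:

$$d_M(x, \mu) = \sqrt{(x - \mu)^t\, \Sigma^{-1}\, (x - \mu)}, \qquad w \propto \Sigma_W^{-1}(\mu_1 - \mu_2)$$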


Support Vector Machines: Maximum Margin Perceptron

[Figure: two classes, marked X and O, separated by a maximum-margin linear boundary]

Support Vector Machines: Nonlinear Kernel Functions

[Figure: classes X and O that are not linearly separable in the input space; a nonlinear kernel gives a curved class boundary]

Support Vector Machines: Computations

• Quadratic Programming

• Class boundary defined only by data that lie close to it - support vectors

• Kernels in the data space correspond to scalar products in a higher-dimensional feature space

Dual quadratic program: maximise $\sum_i \alpha_i - \tfrac{1}{2}\,\boldsymbol{\alpha}^t A\, \boldsymbol{\alpha}$, where $A_{ij} = y_i y_j K(x_i, x_j)$, subject to the box constraints $0 \le \alpha_i \le C$.
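A minimal sketch of fitting a kernel SVM, assuming scikit-learn’s SVC (any QP-based solver would do; the data are illustrative). Note that the box constraint surfaces as the C parameter:

# Kernel SVM sketch; data and parameters are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),   # class 0
               rng.normal(+1.0, 1.0, (50, 2))])  # class 1
y = np.hstack([np.zeros(50), np.ones(50)])

# RBF kernel; C bounds the multipliers (0 <= alpha_i <= C).
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# The class boundary is defined only by the support vectors.
print("support vectors:", len(clf.support_vectors_), "of", len(X))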


Support Vector Machines: The Hypes

• Strong theoretical basis - Computational Learning Theory; complexity controlled by the Vapnik-Chervonenkis dimension

• Not many parameters to tune

• High performance on many practical problems, high dimensional problems in particular


Support Vector Machines: The Truths

• Worst-case bounds from learning theory are not very practical

• Several parameters to tune:
  – What kernel?
  – Internal workings of the optimiser
  – Noise level in the training data

• Performance? – depends on who you ask


SVM: Data-Driven Kernel

• Fisher Kernel [Jaakkola & Haussler]
  – Kernel based on a generative model $p(x \mid \theta)$ of all the data

$$U_x = \nabla_\theta \ln p(x \mid \theta)$$

$$K(x_i, x_j) = U_{x_i}^t\, I^{-1}\, U_{x_j}$$

where $I$ is the Fisher information matrix.
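A toy sketch of these formulas for a one-dimensional Gaussian generative model (the model choice is an illustrative assumption, not from the talk):

# Fisher kernel for N(mu, sigma^2), scoring with respect to mu.
import numpy as np

def fisher_kernel(xi, xj, mu, sigma):
    # Score U_x = d/dmu ln N(x; mu, sigma^2) = (x - mu) / sigma^2
    ui = (xi - mu) / sigma**2
    uj = (xj - mu) / sigma**2
    fisher_info = 1.0 / sigma**2           # Fisher information for mu
    return ui * (1.0 / fisher_info) * uj   # K = U_i I^{-1} U_j

print(fisher_kernel(0.5, -1.2, mu=0.0, sigma=1.0))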


Classifier Performance

• Error rates can be misleading

– Imbalance in training/test data:
  • 98% of the population is healthy
  • 2% of the population has the disease
  A classifier that always predicts “healthy” scores 98% accuracy while detecting no disease at all.

– The cost of misclassification can change after the classifier has been designed


[Figure: scores of individual cases (x) under adverse and benign outcomes; moving the decision threshold along the class boundary trades detections of adverse outcomes against false alarms]

Area under the ROC Curve: Neat Statistical Interpretation

[Figure: ROC curve, true positive rate against false positive rate]
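The interpretation: the area under the ROC curve equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Wilcoxon-Mann-Whitney statistic). A small numerical check of that equivalence, on illustrative data:

# AUC as P(score_pos > score_neg); data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 1.0, 200)   # scores of positive (adverse) cases
neg = rng.normal(0.0, 1.0, 200)   # scores of negative (benign) cases

# Pairwise estimate: fraction of (pos, neg) pairs ranked correctly.
auc = (pos[:, None] > neg[None, :]).mean()
print(f"AUC = {auc:.3f}")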


Convex Hull of ROC Curves

[Figure: ROC curves of several classifiers and their convex hull, true positive rate against false positive rate]

Yeast Gene Example: MATLAB Demo here

Part II: Particle Filters for Tracking and Sequential Problems

Mahesan Niranjan

Department of Computer Science, The University of Sheffield


Overview

• Motivation

• State Space Model

• Kalman Filter and Extensions

• Sequential MCMC Methods

– Particle Filter & Variants


Motivation

• Neural Networks for Learning:
  – Function approximation
  – Statistical estimation
  – Dynamical systems
  – Parallel processing

• Guarantee generalisation:
  – Regularise / control complexity
  – Cross-validate to detect / avoid overfitting
  – Bootstrap to deal with model / data uncertainty

• Many of the above tricks won’t work in a sequential setting


Interesting Applications

• Speech Signal Processing

• Medical Signals

– Monitoring Liver Transplant Patients

• Tracking the prices of options contracts in computational finance


Good References

• Bar-Shalom and Fortmann:

Tracking and Data Association

• Jazwinski:

Stochastic Processes and Filtering Theory

• Arulampalam et al:

“Tutorial on Particle Filters…”; IEEE Transactions on Signal Processing

• Arnaud Doucet:

Technical Report 310, Cambridge University Engineering Department

• Benveniste, A et al:

Adaptive Algorithms and Stochastic Approximation

• Simon Haykin:

Adaptive Filters


Matrix Inversion Lemma
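In its standard (Sherman-Morrison-Woodbury) form:

$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$$

This is what makes recursive updating of least-squares estimates cheap: the inverse needed at each step is a low-rank correction of the previous one.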


Linear Regression
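For a design matrix $X$ and target vector $y$, the batch least-squares estimate, the quantity the recursive algorithm on the next slide tracks, is:

$$\hat{\theta} = (X^t X)^{-1} X^t y$$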


Recursive Least Squares
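In standard form, with forgetting factor $\lambda$ ($\lambda = 1$ gives plain recursive least squares), each new pair $(x_t, y_t)$ updates the estimate without re-inverting $X^t X$, courtesy of the matrix inversion lemma:

$$k_t = \frac{P_{t-1}\, x_t}{\lambda + x_t^t P_{t-1}\, x_t}, \qquad
\hat{\theta}_t = \hat{\theta}_{t-1} + k_t\left(y_t - x_t^t\,\hat{\theta}_{t-1}\right), \qquad
P_t = \frac{1}{\lambda}\left(P_{t-1} - k_t\, x_t^t P_{t-1}\right)$$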


State Space Model

State equation, driven by process noise

Observation equation, corrupted by measurement noise
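In general form (a standard reconstruction of the two equations):

$$x_t = f(x_{t-1}, v_t) \qquad \text{(state, with process noise } v_t\text{)}$$

$$y_t = h(x_t, w_t) \qquad \text{(observation, with measurement noise } w_t\text{)}$$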


Simple Linear Gaussian Model
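The standard special case on which the Kalman filter operates:

$$x_t = F\,x_{t-1} + v_t, \quad v_t \sim \mathcal{N}(0, Q); \qquad y_t = H\,x_t + w_t, \quad w_t \sim \mathcal{N}(0, R)$$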


Kalman Filter

Prediction

Correction
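In their standard form for the linear Gaussian model above:

Prediction:
$$\hat{x}_{t|t-1} = F\,\hat{x}_{t-1|t-1}, \qquad P_{t|t-1} = F\,P_{t-1|t-1}\,F^t + Q$$

Correction:
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\left(y_t - H\,\hat{x}_{t|t-1}\right), \qquad P_{t|t} = (I - K_t H)\,P_{t|t-1}$$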


Kalman Filter

Innovation

Kalman Gain
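The two quantities in standard notation:

$$e_t = y_t - H\,\hat{x}_{t|t-1} \;\text{(innovation)}, \qquad
S_t = H\,P_{t|t-1}\,H^t + R, \qquad
K_t = P_{t|t-1}\,H^t\, S_t^{-1} \;\text{(Kalman gain)}$$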


Bayesian Setting

Prior and likelihood combine, via Bayes’ rule, into the filtered posterior; the innovation probability $p(y_t \mid y_{1:t-1})$ measures how well the model predicts each new observation. Two ways to use it:

• Run multiple models and switch (Bar-Shalom)
• Set noise levels to maximum-likelihood values (Jazwinski)
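Under the linear Gaussian model the innovation probability has the closed form:

$$p(y_t \mid y_{1:t-1}) = \mathcal{N}(e_t;\, 0,\, S_t)$$

with innovation $e_t$ and covariance $S_t$ as defined above.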


Extended Kalman Filter

Lee Feldkamp’s group at Ford: successful training of recurrent neural networks

Taylor Series Expansion around the operating point

First Order

Second Order
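To first order, the nonlinear maps $f$ and $h$ are replaced by their Jacobians at the current estimate, and the Kalman recursions are run with these:

$$F_t = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{t-1|t-1}}, \qquad
H_t = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{t|t-1}}$$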

Iterated Extended Kalman Filter


Iterated Extended Kalman Filter

Local Linearisation of State and/or Observation

Propagation and Update


Unscented Kalman Filter

Generate a set of sigma points at the current time step

So they can represent the mean and covariance:

Propagate these through the state equations

Recompute predicted mean and covariance:


A fixed recipe defines the sigma points and their weights; the predicted mean and covariance are then recomputed from the propagated points.
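A sketch of the unscented transform with one common choice of scaling constants (the constants, and the recipe itself, are assumptions of this sketch):

# Unscented transform: sigma points through a nonlinear map.
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=0.0):
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    # Sigma points: the mean plus/minus columns of a scaled matrix square root.
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])       # (2n+1, n)
    # Weights for the mean and covariance estimates.
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    # Propagate each point through the nonlinearity, then recombine.
    Y = np.array([f(s) for s in sigma])
    y_mean = wm @ Y
    d = Y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov

m, C = np.zeros(2), np.eye(2)
print(unscented_transform(m, C, lambda x: np.array([x[0]**2, x[1]])))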


Formant Tracking Example

[Diagram: an excitation signal drives a linear filter to produce speech]
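In this source-filter view the formants are the resonances of the vocal-tract filter. In the standard all-pole formulation, each complex pole pair $z_k = r_k e^{j\theta_k}$ corresponds to a formant with frequency $f_k = \theta_k f_s / 2\pi$ and bandwidth $b_k = -f_s \ln r_k / \pi$, so tracking formants over time can be cast as state estimation for the pole positions.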


Formant Tracking Example

[Figures: formant tracks estimated from speech data]

Grid-based methods

• Discretise the continuous state into “cells”

• Integrate probabilities over each partition

• Fixed partitioning of the state space


Sampling Methods: Bayesian Inference

Parameters $\theta$, with uncertainty over the parameters expressed as a posterior $p(\theta \mid D)$.

Inference averages over that uncertainty:
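In standard form:

$$p(y \mid D) = \int p(y \mid \theta)\, p(\theta \mid D)\, d\theta \;\approx\; \frac{1}{N}\sum_{i=1}^{N} p\!\left(y \mid \theta^{(i)}\right), \qquad \theta^{(i)} \sim p(\theta \mid D)$$

The Monte Carlo approximation on the right is what the sampling methods below provide.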


Basic Tool: Composition [Tanner]

To generate samples of $p(x) = \int p(x \mid z)\, p(z)\, dz$: draw $z^{(i)} \sim p(z)$, then draw $x^{(i)} \sim p(x \mid z^{(i)})$; the resulting $x^{(i)}$ are samples from the marginal.


Importance Sampling
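In its standard form: to estimate expectations under a target $p$ using samples from a proposal $q$,

$$\mathbb{E}_p\left[f(x)\right] = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \;\approx\; \sum_{i=1}^{N} \tilde{w}^{(i)}\, f\!\left(x^{(i)}\right), \qquad x^{(i)} \sim q, \quad \tilde{w}^{(i)} \propto \frac{p\!\left(x^{(i)}\right)}{q\!\left(x^{(i)}\right)}$$

with the weights normalised to sum to one.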


Particle Filters

Prediction: propagate each sample through the state equation.

Weights: score each sample by the likelihood of the new observation.

• Bootstrap filter (Gordon et al., tracking)
• CONDENSATION algorithm (Isard et al., vision)
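A compact sketch of the bootstrap filter for a one-dimensional linear Gaussian model (the model and its constants are illustrative assumptions, chosen so the code is self-contained):

# Bootstrap particle filter on a simulated 1-D linear Gaussian model.
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 50
F, H, q, r = 0.9, 1.0, 0.5, 0.5   # state/observation gains, noise std devs

# Simulate a ground-truth trajectory and noisy observations.
x_true, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x_true[t] = F * x_true[t - 1] + q * rng.normal()
    y[t] = H * x_true[t] + r * rng.normal()

particles = rng.normal(0.0, 1.0, N)
for t in range(1, T):
    # Prediction: propagate every particle through the state equation.
    particles = F * particles + q * rng.normal(size=N)
    # Weighting: likelihood of the new observation under each particle.
    w = np.exp(-0.5 * ((y[t] - H * particles) / r) ** 2)
    w /= w.sum()
    estimate = w @ particles
    # Resampling: multiply high-weight samples, kill off low-weight ones.
    particles = rng.choice(particles, size=N, p=w)

print(f"final estimate {estimate:.3f} vs truth {x_true[-1]:.3f}")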


Sequential Importance Sampling

Recursive update of weights

Only up to a constant of proportionality
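The standard recursion, for a proposal $q$:

$$w_t^{(i)} \;\propto\; w_{t-1}^{(i)}\;\frac{p\!\left(y_t \mid x_t^{(i)}\right)\, p\!\left(x_t^{(i)} \mid x_{t-1}^{(i)}\right)}{q\!\left(x_t^{(i)} \mid x_{t-1}^{(i)}, y_t\right)}$$

For the bootstrap filter, $q$ is the state transition itself, and the update reduces to multiplying by the likelihood.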


Degeneracy in SIS

The variance of the weights increases monotonically: all but one decay to zero very rapidly.

Effective number of particles:

$$N_{\text{eff}} = \frac{1}{\sum_{i=1}^{N}\left(w_t^{(i)}\right)^2}$$

Resample if $N_{\text{eff}}$ falls below a chosen threshold.


Sampling, Importance Re-Sampling (SIR)

Multiply samples of high weight; kill off samples in parts of the space that are not relevant. Done naively, this risks “particle collapse”: many identical copies of a few particles.


Marginalizing Part of the State Space

Suppose the state decomposes as $x_t = \left(x_t^1, x_t^2\right)$, and it is possible to analytically integrate with respect to part of the state space:

• Sample with respect to $x_t^1$

• Integrate with respect to $x_t^2$

This is Rao-Blackwellisation.
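The factorisation behind this, in standard form:

$$p\!\left(x_t^1, x_t^2 \mid y_{1:t}\right) = p\!\left(x_t^2 \mid x_t^1, y_{1:t}\right)\, p\!\left(x_t^1 \mid y_{1:t}\right)$$

When the first factor is available in closed form (e.g. via a Kalman filter), only $x_t^1$ needs particles; by the Rao-Blackwell theorem, replacing sampled quantities with their conditional expectations cannot increase the variance of the estimator.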


Variations to the Basic Algorithm

• Integrate out part of the state space:
  – Rao-Blackwellised particle filters (e.g. a multi-layer perceptron with a linear output layer)

• Variational Importance Sampling (Lawrence et al.)

• Auxiliary Particle Filters (Pitt et al.)

• Regularised Particle Filters

• Likelihood Particle Filters


Regularised PF: basic idea

[Diagram: samples are smoothed into a kernel density estimate; resampling draws from the smoothed density, which is then propagated in time]


Conclusion / Summary

• Collection of powerful algorithms

• New and interesting signal processing problems