Chapter 6 The Structural Risk Minimization Principle Junping Zhang [email protected] Intelligent...

108
Chapter 6 The Structural Risk Minimization Principle Junping Zhang [email protected] Intelligent Information Processing Labor atory, Fudan University March 23, 2004

Transcript of Chapter 6 The Structural Risk Minimization Principle Junping Zhang [email protected] Intelligent...

Page 1: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Chapter 6 The Structural Risk Minimization Principle

Junping [email protected]

Intelligent Information Processing Laboratory, Fudan UniversityMarch 23, 2004

Page 2: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Objectives

Page 3: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Structural risk minimization

Page 4: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Two other induction principles

Page 5: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The Scheme of the SRM induction principle

Page 6: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Real-Valued functions

Page 7: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 8: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 9: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 10: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 11: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 12: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 13: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Principle of SRM

Page 14: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 15: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 16: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 17: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 18: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 19: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 20: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 21: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

SRM

Page 22: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 23: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Minimum Description Length and SRM inductive principles

The idea about the Nature of Random Phenomena

Minimum Description Length Principle for the Pattern Recognition Problem

Bounds for the MDL SRM for the simplest Model and MDL The Shortcoming of the MDL

Page 24: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The idea about the Nature of Random Phenomena

Probability theory (1930s, Kolmogrov) Formal inference Axiomatization hasn’t considered nat

ure of randomness Axioms: given probability measures

Page 25: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The idea about the Nature of Random Phenomena

The model of randomness Solomonoff (1965), Kolmogrov (1965), Ch

aitin (1966). Algorithm (descriptive) complexity

The length of the shortest binary computer program

Up to an additive constant does not depend on the type of computer.

Universal characteristic of the object.

Page 26: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

A relatively large string describing an object is random If algorithm complexity of an object is high If the given description of an object cannot be

compressed significantly. MML (Wallace and Boulton, 1968)& MDL

(Rissanen, 1978) Algorithm Complexity as a main tool of induc

tion inference of learning machines

Page 27: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Minimum Description Length Principle for the Pattern Recognition Problem

Given l pairs containing the vector x and the binary value ω

Consider two strings: the binary string

Page 28: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Question

Q: Given (147), is the string (146) a random object?

A: to analyze the complexity of the string (146) in the spirit of Solomonoff-Kolmogorov-Chaitin ideas

Page 29: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Compress its description

Since ω i i=1,…l are binary values, the string (146) is described by l bits.

Since training pairs were drawn randomly and independently.

The value ω i depend on the vector xi but not on the vector xj.

Page 30: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Model

Page 31: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 32: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 33: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 34: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

General Case: not contain the perfect table.

Page 35: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 36: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Randomness

Page 37: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Bounds for the MDL

Q: Does the compression coefficient

K(T) determine the probability of the test error in classification (decoding) vectors x by the table T?

A: Yes

Page 38: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Comparison between the MDL and ERM in the simplest model

Page 39: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 40: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 41: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

SRM for the simplest Model and MDL

Page 42: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

SRM for the simplest Model and MDL

Page 43: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 44: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The power of compression coefficient

To obtain bound for the probability of error

Only information about the coefficient need to be known.

Page 45: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The power of compression coefficient

How many examples we used How the structure of code books was

organized Which code book was used and how

many tables were in this code book. How many errors were made by the

table from the code book we used.

Page 46: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

MDL principle

To minimize the probability of error One has to minimize the coefficient

of compression

Page 47: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The shortcoming of the MDL

MDL uses code books with a finite number of tables.

Continuously depends on parameters, one has to first quantize that set to make the tables.

Page 48: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Quantization

How do we make the ‘smart’ quantization for a given number of observations.

For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?

Page 49: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The shortcoming of the MDL

Finding a good quantization is extremely difficult and determines the main shortcoming of MDL principle.

The MDL principle works well when the problem of constructing reasonable code books has a good solution.

Page 50: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Consistency of the SRM principle and asymptotic bounds on the rate of convergence

Q: Is the SRM consistent? What is the bound on the

(asymptotic) rate of convergence?

Page 51: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 52: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 53: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Consistency of the SRM principle.

Page 54: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Simplification version

Page 55: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 56: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 57: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 58: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 59: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Remark

To avoid choosing the minimum of functional (156) over the infinite number of elements of the structure.

Additional constraint Choose the minimum from the first l

elements of the structure where l is equal to the number of observations.

Page 60: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 61: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 62: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Discussions and Example

Page 63: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The rate of convergence is determined by two contradictory requirements on the rule n=n(l). The first summand: The larger n=n(l) , the small

er is the deviation The second summand: The larger n=n(l), the lar

ger deviation For structures with a known bound on the r

ate of approximation, select the rule that assures the largest rate of convergence.

Page 64: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 65: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 66: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 67: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 68: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 69: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 70: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Bounds for the regression estimation problem

Page 71: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The model of regression estimation by series expansion

Page 72: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 73: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 74: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 75: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 76: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 77: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Example

Page 78: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 79: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 80: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

The problem of approximating functions

Page 81: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 82: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 83: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 84: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 85: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 86: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 87: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 88: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 89: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 90: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 91: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

To get high asymptotic rate of approximation

the only constraint is that the kernel should be a bounded

function which can be described as a family of functions possessing finite VC dimension.

Page 92: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Problem of local risk minimization

Page 93: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 94: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 95: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 96: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Local Risk Minimization Model

Page 97: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 98: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 99: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 100: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 101: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 102: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 103: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 104: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 105: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.
Page 106: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Note

Using local risk minimization methods, one probably does not need rich sets of approximating functions. Whereas the classical semi-local

methods are based on using a set of constant functions.

Page 107: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Note

For local estimation functions in the one-dimensional case, it is probably enough to consider elements Sk, k=0,1,2,3 containing the polynomials of degree 0,1,2,3

Page 108: Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University.

Summary MDL SRM Local Risk Functional