Linear Models (I)


Rong Jin

Review of Information Theory

• What is information? What is entropy?
• Entropy: the average information, and the minimum coding length
• An important inequality:

$$H(P) = \sum_i p_i \log \frac{1}{p_i}$$

$$H(P) = \sum_i p_i \log \frac{1}{p_i} \;\le\; \sum_i p_i \log \frac{1}{q_i}$$

where $P = \{p_i\}$ is the distribution generating the symbols and $Q = \{q_i\}$ is the distribution used for coding them.
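To make the inequality concrete, here is a small Python sketch (an illustration added for this transcript, not from the slides) computing the entropy of P and the average coding length when symbols from P are coded with a mismatched distribution Q:

```python
import math

def entropy(p):
    """H(P) = sum_i p_i * log2(1 / p_i): minimum average bits per symbol."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average bits per symbol when symbols drawn from P are coded with Q."""
    return sum(pi * math.log2(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.25, 0.25]   # distribution generating the symbols
Q = [0.8, 0.1, 0.1]     # distribution assumed by the coder

print(entropy(P))           # 1.5 bits
print(cross_entropy(P, Q))  # ~1.82 bits >= H(P), as the inequality predicts
```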

Review of Information Theory (cont'd)

• Mutual information: measures the correlation between two random variables; it is symmetric in X and Y
• Kullback-Leibler divergence: measures the difference between two distributions

$$I(X;Y) = H(X) - H(X \mid Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$

$$KL(P_D \,\|\, P_M) = \sum_x P_D(x) \log \frac{P_D(x)}{P_M(x)} = E_{x \sim P_D}\!\left[ \log \frac{P_D(x)}{P_M(x)} \right]$$
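A minimal sketch of both quantities (my own illustration; plain Python, natural log). Mutual information is exactly the KL divergence between the joint P(x,y) and the product of its marginals P(x)P(y):

```python
import math

def kl(p, q):
    """KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)); zero iff P == Q."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} P(x,y) log( P(x,y) / (P(x)P(y)) ), i.e. the KL
    divergence between the joint and the product of its marginals."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(pxy[i][j] * math.log(pxy[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if pxy[i][j] > 0)

print(kl([0.5, 0.5], [0.9, 0.1]))                        # ~0.51, always >= 0
# Independent variables => I(X;Y) = 0; perfectly correlated => log 2
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # ~0.693
```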

Outline

• Classification problems
• Information theory for text classification
• Gaussian generative model
• Naïve Bayes
• Logistic regression

Classification Problems

$$f : X \rightarrow Y \qquad (\text{input } X,\ \text{output } Y)$$

• Given input X = {x1, x2, ..., xm}
• Predict the class label y
• y ∈ {-1, 1}: binary classification problems
• y ∈ {1, 2, 3, ..., c}: multi-class classification problems
• Goal: learn the function f : X → Y

Examples of Classification Problems

• Text categorization
  Input features: words such as 'campaigning', 'efforts', 'Iowa', 'Democrats', ...
  Class labels: 'politics' and 'non-politics'
  Doc: "Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, ..." → Topic: politics

• Image classification
  Input features: color histogram, texture distribution, edge distribution, ...
  Class labels: 'bird image' and 'non-bird image'
  [Slide shows candidate pictures: "Which is a bird image?"]

Learning Setup for Classification Problems

• Training examples: $D_{train} = \{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_n, y_n \rangle\}$
• Examples are assumed independent and identically distributed (i.i.d.), so training examples are similar to testing examples
• Goal: find a model or a function that is consistent with the training data

Information Theory for Text Classification

If the coding distribution is similar to the generating distribution, the coding length is short, i.e., the compression rate is good:

$$H(P) = \sum_i p_i \log \frac{1}{p_i} \;\le\; \sum_i p_i \log \frac{1}{q_i}$$

where $P = \{p_i\}$ generates the symbols and $Q = \{q_i\}$ codes them.

Compression Algorithm for TC

• Train compression model M1 on 'politics' documents and compression model M2 on 'sports' documents
• Compress a new document with both models: say 16K bits under M1 and 10K bits under M2
• The shorter coding length indicates the better-matching distribution, so the predicted topic is sports
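The slides do not specify the compression models; as a sketch of the idea, one can approximate "coding length under a class model" with an off-the-shelf compressor (zlib here) primed on per-class text. The corpora and document below are toy stand-ins:

```python
import zlib

def extra_bits(class_text: str, doc: str) -> int:
    """Extra compressed size incurred by appending doc to the class corpus;
    a proxy for the coding length of doc under that class's model."""
    base = len(zlib.compress(class_text.encode()))
    combined = len(zlib.compress((class_text + " " + doc).encode()))
    return 8 * (combined - base)

# Toy per-class corpora standing in for the trained models M1 and M2.
corpora = {
    "politics": "campaign election senate vote democrats republicans policy",
    "sports":   "game season team score playoffs coach touchdown league",
}

doc = "the team won the game in the final season playoffs"
prediction = min(corpora, key=lambda c: extra_bits(corpora[c], doc))
print(prediction)  # expected: 'sports', the model that compresses doc best
```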

Probabilistic Models for Classification Problems

• Apply statistical inference methods
• Key: finding the best parameters θ
• Maximum likelihood estimation (MLE) approach: the log-likelihood of the data is

$$l(D_{train}) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$

and we find the parameters that maximize it:

$$\theta^* = \arg\max_{\theta} \; l(D_{train}) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$

Training examples $\{\langle x_i, y_i \rangle\}$ → learning a statistical model → prediction via $p(y \mid x; \theta)$

Generative Models

• Do not estimate p(y|x; θ) directly; use Bayes' rule
• Estimate p(x|y; θ) instead of p(y|x; θ)
• Why p(x|y; θ)?
  Most well-known distributions have the form p(x|θ)
  Allocate a separate set of parameters θy for each class: θ = {θ1, θ2, ..., θc}
  p(x|y; θ) = p(x|θy) describes the characteristic input patterns of each class y

$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{p(x; \theta)} = \frac{p(y; \theta)\, p(x \mid y; \theta)}{\sum_{y'} p(y', x; \theta)}$$

Gaussian Generative Model (I)

• Assume a Gaussian model for each class. One-dimensional case:

$$p(x \mid y; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left( -\frac{(x - \mu_y)^2}{2\sigma_y^2} \right), \quad y \in \{1, 2, \ldots, c\}, \quad \theta = \{\mu_k, \sigma_k, p(y = k)\}_{k=1}^{c}$$

• Results for MLE:

$$\mu_k = \frac{1}{|\{i : y_i = k\}|} \sum_{i:\, y_i = k} x_i, \qquad \sigma_k^2 = \frac{1}{|\{i : y_i = k\}|} \sum_{i:\, y_i = k} (x_i - \mu_k)^2, \qquad p(y = k) = \frac{|\{i : y_i = k\}|}{n}$$

• Classify with Bayes' rule:

$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{p(x; \theta)}$$
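A short Python sketch of these MLE formulas and the Bayes-rule posterior (an illustration written for this transcript, not lecture code; the heights below are made-up data):

```python
import math
from collections import defaultdict

def fit_gaussian_generative(xs, ys):
    """MLE for the one-dimensional Gaussian generative model:
    per-class mean, variance, and prior p(y = k)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    theta = {}
    for k, pts in groups.items():
        mu = sum(pts) / len(pts)
        var = max(sum((x - mu) ** 2 for x in pts) / len(pts), 1e-12)
        theta[k] = (mu, var, len(pts) / len(xs))
    return theta

def posterior(theta, x):
    """p(y = k | x) via Bayes' rule over the fitted class models."""
    def joint(mu, var, prior):
        return prior * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    joints = {k: joint(*params) for k, params in theta.items()}
    z = sum(joints.values())
    return {k: v / z for k, v in joints.items()}

heights = [1.82, 1.75, 1.68, 1.55, 1.62, 1.50]
labels  = ["m", "m", "m", "f", "f", "f"]
theta = fit_gaussian_generative(heights, labels)
print(posterior(theta, 1.80))  # nearly all the mass on 'm'
```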

Example

$$\mu_{male} = 1.7, \quad \sigma_{male} = 0.1, \quad p_{male} = 0.5 \qquad\qquad \mu_{female} = 1.5, \quad \sigma_{female} = 0.2, \quad p_{female} = 0.5$$

• Height histogram for males and females
• Using the Gaussian generative model: P(male | 1.8) = ?, P(female | 1.4) = ?

[Figure: empirical height data and fitted Gaussian distributions for male and female]
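Plugging the slide's parameters into Bayes' rule, as a worked check (the values in the comments are approximate):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Parameters from the slide: (mu, sigma, prior) per class.
classes = {"male": (1.7, 0.1, 0.5), "female": (1.5, 0.2, 0.5)}

def posterior(label, x):
    """p(label | x) = p(label) p(x|label) / sum_y p(y) p(x|y)."""
    joints = {k: p * normal_pdf(x, mu, s) for k, (mu, s, p) in classes.items()}
    return joints[label] / sum(joints.values())

print(posterior("male", 1.8))    # ~0.79
print(posterior("female", 1.4))  # ~0.98
```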

Gaussian Generative Model (II)

• Consider multiple input features: X = {x1, x2, ..., xm}
• Multivariate Gaussian distribution, where Σy is an m×m covariance matrix:

$$p(x \mid y; \theta) = N(x; \mu_y, \Sigma_y) = \frac{1}{(2\pi)^{m/2} |\Sigma_y|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_y)^\top \Sigma_y^{-1} (x - \mu_y) \right)$$

with $\theta = \{\mu_1, \Sigma_1, p(y=1), \ldots, \mu_c, \Sigma_c, p(y=c)\}$.

• Results for MLE:

$$\mu_y = \frac{1}{|\{i : y_i = y\}|} \sum_{i:\, y_i = y} x_i, \qquad [\Sigma_y]_{s,t} = \frac{1}{|\{i : y_i = y\}|} \sum_{i:\, y_i = y} (x_{i,s} - \mu_{y,s})(x_{i,t} - \mu_{y,t})$$

• Problem: singularity of Σy; too many parameters
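A numpy sketch of the multivariate MLE (my own illustration). Note how easily the covariance estimate becomes singular when a class has fewer examples than features, which is exactly the problem flagged above:

```python
import numpy as np

def fit_multivariate_gaussians(X, y):
    """MLE for the multivariate Gaussian generative model:
    per-class mean vector mu_y, full m x m covariance Sigma_y, and prior."""
    theta = {}
    for k in np.unique(y):
        Xk = X[y == k]                             # rows belonging to class k
        mu = Xk.mean(axis=0)
        centered = Xk - mu
        Sigma = centered.T @ centered / len(Xk)    # biased (MLE) covariance
        theta[k] = (mu, Sigma, len(Xk) / len(X))
    return theta

# With fewer examples than features, Sigma has rank < m and is singular.
X = np.random.randn(10, 100)                  # 10 examples, 100 features
y = np.array([0] * 5 + [1] * 5)
_, Sigma0, _ = fit_multivariate_gaussians(X, y)[0]
print(np.linalg.matrix_rank(Sigma0))          # at most 5, far below 100
```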

Overfitting Issue

• Complex model + insufficient training data
• Consider a classification problem with multiple inputs: 100 input features, 5 classes, 1000 training examples
• Total number of parameters for a full Gaussian model:
  5 means → 5 × 100 = 500 parameters
  5 covariance matrices → 5 × 100 × 100 = 50,000 parameters
  50,500 parameters in all, far more than the training data can support

Another Example of Overfitting

[Figure: a sequence of four plots illustrating increasingly severe overfitting of a fitted curve to the same data]

Naïve Bayes

• Simplify the model complexity: diagonalize the covariance matrix, $\Sigma_y = \mathrm{diag}(\sigma_{y,1}^2, \ldots, \sigma_{y,m}^2)$
• Simplified Gaussian distribution:

$$p(x \mid y; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_{y,i}} \exp\left( -\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2} \right) = \prod_{i=1}^{m} p(x_i \mid y; \theta)$$

• Feature independence assumption: the Naïve Bayes assumption
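A minimal Gaussian Naïve Bayes sketch (assumed numpy implementation, written for this transcript): each class keeps only per-feature means and variances, i.e. 2m + 1 parameters per class instead of a full covariance matrix:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Naive Bayes MLE: per-class, per-feature mean and variance
    (i.e. a diagonal covariance matrix) plus the class prior."""
    theta = {}
    for k in np.unique(y):
        Xk = X[y == k]
        theta[k] = (Xk.mean(axis=0), Xk.var(axis=0) + 1e-9, len(Xk) / len(X))
    return theta

def log_joint(theta_k, x):
    """log p(y=k) + sum_i log p(x_i | y=k): sums over independent features."""
    mu, var, prior = theta_k
    return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                        + (x - mu) ** 2 / var)

def predict(theta, x):
    """Class with the highest posterior (the denominator is shared, so skip it)."""
    return max(theta, key=lambda k: log_joint(theta[k], x))
```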

Naïve Bayes (cont'd)

• Naïve Bayes is a terrible estimator of p(x|y; θ), but a very reasonable estimator of p(y|x; θ)
• Why? Because of the normalization, only the ratio of likelihoods matters:

$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{\sum_{y'=1}^{c} p(y'; \theta)\, p(x \mid y'; \theta)} = \frac{1}{\sum_{y'=1}^{c} \dfrac{p(y'; \theta)\, p(x \mid y'; \theta)}{p(y; \theta)\, p(x \mid y; \theta)}}$$

• Naïve Bayes does a reasonable job estimating the ratio $p(x \mid y'; \theta) \,/\, p(x \mid y; \theta)$

The Ratio of Likelihood

• Binary class, $y \in \{-1, +1\}$; both classes share the same variance, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$:

$$\log \frac{p(y=1 \mid x)}{p(y=-1 \mid x)} = \log \frac{p(y=1)\, p(x \mid y=1)}{p(y=-1)\, p(x \mid y=-1)} = \log \frac{p(y=1)}{p(y=-1)} + \sum_{i=1}^{m} \frac{(x_i - \mu_{i,-1})^2 - (x_i - \mu_{i,+1})^2}{2\sigma_i^2}$$

$$= \sum_{i=1}^{m} \frac{\mu_{i,+1} - \mu_{i,-1}}{\sigma_i^2}\, x_i \;+\; \log \frac{p(y=1)}{p(y=-1)} + \sum_{i=1}^{m} \frac{\mu_{i,-1}^2 - \mu_{i,+1}^2}{2\sigma_i^2} \;=\; w^\top x + c$$

with weights $w = \{w_1, \ldots, w_m\}$, $w_i = (\mu_{i,+1} - \mu_{i,-1}) / \sigma_i^2$, and a constant offset $c$.

• A linear model!
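To check the linearity claim numerically, here is a small sketch (toy parameters of my own choosing, with the shared variances assumed above) comparing $w^\top x + c$ against the directly computed log odds:

```python
import math

# Assumed toy parameters for the two classes (shared per-feature variances).
mu_pos = [1.0, 2.0]; mu_neg = [-1.0, 0.5]
var    = [0.5, 1.5]
p_pos, p_neg = 0.6, 0.4

# Linear coefficients read off the derivation above.
w = [(mp - mn) / v for mp, mn, v in zip(mu_pos, mu_neg, var)]
c = math.log(p_pos / p_neg) + sum((mn ** 2 - mp ** 2) / (2 * v)
                                  for mp, mn, v in zip(mu_pos, mu_neg, var))

def log_odds_direct(x):
    """log [ p(y=+1) p(x|+1) / (p(y=-1) p(x|-1)) ], term by term."""
    s = math.log(p_pos / p_neg)
    for xi, mp, mn, v in zip(x, mu_pos, mu_neg, var):
        s += ((xi - mn) ** 2 - (xi - mp) ** 2) / (2 * v)
    return s

x = [0.3, 1.2]
print(sum(wi * xi for wi, xi in zip(w, x)) + c)  # linear form w.x + c
print(log_odds_direct(x))                        # same value
```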

Decision Boundary

[Figure: empirical height data and fitted Gaussian distributions for male and female, with the resulting decision boundary between the two classes]

• Gaussian Generative Models == Finding a linear decision boundary

• Why not do it directly?