Project Now it is time to think about the project It is a team work Each team will consist of 2...
Date posted: 20-Dec-2015
Project
Now it is time to think about the project.
- It is team work: each team will consist of 2 people.
- It is better to choose a project of your own; otherwise, I will assign you to some “difficult” project.

Important dates:
- 03/11: project proposal due
- 04/01: project progress report due
- 04/22 and 04/24: final presentations
- 05/03: final report due
Project Proposal
What do I expect?
- Introduction: describe the research problem that you are trying to solve.
- Related work: describe the existing approaches and their deficiencies.
- Proposed approaches: describe your approaches and why they may have the potential to alleviate the deficiencies of existing approaches.
- Plan: what do you plan to do in this project?

Format: it should look like a research paper. The required format (both Microsoft Word and LaTeX) can be downloaded from www.cse.msu.edu/~cse847/assignments/format.zip
Project Progress Report
- Introduction: give an overview of the problem that you are trying to solve and the solutions that you presented in the proposal.
- Progress:
  - Algorithm description in more detail
  - Related data collection and cleanup
  - Preliminary results
- Format: should be the same as the project report.
Project Final Report
It should look like a research paper that is ready for submission to research conferences.
What do I expect?
- Introduction
- Algorithm description and discussion
- Empirical studies

I expect a careful analysis of the results, no matter whether the approach is a success or a complete failure.

Presentation: a 25-minute presentation followed by a 5-minute discussion.
Exponential Model and Maximum Entropy Model
Rong Jin
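Recap: Logistic Regression Model
Assume the inputs and outputs are related through a log-linear function:

$$p(y \mid x; \theta) = \frac{1}{1 + \exp\left(-y\,(x \cdot w + c)\right)}, \qquad \theta = \{w_1, w_2, \ldots, w_m, c\}$$

Estimate the weights with the MLE approach (with a regularization term weighted by $s$):

$$\theta^* = \arg\max_{\theta} \; l_{reg}(D_{train}) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta) - s \sum_{j=1}^{m} w_j^2 = \arg\max_{\theta} \sum_{i=1}^{n} \log \frac{1}{1 + \exp\left(-y_i\,(x_i \cdot w + c)\right)} - s \sum_{j=1}^{m} w_j^2$$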
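The recap above can be made concrete with a small plain-Python sketch (standard library only). The helper names `log_sigmoid`, `prob` and `reg_log_likelihood` are illustrative, labels are assumed to be in {+1, -1}, `s` is the regularization weight from the formula, and the threshold $c$ is left unpenalized, as in the formula.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def prob(y, x, w, c):
    """p(y | x; w, c) = 1 / (1 + exp(-y (x . w + c))) for y in {+1, -1}."""
    z = sum(xj * wj for xj, wj in zip(x, w)) + c
    return math.exp(log_sigmoid(y * z))


def reg_log_likelihood(X, Y, w, c, s):
    """l_reg(D_train) = sum_i log p(y_i | x_i) - s * sum_j w_j^2 (c is not penalized)."""
    ll = sum(log_sigmoid(y * (sum(xj * wj for xj, wj in zip(x, w)) + c)) for x, y in zip(X, Y))
    return ll - s * sum(wj * wj for wj in w)


# Made-up toy data for illustration.
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
Y = [+1, -1, +1]
print(reg_log_likelihood(X, Y, [0.4, -0.2], 0.1, s=0.5))
```

Working with `log_sigmoid` instead of `math.log(prob(...))` avoids overflow and loss of precision when $|x \cdot w + c|$ is large.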
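How to Extend the Logistic Regression Model to Multiple Classes?
From $y \in \{+1, -1\}$ to $y \in \{1, 2, \ldots, C\}$?

$$p(y \mid x; \theta) = \frac{1}{1 + \exp\left(-y\,(x \cdot w + c)\right)}, \qquad \theta = \{w_1, w_2, \ldots, w_m, c\}$$

$$\log \frac{p(y = +1 \mid x)}{p(y = -1 \mid x)} = x \cdot w + c$$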
Conditional Exponential Model
Introduce a different set of parameters for each class:

$$p(y \mid x; \theta) \propto \exp(c_y + x \cdot w_y), \qquad \theta_y = \{c_y, w_y\}$$

Ensure that the probabilities sum to 1:

$$p(y \mid x; \theta) = \frac{1}{Z(x)} \exp(c_y + x \cdot w_y), \qquad Z(x) = \sum_{y} \exp(c_y + x \cdot w_y)$$
Conditional Exponential Model
Prediction probability:

$$p(y \mid x; \theta) = \frac{\exp(c_y + x \cdot w_y)}{\sum_{y'} \exp(c_{y'} + x \cdot w_{y'})}, \qquad y \in \{1, 2, \ldots, C\}$$

Model parameters: for each class $y$, we have weights $w_y$ and threshold $c_y$.

Maximum likelihood estimation:

$$l(D_{train}) = \sum_{i=1}^{N} \log p(y_i \mid x_i) = \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)}$$

Any problems?
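A minimal sketch of the prediction probability and log-likelihood above, in plain Python. Assumptions for illustration: `W[y]` holds $w_y$, `c[y]` holds $c_y$, classes are 0-based indices rather than $1, \ldots, C$, and the helper names are hypothetical. Subtracting the largest score before exponentiating (the log-sum-exp trick) is only for numerical stability.

```python
import math


def scores(x, W, c):
    """Return c_y + x . w_y for every class y (W[y] = w_y, c[y] = c_y)."""
    return [cy + sum(xj * wj for xj, wj in zip(x, wy)) for wy, cy in zip(W, c)]


def log_sum_exp(values):
    """log(sum_k exp(v_k)), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def predict_proba(x, W, c):
    """p(y | x; theta) = exp(c_y + x . w_y) / sum_y' exp(c_y' + x . w_y')."""
    s = scores(x, W, c)
    log_Z = log_sum_exp(s)
    return [math.exp(v - log_Z) for v in s]


def log_likelihood(X, Y, W, c):
    """l(D_train) = sum_i log p(y_i | x_i); labels are 0-based class indices."""
    total = 0.0
    for x, y in zip(X, Y):
        s = scores(x, W, c)
        total += s[y] - log_sum_exp(s)
    return total


# Made-up parameters for three classes and two input features.
W = [[0.5, -1.0], [0.0, 0.3], [-0.2, 0.8]]
c = [0.1, 0.0, -0.1]
print(predict_proba([1.0, 2.0], W, c))
print(log_likelihood([[1.0, 2.0], [0.0, 1.0]], [1, 2], W, c))
```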
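Conditional Exponential Model
If we add the same constant vector $w_0$ to every weight vector (and the same constant $c_0$ to every threshold), we obtain the same log-likelihood function:

$$w_y \leftarrow w_y + w_0, \qquad c_y \leftarrow c_y + c_0, \qquad \forall y$$

$$l(D_{train}) = \sum_{i=1}^{N} \log \frac{\exp\left(c_{y_i} + c_0 + x_i \cdot (w_{y_i} + w_0)\right)}{\sum_{y} \exp\left(c_y + c_0 + x_i \cdot (w_y + w_0)\right)} = \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)}$$

because the common factor $\exp(c_0 + x_i \cdot w_0)$ cancels between the numerator and the denominator.

The optimal solution is not unique! How can we resolve this problem?

Solution: set $w_1$ to the zero vector and $c_1$ to zero.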
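A quick numerical check of this invariance, as a plain-Python sketch with made-up parameters and data: shifting every $w_y$ by the same $w_0$ and every $c_y$ by the same $c_0$ leaves $l(D_{train})$ unchanged, and the hypothetical helper `fix_first_class` applies the suggested normalization $w_1 = 0$, $c_1 = 0$ (index 0 in the 0-based code) without changing the likelihood.

```python
import math


def log_likelihood(X, Y, W, c):
    """l(D_train) for the conditional exponential model (0-based labels)."""
    total = 0.0
    for x, y in zip(X, Y):
        s = [cy + sum(a * b for a, b in zip(x, wy)) for wy, cy in zip(W, c)]
        m = max(s)
        total += s[y] - m - math.log(sum(math.exp(v - m) for v in s))
    return total


def shift(W, c, w0, c0):
    """Add the same vector w0 to every w_y and the same scalar c0 to every c_y."""
    return [[a + b for a, b in zip(wy, w0)] for wy in W], [cy + c0 for cy in c]


def fix_first_class(W, c):
    """Shift all parameters so that w_1 = 0 and c_1 = 0 (index 0 here)."""
    return shift(W, c, [-a for a in W[0]], -c[0])


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [0, 1, 2]
W = [[0.3, -0.2], [1.0, 0.5], [-0.4, 0.8]]
c = [0.1, -0.3, 0.2]

W_shift, c_shift = shift(W, c, [5.0, -2.0], 3.0)
W_fixed, c_fixed = fix_first_class(W, c)
print(log_likelihood(X, Y, W, c), log_likelihood(X, Y, W_shift, c_shift), log_likelihood(X, Y, W_fixed, c_fixed))
```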
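Modified Conditional Exponential Model
Prediction probability:

$$p(y \mid x; \theta) = \begin{cases} \dfrac{\exp(c_y + x \cdot w_y)}{1 + \sum_{y'=2}^{C} \exp(c_{y'} + x \cdot w_{y'})}, & y \in \{2, \ldots, C\} \\[2ex] \dfrac{1}{1 + \sum_{y'=2}^{C} \exp(c_{y'} + x \cdot w_{y'})}, & y = 1 \end{cases}$$

Model parameters: for each class $y > 1$, we have weights $w_y$ and threshold $c_y$.

Maximum likelihood estimation:

$$l(D_{train}) = \sum_{i=1}^{N} \log p(y_i \mid x_i) = \sum_{i: y_i = 1} \log \frac{1}{1 + \sum_{y=2}^{C} \exp(c_y + x_i \cdot w_y)} + \sum_{i: y_i \neq 1} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{1 + \sum_{y=2}^{C} \exp(c_y + x_i \cdot w_y)}$$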
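A sketch of the modified model's prediction rule in plain Python, assuming the parameters for classes $2, \ldots, C$ are passed as lists (`W_rest` and `c_rest` are illustrative names) and class 1 is the implicit reference class with zero weights and zero threshold. The largest score is factored out before exponentiating so that large scores do not overflow; this does not change the probabilities.

```python
import math


def predict_proba_ref(x, W_rest, c_rest):
    """Modified conditional exponential model with class 1 as the reference class.

    W_rest[k] and c_rest[k] are w_y and c_y for y = k + 2 (classes 2..C);
    class 1 has w_1 = 0 and c_1 = 0 implicitly.
    Returns [p(1 | x), p(2 | x), ..., p(C | x)].
    """
    s = [cy + sum(a * b for a, b in zip(x, wy)) for wy, cy in zip(W_rest, c_rest)]
    m = max([0.0] + s)  # class 1 has score 0; factor out the largest score
    exps = [math.exp(v - m) for v in s]
    denom = math.exp(-m) + sum(exps)  # = exp(-m) * (1 + sum_y exp(c_y + x . w_y))
    return [math.exp(-m) / denom] + [e / denom for e in exps]


x = [0.5, -1.0]
W_rest = [[1.0, 0.2], [-0.3, 0.4]]
c_rest = [0.1, -0.2]
print(predict_proba_ref(x, W_rest, c_rest))
```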
Maximum Entropy Model: Motivation
Consider a translation example: the English word ‘in’ translates into one of the French expressions {dans, en, à, au-cours-de, pendant}.
Goal: estimate p(dans), p(en), p(à), p(au-cours-de), p(pendant).
Case 1: no prior knowledge about the translation.
What is your guess of the probabilities?
p(dans) = p(en) = p(à) = p(au-cours-de) = p(pendant) = 1/5
Case 2: 30% of the time, either dans or en is used.
What is your guess of the probabilities? p(dans) = p(en) = 3/20, p(à) = p(au-cours-de) = p(pendant) = 7/30
The distribution that is as uniform as possible, subject to what we know, is favored.
Maximum Entropy Model: Motivation
Case 3: 30% of the time dans or en is used, and 50% of the time dans or à is used.
What is your guess of the probabilities?
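A good probability distribution should:
- satisfy the constraints, and
- be close to the uniform distribution. But how do we measure closeness?

Measure uniformity using the Kullback-Leibler divergence! For five outcomes and the uniform distribution $U$, $\mathrm{KL}(P \,\|\, U) = \log 5 - H(P)$, so being close to uniform is the same as having high entropy.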
Maximum Entropy Principle (MaxEnt)
The uniformity of a distribution is measured by its entropy:

$$P^* = \arg\max_{P} H(P)$$

where

$$H(P) = -p(\text{dans}) \log p(\text{dans}) - p(\text{en}) \log p(\text{en}) - p(\text{à}) \log p(\text{à}) - p(\text{au-cours-de}) \log p(\text{au-cours-de}) - p(\text{pendant}) \log p(\text{pendant})$$

subject to

$$p(\text{dans}) + p(\text{en}) = 3/10$$
$$p(\text{dans}) + p(\text{à}) = 1/2$$
$$p(\text{dans}) + p(\text{en}) + p(\text{à}) + p(\text{au-cours-de}) + p(\text{pendant}) = 1$$

Solution: p(dans) ≈ 0.186, p(à) ≈ 0.314, p(en) ≈ 0.114, p(au-cours-de) = p(pendant) ≈ 0.193 (roughly 0.2, 0.3, 0.1, 0.2, 0.2).
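Once the equality constraints are substituted, the problem above has a single free parameter: with $d = p(\text{dans})$, we get $p(\text{en}) = 0.3 - d$ and $p(\text{à}) = 0.5 - d$, and the remaining mass $0.2 + d$ is best split evenly between au-cours-de and pendant. Setting $dH/dd = 0$ gives $p(\text{dans})\,p(\text{au-cours-de}) = p(\text{en})\,p(\text{à})$, i.e. $d^2 - 1.8d + 0.3 = 0$, so $d = (1.8 - \sqrt{2.04})/2 \approx 0.186$. The plain-Python sketch below confirms this numerically with a ternary search (entropy is concave in $d$); the function names are illustrative.

```python
import math


def entropy(p):
    """H(P) = -sum_k p_k log p_k (natural log), using 0 log 0 = 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def distribution(d):
    """Feasible P for Case 3, parameterized by d = p(dans).

    Order: dans, en, à, au-cours-de, pendant.
    The constraints fix p(en) = 0.3 - d and p(à) = 0.5 - d; the remaining
    mass 0.2 + d is split evenly, since an even split maximizes entropy
    for a fixed total.
    """
    rest = (1.0 - 0.3 - 0.5 + d) / 2.0
    return [d, 0.3 - d, 0.5 - d, rest, rest]


def maxent_case3(lo=0.0, hi=0.3, iters=200):
    """Ternary search for the d that maximizes H; H is concave in d."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if entropy(distribution(m1)) < entropy(distribution(m2)):
            lo = m1
        else:
            hi = m2
    return distribution((lo + hi) / 2.0)


p = maxent_case3()
print([round(v, 3) for v in p], entropy(p))
print(entropy([0.2, 0.1, 0.3, 0.2, 0.2]))  # the rounded values, for comparison
```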
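MaxEnt for Classification Problems
We want $p(y \mid x)$ to be close to a uniform distribution: maximize the conditional entropy of the training data

$$H(y \mid x) = \frac{1}{N} \sum_{i=1}^{N} H(y \mid x_i) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

Constraints:
- Valid probability distribution: $\forall i, \; \sum_{y} p(y \mid x_i) = 1$
- From the training data, the model should be consistent with the data: for each class, the model mean of $x$ equals the empirical mean of $x$:

$$\forall y \in \{1, 2, \ldots, C\}: \quad \frac{1}{N} \sum_{i=1}^{N} p(y \mid x_i) \, x_i = \frac{1}{N} \sum_{i=1}^{N} x_i \, \delta(y, y_i)$$

where $\delta(y, y_i) = 1$ if $y = y_i$ and $0$ otherwise.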
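To make the entropy objective and the per-class mean constraints concrete, here is a plain-Python sketch on a made-up three-point dataset (0-based labels, hypothetical helper names). A uniform $p(y \mid x)$ has maximal entropy but violates the constraints, while a one-hot $p(y \mid x)$ that copies the labels satisfies them with zero entropy; MaxEnt seeks the highest-entropy $p(y \mid x)$ among those that satisfy the constraints.

```python
import math


def empirical_means(X, Y, C):
    """e[y][j] = (1/N) sum_i x_ij * delta(y, y_i), with 0-based labels."""
    N, d = len(X), len(X[0])
    e = [[0.0] * d for _ in range(C)]
    for x, y in zip(X, Y):
        for j in range(d):
            e[y][j] += x[j] / N
    return e


def model_means(X, P):
    """m[y][j] = (1/N) sum_i x_ij * p(y | x_i), where P[i][y] = p(y | x_i)."""
    N, d, C = len(X), len(X[0]), len(P[0])
    m = [[0.0] * d for _ in range(C)]
    for x, p in zip(X, P):
        for y in range(C):
            for j in range(d):
                m[y][j] += x[j] * p[y] / N
    return m


def conditional_entropy(P):
    """H(y | x) = -(1/N) sum_i sum_y p(y | x_i) log p(y | x_i), with 0 log 0 = 0."""
    return -sum(py * math.log(py) for p in P for py in p if py > 0) / len(P)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [0, 1, 1]
uniform = [[0.5, 0.5] for _ in X]                                   # maximal entropy
one_hot = [[1.0 if y == k else 0.0 for k in range(2)] for y in Y]  # zero entropy
print(empirical_means(X, Y, 2), model_means(X, uniform), model_means(X, one_hot))
```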
MaxEnt for Classification Problems
Require the mean to be consistent between the empirical data and the model.
We make no assumption about the parametric form of the likelihood; we only assume it is $C^2$ continuous.
Dropping the constant factor $1/N$, which does not change the maximizer:

$$\max_{p} \sum_{i=1}^{N} H(y \mid x_i) = \max_{p} \; -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

subject to

$$\sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i), \qquad \sum_{y} p(y \mid x_i) = 1$$
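MaxEnt Model
Consistency with the data is ensured by the equality constraints: for each feature, the empirical mean equals the model mean.

$$\sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i)$$

Beyond the feature vector $x$, the same constraint can be imposed on any feature $f_k(x)$:

$$\sum_{i=1}^{N} p(y \mid x_i) \, f_k(x_i) = \sum_{i=1}^{N} f_k(x_i) \, \delta(y, y_i)$$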
Translation Problem
Parameters: p(dans), p(en), p(au-cours-de), p(à), p(pendant)
Represent each French word with two features:

| Word | {dans, en} | {dans, à} |
|---|---|---|
| dans | 1 | 1 |
| en | 1 | 0 |
| au-cours-de | 0 | 0 |
| à | 0 | 1 |
| pendant | 0 | 0 |
| Empirical average | 0.3 | 0.5 |
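Constraints

$$\sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i)$$

$$p(\text{dans})(1,1) + p(\text{en})(1,0) + p(\text{au-cours-de})(0,0) + p(\text{à})(0,1) + p(\text{pendant})(0,0) = (0.3, 0.5)$$

$$\Rightarrow \quad p(\text{dans}) + p(\text{en}) = 3/10, \qquad p(\text{dans}) + p(\text{à}) = 1/2$$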
Solution to MaxEnt
Surprisingly, the solution is just the conditional exponential model without thresholds:

$$p(y \mid x; w) = \frac{\exp(x \cdot w_y)}{\sum_{y'} \exp(x \cdot w_{y'})}$$

Why?
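Solution to MaxEnt
Solve the maximum entropy problem, i.e.,

$$\max_{p} \sum_{i=1}^{N} H(y \mid x_i) = \max_{p} \; -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

$$\text{subject to} \quad \sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i), \qquad \sum_{y} p(y \mid x_i) = 1$$

To maximize the entropy function under the constraints, we can introduce a set of Lagrange multipliers (a vector $\lambda_y$ for each class and a scalar $\gamma_i$ for each example) into the objective function, i.e.,

$$G = -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i) + \sum_{y} \lambda_y \cdot \left( \sum_{i=1}^{N} p(y \mid x_i) \, x_i - \sum_{i=1}^{N} x_i \, \delta(y, y_i) \right) + \sum_{i=1}^{N} \gamma_i \left( \sum_{y} p(y \mid x_i) - 1 \right)$$

Setting the first derivative of $G$ with respect to $p(y \mid x_i)$ to zero, we have

$$\frac{\partial G}{\partial p(y \mid x_i)} = -\log p(y \mid x_i) - 1 + x_i \cdot \lambda_y + \gamma_i = 0 \quad \Rightarrow \quad p(y \mid x_i) = \exp(x_i \cdot \lambda_y + \gamma_i - 1) \propto \exp(x_i \cdot \lambda_y)$$

Since $\sum_{y} p(y \mid x_i) = 1$, we have

$$p(y \mid x_i) = \frac{\exp(x_i \cdot \lambda_y)}{\sum_{y'} \exp(x_i \cdot \lambda_{y'})}$$

i.e., the multipliers $\lambda_y$ play the role of the weights $w_y$.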
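The derivation says the MaxEnt solution must take the form $p \propto \exp(\lambda \cdot f)$. As a plain-Python sketch, the same reasoning can be applied to the unconditional translation example, using the two binary features from the translation slide: fitting $\lambda$ by gradient descent on the convex dual $\log Z(\lambda) - \lambda \cdot (0.3, 0.5)$ recovers a distribution that satisfies both constraints and has $p(\text{dans}) = (1.8 - \sqrt{2.04})/2 \approx 0.186$. The learning rate and step count are illustrative choices, not part of the original material.

```python
import math

# Two binary features per French word, from the translation slide:
# f1 = 1 if the word is in {dans, en}, f2 = 1 if the word is in {dans, à}.
FEATURES = {
    "dans": (1, 1),
    "en": (1, 0),
    "au-cours-de": (0, 0),
    "à": (0, 1),
    "pendant": (0, 0),
}
TARGET = (0.3, 0.5)  # empirical averages of f1 and f2


def model(lam):
    """Exponential form p(word) = exp(lam . f(word)) / Z(lam)."""
    scores = {w: lam[0] * f[0] + lam[1] * f[1] for w, f in FEATURES.items()}
    Z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / Z for w, s in scores.items()}


def fit(steps=2000, lr=1.0):
    """Gradient descent on the convex dual log Z(lam) - lam . TARGET.

    Its gradient is (model mean of f) - TARGET, so a stationary point
    satisfies the MaxEnt equality constraints.
    """
    lam = [0.0, 0.0]
    for _ in range(steps):
        p = model(lam)
        grad = [sum(p[w] * f[k] for w, f in FEATURES.items()) - TARGET[k] for k in range(2)]
        lam = [lam[k] - lr * grad[k] for k in range(2)]
    return lam, model(lam)


lam, p = fit()
print(lam, p)
```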
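Maximum Entropy Model versus Conditional Exponential Model

Maximum Entropy Model:

$$\max_{p} \sum_{i=1}^{N} H(y \mid x_i) = \max_{p} \; -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

$$\text{subject to} \quad \sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i) \;\; \forall y, \qquad \sum_{y} p(y \mid x_i) = 1 \;\; \forall i$$

Conditional Exponential Model:

$$p(y \mid x; \{w_y\}) = \frac{\exp(x \cdot w_y)}{\sum_{y'} \exp(x \cdot w_{y'})}$$

$$\max_{\{w_y\}} l(D_{train}) = \max_{\{w_y\}} \sum_{i=1}^{N} \log \frac{\exp(x_i \cdot w_{y_i})}{\sum_{y} \exp(x_i \cdot w_y)}$$

The two problems are duals of each other (dual problem).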
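The duality can be checked numerically. The plain-Python sketch below (invented toy data, hypothetical helper names) fits the conditional exponential model without thresholds by plain gradient ascent on $l(D_{train})$. Because the gradient with respect to $w_y$ is $N$ times (empirical mean minus model mean) of $x$ for class $y$, the fitted model satisfies the MaxEnt equality constraints. Every input pattern in the toy data appears with both labels, so the maximum likelihood estimate is finite.

```python
import math


def probs(x, W):
    """p(y | x) = exp(x . w_y) / sum_y' exp(x . w_y')  (no thresholds)."""
    s = [sum(a * b for a, b in zip(x, wy)) for wy in W]
    m = max(s)
    ex = [math.exp(v - m) for v in s]
    Z = sum(ex)
    return [v / Z for v in ex]


def fit_mle(X, Y, C, steps=2000, lr=0.1):
    """Gradient ascent on l(D_train) = sum_i log p(y_i | x_i).

    dl/dw_y = sum_i x_i (delta(y, y_i) - p(y | x_i)), i.e. N times
    (empirical mean - model mean) of x for class y.
    """
    d = len(X[0])
    W = [[0.0] * d for _ in range(C)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(C)]
        for x, y_true in zip(X, Y):
            p = probs(x, W)
            for y in range(C):
                coef = (1.0 if y == y_true else 0.0) - p[y]
                for j in range(d):
                    grad[y][j] += coef * x[j]
        W = [[W[y][j] + lr * grad[y][j] for j in range(d)] for y in range(C)]
    return W


def class_means(X, Y, W, C):
    """Per-class (empirical, model) means of x, both scaled by 1/N."""
    N, d = len(X), len(X[0])
    emp = [[0.0] * d for _ in range(C)]
    mod = [[0.0] * d for _ in range(C)]
    for x, y_true in zip(X, Y):
        p = probs(x, W)
        for y in range(C):
            for j in range(d):
                emp[y][j] += (x[j] if y == y_true else 0.0) / N
                mod[y][j] += p[y] * x[j] / N
    return emp, mod


X = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
Y = [0, 0, 1, 1, 1, 0]
W = fit_mle(X, Y, C=2)
emp, mod = class_means(X, Y, W, C=2)
print(emp, mod)
```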
Maximum Entropy Model vs. Conditional Exponential Model
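However, where is the threshold term $c$?

Maximum Entropy:

$$p(y \mid x; \{w_y\}) = \frac{\exp(x \cdot w_y)}{\sum_{y'} \exp(x \cdot w_{y'})}$$

$$\max_{\{w_y\}} l(D_{train}) = \max_{\{w_y\}} \sum_{i=1}^{N} \log \frac{\exp(x_i \cdot w_{y_i})}{\sum_{y} \exp(x_i \cdot w_y)}$$

Conditional Exponential:

$$p(y \mid x; \{c_y, w_y\}) = \frac{\exp(c_y + x \cdot w_y)}{\sum_{y'} \exp(c_{y'} + x \cdot w_{y'})}$$

$$\max_{\{c_y, w_y\}} l_{reg}(D_{train}) = \max_{\{c_y, w_y\}} \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)} - s \sum_{y} \|w_y\|_2^2$$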
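Solving Maximum Entropy Model

$$\max_{p} \sum_{i=1}^{N} H(y \mid x_i) = \max_{p} \; -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

$$\text{subject to} \quad \sum_{i=1}^{N} p(y \mid x_i) \, x_i = \sum_{i=1}^{N} x_i \, \delta(y, y_i), \qquad \sum_{y} p(y \mid x_i) = 1$$

Iterative scaling algorithm. Assume

$$\forall i \in [1 \ldots N], \; j \in [1 \ldots d]: \quad x_{i,j} \geq 0$$

$$\forall i \in [1 \ldots N]: \quad \sum_{j=1}^{d} x_{i,j} = g, \quad \text{where } g \text{ is a constant independent of } i$$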
Solving Maximum Entropy Model
- Compute the empirical mean of each feature for every class, i.e., for every $j$ and every class $y$:

$$e_{j,y} = \frac{1}{N} \sum_{i=1}^{N} x_{i,j} \, \delta(y, y_i)$$

- Start with $w_1 = w_2 = \cdots = w_C = 0$.
- Repeat:
  - Compute $p(y \mid x_i)$ for each training data point $(x_i, y_i)$ using $w$ from the previous iteration.
  - Compute the mean of each feature for every class using the estimated probabilities, i.e., for every $j$ and every $y$:

$$m_{j,y} = \frac{1}{N} \sum_{i=1}^{N} x_{i,j} \, p(y \mid x_i)$$

  - Compute $\Delta w_{j,y}$ for every $j$ and every $y$:

$$\Delta w_{j,y} = \frac{1}{g} \left( \log e_{j,y} - \log m_{j,y} \right)$$

  - Update $w$ as

$$w_{j,y} \leftarrow w_{j,y} + \Delta w_{j,y}$$
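A plain-Python sketch of the iterative scaling steps above (commonly called generalized iterative scaling, GIS), with hypothetical helper names and invented toy data whose features are nonnegative and sum to $g = 2$ for every example. It records the log-likelihood after every iteration so that its monotone behavior can be inspected.

```python
import math


def probs(x, W):
    """p(y | x) = exp(x . w_y) / sum_y' exp(x . w_y')  (no thresholds)."""
    s = [sum(a * b for a, b in zip(x, wy)) for wy in W]
    m = max(s)
    ex = [math.exp(v - m) for v in s]
    Z = sum(ex)
    return [v / Z for v in ex]


def log_likelihood(X, Y, W):
    return sum(math.log(probs(x, W)[y]) for x, y in zip(X, Y))


def gis(X, Y, C, iters=500):
    """Generalized iterative scaling; returns the weights and the log-likelihood history.

    Assumes x_ij >= 0, sum_j x_ij = g for every i, and every empirical mean e[y][j] > 0.
    """
    N, d = len(X), len(X[0])
    g = sum(X[0])
    assert all(min(x) >= 0 and abs(sum(x) - g) < 1e-12 for x in X), "GIS assumptions violated"
    # Empirical means e[y][j] = (1/N) sum_i x_ij delta(y, y_i)
    e = [[0.0] * d for _ in range(C)]
    for x, y in zip(X, Y):
        for j in range(d):
            e[y][j] += x[j] / N
    W = [[0.0] * d for _ in range(C)]
    history = [log_likelihood(X, Y, W)]
    for _ in range(iters):
        # Model means m[y][j] = (1/N) sum_i x_ij p(y | x_i) under the current weights
        m = [[0.0] * d for _ in range(C)]
        for x in X:
            p = probs(x, W)
            for y in range(C):
                for j in range(d):
                    m[y][j] += x[j] * p[y] / N
        # Delta w_{j,y} = (1/g) (log e_{j,y} - log m_{j,y})
        W = [[W[y][j] + (math.log(e[y][j]) - math.log(m[y][j])) / g for j in range(d)] for y in range(C)]
        history.append(log_likelihood(X, Y, W))
    return W, history


X = [[2.0, 0.0]] * 3 + [[0.0, 2.0]] * 3 + [[1.0, 1.0]] * 2  # every row sums to g = 2
Y = [0, 0, 1, 1, 1, 0, 0, 1]
W, history = gis(X, Y, C=2)
print(history[0], history[1], history[-1])
```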
Solving Maximum Entropy Model
The likelihood function increases monotonically over the iterations!
Solving Maximum Entropy Model
- What if a feature can take both positive and negative values?
- What if the sum of the features is not a constant?
- How can this approach be applied to the conditional exponential model with a bias term (or threshold term)?
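Improved Iterative Scaling
It only requires all the input features to be nonnegative (their sum need not be constant).
- Compute the empirical mean of each feature for every class, i.e., for every $j$ and every class $y$:

$$e_{j,y} = \frac{1}{N} \sum_{i=1}^{N} x_{i,j} \, \delta(y, y_i)$$

- Start with $w_1 = w_2 = \cdots = w_C = 0$.
- Repeat:
  - Compute $p(y \mid x_i)$ for each training data point $(x_i, y_i)$ using $w$ from the previous iteration.
  - Solve for $\Delta w_{j,y}$ for every $j$ and every $y$:

$$\sum_{i=1}^{N} x_{i,j} \, p(y \mid x_i) \exp\left( \Delta w_{j,y} \sum_{j'} x_{i,j'} \right) = N e_{j,y}$$

  - Update $w$ as

$$w_{j,y} \leftarrow w_{j,y} + \Delta w_{j,y}$$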
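A plain-Python sketch of improved iterative scaling (hypothetical helper names, invented toy data whose feature sums are 1 or 2, so the constant-sum assumption of the previous algorithm does not hold). The one-dimensional equation for $\Delta w_{j,y}$ has no closed form in general, so this sketch solves it with Newton's method on the logarithm of the left-hand side; that solver choice is an assumption of the sketch, not part of the original material. When $\sum_j x_{i,j} = g$ for all $i$, the first Newton step already gives the iterative scaling update $(1/g)\log(e_{j,y}/m_{j,y})$.

```python
import math


def probs(x, W):
    """p(y | x) = exp(x . w_y) / sum_y' exp(x . w_y')  (no thresholds)."""
    s = [sum(a * b for a, b in zip(x, wy)) for wy in W]
    m = max(s)
    ex = [math.exp(v - m) for v in s]
    Z = sum(ex)
    return [v / Z for v in ex]


def log_likelihood(X, Y, W):
    return sum(math.log(probs(x, W)[y]) for x, y in zip(X, Y))


def solve_delta(X, P, y, j, target, tol=1e-12, max_iter=100):
    """Solve sum_i x_ij p(y | x_i) exp(delta * f#(x_i)) = target for delta,
    where f#(x_i) = sum_j x_ij and target = N * e_{j,y}.

    Newton's method on log F(delta) - log(target): log F is convex and
    increasing, with slope equal to a weighted average of the f#(x_i).
    """
    delta = 0.0
    for _ in range(max_iter):
        terms = [(x[j] * p[y] * math.exp(delta * sum(x)), sum(x)) for x, p in zip(X, P)]
        F = sum(t for t, _ in terms)
        slope = sum(t * f for t, f in terms) / F
        step = (math.log(F) - math.log(target)) / slope
        delta -= step
        if abs(step) < tol:
            break
    return delta


def iis(X, Y, C, iters=500):
    """Improved iterative scaling; assumes x_ij >= 0 and every e[y][j] > 0."""
    N, d = len(X), len(X[0])
    e = [[0.0] * d for _ in range(C)]
    for x, y in zip(X, Y):
        for j in range(d):
            e[y][j] += x[j] / N
    W = [[0.0] * d for _ in range(C)]
    history = [log_likelihood(X, Y, W)]
    for _ in range(iters):
        P = [probs(x, W) for x in X]  # p(y | x_i) under the previous weights
        W = [[W[y][j] + solve_delta(X, P, y, j, N * e[y][j]) for j in range(d)] for y in range(C)]
        history.append(log_likelihood(X, Y, W))
    return W, e, history


X = [[2.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 1.0]] * 2
Y = [0, 0, 1, 1, 1, 0, 0, 1]
W, e, history = iis(X, Y, C=2)
print(history[0], history[-1])
```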
Choice of Features
- A feature does not have to be one of the inputs.
- For the maximum entropy model, bounded features are more favorable; very often, people use binary features.
- Feature selection: features with small weights are eliminated.

$$\max_{p} \sum_{i=1}^{N} H(y \mid x_i) = \max_{p} \; -\sum_{i=1}^{N} \sum_{y} p(y \mid x_i) \log p(y \mid x_i)$$

$$\text{subject to} \quad \forall y: \; \sum_{i=1}^{N} p(y \mid x_i) \, f(x_i) = \sum_{i=1}^{N} f(x_i) \, \delta(y, y_i), \qquad \sum_{y} p(y \mid x_i) = 1$$
Feature Selection vs. Regularizers
- A regularizer that yields a sparse solution performs automatic feature selection.
- But the L2 regularizer rarely results in features with zero weights, so it is not appropriate for feature selection.
- For the purpose of feature selection, the L1 norm is usually used.

$$l_{reg}(D_{train}) = l(D_{train}) - s \sum_{y} \|w_y\|_2^2 = \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)} - s \sum_{y,j} w_{y,j}^2$$
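With the L1 norm:

$$l_{reg}(D_{train}) = l(D_{train}) - s \sum_{y} \|w_y\|_1 = \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)} - s \sum_{y,j} |w_{y,j}|$$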
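The two regularized objectives differ only in the penalty term. A plain-Python sketch (made-up parameters and data, hypothetical helper names) that evaluates both, leaving the thresholds $c_y$ unpenalized as in the formulas. Note that the L1 penalty $|w|$ has a kink at 0, where it is not differentiable.

```python
import math


def log_likelihood(X, Y, W, c):
    """l(D_train) = sum_i log p(y_i | x_i) with thresholds c_y (0-based labels)."""
    total = 0.0
    for x, y in zip(X, Y):
        s = [cy + sum(a * b for a, b in zip(x, wy)) for wy, cy in zip(W, c)]
        m = max(s)
        total += s[y] - m - math.log(sum(math.exp(v - m) for v in s))
    return total


def regularized_objective(X, Y, W, c, s, norm="l2"):
    """l_reg = l(D_train) - s * penalty, where the penalty is
    sum_{y,j} w_{y,j}^2 for norm="l2" and sum_{y,j} |w_{y,j}| for norm="l1".
    The thresholds c_y are not penalized."""
    if norm == "l2":
        penalty = sum(w * w for wy in W for w in wy)
    elif norm == "l1":
        penalty = sum(abs(w) for wy in W for w in wy)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return log_likelihood(X, Y, W, c) - s * penalty


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [0, 1, 1]
W = [[1.0, -2.0], [0.0, 3.0]]
c = [0.0, 0.5]
print(regularized_objective(X, Y, W, c, s=0.1, norm="l2"),
      regularized_objective(X, Y, W, c, s=0.1, norm="l1"))
```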
Solving the L1 Regularized Conditional Exponential Model
Solving the L1 regularized conditional exponential model directly is rather difficult, because the absolute value is not differentiable at zero.
Any suggestions to alleviate this problem?
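Solving the L1 Regularized Conditional Exponential Model

$$\arg\max_{\{c_y, w_y\}} \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)} - s \sum_{y,j} |w_{y,j}|$$

is equivalent to

$$\arg\max_{\{c_y, w_y, t_{y,j}\}} \sum_{i=1}^{N} \log \frac{\exp(c_{y_i} + x_i \cdot w_{y_i})}{\sum_{y} \exp(c_y + x_i \cdot w_y)} - s \sum_{y,j} t_{y,j}$$

$$\text{subject to} \quad \forall y, j: \; t_{y,j} \geq 0, \; \text{and} \; -t_{y,j} \leq w_{y,j} \leq t_{y,j}$$

Slack variables: at the optimum $t_{y,j} = |w_{y,j}|$, so both problems have the same solution, while the new objective is differentiable and the constraints are linear.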