Maxent Models (II)
![Page 1: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/1.jpg)
Maxent Models (II)
CMSC 473/673
UMBC
September 20th, 2017
![Page 2: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/2.jpg)
Announcements: Assignment 1
Due 11:59 PM, Saturday 9/23
~3.5 days
Use the submit utility with:
class id: cs473_ferraro
assignment id: a1
We must be able to run it on GL!
Common pitfall #1: forgetting files
Common pitfall #2: incorrect paths to files
Common pitfall #3: 3rd party libraries
![Page 3: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/3.jpg)
Announcements: Question 6
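$$p^{(n)}(x_i \mid x_{i-n+1}:x_{i-1}) = \lambda_n f^{(n)}(x_{i-n+1}:x_i) + (1 - \lambda_n)\, p^{(n-1)}(x_i \mid x_{i-n+2}:x_{i-1})$$

$$p^{(n)}(x_i \mid x_{i-n+1}:x_{i-1}) = \lambda_{n,n} f^{(n)}(x_{i-n+1}:x_i) + \lambda_{n,n-1} f^{(n-1)}(x_{i-n+2}:x_i) + \cdots + \lambda_{n,0} f^{(0)}(\cdot)$$

$$\lambda_{n,0} = 1 - \sum_{m=0}^{n-1} \lambda_{n,n-m}$$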
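The second (flat) form of the interpolation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define f, so here f^(k) is assumed to be a relative-frequency estimate and f^(0) is assumed to be uniform over the vocabulary. The names `ngram_counts`, `f`, and `interpolated_prob` are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, max_n):
    """Count every k-gram in `tokens` for k = 1..max_n."""
    counts = Counter()
    for k in range(1, max_n + 1):
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts

def f(counts, vocab, k, history, word):
    """Assumed f^(k): relative frequency of `word` after the last k-1 history tokens.

    f^(0) is assumed to be the uniform distribution over the vocabulary.
    """
    if k == 0:
        return 1.0 / len(vocab)
    context = tuple(history[-(k - 1):]) if k > 1 else ()
    denom = sum(counts[context + (w,)] for w in vocab)
    return counts[context + (word,)] / denom if denom else 0.0

def interpolated_prob(counts, vocab, n, lambdas, history, word):
    """Flat interpolation: sum over k of lambda_{n,k} * f^(k).

    `lambdas` maps each order k in 1..n to lambda_{n,k};
    lambda_{n,0} is whatever mass is left over, as in the last equation.
    """
    lam0 = 1.0 - sum(lambdas[k] for k in range(1, n + 1))
    p = lam0 * f(counts, vocab, 0, history, word)
    for k in range(1, n + 1):
        p += lambdas[k] * f(counts, vocab, k, history, word)
    return p
```

Because the weights sum to one, the result is a proper distribution over the vocabulary whenever every context used has been observed at least once.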
![Page 4: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/4.jpg)
Announcements: Course Project
Official handout will be out Friday 9/22
Until then, focus on assignment 1
Teams of 1-3
Mixed undergrad/grad is encouraged but not required
Some novel aspect is needed
Ex 1: reimplement existing technique and apply to new domain
Ex 2: reimplement existing technique and apply to new (human) language
Ex 3: explore novel technique on existing problem
![Page 5: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/5.jpg)
Recap from last time…
![Page 6: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/6.jpg)
Classify or Decode with Bayes Rule
how well does text Y represent label X?
how likely is label X overall?
For “simple” or “flat” labels:
* iterate through labels
* evaluate score for each label, keeping only the best (n best)
* return the best (or n best) label and score
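The loop over flat labels can be sketched as follows. The two scoring functions are passed in as parameters, and the toy probabilities are made-up numbers used only so the sketch runs; none of these names come from the slides.

```python
import heapq
import math

def decode(doc, labels, log_likelihood, log_prior, n_best=1):
    """Score each label x by log p(doc | x) + log p(x) and keep the n best."""
    scored = [(log_likelihood(doc, x) + log_prior(x), x) for x in labels]
    return heapq.nlargest(n_best, scored)  # list of (score, label), best first

# Toy model (illustrative numbers only).
TOY_PRIOR = {"ATTACK": 0.3, "OTHER": 0.7}
TOY_WORD_PROB = {
    "ATTACK": {"shot": 0.2, "mayor": 0.1},
    "OTHER": {"shot": 0.01, "mayor": 0.1},
}

def toy_log_prior(x):
    return math.log(TOY_PRIOR[x])

def toy_log_likelihood(doc, x):
    return sum(math.log(TOY_WORD_PROB[x][w]) for w in doc)

best = decode(["shot", "mayor"], list(TOY_PRIOR), toy_log_likelihood, toy_log_prior)
```

Working in log space avoids underflow when many small probabilities are multiplied.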
![Page 7: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/7.jpg)
Classification Evaluation: the 2-by-2 contingency table
| | Actually Correct | Actually Incorrect |
|---|---|---|
| Selected/Guessed | True Positive (TP) | False Positive (FP) |
| Not selected/not guessed | False Negative (FN) | True Negative (TN) |
[Figure: the classes/choices, each marked as correct and/or guessed]
![Page 8: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/8.jpg)
Classification Evaluation: Accuracy, Precision, and Recall
Accuracy: % of items correct = (TP + TN) / (TP + FP + FN + TN)
Precision: % of selected items that are correct = TP / (TP + FP)
Recall: % of correct items that are selected = TP / (TP + FN)
| | Actually Correct | Actually Incorrect |
|---|---|---|
| Selected/Guessed | True Positive (TP) | False Positive (FP) |
| Not selected/not guessed | False Negative (FN) | True Negative (TN) |
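A minimal sketch of these three measures, computed from the four cells of the contingency table for one target class. The function names and the convention of returning 0.0 when a denominator is empty are assumptions made for illustration.

```python
def contingency(gold, guessed, target):
    """Return (TP, FP, FN, TN) for `target` given parallel label lists."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, guessed):
        if p == target and g == target:
            tp += 1
        elif p == target:
            fp += 1
        elif g == target:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was selected.
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Assumed convention: 0.0 when there were no correct items.
    return tp / (tp + fn) if tp + fn else 0.0
```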
![Page 9: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/9.jpg)
Language Modeling as Naïve Bayes Classifier
Adopt naïve bag of words representation Yi
Assume position doesn’t matter
Assume the feature probabilities are independent given the class X
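The three assumptions above lead to a score that is a sum of per-word log probabilities plus a log prior. The sketch below is one way to write that down; the add-alpha smoothing and the choice to skip unseen words are assumptions added so the code runs on tiny data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs, alpha=1.0):
    """Collect label counts and per-label word counts from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab, alpha

def nb_log_score(tokens, label, model):
    """log p(label) + sum over tokens of log p(token | label).

    Word order never enters the computation (bag of words), and each
    token contributes independently given the label.
    """
    label_counts, word_counts, vocab, alpha = model
    n_docs = sum(label_counts.values())
    total = sum(word_counts[label].values())
    score = math.log(label_counts[label] / n_docs)
    for tok in tokens:
        if tok not in vocab:
            continue  # assumption: ignore words never seen in training
        score += math.log((word_counts[label][tok] + alpha)
                          / (total + alpha * len(vocab)))
    return score
```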
![Page 10: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/10.jpg)
Naïve Bayes Summary
Potential Advantages
Very Fast, low storage requirements
Robust to Irrelevant Features
Very good in domains with many equally important features
Optimal if the independence assumptions hold
Dependable baseline for text classification (but often not the best)
Potential Issues
Model the posterior in one go?
Are the features really uncorrelated?
Are plain counts always appropriate?
Are there “better” ways of handling missing/noisy data?
(automated, more principled)
![Page 11: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/11.jpg)
Relevant for classification…
![Page 12: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/12.jpg)
Relevant for classification… and language modeling
![Page 13: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/13.jpg)
Maximum Entropy Models
a more general language model
$$\arg\max_X \; p(Y \mid X) \cdot p(X)$$
![Page 14: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/14.jpg)
Maximum Entropy Models
a more general language model
classify in one go
$$\arg\max_X \; p(Y \mid X) \cdot p(X)$$
$$\arg\max_X \; p(X \mid Y)$$
![Page 15: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/15.jpg)
Document Classification
ATTACK
Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.
We need to score the different combinations.
![Page 16: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/16.jpg)
Score and Combine Our Possibilities
score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)
…
scorek(department, ATTACK)
COMBINE → posterior probability of ATTACK
are all of these uncorrelated?
![Page 17: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/17.jpg)
Score and Combine Our Possibilities
score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)
…
COMBINE → posterior probability of ATTACK
Q: What are the score and combine functions for Naïve Bayes?
![Page 18: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/18.jpg)
Scoring Our Possibilities
doc = “Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.”
score(doc, ATTACK) =
score1(fatally shot, ATTACK)
+ score2(seriously wounded, ATTACK)
+ score3(Shining Path, ATTACK)
+ …
Learn these scores… but how?
What do we optimize?
![Page 19: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/19.jpg)
Maxent Modeling
doc = “Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.”
p(ATTACK | doc) ∝ SNAP(score(doc, ATTACK))
![Page 20: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/20.jpg)
Maxent Modeling
doc = “Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.”
p(ATTACK | doc) ∝ exp(score(doc, ATTACK))
f(x) = exp(x)
exp gives a positive, unnormalized probability
![Page 21: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/21.jpg)
Maxent Modeling
doc = “Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.”
p(ATTACK | doc) ∝ exp(score1(fatally shot, ATTACK) + score2(seriously wounded, ATTACK) + score3(Shining Path, ATTACK) + …)
Learn the scores (but we’ll declare what combinations should be looked at)
![Page 22: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/22.jpg)
Maxent Modeling
doc = “Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.”
p(ATTACK | doc) ∝ exp(weight1 * applies1(fatally shot, ATTACK) + weight2 * applies2(seriously wounded, ATTACK) + weight3 * applies3(Shining Path, ATTACK) + …)
![Page 23: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/23.jpg)
Q: What if none of our features apply?
![Page 24: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/24.jpg)
Guiding Principle for Log-Linear Models
“[The log-linear estimate] is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information.”
Edwin T. Jaynes, 1957
![Page 25: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/25.jpg)
Guiding Principle for Log-Linear Models
“[The log-linear estimate] is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information.”
Edwin T. Jaynes, 1957
If no features apply, f = 0, so exp(θ · f) = exp(θ · 0) = 1: every label gets the same unnormalized score.
![Page 26: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/26.jpg)
Easier-to-write form
exp(θ1 * f1(fatally shot, ATTACK) + θ2 * f2(seriously wounded, ATTACK) + θ3 * f3(Shining Path, ATTACK) + …)
![Page 27: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/27.jpg)
Easier-to-write form
exp(θ1 * f1(fatally shot, ATTACK) + θ2 * f2(seriously wounded, ATTACK) + θ3 * f3(Shining Path, ATTACK) + …)
K weights
K features
![Page 28: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/28.jpg)
exp(θ · f(doc, ATTACK))
Easier-to-write form
K-dimensional weight vector
K-dimensional feature vector
dot product
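The dot-product form is short enough to write out directly. In this sketch the three indicator features reuse the phrases from the running example, while the weight values and the second label `OTHER` are invented for illustration.

```python
import math

PHRASES = ["fatally shot", "seriously wounded", "Shining Path"]

def features(doc, label):
    """K-dimensional binary feature vector: feature k fires when phrase k
    occurs in the document and the label is ATTACK."""
    return [1.0 if (p in doc and label == "ATTACK") else 0.0 for p in PHRASES]

def unnormalized_score(theta, f):
    """exp(theta . f): positive, but not yet a probability."""
    return math.exp(sum(t * v for t, v in zip(theta, f)))

theta = [1.5, 0.5, 2.0]  # hypothetical weights, one per feature
doc = "Three people have been fatally shot in a Shining Path attack"
attack_score = unnormalized_score(theta, features(doc, "ATTACK"))
other_score = unnormalized_score(theta, features(doc, "OTHER"))
```

For the label where no feature applies, the feature vector is all zeros and the score is exp(0) = 1, whatever the weights are.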
![Page 29: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/29.jpg)
Log-Linear Models
![Page 30: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/30.jpg)
Log-Linear Models
![Page 31: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/31.jpg)
Log-Linear Models
Feature function(s)
Sufficient statistics
“Strength” function(s)
![Page 32: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/32.jpg)
Log-Linear Models
Feature weights
Natural parameters
Distribution Parameters
![Page 33: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/33.jpg)
Log-Linear Models
How do we normalize?
![Page 34: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/34.jpg)
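Normalization for Classification
$$Z = \sum_{\text{label } x} \exp\big(\text{weight}_1 \cdot \text{applies}_1(\text{fatally shot}, x) + \text{weight}_2 \cdot \text{applies}_2(\text{seriously wounded}, x) + \text{weight}_3 \cdot \text{applies}_3(\text{Shining Path}, x) + \cdots\big)$$
$$p(x \mid y) \propto \exp(\theta \cdot f(x, y))$$
classify doc y with label x in one go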
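Normalizing for classification only needs a sum over the label set, which the sketch below makes explicit. The label set, keyword features, and weights are hypothetical; subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math

LABELS = ["ATTACK", "POLITICS", "OTHER"]

def features(doc, label):
    """Hypothetical indicator features pairing a keyword with a label."""
    return [
        1.0 if ("shot" in doc and label == "ATTACK") else 0.0,
        1.0 if ("wounded" in doc and label == "ATTACK") else 0.0,
        1.0 if ("election" in doc and label == "POLITICS") else 0.0,
    ]

def posterior(theta, doc, labels=LABELS):
    """p(x | y) = exp(theta . f(x, y)) / Z, with Z summed over all labels x."""
    scores = {x: sum(t * v for t, v in zip(theta, features(doc, x)))
              for x in labels}
    m = max(scores.values())          # shift for numerical stability
    exps = {x: math.exp(s - m) for x, s in scores.items()}
    z = sum(exps.values())            # normalizer Z (after the shift)
    return {x: e / z for x, e in exps.items()}

theta = [2.0, 1.0, 1.5]  # hypothetical weights
probs = posterior(theta, "three people shot and wounded")
```

When no feature fires for any label, every score is zero and the posterior is uniform, matching the "maximally noncommittal" principle quoted earlier.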
![Page 35: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/35.jpg)
Normalization for Language Model
general class-based (X) language model of doc y
![Page 36: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/36.jpg)
Normalization for Language Model
Can be significantly harder in the general case
general class-based (X) language model of doc y
![Page 37: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/37.jpg)
Normalization for Language Model
Can be significantly harder in the general case
Simplifying assumption: maxent n-grams!
general class-based (X) language model of doc y
![Page 38: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/38.jpg)
https://www.csee.umbc.edu/courses/undergraduate/473/f17/loglin-tutorial/
https://goo.gl/B23Rxo
Lesson 1
![Page 39: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/39.jpg)
Connections to Other Techniques
Log-Linear Models
![Page 40: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/40.jpg)
Connections to Other Techniques
Log-Linear Models
as statistical regression: (Multinomial) logistic regression, Softmax regression
![Page 41: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/41.jpg)
Connections to Other Techniques
Log-Linear Models
as statistical regression: (Multinomial) logistic regression, Softmax regression
based in information theory: Maximum Entropy models (MaxEnt)
![Page 42: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/42.jpg)
Connections to Other Techniques
Log-Linear Models
as statistical regression: (Multinomial) logistic regression, Softmax regression
based in information theory: Maximum Entropy models (MaxEnt)
a form of: Generalized Linear Models
![Page 43: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/43.jpg)
Connections to Other Techniques
Log-Linear Models
as statistical regression: (Multinomial) logistic regression, Softmax regression
based in information theory: Maximum Entropy models (MaxEnt)
a form of: Generalized Linear Models
viewed as: Discriminative Naïve Bayes
![Page 44: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/44.jpg)
Connections to Other Techniques
Log-Linear Models
as statistical regression: (Multinomial) logistic regression, Softmax regression
based in information theory: Maximum Entropy models (MaxEnt)
a form of: Generalized Linear Models
viewed as: Discriminative Naïve Bayes
to be cool today :): Very shallow (sigmoidal) neural nets
![Page 45: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/45.jpg)
Learning θ
![Page 46: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/46.jpg)
pθ(y | x): probabilistic model
objective (given observations)
![Page 47: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/47.jpg)
How will we optimize F(θ)?
Calculus
![Page 48: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/48.jpg)
[Plot: F(θ) against θ]
![Page 49: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/49.jpg)
[Plot: F(θ) against θ, with the maximizer θ* marked]
![Page 50: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/50.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ* marked]
![Page 51: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/51.jpg)
Example
F(x) = -(x-2)^2
differentiate
F’(x) = -2x + 4
Solve F’(x) = 0
x = 2
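As a quick check on this example, the second derivative confirms that the stationary point is a maximum. This step is not on the slide and is added only as a short worked derivation:

```latex
\begin{align*}
F(x)   &= -(x-2)^2 = -x^2 + 4x - 4 \\
F'(x)  &= -2x + 4 \\
F'(x) = 0 &\iff x = 2 \\
F''(x) &= -2 < 0 \quad\Rightarrow\quad x = 2 \text{ is a maximum, with } F(2) = 0
\end{align*}
```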
![Page 52: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/52.jpg)
Common Derivative Rules
![Page 53: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/53.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ* marked]
What if you can’t find the roots? Follow the derivative
![Page 54: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/54.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0, and y_0 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
![Page 55: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/55.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0, y_0, and g_0 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
2. Get derivative g_t = F’(θ_t)
![Page 56: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/56.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0, θ_1, y_0, and g_0 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
2. Get derivative g_t = F’(θ_t)
3. Get scaling factor ρ_t
4. Set θ_{t+1} = θ_t + ρ_t * g_t
5. Set t += 1
![Page 57: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/57.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0 to θ_2, y_0, y_1, g_0, and g_1 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
2. Get derivative g_t = F’(θ_t)
3. Get scaling factor ρ_t
4. Set θ_{t+1} = θ_t + ρ_t * g_t
5. Set t += 1
![Page 58: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/58.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0 to θ_3, y_0 to y_3, and g_0 to g_2 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
2. Get derivative g_t = F’(θ_t)
3. Get scaling factor ρ_t
4. Set θ_{t+1} = θ_t + ρ_t * g_t
5. Set t += 1
![Page 59: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/59.jpg)
[Plot: F(θ) and F’(θ), the derivative of F wrt θ, against θ; θ*, θ_0 to θ_3, y_0 to y_3, and g_0 to g_2 marked]
What if you can’t find the roots? Follow the derivative
Set t = 0
Pick a starting value θ_t
Until converged:
1. Get value y_t = F(θ_t)
2. Get derivative g_t = F’(θ_t)
3. Get scaling factor ρ_t
4. Set θ_{t+1} = θ_t + ρ_t * g_t
5. Set t += 1
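The five steps can be sketched as a short loop, applied here to the earlier example F(x) = -(x-2)^2. The constant scaling factor and the stopping rule (stop when the update becomes tiny) are assumptions; the slides leave both choices open.

```python
def gradient_ascent(F, dF, theta0, step=0.1, tol=1e-8, max_iters=10_000):
    """Follow the derivative uphill from theta0; returns (theta, iterations)."""
    theta = theta0                      # pick a starting value, t = 0
    for t in range(max_iters):
        y = F(theta)                    # 1. value (useful for monitoring only)
        g = dF(theta)                   # 2. derivative
        rho = step                      # 3. scaling factor (assumed constant)
        new_theta = theta + rho * g     # 4. move in the direction of increase
        if abs(new_theta - theta) < tol:
            return new_theta, t + 1     # assumed convergence test
        theta = new_theta               # 5. t += 1 happens via the loop
    return theta, max_iters

F = lambda x: -(x - 2) ** 2
dF = lambda x: -2 * x + 4
theta_star, n_iters = gradient_ascent(F, dF, theta0=0.0)
```

With this step size each update shrinks the distance to x = 2 by a constant factor, so the loop stops well before `max_iters`; a step that is too large would overshoot instead.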
![Page 60: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/60.jpg)
Gradient = Multi-variable derivative
K-dimensional input
K-dimensional output
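With K weights, the same loop uses a K-dimensional gradient in place of the single derivative. The sketch below trains a tiny log-linear classifier this way. It assumes the objective is the conditional log-likelihood of the observed labels, whose gradient is the observed feature vector minus the model's expected feature vector; the data, features, and step size are invented for illustration.

```python
import math

LABELS = ["ATTACK", "OTHER"]
WORDS = ["shot", "wounded", "election"]

def features(doc, label):
    """One indicator per (label, word) pair, so K = len(LABELS) * len(WORDS)."""
    return [1.0 if (w in doc and label == l) else 0.0
            for l in LABELS for w in WORDS]

def posterior(theta, doc):
    scores = [sum(t * v for t, v in zip(theta, features(doc, x))) for x in LABELS]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(theta, data):
    """Assumed objective F(theta): sum of log p(label | doc) over the data."""
    return sum(math.log(posterior(theta, doc)[LABELS.index(x)]) for doc, x in data)

def gradient(theta, data):
    """K-dimensional gradient: observed features minus expected features."""
    grad = [0.0] * len(theta)
    for doc, gold in data:
        probs = posterior(theta, doc)
        observed = features(doc, gold)
        per_label = [features(doc, x) for x in LABELS]
        for k in range(len(theta)):
            expected_k = sum(p * f[k] for p, f in zip(probs, per_label))
            grad[k] += observed[k] - expected_k
    return grad

def train(data, step=0.5, iters=200):
    theta = [0.0] * (len(LABELS) * len(WORDS))
    for _ in range(iters):
        g = gradient(theta, data)
        theta = [t + step * gk for t, gk in zip(theta, g)]  # ascent step
    return theta

DATA = [("three people shot", "ATTACK"), ("mayor wounded", "ATTACK"),
        ("election results", "OTHER"), ("election shot down", "OTHER")]
theta_hat = train(DATA)
```

The input to `gradient` is a K-dimensional weight vector and its output is K-dimensional as well, one partial derivative per weight.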
![Page 61: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/61.jpg)
Gradient Ascent
![Page 62: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/62.jpg)
Gradient Ascent
![Page 63: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/63.jpg)
Gradient Ascent
![Page 64: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/64.jpg)
Gradient Ascent
![Page 65: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/65.jpg)
Gradient Ascent
![Page 66: Maxent Models (II)](https://reader033.fdocuments.in/reader033/viewer/2022052501/628b4521561a6c7cfb63f229/html5/thumbnails/66.jpg)
Gradient Ascent