Lacoste · 2016. 10. 10. · Title: Lacoste Author: engelb Created Date: 6/6/2016 10:54:03 AM
Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21,...
![Page 1: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/1.jpg)
Discriminative Methods with Structure
Simon Lacoste-Julien
UC Berkeley
joint work with:
March 21, 2008
Fei Sha
Ben Taskar
Dan Klein
Mike Jordan
![Page 2: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/2.jpg)
« Discriminative method »
Decision theoretic framework:
Loss:
Decision function:
Risk
Contrast function
![Page 3: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/3.jpg)
« with structure » on outputs:
Handwriting recognition
Input Output
brace
huge!
Machine translation
‘Ce n'est pas un autre problème de classification.’
‘This is not another classification problem.’
![Page 4: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/4.jpg)
« with structure » on inputs:
[Figure: text documents -> latent variable model -> new representation -> classification]
![Page 5: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/5.jpg)
Structure on outputs: Discriminative Word Alignment project
(joint work with Ben Taskar, Dan Klein and Mike Jordan)
![Page 6: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/6.jpg)
Word Alignment
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
[Figure: word alignment grid, labeled x and y, linking the English words to the tokenized French sentence: En vertu de les nouvelles propositions, quel est le coût prévu de perception de les droits?]
Key step in most machine translation systems
![Page 7: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/7.jpg)
Overview
Review of large-margin word alignment [Taskar et al. EMNLP 05]
Two new extensions to the basic model:
Fertility features
First-order interactions using quadratic assignment
Results on Hansards dataset
![Page 8: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/8.jpg)
Feature-Based Alignment
Features:
Association: MI = 3.2, Dice = 4.1
Lexical pair: ID(proposal, proposition) = 1
Position in sentence: AbsDist = 5, RelDist = 0.3
Orthography: ExactMatch = 0, Similarity = 0.8
Resources: PairInDictionary
Other Models (IBM2, IBM4)
[Figure: alignment grid for the example sentence pair, with word indices j and k]
![Page 9: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/9.jpg)
Scoring Whole Alignments
[Figure: alignment grid for the example sentence pair, with word indices j and k]
![Page 10: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/10.jpg)
Prediction as a Linear Program
Still guaranteed to have integral solutions y
Degree constraint
relaxation
[Figure: alignment grid for the example sentence pair, with word indices j and k]
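The prediction step can be illustrated on a tiny example. The sketch below is not the LP from the slide: it assumes each candidate edge (j, k) has a score given by a dot product of weights and edge features, and it finds the best alignment under the degree constraint (each word in at most one edge) by exhaustive search, which is only feasible for very short sentences. The names `edge_score` and `best_matching` and the score values are illustrative assumptions.

```python
from itertools import permutations


def edge_score(w, features):
    """Score of one candidate edge: dot product of weights and edge features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def best_matching(scores):
    """Return (best_total, edges) over all partial one-to-one matchings.

    scores[j][k] is the score of linking source word j to target word k.
    Each word takes part in at most one edge (degree constraint). An edge
    is kept only when its score is positive, because leaving a word
    unaligned contributes zero.
    """
    n_src, n_tgt = len(scores), len(scores[0])
    # Pad the targets with "null" slots so any source word may stay unaligned.
    slots = list(range(n_tgt)) + [None] * n_src
    best_total, best_edges = 0.0, []
    for assignment in set(permutations(slots, n_src)):
        edges = [(j, k) for j, k in enumerate(assignment)
                 if k is not None and scores[j][k] > 0]
        total = sum(scores[j][k] for j, k in edges)
        if total > best_total:
            best_total, best_edges = total, edges
    return best_total, best_edges


# Hypothetical edge scores for 3 source words and 3 target words.
example_scores = [
    [2.0, 0.5, -1.0],
    [0.4, 1.5, 0.2],
    [-0.5, 0.3, -0.2],
]
best_total, best_edges = best_matching(example_scores)
```

On real sentences the exhaustive loop would be replaced by an LP or matching solver; the point here is only the objective (sum of edge scores) and the degree constraint.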
![Page 11: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/11.jpg)
Learning w
Supervised training data
Training methods:
Maximum likelihood/entropy
Perceptron
Maximum margin
![Page 12: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/12.jpg)
Maximum Likelihood/Entropy
Probabilistic approach:
Problem: computing the denominator is #P-complete [Valiant 79, Jerrum & Sinclair 93]
Can’t find maximum likelihood parameters
![Page 13: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/13.jpg)
(Averaged) Perceptron
Perceptron for structured output [Collins 2002]:
For each example ,
Predict:
Update:
Output averaged parameters:
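The predict / update / average steps above can be sketched as follows. This is a minimal illustration of an averaged structured perceptron under stated assumptions, not the authors' implementation: `feature_fn(x, y)` returns a joint feature vector, `predict_fn(w, x)` returns the highest-scoring output under the current weights (in the alignment setting this would be the matching LP), and the toy task with two candidate outputs is purely hypothetical.

```python
def averaged_perceptron(examples, feature_fn, predict_fn, n_features, epochs=5):
    """Structured perceptron that returns the average of all weight vectors."""
    w = [0.0] * n_features
    w_sum = [0.0] * n_features
    steps = 0
    for _ in range(epochs):
        for x, y_true in examples:
            # Predict with the current weights.
            y_hat = predict_fn(w, x)
            # Update only on mistakes: move toward the true output's
            # features and away from the predicted output's features.
            if y_hat != y_true:
                f_true = feature_fn(x, y_true)
                f_hat = feature_fn(x, y_hat)
                w = [wi + ft - fh for wi, ft, fh in zip(w, f_true, f_hat)]
            # Accumulate the weights after every example for averaging.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            steps += 1
    return [s / steps for s in w_sum]


# Toy stand-in for structured prediction: two candidate outputs (0 and 1),
# with the input copied into the feature block of the chosen output.
def toy_features(x, y):
    block = [0.0, 0.0, 0.0, 0.0]
    block[2 * y] = x[0]
    block[2 * y + 1] = x[1]
    return block


def toy_predict(w, x):
    def score(y):
        return sum(wi * fi for wi, fi in zip(w, toy_features(x, y)))
    return max((0, 1), key=score)


toy_examples = [((2.0, 1.0), 1), ((1.0, 2.0), 0)]
w_avg = averaged_perceptron(toy_examples, toy_features, toy_predict, n_features=4)
```

Averaging tends to reduce the sensitivity of the final weights to the order of the last few updates, which is the usual motivation for the averaged variant.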
![Page 14: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/14.jpg)
Large Margin Estimation
Equivalent min-max formulation [Taskar et al 04,05]
Simple LP
true score other score loss
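The equations on this slide did not survive extraction. As a hedged reconstruction, the standard large-margin form that matches the labels "true score", "other score" and "loss" is written below; the symbols $f$ (joint features), $\ell$ (loss), $\mathcal{Y}_i$ (feasible alignments for example $i$) and $C$ (regularization constant) are assumptions, not notation recovered from the slide.

```latex
% Margin constraints: the true alignment must beat every other alignment by its loss.
\min_{w} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
\underbrace{w^{\top} f(x_i, y_i)}_{\text{true score}}
\;\ge\;
\underbrace{w^{\top} f(x_i, y)}_{\text{other score}}
+ \underbrace{\ell(y_i, y)}_{\text{loss}}
\qquad \forall i,\; \forall y \in \mathcal{Y}_i

% Equivalent min-max form with slack folded into the objective:
\min_{w} \; \tfrac{1}{2}\lVert w \rVert^{2}
+ C \sum_i \Big[ \max_{y \in \mathcal{Y}_i} \big( w^{\top} f(x_i, y) + \ell(y_i, y) \big) - w^{\top} f(x_i, y_i) \Big]
```

When the loss decomposes over edges, the inner maximization has the same structure as prediction, which is why it can be written as a simple LP.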
![Page 15: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/15.jpg)
Min-max formulation - QP
LP duality
QP of polynomial size!
=> Mosek
![Page 16: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/16.jpg)
Experimental Setup
French Canadian Hansards Corpus
Word-level aligned:
200 sentence pairs (training data)
37 sentence pairs (validation data)
247 sentence pairs (test data)
Sentence-level aligned:
1M sentence pairs
Generate association-based features
Learn unsupervised IBM Models
Learn using Large Margin
Evaluate alignment quality using standard AER (Alignment Error Rate) [similar to F1]
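For reference, AER is commonly defined from the predicted edges A, the sure edges S and the possible edges P (with S contained in P) as 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below computes it together with precision and recall; the function name and the toy edge sets are illustrative, and it assumes non-empty predicted and sure sets.

```python
def alignment_error_rate(predicted, sure, possible):
    """Return (AER, precision, recall) for sets of (j, k) alignment edges.

    Precision is measured against possible edges, recall against sure edges.
    Sure edges are treated as possible edges as well.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return aer, precision, recall


# Hypothetical example: 3 sure edges, 1 extra possible edge, 4 predicted edges.
sure_edges = {(0, 0), (1, 1), (2, 2)}
possible_edges = {(2, 3)}
predicted_edges = {(0, 0), (1, 1), (2, 3), (3, 4)}
aer, precision, recall = alignment_error_rate(
    predicted_edges, sure_edges, possible_edges)
```

Lower AER is better; when the sure and possible sets coincide, AER equals one minus the F1 score, which is the sense of "similar to F1" above.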
![Page 17: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/17.jpg)
Old Results
200 train/247 test split

| Model | AER | Prec / Rec |
|---|---|---|
| IBM model 4 (intersected) | 6.5 | 98 / 88% |
| Basic | 8.2 | 93 / 90% |
| + model 4 | 5.1 | 98 / 92% |
![Page 18: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/18.jpg)
Improving basic model
We would like to model:
Fertility: Alignments are not necessarily 1-to-1
First-order interactions: Alignments are mostly locally diagonal, so we would like to score an edge depending on its neighbors
Strategy: extensions keeping prediction model as an LP
![Page 19: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/19.jpg)
Modeling Fertility
Example of node feature: for word w, fraction of time it had fertility > k on the training set
fertility penalty
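The node feature described above can be computed from word-aligned training data as in the following sketch. The data layout (a list of `(words, edges)` pairs, where each edge `(j, k)` links word `j` of the sentence to position `k` on the other side) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def fertility_fraction(training_data, threshold):
    """For each word, fraction of its occurrences with fertility > threshold.

    training_data is a list of (words, edges) pairs; the fertility of the
    word at index j is the number of edges (j, k) that touch it.
    """
    occurrences = defaultdict(int)
    above = defaultdict(int)
    for words, edges in training_data:
        fertility = [0] * len(words)
        for j, _ in edges:
            fertility[j] += 1
        for word, fert in zip(words, fertility):
            occurrences[word] += 1
            if fert > threshold:
                above[word] += 1
    return {word: above[word] / occurrences[word] for word in occurrences}


# Hypothetical training alignments for two short sentences.
toy_training = [
    (["of", "the", "fees"], {(0, 0), (0, 1), (1, 2), (2, 3)}),
    (["of", "fees"], {(0, 0), (1, 1), (1, 2)}),
]
fractions = fertility_fraction(toy_training, threshold=1)
```

A word that often links to several words in training receives a high value for this feature, so a learned weight on it can make extra edges for that word cheaper.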
![Page 20: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/20.jpg)
![Page 21: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/21.jpg)
Fertility Results
200 train/247 test split

| Model | AER | Prec / Rec |
|---|---|---|
| IBM model 4 (intersected) | 6.5 | 98 / 88% |
| Basic | 8.2 | 93 / 90% |
| + model 4 | 5.1 | 98 / 92% |
| + model 4 + fertility | 4.9 | 96 / 94% |
![Page 22: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/22.jpg)
Fertility example
[Figure legend: sure alignments, possible alignments, predicted alignments]
![Page 23: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/23.jpg)
Modeling First Order Effects
Restrict:
monotonicity
local inversion
local fertility
want:
relaxation:
![Page 24: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/24.jpg)
Integer program
Quadratic assignment is NP-complete; on real-world sentences (2 to 30 words) it takes a few seconds using Mosek (~1k variables)
Interestingly, in our dataset 80% of examples yield an integer solution when solved via linear relaxation
Same AER when using relaxation!
![Page 25: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/25.jpg)
![Page 26: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/26.jpg)
![Page 27: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/27.jpg)
![Page 28: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/28.jpg)
New Results
200 train/247 test split

| Model | AER | Prec / Rec |
|---|---|---|
| IBM model 4 (intersected) | 6.5 | 98 / 88% |
| Basic | 8.2 | 93 / 90% |
| + model 4 | 5.1 | 98 / 92% |
| Basic + fertility + qap | 6.1 | 94 / 93% |
| + fertility + qap + model 4 | 4.3 | 96 / 95% |
| + fertility + qap + model 4 + liang | 3.8 | 97 / 96% |
![Page 29: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/29.jpg)
Fert + qap example
![Page 30: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/30.jpg)
Fert + qap example
![Page 31: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/31.jpg)
Conclusions
Feature-based word alignment
Efficient algorithms for supervised learning
Exploit unsupervised data via features, other models
Surprisingly accurate with simple features
Include fertility model and first order interactions
38% AER reduction over intersected Model 4
Lowest published AER on this data set
High recall alignments -> promising for MT
![Page 32: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/32.jpg)
Structure on inputs: discLDA project
(work in progress)
(joint work with Fei Sha and Mike Jordan)
![Page 33: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/33.jpg)
Unsupervised dimensionality reduction
[Figure: text documents -> latent variables model -> new representation -> classification]
![Page 34: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/34.jpg)
Analogy: PCA vs. FDA
[Figure: two classes of points (x and o), with the PCA direction and the FDA direction]
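The contrast in this analogy can be reproduced on a small 2-D example. The sketch below is an illustration with made-up points, not data from the talk: PCA picks the direction of largest overall variance and ignores labels, while Fisher discriminant analysis (FDA) picks the direction proportional to the inverse within-class scatter times the difference of class means. Both are computed in closed form for the 2-D case.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pca_direction(points):
    """Unit vector along the top principal axis of 2-D points."""
    mx, my = mean(points)
    n = len(points)
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Major-axis angle of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (math.cos(theta), math.sin(theta))


def fda_direction(class_a, class_b):
    """Unit vector proportional to S_w^{-1} (mean_a - mean_b) in 2-D."""
    ma, mb = mean(class_a), mean(class_b)
    sxx = syy = sxy = 0.0
    for pts, m in ((class_a, ma), (class_b, mb)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy
    dmx, dmy = ma[0] - mb[0], ma[1] - mb[1]
    # Apply the inverse of the 2x2 within-class scatter matrix.
    wx = (syy * dmx - sxy * dmy) / det
    wy = (-sxy * dmx + sxx * dmy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


# Two classes spread widely along the first axis but separated along the second.
class_x = [(-4.0, 0.8), (-2.0, 1.2), (0.0, 1.0), (2.0, 0.8), (4.0, 1.2)]
class_o = [(-4.0, -1.2), (-2.0, -0.8), (0.0, -1.0), (2.0, -1.2), (4.0, -0.8)]
pca_dir = pca_direction(class_x + class_o)
fda_dir = fda_direction(class_x, class_o)
```

On these points the PCA direction is nearly horizontal (it follows the spread shared by both classes) while the FDA direction is nearly vertical (it follows the class separation), which is the motivation for using supervised information in the reduction.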
![Page 35: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/35.jpg)
Goal: supervised dim. reduction
[Figure: text documents -> latent variables model with supervised information -> new representation -> classification]
![Page 36: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/36.jpg)
Review: LDA model
![Page 37: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/37.jpg)
Discriminative version of LDA
Ultimately, want to learn discriminatively -> but high-dimensional non-convex objective, hard to optimize!
Instead, propose to learn class-dependent linear transformation of common θ's:
New generative model:
Equivalently, transformation on :
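The formulas on this slide were lost in extraction. A hedged sketch of the generative model, using standard LDA notation as an assumption ($\theta_d$ for topic proportions of document $d$, $\Phi$ for the topic-word matrix, $T^{y}$ for the transformation of class $y$, $\alpha$ for the Dirichlet parameter), would read:

```latex
\theta_d \sim \mathrm{Dir}(\alpha), \qquad
z_{dn} \mid \theta_d, y_d \sim \mathrm{Mult}\!\left(T^{y_d}\,\theta_d\right), \qquad
w_{dn} \mid z_{dn} \sim \mathrm{Mult}\!\left(\Phi_{\cdot\, z_{dn}}\right)

% Marginalizing z_{dn}: the same model seen as a class-dependent transformation of the topics.
w_{dn} \mid \theta_d, y_d \sim \mathrm{Mult}\!\left(\Phi\, T^{y_d}\,\theta_d\right)
```

Under this reading, each column of $T^{y}$ must lie on the simplex so that $T^{y}\theta_d$ remains a valid distribution over topics.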
![Page 38: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/38.jpg)
Simplex Geometry
[Figure: documents of two classes (x and o) shown in the word simplex (w1, w2, w3) and in the topic simplex]
![Page 39: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/39.jpg)
Interpretation 1
Shared topic vs. class-specific topic:
shared topics
class-specific topics
![Page 40: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/40.jpg)
Interpretation 2
Generative model from T, add a new latent variable u:
![Page 41: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/41.jpg)
Compare with AT model
Author-Topic model [Rosen-Zvi et al. 2004]
discLDA
![Page 42: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/42.jpg)
Inference and learning
![Page 43: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/43.jpg)
Learning
For fixed T, learn by sampling (z,u) [Rao-Blackwellized Gibbs sampling]
For fixed , update T using stochastic gradient ascent on conditional log-likelihood:
in an online fashion
get approximate gradient using Monte Carlo EM
use Harmonic Mean estimator to estimate
Currently, results are noisy…
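The harmonic mean estimator mentioned above approximates a marginal likelihood by the harmonic mean of the likelihood values at posterior samples. The sketch below shows only the numerical step, computed in log space for stability; it assumes the per-sample log-likelihoods come from a sampler such as the Gibbs sampler above, and the function name is illustrative. This estimator is known to have high variance, which may be related to the noise noted on the slide.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of likelihoods, given their logs.

    Computes log(S / sum_s exp(-ll_s)) with a log-sum-exp shift so that
    very small likelihoods do not underflow.
    """
    n_samples = len(log_likelihoods)
    negated = [-ll for ll in log_likelihoods]
    shift = max(negated)
    log_sum_inverse = shift + math.log(
        sum(math.exp(v - shift) for v in negated))
    return math.log(n_samples) - log_sum_inverse


# Two samples with likelihoods 0.5 and 0.25: harmonic mean is 2 / (2 + 4) = 1/3.
small_example = log_harmonic_mean([math.log(0.5), math.log(0.25)])
# Likelihoods far below floating-point range are still handled.
tiny_example = log_harmonic_mean([-1000.0, -1000.0, -1000.0])
```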
![Page 44: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/44.jpg)
Inference (dimensionality reduction)
Given learned T and : estimate using Harmonic Mean estimator
compute by marginalizing over y to get new representation of document
![Page 45: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/45.jpg)
Preliminary Experiments
![Page 46: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/46.jpg)
20 Newsgroup dataset
Used fixed T:
Get reduced representation -> train linear SVM on it
hence 110 topics for
11k train, 7.5k test
vocabulary: 50k
![Page 47: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/47.jpg)
Classification results
discLDA + SVM: 20% error
LDA + SVM: 25% error
discLDA predictions: 20% error
![Page 48: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/48.jpg)
Newsgroup embedding (LDA)
![Page 49: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/49.jpg)
Newsgroup embedding (discLDA)
![Page 50: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/50.jpg)
using tSNE (on discLDA)
thanks to Laurens van der Maaten for figure! [Hinton’s group]
![Page 51: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/51.jpg)
using tSNE (on LDA)
thanks to Laurens van der Maaten for figure! [Hinton’s group]
![Page 52: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/52.jpg)
Learned topics
![Page 53: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/53.jpg)
Another embedding
NIPS papers vs. Psychology abstracts
LDA discLDA
![Page 54: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/54.jpg)
13 scenes dataset [Fei-Fei 2005]
train: 100 per category
test: 2558
![Page 55: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/55.jpg)
Vocabulary (visual words)
![Page 56: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/56.jpg)
Topics
![Page 57: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/57.jpg)
Conclusion
fixed transformation T enables topic sharing & exploration
get reduced representation which preserves predictive power
noisy gradient estimates: still work in progress
will probably try variational approach instead
![Page 58: Discriminative Methods with Structure Simon Lacoste-Julien UC Berkeley joint work with: March 21, 2008 Fei Sha Ben Taskar Dan Klein Mike Jordan.](https://reader036.fdocuments.in/reader036/viewer/2022062318/5518c37b550346a61f8b56ad/html5/thumbnails/58.jpg)