Domain Adaptation with Structural Correspondence Learning
John Blitzer
Joint work with Shai Ben-David, Koby Crammer, Mark Dredze, Ryan McDonald, Fernando Pereira
Statistical models, multiple domains
Different Domains of Text
• Huge variation in vocabulary & style
• tech blogs, sports blogs, politics blogs, Yahoo 360, . . .
“Ok, I’ll just build models for each domain I encounter”
Sentiment Classification for Product Reviews
Product Review → Classifier (SVM, Naïve Bayes, etc.) → Positive / Negative
Multiple Domains: books → ??, kitchen appliances → ??, . . . → ??
books & kitchen appliances
Running with Scissors: A Memoir
Title: Horrible book, horrible.
This book was horrible. I read half of it,
suffering from a headache the entire time,
and eventually i lit it on fire. One less
copy in the world...don't waste your
money. I wish i had the time spent
reading this book back so i could use it for
better purposes. This book wasted my life
Avante Deep Fryer, Chrome & Black
Title: lid does not work well...
I love the way the Tefal deep fryer
cooks, however, I am returning my
second one due to a defective lid
closure. The lid may close initially, but
after a few uses it no longer stays
closed. I will not be purchasing this one
again.
Error increase: 13% → 26%
Part of Speech Tagging
Error increase: 3% → 12%
Wall Street Journal (WSJ)
The/DT clash/NN is/VBZ a/DT sign/NN of/IN a/DT new/JJ toughness/NN and/CC divisiveness/NN in/IN Japan/NNP ’s/POS once-cozy/JJ financial/JJ circles/NNS ./.
MEDLINE Abstracts (biomed)
The/DT oncogenic/JJ mutated/VBN forms/NNS of/IN the/DT ras/NN proteins/NNS are/VBP constitutively/RB active/JJ and/CC interfere/VBP with/IN normal/JJ signal/NN transduction/NN ./.
Features & Linear Models
[Figure: a sparse feature vector with entries for features such as horrible, read_half, and waste, paired with a learned weight vector]
Problem: If we’ve only trained on book reviews, then w(defective) = 0
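The problem above can be sketched in a few lines of Python. This is only an illustration: the weight values and the feature names other than those on the slide are made up, and `score` is a hypothetical helper, not code from the talk.

```python
# Hypothetical weights learned from book reviews only (values are illustrative).
weights = {"horrible": -1.2, "read_half": -0.3, "waste": -1.0, "loved_it": 1.1}

def score(features, weights):
    """Linear model: sum of weight * value over the features present."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

book_review = {"horrible": 1.0, "read_half": 1.0, "waste": 1.0}
kitchen_review = {"defective": 1.0, "returning": 1.0}  # features never seen in books

print(score(book_review, weights))     # negative score: predicted negative
print(score(kitchen_review, weights))  # 0.0: the model has no evidence either way
```

Because `defective` never occurred in the source training data, its weight is effectively zero and the kitchen review gets an uninformative score.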
Structural Correspondence Learning (SCL)
• Cut adaptation error by more than 40%
• Use unlabeled data from the target domain
• Induce correspondences among different features
• read-half, headache ↔ defective, returned
• Labeled data for source domain will help us build a good classifier for target domain
Maximum likelihood linear regression (MLLR) for speaker adaptation (Leggetter & Woodland, 1995)
SCL: 2-Step Learning Process
Step 1: Unlabeled – Learn correspondence mapping
Step 2: Labeled – Learn weight vector
• The correspondence mapping should make the domains look as similar as possible
• But should also allow us to classify well
[Figure: example feature vectors and weight vectors before and after applying the correspondence mapping]
SCL: Making Domains Look Similar
Incorrect classification of kitchen review: “defective lid”
Unlabeled kitchen contexts
• Do not buy the Shark portable steamer …. Trigger mechanism is defective.
• the very nice lady assured me that I must have a defective set …. What a disappointment!
• Maybe mine was defective …. The directions were unclear
Unlabeled books contexts
• The book is so repetitive that I found myself yelling …. I will definitely not buy another.
• A disappointment …. Ender was talked about for <#> pages altogether.
• it’s unclear …. It’s repetitive and boring
SCL: Pivot Features
Pivot Features
• Occur frequently in both domains
• Characterize the task we want to do
• Number in the hundreds or thousands
• Choose using labeled source, unlabeled source & target data
SCL: words & bigrams that occur frequently in both domains
• Examples: book, one, <num>, so, all, very, about, they, like, good, when
SCL-MI: SCL but also based on mutual information with labels
• Examples: a_must, a_wonderful, loved_it, weak, don’t_waste, awful, highly_recommended, and_easy
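The two pivot selection criteria can be sketched as follows. This is a minimal illustration, not the authors' code: documents are assumed to be sets of feature strings, and the function names, the `min_count` threshold, and the toy data are all hypothetical.

```python
from collections import Counter
import math

def doc_freq(docs):
    """Number of documents each feature appears in (docs are sets of features)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return counts

def frequent_pivots(source_docs, target_docs, min_count):
    """SCL-style candidates: features seen at least min_count times in BOTH domains."""
    src, tgt = doc_freq(source_docs), doc_freq(target_docs)
    return {f for f in src if src[f] >= min_count and tgt[f] >= min_count}

def mutual_information(feature, docs, labels):
    """Mutual information (in nats) between feature presence and a binary label."""
    n = len(docs)
    joint = Counter((feature in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == label) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

def scl_mi_pivots(source_docs, source_labels, target_docs, min_count, k):
    """SCL-MI-style: among frequent candidates, keep the k with highest MI."""
    candidates = frequent_pivots(source_docs, target_docs, min_count)
    ranked = sorted(
        candidates,
        key=lambda f: (-mutual_information(f, source_docs, source_labels), f))
    return ranked[:k]

# Toy data (made up): labeled book reviews and unlabeled kitchen reviews.
books = [{"book", "horrible", "dont_waste"}, {"book", "loved_it"},
         {"book", "dont_waste", "plot"}, {"book", "loved_it", "engaging"}]
book_labels = [0, 1, 0, 1]
kitchen = [{"fryer", "dont_waste", "book"}, {"loved_it", "a_breeze"},
           {"dont_waste", "defective"}, {"loved_it", "book"}]

print(sorted(frequent_pivots(books, kitchen, min_count=2)))
print(scl_mi_pivots(books, book_labels, kitchen, min_count=2, k=2))
```

In this toy example the frequency criterion keeps the uninformative word `book`, while the mutual-information criterion prefers the sentiment-bearing bigrams, which mirrors the contrast between the two example lists on the slide.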
SCL Unlabeled Step: Pivot Predictors
Use pivot features to align other features
• Mask and predict pivot features using other features
• Train N linear predictors, one for each binary problem
• Each pivot predictor implicitly aligns non-pivot features from source & target domains
Binary problem: Does “not buy” appear here?
(1) The book is so repetitive that I found myself yelling …. I will definitely not buy another.
(2) Do not buy the Shark portable steamer …. Trigger mechanism is defective.
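A pivot predictor for the binary problem above might look like the sketch below. Note the substitution: it uses a plain perceptron for brevity, which is not necessarily the learner used in the talk, and the function name and toy documents are hypothetical.

```python
def train_pivot_predictor(docs, pivot, pivots, epochs=10):
    """Perceptron for one binary problem: does `pivot` occur in the document?

    All pivot features are masked (removed) from the input, so the predictor
    has to rely on non-pivot features only.
    """
    weights = {}
    for _ in range(epochs):
        for doc in docs:
            target = 1 if pivot in doc else -1
            inputs = doc - pivots                      # mask the pivots
            activation = sum(weights.get(f, 0.0) for f in inputs)
            if target * activation <= 0:               # mistake: update
                for f in inputs:
                    weights[f] = weights.get(f, 0.0) + target
    return weights

# Unlabeled documents from both domains (toy data, sets of features).
docs = [
    {"not_buy", "repetitive", "book"},    # books
    {"not_buy", "defective", "steamer"},  # kitchen
    {"engaging", "book"},                 # books
    {"a_breeze", "steamer"},              # kitchen
]
pivots = {"not_buy"}

# One linear predictor per pivot.
predictors = {p: train_pivot_predictor(docs, p, pivots) for p in pivots}
print(predictors["not_buy"])
```

Here `repetitive` (books) and `defective` (kitchen) both end up with positive weight for predicting `not_buy`, which is the implicit alignment the slide describes.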
SCL: Dimensionality Reduction
• Applying the N pivot predictors to a document gives N new features
• The value of the ith feature is the propensity to see the ith pivot (e.g. “not buy”) in the same document
• We still want fewer new features (1000 is too many)
• Many pivot predictors give similar information
• “horrible”, “terrible”, “awful”
• Compute SVD & use top left singular vectors
Latent Semantic Indexing (LSI), (Deerwester et al. 1990)
Latent Dirichlet Allocation (LDA), (Blei et al. 2003)
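The SVD step can be illustrated with a small stdlib-only sketch. It computes just the top left singular vector of a matrix of pivot predictor weights by power iteration, as a stand-in for a full SVD (a real implementation would call a library routine and keep several vectors). The matrix values, feature names, and helper names are made up for illustration.

```python
import math

def top_left_singular_vector(W, iters=100):
    """Power iteration on W W^T; W is a list of rows (features x pivot predictors)."""
    d, n = len(W), len(W[0])
    u = [1.0 / (i + 1) for i in range(d)]  # deterministic, non-uniform start
    for _ in range(iters):
        v = [sum(W[i][j] * u[i] for i in range(d)) for j in range(n)]  # v = W^T u
        u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(d)]  # u = W v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u

# Rows: non-pivot features. Columns: three similar pivot predictors
# (think "horrible", "terrible", "awful"); values are illustrative.
features = ["defective", "repetitive", "engaging", "a_breeze"]
W = [
    [1.0, 0.9, 1.1],
    [1.0, 1.1, 0.9],
    [-1.0, -0.9, -1.0],
    [-1.0, -1.0, -1.1],
]
direction = top_left_singular_vector(W)

def project(doc, direction):
    """Value of the single new feature for a document given as a set of features."""
    return sum(direction[features.index(f)] for f in doc if f in features)

print(project({"defective"}, direction))
print(project({"repetitive"}, direction))
print(project({"engaging"}, direction))
```

The three nearly redundant predictor columns collapse into one direction along which `defective` and `repetitive` land on the same side and `engaging` on the other.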
Back to Linear Classifiers
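[Figure: a feature vector and its SCL projection, each paired with a weight vector, feeding a single classifier]
• Source training: Learn the weights for the original features and the weights for the SCL features together
• Target testing: First apply the correspondence mapping, then apply both weight vectors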
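As a rough sketch of the test-time computation, the score can be written as w · x + v · (θx). The symbols θ, w, and v and all numeric values below are assumptions for illustration, since the symbols did not survive in the slide text.

```python
def scl_score(x, theta, w, v):
    """Score = w . x + v . (theta x) for dense lists x, w, v and matrix theta (h x d)."""
    projected = [sum(t_i * x_i for t_i, x_i in zip(row, x)) for row in theta]
    original_part = sum(w_i * x_i for w_i, x_i in zip(w, x))
    scl_part = sum(v_k * z_k for v_k, z_k in zip(v, projected))
    return original_part + scl_part

# Feature order: [horrible, defective]. "defective" was never seen in the source
# domain, so its original weight is 0, but theta maps it next to "horrible".
w = [-1.0, 0.0]
theta = [[0.7, 0.7]]
v = [-0.5]

kitchen_review = [0.0, 1.0]  # contains only "defective"
print(scl_score(kitchen_review, theta, w, v))  # negative, despite w(defective) = 0
```

The original weights alone give this review a score of zero; the projected feature is what carries the sentiment across domains.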
Inspirations for SCL
1. Alternating Structure Optimization (ASO)
• Ando & Zhang (JMLR 2005)
• Inducing structures for semi-supervised learning
2. Correspondence Dimensionality Reduction
• Verbeek, Roweis, & Vlassis (NIPS 2003). Ham, Lee, & Saul (AISTATS 2003).
• Learn a low-dimensional representation from high-dimensional correspondences
Sentiment Classification Data
• Product reviews from Amazon.com
– Books, DVDs, Kitchen Appliances, Electronics
– 2000 labeled reviews from each domain
– 3000 – 6000 unlabeled reviews
• Binary classification problem
– Positive if 4 stars or more, negative if 2 or less
• Features: unigrams & bigrams
• Pivots: SCL & SCL-MI
• At train time: minimize Huberized hinge loss (Zhang, 2004)
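The slide does not spell out the loss. A minimal sketch, assuming it refers to the modified Huber loss usually attributed to Zhang (2004), which is quadratic near the margin and linear for badly misclassified examples:

```python
def modified_huber_loss(y, score):
    """y in {-1, +1}; score is the real-valued classifier output."""
    margin = y * score
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2  # smooth, hinge-like region
    return -4.0 * margin                    # linear region for large violations

for s in (2.0, 0.0, -1.0, -3.0):
    print(s, modified_huber_loss(1, s))
```

The two pieces meet at margin -1 (both give 4), so the loss is continuous, and the linear tail makes it less sensitive to outliers than a purely squared hinge.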
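Visualizing (books & kitchen)
[Figure: a learned projection separating negative vs. positive features]
books, negative: plot, <#>_pages, predictable
books, positive: fascinating, engaging, must_read, grisham
kitchen, negative: the_plastic, poorly_designed, leaking, awkward_to
kitchen, positive: espresso, are_perfect, years_now, a_breeze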
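Empirical Results: books & DVDs
[Figure: bar chart of accuracy (65 to 90) for baseline, SCL, and SCL-MI]
books (in-domain: 80.4)
D->B: baseline 72.8, SCL 76.8, SCL-MI 79.7
E->B: baseline 70.7, SCL 75.4, SCL-MI 75.4
K->B: baseline 70.9, SCL 66.1, SCL-MI 68.6
dvd (in-domain: 82.4)
B->D: baseline 77.2, SCL 74.0, SCL-MI 75.8
E->D: baseline 70.6, SCL 74.3, SCL-MI 76.2
K->D: baseline 72.7, SCL 75.4, SCL-MI 76.9
baseline loss due to adaptation: 7.6%
SCL-MI loss due to adaptation: 0.7%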
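Empirical Results: electronics & kitchen
[Figure: bar chart of accuracy (65 to 90) for baseline, SCL, and SCL-MI]
electronics (in-domain: 84.4)
B->E: baseline 70.8, SCL 77.5, SCL-MI 75.9
D->E: baseline 73.0, SCL 74.1, SCL-MI 74.1
K->E: baseline 82.7, SCL 83.7, SCL-MI 86.8
kitchen (in-domain: 87.7)
B->K: baseline 74.5, SCL 78.7, SCL-MI 78.9
D->K: baseline 74.0, SCL 79.4, SCL-MI 81.4
E->K: baseline 84.0, SCL 84.4, SCL-MI 85.9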
Empirical Results: books & DVDs
• Sometimes SCL can cause increases in error
• With only unlabeled data, we misalign features
Using Labeled Data
50 instances of labeled target domain data
Source data, save weight vector for SCL features
Target data, regularize weight vector to be close to the saved source weight vector
Huberized hinge loss
Avoid using high-dimensional features
Keep SCL weights close to source weights
Chelba & Acero, EMNLP 2004
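One way to read this slide is as minimizing the target-domain loss plus a penalty on the distance to the source weights. The sketch below does that with plain gradient descent on the SCL-feature weights only. It assumes the modified Huber form of the loss, and the names `lam`, `lr`, `steps` and the toy data are hypothetical.

```python
def loss_grad(y, score):
    """Derivative of the modified Huber loss with respect to the score."""
    margin = y * score
    if margin >= 1.0:
        return 0.0
    if margin >= -1.0:
        return -2.0 * y * (1.0 - margin)
    return -4.0 * y

def adapt_weights(target_data, v_source, lam=1.0, lr=0.05, steps=200):
    """Gradient descent on: sum_i loss(y_i, v . z_i) + lam * ||v - v_source||^2."""
    v = list(v_source)  # start from the source solution
    for _ in range(steps):
        grad = [2.0 * lam * (v_k - s_k) for v_k, s_k in zip(v, v_source)]
        for z, y in target_data:
            g = loss_grad(y, sum(v_k * z_k for v_k, z_k in zip(v, z)))
            grad = [g_k + g * z_k for g_k, z_k in zip(grad, z)]
        v = [v_k - lr * g_k for v_k, g_k in zip(v, grad)]
    return v

# Two SCL features; the first labeled target example disagrees with the source weight.
v_source = [1.0, -1.0]
target_data = [([1.0, 0.0], -1), ([0.0, 1.0], -1)]

print(adapt_weights(target_data, v_source, lam=1.0))
print(adapt_weights(target_data, v_source, lam=10.0))
```

A larger `lam` keeps the adapted weights closer to the source weights, which is the safeguard the slide relies on when only 50 labeled target instances are available.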
Empirical Results: labeled data
[Figure: bar chart of accuracy (65 to 90) for base+50-targ vs. SCL-MI+50-targ on E->B, K->B, B->D, K->D, B->E, D->E, B->K, E->K, grouped by target domain (books, dvd, electronics, kitchen)]
• With 50 labeled target instances, SCL-MI always improves over baseline
Average Improvements
Avg Adaptation Loss by model
base: 9.1
base+targ: 9.1
scl: 7.1
scl-mi: 5.8
scl-mi+targ: 4.9
• scl-mi reduces error due to transfer by 36%
• adding 50 instances [Chelba & Acero 2004] without SCL does not help
• scl-mi + targ reduces error due to transfer by 46%
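The percentages in the bullets follow from the average adaptation losses as a relative reduction against the baseline. A short check (the dictionary and function names are illustrative):

```python
avg_adaptation_loss = {
    "base": 9.1, "base+targ": 9.1, "scl": 7.1, "scl-mi": 5.8, "scl-mi+targ": 4.9,
}

def relative_reduction(model, baseline="base"):
    """Percentage reduction in adaptation loss relative to the baseline."""
    base = avg_adaptation_loss[baseline]
    return 100.0 * (base - avg_adaptation_loss[model]) / base

for model in ("base+targ", "scl", "scl-mi", "scl-mi+targ"):
    print(model, round(relative_reduction(model)))
```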
PoS Tagging: Data & Model
• Data
– 40k Wall Street Journal (WSJ) training sentences
– 100k unlabeled biomedical sentences
– 100k unlabeled WSJ sentences
• Supervised Learner
– MIRA CRF: Online max-margin learner
– Separate correct label from top k=5 incorrect labels
– Crammer et al. JMLR 2006
• Pivots: Common left/middle/right words
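Visualizing PoS Tagging
[Figure: a learned projection separating nouns vs. adjs & dets]
MEDLINE, nouns: receptors, mutation, assays, lesions
MEDLINE, adjs & dets: metastatic, neuronal, transient, functional
Wall Street Journal, nouns: company, transaction, investors, officials
Wall Street Journal, adjs & dets: political, short-term, your, pretty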
Empirical Results
561 MEDLINE test sentences
[Figure: accuracy (75 to 90) vs. # of WSJ training sentences (100, 500, 1k, 5k, 40k) for supervised, semi-ASO, and SCL]
Accuracy by model
MXPOST: all words 87.2, unknown words 65.2
super: all words 87.9, unknown words 68.4
semi-ASO: all words 88.4, unknown words 70.9
SCL: all words 88.9, unknown words 72.0
McNemar’s test (null hypothesis: p-value)
semi vs. super: <0.0015
SCL vs. super: <10^-12
SCL vs. semi: <0.0003
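For readers unfamiliar with McNemar’s test, the sketch below shows the standard chi-square form of it for comparing two taggers on the same tokens. The counts in the example are hypothetical and are not the counts behind the p-values above. The variant without continuity correction is an assumption.

```python
import math

def mcnemar_p_value(only_a_correct, only_b_correct):
    """Two-sided McNemar chi-square test (no continuity correction).

    Inputs are the discordant counts: tokens tagged correctly by exactly one
    of the two taggers. Under the null hypothesis the statistic follows a
    chi-square distribution with 1 degree of freedom.
    """
    b, c = only_a_correct, only_b_correct
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical discordant counts for two taggers.
print(mcnemar_p_value(120, 80))
print(mcnemar_p_value(50, 50))
```

Only the tokens on which the taggers disagree enter the statistic, which is why the test suits paired comparisons on a fixed test set.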
Results: Some labeled target domain data
561 MEDLINE test sentences
[Figure: accuracy (85 to 95) vs. # of MEDLINE training sentences (50, 100, 200, 500) for notarget, nosource, 1k-super, and 1k-SCL]
Accuracy by model
1k-SCL: 95.0
1k-super: 94.5
Nosource: 94.5
• Use source tagger output as a feature (Florian et al. 2004)
• Compare SCL with supervised source tagger
Adaptation & Machine Translation
• Source: Domain specific parallel corpora (news, legal text)
• Target: Similar corpora from the web (e.g. blogs)
• Learn translation rules / language model parameters for the new domain
• Pivots: common contexts
Adaptation & Ranking
• Input: query & list of top-ranked documents
• Output: Ranking
• Score documents based on editorial or click-through data
• Adaptation: Different markets or query types
• Pivots: common relevant features
Learning Theory & Adaptation
Analysis of Representations for Domain Adaptation.
Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira.
NIPS 2006.
Learning Bounds for Domain Adaptation.
John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jenn Wortman.
NIPS 2007 (To Appear).
Bounds on the error of models in new domains
Pipeline Adaptation: Tagging & Parsing
[Figure: dependency parsing accuracy (58 to 82) vs. # of WSJ training sentences (100, 500, 1k, 5k, 40k) for different tagger inputs: supervised, SCL, gold]
Dependency Parsing
• McDonald et al. 2005
• Uses part of speech tags as features
• Train on WSJ, test on MEDLINE
• Use different taggers for MEDLINE input features
Measuring Adaptability
• Given limited resources, which domains should we label?
• Idea: Train a classifier to distinguish instances from different domains
• Error of this classifier is an estimate of loss due to adaptation
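The idea can be sketched as follows: train a classifier to tell the two domains apart, measure its held-out error, and convert that error into a distance. The conversion 2(1 - 2·error) is an assumption taken from the related work on proxy A-distance and may be scaled differently in the talk. The perceptron and the toy documents are also illustrative.

```python
def train_perceptron(docs, labels, epochs=10):
    """Perceptron over set-of-features documents; labels are +1 / -1 domain ids."""
    weights = {}
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            if y * sum(weights.get(f, 0.0) for f in doc) <= 0:
                for f in doc:
                    weights[f] = weights.get(f, 0.0) + y
    return weights

def error_rate(weights, docs, labels):
    """Fraction of documents whose domain is predicted wrongly (ties count as errors)."""
    wrong = sum(1 for doc, y in zip(docs, labels)
                if y * sum(weights.get(f, 0.0) for f in doc) <= 0)
    return wrong / len(docs)

def proxy_a_distance(error):
    """Assumed conversion: 2 when domains are perfectly separable, 0 at chance."""
    return 2.0 * (1.0 - 2.0 * error)

# +1 = books, -1 = kitchen (toy data).
train_docs = [{"book", "plot"}, {"fryer", "lid"}, {"book", "author"}, {"fryer", "steamer"}]
train_labels = [1, -1, 1, -1]
test_docs = [{"plot", "author"}, {"lid", "steamer"}, {"book", "good"}, {"good", "price"}]
test_labels = [1, -1, 1, -1]

domain_weights = train_perceptron(train_docs, train_labels)
err = error_rate(domain_weights, test_docs, test_labels)
print(err, proxy_a_distance(err))
```

A domain classifier that separates the two domains easily (low error, large distance) suggests that a model trained on one will transfer poorly to the other.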
A-distance vs Adaptation loss
[Figure: scatter plot of adaptation loss (0 to 14) against proxy A-distance (60 to 100) for the domain pairs EK, BD, DE, DK, BE, BK]
Suppose we can afford to label 2 domains
Then we should label 1 of electronics/kitchen and 1 of books/DVDs
Features & Linear Models
[Figure: sparse feature vector for the phrase “normal signal transduction”, with indicator features LW=normal, MW=signal, RW=transduction, paired with a weight vector]
Problem: If we’ve only trained on financial news, then w(RW=transduction) = 0
Future Work
• SCL for other problems & modalities
– named entity recognition
– vision (aligning SIFT features)
– speaker / acoustic environment adaptation
• Learning low-dimensional representations for multi-part prediction problems
– natural language parsing, machine translation, sentence compression