Transcript of slides: End-to-End Neural Ad-hoc Ranking with Kernel Pooling (zhuyund/papers/end-end-neural-slide.pdf)

End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Chenyan Xiong¹, Zhuyun Dai¹, Jamie Callan¹, Zhiyuan Liu², and Russell Power³
¹ Language Technologies Institute, Carnegie Mellon University
² Tsinghua University
³ Semantic Scholar, Allen Institute for AI
© 2017 Zhuyun Dai
Motivation: Soft Match
• Soft match between queries and documents has drawn much attention in IR
• Bridges the vocabulary gap: 'you-know-who' ↔ 'Voldemort'

A Recent Success: The Deep Relevance Matching Model (DRMM)
• Word2vec for matching query terms to document terms
• Histogram pooling to 'count' matches of different qualities
Are the current word embeddings the right choice for IR?
• sim(hotel, motel) = 0.9, sim(Tokyo, London) = 0.9
• Query: 'Tokyo hotels'
• Documents: 'Tokyo motels' ☺  'Hotel in London' ☹
Key Ideas in Our Work
• Build on the ideas in the Deep Relevance Matching Model (DRMM): the Kernel-based Neural Ranking Model (K-NRM)
• Learn a word-similarity metric tailored for matching query and document in ad-hoc ranking
• End-to-end learning of a word embedding, supervised by relevance signals, with a new kernel pooling layer that enforces multi-level soft-match patterns
[Figure: K-NRM architecture. The query (n words) and document (m words) pass through a 300-d Embedding Layer; a Translation Layer computes the n×m translation matrix M; Kernel Pooling produces Soft-TF ranking features; a Learning-To-Rank layer (w, b) combines them into the final ranking score.]
Embedding-Based Translation Model
Query-word to document-word translation model:
• Embeds each word into a continuous vector
• Cosine similarities as translation scores
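The translation layer above can be sketched in a few lines of numpy. This is a minimal illustration, not the released implementation; the toy 4-d vectors stand in for the model's 300-d embeddings.

```python
import numpy as np

def translation_matrix(query_vecs, doc_vecs):
    """M[i, j] = cosine similarity between query word i and document word j."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return q @ d.T  # shape: (n_query_words, m_doc_words)

# Toy example: 2 query words, 3 document words, 4-d embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
D = rng.normal(size=(3, 4))
M = translation_matrix(Q, D)
print(M.shape)  # (2, 3)
```

Every entry of M lies in [−1, 1], which is why the kernel means on a later slide are placed in that range.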
Kernel Pooling (1)
• How many scores are softly in [1, 0.8]? In [0.8, 0.6]? ...
• [1, 0.8] ↔ an RBF kernel with μ = 0.9, σ = 0.1
• Soft-TF: softly counting soft-match term frequencies
• K_k(M_i) = Σ_{j=1}^{n} exp(−(M_ij − μ_k)² / (2σ_k²))
• K(M_i) = {K_1(M_i), …, K_K(M_i)}: K soft-TF features for each query term
[Figure: RBF kernels K1, K2, K3 plotted as Soft-TF over word-pair similarity, applied to the similarities of q_1 to {d_1, …, d_n}]
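A minimal numpy sketch of the soft-TF computation for one query term. The row M_i holds that term's cosine similarities to all document words; the two kernels here (μ = 0.9 and μ = 0.3, σ = 0.1) are illustrative choices, not the full kernel set.

```python
import numpy as np

def soft_tf(M_i, mus, sigmas):
    """K_k(M_i) = sum_j exp(-(M_ij - mu_k)^2 / (2 sigma_k^2)), one value per kernel."""
    M_i = np.asarray(M_i)[:, None]       # (n_doc_words, 1)
    mus = np.asarray(mus)[None, :]       # (1, K)
    sigmas = np.asarray(sigmas)[None, :]
    return np.exp(-(M_i - mus) ** 2 / (2 * sigmas ** 2)).sum(axis=0)

# Similarities of one query term to 4 document words:
M_i = [0.95, 0.85, 0.3, -0.2]
features = soft_tf(M_i, mus=[0.9, 0.3], sigmas=[0.1, 0.1])
print(features)
```

The first kernel softly counts the two scores near 0.9 (giving a value close to 2), the second counts the single score near 0.3 (a value close to 1): the "soft" term frequencies of the slide.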
Kernel Pooling (2)
Aggregate the per-query-term Soft-TF features with log-sum: φ(M) = Σ_{i=1}^{n} log K(M_i)
Output: φ(M), K ranking features for a query-document pair
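The log-sum aggregation, sketched with numpy (illustrative values; the epsilon guard is an assumption added to keep log well-defined when a kernel matches nothing):

```python
import numpy as np

def kernel_pool(soft_tf_features):
    """phi(M) = sum over query terms i of log K(M_i).

    soft_tf_features: (n_query_words, K) array of per-term soft-TF values.
    """
    eps = 1e-10  # guard against log(0)
    return np.log(np.maximum(soft_tf_features, eps)).sum(axis=0)

per_term = np.array([[2.0, 0.5],
                     [1.0, 4.0]])  # 2 query terms, K = 2 kernels
phi = kernel_pool(per_term)
print(phi)  # [log 2 + log 1, log 0.5 + log 4] = [0.693..., 0.693...]
```

The log makes the features behave like a sum of per-term log soft-TFs, analogous to how classic retrieval models sum per-term log weights.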
Learning-To-Rank
Combine the ranking features into the ranking score:
• f(q, d) = tanh(wᵀ φ(M) + b)
• Trained in a pairwise manner
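A sketch of the scoring layer and the pairwise training signal. The weights w, b, the feature values, and the margin are illustrative, not the trained parameters; a standard pairwise hinge loss stands in for the training objective.

```python
import numpy as np

def score(phi, w, b):
    """f(q, d) = tanh(w . phi(M) + b)"""
    return np.tanh(np.dot(w, phi) + b)

def pairwise_hinge_loss(phi_pos, phi_neg, w, b, margin=1.0):
    """Hinge loss on a (preferred d+, non-preferred d-) training pair."""
    return max(0.0, margin - score(phi_pos, w, b) + score(phi_neg, w, b))

w = np.array([0.5, -0.2, 0.1])
b = 0.0
phi_pos = np.array([2.0, 0.1, 1.0])  # features of the preferred document
phi_neg = np.array([0.5, 1.0, 0.2])  # features of the other document
print(score(phi_pos, w, b) > score(phi_neg, w, b))  # True
loss = pairwise_hinge_loss(phi_pos, phi_neg, w, b)
```

Because the whole pipeline (embedding → translation → kernel pooling → tanh score) is differentiable, the gradient of this pairwise loss can flow all the way back into the word embeddings, which is the end-to-end property the slides emphasize.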
Dataset
• A sample of the online search log from Sogou, a major Chinese search engine
• No overlap between training and testing queries
Experimental Methodology: Training and Testing Labels
We train and test our model using user clicks (manual labels not available).
Click models used as labels:
• DCTR: Document Click-Through Rate model [A. Chuklin et al. 2015]
• TACM: Time-Aware Click Model [Y. Liu et al. 2016]
• Raw Click: only the clicked document is relevant (single-click sessions)
Testing scenarios:
• Testing-SAME: Train DCTR, Test DCTR
• Testing-DIFF: Train DCTR, Test TACM
• Testing-RAW: Train DCTR, Test Raw Click
Experimental Methodology: K-NRM Model & Baselines
K-NRM embedding:
• 300-d, initialized with word2vec trained on titles
K-NRM kernels (11 in total):
• 1 exact-match kernel: μ = 1, σ = 0.0001
• 10 soft-match kernels, equally distributed in [−1, 1] (the cosine-similarity value range): μ = −0.9, −0.7, …, 0.7, 0.9; σ = 0.1
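The 11-kernel configuration described above can be written out explicitly; this is a small sketch of that setup, with each kernel represented as a (μ, σ) pair:

```python
# One exact-match kernel plus 10 soft-match kernels equally spaced in [-1, 1].
exact_kernel = (1.0, 0.0001)                       # (mu, sigma): near-delta at 1
soft_mus = [round(-0.9 + 0.2 * k, 1) for k in range(10)]
soft_kernels = [(mu, 0.1) for mu in soft_mus]
kernels = [exact_kernel] + soft_kernels

print(len(kernels))  # 11
print(soft_mus)      # [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
```

The tiny σ = 0.0001 makes the first kernel fire only on cosine similarities essentially equal to 1, i.e. exact term matches, while the σ = 0.1 kernels count soft matches at ten similarity levels.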
Baselines:
1. Unsupervised word-based retrieval: LM, BM25
2. Word-based learning-to-rank: RankSVM, Coor-Ascent (20 IR-fusion features)
3. Recent neural IR methods: Trans, DRMM, CDSSM
Overall Performance: Testing-SAME (Train: DCTR, Test: DCTR)
• Test and train use the same click model
• Easier task
Overall Performance: Testing-DIFF (Train: DCTR, Test: TACM)
• Test with a more accurate click model
• Harder task
Overall Performance: Testing-RAW (Train: DCTR, Test: Raw Click)
• Only the (single) clicked document is relevant
• The most difficult task
• K-NRM ranks the clicked document about 1 position higher: its average rank (1/MRR) improves from 4.1 to 3.0
• K-NRM fits the underlying relevance signal rather than the training click model
Sources of Effectiveness #1: IR-Specialized Soft Match
Embeddings with different levels of IR specialization:
0. Exact-match: K-NRM using only the exact-match kernel
1. Word2vec: pre-trained on surrounding text
2. Click2vec: pre-trained on (query-term, clicked-doc-term) pairs
3. Full model: trained end-to-end with pairwise user preferences
(3) > (2) > (1) > (0): more IR-specialized embeddings are more effective.
*Testing-SAME and Testing-DIFF showed similar performance.
Sources of Effectiveness #2: Multi-Level Soft Match
Three different pooling methods, all trained end-to-end:
1. Max-pool: use only the best match
2. Mean-pool: all soft-match scores are mixed together
3. Full model: kernel pooling enforces multi-level soft match
(3) > (2) > (1): multi-level soft match provides more information.
*Testing-SAME and Testing-DIFF showed similar performance.
Effects on Word Embeddings
Recall our motivation: "Content-based word embeddings may not fit ad-hoc search."
• This proves true: 58% of word2vec word pairs were moved across kernels by K-NRM.
• How does K-NRM move word pairs?
Effects on Word Embeddings: Word Pair Movements
1. Word pairs decoupled: considered related by word2vec, but not by K-NRM; >90% of moved pairs are decoupled.
   Examples: (wife, husband), (son, daughter), (China-Unicom, China-Mobile), (Maserati, car), (website, homepage)
2. New soft matches discovered: pairs that appear less frequently in the same surrounding context, but convey similar search intent.
   Examples: (MH370, search), (pdf, reader), (BMW, contact-us), (192.168.0.1, router)
3. Matching strength level changed (Weak ↔ Strong).
   Examples: (MH370, truth), (cloud, share), (oppor9, OPPOR), (10086, www.10086.com)
Conclusion
• Embeddings trained for search are more effective than embeddings trained from surrounding text.
• Embeddings and soft-match bins must be tuned together.
• We propose a new kernel pooling technique that:
  • allows end-to-end training of the word embedding;
  • guides the embedding to form effective multi-level soft-match patterns tailored for ad-hoc ranking;
  • delivers robust soft match between query and documents.
• K-NRM moved 58% of word2vec word pairs across kernels, decoupled >90% of the pairs word2vec considered related, and discovered new soft-match patterns of different types.
Thankyou!
Questions?
Testing labels
• Cut-off threshold chosen so that the label distribution matches TREC's
Performance of tail queries
Requirement of training data
Sensitivity to kernel parameters
Kernel Pooling Revisited: Histogram Pooling (DRMM)
• How many word pairs' similarity scores are in [1, 0.8], [0.8, 0.6], ...?
• B_k(M_i) = Σ_{j=1}^{n} I{M_ij ∈ bin k}
• B(M_i) = {B_1(M_i), …, B_K(M_i)}
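For contrast with kernel pooling, DRMM's hard bin counting can be sketched with numpy's histogram (the bin edges below are illustrative, not DRMM's exact configuration):

```python
import numpy as np

def histogram_pool(M_i, bin_edges):
    """B_k(M_i) = number of similarities M_ij falling into bin k (hard counting)."""
    counts, _ = np.histogram(M_i, bins=bin_edges)
    return counts

# Similarities of one query term to 4 document words:
M_i = [0.95, 0.85, 0.3, -0.2]
# Bins over the cosine-similarity range: [-1, 0), [0, 0.8), [0.8, 1].
counts = histogram_pool(M_i, bin_edges=[-1.0, 0.0, 0.8, 1.0])
print(counts)  # [1 1 2]
```

Because counts are integers, the bin boundaries are not differentiable, so gradients cannot flow back into the embeddings; replacing the hard bins with RBF kernels is exactly what makes K-NRM trainable end-to-end.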