Kernels for Structured Output

SPFLODD, November 10, 2011


Plan for Today

1.  Tree kernels (Collins and Duffy, 2002)
2.  Why (input, output) and output kernels aren’t really available
3.  Reranking
4.  Kernelizing CRFs
5.  Rational kernels
6.  Kernel dependency estimation

Kernels on Structures

•  Last time, William talked about kernels on factorial objects (tree paths), and also about string kernels.
  –  I did not mention it in September, but the M3N paper (generalizing SVMs to structured outputs) uses kernels as well – on inputs.
•  The idea generalizes nicely to trees.
•  Key assumption: learning and inference can be accomplished if we can efficiently calculate f(x)ᵀf(x′), where f is our implied feature space.

A Bit of History: “DOP”

•  Data-oriented parsing: a bad name for an interesting idea (Bod, 1998).
  –  Every contiguous subtree is a feature.
  –  Lots of papers on how to do this efficiently.
  –  Most closely related to memory-based or instance-based learning (along the lines of KNN).
  –  Goodman (1996) approximated it with a PCFG.
•  The part to remember: every tree fragment is a feature.
•  Related to tree substitution grammar.

All Tree Fragments Feature Vector

•  Every possible fragment corresponds to a dimension in the vector f(x).
•  fᵢ(x) = the number of times the ith fragment occurs in x.
•  f(x)ᵀf(x′) = the number of exactly matching fragment tokens in x and x′.

Tree Kernel (Collins and Duffy, 2002)

\[
\begin{aligned}
f(x)^\top f(x') &= \sum_i f_i(x)\, f_i(x') \\
&= \sum_i \Bigl(\sum_{n \in x} \underbrace{[\text{$i$th fragment matches at } n]}_{I_i(n)}\Bigr) \Bigl(\sum_{n' \in x'} \underbrace{[\text{$i$th fragment matches at } n']}_{I_i(n')}\Bigr) \\
&= \sum_i \sum_{n \in x} I_i(n) \sum_{n' \in x'} I_i(n') \\
&= \sum_{n \in x} \sum_{n' \in x'} \underbrace{\sum_i I_i(n)\, I_i(n')}_{\Delta(n, n')}
\end{aligned}
\]

\[
\Delta(n, n') =
\begin{cases}
0 & \text{if the productions at $n$ and $n'$ differ} \\
1 & \text{if $n$ and $n'$ are preterminals} \\
\displaystyle \prod_{j=1}^{\#\text{kids}(n)} \bigl(1 + \Delta(\text{$j$th child of } n,\ \text{$j$th child of } n')\bigr) & \text{otherwise}
\end{cases}
\]
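To make the recursion concrete, here is a minimal runnable sketch of the Collins and Duffy kernel. The tuple encoding of trees and the helper names (`production`, `is_preterminal`, `delta`, `tree_kernel`) are my own illustrative choices, not from the paper.

```python
# A tree is a nested tuple: (label, child, child, ...); a leaf is a bare string.

def production(node):
    """The production at a node, e.g. ('S', 'NP', 'VP') or ('N', 'dog')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def is_preterminal(node):
    """A node whose single child is a word."""
    return len(node) == 2 and isinstance(node[1], str)

def delta(n, np_):
    """Delta(n, n'): the number of matching fragments rooted at n and n'."""
    if production(n) != production(np_):
        return 0
    if is_preterminal(n):
        return 1
    result = 1
    for c, cp in zip(n[1:], np_[1:]):
        result *= 1 + delta(c, cp)
    return result

def nodes(t):
    """Yield every internal node of the tree."""
    if isinstance(t, str):
        return
    yield t
    for c in t[1:]:
        yield from nodes(c)

def tree_kernel(x, xp):
    """f(x)^T f(x') = sum over node pairs of Delta(n, n')."""
    return sum(delta(n, np_) for n in nodes(x) for np_ in nodes(xp))
```

Running `tree_kernel` on a tree with itself counts its fragments; the O(|x||x′|) cost is visible as the double loop over node pairs.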

Notes

•  O(|x||x′|) runtime (number of nodes in each tree).
  –  Collins and Duffy claim it’s closer to linear in practice.
•  Labeled sequences are a kind of tree.
•  You can use word similarity functions instead of 0/1 for matching words.
•  Collins and Duffy used the Collins parser (model 2) to:
  –  provide a likelihood to use alongside the kernel as a feature
  –  provide “multiple hypotheses” for use in the voted perceptron algorithm
•  Parsing gains on the WSJ Penn Treebank task.

“Multiple Hypotheses”?

•  Structured perceptron as we learned it (and also CRF, SSVM, etc.) assumes we reason about the entire set of possible outputs y for each input x.
  –  Decoding, summing, cost-augmented decoding.
•  Here, a reranking approach is assumed.
  –  Use some other model to provide candidates.
  –  The discriminative, kernelized model (here, a perceptron in the dual) only gets to rerank candidates.
  –  Charniak and Johnson (2005) ran with the reranking idea but went back to log-linear models, and by engineering good features did quite well.
•  Reranking: a popular idea in the early 2000s, regardless of whether you use kernels.
•  Understudied challenge: diversity of the n-best list.

Grumpy Aside: Know Thy Kernel

•  Kernel = set of features
•  You’re pretty much always using a kernel.
•  Empirically it seems that:
  –  knowing your problem and designing good features to add to your “kernel” is a win
  –  trying all the different kernels implemented in SVMlight (without understanding the differences) may help a little, but nobody cares.
•  For language, anything beyond a linear kernel usually needs some justification.

Kernels and Decoding

•  Ideally, we would like kernels on entire inputs and outputs, as in Collins and Duffy, but to learn directly from the data, not in a secondary reranking stage.
•  Why won’t this work?

\[
\begin{aligned}
\mathrm{decode}(x) &= \arg\max_{y}\; \mathbf{w}^\top f(x, y) \\
&= \arg\max_{y}\; \sum_{i=1}^{N} \sum_{y' \in \mathcal{Y}(x_i)} \alpha_{i,y'}\, K\bigl((x_i, y'), (x, y)\bigr)
\end{aligned}
\]

•  The inner sum ranges over all y′ ∈ Y(xᵢ) (exponentially many dual terms), and an arbitrary K need not decompose over parts of y, so there is no efficient dynamic program for the argmax.

Kernels on Outputs

•  In practice, apart from reranking, this is not done yet.
•  There are a few interesting papers that explore various possibilities, and I want to discuss some of them.
  –  Kernel CRFs
  –  Rational kernels
  –  Kernel dependency estimation

Kernel CRFs (Lafferty et al., 2004)

•  Don’t try for an arbitrary K((x, y), (x′, y′)).
•  Instead, define your structure y as an assignment of values to variables Y in a Markov network.
•  Kernels are now on cliques: K((x, y_c), (x′, y′_{c′})).
  –  Any two clique assignments in any two graphs.
•  Representer theorem: in the model that maximizes regularized log-loss,

\[
\mathrm{score}(x, y) = \sum_{i=1}^{N}\; \sum_{c \in \mathrm{cliques}(\mathrm{graph}(x_i))}\; \sum_{y'_c \in \mathcal{Y}_c} \alpha_{i,c,y'_c}\, K_c\bigl((x_i, y'_c), (x, y_c)\bigr)
\]

Learning Algorithm

•  Too many cliques!
•  Greedy forward selection (much like older feature selection algorithms, e.g., Della Pietra et al., 1997).
•  The basic idea is to iterate:
  –  For every labeled clique in the training data, calculate the first derivative of the objective (regularized log-likelihood) with respect to the clique.
    •  This is done approximately, for efficiency.
  –  Add the clique with the largest gradient to the active set.
  –  Optimize the likelihood for the current active set of cliques; this is done in the dual.
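The same select-then-refit loop can be sketched in miniature. This is not the Lafferty et al. implementation: as a hedged stand-in, candidate features of a plain logistic regression play the role of candidate cliques, exact gradients replace their approximation, and the refit is done in the primal for simplicity.

```python
import numpy as np

def grad(w, X, y, active, lam=0.1):
    """Gradient of regularized logistic log-loss w.r.t. all weights,
    with the model restricted to the active feature set."""
    idx = sorted(active)
    z = X[:, idx] @ w[idx] if idx else np.zeros(len(y))
    p = 1.0 / (1.0 + np.exp(-z))
    return X.T @ (p - y) / len(y) + lam * w

def forward_select(X, y, k, lam=0.1, steps=200, lr=0.5):
    """Greedy forward selection: repeatedly add the inactive feature with
    the largest-magnitude gradient, then refit the active weights."""
    active, w = set(), np.zeros(X.shape[1])
    for _ in range(k):
        g = grad(w, X, y, active, lam)
        cand = max(set(range(X.shape[1])) - active, key=lambda j: abs(g[j]))
        active.add(cand)
        # refit only the active coordinates by gradient descent
        for _ in range(steps):
            g = grad(w, X, y, active, lam)
            for j in active:
                w[j] -= lr * g[j]
    return active, w
```

On toy data where one feature predicts the label, the first iteration picks that feature, mirroring how the KCRF algorithm grows its active set of cliques.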

But…

•  This technique is not widely used.
•  In NLP, most reported results stick with linear kernels; lots of results include some “feature engineering.”
  –  Some researchers see “feature engineering” as good, honest work.
  –  Others see it as a distraction from “general” methods.
  –  What do you think?

Rational Kernels (Cortes et al., 2004)

•  Under some conditions, you can use WFSTs to define a kernel between strings.
  –  Or between sets of strings represented as FSAs.
•  The kernel function is defined by doing the weighted composition x ∘ T ∘ y, and then taking the semiring path sum.
  –  Edit distance uses min-plus.
  –  String kernels use plus-times.
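As a tiny concrete instance (my own illustration, not code from Cortes et al.): composing x and y with a transducer T that emits each n-gram of its input, then taking the plus-times path sum, yields the familiar n-gram kernel, which we can compute directly.

```python
from collections import Counter

def ngram_counts(s, n):
    """Counts of the length-n substrings of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_kernel(x, y, n=2):
    """Plus-times path sum of x . T . y for an n-gram counting transducer T:
    the sum over shared n-grams of count_x * count_y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(cx[g] * cy[g] for g in cx)
```

Swapping the semiring to min-plus and T to an edit transducer, the same composition computes edit distance instead.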

PDS Kernels

•  Not all kernels are positive definite and symmetric.
  –  Those are necessary conditions for learning algorithms to “work” with a kernel.
•  Cortes et al. define some formal properties (closure under various operations).
•  They characterize some existing kernels as PDS.
•  Experiments included, but not for structured outputs.

Kernel Dependency Estimation

PCA and Kernel PCA

•  Principal component analysis (Pearson, 1901): transform multi-dimensional data into uncorrelated dimensions.
  –  Eigenvalue decomposition of the covariance matrix
  –  Singular value decomposition of the data matrix
•  Kernel PCA (Schölkopf et al., 1998): do it in an RKHS!
  –  Only inner products are needed.
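A minimal numpy sketch of the “only inner products are needed” point, assuming a precomputed Gram matrix (the function name and details are illustrative, not from the paper):

```python
import numpy as np

def kernel_pca(K, d):
    """Project training points onto the top-d principal axes in the RKHS,
    given only the Gram matrix K[i, j] = k(x_i, x_j)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    vals, vecs = vals[::-1][:d], vecs[:, ::-1][:, :d]
    # scale eigenvectors so the principal axes have unit norm in the RKHS
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return Kc @ alphas                           # n x d projections
```

With a linear kernel K = XXᵀ this reproduces ordinary PCA projections of the centered data, up to sign.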

Kernel Dependency Estimation (Weston et al., 2003)

For now, imagine just kernels on outputs, K(y, y′).

[Diagram: a kernel PCA map gives the principal axes in the RKHS feature space of the outputs Y (the “output feature space”); multivariate regression maps inputs X into that space; mapping a point in the feature space back to an actual output is the “pre-image” problem.]
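Putting the pieces of the diagram together, a hedged end-to-end sketch with my own simplifications: a user-supplied output kernel, ridge regression for the multivariate regression step, and a brute-force search over a candidate set for the pre-image step (Weston et al. treat the pre-image problem more seriously).

```python
import numpy as np

def kde_fit(X, Y_train, k, d=1, ridge=1e-3):
    """Kernel PCA on the outputs, then ridge regression from inputs
    into the resulting d-dimensional output feature space."""
    n = len(Y_train)
    Ky = np.array([[k(a, b) for b in Y_train] for a in Y_train])
    one = np.full((n, n), 1.0 / n)
    Kc = Ky - one @ Ky - Ky @ one + one @ Ky @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1][:d], vecs[:, ::-1][:, :d]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    Z = Kc @ alphas              # coordinates of training outputs on the axes
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Z)
    return W, alphas, Ky, Y_train

def kde_predict(x, model, k, candidates):
    """Regress into the output feature space, then solve the pre-image
    problem by exhaustive search over a candidate output set."""
    W, alphas, Ky, Y_train = model
    z = x @ W
    def coords(y):
        kv = np.array([k(y, yj) for yj in Y_train])
        kc = kv - kv.mean() - Ky.mean(axis=0) + Ky.mean()  # center against training
        return kc @ alphas
    return min(candidates, key=lambda y: float(np.sum((coords(y) - z) ** 2)))
```

The exhaustive pre-image search is exactly what makes structured outputs hard here: for trees or label sequences the candidate set is exponential.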

Punchline

•  You should understand that kernels are a formalization of the notion of features.
•  Abstracting features into a kernel can open up the possibility of using some cool learning algorithms.
•  But you run the risk of getting too far from the data and the application.
•  Kernels on the output side create significant computational challenges that remain to be solved for practical use.