Yanjun Qi, Pavel P. Kuksa, Ronan Collobert, Kunihiko ... qyj/papersA08/ICDM09-slidesv2.pdf · ...
Yanjun Qi 1, Pavel P. Kuksa 2, Ronan Collobert 1, Kunihiko Sadamasa, Koray Kavukcuoglu 3, Jason Weston 4
1 Machine Learning Department, NEC Laboratories America, Inc. 2 Computer Science Department, Rutgers University 3 Computer Science Department, New York University 4 Google Research New York
Background: Learning

Usage                  Supervised learning   Unsupervised learning
{(x,y)} labeled data   Yes                   No
{x*} unlabeled data    No                    Yes
Natural language processing (NLP) involves many machine learning tasks, especially sequential learning.
Learning: Supervised (classification, regression, etc.) vs. Unsupervised (clustering, etc.)
Background: Semi-Supervised Learning

Labeled data are often hard to obtain. Unlabeled data are often easy to obtain: a lot.
Usage                  Supervised learning   Unsupervised learning   Semi-supervised learning
{(x,y)} labeled data   Yes                   No                      Yes
{x*} unlabeled data    No                    Yes                     Yes

For instance, "Self-Training"
• Popular semi-supervised method used in NLP
• Induces self-labeled "pseudo" training examples from the unlabeled set
Background: Semi-Supervised Learning (Cont'd)

Semi-supervised learning (most not applicable for large-scale NLP tasks)
• Self-training or co-training
• Transductive SVM
• Graph-based regularization
• Entropy regularization
• EM with generative mixture models
• Auxiliary task on the unlabeled set through multi-task learning
• Semi-supervised learning with "labeled features"
  • "Labeled features": prior class bias of features from human annotation
  • Using "labeled features" to induce "pseudo" examples or enforce soft constraints on predictions of unlabeled examples
Background: Word

Individual words in NLP systems
• Carry significant label information
• Fundamental building blocks of NLP
• Many basic NLP tasks involve sequence modeling with word-level evaluation
  • For example, named entity recognition (NER), part-of-speech (POS) tagging

Our target NLP problems: information extraction
• Assign labels to each word in a sequence of text
• Essentially, classify each word into multiple classes
Example NLP Task

… former captain [Chris Lewis] …   Named Entity [Person Name]
… the state of [washington] …      Named Entity [Location Name]
Method: Semi-Supervised SLF

Provide "semi-supervision" at the level of features (e.g., words) related to target labels
• Through self-learned features (SLF) of words (basic case)
• SLF models the probability that this word might be assigned to each target class
• SLF is unknown (of course): re-estimate it from unlabeled examples by applying a trained classifier

"Semi-supervised" self-learned features (SLF)
Method: Semi-Supervised SLF (Basic Case)

Empirical SLF is estimated from unlabeled examples
• Each example is a sequence of words
• Thus, the SLF of a word w, for class i:
  SLF(w, i) = (# examples including word w that are predicted as class i) / (# examples including word w)
• Where {x*} represents the unlabeled examples
• Where f(·) represents a trained supervised sequence classifier
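The estimate above can be sketched in a few lines of Python; this is illustrative only, assuming the trained classifier f(·) produces one class prediction per example (function and variable names are ours, not from the slides):

```python
from collections import defaultdict

def estimate_slf(unlabeled_examples, predictions, classes):
    """Empirical SLF: for each word w and class i,
    SLF(w, i) = (# examples including w predicted as class i)
                / (# examples including w)."""
    containing = defaultdict(int)   # word -> # examples containing it
    per_class = defaultdict(lambda: defaultdict(int))  # word -> class -> count
    for words, label in zip(unlabeled_examples, predictions):
        for w in set(words):        # count each example at most once per word
            containing[w] += 1
            per_class[w][label] += 1
    return {w: [per_class[w][c] / containing[w] for c in classes]
            for w in containing}

# Two unlabeled examples {x*} with predictions from a trained classifier f(.)
examples = [["former", "captain", "chris"], ["chris", "said"]]
preds = ["PER", "PER"]
slf = estimate_slf(examples, preds, classes=["PER", "O"])
# "chris" occurs in 2 examples, both predicted PER -> SLF("chris") = [1.0, 0.0]
```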
Method: Algorithm (Basic Case)

Pseudo-code
1. Define the feature representation for a word, and the representation for an example (a window of words)
2. Train a classifier f(·) on training examples (xi, yi) using the feature representation
3. Use f(·) to estimate the SLF from unlabeled data {x*}
4. Augment the representation of words with the estimated SLF and refine
5. Iterate steps 2 to 4 until the stopping criterion is met.
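As a sketch, the wrapper loop above might look like the following; `train_fn` and `predict_fn` are hypothetical stand-ins for the real NN/CRF training and inference (here a toy majority-label model, just to make the loop concrete):

```python
def slf_wrapper(train_fn, predict_fn, labeled, unlabeled, classes, n_iters=3):
    """Iterate: train -> predict unlabeled -> re-estimate SLF -> augment features."""
    slf = {}      # word -> class-probability vector; empty on the first round
    model = None
    for _ in range(n_iters):
        # Steps 1-2: train on labeled examples (features possibly SLF-augmented)
        model = train_fn(labeled, slf)
        # Step 3: apply the trained classifier f(.) to the unlabeled data {x*}
        preds = [predict_fn(model, x, slf) for x in unlabeled]
        # Step 4: re-estimate the empirical SLF from these predictions
        counts, slf = {}, {}
        for words, label in zip(unlabeled, preds):
            for w in set(words):
                counts[w] = counts.get(w, 0) + 1
                slf.setdefault(w, [0.0] * len(classes))[classes.index(label)] += 1
        slf = {w: [v / counts[w] for v in vec] for w, vec in slf.items()}
    return model, slf

# Toy stand-ins: the "model" is just the majority label of the training set
def train_fn(labeled, slf):
    labels = [y for _, y in labeled]
    return max(set(labels), key=labels.count)

def predict_fn(model, x, slf):
    return model

labeled = [(["a", "b"], "PER"), (["c"], "PER"), (["d"], "O")]
model, slf = slf_wrapper(train_fn, predict_fn, labeled,
                         unlabeled=[["a", "c"], ["d"]], classes=["PER", "O"])
```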
Extension I: Boundary SLF

Rare words are the hardest to label.
Motivation: model those words happening frequently before or after a certain target class.
Boundary SLF: extend the basic SLF to incorporate the class-boundary distribution.

… former captain [Chris Lewis] …
… [Hoddle] said …
… [CRKL], an adapter protein …
… [SH2-SH3-SH3] adapter protein …
* Blue-colored words carry important label indications
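One plausible reading of the class-boundary distribution, sketched below: for each word, the fraction of its occurrences immediately preceding a word predicted as a target class. The per-word predicted tags and all names are our assumptions, not the slides' exact definition:

```python
from collections import defaultdict

def boundary_slf(sentences, tag_seqs, entity_tags):
    """For each word, the fraction of its occurrences that appear
    immediately before a word predicted as an entity class."""
    before_entity = defaultdict(int)
    total = defaultdict(int)
    for words, tags in zip(sentences, tag_seqs):
        for i, w in enumerate(words):
            total[w] += 1
            if i + 1 < len(words) and tags[i + 1] in entity_tags:
                before_entity[w] += 1
    return {w: before_entity[w] / total[w] for w in total}

sents = [["former", "captain", "chris", "lewis"], ["hoddle", "said"]]
tags = [["O", "O", "PER", "PER"], ["PER", "O"]]
b = boundary_slf(sents, tags, entity_tags={"PER"})
# "captain" always precedes a predicted PER word -> b["captain"] == 1.0
```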
Extension II & Extension III

Extension II: Clustered SLF
• Words exhibiting similar target-class distributions have similar SLF features
• Grouped SLF features might give stronger indications of target class or class boundary
• Use k-means to cluster all words into N clusters, and use the cluster-ID as the new clustered-SLF feature
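A minimal sketch of the clustered-SLF step, using a tiny hand-rolled k-means (a real system would use a library implementation; the SLF vectors below are made up for illustration):

```python
import random

def kmeans(vectors, k, n_iters=20, seed=0):
    """Tiny k-means; returns a cluster id for each input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(n_iters):
        # assign each vector to its nearest centroid (squared Euclidean)
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2
                      for a, b in zip(v, centroids[c]))) for v in vectors]
        # recompute each centroid as the mean of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# SLF vectors for four words over classes (PER, O): two "person-like", two not
slf = {"lewis": [0.9, 0.1], "hoddle": [0.8, 0.2],
       "the": [0.05, 0.95], "of": [0.1, 0.9]}
words = list(slf)
cluster_id = dict(zip(words, kmeans([slf[w] for w in words], k=2)))
# words with similar class distributions share a cluster-ID feature
```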
Extension III: Attribute SLF
• Treat discrete attributes of words as the basic unit of sequence examples
• For instance, 'stem-end' for the POS task
Why Useful?

• No incestuous bias, since no examples are added
• No tricky parameters to tune (unlike "self-training")
• The supervised model learns whether the SLF is relevant or not
• Summarization over many potential labels, hence infrequent mistakes can be smoothed out
  • Potentially corrected on the next iteration
• Empirical SLF features for neighboring words are highly informative
• Highly scalable (adding a few features, not examples)
• A wrapper approach applicable to many other methods
Baseline NLP System I - NN

A deep neural network (NN) based NLP system [Collobert08]
The auxiliary task "LM" provides one type of semi-supervision
"Viterbi" training enforces local label dependencies among neighbors
• SLF enforces local dependency as well
Baseline NLP System II - CRF

Conditional Random Field (CRF) [Lafferty01]
• State-of-the-art performance on many sequence labeling tasks
• Discriminative probabilistic models over observation sequences and label sequences
• Apply SLF as a wrapper on the CRF++ toolkit
Experimental Setting

Four benchmark data sets
• CoNLL03 German Named Entity Recognition (NER)
• CoNLL03 English Named Entity Recognition (NER)
• English Part-of-Speech (POS) benchmark data [Toutanova03]
• Gene Mention (GM) benchmark data [BioCreative II]

Evaluation measurements
• Entity-level F1: 2 * (precision * recall) / (precision + recall)
• Word-level error rate for the POS task
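The F1 measure above, as code (the precision/recall numbers are illustrative, not from the experiments):

```python
def f1(precision, recall):
    """Entity-level F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# e.g. precision 0.80 and recall 0.70 give an F1 of about 0.747
score = round(f1(0.80, 0.70), 3)
```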
Token size    Training (labeled)   Unlabeled
German NER    206,931              ~60M
English NER   203,621              ~200M
English POS   1,029,858            ~300M
Bio GM        345,996              ~900M
Performance Comparison (German NER)

IOBES style of class tags / 5-word sliding window
All-features case
• (word, capitalization flag, prefix and suffix (length up to 4), part-of-speech tags, text chunk, string patterns)
Best CoNLL03 team: test F1 = 74.17
Baseline classifier: NN

Setting                       Test F1   + Basic SLF
word only                     45.89     51.10
word only + Viterbi           50.61     53.46
all features + LM             72.44     73.32
all features + LM + Viterbi   74.33     75.72
Performance Comparison (English NER)

IOBES style of class tags / 7-word sliding window
All-features case
• (word, cap, dictionary)
Best CoNLL03 team: test F1 = 88.76
Baseline classifier: NN

Setting                            Test F1   + Basic SLF
word + cap                         77.82     79.38
word + cap + Viterbi               80.53     81.51
word + cap + dict + LM             86.49     86.88
word + cap + dict + LM + Viterbi   88.40     88.69
Performance Comparison (English POS)

IOBES style of class tags / 5-word sliding window
All-features case
• (word, cap, stem-end)
Best result (that we know of): test error rate 2.76%
• WER: token-level error rate
Baseline classifier: NN

Setting                  WER    + Basic SLF   + Attribute SLF
word                     4.99   4.06          -
word + LM                3.93   3.89          -
word + cap + stem        3.28   2.99          2.86
word + cap + stem + LM   2.79   2.75          2.73
Performance Comparison (Bio GM)

Look for gene or protein names in the bio-literature (two classes: gene or not)
All-features case
• (word, cap, prefix and suffix (length up to 4), string patterns)
Best BioCreative II team: test F1 = 87.21
• Many other complex features + bi-directional CRF training
Baseline classifier: CRF++

Setting                         Test F1   + Clustered SLF
word + cap                      82.02     84.01 (on Basic SLF)
word + cap                      82.02     85.24 (on Boundary SLF)
word + cap + pref + suf + str   86.34     87.16 (on Boundary SLF)
Performance Comparison to Self-Training

Self-training with a random selection scheme:
• Given L training examples, choose L/R (R is a parameter to choose) unlabeled examples to add to the next round's training
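A sketch of one round of this self-training baseline (helper names are ours; the selection is uniformly random, matching the slides' scheme):

```python
import random

def self_training_round(labeled, unlabeled, predict_fn, R, seed=0):
    """Pick L/R unlabeled examples at random, pseudo-label them with the
    current model's predictions, and add them to the training set."""
    rng = random.Random(seed)
    n_add = min(max(1, len(labeled) // R), len(unlabeled))
    chosen = set(rng.sample(range(len(unlabeled)), n_add))
    pseudo = [(unlabeled[i], predict_fn(unlabeled[i])) for i in chosen]
    remaining = [x for i, x in enumerate(unlabeled) if i not in chosen]
    return labeled + pseudo, remaining

labeled = [("sent%d" % i, "y") for i in range(10)]
unlabeled = ["u%d" % i for i in range(50)]
new_labeled, rest = self_training_round(labeled, unlabeled,
                                        predict_fn=lambda x: "y", R=5)
# L=10, R=5 -> adds 2 pseudo-labeled examples: 12 labeled, 48 remaining
```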
Self-Training on German NER

Setting                Baseline   R=1     R=10    R=100
Words only + Viterbi   50.61      47.07   47.92   47.9
All + LM + Viterbi     74.33      73.42   74.41   73.9

Self-Training on English NER

Setting                            Baseline   R=1     R=20    R=100
Words only + Viterbi               80.53      79.51   81.01   80.85
Word + cap + dict + LM + Viterbi   88.40      87.64   88.07   88.17

SLF behaves better than self-training (with a random selection strategy)
Conclusions

Semi-supervised SLF is promising for sequence labeling tasks in NLP
Easily extendable to other cases, such as predicted class distributions (or related) for each n-gram
Easily extendable to other domains, such as sentiment analysis (a word's class distribution as the distribution of labels of documents containing this word)
• e.g., "cashback" to class "shopping"