Yanjun Qi, Pavel P. Kuksa, Ronan Collobert, Kunihiko ... qyj/papersA08/ICDM09-slidesv2.pdf · ...
Yanjun Qi 1, Pavel P. Kuksa 2, Ronan Collobert 1, Kunihiko Sadamasa, Koray Kavukcuoglu 3, Jason Weston 4
1 Machine Learning Department, NEC Laboratories America, Inc. 2 Computer Science Department, Rutgers University 3 Computer Science Department, New York University 4 Google Research New York
Background: Learning

Usage                  Supervised learning   Unsupervised learning
{(x,y)} labeled data   Yes                   No
{x*} unlabeled data    No                    Yes
Natural language processing (NLP) involves many machine learning tasks, especially sequential learning.
Learning: Supervised (classification, regression, etc.) vs. Unsupervised (clustering, etc.)
Background: Semi-Supervised Learning

Labeled data are often hard to obtain. Unlabeled data are often easy to obtain: a lot.
Usage                  Supervised learning   Unsupervised learning   Semi-supervised learning
{(x,y)} labeled data   Yes                   No                      Yes
{x*} unlabeled data    No                    Yes                     Yes

For instance, "Self-Training"
• Popular semi-supervised method used in NLP
• Induces self-labeled "pseudo" training examples from the unlabeled set
Background: Semi-Supervised Learning (Cont'd)

Semi-supervised learning (most not applicable for large-scale NLP tasks)
• Self-training or co-training
• Transductive SVM
• Graph-based regularization
• Entropy regularization
• EM with generative mixture models
• Auxiliary task on the unlabeled set through multi-task learning
• Semi-supervised learning with "labeled features"
  • "Labeled features": prior class bias of features from human annotation
  • Using "labeled features" to induce "pseudo" examples or enforce soft constraints on predictions of unlabeled examples
Background: Word

Individual words in NLP systems
• Carry significant label information
• Fundamental building blocks of NLP
• Many basic NLP tasks involve sequence modeling with word-level evaluation
  • For example, named entity recognition (NER), part-of-speech (POS) tagging

Our target NLP problems: information extraction
• Assign labels to each word in a sequence of text
• Essentially, classify each word into multiple classes
Example NLP Task

… former captain [Chris Lewis] …   Named Entity [Person Name]
… the state of [washington] …      Named Entity [Location Name]
Method: Semi-Supervised SLF

Provide "semi-supervision" at the level of features (e.g., words) related to target labels
• Through self-learned features (SLF) of words (basic case)
• SLF models the probability that this word might be assigned to each target class
• SLF is unknown (of course): re-estimate it from unlabeled examples by applying a trained classifier

"Semi-supervised" self-learned features (SLF)
Method: Semi-Supervised SLF (Basic Case)

Empirical SLF is estimated from unlabeled examples
• Each example is a sequence of words
• Thus, the SLF of a word w, for class i:
  SLF(w, i) = (# examples including word w that are predicted as class i) / (# examples including word w)
• Where {x*} represents the unlabeled examples
• Where f(·) represents a trained supervised sequence classifier
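The estimate above can be sketched in a few lines of Python; this is illustrative only, assuming the trained classifier f(·) produces one class prediction per example (function and variable names are ours, not from the slides):

```python
from collections import defaultdict

def estimate_slf(unlabeled_examples, predictions, classes):
    """Empirical SLF: for each word w and class i,
    SLF(w, i) = (# examples including w predicted as class i)
                / (# examples including w)."""
    containing = defaultdict(int)   # word -> # examples containing it
    per_class = defaultdict(lambda: defaultdict(int))  # word -> class -> count
    for words, label in zip(unlabeled_examples, predictions):
        for w in set(words):        # count each example at most once per word
            containing[w] += 1
            per_class[w][label] += 1
    return {w: [per_class[w][c] / containing[w] for c in classes]
            for w in containing}

# Two unlabeled examples {x*} with predictions from a trained classifier f(.)
examples = [["former", "captain", "chris"], ["chris", "said"]]
preds = ["PER", "PER"]
slf = estimate_slf(examples, preds, classes=["PER", "O"])
# "chris" occurs in 2 examples, both predicted PER -> SLF("chris") = [1.0, 0.0]
```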
Method: Algorithm (Basic Case)

Pseudo-code
1. Define the feature representation for a word, and the representation for an example (a window of words)
2. Train a classifier f(·) on training examples (xi, yi) using the feature representation
3. Use f(·) to estimate the SLF from unlabeled data {x*}
4. Augment the representation of words with the estimated SLF and refine
5. Iterate steps 2 to 4 until the stopping criterion is met.
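As a sketch, the wrapper loop above might look like the following; `train_fn` and `predict_fn` are hypothetical stand-ins for the real NN/CRF training and inference (here a toy majority-label model, just to make the loop concrete):

```python
def slf_wrapper(train_fn, predict_fn, labeled, unlabeled, classes, n_iters=3):
    """Iterate: train -> predict unlabeled -> re-estimate SLF -> augment features."""
    slf = {}      # word -> class-probability vector; empty on the first round
    model = None
    for _ in range(n_iters):
        # Steps 1-2: train on labeled examples (features possibly SLF-augmented)
        model = train_fn(labeled, slf)
        # Step 3: apply the trained classifier f(.) to the unlabeled data {x*}
        preds = [predict_fn(model, x, slf) for x in unlabeled]
        # Step 4: re-estimate the empirical SLF from these predictions
        counts, slf = {}, {}
        for words, label in zip(unlabeled, preds):
            for w in set(words):
                counts[w] = counts.get(w, 0) + 1
                slf.setdefault(w, [0.0] * len(classes))[classes.index(label)] += 1
        slf = {w: [v / counts[w] for v in vec] for w, vec in slf.items()}
    return model, slf

# Toy stand-ins: the "model" is just the majority label of the training set
def train_fn(labeled, slf):
    labels = [y for _, y in labeled]
    return max(set(labels), key=labels.count)

def predict_fn(model, x, slf):
    return model

labeled = [(["a", "b"], "PER"), (["c"], "PER"), (["d"], "O")]
model, slf = slf_wrapper(train_fn, predict_fn, labeled,
                         unlabeled=[["a", "c"], ["d"]], classes=["PER", "O"])
```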
Extension I: Boundary SLF

Rare words are the hardest to label.
Motivation: model those words happening frequently before or after a certain target class.
Boundary SLF: extend the basic SLF to incorporate the class-boundary distribution.

… former captain [Chris Lewis] …
… [Hoddle] said …
… [CRKL], an adapter protein …
… [SH2-SH3-SH3] adapter protein …
* Blue-colored words carry important label indications
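One plausible reading of the class-boundary distribution, sketched below: for each word, the fraction of its occurrences immediately preceding a word predicted as a target class. The per-word predicted tags and all names are our assumptions, not the slides' exact definition:

```python
from collections import defaultdict

def boundary_slf(sentences, tag_seqs, entity_tags):
    """For each word, the fraction of its occurrences that appear
    immediately before a word predicted as an entity class."""
    before_entity = defaultdict(int)
    total = defaultdict(int)
    for words, tags in zip(sentences, tag_seqs):
        for i, w in enumerate(words):
            total[w] += 1
            if i + 1 < len(words) and tags[i + 1] in entity_tags:
                before_entity[w] += 1
    return {w: before_entity[w] / total[w] for w in total}

sents = [["former", "captain", "chris", "lewis"], ["hoddle", "said"]]
tags = [["O", "O", "PER", "PER"], ["PER", "O"]]
b = boundary_slf(sents, tags, entity_tags={"PER"})
# "captain" always precedes a predicted PER word -> b["captain"] == 1.0
```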
Extension II & Extension III

Extension II: Clustered SLF
• Words exhibiting similar target-class distributions have similar SLF features
• Grouped SLF features might give stronger indications of target class or class boundary
• Use k-means to cluster all words into N clusters, and use the cluster-ID as the new clustered-SLF feature
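A minimal sketch of the clustered-SLF step, using a tiny hand-rolled k-means (a real system would use a library implementation; the SLF vectors below are made up for illustration):

```python
import random

def kmeans(vectors, k, n_iters=20, seed=0):
    """Tiny k-means; returns a cluster id for each input vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(n_iters):
        # assign each vector to its nearest centroid (squared Euclidean)
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2
                      for a, b in zip(v, centroids[c]))) for v in vectors]
        # recompute each centroid as the mean of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# SLF vectors for four words over classes (PER, O): two "person-like", two not
slf = {"lewis": [0.9, 0.1], "hoddle": [0.8, 0.2],
       "the": [0.05, 0.95], "of": [0.1, 0.9]}
words = list(slf)
cluster_id = dict(zip(words, kmeans([slf[w] for w in words], k=2)))
# words with similar class distributions share a cluster-ID feature
```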
Extension III: Attribute SLF
• Treat discrete attributes of words as the basic unit of sequence examples
• For instance, 'stem-end' for the POS task
Why Useful?

• No incestuous bias, since no examples are added
• No tricky parameters to tune (unlike "self-training")
• The supervised model learns whether the SLF is relevant or not
• Summarization over many potential labels, hence infrequent mistakes can be smoothed out
  • Potentially corrected on the next iteration
• Empirical SLF features for neighboring words are highly informative
• Highly scalable (adding a few features, not examples)
• A wrapper approach applicable to many other methods
Baseline NLP System I - NN

A deep neural network (NN) based NLP system [Collobert08]
The auxiliary task "LM" provides one type of semi-supervision
"Viterbi" training enforces local label dependencies among neighbors
• SLF enforces local dependency as well
Baseline NLP System II - CRF

Conditional Random Field (CRF) [Lafferty01]
• State-of-the-art performance on many sequence labeling tasks
• Discriminative probabilistic models over observation sequences and label sequences
• Apply SLF as a wrapper on the CRF++ toolkit
Experimental Setting

Four benchmark data sets
• CoNLL03 German Named Entity Recognition (NER)
• CoNLL03 English Named Entity Recognition (NER)
• English Part-of-Speech (POS) benchmark data [Toutanova03]
• Gene Mention (GM) benchmark data [BioCreative II]

Evaluation measurements
• Entity-level F1: 2 * (precision * recall) / (precision + recall)
• Word-level error rate for the POS task
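The F1 measure above, as code (the precision/recall numbers are illustrative, not from the experiments):

```python
def f1(precision, recall):
    """Entity-level F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# e.g. precision 0.80 and recall 0.70 give an F1 of about 0.747
score = round(f1(0.80, 0.70), 3)
```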
Token size    Training (labeled)   Unlabeled
German NER    206,931              ~60M
English NER   203,621              ~200M
English POS   1,029,858            ~300M
Bio GM        345,996              ~900M
Performance Comparison (German NER)

IOBES style of class tags / 5-word sliding window
All-features case
• (word, capitalization flag, prefix and suffix (length up to 4), part-of-speech tags, text chunk, string patterns)
Best CoNLL03 team: test F1 = 74.17
Baseline classifier: NN

Setting                       Test F1   + Basic SLF
word only                     45.89     51.10
word only + Viterbi           50.61     53.46
all features + LM             72.44     73.32
all features + LM + Viterbi   74.33     75.72
Performance Comparison (English NER)

IOBES style of class tags / 7-word sliding window
All-features case
• (word, cap, dictionary)
Best CoNLL03 team: test F1 = 88.76
Baseline classifier: NN

Setting                            Test F1   + Basic SLF
word + cap                         77.82     79.38
word + cap + Viterbi               80.53     81.51
word + cap + dict + LM             86.49     86.88
word + cap + dict + LM + Viterbi   88.40     88.69
Performance Comparison (English POS)

IOBES style of class tags / 5-word sliding window
All-features case
• (word, cap, stem-end)
Best result (that we know of): test error rate 2.76%
• WER: token-level error rate
Baseline classifier: NN

Setting                  WER    + Basic SLF   + Attribute SLF
word                     4.99   4.06          -
word + LM                3.93   3.89          -
word + cap + stem        3.28   2.99          2.86
word + cap + stem + LM   2.79   2.75          2.73
Performance Comparison (Bio GM)

Look for gene or protein names in the bio-literature (two classes: gene or not)
All-features case
• (word, cap, prefix and suffix (length up to 4), string patterns)
Best BioCreative II team: test F1 = 87.21
• Many other complex features + bi-directional CRF training
Baseline classifier: CRF++

Setting                         Test F1   + Clustered SLF
word + cap                      82.02     84.01 (on Basic SLF)
word + cap                      82.02     85.24 (on Boundary SLF)
word + cap + pref + suf + str   86.34     87.16 (on Boundary SLF)
Performance Comparison to Self-Training

Self-training with a random selection scheme:
• Given L training examples, choose L/R (R is a parameter to choose) unlabeled examples to add to the next round's training
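A sketch of one round of this self-training baseline (helper names are ours; the selection is uniformly random, matching the slides' scheme):

```python
import random

def self_training_round(labeled, unlabeled, predict_fn, R, seed=0):
    """Pick L/R unlabeled examples at random, pseudo-label them with the
    current model's predictions, and add them to the training set."""
    rng = random.Random(seed)
    n_add = min(max(1, len(labeled) // R), len(unlabeled))
    chosen = set(rng.sample(range(len(unlabeled)), n_add))
    pseudo = [(unlabeled[i], predict_fn(unlabeled[i])) for i in chosen]
    remaining = [x for i, x in enumerate(unlabeled) if i not in chosen]
    return labeled + pseudo, remaining

labeled = [("sent%d" % i, "y") for i in range(10)]
unlabeled = ["u%d" % i for i in range(50)]
new_labeled, rest = self_training_round(labeled, unlabeled,
                                        predict_fn=lambda x: "y", R=5)
# L=10, R=5 -> adds 2 pseudo-labeled examples: 12 labeled, 48 remaining
```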
Self-Training on German NER

Setting                Baseline   R=1     R=10    R=100
Words only + Viterbi   50.61      47.07   47.92   47.9
All + LM + Viterbi     74.33      73.42   74.41   73.9

Self-Training on English NER

Setting                            Baseline   R=1     R=20    R=100
Words only + Viterbi               80.53      79.51   81.01   80.85
Word + cap + dict + LM + Viterbi   88.40      87.64   88.07   88.17

SLF behaves better than self-training (with a random selection strategy)
Conclusions

Semi-supervised SLF is promising for sequence labeling tasks in NLP
Easily extendable to other cases, such as predicted class distributions (or related) for each n-gram
Easily extendable to other domains, such as sentiment analysis (a word's class distribution as the distribution of labels of documents containing this word)
• e.g., "cashback" to class "shopping"