Download - “Eve Eat Dust Mop”mernst/advice/poster...“Eve Eat Dust Mop” Measuring Syntac6c Development in Child Language with Natural Language Processing and Machine Learning Shannon N.

Transcript
Page 1: “Eve Eat Dust Mop”mernst/advice/poster...“Eve Eat Dust Mop” Measuring Syntac6c Development in Child Language with Natural Language Processing and Machine Learning Shannon N.

“EveEatDustMop”MeasuringSyntac6cDevelopmentinChildLanguage

withNaturalLanguageProcessingandMachineLearningShannonN.Lube-ch(PomonaCollege’15)

Ins-tuteforCrea-veTechnologies,UniversityofSouthernCalifornia

Automa6cIPSynGoal:Createaprogramthat,givenatranscript,automa-callycalculatesIPSynscore.

Results:Automa6cIPSyn

AcknowledgementsIwouldliketothankProfessorKenjiSagaeforintroducingmetothisproject,andforprovidinghisguidance,advice,support,andsharedknowledgealongtheway.IthankEvanSumafororganizingmyREU,andNSFforgenerouslyprovidingthefundingforit.IwouldalsothanktothankeveryoneatICTfortrea-ngmeaspartofthefamily.

FullyData-DrivenMeasurementGoal:EliminateexplicitIPSynItemsbytrainingamachineonexpertlevelannota-onsandscoressothatitcanproduceanIPSynscorebasedsolelyonfeaturesofatranscript.

SUBJ

n:prop

n:propv

n:propSUBJv

Eveeatdustmop.n:propvnn

SUBJ OBJ

MOD

Data-DrivenApproachtoAgePredic6on

Background:IndexofProduc6veSyntax(IPSyn)Thehigherthescore,themoregramma6callycomplexthelanguage

Arewegoingtotheparty?

N1:Noun

N4:2-wordNounPhrase

N5:Articlebeforeanoun

N6:2-wordNounPhraseafterprep

N2:Pronoun

V1:Verb

V2:Preposition V3:

Prepositionalphrase

V6:AuxiliaryV7:Progressivesuffix(-ing)

Q8:Y/Nquestionwithinvertedaux

S1:2-wordcombination

IPSynmeasuresgramma-calcomplexityoflanguagebyevalua-ng100successiveuWerancesandawardingpointsbasedondefined“IPSynItems.”Thereare60itemsthatcanearnupto2pointseach,resul-nginatotalpossiblescoreof120points.Thisscorecanbebrokendownintosubsec-onsofNounPhrases,VerbPhrases,Ques-onsandNega-ons,andSentenceStructures.

Pattern:N4Description:2-wordNPGR:DETMODQUANTParentrequirements:Index:forward-1POS:n

thisunicorn

fluffyunicorn

oneunicorn

DET

MOD

QUANT

DefiningIPSynItems Preprocessing

Scarborough1990:75transcriptsHoursofmanualscoring

Automa-cIPSyn(thiswork):593transcriptsMinutestocompute

Whenmeasuringchildlanguagedevelopment,researchersoeenfacechoicesbetweeneasilycomputablemetricsfocusedonsuperficialaspectsoflanguageandmoreexpressivemetricsthatrelyongramma-calstructurebutrequiresubstan-allabor.Toadvanceresearchinchildlanguagedevelopment,wepresentanautoma-cscoringsystemfacilita-ngeasyanalysisoflargenumbersoftranscripts.Addi-onally,weexploreamachinelearningapproachtoproducescoresofgramma-calcomplexitybasedonextrac-ngmorphologicalandsyntac-cfeaturesofchilduWerances.Bothtechniquesachieveaccuracysimilartothatoflanguageresearchersandrevealtrendsinsyntac-cdevelopment,offeringpromisingresultsforfutureresearchandapplica-on.Wecanfurtherapplyourdata-drivenapproachtopredic-ngage,whichdoesnotrequireapreviously-defined,language-specificinventoryofgramma-calstructures.

AverageabsolutedifferenceManual:~5points Automa-c:3.6points

Child(corpus) MeanAbsErr Pearson(r)

Adam(Brown) 2.5 0.93

Ross(MacWhinney) 3.7 0.84

Naomi(Sachs) 3.1 0.91

Child(corpus) MLUr IPSynr Regressionr

Adam(Brown) 0.37† 0.53† 0.85†

Ross(MacWhinney) 0.19 0.34* 0.79†

Naomi(Sachs) 0.27 0.52 0.82††p<0.0001

∗p<0.05

Forchildrenatleast3yrs4monthsold

FutureWork¡  Expandtodifferentlanguages(Spanish,Japanese,Hebrew)

¡  Improvepatterns¡  Trainclassifieronmoredata

MOR(MacWhinney2000)

MorphologicalanalyzerPartofspeech

POST(Parisse&LeNormand

2000)PartofspeechtaggerDisambiguate

MEGRASP(Sagaeetal2005/2007)

DependencyparserGramma-calrela-ons

Eveeatdustmop.n:propvnn

SUBJ OBJ

MOD

FeatureExtrac-on