The TIGER Treebank

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, George Smith

Computational Linguistics, Saarland University, Postfach 15 11 50, Saarbrücken, Germany
{sabine, hansen}@coli.uni-sb.de

Institute of Natural Language Processing (IMS), Stuttgart University, Germany
{dipper, lezius}@ims.uni-stuttgart.de

Institut für Germanistik, Potsdam University, Germany
[email protected]

Abstract

This paper reports on the TIGER Treebank, a corpus of currently 35,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool Annotate, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions that were made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

1 Introduction

Corpus-based methods play an important role in empirical linguistics as well as in machine learning methods in NLP. In these two areas of research, large natural language corpora, enriched with syntactic information, are needed. Thus, in recent years, there has been an increasing interest in the construction of these syntactically annotated corpora, commonly called ‘treebanks’ (Lezius, 2001).

For German, the first initiative in the field of treebanks was the NEGRA Corpus (Skut, Brants, Krenn, & Uszkoreit, 1998; Brants, Skut, & Uszkoreit, 1999), which contains syntactically interpreted newspaper texts. Furthermore, there is the Verbmobil Corpus (Wahlster, 2000), which covers the area of spoken language.

This paper reports on the TIGER Treebank project, which aims at building the largest and most exhaustively annotated treebank for German. The annotation format and scheme are based on the NEGRA corpus; however, the TIGER Treebank exceeds the NEGRA corpus in size as well as in detail of annotation. Since the NEGRA Corpus is rather restricted in its size (20,000 syntactically annotated sentences) and the Verbmobil Corpus in its domains (i.e. spontaneous speech for the appointment negotiation domain), the construction of the TIGER Treebank as a comprehensive resource for the German language was a necessary step to overcome these drawbacks.

This paper is structured in the following way: Section 2 describes the annotation format and provides general information on the annotation scheme. Furthermore, it contains a short overview of treebank initiatives for languages other than German. In Section 3, the different methods used for the annotation of the treebank are presented. The linguistic extensions that were made in the TIGER project concerning the annotation scheme are covered in Section 4. Section 5 gives an overview of the query language and query tool that were developed in the project for the exploitation of the treebank. Finally, Section 6 summarizes the paper and sketches some ideas for future work.

2 The TIGER Corpus

The TIGER Treebank is based on texts from the German newspaper Frankfurter Rundschau. Only complete articles were used, which were taken from all kinds of domains¹ so as to cover a broader range of language variation. At the current stage, the corpus contains 35,000 syntactically annotated sentences. In the second project phase, this amount is to be extended to approximately 45,000 sentences.

2.1 Levels of annotation

In the NEGRA as well as in the TIGER corpus, a hybrid framework is used which combines advantages of dependency grammar and phrase structure grammar. The syntactic structure is represented by a tree. The branches of a tree may cross, allowing the encoding of local and non-local dependencies and eliminating the need for traces. This approach has considerable advantages for free word order languages such as German, which show a large variety of discontinuous constituency types (Skut, Krenn, Brants, & Uszkoreit, 1997).

The linguistic annotation of each sentence in the TIGER Treebank is represented on a number of different levels (see Figure 1): Part-of-speech information is encoded in terminal nodes (on the word level). Non-terminal nodes are labeled with phrase categories. The edges of a tree represent syntactic functions. Furthermore, a supplementary annotation on the word level facilitates the encoding of information on lemmata and morphology². For part-of-speech tagging, the Stuttgart-Tübingen-Tagset (Schiller, Teufel, & Stöckert, 1999) is used in a slightly modified version (Kramp & Preis, 2000; Smith & Eisenberg, 2000). Information on lemmata and morphology was not annotated in the NEGRA corpus; this is a new feature that was added to the annotation in the TIGER project.

Syntactic structures are rather flat and simple in order to reduce the potential for attachment ambiguities. The distinction between arguments and adjuncts, for instance, is not expressed in the constituent structure, but is instead encoded by means of syntactic functions.

Apart from the annotation of morphology and lemmata, another annotation level was added to the TIGER corpus: Secondary edges, i.e. labelled directed arcs between arbitrary nodes, are used to encode coordination information. Currently, these secondary edges are only employed for the annotation of coordinated sentences and verb phrases; another potential use might be the systematic annotation of attachment ambiguities.

¹ We only excluded regional news and sports news because experiences from the past showed that these texts often contained tables, enumerations, etc. instead of complete sentences.

² The example in Figure 1, however, contains no lemmata annotation, but a literal translation instead.

[Figure 1: tree diagram not reproduced. It shows the annotation of the sentence Neben den Mitteln des Theaters benutzte Moran die Toncollage. (word-by-word gloss: ‘besides the means of_the theater used Moran the sound_collage’). Each terminal node carries the word, its part-of-speech tag (e.g. APPR, ART, NN, VVFIN, NE), its morphological tag (e.g. Neut.Dat.Pl.*, 3.Sg.Past.Ind) and a gloss; the non-terminal nodes 500 to 503 carry the phrase categories NP, PP and S; the edges carry the syntactic functions AC, NK, GR, MO, HD, SB and OA.]

Figure 1: Different levels of annotation
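The levels of annotation described in Section 2.1 can be pictured as a small graph data structure. The following Python sketch is only an illustration: the class and field names (`Terminal`, `NonTerminal`, `edges`, `secondary`) are assumptions, not part of any TIGER format, and the assignment of the node ids 500 to 503 to particular phrases is inferred from the layout of Figure 1. The helper `is_discontinuous` shows how a crossing branch would surface in such a model, namely as a gap in the terminal yield of a node.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    word: str
    pos: str    # part-of-speech tag (word level)
    morph: str  # morphological tag (supplementary word-level annotation)


@dataclass
class NonTerminal:
    cat: str  # phrase category
    # Primary edges as (syntactic function, child id) pairs.
    edges: list = field(default_factory=list)
    # Secondary edges as (syntactic function, target id) pairs; unused here.
    secondary: list = field(default_factory=list)


# Terminals 0-9 of the example sentence in Figure 1.
TERMINALS = {
    0: Terminal("Neben", "APPR", "Dat"),
    1: Terminal("den", "ART", "Def.*.Dat.Pl"),
    2: Terminal("Mitteln", "NN", "Neut.Dat.Pl.*"),
    3: Terminal("des", "ART", "Def.Neut.Gen.Sg"),
    4: Terminal("Theaters", "NN", "Neut.Gen.Sg.*"),
    5: Terminal("benutzte", "VVFIN", "3.Sg.Past.Ind"),
    6: Terminal("Moran", "NE", "*.Nom.Sg"),
    7: Terminal("die", "ART", "Def.Fem.Akk.Sg"),
    8: Terminal("Toncollage", "NN", "Fem.Akk.Sg.*"),
    9: Terminal(".", "$.", "--"),
}

# Non-terminals; which id belongs to which phrase is an assumption.
NONTERMINALS = {
    500: NonTerminal("NP", [("NK", 3), ("NK", 4)]),
    501: NonTerminal("NP", [("NK", 7), ("NK", 8)]),
    502: NonTerminal("PP", [("AC", 0), ("NK", 1), ("NK", 2), ("GR", 500)]),
    503: NonTerminal("S", [("MO", 502), ("HD", 5), ("SB", 6), ("OA", 501)]),
}


def terminal_yield(node_id):
    """Return the sorted ids of all terminals dominated by a node."""
    if node_id in TERMINALS:
        return [node_id]
    ids = []
    for _label, child in NONTERMINALS[node_id].edges:
        ids.extend(terminal_yield(child))
    return sorted(ids)


def is_discontinuous(node_id):
    """True if the yield of the node has a gap, i.e. a branch has to cross."""
    ids = terminal_yield(node_id)
    return ids != list(range(ids[0], ids[-1] + 1))


print([TERMINALS[i].word for i in terminal_yield(502)])
print(is_discontinuous(503))
```

In this sentence every constituent is continuous, so `is_discontinuous` is false for all four nodes; an extraposed relative clause such as the one in Figure 4 would make it true for the NP.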

2.2 Annotation formats

In the TIGER project, we use several annotation formats for corpus storage, export and querying. Scripts exist that enable the transformation from one format to another.

First of all, the annotated sentences are stored and maintained in a MySQL database; information about the annotation is contained in tables. An additional output format is used for the export of the sentences. The database entries (words, morphological tags, terminal nodes, non-terminal nodes and edges) can be exported to a table stored in a line-oriented, ASCII-based format (Brants, 1997). The major advantage of this export format is that it is easily readable for humans as well as easily processable for machines. Sentence boundaries are identified through sentence start and end tags. Furthermore, information on sentence origins, editors and used tags is stored at the beginning of each export file.

Based on this export format, the TIGER corpus can be transferred into a third format, namely TIGER XML (Lezius, Biesinger, & Gerstenberger, 2002a). A TIGER XML file is typically split up into header and body. The corpus header contains meta-information on the corpus (such as corpus name, date, author, etc.) and a declaration of the tags that are used for morphology, part-of-speech, non-terminal nodes and edges. In the corpus body, directed acyclic graphs are used as the underlying data model to encode the linguistic annotation. Words, part-of-speech tags, morphological tags and lemmata occur as attributes of the element ‘terminal’, whereas non-terminals are represented in an additional element called ‘nonterminal’ referring to the corresponding terminal ID. Secondary edges are encoded explicitly as well. By using an XML format, the TIGER Treebank is exchangeable and usable with a large range of tools. The XML format is also the basis for the use of the corpus query tool TIGERSearch (Lezius & König, 2000).
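To make the description of the corpus body more concrete, the following Python sketch reads a small XML fragment shaped along the lines described above and turns it into terminal and non-terminal tables. It is a sketch under stated assumptions: the short element names `t`, `nt`, `edge` and `secedge`, the attribute names and the id scheme `s1_...` are chosen for illustration (the text above speaks of the elements ‘terminal’ and ‘nonterminal’), and the corpus header is omitted. The fragment encodes the sentence Ein Mann kommt, der lacht from Figure 4.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; element and attribute names are assumptions.
SAMPLE = """
<s id="s1">
  <graph root="s1_502">
    <terminals>
      <t id="s1_0" word="Ein" pos="ART"/>
      <t id="s1_1" word="Mann" pos="NN" morph="Masc.Nom.Sg"/>
      <t id="s1_2" word="kommt" pos="VVFIN"/>
      <t id="s1_3" word="," pos="$,"/>
      <t id="s1_4" word="der" pos="PRELS"/>
      <t id="s1_5" word="lacht" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_4"/>
        <edge label="HD" idref="s1_5"/>
      </nt>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_0"/>
        <edge label="NK" idref="s1_1"/>
        <edge label="RC" idref="s1_500"/>
      </nt>
      <nt id="s1_502" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def load_sentence(xml_text):
    """Parse one sentence into (terminals, nonterminals) dictionaries."""
    root = ET.fromstring(xml_text)
    # Words, tags and morphology are plain attributes of each terminal.
    terminals = {t.get("id"): dict(t.attrib) for t in root.iter("t")}
    nonterminals = {}
    for nt in root.iter("nt"):
        nonterminals[nt.get("id")] = {
            "cat": nt.get("cat"),
            # Primary edges point from the non-terminal to its children.
            "edges": [(e.get("label"), e.get("idref")) for e in nt.findall("edge")],
            # Secondary edges are stored explicitly next to the primary ones.
            "secedges": [(e.get("label"), e.get("idref")) for e in nt.findall("secedge")],
        }
    return terminals, nonterminals


terminals, nonterminals = load_sentence(SAMPLE)
for node_id, node in nonterminals.items():
    print(node_id, node["cat"], node["edges"])
```

Because every edge refers to its child by id, the extraposed relative clause needs no special treatment: the NP simply lists a child (`s1_500`) whose terminals are not adjacent to its other children.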

2.3 Comparable treebank initiatives

One of the first and best known treebanks is the Penn Treebank for the English language (Marcus et al., 1994), which consists of about 1 million words of newspaper text. It contains part-of-speech tagging and rough syntactic and semantic annotation. A bracketing format is used to encode predicate-argument structure, and trace-filler mechanisms are used to represent discontinuous phenomena. Other comparable treebanks for English are, for instance, the Susanne Corpus (Sampson, 1995) (containing detailed part-of-speech information and phrase structure annotation), the Lancaster Parsed Corpus (Leech, 1992) (representing phrase structure annotation by means of labelled bracketing) and the British part of the International Corpus of English (Greenbaum, 1996) (about 1 million words of British English that were tagged, parsed and afterwards checked).

For languages other than English, a fairly well-known treebank is the Prague Dependency Treebank for Czech (Hajič, 1999). It contains about 450,000 tokens and is annotated on three levels: on the morphological level (tags, lemmata, word forms), on the syntactic level (using dependency syntax) and on the tectogrammatical level (encoding functions such as Actor, Patient, etc.). Recently, treebank projects for other languages have come to life as well, e.g. for French (Abeillé, Clément, & Kinyon, 2000), Italian (Bosco, Lombardo, Vassallo, & Lesmo, 2000), Spanish (Moreno, Grishman, López, Sánchez, & Sekine, 2000), Turkish (Oflazer, Hakkani-Tür, & Tür, 1999), Russian (Boguslavsky, Grigorieva, Grigoriev, Kreidlin, & Frid, 2000) and Bulgarian (Simov et al., 2002). More initiatives for linguistically interpreted corpora can be found in (Uszkoreit, Brants, & Krenn, 1999) and (Abeillé, Brants, & Uszkoreit, 2000).

3 Annotation methods

We use two different methods for the syntactic annotation of the TIGER corpus: interactive annotation and LFG parsing. The first one is a combination of probabilistic parsing and human intervention (Section 3.1). After the parsing is completed, morphological annotation is performed semi-automatically, using the given syntactic annotation for disambiguation. For the second method, a symbolic LFG grammar is used to parse large parts of the corpus; the output is disambiguated by a human annotator (Section 3.2).

3.1 Interactive Tagging and Parsing

Interactive annotation is an efficient combination of automatic parsing and human annotation. Instead of having an automatic parser as preprocessor and a human annotator as postprocessor, the two steps are interwoven in our approach. The parser generates a small part of the annotation, which is immediately presented visually to the human annotator, who can either accept, correct or reject it. Based on the annotator’s decision, the parser proposes the next part of the annotation, which is again submitted to the annotator’s judgement. This process is repeated until the annotation of the sentence is complete.

The advantage of this interactive method is that the human decisions can be used by the automatic parser. Thus, errors made by the automatic parser at lower levels are corrected instantly and do not ‘shine through’ on higher levels. The chances grow that the automatic parser proposes correct analyses on higher levels.

The interactive annotation works on several layers. The lowest one is the part-of-speech layer. Higher layers are defined by the depth of the syntactic structure. Each layer is represented by a different Markov Model, hence the name Cascaded Markov Models (Brants, 1999). The first step in the annotation process is the generation of part-of-speech tags. This step is performed using the statistical tagger TnT (Brants, 2000b). In addition to the tags, TnT also generates probabilities that help to decide on the reliability of a proposed tag. The lower the probability of alternative tags, the higher the reliability of the best tag for a word (Brants & Skut, 1998). Approximately 84% of all tag assignments are classified as reliable by the tagger. The remaining 16% need to be proof-read by human annotators.
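The reliability criterion can be illustrated with a small Python sketch. It assumes that the tagger output is available as a probability distribution over tags for each word, and it calls a tag reliable when the best tag is at least `min_ratio` times as probable as the second-best one. The function name, the ratio criterion, the threshold value and the example probabilities are all assumptions made for illustration; they are not TnT's interface, and the source does not state the criterion actually used in the project.

```python
def split_by_reliability(tagged, min_ratio=100.0):
    """Split (word, {tag: probability}) pairs into reliable and doubtful tags.

    A tag counts as reliable if no alternative exists or if the best tag is
    at least `min_ratio` times as probable as the second-best alternative.
    """
    reliable, to_check = [], []
    for word, distribution in tagged:
        ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
        best_tag, best_p = ranked[0]
        second_p = ranked[1][1] if len(ranked) > 1 else 0.0
        is_reliable = second_p == 0.0 or best_p / second_p >= min_ratio
        (reliable if is_reliable else to_check).append((word, best_tag))
    return reliable, to_check


# Invented probabilities for three words of the sentence in Figure 5.
example = [
    ("Der", {"ART": 0.999, "PRELS": 0.001}),
    ("Mann", {"NN": 1.0}),
    ("liest", {"VVFIN": 0.7, "VVIMP": 0.3}),
]

reliable, to_check = split_by_reliability(example)
print("accepted automatically:", reliable)
print("to be proof-read:", to_check)
```

Only the tags in the second list would have to be shown to a human annotator, which is the effect described above: the more probability mass the alternatives receive, the less the best tag can be trusted.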

Once the part-of-speech tagging is done, Markov models for higher layers start processing. Hypothetical phrases are generated, and the one with the highest probability is displayed to the annotator. The structure can be accepted, rejected or manually corrected by the annotator. Intervention by the human annotator immediately changes the set of hypotheses used by the parser. The syntactic structure is built phrase by phrase, bottom up. About 71% of the phrases suggested by the parser are correct, 17% need minor intervention (i.e., at most one constituent needs to be added or deleted). The remaining 12% require major intervention by the human annotator.

Figure 2: The annotation tool Annotate

Both tagger and parser are entirely trained on previously (manually) annotated data. No manual grammar or lexicon development is necessary. The annotation scheme is learnt automatically by tagger and parser. In case of changes in the annotation scheme, only a small amount of data needs to be changed manually. The tagger and parser are then trained on the changed data and are immediately ready for annotation with the new scheme.

The annotation is performed with the help of the tool Annotate (Figure 2), a graphical user interface with a comprehensive set of tree manipulation functions and database access (Plaehn & Brants, 2000). Annotate runs the TnT tagger and the Cascaded Markov Models in the background.

In order to achieve a high level of consistency and to avoid mistakes, we use a very thorough approach to the annotation: First, each sentence is annotated independently by two annotators. With the support of scripts, they afterwards compare their annotations and correct obvious mistakes. Remaining differences are submitted to a discussion between the annotators. Although this process is rather time-consuming, it has proven to be highly beneficial for the accuracy of the annotation (Brants, 2000a). Furthermore, it also supports the continuous improvement of the annotation scheme: It is in the discussion between the annotators that discrepancies between the annotation scheme and the data become obvious. If this happens to be the case, new rules and better tests for operationalization are added to the annotation scheme. Thus, there is a cross-fertilization between the corpus and the annotation scheme.
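The comparison step can be sketched in a few lines of Python. The sketch assumes that each annotation has been reduced to a set of constituents, each written as a pair of phrase category and the tuple of terminal positions it covers; this representation, the function name and the example sentence (Er sieht den Mann mit dem Fernglas, with a classic PP attachment disagreement) are assumptions for illustration and say nothing about the scripts actually used in the project.

```python
def compare_annotations(first, second):
    """Compare two annotations given as sets of (category, positions) pairs."""
    first, second = set(first), set(second)
    return {
        "agreed": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Tokens: 0 Er, 1 sieht, 2 den, 3 Mann, 4 mit, 5 dem, 6 Fernglas
# Annotator 1 attaches the PP to the clause, annotator 2 to the noun phrase.
annotator_1 = {
    ("NP", (2, 3)),
    ("PP", (4, 5, 6)),
    ("S", (0, 1, 2, 3, 4, 5, 6)),
}
annotator_2 = {
    ("NP", (2, 3, 4, 5, 6)),
    ("PP", (4, 5, 6)),
    ("S", (0, 1, 2, 3, 4, 5, 6)),
}

report = compare_annotations(annotator_1, annotator_2)
for key, constituents in report.items():
    print(key, constituents)
```

Using position tuples rather than start and end offsets keeps the comparison valid for discontinuous constituents, whose yield is not a contiguous span. A fuller comparison would also have to include edge labels, which this sketch leaves out.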

Morphology and Lemmata. For the analysis of lemmata and morphological tags, we use an additional graphical user interface. This is interleaved with a tool called TigerMorph which was developed by Berthold Crysmann. TigerMorph disambiguates the output of a morphological analyser on the basis of the already existing syntactic structure. It proposes lemmata and morphological tags for the words of a sentence, proceeding from left to right. The annotation of morphology and lemmata resembles the interactive annotation of the syntactic structure described above. Ambiguous tags are presented to the annotator, who then decides which one is the correct alternative. This information is returned to TigerMorph and used for the disambiguation of the morphological analysis for the remaining words.

Figure 3: LFG c- and f-structure

3.2 Annotation by LFG Parsing

As an alternative to interactive tagging and parsing, a broad-coverage symbolic LFG grammar (Lexical Functional Grammar (Bresnan, 1982)) is used to parse parts of the corpus (Dipper, 2000). Usually, the LFG grammar outputs several analyses for a corpus sentence. The output is first filtered by a grammar-internal ranking mechanism and then disambiguated by a human annotator. A transfer component converts the selected analysis into the TIGER format (Zinsmeister, Kuhn, & Dipper, 2002).

One advantage of this approach is the accuracy of the grammar’s output. An LFG analysis is always syntactically consistent. It does not contain inconsistencies such as, e.g., missing subject-verb agreement, in case of which the parse would have failed. On the other hand, the grammar certainly is not error-free. But those errors which do occur are systematic and hence easier to correct than errors that occur with manual annotation.

Parsing. The LFG grammar applied in parsing has been developed in the ParGram project at the University of Stuttgart, using the Xerox Linguistic Environment (XLE) (ParGram, 2002). The output of an LFG grammar basically consists of two representations, the constituent structure (c-structure) of the sentence being parsed, and its functional structure (f-structure). C-structure encodes information about morphology, constituency, and linear ordering. F-structure represents information about predicate argument structure, about modification, and about tense, mood etc. An example of c- and f-structure is given in Figure 3 for the sentence Ein Mann kommt, der lacht (‘a man is coming who laughs’).

Disambiguation. Almost every sentence of a newspaper corpus is syntactically ambiguous. Hence the output of a purely symbolic grammar has to be disambiguated, i.e. a human annotator has to select the correct analysis. This task is supported by XLE which allows for ‘packing’ the different readings into one complex f-structure representation.³

On average, however, a sentence of the TIGER corpus receives several thousands of LFG analyses. Clearly it is impossible to disambiguate those analyses manually. Therefore XLE provides a (non-statistical) mechanism for suppressing certain ambiguities automatically (Frank, King, Kuhn, & Maxwell, 1998). By means of this mechanism, the average number of solutions drops down to 17, the median being 2.

Conversion into TIGER format. All information that is required by the TIGER annotation scheme is contained in c- and f-structure representations of LFG. Compare the LFG representation (Figure 3) with the TIGER representation (Figure 4) of the sentence Ein Mann kommt, der lacht.

(i) LFG c-structure contains categorial information (e.g., NP, CP), lemmas (Mann), part-of-speech tags (+Noun +Common), and morphological tags (+Sg +Masc +Nom); in Figure 3, lemma and tags are only shown for the terminal Mann. In the TIGER scheme, this information is encoded in a slightly different terminology: the nodes and tags mentioned above correspond to TIGER nodes NP, S, the part-of-speech tag NN, and the morphological tag Masc.Nom.Sg, respectively.

(ii) LFG f-structure represents dependency relations such as the head argument relation SUBJ and the head modifier relation ADJUNCTrel (‘relative clause’). Note that in c-structure, NP and CP (the relative clause) do not form a constituent; however, their f-structures, SUBJ and ADJUNCTrel, are linked. This linking information is encoded by a crossing branch in TIGER, cf. Figure 4.

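[Figure 4: tree diagram not reproduced. It shows the TIGER annotation of Ein Mann kommt, der lacht (gloss: ‘a man comes, who laughs’). The root S consists of the subject NP (SB) and the head kommt (HD). The NP consists of Ein and Mann (both NK) and the extraposed relative clause S (RC), which is attached by a crossing branch. The relative clause consists of der (PRELS, SB) and lacht (VVFIN, HD).]

Figure 4: TIGER representation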

Often, there is a one-to-one correspondence between LFG and TIGER representations. In these cases the transfer component simply converts one format into another, e.g. +Sg +Masc +Nom is mapped to Masc.Nom.Sg, CP to S, SUBJ to SB, etc. However, in other cases the transfer has to combine information both from c- and f-structure, as in the case of the extraposed relative clause in Figures 3 and 4. Here the transfer component makes use of the f-structure link between SUBJ and ADJUNCTrel to form a (discontinuous) constituent. The categorial label (NP) is derived from c-structure.
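For the one-to-one cases, the conversion amounts to table lookups. The Python sketch below illustrates this with the mappings named in the text (+Sg +Masc +Nom to Masc.Nom.Sg, +Noun +Common to NN, CP to S, SUBJ to SB). Everything else is an assumption: the table and function names are invented, the extra morphological entries (+Fem, +Neut, +Pl) are added only to make the table less trivial, and the slot order gender, case, number is read off the single example. It does not reproduce the actual transfer component, and it does not cover the cases that combine c- and f-structure.

```python
# Mappings named in the text.
CATEGORY_MAP = {"NP": "NP", "CP": "S"}
FUNCTION_MAP = {"SUBJ": "SB"}
POS_MAP = {("+Noun", "+Common"): "NN"}

# LFG morphological tag -> (slot, TIGER value).
# Only +Masc, +Nom and +Sg occur in the text; the other entries are assumed.
MORPH_MAP = {
    "+Masc": ("gender", "Masc"),
    "+Fem": ("gender", "Fem"),
    "+Neut": ("gender", "Neut"),
    "+Nom": ("case", "Nom"),
    "+Sg": ("number", "Sg"),
    "+Pl": ("number", "Pl"),
}

# Order of the slots in the TIGER tag, as in Masc.Nom.Sg.
SLOT_ORDER = ("gender", "case", "number")


def convert_morph(lfg_tags):
    """Convert e.g. '+Sg +Masc +Nom' into 'Masc.Nom.Sg'."""
    slots = dict(MORPH_MAP[tag] for tag in lfg_tags.split())
    return ".".join(slots[slot] for slot in SLOT_ORDER if slot in slots)


def convert_pos(lfg_tags):
    """Convert e.g. '+Noun +Common' into 'NN'."""
    return POS_MAP[tuple(lfg_tags.split())]


print(convert_morph("+Sg +Masc +Nom"))
print(convert_pos("+Noun +Common"))
print(CATEGORY_MAP["CP"], FUNCTION_MAP["SUBJ"])
```

The lookup reorders the LFG tags into a fixed slot order, which is why the input order (+Sg first) does not matter.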

³ Furthermore, XLE provides various browsing tools applying to c-structure as well as to f-structure which can be used for manual disambiguation (cf. (King, Dipper, Frank, Kuhn, & Maxwell, To Appear), where these tools are described in detail).

Results and Outlook. When parsed with the current grammar version, 50% of the corpus sentences receive at least one analysis; approx. 70% of the parsed sentences receive the correct analysis (possibly among others).⁴ About 2,000 sentences of the TIGER corpus have been annotated this way.

To enlarge coverage, XLE allows for partial parses, providing, e.g., N or P chunks. In first experiments, N chunks were found with a precision of 89% and a recall of 67%; for P chunks, the precision was 96%, and the recall was 79% (Schrader, 2001). Furthermore, to minimize manual effort, a statistical disambiguation tool can be integrated (Riezler et al., 2002).

4 Extensions in the TIGER annotation scheme

The annotation in TIGER is based on the annotation scheme that was used for the NEGRA corpus (Brants et al., 1999). This annotation scheme covered a broad variety of phenomena. However, there was still room for improvement in its linguistic adequacy. A vital part of the work in the TIGER project is the linguistic extension of this annotation scheme. In the following, the major changes that were made in the TIGER project are presented. A more detailed account of these changes and an evaluation of the improved annotation scheme can be found in (Brants & Hansen, 2002).

4.1 Coordination

An essential linguistic extension in the TIGER annotation scheme was made concerning the annotation of coordinated sentences and verb phrases. In the NEGRA corpus, arguments that are shared by both verb conjuncts of a coordination, but that are only mentioned once, were structurally linked only to the nearest part of the coordination. Thus, the NEGRA annotation is in many cases not suitable for the extraction of subcategorization information. In the TIGER treebank, these shared arguments are provided with secondary edges in order to represent their syntactic relation to the more distant verb conjuncts.

[Figure 5: tree diagrams not reproduced. Both trees annotate Der Mann liest und zerknüllt die Zeitung (gloss: ‘the man reads and rumples the newspaper’). In both, a CS node coordinates two S conjuncts (CJ) joined by und (CD); the first S contains the NP Der Mann (SB) and liest (HD), the second S contains zerknüllt (HD) and the NP die Zeitung (OA). The TIGER tree (right) additionally contains two secondary edges: SB, linking Der Mann to the second conjunct, and OA, linking die Zeitung to the first conjunct.]

Figure 5: Coordination with shared arguments in NEGRA (left) and TIGER (right)

⁴ 10% of the sentences failed because of gaps in the morphological analyser; 6% failed because of storage overflow or timeouts (with limits set to 100 MB storage and 100 seconds parsing time). 10% of the parsed sentences were not evaluated wrt. the correct analysis because they received more than 20 analyses.

Figure 5 illustrates the difference between the NEGRA and the TIGER treatment of these cases. In the example sentence Der Mann liest und zerknüllt die Zeitung (‘the man reads and rumples the newspaper’), the common subject of both verbs is the NP der Mann, the common object is the NP die Zeitung. However, in the NEGRA annotation scheme, shared arguments are linked only to the nearest verb (cf. left part of Figure 5). The structure of the tree would be exactly the same if the first verb were intransitive and did not have die Zeitung as its object (e.g. Der Mann lacht und zerknüllt die Zeitung (‘the man laughs and rumples the newspaper’)). In contrast, the right part of Figure 5 shows the annotation of the sentence according to the TIGER annotation scheme, making use of the secondary edges that were introduced. Thus, the TIGER scheme allows the differentiation between transitive and intransitive verbs in coordinations.
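The effect of secondary edges on the extraction of subcategorization information can be shown with a short Python sketch over the TIGER tree of Figure 5. The data layout (`NODES`, `SECONDARY`), the function `verb_frame`, the set `ARGUMENT_LABELS` and the assignment of the ids 500 to 504 to particular nodes are assumptions made for illustration. With primary edges only, liest appears to lack an object and zerknüllt a subject; once secondary edges are taken into account, both verbs receive the frame subject plus accusative object.

```python
WORDS = {0: "Der", 1: "Mann", 2: "liest", 3: "und", 4: "zerknüllt", 5: "die", 6: "Zeitung"}

# node id -> (category, [(edge label, child id), ...]); primary edges only.
NODES = {
    500: ("NP", [("NK", 0), ("NK", 1)]),
    501: ("NP", [("NK", 5), ("NK", 6)]),
    502: ("S", [("SB", 500), ("HD", 2)]),
    503: ("S", [("HD", 4), ("OA", 501)]),
    504: ("CS", [("CJ", 502), ("CD", 3), ("CJ", 503)]),
}

# Secondary edges as (parent id, edge label, child id), TIGER tree only.
SECONDARY = [
    (503, "SB", 500),  # Der Mann is also the subject of zerknüllt
    (502, "OA", 501),  # die Zeitung is also the object of liest
]

# Edge labels treated as arguments here (a simplification).
ARGUMENT_LABELS = {"SB", "OA", "OP"}


def verb_frame(clause_id, use_secondary=True):
    """Return (head verb, sorted argument labels) for one clause."""
    _cat, children = NODES[clause_id]
    labels = [label for label, _ in children if label in ARGUMENT_LABELS]
    if use_secondary:
        labels += [
            label
            for parent, label, _ in SECONDARY
            if parent == clause_id and label in ARGUMENT_LABELS
        ]
    head = next(WORDS[child] for label, child in children if label == "HD")
    return head, sorted(labels)


for clause in (502, 503):
    print("primary only:  ", verb_frame(clause, use_secondary=False))
    print("with secondary:", verb_frame(clause))
```

Calling `verb_frame` with `use_secondary=False` corresponds to what can be read off the NEGRA tree, where the frame of liest would be indistinguishable from that of an intransitive verb such as lacht.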

4.2 Verb-subcategorization

The NEGRA corpus provides no distinctions between prepositional phrases with respect to their syntactic functions; all PPs occurring in sentences or verb phrases are without exception marked with the label MO (modifier). In the TIGER project, two additional function labels for PPs were introduced: prepositional objects (OP) and collocational verb constructions (CVC). The label OP is applied to constructions like auf jemanden warten (‘to wait for somebody’). These phenomena are marked by the fact that the preposition auf (‘on’) has lost its lexical meaning.

[Figure 6: tree diagram not reproduced. It shows the NEGRA annotation of Verhandlungsführer beider Parteien haben sich nach Medienberichten auf einen Gesetzesvorschlag geeinigt. (gloss: ‘negotiators of_both parties have themselves according_to press reports on a draft law agreed’). Both PPs, nach Medienberichten and auf einen Gesetzesvorschlag, are attached to the VP with the edge label MO.]

Figure 6: Annotation of PPs in NEGRA: MO

[Figure 7: tree diagram not reproduced. It shows the TIGER annotation of the same sentence. The PP nach Medienberichten is attached with the edge label MO, the PP auf einen Gesetzesvorschlag with the edge label OP.]

Figure 7: Annotation of PPs in TIGER: MO and OP

Figure 6 exemplifies the fact that NEGRA did not allow the distinction between complements and adjuncts on the level of edge labels⁵. In contrast, the TIGER annotation (Figure 7) mirrors the functional difference between the first PP and the second PP in the use of different edge labels: the first PP is functionally independent of the verb and serves as an adverbial; it still receives the label MO. The second PP represents a typical example of a prepositional object (OP) in German: the preposition auf (‘on’) has completely lost its lexical meaning and is purely functional.

⁵ A correct English translation of the example sentence is the following: ‘According to press reports, negotiators from both parties have agreed on a draft law’.

The other newly introduced edge label for prepositional phrases, CVC (collocational verb construction), serves to label verb + PP constructions in which the main semantic information is contained in the noun of the PP, not in the verb. This label can only be used with a very limited class of verbs (usually a semantically weak verb with an originally directional or local meaning, e.g. stellen, kommen, etc. (‘to put’, ‘to come’)) that occur in connection with an equally limited class of prepositions (mostly zu and in (‘to’ and ‘in’)). A typical example for this is the German collocational expression in Kraft treten (literal translation: ‘to step into force’, meaning: ‘to take effect’).

4.3 Expletives

In the TIGER corpus, finer distinctions with regard to the usage of es, the German expletive, have been introduced. In the NEGRA annotation scheme, only one label (PH, meaning place-holder) was used for the non-semantic usage of es; in the TIGER scheme, we distinguish three types:

• Vorfeld es: This type of es is used to fill the first position of a sentence, called the Vorfeld slot. It is marked with the label PH (place-holder). Example: Es naht ein Gewitter (literal translation: ‘it approaches a thunderstorm’; meaning: ‘a thunderstorm is approaching’). As soon as another component occupies the Vorfeld slot, the es disappears: Ein Gewitter naht.

• Correlative es: This second type of es is always correlated to some propositional argument in the sentence. It is usually optional. Example: Mich freut es, dass ... (‘it makes me happy that ...’). This type is also labeled as PH but can be easily distinguished from the first type because it always occurs in connection with a propositional sister node functioning as RE (repeated element) (cf. Figure 8).

• Expletive es: The last type of es functions as a non-thematic argument, e.g. in connection with weather verbs: Heute regnet es (‘today, it is raining’). It receives the label EP (expletive).

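[Figure 8: tree diagram not reproduced. It shows the annotation of Mich freut es, dass er kommt (gloss: ‘me makes_happy it, that he comes’). The root S consists of Mich (OA), freut (HD) and a subject NP (SB). This NP consists of es (PH) and the clause dass er kommt (RE), whose daughters are dass (CP), er (SB) and kommt (HD).]

Figure 8: Correlative es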

4.4 Proper nouns

In the NEGRA as well as in the TIGER corpus, the parent label PN is used to mark ordinary proper nouns, such as ‘Gerhard Schröder’. The single components receive the edge label PNC (proper noun component). Furthermore, the label PN is also used for multi-token company names, newspaper names (e.g. ‘The San Francisco Chronicle’) etc. In the TIGER annotation scheme, the usage of this label was extended to cover titles of films, books, exhibitions etc. that have a complex, sometimes sentence-like structure. Occurrences of these phenomena are first annotated structurally and then receive an additional unary parent label PN. The examples in Figure 9 illustrate the different annotations in NEGRA and TIGER.

[Figure 9: tree diagrams not reproduced. Both trees annotate Der Film "Einer flog übers Kuckucksnest" (gloss: ‘the movie "one flew over_the cuckoo’s_nest"’). In both, the title is annotated as an S with Einer (SB), flog (HD) and the PP übers Kuckucksnest (MO). In NEGRA (left), this S is attached directly to the NP with the edge label NK, next to Der and Film. In TIGER (right), the S first receives a unary parent PN (edge label PNC), and the PN node is attached to the NP with the edge label NK.]

Figure 9: Treatment of structured proper nouns in NEGRA (left) and TIGER (right)

Thus, the TIGER annotation permits the identification of structures that function as names, but do not feature the corresponding part-of-speech tag NE (proper noun) in one of their terminal nodes.

5 TIGERSearch

Syntactically annotated corpora such as the TIGER treebank provide a wealth of information which can only be exploited with an adequate query tool and query language. Thus, a powerful search engine for treebanks has been developed within the TIGER project (Lezius, 2002).

The search engine is freely available for research purposes and can be downloaded from the TIGER website. To support all popular platforms, the tool is implemented in Java. Input filters are provided for many popular treebank formats. A complete list of supported formats is given in (Lezius, Biesinger, & Gerstenberger, 2002b). In addition, an interface format, TIGER-XML (Mengel & Lezius, 2000; Lezius et al., 2002a), has been defined. This format can be used to import treebanks in other formats, including formats which have not yet been developed. The search engine is thus a tool which can be used by the entire community.

5.1 Query Language

The query language (König & Lezius, 2002a, 2002b) has been designed to fulfill two conflicting requirements: On the one hand, it is close to grammar formalisms, thus easy to learn. It allows modular, understandable code, even for complex queries. A user can pose queries intuitively, mapping linguistic descriptions directly into the query language. On the other hand, its expressiveness has been constrained to guarantee efficient query processing.

The query language consists of three levels. On the node level, nodes can be described by Boolean expressions over feature-value pairs. The following query is matched by the terminal node lacht (‘laughs’) in Figure 4:

[word="lacht" & pos="VVFIN"]
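As an illustration only (this is not the TIGERSearch implementation), node-level matching can be sketched in Python. The sketch assumes that a node is a plain dictionary of feature-value pairs and covers only conjunction; the function name matches and the second example node are hypothetical.

```python
# Illustrative sketch only; not the TIGERSearch implementation.
# Assumption: a node is modelled as a dict of feature-value pairs.

def matches(node, description):
    """Return True if the node carries every feature-value pair of the description."""
    return all(node.get(feature) == value for feature, value in description.items())


# Counterpart of the query [word="lacht" & pos="VVFIN"]
query = {"word": "lacht", "pos": "VVFIN"}

lacht = {"word": "lacht", "pos": "VVFIN"}
other = {"word": "lacht", "pos": "NN"}  # hypothetical node that should not match

print(matches(lacht, query))
print(matches(other, query))
```

Disjunction and negation over feature values would need a richer description type than a dictionary; they are left out to keep the sketch short.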


On the node relation level, descriptions of two or more nodes are combined by a relation. Since graphs are two-dimensional objects, we need one basic relation for each dimension. These are immediate precedence (“.”) for the horizontal dimension and immediate dominance (“>”) for the vertical dimension (see footnote 6). There are also derived node relations such as underspecified dominance or siblings:

>*      dominance (minimum path length 1)
>n      dominance in n steps (n > 0)
>m,n    dominance in k steps (m ≤ k ≤ n)
>L      labeled dominance (edge label L)
>@l     leftmost terminal successor (‘left corner’)
.*      precedence (min. number of intervals: 1)
$       siblings

For example, the following query is matched by a subgraph of Figure 4 (the NP node):

[cat="NP"] >RC [cat="S"]
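The node relations can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not the TIGERSearch implementation: the graph encodes only the clause ‘Einer flog übers Kuckucksnest’ from Figure 9, the node identifiers (t0 to t3, "S", "PP") and all function names are hypothetical, and precedence is computed via left corners as described in footnote 6.

```python
# Toy model of node relations; identifiers and function names are hypothetical.
TERMINALS = ["t0", "t1", "t2", "t3"]  # Einer, flog, übers, Kuckucksnest (surface order)
EDGES = {  # parent -> list of (edge label, child)
    "S": [("SB", "t0"), ("HD", "t1"), ("MO", "PP")],
    "PP": [("AC", "t2"), ("NK", "t3")],
}


def children(node):
    return [child for _, child in EDGES.get(node, [])]


def dominates_directly(parent, child, label=None):
    """'>' and, if a label is given, '>L' (labeled dominance)."""
    return any(c == child and (label is None or lab == label)
               for lab, c in EDGES.get(parent, []))


def dominates(ancestor, node):
    """'>*': dominance with a path length of at least 1."""
    return any(c == node or dominates(c, node) for c in children(ancestor))


def terminals_of(node):
    if node in TERMINALS:
        return [node]
    return [t for c in children(node) for t in terminals_of(c)]


def left_corner(node):
    """'>@l': the leftmost terminal successor of a node."""
    return min(terminals_of(node), key=TERMINALS.index)


def precedes(a, b):
    """'.*': precedence, decided by comparing the left corners."""
    return TERMINALS.index(left_corner(a)) < TERMINALS.index(left_corner(b))


def immediately_precedes(a, b):
    """'.': the left corners are adjacent in the surface order."""
    return TERMINALS.index(left_corner(b)) - TERMINALS.index(left_corner(a)) == 1


print(dominates_directly("S", "PP", label="MO"))
print(dominates("S", "t3"), left_corner("PP"), precedes("t1", "PP"))
```

Here the terminal positions stand in for the horizontal dimension and the edge table for the vertical one, so each basic relation reduces to a lookup in one of the two structures.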

Finally, on the graph description level we allow Boolean expressions over node relations, without negation. For example, a subgraph of Figure 4 (the second S node) satisfies the following query:

([cat="S"] > [pos="PRELS"]) &
([cat="S"] > [pos="VVFIN"])

Variables can be used to express coreference of nodes or feature values. For example, the two node descriptions [cat="S"] in the above query could refer to different nodes. A reformulation of the query using variables prevents this:

(#n:[cat="S"] > [pos="PRELS"]) &
(#n > [pos="VVFIN"])
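The effect of the variable can be seen in a minimal Python sketch. It assumes a hypothetical graph with two S nodes, one dominating a relative pronoun and the other a finite verb; the data layout and the helper parents_of are illustrative assumptions, not part of TIGERSearch.

```python
# Hypothetical mini-graph: two S nodes, each directly dominating one terminal.
NODES = {
    "s1": {"cat": "S"},
    "s2": {"cat": "S"},
    "t1": {"pos": "PRELS"},
    "t2": {"pos": "VVFIN"},
}
DOMINATES = {("s1", "t1"), ("s2", "t2")}  # direct dominance '>'


def parents_of(feature, value, cat="S"):
    """All nodes of category `cat` that directly dominate a node with feature=value."""
    return {parent for parent, child in DOMINATES
            if NODES[parent].get("cat") == cat and NODES[child].get(feature) == value}


with_prels = parents_of("pos", "PRELS")
with_vvfin = parents_of("pos", "VVFIN")

# Without variables: each conjunct may be satisfied by a different S node.
unbound_match = bool(with_prels) and bool(with_vvfin)

# With #n: both conjuncts must hold for one and the same S node.
bound_match = bool(with_prels & with_vvfin)

print(unbound_match, bound_match)
```

On this graph the query without variables succeeds although no single clause contains both a relative pronoun and a finite verb, whereas the variable version correctly fails.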

The following query describes a clause that comprises a personal pronoun and a finite verb (cf. Figure 5):

#v:[cat="S"] &
#t1:[pos="PPER"] & #t2:[pos="VVFIN"] &
(#v > #t1) & (#v > #t2) & arity(#v,2)

In addition, the user can define type hierarchies. Subtypes may also be constants, e.g. in the case of part-of-speech symbols. Here is a part of a type hierarchy for the STTS tagset:

nominal := noun,properNoun,pronoun.
noun := "NN".
properNoun := "NE".
pronoun := "PPER","PDS","PRELS", ...

6 The precedence of two inner nodes is defined as the precedence of their leftmost terminal successors (König & Lezius, 2000).


Figure 10: Visualization of query results

This hierarchy can be used to formulate queries in a more adequate way:

[pos=nominal] .* [pos="VVFIN"]
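How a type such as nominal stands for a set of tags can be sketched in Python. The dictionary below mirrors only the part of the hierarchy quoted above (the pronoun list is truncated in the source and is not completed here); the name expand and the recursive expansion are illustrative assumptions rather than the actual TIGERSearch mechanism.

```python
# Part of the STTS type hierarchy quoted above; the source's pronoun list
# continues with "..." and is deliberately not completed here.
TYPES = {
    "nominal": ["noun", "properNoun", "pronoun"],
    "noun": ["NN"],
    "properNoun": ["NE"],
    "pronoun": ["PPER", "PDS", "PRELS"],
}


def expand(symbol):
    """Return the set of constants (tags) covered by a type name or a constant."""
    if symbol not in TYPES:
        return {symbol}  # a constant stands for itself
    return set().union(*(expand(sub) for sub in TYPES[symbol]))


def pos_matches(node, symbol):
    """Counterpart of [pos=symbol] for a node given as a dict."""
    return node.get("pos") in expand(symbol)


print(sorted(expand("nominal")))
print(pos_matches({"word": "er", "pos": "PPER"}, "nominal"))
```

With such an expansion, [pos=nominal] behaves like a disjunction over all tags below the type, which is what makes the query shorter than spelling out each tag.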

There are also several useful predicates such as discontinuous(#n), continuous(#n) (phrase does/does not contain crossing branches) or arity(#n,num) (phrase comprises num children). The following example query determines extraposed relative clauses in the TIGER treebank (cf. Figure 4):

(#n:[cat="NP"] >RC [cat="S"]) &
discontinuous(#n)
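One way to operationalize the discontinuous predicate is to test whether the terminal positions covered by a node leave a gap. The Python sketch below makes exactly that assumption; the function names and the example positions are hypothetical and are not taken from the corpus or from TIGERSearch.

```python
def is_discontinuous(positions):
    """True if the terminal positions covered by a node contain a gap."""
    covered = sorted(positions)
    return any(right - left > 1 for left, right in zip(covered, covered[1:]))


def is_continuous(positions):
    return not is_discontinuous(positions)


# Hypothetical NP whose relative clause is extraposed: the NP covers tokens
# 0-1 and 4-6, while tokens 2-3 belong to material outside the NP.
np_with_extraposed_rc = {0, 1, 4, 5, 6}
plain_np = {0, 1, 2}

print(is_discontinuous(np_with_extraposed_rc))
print(is_discontinuous(plain_np))
```

Combined with the labeled dominance test for the RC edge, such a check would single out noun phrases whose relative clause is separated from the head by other material.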

To simplify the formulation of more complex queries, templates can be defined. These are described in (König & Lezius, 2002a, 2002b).

5.2 Query Tool

To ensure efficient query processing we have chosen an index-based approach. A corpus, encoded in one of a variety of external formats, e.g., NEGRA/TIGER format, bracketing format or an XML treebank format (Mengel & Lezius, 2000; Lezius et al., 2002a), is imported and indexed. Many partial searches are performed during indexing in order to save processing time during query processing. The indexing of a corpus is realized in a tool called TIGERRegistry; the corpus query tool is called TIGERSearch. To increase performance we have also implemented query optimization strategies and search space filters. The query processing strategy is described in detail in (Lezius, 2002).
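The general idea of shifting work from query time to indexing time can be illustrated with a simple inverted index in Python. This is a generic sketch, not the index structure of TIGERRegistry (which is described in Lezius, 2002); the node identifiers, the example nodes and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(nodes):
    """Map each feature-value pair to the set of node ids that carry it."""
    index = defaultdict(set)
    for node_id, features in nodes.items():
        for pair in features.items():
            index[pair].add(node_id)
    return index


def lookup(index, **description):
    """Answer a conjunction of feature-value pairs by intersecting posting sets."""
    postings = [index.get(pair, set()) for pair in description.items()]
    return set.intersection(*postings) if postings else set()


# Hypothetical terminal nodes
nodes = {
    "t1": {"word": "lacht", "pos": "VVFIN"},
    "t2": {"word": "singt", "pos": "VVFIN"},
    "t3": {"word": "er", "pos": "PPER"},
}
index = build_index(nodes)

print(sorted(lookup(index, pos="VVFIN")))
print(sorted(lookup(index, word="lacht", pos="VVFIN")))
```

The index is built once per corpus, after which a node description is answered by set intersection instead of a scan over all nodes.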

In order to facilitate corpus exploration, a combination of searching for phenomena and browsing through a corpus, we have developed a sophisticated graphical user interface for both the TIGERRegistry and the TIGERSearch tool. We have also developed special corpus visualization and query input strategies.

Figure 11: Graphical query input

The TIGERSearch GUI comprises a graph viewer to view the matching sentences of a query. Figure 10 illustrates the visualization of a corpus graph that matches the example query above. Users can browse through the forest of matching sentences using a navigation bar and export their favourite matches. Matches can be exported in the TIGER-XML format, but also as an interactive SVG image. Thus, match forests can be viewed in a format that does not depend on the TIGERSearch software suite.

We have also developed a graphical query input front-end which enables users to ‘draw’ queries in a very intuitive way (Voorman, 2002). Queries are expressed by combining nodes and node relations. Figure 11 illustrates how the example query above can be expressed using the graphical query editor.

6 Summary and outlook

In this paper, we presented the TIGER treebank, the largest and most comprehensive treebank for the German language. We explained the different levels of annotation: part-of-speech tags, phrase categories and syntactic functions. Furthermore, information about lemmata and morphology is also encoded in the corpus. The methods of annotation (interactive annotation with Annotate and LFG parsing) as well as the different representation formats used for the TIGER treebank were described. We also gave a short overview of related work in comparable treebank projects.

Moreover, we described the TIGER annotation scheme, which is based on the annotation scheme used for the NEGRA corpus. The paper also outlined the most important extensions in the TIGER annotation scheme, which concern the use of secondary edges in coordinations, verb subcategorization, finer distinctions concerning the German expletive es and a different treatment of structured proper nouns.


The last section presented TIGERSearch, a query tool that was developed in the project and which can be used to exploit the TIGER treebank and several other treebank formats. We explained the query language, which was designed to allow intuitive queries to be posed to the treebank. We also briefly introduced the graphical query input TIGERin.

Future work will be concerned with the extension of the TIGER treebank to approximately 80,000 sentences altogether and additional improvements in the annotation scheme. For instance, we envision the introduction of further distinctions concerning verbal arguments in order to facilitate the identification of thematic roles (Smith, 2000). For the beginning of 2003, a first release is planned which will contain 10,000 sentences completely annotated (part-of-speech tagging, syntactic structure, morphology, lemmata) according to the new TIGER annotation scheme and thoroughly checked for consistency.

All the tools presented in this paper are freely available for research purposes. For further information on the corpus, the corpus tools and how to obtain them, please refer to the project web page:

http://www.coli.uni-sb.de/cl/projects/tiger.

References

Abeillé, A., Brants, T., & Uszkoreit, H. (Eds.). (2000). Proceedings of the COLING-2000 Post-Conference Workshop on Linguistically Interpreted Corpora LINC-2000. Luxembourg.

Abeillé, A., Clément, L., & Kinyon, A. (2000). Building a treebank for French. In Proceedings of the Second International Conference on Language Resources and Evaluation LREC-2000 (pp. 87–94). Athens, Greece.

Boguslavsky, I., Grigorieva, S., Grigoriev, N., Kreidlin, L., & Frid, N. (2000). Dependency treebank for Russian: Concept, tools, types of information. In 18th International Conference on Computational Linguistics COLING-2000. Saarbrücken, Germany.

Bosco, C., Lombardo, V., Vassallo, D., & Lesmo, L. (2000). Building a treebank for Italian: A data-driven annotation schema. In Proceedings of the Second International Conference on Language Resources and Evaluation LREC-2000 (pp. 99–106). Athens, Greece.

Brants, S., & Hansen, S. (2002). Developments in the TIGER annotation scheme and their realization in the corpus. In Proceedings of the Third Conference on Language Resources and Evaluation LREC-02. Las Palmas de Gran Canaria, Spain.

Brants, T. (1997). The NEGRA Export Format (CLAUS Report No. 98). Saarbrücken, Germany: Dept. of Computational Linguistics, Saarland University.

Brants, T. (1999). Tagging and parsing with Cascaded Markov Models - automation of corpus annotation. Saarbrücken, Germany: German Research Center for Artificial Intelligence and Saarland University: Saarbrücken Dissertations in Computational Linguistics and Language Technology (Bd. 6).

Brants, T. (2000a). Inter-annotator agreement for a German newspaper corpus. In Proceedings of Second International Conference on Language Resources and Evaluation LREC-2000. Athens, Greece.

Brants, T. (2000b). TnT – A Statistical Part-of-Speech Tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing ANLP-2000. Seattle, WA.


Brants, T., Hendriks, R., Kramp, S., Krenn, B., Preis, C., Skut, W., & Uszkoreit, H. (1999). Das NEGRA-Annotationsschema (Tech. Rep.). Saarbrücken, Germany: Dept. of Computational Linguistics, Saarland University.

Brants, T., & Skut, W. (1998). Automation of treebank annotation. In Proceedings of New Methods in Language Processing NeMLaP-98. Sydney, Australia.

Brants, T., Skut, W., & Uszkoreit, H. (1999). Syntactic annotation of a German newspaper corpus. In Proceedings of the ATALA Treebank Workshop (pp. 69–76). Paris, France.

Bresnan, J. (Ed.). (1982). The Mental Representation of Grammatical Relations. MIT Press.

Dipper, S. (2000). Grammar-based Corpus Annotation. In Proceedings of LINC-2000 (pp. 56–64). Luxembourg.

Frank, A., King, T. H., Kuhn, J., & Maxwell, J. (1998). Optimality Theory Style Constraint Ranking in Large-scale LFG Grammars. In Proceedings of the LFG98 Conference. Brisbane, Australia: CSLI Online Publications, http://www-csli.stanford.edu/publications.

Greenbaum, S. (Ed.). (1996). Comparing English worldwide: The International Corpus of English. Oxford, UK: Clarendon Press.

Hajič, J. (1999). Building a syntactically annotated corpus: The Prague Dependency Treebank. In E. Hajičová (Ed.), Issues of valency and meaning. Studies in honour of Jarmila Panevová. Prague, Czech Republic: Charles University Press.

King, T. H., Dipper, S., Frank, A., Kuhn, J., & Maxwell, J. (To Appear). Ambiguity Management in Grammar Writing. Journal of Language and Computation. (Special issue)

König, E., & Lezius, W. (2000). A description language for syntactically annotated corpora. In Proceedings of COLING-2000 (pp. 1056–1060). Saarbrücken, Germany.

König, E., & Lezius, W. (2002a). The TIGER language - A Description Language for Syntax Graphs. Part 1: User’s Guidelines. (Tech. Rep.). IMS, University of Stuttgart. (http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/papers/tigerLanguage.ps.gz)

König, E., & Lezius, W. (2002b). The TIGER language - A Description Language for Syntax Graphs. Part 2: Formal Definition. (Tech. Rep.). IMS, University of Stuttgart. (http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/papers/tigerLangForm.ps.gz)

Kramp, S., & Preis, C. (2000). Konventionen für die Verwendung des STTS im NEGRA-Korpus (Tech. Rep.). Saarbrücken, Germany: Dept. of Computational Linguistics, Saarland University.

Leech, G. (1992). The Lancaster Parsed Corpus. ICAME Journal, 16(124).

Lezius, W. (2001). Baumbanken. In K.-U. Carstensen, C. Ebert, C. Endriss, S. Jekat, R. Klabunde, & H. Langer (Eds.), Computerlinguistik und Sprachtechnologie - eine Einführung (pp. 377–385). Heidelberg, Germany: Spektrum Akademischer Verlag.

Lezius, W. (2002). Ein Werkzeug zur Suche auf syntaktisch annotierten Textkorpora. IMS, University of Stuttgart. (PhD thesis, in preparation)

Lezius, W., Biesinger, H., & Gerstenberger, C. (2002a). TIGER-XML Quick Reference Guide (Tech. Rep.). IMS, University of Stuttgart.


Lezius, W., Biesinger, H., & Gerstenberger, C. (2002b). TIGERRegistry Manual (Tech. Rep.). IMS, University of Stuttgart.

Lezius, W., & König, E. (2000). Towards a search engine for syntactically annotated corpora. In Proceedings of the Fifth KONVENS Conference. Ilmenau, Germany.

Marcus, M., Kim, G., Marcinkiewicz, M., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., & Schasberger, B. (1994). The Penn Treebank: Annotating predicate argument structure. In Proceedings of the ARPA Human Language Technology Workshop. San Francisco, CA: Morgan Kaufmann.

Mengel, A., & Lezius, W. (2000). An XML-based encoding format for syntactically annotated corpora. In Proceedings of LREC-2000 (pp. 121–126). Athens, Greece.

Moreno, A., Grishman, R., López, S., Sánchez, F., & Sekine, S. (2000). A treebank of Spanish and its application to parsing. In Proceedings of the Second International Conference on Language Resources and Evaluation LREC-2000 (pp. 107–112). Athens, Greece.

Oflazer, K., Hakkani-Tür, D., & Tür, G. (1999). Design for a Turkish treebank. In Proceedings of the Workshop on Linguistically Interpreted Corpora LINC-99. Bergen, Norway.

ParGram. (2002). The ParGram Project. (URL: http://www2.parc.com/istl/groups/nltt/pargram/)

Plaehn, O., & Brants, T. (2000). Annotate - an efficient interactive annotation tool. In Proceedings of the Sixth Conference on Applied Natural Language Processing ANLP-2000. Seattle, WA.

Riezler, S., King, T. H., Kaplan, R., Crouch, R., Maxwell, J., & Johnson, M. (2002). Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of the ACL-02. Philadelphia, PA.

Sampson, G. (1995). English for the computer. The SUSANNE corpus and analytic scheme. Oxford, UK: Clarendon Press.

Schiller, A., Teufel, S., & Stöckert, C. (1999). Guidelines für das Tagging deutscher Textcorpora mit STTS (Tech. Rep.). University of Stuttgart, University of Tübingen.

Schrader, B. (2001). Modifikation einer deutschen LFG-Grammatik für Partial Parsing. Studienarbeit, University of Stuttgart.

Simov, K., Osenova, P., Slavcheva, M., Kolkovska, S., Balabanova, E., Doikoff, D., Ivanova, K., Simov, A., & Kouylekov, M. (2002). Building a linguistically interpreted corpus of Bulgarian: the BulTreeBank. In Proceedings of Third International Conference on Language Resources and Evaluation LREC-2002 (pp. 1729–1736). Las Palmas de Gran Canaria, Spain.

Skut, W., Brants, T., Krenn, B., & Uszkoreit, H. (1998). A linguistically interpreted corpus of German newspaper text. In Proceedings of the Conference on Language Resources and Evaluation LREC-98 (pp. 705–711). Granada, Spain.

Skut, W., Krenn, B., Brants, T., & Uszkoreit, H. (1997). An annotation scheme for free word order languages. In Proceedings of ANLP-97. Washington, D.C.

Smith, G. (2000). Encoding thematic roles via syntactic categories in a German treebank. In Proceedings of the Workshop on Syntactic Annotation of Electronic Corpora. Tübingen, Germany.


Smith, G., & Eisenberg, P. (2000). Kommentare zur Verwendung des STTS im NEGRA-Korpus (Tech. Rep.). University of Potsdam.

Uszkoreit, H., Brants, T., & Krenn, B. (Eds.). (1999). Proceedings of the Workshop on Linguistically Interpreted Corpora LINC-99. Bergen, Norway.

Voorman, H. (2002). TIGERin – Graphische Eingabe von Suchanfragen in TIGERSearch. IMS, University of Stuttgart. (Diploma Thesis, in preparation)

Wahlster, W. (Ed.). (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Heidelberg, Germany: Springer.

Zinsmeister, H., Kuhn, J., & Dipper, S. (2002). Utilizing LFG Parses for Treebank Annotation. In Proceedings of the LFG02 Conference. Athens, Greece.
