COLING Workshop - 2002 Nicoletta Calzolari ILC - CNR - Pisa, Italy Language Resources & Semantic...

Page 1: COLING Workshop - 2002 Nicoletta Calzolari ILC - CNR - Pisa, Italy Language Resources & Semantic Web.

COLING Workshop - 2002

Nicoletta Calzolari
ILC - CNR - Pisa, Italy

Language Resources & Semantic Web

Page 2:

To make the Semantic Web a reality ...

…need to tackle the twofold challenge of
• content availability and
• multilinguality

Natural convergence with HLT:
• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons

Page 3:

Computational Multilingual Lexicons: an essential component for the Semantic Web

• Language - & lexicons - are the gateway to knowledge
• Semantic Web developers need repositories of words & terms - & knowledge of their relations in language use & ontological classification.
• The cost of adding this structured and machine-understandable lexical information can be one of the factors that delays its full deployment.
• The effort of making available millions of ‘words’ for dozens of languages is something that no small group is able to afford.
• A radical shift in the lexical paradigm - whereby many participants add linguistic content descriptions in an open distributed lexical framework - is required to make the Web usable

Page 4:

Infrastructure of Language Resources...

...static
• Semantic network: Euro-/ItalWordNet
• Lexicons: PAROLE/SIMPLE/CLIPS
• TreeBank

+sw

But … they will never be “complete”

…dynamic
• Lexical acquisition systems (syntactic & semantic) from text corpora
• Robust systems of morphosyntactic & syntactic analysis
• Word-sense disambiguation systems

International Standards

Page 5:

Italian Semantic Network
Italian module of EuroWordNet (http://www.hum.uva.nl/~ewn/)

~ 50.000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130.000 semantic relations

~ 50.000 hyperonymy/hyponymy relations
~ 16.000 relations among different POS (role, cause, derivation, etc.)
~ 2.000 part-whole relations
~ 1.500 antonymy relations, …etc.

• Synsets linked to the InterLingual Index (ILI=Princeton WordNet),
• Through the ILI link to all the European WordNets (de-facto standard) & to the common Top Ontology
• Possibility of plug-in with domain terminological lexicons
• Usable in IR, CLIR, IE, QA, ...
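
The ILI linking described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EuroWordNet data model: the `Synset` class, the `ili-...` identifiers, and the Spanish entries are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Synset:
    """A synonym group in one language's wordnet, tied to one ILI record."""
    lang: str
    lemmas: List[str]
    ili: str  # hypothetical InterLingual Index identifier


def build_ili_index(synsets: List[Synset]) -> Dict[str, List[Synset]]:
    """Group synsets from all wordnets by their shared ILI identifier."""
    index: Dict[str, List[Synset]] = {}
    for synset in synsets:
        index.setdefault(synset.ili, []).append(synset)
    return index


def translate(lemma: str, src: str, tgt: str, synsets: List[Synset]) -> List[str]:
    """Return target-language lemmas reachable from `lemma` through the ILI."""
    index = build_ili_index(synsets)
    found: List[str] = []
    for synset in synsets:
        if synset.lang == src and lemma in synset.lemmas:
            for linked in index[synset.ili]:
                if linked.lang == tgt:
                    found.extend(linked.lemmas)
    return found


# Toy data: identifiers and cross-language pairings are assumptions.
SYNSETS = [
    Synset("it", ["mangiare"], "ili-eat"),
    Synset("es", ["comer"], "ili-eat"),
    Synset("it", ["cuoco"], "ili-cook-n"),
    Synset("es", ["cocinero"], "ili-cook-n"),
]

print(translate("cuoco", "it", "es", SYNSETS))
```

No wordnet needs to know about any other wordnet directly: each one only records its own link to the index, which is what makes the ILI usable as a hub for cross-language applications such as CLIR.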

Page 6:

Domain - Semantic class

mangiare

Page 7:

Domain - Semantic class

Figure: semantic network centred on the verb mangiare (to eat). The arcs of the diagram are not recoverable from the transcript; the labels that survive are:

Relations: Used_for, Object_of_the_activity, Is_the_activity_of, Created_by
Roles: AGENTIVE, TELIC
Feature: +edible
Semantic classes: FURNITURE, INSTRUMENT, BUILDING, CONTAINER, PROFESSION, ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD, VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE
Nouns: tavola, forchetta, posata, ristorante, mestolo, pentola, friggitrice, bollitore, pesce, pesciera, cuoco, coniglio, carne, mela, carota, arrosto, zucchero, alloro, tartufo
Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, ……
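
A network of this kind can be represented as typed triples. The sketch below is only an illustration: the relation names and words come from the slide, but which word is linked to which, and the word-to-class assignments, are assumptions, since the arcs of the original figure did not survive.

```python
from collections import defaultdict

# (source, relation, target) triples. Relation names are from the slide;
# the specific pairings are assumed for the sake of the example.
TRIPLES = [
    ("forchetta", "Used_for", "mangiare"),
    ("pentola", "Used_for", "cucinare"),
    ("carne", "Object_of_the_activity", "mangiare"),
    ("mela", "Object_of_the_activity", "mangiare"),
    ("cucinare", "Is_the_activity_of", "cuoco"),
    ("arrosto", "Created_by", "arrostire"),
]

# Assumed word -> semantic class assignments.
SEMANTIC_CLASS = {
    "forchetta": "INSTRUMENT",
    "pentola": "CONTAINER",
    "carne": "FOOD",
    "mela": "FRUIT",
    "cuoco": "PROFESSION",
    "arrosto": "ARTIFACT_FOOD",
}


def build_index(triples):
    """Index sources by (target, relation) so a word's neighbours can be looked up."""
    incoming = defaultdict(list)
    for source, relation, target in triples:
        incoming[(target, relation)].append(source)
    return incoming


def related(word, relation, index):
    """Return, sorted, the words that point at `word` through `relation`."""
    return sorted(index.get((word, relation), []))


INDEX = build_index(TRIPLES)

# Everything recorded as a possible object of the activity "mangiare",
# together with its semantic class.
for noun in related("mangiare", "Object_of_the_activity", INDEX):
    print(noun, SEMANTIC_CLASS[noun])
```

Starting from one verb, a query over relation types collects the semantically related vocabulary of a domain, which is the kind of grouping the slide title "Domain - Semantic class" points to.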

Page 8:

machine language learning

Page 9:

machine language learning

development of conceptual networks

linguistic learning

adaptive classification systems

information extraction

bootstrapping of grammars

linguistic change models

language usage models

bootstrapping of lexical information

Page 10:

Beyond MILE: towards open & distributed lexicons

Semantic Lexicon
URI = http://www.xxx…

Syntactic Constructions
URI = http://www.yyy…

Ontology
URI = http://www.zzz…

Monolingual/Multilingual Lexicon

Lex_object: semFeature
URI = http://www.xxx…#HUMAN

Lex_object: syntagmaNT
URI = http://www.zzz…#NP
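
One way to read this slide is that a lexical entry stores references to shared lexical objects instead of copies of them. The following sketch assumes that reading. The `example.org` addresses, the repository contents, and the `resolve`/`expand` helpers are hypothetical; the slide elides the real URIs and does not specify a resolution mechanism.

```python
from urllib.parse import urldefrag

# Hypothetical repositories keyed by base URI. The example.org addresses are
# placeholders, not the (elided) addresses shown on the slide.
REPOSITORIES = {
    "http://example.org/repo-a": {"HUMAN": {"lex_object": "semFeature"}},
    "http://example.org/repo-b": {"NP": {"lex_object": "syntagmaNT"}},
}

# A lexicon entry that points at shared objects by URI + fragment.
ENTRY = {
    "lemma": "cuoco",
    "refs": [
        "http://example.org/repo-a#HUMAN",
        "http://example.org/repo-b#NP",
    ],
}


def resolve(uri, repositories):
    """Split a URI into base and fragment, then look the object up."""
    base, fragment = urldefrag(uri)
    return repositories[base][fragment]


def expand(entry, repositories):
    """Replace each reference in an entry with the object it points to."""
    return [resolve(uri, repositories) for uri in entry["refs"]]


print(expand(ENTRY, REPOSITORIES))
```

Because the entry holds only addresses, the semantic lexicon, the syntactic constructions, and the ontology can each be maintained by a different group, which is the point of an open and distributed lexicon.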

Page 11:

Target….. Multilingual Knowledge Management

Technical Feasibility:

Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?

Yes, at the lexical level

More complex, for corpus annotation?

EAGLES/ISLE

Page 12:

A few Issues for discussion: lexicon standards

• Semantic Web standards and the needs of content processing technologies:
– importance of reaching consensus on (linguistic and non-linguistic) “content”, in addition to agreement on formats and encoding issues (…words convey content & knowledge)
– short/medium term requirements wrt standards for multilingual lexicons & content encoding, also industrial requirements
• Relation with Spoken language community
• MILE & Asian languages: how to cooperate concretely?
• Define further steps necessary to converge on common priorities
• ….

Page 13:

A few Issues for discussion: “content”, priorities...

• Which types of resources to invest in, wrt short vs. medium term results?
• Need for robust systems, able to acquire/tune lexical/linguistic (also multilingual) knowledge, to auto-enrich static basic resources?
• What is the relation between lexical standards and text annotation protocols?
• Knowledge management is critical. For “content” interoperability, is the field ‘mature’ enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
• Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?

Towards a new paradigm??

Page 14:

A new paradigm for LR?

Where the focus is on cooperation

New Strategic Vision?

towards a Distributed Open Lexical Infrastructure?

• for distributed & cooperative creation, management, etc. of Lexical Resources
• technical & organisational requirements

Page 15:

“ELITE” (expression of interest for the 6th FP)

“European Lexical Infrastructure and Technology”

New proposed paradigm for lexicon development:

Open & Distributed Lexical Infrastructure
for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario

Language Resources & Semantic Web