Kybots, knowledge yielding robotsadimen.si.ehu.es/~rigau/research/Doctorat/LSKBs/09-NLP... · 2010....
Transcript of Kybots, knowledge yielding robotsadimen.si.ehu.es/~rigau/research/Doctorat/LSKBs/09-NLP... · 2010....
ICT-211423
KYOTO (ICT-211423) Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organizationhttp://www.kyoto-project.eu/
Kybots, knowledge yielding robotsGerman RigauIXA group, UPV/EHU
ICT-211423
KYOTO Overview
Title: Knowledge Yielding Ontologies for Transition-Based Organization Funded:
7th Framework Program-ICT of the European Union: Intelligent Content and Semantics Taiwan and Japan funded by national grants
Goal: Platform for knowledge sharing across languages and cultures Knowledge transition and information across different target groups, transgressing
linguistic, cultural and geographic boundaries. Open text mining and deep semantic search Wiki environment that allows people in the field to maintain their knowledge and agree
on meaning without knowledge engineering skills URL: http://www.kyoto-project.eu/ Duration:
March 2008 – March 2011 Effort:
364 person months of work.
ICT-211423
KYOTO Overview
Languages: English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain: Environmental domain, BUT usable in any domain
Global: Both European and non-European languages
Available: Free: as open source system and data (GPL)
Future perspective: Content standardization that supports world wide communication Global Wordnet Grid
ICT-211423
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands), 2. Consiglio Nazionale delle Ricerche (Pisa, Italy), 3. Berlin-Brandenburg Academy of Sciences and Humantities (Berlin,
Germany), 4. Euskal Herriko Unibertsitatea (San Sebastian, Spain), 5. Academia Sinica (Tapei, Taiwan), 6. National Institute of Information and Communications Technology
(Kyoto, Japan), 7. Irion Technologies (Delft, The Netherlands), 8. Synthema (Rome, Italy), 9. European Centre for Nature Conservation (Tilburg, The
Netherlands), • Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands), – Masaryk University (Brno, Czech)
ICT-211423
ICT-211423
ICT-211423
Ultimate goal
Global standardisation and anchoring of meaning such that: Machines can approach text understanding ->
semantic web connects to the current web Communities can dynamically maintain knowledge,
concepts and their terms in an easy to use system Cross-linguistic and cross-cultural sharing and
communication of knowledge is enabled Comparable to a formalization of Wikipedia for
humans AND machines across languages
ICT-211423
Work Package List
364TOTAL
36126VUADisseminationWP11
36198SYNTHEMAExploitationWP10
33420ECNCEvaluationWP9
301312ECNCDomain extensionWP8
24125CNR-ILC-IITDatabase systems and wikiWP7
244106BBAWKnowledge integrationWP6
307120EHUKnowledge miningWP5
12411IRIONIndexingWP4
9110IRIONCaptureWP3
6112SYNTHEMASystem designWP2
615VUAUser requirementsWP1
3619VUAManagementWP0
EndStart PMLead partic.Work package titleWP No
ICT-211423
ICT-211423
Major achievements
ICT-211423
KAF/ont
Capture
Document base
Job dispatcher
PipeT
Html
KAFDB
KAFDB
KAF/lp
MW-taggerSense-taggerNE-tagger
ON-taggerTybot
Sense-tagger
NE-tagger
ON-taggerKybot
LP-server
LP-server
Facts
GeoNames
Wterms
Profiles
KAF/ont
Capture
Document base
Job dispatcher
PipeT
KAFDB
KAF/lp
KAFDB
KAF/lp
NE-tagger
ON-taggerTybot
NE-tagger
ON-taggerKybot
LP-server
LP-server
FactsWterms
Profiles
Knowledge Repository
ModulesModules
LP-client
NE-taggerON-tagger
Tybot
MW-tagger
NE-tagger
Kybot
Sense-taggerUKB
KyotoCore
MW-tagger
ICT-211423
System components
Generic ontologies and databases SUMO, DOLCE Geo databases Wikipedia
Generic linguistic resources Wordnet FrameNet
Tybots: Term yielding robots Kybots: knowledge yielding robots Wikyoto: wiki system for yielding domain wordnets and
domain ontologies in social communities
ICT-211423
OWL-DL
Conceptual Pattern:
OpenCyc
FrameNet
EWN-TO
SUMO
DOLCE
KG-ONT
WN1.5
WN1.6
WN2.0WN3.0
WN1.7
Extracted TermsGeneric K-TMF
Term Editor(Wikyoto)
Domain OntologyOWL-DL
Domain WordnetK-LMF
Kybot Server(Fact Extraction)
Document BaseKAF
Kybot Editor
Kybot Profiles
Concept User
Fact User
Generic WordnetK-LMF
Expression rules:-N+N-N+prep+N-Nsubj+V
causes, Process1, Process2
patient, Quantity, Increasing
ICT-211423
KAF: Kyoto Annotation FrameworkKAF is the input of both:
Tybot: term extraction Kybot: fact extraction
Word forms Terms / items Chunks Dependencies WSD / SRL Events Quantifiers Time expressions General Relations
ICT-211423
KAF word forms
“John taught mathematics 20 minutes every Monday in New York.”
<text>
<wf wid="w1">John</wf>
<wf wid="w2">taught</wf>
<wf wid="w3">mathematics</wf>
<wf wid="w4">20</wf>
<wf wid="w5">minutes</wf>
<wf wid="w6">every</wf>
<wf wid="w7">Monday</wf>
<wf wid="w8">in</wf>
<wf wid="w9">New</wf>
<wf wid="w10">York</wf>
<wf wid="w11">.</wf>
</text>
ICT-211423
KAF terms “John taught mathematics 20 minutes every Monday in New York.”
<terms>
<term tid="t1" span="w1" type="entity" lemma="John" pos="N" netype="person"></term>
<term tid="t2" span="w2" type="open" lemma="teach" pos="V">
<senseAlt>
<sense sensecode="EN-17-00861095-v" weight="0.80"/>
<sense sensecode="EN-17-00859568-v" weight="0.20"/>
</senseAlt>
</term>
<term tid="t3" span="w3" type="open" lemma="mathematics" pos="N">
<senseAlt>
<sense sensecode="EN-17-04597590-n" weight="1.0"/>
</senseAlt>
</term>
<term tid="t4" span="w4" type="entity" lemma="20" pos="Z" netype="number"></term>
...
ICT-211423
KAF terms...
<term tid="t5" span="w5" type="open" lemma="minute" pos="N"></term>
<senseAlt>
<sense sensecode="EN-17-12621100-n" weight="0.80"/>
<sense sensecode="EN-17-12631889-n" weight="0.06"/>
<sense sensecode="EN-17-12630443-n" weight="0.01"/>
<sense sensecode="EN-17-11241911-n" weight="0.01"/>
<sense sensecode="EN-17-05339359-n" weight="0.01"/>
<sense sensecode="EN-17-04316149-n" weight="0.01"/>
</senseAlt>
<term tid="t5" span="w6" type="close" lemma="every" pos="D"></term>
<term tid="t6" span="w7" type="entity" lemma="Monday" pos="N" netype="date"/>
<senseAlt>
<sense sensecode="EN-17-12557842-n" weight="1.0"/>
</senseAlt>
<term tid="t7" span="w8" type="close" lemma="in" pos="P"></term>
<!-- multiword form -->
<term tid="t8" span="w9 w10" type="entity" lemma="New_York" pos="N”netype="location"></term>
</terms>
ICT-211423
KAF chunks
<chunks>
<!-- John -->
<chunk cid="c1" span="t1" head="t1" pos="NP"/>
<!-- mathematics -->
<chunk cid="c2" span="t3" head="t3" pos="NP"/>
<!-- in New York -->
<chunk cid="c3" span="t7 t8" head="t4" pos="PP"/>
</chunks>
ICT-211423
KAF events
<events>
<event eid="e1" span="t2" lemma="teach" pos="V" eiid="ei1" class="OCCURRENCE"
tense="PAST" aspect="NONE" polarity="POS">
<roles>
<role cid="c1" role="agent"/>
<role cid="c2" role="subject"/>
<role cid="c3" role="location"/>
</roles>
</event>
</events>
ICT-211423
KAF quantifiers & time expressions
<!-- every -->
<quantifiers>
<quantifier qid=”q1” span=”t5”/>
</quantifiers>
<!-- 20 minutes every monday -->
<timexs>
<timex3 texid="tex1" span="t4 t5" type="DURATION" value="P20TM"/>
<timex3 texid="tex2" span="t5 t6" type="SET" value="xxxx-wxx-1" quant="EVERY"/>
<tlink timeID="tex1" relatedToTime="tex2" relType="IS_INCLUDED"/>
<tlink eventInstanceID="ei1" relatedToTime="tex1" relType="SIMULTANEOUS"/>
</timexs>
ICT-211423
What Tybots do...
Input are text documents Linguistic processors generate KAF annotation:
morpho-syntactic analysis semantic roles named entities wordnet and ontology mappings
Output are term hierarchies in TMF: structural parent relations quantified structural and semantic relations statistical data generalized semantic mappings
ICT-211423
ICT-211423
Polluting activities (?)
Basque Mountain range (?)
Oaktree mixed forest (?)
ICT-211423
Infomap BNC + SSI-Dijkstraassociate -n 20 -c BNCpos3prova "tropicalpa" "speciespn"tropical|a 0.953014species|n 0.953014birds|n 0.926641mammals|n 0.908901invertebrates|n 0.889433breeding|n 0.881263temperate|a 0.876306prey|n 0.873921bird|n 0.869077whales|n 0.865983insects|n 0.861247habitat|n 0.854986predators|n 0.853619butterflies|n 0.845556frogs|n 0.827578genus|n 0.827000fauna|n 0.822362arctic|a 0.821317habitats|n 0.820968seals|n 0.818886animals|n 0.815580...
ICT-211423
Infomap + SSI-Dijkstra[rigau@adimen MCRGraphDistances]$ ./SSI-Dijkstra-en30.plReading Graph from file ...Polysemous: tropical|a 4Polysemous: species|n 2Polysemous: breeding|n 5Polysemous: temperate|a 3Polysemous: prey|n 2Polysemous: bird|n 5Monosemous: habitat|n 1Polysemous: genus|n 2Polysemous: fauna|n 2Interpretation: breeding n 00914929-n 0.464285714285714 7 the production of animals or
plants by inbreeding or hybridization Interpretation: fauna n 00015388-n 0.5 1 a living organism characterized by voluntary
movement Interpretation: temperate a 02402559-a 0.383333333333333 5 (of weather or climate) free
from extremes; mild; or characteristic of such weather or climateInterpretation: habitat n 08580583-n 0 0 the type of environment in which an organism or
group normally lives or occursInterpretation: bird n 01503061-n 0.4375 8 warm-blooded egg-laying vertebrates
characterized by feathers and forelimbs modified as wings Interpretation: species n 08110373-n 0.416666666666667 2 (biology) taxonomic group
whose members can interbreed Interpretation: tropical a 02443907-a 0.347222222222222 6 relating to or situated in or
characteristic of the tropics (the region on either side of the equator)Interpretation: prey n 02152881-n 0.555555555555555 3 animal hunted or caught for food Interpretation: genus n 08108972-n 0.583333333333333 4 (biology) taxonomic group
containing one or more species
ICT-211423
Concept mining by Tybots
SourceDocuments
LinguisticProcessors
[[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP] NP
Morpho-syntactic analysis
English Wordnet
emission:2gas:1
area:1
greenhouse gas:1
rural area:1
geographical area:1
regio:3
location:3 substance:1
emission:3
farmland:2
natural process:1
in
ofTerm hierarchy
emissiongas
greenhouse gas
area
agricultural area
ConceptMiners
ICT-211423
Kybots, knowledge yielding robots
What kybots do? Mining module architecture Kybot profiles
Current capabilities Running kybots
XQuery Performance
Building Kybots Mining by example Machine Learning / Active Learning
Next steps
ICT-211423
Knowledge Mining Concept mining (Tybot)
Extract terms and relations in a language
Map the terms to an existing wordnet
Ontologize terms to concepts and axioms
Fact mining (Kybot) Define morpho-syntactic and semantic patterns in text
Extract events from text
Collect events and extract facts
For all languages! KAF (Kyoto Annotation Format) is the input of both:
Tybot: term extraction Kybot: fact extraction
ICT-211423
Linguistic Processors KAF (Kyoto Annotation Format)
English: Synthema Dutch: VUA Italian: Synthema Basque: EHU Spanish: EHU Chinese: AS Japanese: NICT
MW detection: VUA Word Sense Disambiguation module (UKB): EHU NE Tagger: Irion OntoTagger: CNR-ILC, EHU
ICT-211423
Linguistic Processors
KAF XML files include sections for: Word forms Terms / Items Chunks: grouping of sequences of terms Dependencies: syntactic relations between terms WSD: WN senses of the term Ontological references of the term:
Base Concepts Explicit ontology
Events Quantifiers, Time expressions, General Relations ...
ICT-211423
Fact Mining: Kybots
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003
+ Linguistic Processing: POS, chunks, dependencies, ...+ Semantic Processing: WSD (=>WN => ontology)
KAF
+ Kybot profiles: morphosyntactic + semantic patterns+ Mining Module: Events / Facts
ICT-211423
Mining Module Architecture
Central XML DB stores Documents (in all languages) Kybots (organized in libraries)
Kybots are executed using Xqueries on the XML DB
ICT-211423
Mining Module capabilities
Load KAF documents Converts KAF to internal representation
Explicit boundaries: sentence, paragraphs, etc. Indexing
Exporting to KAF Application of Kybots Listing content ...
ICT-211423
Kybot application
User uploads a Kybot profile to the collection User applies a Kybot (Kybot-pipeline) to Docs
Or a subset of docs (ex. only a language) Some Kybots add information to existing Docs
Events (layer 1) Some Kybots create new facts
FactAF (layer 2) Also, keep track of which kybot created which fact
ICT-211423
Layered Kybots
Layer 1 Kybots:Input: KAF (+ MW, WSD, NE, Ontological Information)Output: events (and roles)
Layer 2 Kybots: Input: events (from different docs, languages) Output: facts
ICT-211423
Kybot profiles
Use XML syntax to define the kybots Self descriptive (for manual Kybot creation) Powerful expressions
terms: POS Lemma Senses, Base Concepts Ontological references
suffix/prefix expressions conjunction, disjunction, optionality Negation
Efficient Able to manage thousands of KAF documents
ICT-211423
Fact Mining: Kybot profiles
Kybot profiles consist of: Expression Rules
Morpho-syntactic conditions on the LPs outcomes Flexible enough for dealing with all KAF outputs
Semantic conditions: WordNets + Ontologies Inferencing on WN / ontology !
Output Template Event / Fact descriptions
ICT-211423
Fact Mining: Kybot profiles
For each analysed sentence : IF
Expression Rules match and Semantic Conditions hold
THEN generate the Output Template
How to make efficient inferencing on WN / ontology? ... while processing very large volumes of KAF
WN => Nominal and Verbal Base Concepts ! Ontology => Explicit Ontology !
ICT-211423
Kybot profiles
<?xml version="1.0" encoding="utf-8"?>
<Kybot id="Generate_Pollution">
<variables><var name="X" type="term" pos="N"/><var name="Y" type="term" lemma="release | produce | generate | ! create"/><var name="Z" type="term" lemma="*pollution | pollutant | contaminant"/>
</variables>
<relations><root span="X"/><rel span="Y" pivot="X" direction="following"/><rel span="Z" pivot="Y" direction="following"/>
</relations>
<events><event target="$Y/@tid" lemma="$Y/@lemma" pos="$Y/@pos"/><role target="$X/@tid" rtype="source" lemma="$X/@lemma" pos="$X/@pos"/> <role target="$Z/@tid" rtype="patient" lemma="$Z/@lemma" pos="$Z/@pos"/>
</events>
</Kybot>
ICT-211423
Kybot profiles
<?xml version="1.0" encoding="utf-8"?>
<Kybot id="Generate_Pollution">
<variables><var name="X" type="term" pos="N"/><var name="Y" type="term" lemma="release | produce | generate | ! create"/><var name="Z" type="term" lemma="*pollution | pollutant | contaminant"/>
</variables>
<relations><root span="X"/><rel span="Y" pivot="X" direction="following"/><rel span="Z" pivot="Y" direction="following"/>
</relations>
<events><event target="$Y/@tid" lemma="$Y/@lemma" pos="$Y/@pos"/><role target="$X/@tid" rtype="source" lemma="$X/@lemma" pos="$X/@pos"/> <role target="$Z/@tid" rtype="patient" lemma="$Z/@lemma" pos="$Z/@pos"/>
</events>
</Kybot>
Variables
ICT-211423
Kybot profiles
<?xml version="1.0" encoding="utf-8"?>
<Kybot id="Generate_Pollution">
<variables><var name="X" type="term" pos="N"/><var name="Y" type="term" lemma="release | produce | generate | ! create"/><var name="Z" type="term" lemma="*pollution | pollutant | contaminant"/>
</variables>
<relations><root span="X"/><rel span="Y" pivot="X" direction="following"/><rel span="Z" pivot="Y" direction="following"/>
</relations>
<events><event target="$Y/@tid" lemma="$Y/@lemma" pos="$Y/@pos"/><role target="$X/@tid" rtype="source" lemma="$X/@lemma" pos="$X/@pos"/> <role target="$Z/@tid" rtype="patient" lemma="$Z/@lemma" pos="$Z/@pos"/>
</events>
</Kybot>
Relations
ICT-211423
Kybot profiles
<?xml version="1.0" encoding="utf-8"?>
<Kybot id="Generate_Pollution">
<variables><var name="X" type="term" pos="N"/><var name="Y" type="term" lemma="release | produce | generate | ! create"/><var name="Z" type="term" lemma="*pollution | pollutant | contaminant"/>
</variables>
<relations><root span="X"/><rel span="Y" pivot="X" direction="following"/><rel span="Z" pivot="Y" direction="following"/>
</relations>
<events><event target="$Y/@tid" lemma="$Y/@lemma" pos="$Y/@pos"/><role target="$X/@tid" rtype="source" lemma="$X/@lemma" pos="$X/@pos"/> <role target="$Z/@tid" rtype="patient" lemma="$Z/@lemma" pos="$Z/@pos"/>
</events>
</Kybot>
Output Template
ICT-211423
Kybot profiles: Output
<?xml ver si on="1. 0"?><kybot Out > <doc shor t name="1534. mw. wsd. ne. ont o. kaf "> <event t ar get ="t 886" l emma="gener at e" pos="V" ei d="e1" / > <r ol e t ar get ="t 884" r t ype="sour ce" l emma="wat er shed" . . . / > <r ol e t ar get ="t 892" r t ype="pat i ent " l emma="pol l ut i on" . . . / > </ doc> <doc shor t name="17795. mw. wsd. ne. ont o. kaf "> <event t ar get ="t 9690" l emma="r el ease" pos="V" ei d="e1" / > <r ol e t ar get ="t 9691" r t ype="pat i ent " l emma="pol l ut ant " . . . / > <r ol e t ar get ="t 9678" r t ype="sour ce" l emma="f uel " . . . / > <r ol e t ar get ="t 9680" r t ype="sour ce" l emma="heat i ng" . . . / > <r ol e t ar get ="t 9681" r t ype="sour ce" l emma="machi ner y" . . . / > <r ol e t ar get ="t 9683" r t ype="sour ce" l emma="equi pment " . . . / > <r ol e t ar get ="t 9686" r t ype="sour ce" l emma="househol d" . . . / > <r ol e t ar get ="t 9688" r t ype="sour ce" l emma="busi ness" . . . / > </ doc></ kybot Out >
ICT-211423
Kybot profiles: Ontological references<?xml ver si on="1. 0" encodi ng="ut f - 8"?><! - - N pr oduces changes of pol l ut i on l evel - - >
<Kybot i d="Changes_Pol l ut i on"><var i abl es>
<var name="A" t ype="t er m" pos="N" / ><var name="B" t ype="t er m" r ef er ence="Kyot o#measur e__quant i t y__amount - eng- 3. 0- 00033615- n" r ef t ype="SubCl assOf " / ><var name="C" t ype="t er m" pos="P" / ><var name="D" t ype="t er m" r ef er ence="Kyot o#cont ami nat i on__pol l ut i on- eng- 3. 0- 00276987- n" r ef t ype="SubCl assOf " / >
</ var i abl es><r el at i ons>
<r oot span="D" / ><r el span="C" pi vot ="D" di r ect i on="pr ecedi ng" / ><r el span="B" pi vot ="C" di r ect i on="pr ecedi ng" / ><r el span="A" pi vot ="B" di r ect i on="pr ecedi ng" / >
</ r el at i ons><event s>
<event t ar get ="$B/ @t i d" l emma="$B/ @l emma" pos="$B/ @pos" / ><r ol e t ar get ="$A/ @t i d" r t ype="agent " l emma="$A/ @l emma"
pos="$A/ @pos" / > <r ol e t ar get ="$D/ @t i d" r t ype="pat i ent " l emma="$D/ @l emma"
pos="$D/ @pos" / ></ event s></ Kybot >
ICT-211423
Kybots and OntoTagger
Ontology events roles Fillers e.g. BirdMigration (role: agent; filler: Bird)
Linguistic realizations: migration of birds bird migration birds migrate ...
ICT-211423
Explicit Ontology Explicit knowledge:
Kyoto#migration SubClassOf Kyoto#active-change-of-location Kyoto#migration Kyoto#done-by Collections.owl#physical-plurality
Implicit knowledge: Kyoto#migration SubClassOf Kyoto#change_of_location__movement_11-eng-3.0-00280586n
inherited Kyoto#migration SubClassOf Kyoto#change-eng-3.0-00191142-n inherited Kyoto#migration SubClassOf DOLCE-Lite.owl#accomplishment inherited Kyoto#migration SubClassOf DOLCE-Lite.owl#event inherited Kyoto#migration SubClassOf DOLCE-Lite.owl#perdurant inherited Kyoto#migration SubClassOf DOLCE-Lite.owl#spatio-temporal-particular inherited Kyoto#migration SubClassOf DOLCE-Lite.owl#particular inherited Kyoto#migration Kyoto#has-path DOLCE-Lite.owl#particular inherited Kyoto#migration Kyoto#has-destination DOLCE-Lite.owl#particular inherited Kyoto#migration Kyoto#has-source DOLCE-Lite.owl#particular inherited Kyoto#migration DOLCE-Lite.owl#has-quality DOLCE-Lite.owl#temporal-location_q inherited Kyoto#migration DOLCE-Lite.owl#specific-constant-constituent DOLCE-Lite.owl#perdurant
inherited Kyoto#migration DOLCE-Lite.owl#participant DOLCE-Lite.owl#endurant inherited Kyoto#migration DOLCE-Lite.owl#part DOLCE-Lite.owl#perdurant inherited Kyoto#migration DOLCE-Lite.owl#has-quality DOLCE-Lite.owl#temporal-quality inherited
ICT-211423
Kybot profiles: Output
<?xml ver si on="1. 0"?><kybot Out > <doc shor t name="11767. mw. wsd. ne. ont o. kaf "> <event t ar get ="t 3494" l emma="be" pos="V" ei d="e1" / > <r ol e t ar get ="t 3493" r t ype="agent " l emma="pol l ut i on" . . . / > <r ol e t ar get ="t 3504" r t ype="pat i ent " l emma=" i ndust r i al _f aci l i t y" . . . / > <event t ar get ="t 3687" l emma="change" pos="N" ei d=”e2”/ > <r ol e t ar get ="t 3683" r t ype="agent " l emma="pr eci pi t at i on" . . . / > <r ol e t ar get ="t 3690mw" r t ype="pat i ent " l emma="pol l ut i on l evel " . . . / > <event t ar get ="t 3737" l emma="be" pos="V" ei d="e3" / > <r ol e t ar get ="t 3736" r t ype="agent " l emma="pi pe" . . . / > <r ol e t ar get ="t 3742mw" r t ype="pat i ent " l emma="pol l ut i on l evel " . . . / > <event t ar get ="t 5833" l emma="change" pos="V" ei d="e4" / > <r ol e t ar get ="t 5826" r t ype="agent " l emma="al ga" . . . / > <r ol e t ar get ="t 5836mw" r t ype="pat i ent " l emma="pol l ut i on l evel " / > <event t ar get ="t 7378" l emma="be" pos="V" ei d="e5" / > <r ol e t ar get ="t 7377" r t ype="agent " l emma="t her e" / > <r ol e t ar get ="t 7383mw" r t ype="pat i ent " l emma="pol l ut ed st r eam" . . . / > </ doc></ kybot Out >
ICT-211423
Kybot profiles: Performance
Benckmark database 3 documents 26,137 word forms 96Mb KAF documents, 741Mb dbxml index
Estuary database 4.624 documents 3,091,181 word forms 8.2Gb KAF documents, 45Gb dbxml index
ICT-211423
Next steps Selecting the most appropriate senses Improve KAF representation for explicit ontology
Ontology concepts are coarser than senses Chunk level queries
Search for a term and then a chunk whose head is ... Inter-chunk searches
Search for a term and then, in the same chunk, another one which ...
Layer-2 Kybots Amalgamate events from several documents and
languages Generic Kybots Creating Kybots
Mining by example Machine learning / Active Learning
ICT-211423
Generic Kybots: Kybots and Ontology
Ontology Events, roles, Fillers e.g. BirdMigration (role: agent; filler: Bird)
Linguistic realizations: migration of birds bird migration birds migrate ...
OntoTagger: “migration” --> BirdMigration event (role: agent; filler: Bird) “migrate” --> BirdMigration event (role: agent; filler: Bird) “robin” --> Bird
ICT-211423
Generic Kybot: Rules
migration of birds N1 of N2 --> O1 event (role: agent; filler: O2) IF
N1 is event O1 AND N2 is concept O2 ANDO1 has role agent ANDO2 is (subsumed by) filler of agent of O1
robin migrate N V --> O1 event (role: agent; filler: O2) IF
V is event O1 ANDN is concept O2 ANDO1 has role agent ANDO2 is (subsumed by) filler of agent of O1
ICT-211423
Building Kybots: Mining by example
Kybots perform a complex Information Extraction (IE) task requiring expertise on: linguistic engineering knowledge engineering ...
but ... all this complexity could be hidden to the end-user
Our proposal is to build complex kybots using an advanced wiki system following a new approach: Mining by example
ICT-211423
Building Kybots: Mining by example
Kybot editor allows to mine by example the domain corpus for helping users to define Kybot profiles
Users define kybots of their interest ... Input:
a collection of captured domain documents a set of information needs or questions a set of textual snippets which support the
answers to the questions Output:
a collection of Kybot profiles
ICT-211423
Building Kybots: Mining by example
a) Use a basic IR system consulting the domain corpus. input: "population decline", "decrease population", ...
b) Inspecting the resulting snippets.
c) A kybot profile is defined selecting the relevant information from each snippet how many, where, when, ...
d) Kybots are applied on the document collection. Kybots use all the capabilities of the linguistic
processors, including domain wordnet, general wordnets, ontologies, inferencing, etc.
ICT-211423
Building Kybots: Mining by example information need:
“reduction of populations"
Looking for answers to the following questions: Which species? Degree of the reduction? Period of time?
Textual snippet supporting the answers: “Tropical terrestrial species populations declined by 55
percent on average from 1970 to 2003”
Resulting Kybot profile: kybot_decrease_of_population
ICT-211423
Building Kybots: Mining by example “Tropical terrestrial species populations declined by 55
per cent on average from 1970 to 2003”
declined is enriched now with KAF information: Word form: “declined” Part-of-speech: Verb Lemma: “decline” Linguistic references to other elements in text ... Ranked list of senses Wordnet information: Base Concepts, ... Ontological information, ... ...
ICT-211423
Building Kybots: Mining by example
http://xmlgroup.iit.cnr.it/cocoon/kybot/index.xql
ICT-211423
Building Kybots: Mining by example
http://xmlgroup.iit.cnr.it/cocoon/kybot/index.xql
ICT-211423
ICT-211423
WP5 Fact Mining
ICT-211423
WP5 Fact Mining
ICT-211423
Linguistic Processors
http://adimen.si.ehu.es/web/MCR
ICT-211423
Building Kybots: Mining by example
A Wiki system will allow users to select/edit KAF information for building kybot profiles
general linguistic and semantic patterns
For instance: kybot_decrease_of_population Looking for the degree of decrement:
55% 75 percent ...
when it is a decrement of population ... decline, worsen, ... concepts, base concepts, ontologies ... The class of verb of change followed by preposition followed
by... ...
ICT-211423
Open issues Expressivity of the Kybot profiles
Focussing on Dependencies ... Focusing on Chunks ... Combination of terms/dependencies/chunks Output templates / KAF transformations ...
Running kybots XSLT / XQUERY scripts Eficiency vs. expressivity Internal KAF representation for efficiency / indexing Combination of kybots ...
ICT-211423
KYOTO (ICT-211423) Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organizationhttp://www.kyoto-project.eu/
Kybots, knowledge yielding robotsGerman RigauIXA group, UPV/EHU
First Review MeetingMarch 17, 2009, Luxembourg