TEMPLATE-DRIVEN KNOWLEDGE MINING.
KNOWLEDGE PROSPECTOR.NET
Project team (Knowledge.Net): Anton V. Novikov, Maxim V. Sigalin, Alexey L. Smolyakov, Dmitry G. Cherepanov
Saint-Petersburg State University
Speaker: Alexey L. Smolyakov
Scientific Adviser: Prof. Vladimir V. Safonov
Project goals
Flexible framework
Supporting different languages
Integration with Knowledge.Net
Algorithm
Getting documents and first-step text analysis
Morphological analysis of text blocks
Semantic analysis of entity sets using templates
Optimizing resulting graph
Saving results
Getting documents and first-step text analysis
Getting documents from providers
Divide document into articles (just text, list, table etc.)
Divide text into blocks
…
Text format is a very flexible way to describe various types of information…
1) One
2) Two
3) Three
Country. Capital.
England. London.
Ukraine. Kiev.
Morphological analysis of text blocks
Language recognition
Morphological form recognition using dictionaries
Creating entities
Word(«Documents»)
«Documents»: current morphological form: Noun, plural
«Document»: base morphological form: Noun, singular
Russian English …
MRD XML …
Entity Class(«Document»)
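The lookup from a word to its base form and then to an entity can be sketched roughly as follows. This is a minimal illustration that assumes a toy in-memory dictionary; the names `TOY_DICTIONARY`, `MorphForm`, `Entity` and `analyze_word` are hypothetical and do not reflect the MRD dictionary format or the actual Knowledge Prospector.NET API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MorphForm:
    base: str            # base (dictionary) form of the word
    part_of_speech: str  # e.g. "Noun"
    number: str          # "singular" or "plural"


@dataclass(frozen=True)
class Entity:
    kind: str  # "Class" or "Unknown" in this sketch
    name: str


# Hypothetical toy dictionary: surface form -> morphological description.
TOY_DICTIONARY = {
    "documents": MorphForm("document", "Noun", "plural"),
    "document": MorphForm("document", "Noun", "singular"),
}


def analyze_word(word: str) -> Entity:
    """Look the word up, reduce it to its base form, and create an entity."""
    form: Optional[MorphForm] = TOY_DICTIONARY.get(word.lower())
    if form is None:
        # Words missing from the dictionary become "unknown" entities.
        return Entity("Unknown", word)
    if form.part_of_speech == "Noun":
        # Assumption of this sketch: nouns map to class entities.
        return Entity("Class", form.base.capitalize())
    return Entity("Unknown", word)


entity = analyze_word("Documents")
print(entity)
```

Mapping nouns to class entities is an assumption made here for illustration; the slides only show this single example.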
Morphological analysis > Entities types > “Simple” entities
Entity “separator”. Example: «.,;:!?()[]{}…»
Entity “unknown”
Entity “changeable”. Example: «good»
Entity “relationship”. Example: «Planet Earth is LESS than Sun»
Morphological analysis > Entities types > “True” entities
Entity “class”. Example: «document»
Entity “property”. Example: «useful»
Entity “datatype”: Datetime, Integer
Semantic analysis > Goals
Creating relationships between entities
Creating new entities
Adding true entities into resulting graph
[Diagram: example graph with nodes Property(«comfortable»), Class(«house»), Class(«building»), Property(«brick»); edges labelled «subclass», «property-class», «property-class»]
Semantic analysis > Relationship types
Relationship between property and class
Relationship “subclass”
Relationship “subproperty”
Relationship “equality”
Relationship between two classes
Relationship “conditional rule”
Semantic analysis > Template description
Priority
Pattern
Handlers
<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
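To make the three parts of a template concrete, the sketch below reads the template above with Python's standard `xml.etree.ElementTree` module. It is only an illustration: the function name `load_template` and the returned dictionary layout are assumptions, and treating handler arguments as zero-based positions of pattern items is inferred from the example rather than stated in the slides.

```python
import xml.etree.ElementTree as ET

TEMPLATE_XML = """
<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
"""


def load_template(xml_text):
    """Parse one <Template> element into priority, pattern items and handlers."""
    root = ET.fromstring(xml_text.strip())
    handlers = [
        # Each handler: (name, list of integer arguments).
        (h.get("Name"), [int(a) for a in h.get("Arguments").split(",")])
        for h in root.findall("Handler")
    ]
    return {
        "priority": int(root.get("Priority")),
        # Pattern items are separated by whitespace in this example.
        "pattern": root.get("Pattern").split(),
        "handlers": handlers,
    }


template = load_template(TEMPLATE_XML)
print(template["priority"], template["pattern"], template["handlers"])
```

Under this reading, argument 5 of the second handler refers to the last pattern item, the second `#E.P`.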
Semantic analysis > Pattern description
Logical operands: «&» (and), «|» (or), «^» (not)
Occurrence: not set (once), «+», «*», «?»
#E.P, #E.C, #E.S, #E.U, #E.Int, #E.DateTime - entity types
#M.Noun, #M.Adjective, #M.Verb, … - morphological forms
#W.Month, #W.Number, … - words holder
#H.Class, … - clauses holder
[#E.P #M.Adjective]+ [#E.C #M.Noun]
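A rough sketch of how such a pattern could be matched against a sequence of tagged tokens is given below. It makes several assumptions that the slides do not confirm: a bracketed group is read as "one token carrying all of the listed tags", only the occurrence markers are supported (not the logical operands or holders), and each token is modelled as a plain set of tag strings. The names `parse_pattern` and `match_at` are hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split a pattern into (required_tags, occurrence) elements."""
    elements = []
    for grouped, occ1, single, occ2 in re.findall(
            r"\[([^\]]+)\]([+*?]?)|(#[\w.]+)([+*?]?)", pattern):
        tags = frozenset(grouped.split()) if grouped else frozenset([single])
        elements.append((tags, occ1 or occ2))
    return elements


def match_at(elements, tokens, start):
    """Return the end index of a match beginning at `start`, or None."""
    def step(ei, ti):
        if ei == len(elements):
            return ti
        tags, occ = elements[ei]
        low = 0 if occ in ("*", "?") else 1
        high = 1 if occ in ("", "?") else len(tokens) - ti
        # Count consecutive tokens that carry every required tag.
        run = 0
        while run < high and ti + run < len(tokens) and tags <= tokens[ti + run]:
            run += 1
        # Greedy: try the longest run first, then back off.
        for count in range(run, low - 1, -1):
            end = step(ei + 1, ti + count)
            if end is not None:
                return end
        return None
    return step(0, start)


elements = parse_pattern("[#E.P #M.Adjective]+ [#E.C #M.Noun]")
# Hypothetical tagging of three tokens: two adjectives/properties, one noun/class.
tokens = [
    {"#E.P", "#M.Adjective"},
    {"#E.P", "#M.Adjective"},
    {"#E.C", "#M.Noun"},
]
print(match_at(elements, tokens, 0))
```

Backtracking is used so that a greedy `+` or `*` element does not swallow tokens that a later element needs.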
Semantic analysis > Pattern description > Clauses holder
<ClauseHolder Name="Class">
  <Item Pattern="[#E.P #M.Adjective]* #E.C" Index="1" />
  <Item Pattern="[#E.P #M.Adjective] , [#E.P #M.Adjective] #E.C" Index="2" />
</ClauseHolder>
Words holder
<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
  ...
</WordHolder>
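As an illustration of what a words holder provides, the sketch below loads the `Month` holder into a lookup table and resolves a word to its value. Only the three items shown on the slide are included. The helper names `load_word_holder` and `holder_value` are hypothetical, and the case-insensitive comparison is an assumption.

```python
import xml.etree.ElementTree as ET

WORD_HOLDER_XML = """
<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
</WordHolder>
"""


def load_word_holder(xml_text):
    """Return the holder name and a mapping from upper-cased word to value."""
    root = ET.fromstring(xml_text.strip())
    words = {
        item.get("Word").upper(): item.get("Value")
        for item in root.findall("Item")
    }
    return root.get("Name"), words


def holder_value(words, word):
    """Value for `word` if the holder lists it, otherwise None."""
    return words.get(word.upper())


name, month_words = load_word_holder(WORD_HOLDER_XML)
print(name, holder_value(month_words, "February"))
```

A pattern item such as `#W.Month` would then match any token whose word appears in this table.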
Semantic analysis > Handlers
Replace
Create datetime entity
Create «property-class» relationship
Create «subclass» relationship
Create «subproperty» relationship
Create «conditional rule» relationship
Create «class-class» relationship
Semantic analysis > Creating relationships
Property(«useful») Class(«document»)
+
<Template Priority="4" Pattern="[#E.P #M.Adjective]+ [#E.C #M.Noun]">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
</Template>
=
Property(«useful») Class(«document»)
«property-class» relationship
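The step from matched entities to a new relationship can be sketched as a small handler dispatch. This is a simplified illustration, not the project's implementation: it assumes each pattern element matched exactly one entity, stores the graph as a set of triples, and uses hypothetical names (`Entity`, `property_relationship`, `HANDLERS`, `run_handlers`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # "Property" or "Class"
    name: str


def property_relationship(matched, args, graph):
    """Link the entity at args[0] (property) to the entity at args[1] (class)."""
    prop, cls = matched[args[0]], matched[args[1]]
    graph.add((prop, "property-class", cls))


# Handler name -> function; only one handler is sketched here.
HANDLERS = {"PropertyRelationship": property_relationship}


def run_handlers(handler_specs, matched, graph):
    """Apply every (name, arguments) handler of a template to the match."""
    for name, args in handler_specs:
        HANDLERS[name](matched, args, graph)


matched = [Entity("Property", "useful"), Entity("Class", "document")]
graph = set()
run_handlers([("PropertyRelationship", [0, 1])], matched, graph)
print(graph)
```

With `Arguments="0, 1"`, the property at position 0 is attached to the class at position 1, which corresponds to the result shown above.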
Semantic analysis > Creating new entities
Integer(«7») Class(«December») Integer(«2006») Class(«Year»)
+
<Template Priority="11000" Pattern="#E.INT #W.Month #E.INT year">
  <Handler Name="Replace" From="0" Count="4">
    <CreateEntityHandler Name="CreateDateTime" Arguments="day=0, month=1, year=2" />
  </Handler>
</Template>
=
Datetime (7.12.2006)
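The replacement step can be sketched as follows: a window of `Count` entities starting at `From` is replaced by a single datetime entity built from the positions named in `Arguments`. The sketch assumes entities are simple `(kind, text)` pairs and that month names are English; the function name `create_datetime` and the `MONTHS` table are hypothetical.

```python
from datetime import date

MONTH_NAMES = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
    "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]
# Month word -> month number, in the spirit of the Month words holder.
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}


def create_datetime(entities, start, count, day, month, year):
    """Replace entities[start:start+count] with one Datetime entity.

    `day`, `month` and `year` are positions inside the replaced window.
    """
    window = entities[start:start + count]
    value = date(
        int(window[year][1]),
        MONTHS[window[month][1].upper()],
        int(window[day][1]),
    )
    return entities[:start] + [("Datetime", value)] + entities[start + count:]


entities = [
    ("Integer", "7"), ("Class", "December"),
    ("Integer", "2006"), ("Class", "Year"),
]
result = create_datetime(entities, start=0, count=4, day=0, month=1, year=2)
print(result)
```

The fourth entity, the literal word «year», carries no value of its own and is simply consumed by the replacement.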
Optimizing resulting graph
Removing redundant «subclass» relationships
Removing redundant relationships between properties and classes
[Diagram: example graph with nodes Class(«bus»), Class(«transport»), Class(«vehicle»), Property(«fast»); three edges labelled «subclass» and two labelled «property-class»]
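One plausible reading of the two optimizations is sketched below. A «subclass» edge is treated as redundant when the same ancestor is still reachable through other «subclass» edges, and a «property-class» edge is treated as redundant when the same property is already attached to an ancestor class. Both criteria, the direction of the edges in the test data, and the assumption of an acyclic hierarchy are guesses about the method, not statements from the slides; the function names are hypothetical.

```python
def ancestors(cls, subclass_edges, skip=None):
    """Classes reachable from `cls` via (child, parent) edges, ignoring `skip`."""
    seen, stack = set(), [cls]
    while stack:
        node = stack.pop()
        for child, parent in subclass_edges:
            if child == node and (child, parent) != skip and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def remove_redundant_subclass(subclass_edges):
    """Drop every subclass edge that is implied by a longer path."""
    kept = set(subclass_edges)
    for edge in sorted(subclass_edges):
        child, parent = edge
        if parent in ancestors(child, kept, skip=edge):
            kept.discard(edge)
    return kept


def remove_redundant_property(property_edges, subclass_edges):
    """Drop (property, class) edges already present on an ancestor class."""
    return {
        (prop, cls)
        for prop, cls in property_edges
        if not any(
            (prop, anc) in property_edges
            for anc in ancestors(cls, subclass_edges)
        )
    }


subclass = {("bus", "transport"), ("transport", "vehicle"), ("bus", "vehicle")}
properties = {("fast", "bus"), ("fast", "vehicle")}

reduced_subclass = remove_redundant_subclass(subclass)
reduced_properties = remove_redundant_property(properties, reduced_subclass)
print(reduced_subclass, reduced_properties)
```

In this example the direct bus-to-vehicle edge is dropped because the path through «transport» already implies it, and «fast» stays only on the most general class.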
Saving results
Saving acquired knowledge into Knowledge.Net format
Saving acquired knowledge into OWL
Saving (and loading) knowledge from own binary format files
Current project status
Developed working prototype
Created test templates
Attached «Mrd» dictionary (Russian and English)
Plans
Support for creating «compound» entities (composed of several words: «creation of human hands»)
Functionality extension (adding new entities, relationships, templates, handlers, …)
Program for generating templates
Developing good examples
?
Contact information:
[email protected]
http://www.knowledge-net.ru
http://polyhimnie.math.spbu.ru