TEMPLATE-DRIVEN KNOWLEDGE MINING. KNOWLEDGE PROSPECTOR.NET Project team (Knowledge.Net) Anton V....

Post on 16-Jan-2016


TEMPLATE-DRIVEN KNOWLEDGE MINING.

KNOWLEDGE PROSPECTOR.NET

Project team (Knowledge.Net): Anton V. Novikov, Maxim V. Sigalin, Alexey L. Smolyakov, Dmitry G. Cherepanov

Saint-Petersburg State University

Speaker: Alexey L. Smolyakov

Scientific Adviser: prof. Vladimir V. Safonov

Project goals

Flexible framework
Supporting different languages
Integration with Knowledge.Net

Algorithm

Getting documents and first-step text analysis
Morphological analysis of text blocks
Semantic analysis of entity sets using templates
Optimizing resulting graph
Saving results

Getting documents and first-step text analysis

Getting documents from providers

Divide document into articles (just text, list, table etc.)

Divide text into blocks

Text format is a very flexible way to describe various types of information…

1) One
2) Two
3) Three

Country. Capital.
England. London.
Ukraine. Kiev.
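The first-step analysis above can be sketched as follows. This is only an illustration, not the project's implementation: the function names, the blank-line article separator, the list and table heuristics, and the sentence-based block splitting are all assumptions made for the example.

```python
import re


def split_into_articles(document: str) -> list:
    """Split a document into articles (assumed to be separated by blank lines)."""
    return [a.strip() for a in re.split(r"\n\s*\n", document) if a.strip()]


def classify_article(article: str) -> str:
    """Guess the article kind ("text", "list" or "table") with simple heuristics."""
    lines = [ln.strip() for ln in article.splitlines() if ln.strip()]
    # Assumed list rule: several lines, each starting with "1)", "1.", "-", "*" or "•".
    if len(lines) > 1 and all(re.match(r"^(\d+[.)]|[-*•])\s*\S", ln) for ln in lines):
        return "list"
    # Assumed table rule: several lines with the same number (>1) of tab-separated cells.
    cell_counts = {len(ln.split("\t")) for ln in lines}
    if len(lines) > 1 and len(cell_counts) == 1 and cell_counts.pop() > 1:
        return "table"
    return "text"


def split_into_blocks(text: str) -> list:
    """Split plain text into blocks (assumed here to be sentences)."""
    return [b.strip() for b in re.split(r"(?<=[.!?…])\s+", text) if b.strip()]


document = (
    "Text format is flexible. It describes information.\n\n"
    "1) One\n2) Two\n3) Three\n\n"
    "Country\tCapital\nEngland\tLondon\nUkraine\tKiev"
)
articles = split_into_articles(document)
kinds = [classify_article(a) for a in articles]
blocks = split_into_blocks(articles[0])
```

A real provider would likely carry richer structure (markup, layout hints) than plain text, so the heuristics here stand in for whatever the framework actually uses.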

Morphological analysis of text blocks

Language recognition

Morphological form recognition using dictionaries

Creating entities

Word(«Documents»)

«Documents», current morphological form: Noun, plural
«Document», base morphological form: Noun, singular

Russian English …

MRD XML …

Entity Class(«Document»)
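A minimal sketch of the dictionary lookup and entity creation shown above. The toy dictionary, the class names, and the rule "noun becomes a class, adjective becomes a property" are assumptions for illustration; the real system reads morphological forms from dictionaries such as MRD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MorphForm:
    base: str                      # base morphological form, e.g. "document"
    part_of_speech: str            # e.g. "Noun", "Adjective"
    number: Optional[str] = None   # e.g. "singular", "plural"


@dataclass(frozen=True)
class Entity:
    kind: str    # "class", "property", "separator" or "unknown"
    value: str


# Hypothetical in-memory dictionary standing in for a real morphological dictionary.
DICTIONARY = {
    "documents": MorphForm("document", "Noun", "plural"),
    "document": MorphForm("document", "Noun", "singular"),
    "useful": MorphForm("useful", "Adjective"),
}
SEPARATORS = set(".,;:!?()[]{}…")


def make_entity(word: str) -> Entity:
    """Create an entity from a word using its base morphological form."""
    if word in SEPARATORS:
        return Entity("separator", word)
    form = DICTIONARY.get(word.lower())
    if form is None:
        return Entity("unknown", word)
    if form.part_of_speech == "Noun":
        return Entity("class", form.base)
    if form.part_of_speech == "Adjective":
        return Entity("property", form.base)
    return Entity("unknown", word)


entity = make_entity("Documents")
```

Here `make_entity("Documents")` yields a class entity for the base form «document», mirroring the Word(«Documents») to Class(«Document») step.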

Morphological analysis > Entities types > “Simple” entities

Entity “separator”. Example: «.,;:!?()[]{}…»
Entity “unknown”
Entity “changeable”. Example: «good»
Entity “relationship”. Example: «Planet Earth is LESS than Sun»

Morphological analysis > Entities types > “True” entities

Entity “class”. Example: «document»
Entity “property”. Example: «useful»
Entity “datatype”: Datetime, Integer

Semantic analysis > Goals

Creating relationships between entities
Creating new entities
Adding true entities into the resulting graph

Property(«comfortable»)

Class(«house»)

Class(«building»)

Property(«brick»)

Subclass

Property-Class

Property-Class

Semantic analysis > Relationship types

Relationship between property and class
Relationship “subclass”
Relationship “subproperty”
Relationship “equality”
Relationship between two classes
Relationship “conditional rule”

Semantic analysis > Template description

Priority
Pattern
Handlers

<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
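To make the three parts of a template concrete, here is a small sketch that loads the template above into a Python structure. The `Template` and `Handler` classes and the `parse_template` function are hypothetical names for this example; only the XML attributes come from the slide, and the handler arguments are assumed to be indices of pattern items.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

TEMPLATE_XML = """
<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
"""


@dataclass
class Handler:
    name: str
    arguments: list   # assumed to be indices of pattern items


@dataclass
class Template:
    priority: int
    pattern_items: list
    handlers: list


def parse_template(xml_text: str) -> Template:
    """Read priority, pattern and handlers from a <Template> element."""
    root = ET.fromstring(xml_text)
    handlers = [
        Handler(
            h.get("Name"),
            [int(a) for a in h.get("Arguments", "").split(",") if a.strip()],
        )
        for h in root.findall("Handler")
    ]
    return Template(int(root.get("Priority")), root.get("Pattern").split(), handlers)


template = parse_template(TEMPLATE_XML)
```

With the pattern split on whitespace, the six items are numbered 0 to 5, which is consistent with the handlers referring to items 0, 1 and 5.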

Semantic analysis > Pattern description

Logical operands: «&» (and), «|» (or), «^» (not)
Occurrence: not set (once), «+», «*», «?»
#E.P, #E.C, #E.S, #E.U, #E.Int, #E.DateTime
#M.Noun, #M.Adjective, #M.Verb, …
#W.Month, #W.Number, … (words holders)
#H.Class, … (clauses holders)

[#E.P #M.Adjective]+ [#E.C #M.Noun]
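The occurrence markers can be illustrated with a small backtracking matcher. This is a sketch under several assumptions: a bracketed group is taken to mean "one word that satisfies all listed conditions", the `Token` class is hypothetical, and the logical operands and the #W/#H holders are not handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    entity: str   # entity code, e.g. "P" (property) or "C" (class)
    morph: str    # morphological form, e.g. "Adjective" or "Noun"


def parse_pattern(pattern: str) -> list:
    """Split a pattern into (conditions, occurrence) items."""
    items = []
    for raw in re.findall(r"\[[^\]]*\][+*?]?|\S+", pattern):
        occurrence = ""
        if len(raw) > 1 and raw[-1] in "+*?":
            raw, occurrence = raw[:-1], raw[-1]
        items.append((raw.strip("[]").split(), occurrence))
    return items


def satisfies(token: Token, condition: str) -> bool:
    if condition.startswith("#E."):
        return token.entity == condition[3:]
    if condition.startswith("#M."):
        return token.morph == condition[3:]
    return token.text.lower() == condition.lower()   # literal word


def match_at(items: list, tokens: list, start: int = 0):
    """Return the end index of a match beginning at `start`, or None."""
    def go(i: int, pos: int):
        if i == len(items):
            return pos
        conditions, occurrence = items[i]
        lowest = 0 if occurrence in ("*", "?") else 1
        highest = 1 if occurrence in ("", "?") else len(tokens) - pos
        count = 0
        while (count < highest and pos + count < len(tokens)
               and all(satisfies(tokens[pos + count], c) for c in conditions)):
            count += 1
        for taken in range(count, lowest - 1, -1):   # greedy, then back off
            end = go(i + 1, pos + taken)
            if end is not None:
                return end
        return None
    return go(0, start)


items = parse_pattern("[#E.P #M.Adjective]+ [#E.C #M.Noun]")
tokens = [
    Token("useful", "P", "Adjective"),
    Token("old", "P", "Adjective"),
    Token("document", "C", "Noun"),
]
end = match_at(items, tokens)
```

For the three tokens above the match consumes all of them, since «+» lets the first item absorb both adjectives before the noun is matched.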

Semantic analysis > Pattern description > Holders

Words holder

<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
  ...
</WordHolder>

Clauses holder

<ClauseHolder Name="Class">
  <Item Pattern="[#E.P #M.Adjective]* #E.C" Index="1" />
  <Item Pattern="[#E.P #M.Adjective] , [#E.P #M.Adjective] #E.C" Index="2" />
</ClauseHolder>

Semantic analysis > Handlers

Replace
Create datetime entity
Create «property-class» relationship
Create «subclass» relationship
Create «subproperty» relationship
Create «conditional rule» relationship
Create «class-class» relationship

Semantic analysis > Creating relationships

Property(«useful») Class(«document»)

+

<Template Priority="4" Pattern="[#E.P #M.Adjective]+ [#E.C #M.Noun]">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
</Template>

=

Property(«useful») Class(«document»)

«property-class» relationship
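The effect of the PropertyRelationship handler in this example can be sketched as below. The function name, the tuple representation of a relationship, and the idea that each pattern item yields a list of matched entities are assumptions for illustration only.

```python
def property_relationship(matched, property_index, class_index):
    """Link every property matched by one pattern item to every class matched by another."""
    return [
        (prop, "property-class", cls)
        for prop in matched[property_index]
        for cls in matched[class_index]
    ]


# Entities matched by pattern items 0 and 1 of the template above.
matched = [["useful"], ["document"]]
relationships = property_relationship(matched, 0, 1)
```

Because item 0 carries the «+» marker, it may match several properties; under the assumption made here, each of them would be linked to the class.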

Semantic analysis > Creating new entities

Integer(«7») Class(«December») Integer(«2006») Class(«Year»)

+

<Template Priority="11000" Pattern="#E.INT #W.Month #E.INT year">
  <Handler Name="Replace" From="0" Count="4">
    <CreateEntityHandler Name="CreateDateTime" Arguments="day=0, month=1, year=2" />
  </Handler>
</Template>

=

Datetime (7.12.2006)
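A sketch of what the Replace and CreateDateTime handlers do in this example: four matched entities are replaced by one datetime entity. The tuple representation of entities, the function names and the month table are assumptions; the indices follow the template's arguments (day=0, month=1, year=2, From=0, Count=4).

```python
from datetime import date

# Month lookup in the spirit of the "Month" words holder (hypothetical table).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
         "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"],
        start=1,
    )
}


def create_datetime(entities, day, month, year):
    """Build a datetime entity from the entities at the given indices."""
    return (
        "Datetime",
        date(
            int(entities[year][1]),
            MONTHS[entities[month][1].upper()],
            int(entities[day][1]),
        ),
    )


def replace(entities, start, count, new_entity):
    """Replace `count` entities beginning at `start` with a single new entity."""
    return entities[:start] + [new_entity] + entities[start + count:]


entities = [("Integer", "7"), ("Class", "December"), ("Integer", "2006"), ("Class", "Year")]
new_entity = create_datetime(entities, day=0, month=1, year=2)
result = replace(entities, 0, 4, new_entity)
```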

Optimizing resulting graph

Removing redundant «subclass» relationships

Removing redundant relationships between properties and classes

Class(«bus»)

Class(«transport») Property(«fast»)

Subclass Property-class

Class(«vehicle»)

Subclass Subclass Property-class
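Removing redundant «subclass» relationships can be viewed as a transitive reduction of the class hierarchy: a direct link is dropped when the same superclass is still reachable through other links. The sketch below assumes an acyclic hierarchy and a particular reading of the diagram (bus under transport, transport under vehicle, plus a direct bus to vehicle link); neither the direction of the links nor the project's actual algorithm is stated on the slide.

```python
def reachable(edges, start, goal, skip):
    """Check whether `goal` can be reached from `start` without using edge `skip`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for sub, sup in edges:
            if sub == node and (sub, sup) != skip and sup not in seen:
                if sup == goal:
                    return True
                seen.add(sup)
                stack.append(sup)
    return False


def remove_redundant_subclass(edges):
    """Drop every (subclass, superclass) edge implied by a longer path."""
    edges = set(edges)
    for edge in sorted(edges):
        if reachable(edges, edge[0], edge[1], skip=edge):
            edges.discard(edge)
    return edges


subclass_edges = {
    ("bus", "transport"),
    ("transport", "vehicle"),
    ("bus", "vehicle"),   # redundant under the assumed hierarchy
}
reduced = remove_redundant_subclass(subclass_edges)
```

The same idea could be applied to «property-class» relationships once it is decided in which direction properties are inherited along the hierarchy.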

Saving results

Saving acquired knowledge into Knowledge.Net format
Saving acquired knowledge into OWL
Saving knowledge to (and loading it from) the project's own binary format files

Current project status

Developed a working prototype
Created test templates
Attached «Mrd» dictionary (Russian and English)

Plans

Support creating «compound» entities (composed of several words: «creation of human hands»)
Functionality extension (adding new entities, relationships, templates, handlers, …)
Program for generating templates
Developing good examples

?

Contact information:
smlkvalex@mail.ru
http://www.knowledge-net.ru
http://polyhimnie.math.spbu.ru