TEMPLATE-DRIVEN KNOWLEDGE MINING. KNOWLEDGE PROSPECTOR.NET Project team ( Knowledge.Net) Anton V. Novikov Maxim V. Sigalin Alexey L. Smolyakov Dmitry G. Cherepanov Saint-Petersburg State University Speaker Alexey L. Smolyakov Scientific Adviser prof. Vladimir V. Safonov


Page 1:

TEMPLATE-DRIVEN KNOWLEDGE MINING.

KNOWLEDGE PROSPECTOR.NET

Project team (Knowledge.Net)

Anton V. Novikov

Maxim V. Sigalin

Alexey L. Smolyakov

Dmitry G. Cherepanov

Saint-Petersburg State University

Speaker: Alexey L. Smolyakov

Scientific Adviser: prof. Vladimir V. Safonov

Page 2:

Project goals

Flexible framework

Supporting different languages

Integration with Knowledge.Net

Page 3:

Algorithm

Getting documents and first-step text analysis

Morphological analysis of text blocks

Semantic analysis of entity sets using templates

Optimizing resulting graph

Saving results

Page 4:

Getting documents and first-step text analysis

Getting documents from providers

Divide document into articles (just text, list, table etc.)

Divide text into blocks

Example of plain text:

Text format is a very flexible way to describe various types of information…

Example of a list:

1) One
2) Two
3) Three

Example of a table:

Country. Capital.
England. London.
Ukraine. Kiev.
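The first-step analysis above (document into articles, articles typed as text, list or table, text into blocks) can be sketched roughly as follows. This is only an illustration: the function names, the blank-line article delimiter, the "N)" list marker and the tab-separated table rows are all assumptions, not the project's actual rules.

```python
import re
from typing import List

def split_articles(document: str) -> List[str]:
    """Split a document into articles at blank lines (an assumed delimiter)."""
    return [a.strip() for a in re.split(r"\n\s*\n", document) if a.strip()]

def classify_article(article: str) -> str:
    """Label an article as 'list', 'table' or 'text' using simple heuristics."""
    lines = [ln for ln in article.splitlines() if ln.strip()]
    # Assumption: every line of a list starts with a number and ')' or '.'.
    if lines and all(re.match(r"\s*\d+[).]\s", ln) for ln in lines):
        return "list"
    # Assumption: table rows are tab-separated and all have the same width.
    widths = [len(ln.split("\t")) for ln in lines]
    if lines and widths[0] > 1 and len(set(widths)) == 1:
        return "table"
    return "text"

def split_blocks(text: str) -> List[str]:
    """Split running text into sentence-like blocks after '.', '!', '?' or '…'."""
    return [b.strip() for b in re.split(r"(?<=[.!?…])\s+", text) if b.strip()]

document = (
    "Text format is a very flexible way. It describes many things.\n\n"
    "1) One\n2) Two\n3) Three\n\n"
    "Country\tCapital\nEngland\tLondon\nUkraine\tKiev"
)
articles = split_articles(document)
kinds = [classify_article(a) for a in articles]
blocks = split_blocks(articles[0])
```

Here `kinds` holds one label per article and `blocks` holds the sentence-like blocks of the first (plain text) article.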

Page 5:

Morphological analysis of text blocks

Language recognition

Morphological form recognition using dictionaries

Creating entities

Word(«Documents»)

«Documents», current morphological form: Noun, plural

«Document», base morphological form: Noun, singular

Languages: Russian, English, …

Dictionaries: MRD, XML, …

Entity Class(«Document»)
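A minimal sketch of the step from a word to an entity, assuming a toy in-memory dictionary instead of the real MRD or XML dictionaries. The `Entity` type, the `analyze` function and the noun-to-class and adjective-to-property mapping are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical toy dictionary: surface form -> (base form, part of speech, number).
TOY_DICTIONARY = {
    "documents": ("document", "Noun", "plural"),
    "document": ("document", "Noun", "singular"),
    "useful": ("useful", "Adjective", None),
}
SEPARATORS = set(".,;:!?()[]{}")

@dataclass(frozen=True)
class Entity:
    kind: str  # "class", "property", "separator" or "unknown"
    base: str  # base morphological form of the word
    pos: str   # part of speech, empty when not known

def analyze(word: str) -> Entity:
    """Look a word up in the dictionary and wrap its base form in an entity."""
    if word in SEPARATORS:
        return Entity("separator", word, "")
    entry = TOY_DICTIONARY.get(word.lower())
    if entry is None:
        return Entity("unknown", word, "")
    base, pos, _number = entry
    # Assumption: nouns become classes and adjectives become properties.
    kind = {"Noun": "class", "Adjective": "property"}.get(pos, "unknown")
    return Entity(kind, base, pos)

entity = analyze("Documents")
```

With this toy dictionary, the plural «Documents» is reduced to its base form and yields a class entity for «document».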

Page 6:

Morphological analysis > Entity types > “Simple” entities

Entity “separator”. Example: «.,;:!?()[]{}…»

Entity “unknown”

Entity “changeable”. Example: «good»

Entity “relationship”. Example: «Planet Earth is LESS than Sun»

Page 7:

Morphological analysis > Entity types > “True” entities

Entity “class”. Example: «document»

Entity “property”. Example: «useful»

Entity “datatype”: Datetime, Integer

Page 8:

Semantic analysis > Goals

Creating relationships between entities

Creating new entities

Adding true entities into resulting graph

Example graph: Class(«house»), Class(«building»), Property(«comfortable») and Property(«brick»), connected by one «subclass» and two «property-class» relationships.

Page 9:

Semantic analysis > Relationship types

Relationship between property and class

Relationship “subclass”

Relationship “subproperty”

Relationship “equality”

Relationship between two classes

Relationship “conditional rule”

Page 10:

Semantic analysis > Template description

Priority

Pattern

Handlers

<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>

The pattern literals «а» ("and") and «значить» ("to mean") are Russian words.
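As an illustration of how the three parts of a template (priority, pattern, handlers) could be read, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `parse_template` function and the dictionary it returns are assumptions for this example, not the project's own loader.

```python
import xml.etree.ElementTree as ET

TEMPLATE_XML = """
<Template Priority="10000" Pattern="#E.P #E.C ,? а? значить #E.P">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
  <Handler Name="PropertyRelationship" Arguments="5, 1" />
  <Handler Name="ConditionalRule" Arguments="1, 0, 5" />
</Template>
"""

def parse_template(xml_text):
    """Read priority, pattern elements and handlers from one <Template>."""
    root = ET.fromstring(xml_text)
    handlers = [
        # Arguments are treated as indexes of pattern elements (an assumption).
        (h.get("Name"), [int(a) for a in h.get("Arguments").split(",")])
        for h in root.findall("Handler")
    ]
    return {
        "priority": int(root.get("Priority")),
        "pattern": root.get("Pattern").split(),
        "handlers": handlers,
    }

template = parse_template(TEMPLATE_XML)
```

Splitting the pattern on whitespace gives six elements, so the handler argument 5 would refer to the final #E.P if arguments are read as zero-based element indexes.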

Page 11:

Semantic analysis > Pattern description

Logical operators: «&» (and), «|» (or), «^» (not)

Occurrence: not set (once), «+», «*», «?»

#E.P, #E.C, #E.S, #E.U, #E.Int, #E.DateTime

#M.Noun, #M.Adjective, #M.Verb, …

#W.Month, #W.Number, … - words holder

#H.Class, … - clauses holder

[#E.P #M.Adjective]+ [#E.C #M.Noun]
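The occurrence markers can be illustrated with a small backtracking matcher. This is a simplified sketch under several assumptions: a bracket group means that one item must satisfy every condition inside it, a token without «#» is a literal word, each input item is a hypothetical `(word, tags)` pair, and the logical operators «&», «|», «^» and the holders are not handled.

```python
import re

# One pattern element: a bracket group or a bare token, plus an optional marker.
TOKEN = re.compile(r"(\[[^\]]*\]|[^\s\[\]]+?)([+*?]?)(?:\s+|$)")

def parse_pattern(pattern):
    """Turn a pattern string into a list of (conditions, occurrence) pairs."""
    elements = []
    for core, occurrence in TOKEN.findall(pattern.strip()):
        conds = core[1:-1].split() if core.startswith("[") else [core]
        elements.append((conds, occurrence))
    return elements

def satisfies(item, conds):
    """An item is (word, tags); '#...' conditions test tags, others the word."""
    word, tags = item
    return all((c in tags) if c.startswith("#") else (c == word) for c in conds)

def matches(elements, items, i=0, j=0):
    """Return True if elements[i:] match items[j:] exactly (with backtracking)."""
    if i == len(elements):
        return j == len(items)
    conds, occurrence = elements[i]
    if occurrence in ("?", "*") and matches(elements, items, i + 1, j):
        return True  # zero occurrences of this element
    if j >= len(items) or not satisfies(items[j], conds):
        return False
    if occurrence in ("+", "*") and matches(elements, items, i, j + 1):
        return True  # consume one item and stay on the same element
    return matches(elements, items, i + 1, j + 1)  # consume one item, move on

useful = ("useful", {"#E.P", "#M.Adjective"})
brick = ("brick", {"#E.P", "#M.Adjective"})
document = ("document", {"#E.C", "#M.Noun"})
elements = parse_pattern("[#E.P #M.Adjective]+ [#E.C #M.Noun]")
```

Under these assumptions the pattern accepts one or more adjective properties followed by a noun class, and rejects a class on its own.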

Page 12:

Semantic analysis > Pattern description

Words holder

<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
  ...
</WordHolder>

Clauses holder

<ClauseHolder Name="Class">
  <Item Pattern="[#E.P #M.Adjective]* #E.C" Index="1" />
  <Item Pattern="[#E.P #M.Adjective] , [#E.P #M.Adjective] #E.C" Index="2" />
</ClauseHolder>
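A words holder can be thought of as a named lookup table from words to values. The sketch below loads the three months listed on the slide into a dictionary; the `load_word_holder` and `lookup` helpers, the «#W.» tag construction and the case-insensitive lookup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Only the three items shown on the slide; the rest are elided there.
WORD_HOLDER_XML = """
<WordHolder Name="Month">
  <Item Word="JANUARY" Value="1" />
  <Item Word="FEBRUARY" Value="2" />
  <Item Word="MARCH" Value="3" />
</WordHolder>
"""

def load_word_holder(xml_text):
    """Return the pattern tag of the holder and its word -> value table."""
    root = ET.fromstring(xml_text)
    words = {
        item.get("Word").upper(): item.get("Value")
        for item in root.findall("Item")
    }
    return "#W." + root.get("Name"), words

def lookup(words, word):
    """Case-insensitive lookup (an assumption); None when the word is absent."""
    return words.get(word.upper())

tag, months = load_word_holder(WORD_HOLDER_XML)
value = lookup(months, "march")
```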

Page 13:

Semantic analysis > Handlers

Replace

Create datetime entity

Create «property-class» relationship

Create «subclass» relationship

Create «subproperty» relationship

Create «conditional rule» relationship

Create «class-class» relationship

Page 14:

Semantic analysis > Creating relationships

Property(«useful») Class(«document»)

+

<Template Priority="4" Pattern="[#E.P #M.Adjective]+ [#E.C #M.Noun]">
  <Handler Name="PropertyRelationship" Arguments="0, 1" />
</Template>

=

Property(«useful») and Class(«document»), linked by a «property-class» relationship
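One way to picture how a handler turns matched entities into a relationship is a registry that maps handler names to functions. The sketch below assumes that the handler arguments are zero-based positions in the matched entity sequence and that each pattern position matched exactly one entity; the tuple representation and function names are hypothetical.

```python
def property_relationship(matched, args):
    """Create a «property-class» relationship between two matched entities."""
    prop, cls = matched[args[0]], matched[args[1]]
    return ("property-class", prop, cls)

# Hypothetical registry: handler name from the template -> Python function.
HANDLERS = {"PropertyRelationship": property_relationship}

def apply_template(template, matched):
    """Run every handler of a template over the matched entities."""
    return [HANDLERS[name](matched, args) for name, args in template["handlers"]]

template = {
    "priority": 4,
    "pattern": "[#E.P #M.Adjective]+ [#E.C #M.Noun]",
    "handlers": [("PropertyRelationship", [0, 1])],
}
matched = [("Property", "useful"), ("Class", "document")]
relationships = apply_template(template, matched)
```

Further handlers, for instance for «subclass» or «conditional rule» relationships, would be added to the registry in the same way.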

Page 15:

Semantic analysis > Creating new entities

Integer(«7») Class(«December») Integer(«2006») Class(«Year»)

+

<Template Priority="11000" Pattern="#E.INT #W.Month #E.INT year">
  <Handler Name="Replace" From="0" Count="4">
    <CreateEntityHandler Name="CreateDateTime" Arguments="day=0, month=1, year=2" />
  </Handler>
</Template>

=

Datetime (7.12.2006)
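The «Replace» step with a nested «CreateDateTime» handler can be sketched as follows, assuming entities are simple `(kind, value)` tuples and that `day`, `month` and `year` are zero-based positions inside the matched span. The month table and both function names are illustrative assumptions.

```python
import datetime

# Month names mapped to numbers; stands in for the «Month» words holder.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

def create_datetime(matched, day, month, year):
    """Build a Datetime entity from three positions of the matched span."""
    value = datetime.date(
        matched[year][1],
        MONTHS[matched[month][1].lower()],
        matched[day][1],
    )
    return ("Datetime", value)

def replace(entities, start, count, new_entity):
    """Replace `count` entities starting at `start` with one new entity."""
    return entities[:start] + [new_entity] + entities[start + count:]

entities = [
    ("Integer", 7), ("Class", "December"), ("Integer", 2006), ("Class", "Year"),
]
new_entity = create_datetime(entities[0:4], day=0, month=1, year=2)
result = replace(entities, 0, 4, new_entity)
```

The four original entities are collapsed into a single Datetime entity for 7 December 2006.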

Page 16:

Optimizing resulting graph

Removing redundant «subclass» relationships

Removing redundant relationships between properties and classes

Example graph: Class(«bus»), Class(«transport»), Class(«vehicle») and Property(«fast»), connected by three «subclass» and two «property-class» relationships.
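A possible reading of both optimizations, sketched under assumptions: a «subclass» relationship is redundant when the same superclass is still reachable through other «subclass» relationships, and a «property-class» relationship is redundant when a superclass already carries the same property. Edges are `(subclass, superclass)` pairs, the hierarchy is assumed to be acyclic, and the direction of the edges in the example is a guess.

```python
def reachable(edges, start, goal, skip=None):
    """Depth-first search from start to goal, ignoring the edge `skip`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node and (a, b) != skip)
    return False

def reduce_subclass(edges):
    """Drop every subclass edge that another path already implies."""
    edges = set(edges)
    for edge in sorted(edges):
        if reachable(edges, edge[0], edge[1], skip=edge):
            edges.discard(edge)
    return edges

def prune_properties(prop_edges, subclass_edges):
    """Drop (property, class) pairs that a superclass already provides."""
    kept = set()
    for prop, cls in prop_edges:
        inherited = any(
            p == prop and c != cls and reachable(subclass_edges, cls, c)
            for p, c in prop_edges
        )
        if not inherited:
            kept.add((prop, cls))
    return kept

subclass_edges = {("bus", "transport"), ("transport", "vehicle"), ("bus", "vehicle")}
property_edges = {("fast", "bus"), ("fast", "transport")}
reduced = reduce_subclass(subclass_edges)
pruned = prune_properties(property_edges, reduced)
```

In this example the direct bus-to-vehicle edge is dropped because the path through transport already implies it, and «fast» is kept only on transport.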

Page 17:

Saving results

Saving acquired knowledge into Knowledge.Net format

Saving acquired knowledge into OWL

Saving knowledge to (and loading it from) files in the project's own binary format

Page 18:

Current project status

Developed a working prototype

Created test templates

Attached the «Mrd» dictionary (Russian and English)

Page 19:

Plans

Support for creating «compound» entities (composed of several words: «creation of human hands»)

Functionality extension (adding new entities, relationships, templates, handlers, …)

A program for generating templates

Developing good examples

Page 20:

?

Contact information:

[email protected]

www.knowledge-net.ru

http://polyhimnie.math.spbu.ru