1

ESCRIRE: Embedded Structured Content Representation In Repositories

Jérôme Euzenat

INRIA Rhône-Alpes

Jerome.Euzenat@inrialpes.fr

2

ESCRIRE: Motivations

Embedding a simplified but formal representation of content in documents:

• search on structured criteria;

• document comparison (genericity, similarity…);

• automatic classification and organisation.

3

Knowledge based queries

(and book (about "Agatha Christie"))

vs. book AND "Agatha Christie"

(and flat (location "Alps"))

…including those in Val d’Isère!

(and bookshop (location "London"))

…bookstore included.
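The two examples above rely on the ontology to widen a query term to everything it covers. A minimal sketch of that idea follows, assuming a toy table of narrower terms; the table `NARROWER`, the documents in `DOCS`, and the function names are hypothetical illustrations, not part of ESCRIRE.

```python
# Hypothetical mini-ontology: each term maps to the narrower terms
# or synonyms that a knowledge-based query should also accept.
NARROWER = {
    "bookshop": {"bookstore"},
    "Alps": {"Val d'Isère"},
}

def expand(term):
    """Return the term plus everything the ontology places under it."""
    found = {term}
    stack = [term]
    while stack:
        for child in NARROWER.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Hypothetical annotated documents: identifier -> (type, location).
DOCS = {
    "ad-1": ("flat", "Val d'Isère"),
    "ad-2": ("flat", "Paris"),
    "shop-1": ("bookstore", "London"),
}

def query(kind, location):
    """Answer (and kind (location "...")) using the expanded terms."""
    kinds, places = expand(kind), expand(location)
    return sorted(doc for doc, (k, loc) in DOCS.items()
                  if k in kinds and loc in places)

print(query("flat", "Alps"))        # the flat in Val d'Isère matches
print(query("bookshop", "London"))  # the bookstore matches
```

A full-text search for "Alps" would miss the first document, since the word never appears in its annotation.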

4

Query languages

level 3 Semiotic

level 2 Semantic (F-logic, Escrire…)

level 1 Structural (SQL, XQL)

level 0 Full-text search

5

ESCRIRE: Goals

Comparison of several knowledge representation techniques in order to find the type of situation to which they are most suited (indexing, classifying, filtering…).

6

ESCRIRE: Consortium

“Coordinated research action (ARC)” involving

Acacia (Sophia-Antipolis): conceptual graphs

Sherpa/Exmo (Rhône-Alpes): object-based representations

Orpailleur (Lorraine): terminological logics.

Usinor: application.

7

ESCRIRE: Acquisition

[Diagram. Labels: “Ontology”, Description, Document, Global analysis, Individual analysis, Integration, XML document, Tr-schema, Tr-object]

8

ESCRIRE: Queries

[Diagram. Labels: “Ontology”, Query helper, XML document, Tr-schema, Tr-query, Troeps, XML document]

9

ESCRIRE: Problem statement

Given:

• A set of (HTML) documents annotated by a description of their content in a pivot language

• An ontology of the domain

• A set of queries about the subject.

Retrieve: the adequate documents.

10

ESCRIRE: Software variation

Knowledge representation + query evaluation

Translated from a pivot language into

Conceptual graphs, Object-based representation, Description logic

Translated by hand into CG, OKR, DL

12

ESCRIRE: Quantitative criteria

• Precision: rate of correct answers

• Recall: rate of complete answers

• Accuracy = (precision + recall) / 2

• Performances in time

• Coverage of the query language

• Ordering of answers
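The first three criteria can be computed per query from the set of retrieved documents and the set of expected ones. The sketch below assumes the usual information-retrieval reading of precision (share of retrieved documents that are correct) and recall (share of expected documents that were retrieved), and takes the accuracy formula from the slide; the function name and the sample identifiers are hypothetical.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, accuracy) for one query.

    accuracy follows the slide: (precision + recall) / 2.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    correct = retrieved & relevant
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    accuracy = (precision + recall) / 2
    return precision, recall, accuracy

# Hypothetical query: 4 documents returned, 3 expected, 2 in common.
p, r, a = evaluate(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
```

Returning 0.0 for an empty result or an empty reference set is a choice made for this sketch; an evaluation protocol could equally leave those cases undefined.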

13

ESCRIRE: Qualitative criteria

Given by external users (query designers):

• Naturalness of queries

• Adequacy of answers

• Overall appreciation (aggregation).

14

ESCRIRE: Scaling

Multiplying the size by orders of magnitude:

• Corpus

• Ontology

• Queries.

15

ESCRIRE: Reference comparisons

• Dublin core metadata

• Full-text search

16

ESCRIRE: Ontology elements (1)

<esc:ontology>

<esc:defclass name="gene">

<esc:classref name="adn-part"/>

<esc:defattribute name="length">

<esc:typeref name="integer"/>

</esc:defattribute>

<esc:defattribute name="protein">

<esc:classref name="protein"/>

</esc:defattribute>

</esc:defclass>

17

ESCRIRE: Ontology elements (2)

<esc:descrelation name="interaction">

<esc:relref name="bio-process"/>

<esc:defattribute name="effect">

<esc:typeref name="string"/>

</esc:defattribute>…

<esc:defrole name="promoter">

<esc:classref name="gene"/>

</esc:defrole>…

</esc:descrelation>…

</esc:ontology>
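As an illustration of how such an ontology could be read by a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The slides do not give the namespace URI bound to the `esc:` prefix, so the one below is a placeholder; reading the `esc:classref` placed directly under `esc:defclass` as the superclass is likewise an assumption.

```python
import xml.etree.ElementTree as ET

ESC = "http://example.org/escrire"  # placeholder namespace URI (assumption)

ONTOLOGY = f"""
<esc:ontology xmlns:esc="{ESC}">
  <esc:defclass name="gene">
    <esc:classref name="adn-part"/>
    <esc:defattribute name="length">
      <esc:typeref name="integer"/>
    </esc:defattribute>
    <esc:defattribute name="protein">
      <esc:classref name="protein"/>
    </esc:defattribute>
  </esc:defclass>
</esc:ontology>
"""

def read_classes(xml_text):
    """Map each class name to its assumed superclass and its attributes."""
    ns = {"esc": ESC}
    root = ET.fromstring(xml_text)
    classes = {}
    for cls in root.findall("esc:defclass", ns):
        # Only a direct child classref is treated as the superclass.
        parent = cls.find("esc:classref", ns)
        attributes = {}
        for attr in cls.findall("esc:defattribute", ns):
            # An attribute ranges over a datatype or over another class.
            target = attr.find("esc:typeref", ns)
            if target is None:
                target = attr.find("esc:classref", ns)
            attributes[attr.get("name")] = (
                target.get("name") if target is not None else None
            )
        classes[cls.get("name")] = {
            "parent": parent.get("name") if parent is not None else None,
            "attributes": attributes,
        }
    return classes

CLASSES = read_classes(ONTOLOGY)
print(CLASSES)
```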

18

ESCRIRE: Content descriptions

<esc:content ontology="biointer.xml" url=".">

<esc:object type="gene" id="bcd"/>

<esc:relation type="interaction">

<esc:attribute name="effect">

inhibition

</esc:attribute>

<esc:role name="promoter">

<esc:objref id="bcd"/>

</esc:role>

</esc:relation>…

</esc:content>
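Because roles point to objects by identifier, a content description is only usable if every `esc:objref` resolves to a declared `esc:object`. The following sketch checks this with Python's standard `xml.etree.ElementTree`; XML identifiers are compared case-sensitively. The namespace URI is a placeholder, and the template and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ESC = "http://example.org/escrire"  # placeholder namespace URI (assumption)

# Content description modelled on the slide; {ref} is filled in below.
TEMPLATE = """
<esc:content xmlns:esc="{esc}" ontology="biointer.xml" url=".">
  <esc:object type="gene" id="bcd"/>
  <esc:relation type="interaction">
    <esc:attribute name="effect">inhibition</esc:attribute>
    <esc:role name="promoter">
      <esc:objref id="{ref}"/>
    </esc:role>
  </esc:relation>
</esc:content>
"""

def dangling_references(xml_text):
    """Return the referenced ids that no esc:object declares."""
    root = ET.fromstring(xml_text)
    declared = {obj.get("id") for obj in root.iter(f"{{{ESC}}}object")}
    referenced = {ref.get("id") for ref in root.iter(f"{{{ESC}}}objref")}
    return sorted(referenced - declared)

print(dangling_references(TEMPLATE.format(esc=ESC, ref="bcd")))  # nothing dangling
print(dangling_references(TEMPLATE.format(esc=ESC, ref="Bcd")))  # case mismatch is reported
```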

19

ESCRIRE: Knowledge embedding

<html>… <!-- xhtml -->

<rdf:RDF>

<rdf:Description about="/">

<!-- dublin core -->

<dc:title>…</dc:title>…

<!-- pivot language -->

<esc:content>… </esc:content>

<!-- conceptual graphs -->

<gc:graphs>…</gc:graphs>

</rdf:Description>…

</rdf:RDF>…

</html>

20

ESCRIRE: Queries

• Stated on objects, but results are documents

(concerning these topics)

• Document similarity by content similarity
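Both points can be sketched over a table that maps each document to the objects its annotation describes. The slides do not say how content similarity is measured; the Jaccard index used here is a stand-in chosen for illustration, and the document names and gene identifiers are hypothetical.

```python
# Hypothetical annotations: document -> set of (type, id) objects described.
ANNOTATIONS = {
    "abstract-1": {("gene", "bcd"), ("gene", "hb")},
    "abstract-2": {("gene", "bcd"), ("gene", "kr")},
    "abstract-3": {("gene", "eve")},
}

def documents_about(obj):
    """The query is stated on an object; the answer is a list of documents."""
    return sorted(doc for doc, objs in ANNOTATIONS.items() if obj in objs)

def jaccard(a, b):
    """Shared objects divided by all objects (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def alike(doc):
    """Other documents sharing content with doc, most similar first."""
    reference = ANNOTATIONS[doc]
    scored = [(jaccard(reference, objs), other)
              for other, objs in ANNOTATIONS.items() if other != doc]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in ranked if score > 0]

print(documents_about(("gene", "bcd")))
print(alike("abstract-1"))
```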

21

ESCRIRE: Query language

SELECT / FROM / WHERE / ORDERBY

+

AND / OR / NOT / ALL / EXISTS

<path> <relop> <path>|<value>

IN <class>

ALIKE <document>

22

ESCRIRE: Corpus 1

Subject: genetic interaction

Text source: MedLine abstracts

Annotations: manual

Ontology: Knife knowledge base + other

23

ESCRIRE: Corpus 2

Subject: psychological stress

Text source: MedLine abstracts

Annotations: manual

Ontology: UMLS/MeSH

24

ESCRIRE: Where are we?

• Building translators from pivot to actual formats

• 1st part of Corpus 1 available (other data will follow quickly)

25

ESCRIRE: Calls

• Other corpora

• Natural language technology

• Other representation systems

starting from September 2000

26

For more information…

http://escrire.inrialpes.fr/

[email protected]