diesirae presentation copyarcslab.dei.polimi.it/wp-content/uploads/diesirae-demo3.pdf · OmniFind...

DIESIRAE A semantic search engine based on NLP L. Sbattella and R. Tedesco


Page 1:

DIESIRAE A semantic search engine based on NLP

L. Sbattella and R. Tedesco

Page 2:

Knowledge Management

[Figure: system architecture diagram. Labels recoverable from the figure: Context-aware Web Portal; user roles (Admin, Winery, Consultant, Sales agent, Restaurateur, Retail, Wine lover, Oenologist, Trend setter); Context-based Information Filters; OmniFind; tag-based query; NL query; DW & DB query; process query; on-the-fly queries to external sources; Knowledge Indexing & Extraction; Text Extractor; Data Feed Extractor; PerLa Extractor; Ontology Extractor; Process Data Extractor; Taxonomy; Domain Ontology; Semantic Network(s); Domain Model; Mapping Model; Internal Enterprise Data; PROM; Process; AD-DDIS; Enterprise Applications; sources: Web Apps, Relat. DB, PDF, DOC, ..., XML, DB, DW.]

Page 3:

Knowledge Indexing & Extraction: Goals

•  Domain model → Ontology (W3C OWL standard)
   –  Describes the concepts of the domain
•  Domain vocabulary → Semantic Network
   –  Describes the lemmas of the domain
•  Mapping model → Stochastic model
   –  2nd-order HMM-inspired model
   –  Transition probabilities approximated by means of MaxEnt models
   –  Solves mapping ambiguities
•  Queries:
   –  Keyword-based (AND/OR; max probability/exhaustive)
   –  Phrase-based (Disambiguated Word queries and Ontological queries)
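The mapping-model bullet can be illustrated with a small sketch. The slides give no formulas, so everything here is an assumption: a toy second-order decoder in which the concept chosen for a word depends on the two previously chosen concepts, with hand-set probability tables standing in for the MaxEnt-estimated transition probabilities. The names `EMIT`, `TRANS`, and `decode` are hypothetical and do not come from DIESIRAE.

```python
# Hypothetical toy tables; a real system would estimate them from a training set.
# EMIT[word][concept] = P(word | concept)
EMIT = {
    "light": {"LowAlcohol": 0.5, "PaleColour": 0.5},
    "wine": {"Wine": 1.0},
}
# TRANS[(c_prev2, c_prev1)][c] = P(c | c_prev2, c_prev1); "<s>" pads the start.
# In the slides these values come from MaxEnt models; here they are hand-set.
TRANS = {
    ("<s>", "<s>"): {"LowAlcohol": 0.5, "PaleColour": 0.5},
    ("<s>", "LowAlcohol"): {"Wine": 0.8},
    ("<s>", "PaleColour"): {"Wine": 0.3},
}


def decode(words, emit, trans, floor=1e-6):
    """Return the most probable concept sequence (second-order Viterbi)."""
    # best[(c_prev, c_cur)] = (score, path) of the best path ending in that pair
    best = {("<s>", "<s>"): (1.0, [])}
    for word in words:
        nxt = {}
        for (c2, c1), (score, path) in best.items():
            for concept, p_emit in emit.get(word, {}).items():
                # Unseen transitions get a small floor probability.
                p_trans = trans.get((c2, c1), {}).get(concept, floor)
                cand = score * p_trans * p_emit
                key = (c1, concept)
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, path + [concept])
        best = nxt
    if not best:
        return []  # some word had no candidate concept
    return max(best.values(), key=lambda sp: sp[0])[1]


mapping = decode(["light", "wine"], EMIT, TRANS)
```

With these toy tables the ambiguity of "light" is resolved by the following word, which is the kind of mapping ambiguity the slide refers to.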

Page 4:

Knowledge indexing & extraction: Functionalities

[Figure: functional diagram in two panels, "Training" and "Indexing, querying, and extending". Labels recoverable from the figure: Context-aware Web Portal; NL query; doc. / URL upload; Text Extractor (Web, PDF, DOC, ...); document file + text; plain text; Information Extraction Engine; Conceptual Index; Document Repository; Query Engine; concepts; domain knowledge; Ontology Extender; Exporter (to the Internal Enterprise Data module); Domain Model (Domain Ontology, Semantic Network(s)); Mapping Model; Expert; Linguistic Context Extractor; Document Repository for training; Training set; Training Procedure; Test set; Testing Procedure.]

Page 5:

Keyword-based queries

•  Sequence of isolated words
   –  No linguistic structure
•  Exhaustive AND/OR keywords
   –  No concept disambiguation
   –  Searches for multiple tuples
   –  Example: light wine → several meanings found…
      country → search for instances…
•  Max probability AND/OR keywords
   –  Searches for a single tuple
   –  Exploits the a-priori concept probabilities
   –  Example: [light wine] → max probability meaning
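The difference between the two keyword modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the DIESIRAE implementation: the prior table `PRIORS`, the inverted index `INDEX`, and all function names are hypothetical, and every keyword is assumed to have at least one candidate concept.

```python
from itertools import product

# Hypothetical a-priori concept probabilities for each keyword.
PRIORS = {
    "light": {"LowAlcohol": 0.6, "PaleColour": 0.3, "Illumination": 0.1},
    "wine": {"Wine": 1.0},
}
# Hypothetical conceptual index: concept -> ids of documents mentioning it.
INDEX = {"LowAlcohol": {1, 2}, "PaleColour": {3}, "Wine": {1, 3, 4}}


def exhaustive_tuples(keywords, priors):
    """Exhaustive mode: every combination of candidate concepts, no disambiguation."""
    candidates = [sorted(priors[k]) for k in keywords]
    return list(product(*candidates))


def max_probability_tuple(keywords, priors):
    """Max-probability mode: the single tuple of highest-prior concepts."""
    return tuple(max(priors[k], key=priors[k].get) for k in keywords)


def search(concepts, index, mode="AND"):
    """Combine the posting sets of the concepts with AND (intersection) or OR (union)."""
    sets = [index.get(c, set()) for c in concepts]
    if not sets:
        return set()
    return set.intersection(*sets) if mode == "AND" else set.union(*sets)


all_tuples = exhaustive_tuples(["light", "wine"], PRIORS)
best_tuple = max_probability_tuple(["light", "wine"], PRIORS)
and_hits = search(best_tuple, INDEX, mode="AND")
or_hits = search(best_tuple, INDEX, mode="OR")
```

In exhaustive mode each tuple in `all_tuples` would be searched in turn, which is why several meanings of "light wine" are returned; max-probability mode searches only `best_tuple`.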

Page 6:

Phrase-based queries

•  Phrase
   –  Linguistic structure
   –  Context-based disambiguation
•  Disambiguated Word queries
   –  Context used for concept disambiguation
      •  Index the phrase (→ extract concepts)
      •  Search for AND-ed concepts
   –  Example: (fruit taste) → disambiguates fruit
•  Ontological queries
   –  Context used to select the request to the ontology
      •  Index the sentences
      •  Select the request; search the ontology for the mapped concepts
   –  Example: “type of tannins in wine” → instance list
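The two phrase-based query types can be sketched in a few lines. This is only an illustration under stated assumptions: the slides rely on the stochastic mapping model for context, whereas the sketch substitutes a simple cue-word overlap heuristic, and the tables `CUES`, `ONTOLOGY`, and `LEMMA_TO_CONCEPT`, the "type of X" request pattern, and the function names are all hypothetical.

```python
# Hypothetical context cues for each candidate concept of an ambiguous word.
CUES = {
    "fruit": {
        "FruityFlavour": {"taste", "aroma", "note"},
        "Grape": {"harvest", "vineyard", "ripe"},
    },
}
# Hypothetical toy ontology (concept -> instances) and lemma-to-concept mapping.
ONTOLOGY = {"Tannin": ["condensed tannin", "hydrolysable tannin"]}
LEMMA_TO_CONCEPT = {"tannin": "Tannin", "tannins": "Tannin"}


def disambiguate(word, context, cues):
    """Pick the candidate concept sharing the most cue words with the context."""
    candidates = cues[word]
    context_words = set(context)
    return max(candidates, key=lambda c: len(candidates[c] & context_words))


def ontological_query(phrase, ontology, lemma_to_concept):
    """Handle a 'type of X ...' request by listing the instances of concept X."""
    tokens = phrase.lower().split()
    if len(tokens) < 3 or tokens[:2] != ["type", "of"]:
        return []  # the request pattern is not recognised
    concept = lemma_to_concept.get(tokens[2])
    return ontology.get(concept, [])


fruit_concept = disambiguate("fruit", ["taste"], CUES)
instances = ontological_query("type of tannins in wine", ONTOLOGY, LEMMA_TO_CONCEPT)
```

A Disambiguated Word query would then search the conceptual index for the AND of the disambiguated concepts, while an Ontological query returns the instance list directly from the ontology.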