Michael Fuchs | How to compute semantic relationships between entities and facts out of natural...

How to compute semantic relationships between entities and facts out of natural texts Michael Fuchs Technology Evangelist ABBYY [email protected]

Transcript of Michael Fuchs | How to compute semantic relationships between entities and facts out of natural...

Page 1: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

How to compute semantic relationships between entities and facts out of natural texts

Michael Fuchs Technology Evangelist

ABBYY [email protected]

Page 2: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Agenda

1. How machines read pixels

2. Documents, words, layout & semantics

3. Syntactic & semantic text parsing

4. Live demo

5. Q&A

Page 3: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

How machines read pixels

1. Pixel analysis

2. Find text/image blocks

3. Separate pixels into characters
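
As an illustration of the "separate pixels into characters" step, here is a minimal sketch of column-projection segmentation on a tiny binary bitmap. This is one simple textbook technique, not necessarily what any ABBYY product does; the `BITMAP` data and the `split_columns` helper are invented for this example.

```python
# Toy binary image: '#' is ink, '.' is background (invented data).
BITMAP = [
    "#..##.#",
    "#..#..#",
    "#..##.#",
]

def split_columns(rows):
    """Return (start, end) column ranges separated by ink-free columns."""
    width = len(rows[0])
    # A column contains ink if any row has a '#' at that position.
    ink = [any(row[x] == "#" for row in rows) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a new glyph candidate begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # an empty column closes it
            start = None
    if start is not None:
        segments.append((start, width))    # glyph touching the right edge
    return segments

print(split_columns(BITMAP))
```

Each returned range is a candidate character region that a recognizer could then classify. Real scans need more than this, because touching or broken glyphs defeat a pure projection.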

Page 4: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

How machines read pixels

Recognize individual characters

Build proper words as editable text

Requirements to make a machine read text:

-> Linguistics: Alphabets & Morphology Dictionaries

-> Math, AI, Statistics, Experience, and…

Page 5: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

What is needed to make a machine understand the meaning of words, sentences, texts?

Page 6: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Documents & Words

What is a document?

a) Bag of words?

Statistics can give basic insights

-> No real semantic understanding

b) Words in order?

Layouts generate visual patterns

-> Semantics can be derived from layout
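
To make the "bag of words" limitation concrete, the following sketch compares two sentences that have identical word statistics but opposite meanings. The sentences and the `bag_of_words` helper are invented for illustration.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count lower-cased, whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())

a = "the dog bites the man"
b = "the man bites the dog"

same_bag = bag_of_words(a) == bag_of_words(b)   # identical word counts
same_order = a.split() == b.split()             # but a different word order

print(same_bag, same_order)
```

The two bags are equal although who bites whom is reversed, which is why counting words alone gives statistics but no real semantic understanding.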

Page 7: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Documents, Words and Layout

Document with layout

Text document with “simulated” layout

Text with line breaks

Text only

-> Rules can extract data out of (semi-)structured texts and documents

-> Layout helps to identify the semantic meaning of data
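
A rule for semi-structured text can be as small as one regular expression. The sketch below assumes a hypothetical "label: value" layout; the sample invoice text, the `FIELD_RULE` pattern and the `extract_fields` helper are all invented for illustration.

```python
import re

# Invented sample of a semi-structured document: one "label: value" per line.
SAMPLE = """Invoice No: 4711
Date: 2016-05-12
Total: 99.50 EUR"""

# Rule: everything before the first colon on a line is the label,
# the rest of the line is the value.
FIELD_RULE = re.compile(r"^(?P<label>[^:\n]+):\s*(?P<value>.+)$", re.MULTILINE)

def extract_fields(text: str) -> dict:
    """Apply the line-layout rule and return a label -> value mapping."""
    return {
        m.group("label").strip(): m.group("value").strip()
        for m in FIELD_RULE.finditer(text)
    }

fields = extract_fields(SAMPLE)
print(fields["Total"])

# The same rule finds nothing in a plain sentence carrying the same fact.
print(extract_fields("We were billed 99.50 EUR on 12 May."))
```

The rule works only because the line layout marks which value belongs to which label. Once the same information is phrased as a sentence, the rule has nothing to hold on to.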

Page 8: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Text and Structure

Is “plain” natural language text unstructured?

-> yes, at least for almost all IT systems

-> not for humans who can read and speak the language

-> Facts and their relations can’t be reliably detected with “simple” rules

Page 9: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Text, Structure & Translation

Is a word by word translation enough?

-> … well – not really…

-> Semantic understanding of the words and their relationship in sentences is needed!

-> That is true for humans and machines

Page 10: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Text & Structure

Why is natural language text understanding difficult for machines?

-> Languages are not logical and are context dependent

-> One word – different variants, e.g. go, went, gone

– different meanings, e.g. run, plant, apple …

– different usage, e.g. as verb, noun, adjective

-> Different words – the same concept, e.g. to buy/sell something
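
The three difficulties above can be sketched with small lookup tables built from the slide's own examples. The tables, the sense descriptions, the concept label `COMMERCIAL_TRANSFER` and the helper functions are hypothetical and exist only for this illustration.

```python
# Word variants mapped to one base form (lemma).
LEMMA = {"go": "go", "went": "go", "gone": "go"}

# One word, several meanings (sense descriptions are illustrative).
SENSES = {
    "plant": ["living organism", "factory", "to put into the ground"],
    "run": ["to move fast", "to operate something"],
}

# Different words describing the same underlying concept.
CONCEPT = {"buy": "COMMERCIAL_TRANSFER", "sell": "COMMERCIAL_TRANSFER"}

def lemma(word: str) -> str:
    """Return the base form if known, otherwise the lower-cased word itself."""
    return LEMMA.get(word.lower(), word.lower())

def is_ambiguous(word: str) -> bool:
    """A word is ambiguous here if the table lists more than one sense."""
    return len(SENSES.get(lemma(word), [])) > 1

print(lemma("went"), is_ambiguous("plant"), CONCEPT["buy"] == CONCEPT["sell"])
```

A table can say that "plant" is ambiguous, but it cannot say which sense a given sentence uses. That choice needs the surrounding context.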

Page 11: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Basic Language Structure

-> Morphology = rules for how words are formed and inflected

-> Syntax = rules used to build correct sentences

-> Semantics = the meaning and usage of words

-> Semantic Relations = reflect/organise the meaning and relations of words and sentences

How to get to the insides of a sentence?

Page 12: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Compreno System Architecture

[Diagram: Parser and Information Extraction Module. Labeled components: morphological analyzer; syntactic and semantic analysis; disambiguation; anaphora resolution; semantic representation of text; identification rules; interpretation rules; extraction rules. Output: RDF graph.]

Page 13: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Morphology Analysis

Page 14: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Sentence Analysis with Semantic Info

Page 15: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

How to get the correct semantic meaning of words?

ABBYY’s answer: Universal Semantic Hierarchy

= language independent semantic concepts

Page 16: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

ABBYY’s Universal Semantic Hierarchy

Semantic Meaning “Vocabulary” EN “Vocabulary” DE
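
The idea of language-independent concepts with per-language vocabularies can be sketched as a mapping from words to shared concept identifiers. The identifiers `PLANT_ORGANISM` and `PLANT_FACTORY`, the tiny vocabularies and the helper functions are hypothetical; this is not ABBYY's actual hierarchy or API.

```python
from typing import Dict, List, Set

# Each language maps its own words onto shared, language-independent concepts.
VOCABULARY: Dict[str, Dict[str, Set[str]]] = {
    "en": {
        "plant": {"PLANT_ORGANISM", "PLANT_FACTORY"},
        "factory": {"PLANT_FACTORY"},
    },
    "de": {
        "Pflanze": {"PLANT_ORGANISM"},
        "Fabrik": {"PLANT_FACTORY"},
    },
}

def concepts(lang: str, word: str) -> Set[str]:
    """Concepts a word can express in the given language (empty if unknown)."""
    return VOCABULARY[lang].get(word, set())

def translations(word: str, source: str, target: str) -> List[str]:
    """Target-language words sharing at least one concept with the source word."""
    wanted = concepts(source, word)
    return sorted(w for w, ids in VOCABULARY[target].items() if ids & wanted)

print(translations("plant", "en", "de"))
print(translations("Fabrik", "de", "en"))
```

English "plant" reaches both German words because it covers two concepts. Once the concept is fixed, only one German word remains, so the choice of word follows from the meaning.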

Page 17: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Handling Lexical Ambiguity

Page 18: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Recovering Omitted Words and Links (Ellipsis)

Recovered Node

Ellipsis

Page 19: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Identifying Pronoun Referents (Anaphora)

Mary saw her students. They were wearing masks. She was surprised. (Mary → her, Mary → she, students → they).
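
The referent links from this example can be written down as data and used to annotate the text. The sketch below hard-codes the three links given on the slide; it does not compute them, and the `COREFERENCE` table and `annotate` helper are invented for illustration.

```python
# Pronoun -> referent links exactly as stated in the example above.
COREFERENCE = {"her": "Mary", "they": "students", "she": "Mary"}

SENTENCES = [
    "Mary saw her students.",
    "They were wearing masks.",
    "She was surprised.",
]

def annotate(sentence: str) -> str:
    """Append the referent in brackets after each known pronoun."""
    out = []
    for token in sentence.split():
        key = token.strip(".,").lower()      # normalise for the lookup only
        referent = COREFERENCE.get(key)
        out.append(f"{token}[{referent}]" if referent else token)
    return " ".join(out)

for s in SENTENCES:
    print(annotate(s))
```

A fixed table is valid only for this one text. In another text "she" may point to a different person, so the links have to come from analysing each text.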

Page 20: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

From Text to Semantic with Compreno

Page 21: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

DEMO

Page 22: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

Summary: What is ABBYY Compreno?

● … NLP technology featuring a unique model-based approach that employs universal language models and identifies language structures.

● … combines both syntactic and semantic analysis, as well as machine learning on untagged text corpora.

● … allows a semantic representation of text to be created

● … able to resolve complex language phenomena:

− lexical ambiguity

− recovering omitted words and links (ellipsis)

− identifying pronoun referents (anaphora)

− coreference

− coordination and more

● … support of English, Russian, German in progress

Page 23: Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts

QUESTIONS?

Thank you for your attention!