Complex Relations Extraction


NAVEED AFZAL

Supervisor: Dr Mark Stevenson

MSc Advanced Software Engineering

Introduction

Information Extraction (IE) is the process of deriving useful information from unstructured text documents. Much of the recent research in the area of IE has focused on named entity identification and binary relations extraction.

This project investigates the problem of complex relations extraction. Complex relations are n-ary (n>2) relations between n entities.

The aim of complex relations extraction is to identify all the instances of interest in some piece of text, including incomplete sentences.

System’s Overview

[Figure: pipeline flow diagram. Stages, in the order shown: MUC-6 Data (Soderland version); Data Pre-Processing; Factorising into Binary Relations; POS Tagging; Labelling of Relations; Classifiers Training; Building of Graph; Rebuilding of Complex Relations.]

- Factorise complex relations into binary relations.
- Train a binary classifier for relatedness.
- Build a graph among related entities.
- Rebuild complex relations from that graph by finding maximal cliques.
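The first and third steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names `factorise` and `build_graph`, the example entities, and the use of `None` for an unfilled slot are all assumptions made for the example. In the real system the pairs fed to the graph would be those the trained classifier judges to be related; here they are simply taken from a known relation.

```python
from itertools import combinations


def factorise(relation):
    """Split one n-ary relation (a tuple of entities) into its binary pairs.

    Slots set to None are treated as unfilled and skipped (an assumption
    made here so that incomplete instances can be represented).
    """
    entities = [entity for entity in relation if entity is not None]
    return list(combinations(entities, 2))


def build_graph(related_pairs):
    """Build an undirected adjacency map from pairs judged to be related."""
    graph = {}
    for a, b in related_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


# Hypothetical 3-ary relation: (person, post, organisation).
relation = ("John Smith", "CEO", "Acme Corp")
pairs = factorise(relation)
graph = build_graph(pairs)

# An incomplete instance yields fewer pairs.
partial_pairs = factorise(("Jane Doe", "CFO", None))
```

A relation with n filled slots produces n(n-1)/2 binary pairs, so the 3-ary example gives three pairs and the incomplete one gives a single pair.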

Classifiers Result

[Figure: bar chart comparing training-data accuracy and testing-data accuracy for the Decision Tree, Maximum Entropy and Naïve Bayes classifiers; vertical axis from 0 to 1.]

Experimental Results of Relations Classification

[Figure: bar chart of precision, recall and F-score per classifier (including the Maximum Entropy classifier); vertical axis from 0 to 1.2.]

Experimental Results of Events Classification

[Figure: bar chart of precision, recall and F-score per classifier (including the Maximum Entropy classifier); vertical axis from 0 to 0.9.]

Conclusion

This project has presented an approach to complex relations extraction in which complex relations are first factorised into binary relations, and then different classifiers (Maximum Entropy, Naïve Bayes and Decision Tree) are trained to identify binary relations.

In the second phase, complex relations are reconstructed by finding maximal cliques in graphs that represent relations between pairs of entities.
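To make the reconstruction step concrete, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch algorithm. The poster does not say which clique-finding algorithm was used, so this choice is an assumption, as are the function name `maximal_cliques` and the example entities. The graph is an adjacency map in which an edge means the binary classifier judged the two entities to be related.

```python
def maximal_cliques(graph):
    """Yield every maximal clique of an undirected graph.

    `graph` maps each node to the set of its neighbours and must be
    symmetric. Uses the basic Bron-Kerbosch recursion without pivoting.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # Nothing left to add and nothing excluded extends it: maximal.
            yield frozenset(clique)
            return
        for node in list(candidates):
            yield from expand(
                clique | {node},
                candidates & graph[node],
                excluded & graph[node],
            )
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(graph), set())


# Hypothetical graph: one fully connected triple plus one extra pair.
graph = {
    "John Smith": {"CEO", "Acme Corp"},
    "CEO": {"John Smith", "Acme Corp", "Mary Jones"},
    "Acme Corp": {"John Smith", "CEO"},
    "Mary Jones": {"CEO"},
}
cliques = set(maximal_cliques(graph))
```

Each clique returned is a candidate complex relation: here the triple of mutually related entities is recovered as one 3-ary instance, and the remaining pair as an incomplete one.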

The Decision Tree classifier outperforms both the Naïve Bayes and Maximum Entropy classifiers in terms of precision, recall and F-score. The results produced by the Naïve Bayes classifier are relatively poor compared to those of the Maximum Entropy and Decision Tree classifiers.
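For reference, the three measures used in this comparison can be computed as in the sketch below. It assumes the balanced F-score (F1), which the poster does not state explicitly, and the function name and example items are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted items against gold items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 predicted relations, 3 gold relations, 2 in common.
predicted = {"r1", "r2", "r3", "r4"}
gold = {"r1", "r2", "r5"}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

With two of four predictions correct and two of three gold relations found, precision is 1/2, recall is 2/3 and F1 is 4/7.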

Future Work

This project has looked at the modified version of the MUC-6 data in which events are completely described within a single sentence.

It would be interesting to investigate events described across multiple sentences; see Stevenson (2004).

Moreover, this approach could be improved by using much deeper syntactic parsing and more powerful binary classifiers based on tree kernels (Zelenko et al., 2003).

At present, the system uses supervised learning algorithms; it would be interesting to investigate how the approach performs with unsupervised learning algorithms.
