Page 1: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Thesis Final Presentation

Semantic Role Labeling for Information Extraction of Bio-medical data

Shikha Jacob Mathew Vinith Varghese

School of Engineering, Jönköping University

03/26/2014

Page 2

Agenda

Description of the problem

Purpose of this study

Methodology

IE System

Results

Conclusion & Future Work

Page 3

Description of the problem

The biomedical domain is flooded with a large amount of information.

Research questions

• How can the Natural Language Processing (NLP) components be improved so that high-quality information can be extracted from the biomedical domain?

• How can domain-specific knowledge be used to generate high-quality relations between entities within the domain, based on Semantic Role Labeling (SRL)?

[Diagram labels: Extract relevant and useful; Structured; Manage]

Page 4

Purpose of this study

Develop a useful Information Extraction (IE) system that extracts relations between entities within the biomedical domain, using information obtained from SRL.

This is accomplished by introducing two features:

• Named Entity Recognition (NER)

• Ontology

Page 5

Methodology: Research Approach (1/2)

Design Science Research

Quantitative Evaluation

Adaptive Software Development

Page 6

Methodology: Research Framework (2/2)

The general methodology of design science research

Page 7

IE System - Framework (1/2)

Page 8

IE System - Framework (2/2)

Relational Detector: the steps include:

Dependency Parser - Parser Model

Semantic Role Labeler

Page 9

Semantic Role Labeling (SRL)

• The SRL process includes: Pre-processing, Argument Identification, Argument Classification, Post-processing.

• Features used: Word Form, Lemma, POS Tag, Head Word, Dependency Label. Domain-specific features introduced: Ontology, NER.
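The feature set listed on this slide can be pictured as one feature dictionary per token. The following is a minimal sketch, not the system presented here: the `Token` record, its field names, the example sentence, and all tag values are assumptions made for illustration, and the fields are presumed to be filled in by upstream tools (lemmatizer, POS tagger, dependency parser, NER, ontology lookup).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """One parsed token; every field name here is an illustrative assumption."""
    form: str                            # surface word form
    lemma: str                           # output of the lemmatizer
    pos: str                             # part-of-speech tag
    head: int                            # index of the syntactic head, -1 for the root
    deprel: str                          # dependency label to the head
    ner: str = "O"                       # domain-specific NER tag
    semantic_type: Optional[str] = None  # ontology semantic type, if any


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Build the feature dictionary for token i of a sentence."""
    tok = sentence[i]
    head_word = sentence[tok.head].form.lower() if tok.head >= 0 else "<ROOT>"
    return {
        "form": tok.form.lower(),
        "lemma": tok.lemma,
        "pos": tok.pos,
        "head_word": head_word,
        "deprel": tok.deprel,
        "ner": tok.ner,
        "semantic_type": tok.semantic_type or "NONE",
    }


# Hypothetical example sentence: "Aspirin inhibits cyclooxygenase"
sentence = [
    Token("Aspirin", "aspirin", "NN", 1, "nsubj", "B-Chemical", "substance"),
    Token("inhibits", "inhibit", "VBZ", -1, "root", "O", "process/function"),
    Token("cyclooxygenase", "cyclooxygenase", "NN", 1, "dobj", "B-Protein", "substance"),
]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Each dictionary could then be handed to whichever classifier performs argument identification and classification; the two domain-specific entries simply sit alongside the generic ones.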

Page 10

SRL Example

A0: Agent; A1: Patient or Theme

Page 11

Domain-Specific Named Entity Recognizer (NER)

The Concept Used for NER:

• Conditional Random Field (CRF): a statistical modeling method used in pattern recognition for structured prediction.

• Use: identification of argument boundaries.

• Patterns from POS tags.
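To make the structured-prediction idea concrete, the sketch below shows only the decoding step of a linear-chain CRF: Viterbi search over B/I/O labels driven by POS-tag patterns, followed by conversion of the labels into boundary spans. It is an illustration under stated assumptions, not the recognizer used in this work: the weights in `START`, `EMISSION`, and `TRANSITION` are hand-set rather than learned, and the POS sequence is a made-up example.

```python
from typing import Dict, List, Tuple

LABELS = ["B", "I", "O"]  # begin / inside / outside of an entity

# Hand-set illustrative weights; a trained CRF would learn these from data.
START: Dict[str, float] = {"I": -10.0}  # a sequence should not start with "I"
EMISSION: Dict[Tuple[str, str], float] = {  # (POS tag, label) -> score
    ("NN", "B"): 2.0, ("NN", "I"): 1.5,
    ("VBZ", "O"): 2.0, ("DT", "O"): 2.0, ("IN", "O"): 2.0,
}
TRANSITION: Dict[Tuple[str, str], float] = {  # (previous label, label) -> score
    ("O", "I"): -10.0, ("B", "I"): 1.0, ("I", "I"): 1.0,
}


def viterbi(pos_tags: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for a POS-tag sequence."""
    if not pos_tags:
        return []
    # best[t][y] = best score of any label path ending in y at position t
    best = [{y: START.get(y, 0.0) + EMISSION.get((pos_tags[0], y), 0.0)
             for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(pos_tags)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS,
                       key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + TRANSITION.get((prev, y), 0.0)
                         + EMISSION.get((pos_tags[t], y), 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


def bio_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Convert B/I/O labels into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, y in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        if y != "I" and start is not None:
            spans.append((start, i))
            start = None
        if y == "B":
            start = i
    return spans


labels = viterbi(["DT", "NN", "NN", "VBZ", "NN"])
spans = bio_spans(labels)
```

With these particular weights the two adjacent nouns are grouped into one span and the final noun forms a second span, which is the kind of boundary decision the NER feature is meant to support.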

Page 12

Domain-specific NER Example

Page 13

Domain-Specific Ontology

• Conceptual knowledge organised in a computer-based representation.

• IE needs ontologies for interpreting the texts and extracting relevant information.

• Metathesaurus and Semantic Network

• UIMA Semantic Types: broad categories for concepts (Metathesaurus)

• Use: Predicate Identification (Process/function semantic type)
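One way to picture how a semantic type could support predicate identification is a lookup that flags a token as a predicate candidate when the ontology assigns it a process/function type. This is a sketch under assumptions: the mini-lexicon `SEMANTIC_TYPE`, the type names, and the rule "verb or process/function type" are hypothetical stand-ins, and a real system would query the ontology rather than a hard-coded dictionary.

```python
from typing import Dict, List, Set

# Hypothetical mini-lexicon mapping lemmas to a broad semantic type.
SEMANTIC_TYPE: Dict[str, str] = {
    "inhibit": "process/function",
    "activation": "process/function",
    "aspirin": "substance",
    "cyclooxygenase": "substance",
}
PREDICATE_TYPES: Set[str] = {"process/function"}


def predicate_candidates(lemmas: List[str], pos_tags: List[str]) -> List[int]:
    """Return the indices of tokens treated as predicate candidates.

    Assumed rule: a token qualifies if it is a verb, or if the lexicon
    gives it a process/function type (this also catches nominal
    predicates such as "activation").
    """
    candidates = []
    for i, (lemma, pos) in enumerate(zip(lemmas, pos_tags)):
        is_verb = pos.startswith("VB")
        has_predicate_type = SEMANTIC_TYPE.get(lemma) in PREDICATE_TYPES
        if is_verb or has_predicate_type:
            candidates.append(i)
    return candidates


# Made-up example: "aspirin cause activation of cyclooxygenase" (lemmatized)
lemmas = ["aspirin", "cause", "activation", "of", "cyclooxygenase"]
pos_tags = ["NN", "VBZ", "NN", "IN", "NN"]
found = predicate_candidates(lemmas, pos_tags)
```

In this example the noun "activation" is picked up only because of its semantic type, which illustrates how an ontology feature might surface predicates that a purely verb-based rule would miss.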

Page 14

Results: Predicate Identification (1/2)

[Bar chart: Precision (P), Recall (R) and F1 on a 0-100 scale, before and after adding the feature]

Evaluation criteria: Precision, Recall, F1-measure
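For reference, the three measures can be computed from a set of predicted items and a set of gold-standard items. The sketch below uses the standard definitions; the function name and the example sets are illustrative and do not reproduce the results shown in the chart.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(predicted: Set[Hashable],
                        gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for predicted vs. gold items.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: items are (sentence id, token index) pairs of predicates.
predicted = {(0, 1), (0, 4), (1, 2)}
gold = {(0, 1), (1, 2), (2, 3), (3, 0)}
p, r, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predictions are correct and two of the four gold items are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.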

Page 15

Results: Predicate Identification (2/2)

• More biomedical predicates identified

• Drawbacks: a few false negatives, missing predicates

Page 16

Results: Argument Identification (1/2)

[Bar chart: Precision (P), Recall (R) and F1 on a 0-100 scale, before and after adding the feature]

Page 17

Results: Argument Identification (2/2)

• The identified boundary is too small; in the example below the bracketed span leaves out "and a ball":

[John is playing with a bat] and a ball.

• Some predicates are not identified

Page 18

Conclusion (1/2)

High quality information extraction

More predicates, more arguments, more relations

For the biomedical researcher:

• Integrated information for further study/investigation

• Manage and structure

• Easy access

[Diagram: Argument, Predicate, Argument, together forming RELATIONS]
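The step from labeled predicates and arguments to relations can be sketched as turning each SRL frame into an (argument, predicate, argument) triple. This is a minimal illustration, assuming a hypothetical frame layout in which A0 holds the agent and A1 the patient or theme; the frame contents are made up.

```python
from typing import Dict, List, Tuple

Frame = Dict[str, str]  # hypothetical layout: {"predicate": ..., "A0": ..., "A1": ...}
Relation = Tuple[str, str, str]


def frames_to_relations(frames: List[Frame]) -> List[Relation]:
    """Turn SRL frames into (A0, predicate, A1) relation triples.

    Frames lacking either argument are skipped, so a missed argument
    means a missed relation.
    """
    relations = []
    for frame in frames:
        agent, theme = frame.get("A0"), frame.get("A1")
        if agent and theme:
            relations.append((agent, frame["predicate"], theme))
    return relations


# Made-up frames for illustration.
frames = [
    {"predicate": "inhibit", "A0": "aspirin", "A1": "cyclooxygenase"},
    {"predicate": "activate", "A1": "T cells"},  # A0 was not identified
]
relations = frames_to_relations(frames)
```

Only the first frame yields a relation here, which mirrors the point of this slide: every additional predicate and argument that is identified can contribute another relation.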

Page 19

Conclusion (2/2)

Drawbacks

Speed

Missing predicates

False Negatives

Small predicate boundary

Page 20

Further investigation

Feature Engineering based on context of the text

Named entity classification (NER)

How to introduce features so as to not compromise the performance of the system

Categories, relationships, synonyms

Clause boundary/prepositional attachments

For making the system specific

Speed/Performance

Predicate boundary identification

Better results: ontology information

Page 21

Thank You!!

Questions