Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Thesis Final Presentation: Semantic Role Labeling for Information Extraction of Bio-medical data. Shikha Jacob Mathew Vinith Varghese, School of Engineering, Jönköping University, 03/26/2014

description

Information Extraction (IE) focuses on retrieving certain types of information from natural language texts by automatic processing. IE plays an important role in the biomedical domain because the body of knowledge in this area is growing rapidly. Relations between entities can support a variety of tasks within this domain. This thesis therefore focuses on extracting semantic information using semantic role labeling (SRL), with the help of background knowledge sources such as ontologies.

Transcript of Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Page 1: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Thesis Final Presentation

Semantic Role Labeling for Information Extraction of Bio-medical data

Shikha Jacob Mathew Vinith Varghese

School of Engineering, Jönköping University

03/26/2014

Page 2: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Agenda

Description of the problem

Purpose of this study

Methodology

IE System

Results

Conclusion & Future Work

Page 3: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Description of the problem

The biomedical domain is flooded with a large amount of information.

Research questions

• How can Natural Language Processing (NLP) components be improved so that high-quality information can be extracted from the biomedical domain?

• How can domain-specific knowledge be used to generate high-quality relations between entities within the domain, based on Semantic Role Labeling (SRL)?

[Diagram labels: Structured; Extract relevant and useful; Manage]

Page 4: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Purpose of this study

Develop a useful Information Extraction (IE) system that extracts relations between entities within the biomedical domain, using information obtained from SRL.

This is accomplished by introducing two features:

• Named Entity Recognition (NER)

• Ontology

Page 5: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Methodology: Research Approach (1/2)

Design Science Research

Quantitative Evaluation

Adaptive Software Development

Page 6: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Methodology: Research Framework (2/2)

The general methodology of design science research

Page 7: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

IE System - Framework (1/2)

Page 8: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

IE System - Framework (2/2)

Relational Detector: the steps include:

• Dependency Parser - Parser Model

• Semantic Role Labeler

Page 9: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Semantic Role Labeling (SRL)

• SRL process includes: Pre-processing, Argument Identification, Argument Classification, Post-Processing.

• Features used: Word Form, Lemma, POS Tag, Head Word, Dependency Label.

• Domain-specific features introduced: Ontology, NER.
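The feature set listed on this slide can be pictured as one feature dictionary per token. The following is a minimal sketch, not the thesis implementation: the `Token` class, the `token_features` function, the tag names, and the toy sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """One parsed token; field names are hypothetical."""
    form: str                            # surface word form
    lemma: str                           # lemma from the lemmatizer
    pos: str                             # part-of-speech tag
    head: int                            # index of the syntactic head, -1 for the root
    deprel: str                          # dependency label to the head
    ner: str = "O"                       # domain-specific NER tag (assumed BIO-style)
    semantic_type: Optional[str] = None  # ontology semantic type, if any


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Collect the general and domain-specific features for token i."""
    tok = sentence[i]
    head_word = sentence[tok.head].form.lower() if tok.head >= 0 else "<root>"
    return {
        "form": tok.form.lower(),
        "lemma": tok.lemma,
        "pos": tok.pos,
        "head_word": head_word,
        "deprel": tok.deprel,
        # The two domain-specific features introduced in the presentation:
        "ner": tok.ner,
        "semantic_type": tok.semantic_type or "NONE",
    }


# Toy sentence with hand-written annotations (illustrative only).
SENTENCE = [
    Token("Aspirin", "aspirin", "NN", 1, "nsubj", "B-Chemical", "Pharmacologic Substance"),
    Token("inhibits", "inhibit", "VBZ", -1, "root", "O", "Molecular Function"),
    Token("cyclooxygenase", "cyclooxygenase", "NN", 1, "dobj", "B-Protein", "Enzyme"),
]

features = token_features(SENTENCE, 0)
```

In a setup like this, the argument identification and classification steps would consume one such dictionary per candidate token.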

Page 10: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

SRL Example

A0: Agent

A1: Patient or Theme

Page 11: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Domain-Specific Named Entity Recognizer (NER)

The Concept Used for NER:

• Conditional Random Field (CRF): a statistical modeling method used in pattern recognition for structured prediction.

• Use: argument boundary identification

• Patterns from POS tags.
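To illustrate how a linear-chain CRF supports boundary identification, here is a minimal sketch of the decoding step (Viterbi search) over B/I/O tags. The emission and transition scores are hand-set assumptions for a toy phrase; in a real CRF they would be learned from features such as POS-tag patterns. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

TAGS = ["B", "I", "O"]  # Begin / Inside / Outside of an entity or argument


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model."""
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Toy scores (assumed, not learned). "O" followed by "I" is an invalid
# BIO sequence, so that transition is heavily penalised.
TRANSITIONS = {(p, c): 0.0 for p in TAGS for c in TAGS}
TRANSITIONS[("O", "I")] = -10.0

WORDS = ["the", "tumor", "necrosis", "factor", "binds"]
EMISSIONS = [
    {"B": 0.0, "I": 0.0, "O": 2.0},  # the
    {"B": 1.0, "I": 1.5, "O": 0.0},  # tumor (locally prefers I)
    {"B": 0.0, "I": 2.0, "O": 0.5},  # necrosis
    {"B": 0.0, "I": 2.0, "O": 0.5},  # factor
    {"B": 0.0, "I": 0.0, "O": 2.0},  # binds
]

tags = viterbi(EMISSIONS, TRANSITIONS)
```

With these scores the decoder tags "tumor" as B rather than I, because the whole sequence is scored jointly rather than token by token. That joint scoring is what makes the model a structured predictor.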

Page 12: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Domain-specific NER Example

Page 13: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Domain-Specific Ontology

• Conceptual knowledge organised in a computer-based representation.

• IE needs ontologies for interpreting the texts and extracting relevant information.

• Metathesaurus and Semantic Network

• UIMA Semantic Types

Semantic Types (ST): broad categories for concepts (Metathesaurus)

Use: Predicate Identification: Process/function ST
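A minimal sketch of how semantic types could drive predicate identification: a word becomes a candidate predicate when the ontology assigns it a process or function type. The `LEXICON` mapping and the `PROCESS_TYPES` set below are hand-written assumptions for illustration and are not taken from the Metathesaurus or from the thesis.

```python
from typing import Dict, List, Set

# Hypothetical lemma -> semantic type lookup; a real system would query
# the ontology instead of a hard-coded dictionary.
LEXICON: Dict[str, str] = {
    "inhibit": "Molecular Function",
    "phosphorylate": "Molecular Function",
    "aspirin": "Pharmacologic Substance",
    "kinase": "Enzyme",
}

# Semantic types treated as process/function types (assumed selection).
PROCESS_TYPES: Set[str] = {"Molecular Function", "Cell Function"}


def candidate_predicates(lemmas: List[str]) -> List[int]:
    """Return the positions of lemmas whose semantic type marks a process/function."""
    return [
        i for i, lemma in enumerate(lemmas)
        if LEXICON.get(lemma.lower()) in PROCESS_TYPES
    ]


positions = candidate_predicates(["aspirin", "inhibit", "kinase"])
```

Only "inhibit" is selected here, since the other two lemmas map to entity-like types rather than process/function types, and unknown lemmas are simply skipped.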

Page 14: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Results: Predicate Identification (1/2)

[Bar chart: Precision (P), Recall (R), and F1 on a 0-100 scale, Before Feature vs. After Feature]

Evaluation criteria: Precision, Recall, F1-measure
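The three evaluation criteria can be computed from a set of gold-standard items and a set of predicted items. This is a minimal sketch; the function name and the toy predicate sets are assumptions and do not reproduce the thesis results.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(gold: Set[Hashable],
                        predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 over two sets of items."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: (sentence id, predicate lemma) pairs, invented for illustration.
GOLD = {(0, "inhibit"), (0, "bind"), (1, "activate"), (2, "express")}
PREDICTED = {(0, "inhibit"), (1, "activate"), (1, "use")}

p, r, f1 = precision_recall_f1(GOLD, PREDICTED)
```

Missed predicates (false negatives) lower recall, while spurious ones (false positives) lower precision. F1 is the harmonic mean of the two.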

Page 15: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Results: Predicate Identification (2/2)

• More biomedical predicates identified

• Drawbacks:

a few false negatives

missing predicates

Page 16: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Results: Argument Identification (1/2)

[Bar chart: Precision (P), Recall (R), and F1 on a 0-100 scale, Before Feature vs. After Feature]

Page 17: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Results: Argument Identification (2/2)

• Boundary of the predicate is small

[John is playing with a bat] and a ball.

• Some predicates are not identified
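The boundary problem in the example sentence can be made concrete by comparing a predicted span with the intended one. This is a minimal sketch under the assumption that the intended span covers the whole sentence up to "ball"; the function name and the token offsets are hypothetical.

```python
from typing import Dict, Tuple, Union

Span = Tuple[int, int]  # half-open token offsets: (start, end)


def boundary_report(gold: Span, predicted: Span) -> Dict[str, Union[bool, int, float]]:
    """Compare a predicted span with the gold span at token level."""
    gold_length = gold[1] - gold[0]
    overlap = max(0, min(gold[1], predicted[1]) - max(gold[0], predicted[0]))
    return {
        "exact_match": gold == predicted,
        "missed_tokens": gold_length - overlap,
        "coverage": overlap / gold_length if gold_length else 0.0,
    }


TOKENS = "John is playing with a bat and a ball".split()
GOLD_SPAN = (0, len(TOKENS))  # assumed intended span: the full sentence
PREDICTED_SPAN = (0, 6)       # "John is playing with a bat"

report = boundary_report(GOLD_SPAN, PREDICTED_SPAN)
```

Under exact-match scoring, the truncated span counts as an error even though it covers two thirds of the intended tokens.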

Page 18: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Conclusion (1/2)

High-quality information extraction

More predicates, more arguments, more relations

For researchers in the biomedical field:

• Integrated information for further study/investigation

• Manage and structure

• Easy access

[Diagram labels: Predicate; Argument; Argument; RELATIONS]
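The step from predicate-argument structures to relations can be sketched as follows, using the A0 (Agent) and A1 (Patient or Theme) labels from the SRL example slide. The `Frame` class, the function name, and the toy frames are assumptions for illustration, not the thesis implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """One SRL result: a predicate with its labelled arguments."""
    predicate: str
    arguments: Dict[str, str]  # role label -> argument text


def frames_to_relations(frames: List[Frame]) -> List[Tuple[str, str, str]]:
    """Turn each frame with both A0 and A1 into an (agent, predicate, theme) triple."""
    relations = []
    for frame in frames:
        agent = frame.arguments.get("A0")
        theme = frame.arguments.get("A1")
        if agent and theme:
            relations.append((agent, frame.predicate, theme))
    return relations


# Toy frames (illustrative only).
FRAMES = [
    Frame("inhibit", {"A0": "aspirin", "A1": "cyclooxygenase"}),
    Frame("bind", {"A1": "the receptor"}),  # no agent found, so no relation
]

relations = frames_to_relations(FRAMES)
```

This also shows why missing predicates and truncated arguments matter: a frame that lacks either argument yields no relation at all.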

Page 19: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Conclusion (2/2)

Drawbacks

Speed

Missing predicates

False Negatives

Small predicate boundary

Page 20: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Further investigation

Feature Engineering based on context of the text

Named entity classification (NER)

How to introduce features so as to not compromise the performance of the system

Categories, relationships, synonyms

Clause boundary/Prepositional attachments

For making the system specific

Speed/Performance

Predicate boundary identification

Better results: Ontology information

Page 21: Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.

Thank You!!

Questions