Semantic Role Labelling (SRL) for Information Extraction of Bio-medical data.
1
Thesis Final Presentation
Semantic Role Labeling for Information Extraction of Bio-medical data
Shikha Jacob Mathew, Vinith Varghese
School of Engineering, Jönköping University
03/26/2014
2
Agenda
Description of the problem
Purpose of this study
Methodology
IE System
Results
Conclusion & Future Work
3
Description of the problem
The biomedical domain is flooded with a large amount of information.
Research questions
• How can the Natural Language Processing (NLP) components be improved so as to extract high-quality information from the biomedical domain?
• How can domain-specific knowledge be used to generate high-quality relations between different entities within the domain, based on Semantic Role Labeling (SRL)?
[Diagram labels: Extract relevant and useful; Structured; Manage]
4
Purpose of this study
Develop a useful Information Extraction (IE) system that extracts relations between entities using information obtained from SRL within the biomedical domain.
This is accomplished by introducing two features:
• Named Entity Recognition (NER)
• Ontology
5
Methodology: Research Approach (1/2)
Design Science Research
Quantitative Evaluation
Adaptive Software Development
6
Methodology: Research Framework (2/2)
The general methodology of design science research
7
IE System - Framework (1/2)
8
IE System - Framework (2/2)
Relational Detector: The steps include:
• Dependency Parser - Parser Model
• Semantic Role Labeler
9
Semantic Role Labeling (SRL)
• SRL process includes: Pre-processing, Argument Identification, Argument Classification, Post-processing.
• Features used: Word Form, Lemmatizer, POS tagging, Head Word, Dependency Label.
• Domain-specific features introduced: Ontology, NER.
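As a rough illustration of the feature list above, the sketch below builds one feature dictionary per token. The `Token` class, the `token_features` helper, the tag names and the example sentence are all hypothetical and are not taken from the presented system.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str                    # word form as written
    lemma: str                   # output of the lemmatizer
    pos: str                     # part-of-speech tag
    head: int                    # index of the syntactic head, -1 for the root
    deprel: str                  # dependency label
    ner: str = "O"               # domain-specific NER tag (hypothetical tag set)
    semantic_type: str = "NONE"  # ontology semantic type (hypothetical values)


def token_features(tokens, i):
    """Collect the baseline and domain-specific features for token i."""
    tok = tokens[i]
    head_word = tokens[tok.head].form.lower() if tok.head >= 0 else "ROOT"
    return {
        "form": tok.form.lower(),
        "lemma": tok.lemma,
        "pos": tok.pos,
        "head_word": head_word,
        "deprel": tok.deprel,
        "ner": tok.ner,                 # domain-specific feature
        "ontology": tok.semantic_type,  # domain-specific feature
    }


# Made-up example sentence with hand-assigned annotations.
sentence = [
    Token("Aspirin", "aspirin", "NN", 1, "SBJ", ner="B-DRUG", semantic_type="Substance"),
    Token("inhibits", "inhibit", "VBZ", -1, "ROOT", semantic_type="Process"),
    Token("COX-1", "cox-1", "NN", 1, "OBJ", ner="B-PROTEIN", semantic_type="Substance"),
]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Each dictionary could then be passed to whichever classifier performs argument identification and classification.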
10
SRL Example
A0: Agent; A1: Patient or Theme
11
Domain-Specific Named Entity Recognizer (NER)
The Concept Used for NER:
• Conditional Random Field (CRF): a statistical modeling method used in pattern recognition for structured prediction.
• Use: identification of argument boundaries.
• Patterns from POS tags.
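To show how sequence labels can yield boundaries, the sketch below converts per-token labels, such as a CRF tagger might emit, into entity spans. The BIO labelling scheme, the `bio_to_spans` helper and the example labels are assumptions for illustration; the slides do not state which scheme the system uses.

```python
def bio_to_spans(tokens, labels):
    """Turn BIO labels into (entity text, entity type) pairs."""
    spans = []
    start, kind = None, None
    for i, label in enumerate(labels):
        # An I- tag continues the current entity only if the type matches.
        if label.startswith("I-") and kind == label[2:]:
            continue
        # Any other label closes the entity that is currently open.
        if start is not None:
            spans.append((start, i, kind))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    if start is not None:
        spans.append((start, len(labels), kind))
    return [(" ".join(tokens[s:e]), k) for s, e, k in spans]


# Hand-written labels standing in for tagger output.
tokens = ["The", "tumor", "necrosis", "factor", "binds", "its", "receptor"]
labels = ["O", "B-PROTEIN", "I-PROTEIN", "I-PROTEIN", "O", "O", "B-PROTEIN"]
entities = bio_to_spans(tokens, labels)
```

The recovered spans give the start and end of each entity, which is the boundary information an argument identifier can use.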
12
Domain-specific NER Example
13
Domain Specific Ontology
• Conceptual knowledge organised in a computer-based representation.
• IE needs ontologies for interpreting the texts and extracting relevant information.
• Metathesaurus and Semantic Network
• UIMA Semantic Types: broad categories for concepts (Metathesaurus)
• Use: predicate identification via process/function semantic types (ST)
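One possible reading of the ontology feature is a lookup from a word's lemma to its semantic type, with process/function types marking predicate candidates. The dictionary below is a toy stand-in for a real ontology; its entries, the type names and the `predicate_candidates` helper are hypothetical.

```python
# Semantic types treated as signs of a predicate (assumed names).
PREDICATE_SEMANTIC_TYPES = {"Process", "Function"}

# Toy lemma-to-semantic-type table standing in for an ontology lookup.
TOY_ONTOLOGY = {
    "inhibit": "Process",
    "phosphorylate": "Process",
    "regulate": "Function",
    "protein": "Substance",
}


def predicate_candidates(lemmas):
    """Return indices of lemmas whose semantic type is process/function."""
    return [
        i
        for i, lemma in enumerate(lemmas)
        if TOY_ONTOLOGY.get(lemma) in PREDICATE_SEMANTIC_TYPES
    ]


candidates = predicate_candidates(["the", "kinase", "phosphorylate", "the", "protein"])
```

Lemmas that are missing from the table are simply skipped, so a lookup of this kind can only add predicate candidates on top of what the baseline features already find.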
14
Results: Predicate Identification (1/2)
[Bar chart: Precision (P), Recall (R) and F1 on a 0 to 100 scale; series: Before Feature, After Feature]
Evaluation criteria: Precision, Recall, F1-measure
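For reference, the three measures can be computed from sets of predicted and gold-standard items as sketched below. The `precision_recall_f1` helper and the example sets are illustrative only and do not reproduce the reported results.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up predicates, each identified by (token position, lemma).
gold = {(0, "inhibit"), (1, "bind"), (2, "activate"), (3, "express")}
predicted = {(0, "inhibit"), (1, "bind"), (4, "show")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

In this example two of the three predictions are correct and two of the four gold predicates are found, so precision is 2/3 and recall is 1/2.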
15
Results: Predicate Identification (2/2)
• More biomedical predicates identified
• Drawbacks: a few false negatives; missing predicates
16
Results: Argument Identification (1/2)
[Bar chart: Precision (P), Recall (R) and F1 on a 0 to 100 scale; series: Before Feature, After Feature]
17
Results: Argument Identification (2/2)
• The boundary of the predicate is small
[John is playing with a bat] and a ball.
• Some predicates are not identified
18
Conclusion (1/2)
High quality information extraction
More predicates - more arguments - more relations
Biomedical field researcher:
Integrated information for further study/investigation; manage and structure; easy access
[Diagram: Predicate, Argument, Argument; RELATIONS]
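The step from predicate-argument structures to relations can be sketched as follows, assuming each SRL frame is a dictionary holding a predicate and its labelled arguments. The frame layout, the `frames_to_relations` helper and the example frames are hypothetical.

```python
def frames_to_relations(frames):
    """Build (A0, predicate, A1) triples from SRL frames."""
    relations = []
    for frame in frames:
        args = frame["arguments"]
        # A relation needs both an agent (A0) and a patient/theme (A1).
        if "A0" in args and "A1" in args:
            relations.append((args["A0"], frame["predicate"], args["A1"]))
    return relations


# Made-up frames; the second has no A1 and therefore yields no relation.
frames = [
    {"predicate": "inhibit", "arguments": {"A0": "aspirin", "A1": "COX-1"}},
    {"predicate": "bind", "arguments": {"A0": "the ligand"}},
]
relations = frames_to_relations(frames)
```

A frame whose predicate was missed never reaches this step, which is why more identified predicates lead to more arguments and more relations.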
19
Conclusion (2/2)
Drawbacks
Speed
Missing predicates
False Negatives
Small predicate boundary
20
Further investigation
Feature Engineering based on context of the text
Named entity classification (NER)
How to introduce features so as to not compromise the performance of the system
Categories, relationships, synonyms
Clause boundary/prepositional attachments
For making the system specific
Speed/Performance
Predicate boundary identification
Better results - ontology information
21
Thank You!!
Questions