Learning formal definitions for biomedical concepts
ALINA PETROVA EMCL WORKSHOP 18.02.2014 Learning formal definitions for biomedical concepts


Examples of reasoning over structured biomedical knowledge

1) Covert et al. 2012: Whole-cell simulation
• computational model of all processes in a bacterium
• 2 years, >1000 articles

2) King et al. 2009: Automation of science
• Adam the Robot Scientist
• generates functional genomic hypotheses about a yeast
• used knowledge bases and ontologies for hypothesis generation and analysis
• experimental validation

The growth of biomedical scientific literature

Tsatsaronis et al. 2013

Existing biomedical ontologies

Ontology    # concepts   Year   Research (R) / Production (P)   Definitions
UMLS        1,000,000    1986   R, P                            textual, triples
SNOMED CT   300,000      1965   P                               formal
FMA         75,000       1995   P                               triples
GO          42,000       1998   P                               textual
GALEN       29,000       1991   R                               formal
MeSH        25,000       1963   P                               textual

Great need to convert textual definitions to formal representation!

Formalizing biomedical knowledge

Atelectasis (Lung collapse) example:

Absence of air in the entire or part of a lung, such as an incompletely inflated neonate lung or a collapsed adult lung. Pulmonary atelectasis can be caused by airway obstruction, lung compression, fibrotic contraction, or other factors.

vs.

Atelectasis = Disorder_of_lung ⊓ ∃has_associated_morphology.Collapse ⊓ ∃has_finding_site.Lung_structure ⊓ ∃has_episodicity.Episodicities ⊓ ∃has_clinical_course.Courses

An example of a MeSH definition

Arthritis is a form of joint disorder that results from joint inflammation. When bone surfaces become less well protected by cartilage, bone may be exposed and damaged.

Is it easy to formalize a definition?

Arthritis is a form of joint disorder that results from joint inflammation.

Arthritis = Joint_Disorder ⊓ ∃results_from.Joint_Inflammation

YES!

Is it easy to formalize a definition?

When bone surfaces become less well protected by cartilage, bone may be exposed and damaged.

Temporal logic?

Modal logic?

Situation calculus?

???

NO!

DL? which one?

Sources of problems

• Conceptual modeling
  ◦ Joint_Inflammation or Inflammation – related_to – Joints?
• Expressive modeling
  ◦ what exactly do we want to model? to what degree of sophistication? using which formalism?
• Text mining
  ◦ how to establish the dependencies between words in a definition?

The Goal

A is a B that has property C.

A ≡ B ⊓ ∃property.C
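To make the target pattern concrete, here is a minimal Python sketch that stores a definition as a parent concept plus a list of (relation, filler) pairs and prints it in the notation above. The class and field names are illustrative assumptions, not part of the presented system.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """A concept defined by a parent concept plus existential restrictions.

    Hypothetical helper for illustration only.
    """
    concept: str
    parent: str
    restrictions: list = field(default_factory=list)  # (relation, filler) pairs

    def to_dl(self) -> str:
        # Conjoin the parent with one existential restriction per pair.
        parts = [self.parent] + [f"∃{rel}.{filler}" for rel, filler in self.restrictions]
        return f"{self.concept} ≡ " + " ⊓ ".join(parts)


arthritis = Definition("Arthritis", "Joint_Disorder",
                       [("results_from", "Joint_Inflammation")])
print(arthritis.to_dl())
```

The Arthritis instance reproduces the formalization of the first sentence of the MeSH definition shown earlier.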

How to extract formal definitions?

CONCEPT ANNOTATION

RELATION EXTRACTION

RELATION CLASSIFICATION

Example

Abdominal Wall: the outer margins of the abdomen, extending from the osteocartilaginous thoracic cage to the pelvis.

Step 1: Concept annotation

Abdominal Wall: the outer margins of the abdomen, extending from the osteocartilaginous thoracic cage to the pelvis.

the abdomen -> ‘Abdomen’
the osteocartilaginous thoracic cage -> ‘Thorax’
the pelvis -> ‘Pelvis’
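One simple way to picture this step is a dictionary lookup that prefers longer phrases. The sketch below assumes a tiny hand-made lexicon built from the example; the actual annotator used in the work is not described on this slide, and a real one would also need word-boundary checks and synonym handling.

```python
# Hypothetical mini-lexicon: surface phrase -> concept label (taken from the example).
LEXICON = {
    "the abdomen": "Abdomen",
    "the osteocartilaginous thoracic cage": "Thorax",
    "the pelvis": "Pelvis",
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, concept) spans, preferring longer phrases on overlap."""
    lowered = text.lower()
    spans = []
    for phrase in sorted(lexicon, key=len, reverse=True):
        start = lowered.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Skip matches that overlap an already accepted (longer) span.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, lexicon[phrase]))
            start = lowered.find(phrase, end)
    return sorted(spans)


definition = ("the outer margins of the abdomen, extending from the "
              "osteocartilaginous thoracic cage to the pelvis.")
concepts = [concept for _, _, concept in annotate(definition)]
print(concepts)
```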

Step 2: Relation extraction

Abdominal Wall: the outer margins of the abdomen, extending from the osteocartilaginous thoracic cage to the pelvis.

“outer margins of” (Abdominal wall, Abdomen)
“that extends from” (Abdominal wall, Thorax)
“that extends to” (Abdominal wall, Pelvis)
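A rough sketch of how such triples could be produced: pair the defined term with each annotated mention and take the text between consecutive mentions as the relational string. This is an assumed simplification; it returns the raw strings and does not perform the normalization shown on the slide (for example "extending from" to "that extends from").

```python
def relational_strings(head, body, mentions):
    """Pair the head with each mention; the relational string is the text
    between the previous mention (or the start of the body) and this one."""
    triples, cursor = [], 0
    for surface, concept in mentions:
        start = body.find(surface, cursor)
        if start == -1:
            continue  # mention not found after the cursor; skip it
        relation = body[cursor:start].strip(" ,.;")
        triples.append((relation, head, concept))
        cursor = start + len(surface)
    return triples


body = ("the outer margins of the abdomen, extending from the "
        "osteocartilaginous thoracic cage to the pelvis.")
# Mentions as (surface phrase, concept), assumed to come from the annotation step.
mentions = [("the abdomen", "Abdomen"),
            ("the osteocartilaginous thoracic cage", "Thorax"),
            ("the pelvis", "Pelvis")]
triples = relational_strings("Abdominal wall", body, mentions)
for triple in triples:
    print(triple)
```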

Step 3: Relation classification

“outer margins of” (Abdominal wall, Abdomen)
“that extends from” (Abdominal wall, Thorax)
“that extends to” (Abdominal wall, Pelvis)

location(Abdominal wall, Abdomen)
starts(Abdominal wall, Thorax)
ends(Abdominal wall, Pelvis)

How to extract formal definitions?

CONCEPT ANNOTATION

RELATION EXTRACTION

RELATION CLASSIFICATION

RELATION EXTRACTION: SUPERVISED

Approach #1: align existing resources

Atelectasis (Lung collapse) example:

Absence of air in the entire or part of a lung, such as an incompletely inflated neonate lung or a collapsed adult lung. Pulmonary atelectasis can be caused by airway obstruction, lung compression, fibrotic contraction, or other factors.

vs.

Atelectasis = Disorder_of_lung ⊓ ∃has_associated_morphology.Collapse ⊓ ∃has_finding_site.Lung_structure ⊓ ∃has_episodicity.Episodicities ⊓ ∃has_clinical_course.Courses

Results

• Relations: extract 3 SNOMED relations from MeSH textual definitions
• Results: 75% success rate for single-label classification

A – relational string – B

A – relation label – B

Results

• How to improve 75%?
  ◦ add new features
  ◦ use resources with consistent modeling

Be data-driven!

A – relational string – B

A – relation label – B

Approach #2: annotate a corpus

SemRep:
• a rule-based system for biomedical relation extraction
• 26 relations
• a corpus of 500 annotated sentences
• 1300 relation instances

Top relations: process_of, location_of, part_of, treats, isa, affects, causes, interacts_with, uses etc.

SemRep relations

Two key improvements

• Consistent modeling
  Before: MeSH texts vs. SNOMED CT relations
  After: SemRep texts vs. SemRep relations
• The use of concept types
  Before: lexical features (ngrams)
  After: ngrams + concept types of relation arguments

Concept types

Motivation: every relation has a domain and a range
→ only specific types of concepts can be used as arguments

UMLS (biggest knowledge source for biomedicine: thesaurus, upper ontology, etc.): 133 semantic types
Tissue, Cell Function, Animal, Behavior, Physical Object, Molecular Sequence, etc.

Hormone – affects – Cell Function
Body Substance – causes – Anatomical Abnormality

Why are concept types useful?

Given concepts A, B
MeSH triple: A “is in some relation with” B

Before:
A – relation R1 – B
A – relation R2 – B
both are candidates!

After:
A → type At, B → type Bt
R1 ⊆ At × Bt
R2 ⊆ Ct × Dt
→ only R1 matches the argument types, so R2 is ruled out
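The filtering idea can be sketched in a few lines of Python: a relation stays a candidate only if its domain/range signature admits the types of both arguments. The two signatures below are the examples from the concept-types slide; the table and function names are assumptions made for illustration.

```python
# Relation signatures: relation -> set of allowed (domain type, range type) pairs.
# Only the two examples from the concept-types slide are included here.
SIGNATURES = {
    "affects": {("Hormone", "Cell Function")},
    "causes": {("Body Substance", "Anatomical Abnormality")},
}


def candidate_relations(type_a, type_b, signatures=SIGNATURES):
    """Keep only relations whose signature admits the argument types."""
    return sorted(rel for rel, pairs in signatures.items() if (type_a, type_b) in pairs)


print(candidate_relations("Hormone", "Cell Function"))
print(candidate_relations("Tissue", "Animal"))
```

In the supervised setting described here the types are used as classifier features rather than as a hard filter, so this sketch only illustrates why they are informative.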

RESULTS

Before: 424 instances, top 3 relations, 75%

After:
860 instances, top 5 relations, 94%
1144 instances, top 10 relations, 89%
1357 instances, all 26 relations, 83%

Comparison with SemRep

                   SemRep                                       ML method
Quality, top 5     95%                                          94%
Quality, top 10    94%                                          89%
Quality, all       94%                                          83%
Scalability        not scalable                                 scalable
Training speed     manually annotated corpus + rules = months   annotated corpus + ML = minutes

still rely on the labeled corpus → approach #3

RELATION EXTRACTION: UNSUPERVISED

Why is no annotated corpus needed?

Original approach:

term A – relational string – term B
(annotation)
concept A – formal relation – concept B

Now add the concept types!

The corpus is not manually annotated!

term A – relational string – term B

concept A – relational string – concept B

concept type A’ – formal relation – concept type B’

known from taxonomy/thesaurus!

known from the corpus

Still we use SemRep as a background. Can we do better?

Approach #3: unsupervised relation extraction

Yes!
• no manual annotation
• no predefined relations
• only taxonomy and annotation needed
• semantic clustering

term A – relational string – term B

concept A – relational string – concept B

concept type A’ – verb – concept type B’

Cluster examples

• {attach, bind}
• {cause, produce, induce}
• {transmit, convey, carry}
• {limit, inhibit, reduce}
• {result, lead}

etc.
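The slides do not say how the clusters are computed. As one possible illustration, the sketch below groups verbs that connect the same pairs of concept types, using Jaccard similarity and a greedy pass. The observations, the threshold, and the clustering procedure are all assumptions, not the method used in the work.

```python
from collections import defaultdict

# Hypothetical observations: (concept type of A, verb, concept type of B).
OBSERVATIONS = [
    ("Hormone", "cause", "Cell Function"),
    ("Hormone", "induce", "Cell Function"),
    ("Virus", "cause", "Disease"),
    ("Virus", "induce", "Disease"),
    ("Protein", "bind", "Receptor"),
    ("Protein", "attach", "Receptor"),
]


def cluster_verbs(observations, threshold=0.5):
    """Greedy clustering: a verb joins the first cluster whose seed verb
    shares enough (type, type) contexts with it (Jaccard similarity)."""
    contexts = defaultdict(set)
    for type_a, verb, type_b in observations:
        contexts[verb].add((type_a, type_b))
    clusters = []
    for verb in sorted(contexts):
        for cluster in clusters:
            seed = contexts[cluster[0]]
            overlap = len(seed & contexts[verb]) / len(seed | contexts[verb])
            if overlap >= threshold:
                cluster.append(verb)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([verb])
    return clusters


clusters = cluster_verbs(OBSERVATIONS)
print(clusters)
```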

Conclusions

• decompose the task of formal definition generation
  ◦ review of the existing approaches
  ◦ adaptation/creation
  ◦ implementation
  ◦ evaluation
• explore non-taxonomic relation extraction
  ◦ feature analysis
  ◦ performance of 94% on the top 5 relations, on a par with SemRep (95%)
• suggest workflow for unsupervised relation extraction
  ◦ faster
  ◦ less resource dependent
  ◦ can be generalized to different domains and applications


QUESTIONS?

Thank you!


TRIPLE EXTRACTION

Back-up slides

Example

Abdominal Wall: the outer margins of the abdomen, extending from the osteocartilaginous thoracic cage to the pelvis.

STEP #2: triple extraction
“outer margins of the” (Abdominal wall, Abdomen)
“that extends from the osteocartilaginous” (Abdominal wall, Thorax)
“to the” (Abdominal wall, Pelvis)

Triple extraction steps

1. separate the definition into head and body
2. find the parent term, if there is one
3. group coordinated concepts together
4. organize concepts into concept pairs
5. extract relational string for every pair
6. detect negation

Triple extraction steps

• separate the definition into head and body
  Head: Abdominal wall
  Body: the outer margins of the abdomen…
• find the parent term, if there is one
• group coordinated concepts together
• organize concepts into concept pairs
• extract relational string for every pair
• detect negation

Triple extraction steps

• separate the definition into head and body
• find the parent term, if there is one
  “Cancer is a disease that…” → IS_A(Cancer, Disease)
• group coordinated concepts together
• organize concepts into concept pairs
• extract relational string for every pair
• detect negation
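The parent-term step can be approximated with a surface pattern. The regular expression below is an assumed pattern covering "X is a Y that ..." and "X is a form of Y that ..."; the rules actually used are not given on the slide, and the parent is returned as raw text, not yet mapped to a concept.

```python
import re

# Assumed surface pattern: "<term> is a/an [form/type/kind of] <parent> that|which ..."
IS_A = re.compile(
    r"^(?P<child>.+?) is an? (?:(?:form|type|kind) of )?"
    r"(?P<parent>.+?)(?: (?:that|which)\b|[.,]|$)",
    re.IGNORECASE,
)


def find_parent(sentence):
    """Return a (child, parent) pair for IS_A, or None if the pattern does not apply."""
    match = IS_A.match(sentence.strip())
    if match is None:
        return None
    return (match.group("child").strip(), match.group("parent").strip())


print(find_parent("Cancer is a disease that…"))
print(find_parent("Arthritis is a form of joint disorder that results from joint inflammation."))
print(find_parent("the outer margins of the abdomen"))
```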

Triple extraction steps

• separate the definition into head and body
• find the parent term, if there is one
• group coordinated concepts together
  “X causes swelling and rashes” → causes(X, Swelling), causes(X, Rash)
• organize concepts into concept pairs
• extract relational string for every pair
• detect negation
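A minimal sketch of the coordination step, assuming the coordinated phrase has already been isolated: split it on commas and on "and"/"or", then emit one triple per conjunct. Mapping "rashes" to the concept Rash is left to concept annotation and is not done here.

```python
import re


def expand_coordination(subject, relation, object_phrase):
    """Split a coordinated object ("a, b and c") and emit one triple per conjunct."""
    conjuncts = re.split(
        r"\s*,\s*(?:and|or)\s+|\s*,\s*|\s+(?:and|or)\s+", object_phrase
    )
    return [(relation, subject, c.strip()) for c in conjuncts if c.strip()]


triples = expand_coordination("X", "causes", "swelling and rashes")
print(triples)
```

The same function handles comma-separated lists such as "airway obstruction, lung compression, or fibrotic contraction".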

Triple extraction steps

• separate the definition into head and body
• find the parent term, if there is one
• group coordinated concepts together
• organize concepts into concept pairs
• extract relational string for every pair
  “that extends from the osteocartilaginous” (Abdominal wall, Thorax)
• detect negation

Triple extraction steps

• separate the definition into head and body
• find the parent term, if there is one
• group coordinated concepts together
• organize concepts into concept pairs
• extract relational string for every pair
• detect negation
  “that does not respond to the ordinary” (Refractory anemia, Treatment) → NEGATION
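Negation detection can be sketched as a cue-word check on the relational string. The cue list below is an assumption chosen for illustration; the slide does not list the cues actually used.

```python
import re

# Assumed cue list; a real system would need a broader, curated set.
NEGATION_CUES = re.compile(
    r"\b(?:not|no|without|never|absence of|lack of)\b", re.IGNORECASE
)


def is_negated(relational_string):
    """Flag a relational string that contains a negation cue as a whole word."""
    return NEGATION_CUES.search(relational_string) is not None


print(is_negated("that does not respond to the ordinary"))
print(is_negated("that extends from the osteocartilaginous"))
```

Word boundaries matter here: without them, "no" would match inside "osteocartilaginous".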

ANNOTATION

Back-up slides

Attribute Alignment Annotator

Problem #1: missing annotations

Problem #2: ambiguity

DETAILS OF THE APPROACH

Back-up slides

Improvements since the last meeting

                           Old approach                                             New approach
Text source                MeSH definitions                                         MEDLINE abstracts
Relation set R source      SNOMED CT                                                UMLS
Feature sources            text of a definition                                     text of a definition + concept types
Feature representations    BoW, token and character ngrams, combination             character ngrams
Weighting schemes          boolean, per-class weights                               boolean
Classification algorithm   SVMs, Random Forests, Logistic Regression, Naïve Bayes   SVMs
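The feature setup in the "new approach" column (character ngrams with boolean weighting, plus concept types) can be sketched as a set of present features. The function names, the padding, and the choice of n = 3 are assumptions; training the SVM on these features is left out.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams (boolean weighting: present or absent)."""
    padded = f" {text.lower()} "  # pad so that word edges produce n-grams too
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def features(relational_string, type_a, type_b, n=3):
    """Boolean feature set: character n-grams plus the argument concept types."""
    feats = {f"ng={gram}" for gram in char_ngrams(relational_string, n)}
    feats.add(f"typeA={type_a}")
    feats.add(f"typeB={type_b}")
    return feats


example = features("affects", "Hormone", "Cell Function")
print(sorted(example))
```

Each distinct feature string would become one binary dimension of the input vector for the classifier.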