Workshop negations

Using SVMs with the Command Relation Feature to Identify Negated Events in Biomedical Literature Farzaneh Sarafraz Goran Nenadic School of Computer Science University of Manchester [email protected] [email protected]


2 / 27

Outline

• Motivation & aim

• Molecular events

• Data & experiments

• Methods

• Discussion

• Summary

3 / 27

Motivation & aim

• Biomedical literature

• 2000 papers published every day

• Biomedical information extraction needed

• Improve IE by negation information

• Negative results are interesting and reported

• “The IKK complex, but not p90 (rsk), is responsible for the in vivo phosphorylation of I-kappa-B-alpha.”

• Resources

• Shared tasks, data

• Linguistic tools (syntactic parsers)

4 / 27

Problem statement

• Given

• PubMed abstracts

• Protein/gene mentions annotated

• Molecular events annotated

• Wanted for every event

• Negated or not

• Classification problem

5 / 27

Molecular events

[Diagram: an event has a trigger, an event type {binding, transcription, regulation, expression}, and participants. Each participant has a participation type {theme, cause} and a participant type {gene/protein, event}.]

“We further show that Nmi interacts with all STATs except Stat2.”

6 / 27

Molecular events – class I

• One theme (gene/protein)

• “The effect of this synergism was perceptible at the level of induction of the IL-2 gene.”

• Trigger: induction

• Type: gene expression

• Theme: IL-2

• Types: transcription, gene expression, phosphorylation, protein catabolism, localization

7 / 27

Molecular events – class II

• One or more themes (gene/protein)

• “We further show that Nmi interacts with all STATs except Stat2.”

• Trigger: interacts

• Type: binding

• Themes: Nmi, Stat2

• Negated

• Type: Binding

8 / 27

Molecular events – class III

• Types: regulation types

• 1 theme, 0 or 1 cause

• may be gene/protein or other events

• “Overexpression of full-length ALG-4 induced transcription of FasL and, consequently, apoptosis.”

Event    Trigger           Type             Theme    Cause
Event 1  “transcription”   Transcription    FasL
Event 2  “Overexpression”  Gene expression  ALG-4
Event 3  “Overexpression”  Regulation       Event 2
Event 4  “induced”         Regulation       Event 1  Event 3
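One way to hold such nested events in memory is a small recursive record. The sketch below is illustrative only: the class names `Protein` and `Event`, the helper `nesting_depth`, and the single-theme restriction are assumptions made here, not part of the BioNLP'09 format or of the authors' system (class II events with several themes would need a list).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Protein:
    """A gene/protein mention (hypothetical minimal form)."""
    name: str


@dataclass(frozen=True)
class Event:
    """One annotated event: trigger, type, one theme, optional cause."""
    trigger: str
    event_type: str
    theme: Union[Protein, Event]
    cause: Optional[Union[Protein, Event]] = None
    negated: bool = False


def nesting_depth(event: Event) -> int:
    """Return 1 for an event over proteins only, more for nested events."""
    depths = [
        nesting_depth(participant)
        for participant in (event.theme, event.cause)
        if isinstance(participant, Event)
    ]
    return 1 + max(depths, default=0)


# The four events of the ALG-4 example sentence.
e1 = Event("transcription", "Transcription", theme=Protein("FasL"))
e2 = Event("Overexpression", "Gene expression", theme=Protein("ALG-4"))
e3 = Event("Overexpression", "Regulation", theme=e2)
e4 = Event("induced", "Regulation", theme=e1, cause=e3)
```

Here `e4` refers to `e1` and `e3` directly, so a classifier can walk from a regulation event down to the proteins it ultimately involves.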

9 / 27

Data: BioNLP’09

• Training: 800 abstracts

• Test: 260 abstracts

• Gold annotations

• Event trigger, type, participants, negation

• Negation cue not annotated

Event class   Training data       Development data
              total    negated    total    negated
Class I       2,858    131        559      26
Class II      887      44         249      15
Class III     4,870    440        987      66
Total         9,685    615        1,795    107

Test data

10 / 27

Methodologies

• Rule-based

• The command relation

• Classification

• SVM on event representation

• Lexical features: negation cue, POS

• Syntactic features: command

• Semantic features: event types

• Baseline

• NegEx: event triggers as “terms”

11 / 27

Evaluation measures

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN} \]

\[ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]
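The four measures can be computed directly from confusion-matrix counts. This is a minimal sketch; the function names are chosen here, and returning 0.0 when a denominator is zero is a convention assumed for the example (a results table may show "-" instead).

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN), also called sensitivity."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP)."""
    return tn / (tn + fp) if tn + fp else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0
```

For instance, a precision of 20% with a recall of 78% gives `f1_score(0.20, 0.78)`, roughly 0.32.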

12 / 27

Baseline results

Approach                   P     R     F1    Spec.
any negation cue present   20%   78%   32%   81%
NegEx                      36%   37%   36%   93%
No negation detection      -     0%    -     94%

13 / 27

The command relation

• If a and b are nodes in the constituency parse tree of a sentence, then a X-commands b iff the lowest ancestor of a with label X is also an ancestor of b.

Ronald Langacker, On Pronominalization and the Chain of Command, in D. Reibel and S. Schane (eds.) Modern Studies in English, Prentice-Hall, Englewood Cliffs, NJ. 160-186. 1969.
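The definition translates into a short tree walk. The sketch below is one possible reading, not the authors' code: the `Node` class is a hypothetical minimal tree, "ancestor" is taken to mean proper ancestor, and a node with no X-labelled ancestor is assumed to X-command nothing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Node:
    """A constituency-tree node; children get their parent link on creation."""
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the proper ancestors of node, nearest first."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def x_commands(a: Node, b: Node, label: str) -> bool:
    """True iff the lowest ancestor of a labelled `label` is an ancestor of b."""
    lowest = next((n for n in ancestors(a) if n.label == label), None)
    if lowest is None:
        return False
    return any(n is lowest for n in ancestors(b))


# a sits under the outer S; b sits under an inner S.
a, b = Node("a"), Node("b")
inner = Node("S", [b])
root = Node("S", [a, inner])
```

With this tree, `x_commands(a, b, "S")` holds because the outer S dominates both nodes, while `x_commands(b, a, "S")` does not because the inner S does not dominate a.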

14 / 27

Example of the command relation

• a S-commands b.

• b does not S-command a.

[Parse tree diagram with two S nodes and the nodes a and b]

15 / 27

X-command in action

[Parse tree diagram with S and VP nodes for the sentence below]

“We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer.”

16 / 27

Rule-based method

• An event is negated if

• Negation cue exists; and

• one of the following holds (each tested as a separate rule)

• Negation cue S-commands any participant

• Negation cue S-commands trigger

• Negation cue S-commands both

• Negation cue VP-commands both
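The four rule variants can be written as one small function over an abstract command test. This is a sketch under assumptions: the `commands` callable, the variant names, and the reading of "both" as "the trigger and at least one participant" are choices made here for illustration.

```python
from typing import Callable, Optional, Sequence

# (cue, target, label) -> True if the cue X-commands the target for X = label.
Commands = Callable[[str, str, str], bool]


def is_negated(
    cue: Optional[str],
    trigger: str,
    participants: Sequence[str],
    commands: Commands,
    variant: str,
) -> bool:
    """Apply one rule variant; an event without a negation cue is never negated."""
    if cue is None:
        return False

    def both(label: str) -> bool:
        # Assumed reading: trigger and at least one participant are commanded.
        return commands(cue, trigger, label) and any(
            commands(cue, p, label) for p in participants
        )

    if variant == "any_participant":
        return any(commands(cue, p, "S") for p in participants)
    if variant == "trigger":
        return commands(cue, trigger, "S")
    if variant == "both":
        return both("S")
    if variant == "vp_both":
        return both("VP")
    raise ValueError(f"unknown rule variant: {variant}")


# Toy command relation, listed by hand purely for illustration.
RELATION = {("fails", "bind", "S"), ("fails", "p50", "S"), ("fails", "bind", "VP")}


def toy_commands(cue: str, target: str, label: str) -> bool:
    return (cue, target, label) in RELATION
```

In practice `commands` would be backed by a parse tree; passing it in as a parameter keeps the rule logic separate from the parser.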

17 / 27

Results of rule-based method

Approach                                  P     R     F1    Spec.
negation cue S-commands any participant   23%   76%   35%   84%
negation cue S-commands trigger           23%   68%   34%   85%
negation cue S-commands both              23%   68%   35%   86%

negation cue VP-commands both: 42%

18 / 27

SVM features

• Semantic features

• Event type

• Lexical features

• Sentence contains negation cue

• Negation cue

• Syntactic features

• POS of neg cue

• POS of event trigger

• POS of the participants

• Parse tree distance between trigger & cue

• Type of smallest phrase containing trigger & cue

• Cue S-commands any participant

• Cue S-commands trigger
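To feed these features to an SVM they have to become numbers. The sketch below shows one common encoding, with categorical values one-hot encoded as `name=value` keys. The `EventContext` container, the key names, and the choice to omit cue-dependent features when no cue is present are all assumptions of this example, not the authors' representation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class EventContext:
    """Hypothetical container for what is known about one event."""
    event_type: str
    trigger_pos: str
    participant_pos: tuple[str, ...]
    cue: Optional[str] = None  # None: no negation cue in the sentence
    cue_pos: Optional[str] = None
    smallest_phrase: Optional[str] = None
    tree_distance: int = 0
    cue_s_commands_participant: bool = False
    cue_s_commands_trigger: bool = False


def to_features(ctx: EventContext) -> dict[str, float]:
    """Map one event to a sparse numeric feature dictionary."""
    feats = {
        f"event_type={ctx.event_type}": 1.0,
        "has_cue": float(ctx.cue is not None),
        f"trigger_pos={ctx.trigger_pos}": 1.0,
    }
    for pos in ctx.participant_pos:
        feats[f"participant_pos={pos}"] = 1.0
    if ctx.cue is not None:
        # Cue-dependent features only exist when a cue was found.
        feats[f"cue={ctx.cue}"] = 1.0
        feats[f"cue_pos={ctx.cue_pos}"] = 1.0
        feats[f"smallest_phrase={ctx.smallest_phrase}"] = 1.0
        feats["tree_distance"] = float(ctx.tree_distance)
        feats["cue_s_commands_participant"] = float(ctx.cue_s_commands_participant)
        feats["cue_s_commands_trigger"] = float(ctx.cue_s_commands_trigger)
    return feats


example = to_features(
    EventContext(
        event_type="Binding",
        trigger_pos="VB",
        participant_pos=("NN",),
        cue="fails",
        cue_pos="VBZ",
        smallest_phrase="VP",
        tree_distance=4,
        cue_s_commands_participant=True,
        cue_s_commands_trigger=True,
    )
)
```

A dictionary of this shape can be turned into a fixed-width vector by any dictionary vectorizer before training.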

19 / 27

Results of single SVM, incremental feature sets

Feature set     P     R     F1    Spec.
Features 1-7    43%   8%    14%   99.2%
Features 1-8    73%   19%   30%   99.3%
Features 1-9    71%   38%   49%   99.2%
Features 1-10   76%   38%   51%   99.2%

23 / 27

Results of single SVM, incremental feature sets

1. Event type

2. Sentence contains neg cue

3. Neg cue

4. POS of neg cue

5. POS of event trigger

6. POS of the participants

7. Type of smallest phrase containing trigger & cue

8. Cue S-commands any participant

9. Cue S-commands trigger

10. Parse tree distance between trigger & cue
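Features 7 and 10 both depend on the lowest node that dominates the trigger and the cue. A minimal sketch follows, assuming the tree is given as a child-to-parent map and that distance means the number of edges on the path between the two nodes; the toy parse and the node names are hypothetical.

```python
from __future__ import annotations


def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Return the node followed by its ancestors, nearest first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(a: str, b: str, parent: dict[str, str]) -> str:
    """Return the lowest node that lies on both paths to the root."""
    on_path_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in on_path_a:
            return node
    raise ValueError("nodes are not in the same tree")


def tree_distance(a: str, b: str, parent: dict[str, str]) -> int:
    """Count the edges on the path from a to b through their common ancestor."""
    lca = lowest_common_ancestor(a, b, parent)
    return path_to_root(a, parent).index(lca) + path_to_root(b, parent).index(lca)


# Toy fragment: S1 -> VP1 -> {fails, S2}; S2 -> VP2 -> bind.
PARENT = {"VP1": "S1", "fails": "VP1", "S2": "VP1", "VP2": "S2", "bind": "VP2"}

smallest_phrase = lowest_common_ancestor("fails", "bind", PARENT)
distance = tree_distance("fails", "bind", PARENT)
```

In this fragment the smallest phrase containing both words is `VP1`, and the distance is four edges (one up from "fails", three up from "bind").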

24 / 27

Results of separate SVMs for each class

Event class                   P      R     F1    Spec.
Class I (559 events)          94%    65%   77%   99.8%
Class II (249 events)         100%   33%   50%   100%
Class III (987 events)        81%    44%   57%   99.2%
Micro-average (1,795 events)  88%    49%   63%   99.4%
Macro-average (3 classes)     92%    47%   62%   99.7%
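The two averages differ in what they weight equally: macro-averaging treats each class alike, micro-averaging treats each event alike. The sketch below recomputes the macro row from the per-class percentages. How the macro F1 was obtained is not stated on the slide; the reported 62% is consistent with taking F1 of the macro-averaged P and R, which is the assumption used here (the plain mean of the three F1 values would be about 61%). The `micro_average` helper and its input format are hypothetical, since per-class counts are not given.

```python
from statistics import mean

# Per-class precision and recall from the table, as fractions.
PER_CLASS = {
    "Class I": {"P": 0.94, "R": 0.65},
    "Class II": {"P": 1.00, "R": 0.33},
    "Class III": {"P": 0.81, "R": 0.44},
}

macro_p = mean(m["P"] for m in PER_CLASS.values())
macro_r = mean(m["R"] for m in PER_CLASS.values())
# Assumption: macro F1 is derived from the macro-averaged P and R.
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)


def micro_average(counts: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Pool (TP, FP, FN) counts over classes, then compute P and R once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return tp / (tp + fp), tp / (tp + fn)
```

Rounded to whole percentages, `macro_p`, `macro_r` and `macro_f1` come out at 92%, 47% and 62%.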

25 / 27

Future work

• Use class-specific features

• Study other variants of command

• Combine negation detection with automatic event detection instead of using ‘gold’ events

• Use negation detection on a larger scale dataset (MEDLINE) to find contradictions & contrasts in the biomedical literature

26 / 27

Conclusions

• SVM for extracting negated events

• >99% specificity

• 63% F-measure (micro average)

• Different classes of events behave differently

• To detect negated molecular events

• Event trigger & surface distances not enough

• Semantic & command features useful

• Event participants as important as triggers

• Apply on large scale data – MEDLINE

27 / 27

Acknowledgements

• Organisers of BioNLP’09

• GN TEAM

• Casey Bergman’s lab – Faculty of Life Sciences, University of Manchester

• James Eales – University of Manchester

• Jonathan Caruana – University College London

• Web service soon available at http://gnode1.mib.man.ac.uk/negmole

28 / 27

X-command in action

[Parse tree diagram with S and VP nodes for the sentence below]

“We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer that is upregulated in LPS tolerant human Mono Mac 6 cells.”