BioNLP09 Winners
![Page 1: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/1.jpg)
Extracting Complex Biological Events with Rich Graph-Based Feature Sets
Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, Tapio Salakoski
BioNLP 2009 Workshop
Farzaneh Sarafraz
18 June 2009
![Page 2: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/2.jpg)
BioNLP'09 Task 1
Events in abstracts
Given: genes and gene products (proteins)
Wanted: events
− type
− trigger
− participant(s)
− cause (if applicable)
![Page 3: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/3.jpg)
Example
"I kappa B/MAD3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
Event: negative regulation
Trigger: masks
Theme1: the first p65
Cause: MAD3
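As a rough illustration, the annotation above can be written down as a small record. This is only a sketch: the `Event` class and its field names are made up for this note and are not the shared task's file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical container for one extracted event."""
    event_type: str               # one of the task's event types
    trigger: str                  # the word(s) in the text that signal the event
    theme: str                    # the participant the event acts on
    cause: Optional[str] = None   # only present for some events


# The example from the slide, written as a record.
example = Event(
    event_type="Negative regulation",
    trigger="masks",
    theme="p65 (first mention)",
    cause="MAD3",
)
```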
![Page 4: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/4.jpg)
Event Types
Gene expression
Transcription
Protein catabolism
Localisation
Phosphorylation
Binding
Regulation
Positive regulation
Negative regulation
![Page 5: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/5.jpg)
Training and Test Data
Training data: 800 abstracts
Development data: 150 abstracts
Test data: 260 abstracts
![Page 6: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/6.jpg)
The System
Trigger recognition
− Methods similar to NER
− Classification
Argument detection
− Graph edge selection
− Classification
Semantic post-processing
− Rule-based
![Page 7: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/7.jpg)
Trigger Detection
Token labelling (one class for each event type and one negative class)
92% of triggers are single tokens
− Adjacent tokens form a trigger if they appear as one in the training data
Triggers that share a token:
− Combined class: gene expression/positive regulation
A graph node for each trigger
− Not duplicated just yet
![Page 8: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/8.jpg)
Classification SVM
Token features
− Binary: capitalisation, presence of punctuation or numeric characters
− Stem
− Character bigrams and trigrams
− Whether the token is a known trigger in the training data
− All of the above for linear and dependency "neighbours"
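A minimal sketch of how the token features listed above might be computed. The function names and feature keys are invented here, the stem feature is left out because it needs an external stemmer, and the system's actual feature encoding is not shown in the slides.

```python
import string


def char_ngrams(text, n):
    """All contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_features(token, known_triggers):
    """Binary and character n-gram features for a single token.

    `known_triggers` is assumed to be a set of lower-cased trigger words
    collected from the training data.
    """
    lowered = token.lower()
    feats = {
        "has_capital": any(c.isupper() for c in token),
        "has_punct": any(c in string.punctuation for c in token),
        "has_digit": any(c.isdigit() for c in token),
        "known_trigger": lowered in known_triggers,
    }
    # Character bigrams and trigrams, one binary feature per n-gram.
    for n in (2, 3):
        for gram in char_ngrams(lowered, n):
            feats[f"char{n}={gram}"] = True
    return feats
```

The same function would be applied to the linear and dependency neighbours of the token, with the feature names prefixed by the neighbour's position.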
![Page 9: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/9.jpg)
Classification SVM
Frequency features
− # of named entities
  in the sentence
  in a linear window around the token
− Bag-of-words count of token texts in the sentence (?)
Dependency chains
− Up to a depth of 3 from the token are constructed
− At each depth, both token and frequency features
− Plus dependency type and sequence of dependency types in the chain
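The dependency chains could be enumerated along these lines. This is a sketch under assumptions: the graph is given as a dict from a token to its (dependency type, neighbour) pairs, and `dependency_chains` is a name made up for this note.

```python
def dependency_chains(graph, start, max_depth=3):
    """Enumerate chains of dependency types starting at `start`.

    `graph` maps a token to a list of (dependency_type, neighbour) pairs.
    Returns a list of (tuple_of_dependency_types, end_token), one entry
    for every chain of length 1..max_depth that does not revisit a token.
    """
    chains = []
    stack = [(start, (), (start,))]
    while stack:
        node, dep_types, visited = stack.pop()
        if len(dep_types) == max_depth:
            continue  # do not extend chains beyond the depth limit
        for dep, neighbour in graph.get(node, []):
            if neighbour in visited:
                continue
            new_types = dep_types + (dep,)
            chains.append((new_types, neighbour))
            stack.append((neighbour, new_types, visited + (neighbour,)))
    return chains
```

Each chain gives the sequence-of-dependency-types feature directly, and the token and frequency features can be computed at the chain's end token.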
![Page 10: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/10.jpg)
Two SVMs
"Somewhat" different feature sets
Combined weighted results
"This design should be considered an artifact of the time-constrained, experiment-driven development of the system rather than a principled design"
![Page 11: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/11.jpg)
Precision/Recall tradeoff
Undetected trigger → undetected event
All triggers have events in the training data → bias towards reporting an event for every detected trigger
Adjust P/R explicitly
− multiply the classifier score of the negative class by β
− find β experimentally
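A sketch of the β adjustment, assuming each token comes with a dict of positive per-class scores (so a β below 1 makes the negative class less likely to win). The function names, the F-score used as the tuning target, and the grid of candidate values are all assumptions of this note, not details from the slides.

```python
def predict_with_beta(scores, beta, negative="negative"):
    """Pick the best class after scaling the negative-class score by beta."""
    adjusted = dict(scores)
    adjusted[negative] = scores[negative] * beta
    return max(adjusted, key=adjusted.get)


def f_score(gold, predicted, negative="negative"):
    """F1 over the non-negative classes."""
    tp = sum(1 for g, p in zip(gold, predicted) if p != negative and p == g)
    if tp == 0:
        return 0.0
    n_pred = sum(1 for p in predicted if p != negative)
    n_gold = sum(1 for g in gold if g != negative)
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def tune_beta(score_rows, gold, candidates):
    """Try each candidate beta on held-out data and keep the best one."""
    def quality(beta):
        predicted = [predict_with_beta(row, beta) for row in score_rows]
        return f_score(gold, predicted)
    return max(candidates, key=quality)
```

Lowering β trades precision for recall: more tokens are labelled as triggers, which matters because a missed trigger means a missed event.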
![Page 12: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/12.jpg)
Edge Detection
Multiclass SVM
All potential directed edges
− Event node to named entity
− Event node to event node (nested event)
− Labelled as theme, cause, or negative
Each edge is predicted independently
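The candidate edges described above can be generated as follows. This is a sketch: `candidate_edges` and the node names in the test are illustrative only. Each pair would then be classified on its own as theme, cause, or negative.

```python
from itertools import permutations


def candidate_edges(event_nodes, entity_nodes):
    """All potential directed edges that start at an event node."""
    # Event node -> named entity
    edges = [(event, entity) for event in event_nodes for entity in entity_nodes]
    # Event node -> event node, in both directions (nested events)
    edges += list(permutations(event_nodes, 2))
    return edges
```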
![Page 13: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/13.jpg)
Feature Set – Central Concept
Shortest undirected path of syntactic dependencies in the Stanford scheme parse of the sentence.
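One way to find such a path is a breadth-first search that ignores edge direction. The sketch below assumes the parse is given as (governor, dependency type, dependent) triples; the function name and the toy parse in the usage line are made up for illustration.

```python
from collections import deque


def shortest_undirected_path(dependencies, source, target):
    """Shortest token path between two nodes, ignoring edge direction.

    `dependencies` is a list of (governor, dependency_type, dependent).
    Returns the list of tokens on the path, or None if not connected.
    """
    neighbours = {}
    for governor, _dep_type, dependent in dependencies:
        neighbours.setdefault(governor, []).append(dependent)
        neighbours.setdefault(dependent, []).append(governor)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


toy_parse = [("masks", "nsubj", "MAD3"), ("masks", "dobj", "signal"), ("signal", "prep_of", "p65")]
path = shortest_undirected_path(toy_parse, "MAD3", "p65")
```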
![Page 14: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/14.jpg)
Feature Set
Token text, POS, entity/event class, dependency type (e.g. subject)
N-grams: merging the attributes of 2-4
− Consecutive tokens
− Consecutive dependencies
− Each token and its two neighbouring dependencies
− Each dependency and its two neighbouring tokens
− One bigram showing direction
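A sketch of how n-grams along the shortest path might be built. It assumes the path is available as a list of token attributes plus the list of dependency types between them (`deps[i]` links `tokens[i]` and `tokens[i + 1]`); the function names and the underscore-joined feature strings are conventions of this note only.

```python
def path_ngrams(items, sizes=(2, 3, 4)):
    """Merge the attributes of 2-4 consecutive items on the path."""
    grams = []
    for n in sizes:
        for i in range(len(items) - n + 1):
            grams.append("_".join(items[i:i + n]))
    return grams


def token_with_flanking_deps(tokens, deps):
    """Trigram of each inner token with its two neighbouring dependencies."""
    return [
        f"{deps[i - 1]}_{tokens[i]}_{deps[i]}"
        for i in range(1, len(tokens) - 1)
    ]
```

The same `path_ngrams` call works for consecutive dependencies by passing the dependency list instead of the token list.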
![Page 15: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/15.jpg)
Other Features
Individual component features
Semantic node features
Frequency features
![Page 16: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/16.jpg)
Semantic Post-Processing
Duplicate nodes
− Same class and same trigger
− Combined trigger
Remove improper arguments
Remove directed cycles by removing the weakest link
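Cycle removal could look roughly like this. The sketch assumes that "weakest" means the edge with the lowest classifier confidence and that the predicted edges arrive as a dict from (source, target) to a score; both the representation and the function names are assumptions of this note.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if there is none."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    state = {}  # node -> "active" (on the current DFS path) or "done"

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt) == "active":
                # Back edge found: the cycle runs from nxt round to nxt.
                nodes = path[path.index(nxt):] + [nxt]
                return list(zip(nodes, nodes[1:]))
            if nxt not in state:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in list(graph):
        if node not in state:
            found = visit(node, [])
            if found:
                return found
    return None


def break_cycles(edge_scores):
    """Drop the lowest-scoring edge of each cycle until none remain."""
    kept = dict(edge_scores)
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept
        weakest = min(cycle, key=kept.get)
        del kept[weakest]
```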
![Page 17: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/17.jpg)
Duplicating Event Nodes
Task restrictions
− Two causes,
− must have theme,
− etc.
Several heuristics
xth first dependency in shortest path from the event for binding
![Page 18: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/18.jpg)
Results
![Page 19: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/19.jpg)
Compared to Us
![Page 20: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/20.jpg)
What Didn't Work/Wasn't Tried
CRF
HMM
Removing strong independence assumption
Coreference resolution (4.8%)
![Page 21: BioNLP09 Winners](https://reader035.fdocuments.in/reader035/viewer/2022062514/55a2367c1a28ab6a668b4585/html5/thumbnails/21.jpg)
End.