Gleaning Relational Information from Biomedical Text
![Page 1: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/1.jpg)
Gleaning Relational Information from Biomedical Text
Mark Goadrich
Computer Sciences Department
University of Wisconsin - Madison
Joint Work with Jude Shavlik and Louis Oliphant
CIBM Seminar - Dec 5th 2006
![Page 2: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/2.jpg)
Outline
The Vacation Game
Formalizing with Logic
Biomedical Information Extraction
Evaluating Hypotheses
Gleaning Logical Rules
Experiments
Current Directions
![Page 3: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/3.jpg)
The Vacation Game
Positive
[Images of positive and negative example items not available]
Negative
![Page 4: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/4.jpg)
The Vacation Game
Positive
– Apple
– Feet
– Luggage
– Mushrooms
– Books
– Wallet
– Beekeeper
Negative
– Pear
– Socks
– Car
– Fungus
– Novel
– Money
– Hive
Positive (repeated adjacent letters highlighted)
– Apple (pp)
– Feet (ee)
– Luggage (gg)
– Mushrooms (oo)
– Books (oo)
– Wallet (ll)
– Beekeeper (ee, ee)
![Page 5: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/5.jpg)
The Vacation Game
My Secret Rule
– The word must have two adjacent letters which are the same letter.
Found by using inductive logic
– Positive and Negative Examples
– Formulating and Eliminating Hypotheses
– Evaluating Success and Failure
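The secret rule can be checked directly against the example words. The sketch below is only an illustration in Python (the function name `has_adjacent_double` is made up here, and case is ignored as an assumption); it confirms that the rule covers every positive example and none of the negatives.

```python
def has_adjacent_double(word: str) -> bool:
    """Return True if any two neighbouring letters in `word` are identical."""
    letters = word.lower()
    return any(a == b for a, b in zip(letters, letters[1:]))


positives = ["Apple", "Feet", "Luggage", "Mushrooms", "Books", "Wallet", "Beekeeper"]
negatives = ["Pear", "Socks", "Car", "Fungus", "Novel", "Money", "Hive"]

# A good hypothesis covers the positives and rejects the negatives.
covered_pos = [w for w in positives if has_adjacent_double(w)]
covered_neg = [w for w in negatives if has_adjacent_double(w)]
print(len(covered_pos), "positives covered,", len(covered_neg), "negatives covered")
```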
![Page 6: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/6.jpg)
Inductive Logic Programming
Machine Learning
– Classify data into categories
– Divide data into train and test sets
– Generate hypotheses on train set and then measure performance on test set
In ILP, data are Objects …
– person, block, molecule, word, phrase, …
and Relations between them
– grandfather, has_bond, is_member, …
![Page 7: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/7.jpg)
Formalizing with Logic
[Figure: the word "apple" as object w2169, with letter objects w2169_1 (a), w2169_2 (p), w2169_3 (p), w2169_4 (l), w2169_5 (e) linked to the alphabet a to z. Labels: Objects, Relations]
![Page 8: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/8.jpg)
Formalizing with Logic
word(w2169). letter(w2169_1).
has_letter(w2169, w2169_2).
has_letter(w2169, w2169_3).
next(w2169_2, w2169_3).
letter_value(w2169_2, 'p').
letter_value(w2169_3, 'p').
pos(X) :- has_letter(X, A), has_letter(X, B), next(A, B), letter_value(A, C), letter_value(B, C).
[Figure: the rule annotated with its head, body and variables, shown over the 'apple' example (w2169 and its letters w2169_1 to w2169_5, linked to the alphabet a to z)]
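To see how the rule is satisfied by the facts, the same objects and relations can be mimicked outside Prolog. This is a rough Python sketch, not how an ILP engine evaluates clauses: `facts_for` is a hypothetical helper, and the identifier `w0001` for "pear" is invented for the example.

```python
from itertools import product


def facts_for(word_id, text):
    """Build has_letter / next / letter_value facts for one word (hypothetical helper)."""
    ids = [f"{word_id}_{i}" for i in range(1, len(text) + 1)]
    return {
        "has_letter": {(word_id, lid) for lid in ids},
        "next": set(zip(ids, ids[1:])),
        "letter_value": dict(zip(ids, text)),
    }


def pos(word_id, facts):
    """Mirror of: pos(X) :- has_letter(X,A), has_letter(X,B), next(A,B),
    letter_value(A,C), letter_value(B,C)."""
    owned = [lid for (w, lid) in facts["has_letter"] if w == word_id]
    return any(
        (a, b) in facts["next"]
        and facts["letter_value"][a] == facts["letter_value"][b]
        for a, b in product(owned, repeat=2)
    )


apple = facts_for("w2169", "apple")
pear = facts_for("w0001", "pear")  # hypothetical identifier
print(pos("w2169", apple), pos("w0001", pear))
```

The variables A and B range over the letters of the word, and C is the shared letter value that makes the body true.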
![Page 9: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/9.jpg)
Biomedical Information Extraction
[Figure: text converted into a structured database. Image courtesy of SEER Cancer Training Site]
![Page 10: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/10.jpg)
Biomedical Information Extraction
http://www.geneontology.org
![Page 11: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/11.jpg)
Biomedical Information Extraction
NPL3 encodes a nuclear protein with an RNA recognition motif and similarities to a family of proteins involved in RNA metabolism.
ykuD was transcribed by SigK RNA polymerase from T4 of sporulation.
Mutations in the COL3A1 gene have been implicated as a cause of type IV Ehlers-Danlos syndrome, a disease leading to aortic rupture in early adult life.
![Page 12: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/12.jpg)
Biomedical Information Extraction
The dog running down the street tackled and bit my little sister.
![Page 13: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/13.jpg)
Biomedical Information Extraction
NPL3 encodes a nuclear protein with …
[Figure: parse tree of the sentence. Word tags: NPL3 (noun), encodes (verb), a (article), nuclear (adj), protein (noun), with (prep). Words are grouped into noun phrases, a verb phrase and a prep phrase under the sentence node.]
![Page 14: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/14.jpg)
MedDict Background Knowledge
http://cancerweb.ncl.ac.uk/omd/
![Page 15: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/15.jpg)
MeSH Background Knowledge
http://www.nlm.nih.gov/mesh/MBrowser.html
![Page 16: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/16.jpg)
GO Background Knowledge
http://www.geneontology.org
![Page 17: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/17.jpg)
Some Prolog Predicates
Biomedical Predicates
– phrase_contains_medDict_term(Phrase, Word, WordText)
– phrase_contains_mesh_term(Phrase, Word, WordText)
– phrase_contains_mesh_disease(Phrase, Word, WordText)
– phrase_contains_go_term(Phrase, Word, WordText)
Lexical Predicates
– internal_caps(Word)
– alphanumeric(Word)
Look-ahead Phrase Predicates
– few_POS_in_phrase(Phrase, POS)
– phrase_contains_specific_word_triple(Phrase, W1, W2, W3)
– phrase_contains_some_marked_up_arg(Phrase, Arg#, Word, Fold)
Relative Location of Phrases
– protein_before_location(ExampleID)
– word_pair_in_between_target_phrases(ExampleID, W1, W2)
![Page 18: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/18.jpg)
Still More Predicates
High-scoring words in protein phrases
– bifunction, repress, pmr1, …
High-scoring words in location phrases
– golgi, cytoplasm, er
High-scoring BETWEEN protein & location
– across, cofractionate, inside, …
![Page 19: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/19.jpg)
Biomedical Information Extraction
Given: Medical Journal abstracts tagged with biological relations
Do: Construct system to extract related phrases from unseen text
Our Gleaner Approach
Develop fast ensemble algorithms focused on recall and precision evaluation
![Page 20: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/20.jpg)
Using Modes to Chain Relations
[Figure: predicates chained through their argument types. Types: Phrase, Sentence, Word. Predicates: alphanumeric(…), internal_caps(…), verb(…), phrase_child(…, …), long_sentence(…), phrase_parent(…, …), noun_phrase(…)]
![Page 21: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/21.jpg)
Growing Rules From Seed
NPL3 encodes a nuclear protein with …
prot_loc(ab1392078_sen7_ph0, ab1392078_sen7_ph2, ab1392078_sen7).
phrase_contains_novelword(ab1392078_sen7_ph0, ab1392078_sen7_ph0_w0).
phrase_next(ab1392078_sen7_ph0, ab1392078_sen7_ph1).
…
noun_phrase(ab1392078_sen7_ph2).
word_child(ab1392078_sen7_ph2, ab9018277_sen5_ph11_w3).
…
avg_length_sentence(ab1392078_sen7).
…
[Annotations: arguments are labeled with their types; the three arguments of prot_loc are Phrase, Phrase, Sentence, and the remaining labels are Phrase and Word]
![Page 22: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/22.jpg)
Growing Rules From Seed
prot_loc(Protein,Location,Sentence) :-
phrase_contains_some_alphanumeric(Protein,E),
phrase_contains_some_internal_cap_word(Protein,E),
phrase_next(Protein,_),
different_phrases(Protein,Location),
one_POS_in_phrase(Location,noun),
phrase_contains_some_arg2_10x_word(Location,_),
phrase_previous(Location,_),
avg_length_sentence(Sentence).
![Page 23: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/23.jpg)
Rule Evaluation
Prediction vs Actual: Positive or Negative, True or False

| | predicted positive | predicted negative |
|---|---|---|
| actual positive | TP | FN |
| actual negative | FP | TN |

Focus on positive examples
$\text{Recall} = \frac{TP}{TP + FN}$
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{F1 Score} = \frac{2PR}{P + R}$
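The three measures are simple ratios over the confusion-matrix counts. A minimal Python sketch follows; the counts in the usage lines are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something stated on the slide.

```python
def recall(tp, fn):
    """Fraction of actual positives that the rule covers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Fraction of the rule's positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 3 true positives, 1 false positive, 2 false negatives.
r = recall(tp=3, fn=2)
p = precision(tp=3, fp=1)
print(f"recall={r:.2f} precision={p:.2f} f1={f1_score(p, r):.2f}")
```

True negatives do not appear in any of the three formulas, which is why these measures suit data where negatives vastly outnumber positives.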
![Page 24: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/24.jpg)
Protein Localization Rule 1
prot_loc(Protein,Location,Sentence) :-
phrase_contains_some_alphanumeric(Protein,E),
phrase_contains_some_internal_cap_word(Protein,E),
phrase_next(Protein,_),
different_phrases(Protein,Location),
one_POS_in_phrase(Location,noun),
phrase_contains_some_arg2_10x_word(Location,_),
phrase_previous(Location,_),
avg_length_sentence(Sentence).
0.15 Recall
0.51 Precision
0.23 F1 Score
![Page 25: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/25.jpg)
Protein Localization Rule 2
prot_loc(Protein,Location,Sentence) :-
phrase_contains_some_marked_up_arg2(Location,C),
phrase_contains_some_internal_cap_word(Protein,_),
word_previous(C,_).
0.86 Recall
0.12 Precision
0.21 F1 Score
![Page 26: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/26.jpg)
Precision-Focused Search
![Page 27: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/27.jpg)
Recall-Focused Search
![Page 28: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/28.jpg)
F1-Focused Search
![Page 29: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/29.jpg)
Aleph - Learning
Aleph learns theories of rules (Srinivasan, v4, 2003)
– Pick positive seed example
– Use heuristic search to find best rule
– Pick new seed from uncovered positives and repeat until threshold of positives covered
Learning theories is time-consuming
Can we reduce time with ensembles?
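The seed-and-cover loop described above can be sketched in a few lines. This is a generic covering loop written for illustration in Python, not Aleph's code or API: `learn_theory`, `find_best_rule`, `min_covered` and the toy rule generator are all assumptions, and the "heuristic search" is replaced by a trivial stand-in.

```python
def learn_theory(positives, find_best_rule, min_covered=0.9):
    """Greedy covering loop. find_best_rule(seed) returns a predicate over examples."""
    theory, uncovered = [], list(positives)
    allowed_uncovered = len(positives) * (1 - min_covered)
    while uncovered and len(uncovered) > allowed_uncovered:
        seed = uncovered[0]              # pick a positive seed example
        rule = find_best_rule(seed)      # stand-in for the heuristic search
        theory.append(rule)
        remaining = [ex for ex in uncovered if not rule(ex)]
        if len(remaining) == len(uncovered):
            remaining = uncovered[1:]    # guard: drop a seed its own rule misses
        uncovered = remaining
    return theory


def toy_rule_for(seed):
    """Toy 'search': the rule matches words sharing the seed's first doubled pair."""
    pair = next(a + b for a, b in zip(seed, seed[1:]) if a == b)
    return lambda word: pair in word


words = ["apple", "feet", "luggage", "books", "mushrooms", "beekeeper"]
theory = learn_theory(words, toy_rule_for, min_covered=1.0)
print(len(theory), "rules learned")
```

Each pass through the loop runs a full rule search, which is where the time goes when the search space is large.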
![Page 30: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/30.jpg)
Gleaner
Definition of Gleaner
– One who gathers grain left behind by reapers
Key Ideas of Gleaner
– Use Aleph as underlying ILP rule engine
– Search rule space with Rapid Random Restart
– Keep wide range of rules usually discarded
– Create separate theories for diverse recall
![Page 31: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/31.jpg)
Gleaner - Learning
[Figure: precision (vertical axis) against recall (horizontal axis), with the recall axis divided into bins]
Create B Bins
Generate Clauses
Record Best per Bin
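The three steps can be illustrated with a small bookkeeping sketch in Python. Two things are assumptions here rather than facts from the slide: the bins are taken to be equal-width intervals of recall, and "best" is taken to mean the highest precision × recall within a bin. The clause identifiers and their scores are made up.

```python
def bin_index(recall, num_bins):
    """Map a recall value in [0, 1] to one of num_bins equal-width bins (assumed layout)."""
    return min(int(recall * num_bins), num_bins - 1)


def record_best_per_bin(clauses, num_bins=20):
    """clauses: iterable of (clause_id, precision, recall) measured on the training set."""
    best = [None] * num_bins
    for clause_id, prec, rec in clauses:
        b = bin_index(rec, num_bins)
        score = prec * rec  # assumed heuristic for "best"
        if best[b] is None or score > best[b][1]:
            best[b] = (clause_id, score)
    return best


# Hypothetical clauses generated during search for one seed.
clauses = [("c1", 0.51, 0.17), ("c2", 0.12, 0.86), ("c3", 0.60, 0.18)]
best = record_best_per_bin(clauses, num_bins=20)
print([(i, entry[0]) for i, entry in enumerate(best) if entry is not None])
```

Keeping one clause per bin is what lets low-precision, high-recall clauses survive instead of being discarded in favour of a single top-scoring rule.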
![Page 32: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/32.jpg)
Gleaner - Learning
[Figure: one row of recall bins per seed (Seed 1, Seed 2, Seed 3, …, Seed K) along the Recall axis]
![Page 33: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/33.jpg)
Gleaner - Ensemble
[Figure: the rules from bin 5 each label an example Pos or Neg, giving every example a score]
pos1: prot_loc(…) 12
pos2: prot_loc(…) 47
pos3: prot_loc(…) 55
neg1: prot_loc(…) 5
neg2: prot_loc(…) 14
neg3: prot_loc(…) 2
neg4: prot_loc(…) 18
![Page 34: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/34.jpg)
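Gleaner - Ensemble
[Figure: recall-precision curve; axes Recall and Precision, each up to 1.0. Examples are ranked by score and precision and recall are computed down the ranking.]

| Examples | Score | Precision | Recall |
|---|---|---|---|
| pos3: prot_loc(…) | 55 | 1.00 | 0.05 |
| neg28: prot_loc(…) | 52 | 0.50 | 0.05 |
| pos2: prot_loc(…) | 47 | 0.66 | 0.10 |
| … | | | |
| neg4: prot_loc(…) | 18 | | |
| neg475: prot_loc(…) | 17 | 0.12 | 0.85 |
| pos9: prot_loc(…) | 17 | 0.13 | 0.90 |
| neg15: prot_loc(…) | 16 | 0.12 | 0.90 |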
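Ranking examples by score and reading precision and recall down the ranking is how each curve is traced. The Python sketch below illustrates that step using the seven scores shown on the previous slide (pos1 to pos3, neg1 to neg4); the function name `pr_points` is invented, and treating a score as a simple sort key is an assumption.

```python
def pr_points(scored_examples):
    """scored_examples: list of (is_positive, score).
    Returns (precision, recall) after each example, from the highest score down."""
    total_pos = sum(1 for is_pos, _ in scored_examples if is_pos)
    if total_pos == 0:
        return []
    ranked = sorted(scored_examples, key=lambda e: e[1], reverse=True)
    points, tp = [], 0
    for seen, (is_pos, _) in enumerate(ranked, start=1):
        tp += 1 if is_pos else 0
        points.append((tp / seen, tp / total_pos))
    return points


examples = [
    (True, 12), (True, 47), (True, 55),            # pos1, pos2, pos3
    (False, 5), (False, 14), (False, 2), (False, 18),  # neg1 .. neg4
]
for prec, rec in pr_points(examples):
    print(f"precision={prec:.2f} recall={rec:.2f}")
```

Cutting the ranking at a higher score gives higher precision and lower recall; cutting lower trades precision for recall.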
![Page 35: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/35.jpg)
Gleaner - Overlap
For each bin, take the topmost curve
[Figure: overlapping recall-precision curves, one per bin; axes Recall and Precision]
![Page 36: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/36.jpg)
How to use Gleaner
Generate Test Curve
User Selects Recall Bin
Return Classifications Ordered By Their Score
[Figure: recall-precision curve with the selected point marked: Recall = 0.50, Precision = 0.70]
![Page 37: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/37.jpg)
Aleph Ensembles
We compare to ensembles of theories
Algorithm (Dutra et al ILP 2002)
– Use K different initial seeds
– Learn K theories containing C rules
– Rank examples by the number of theories
Need to balance C for high performance
– Small C leads to low recall
– Large C leads to converging theories
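Ranking by the number of theories amounts to a vote count. In this Python sketch a theory covers an example if any of its rules does, which is the usual reading of a theory as a disjunction of rules but is stated here as an assumption; the three toy theories and the word examples are invented.

```python
def theory_covers(theory, example):
    """A theory (list of rules) covers an example if any one rule matches it."""
    return any(rule(example) for rule in theory)


def rank_by_votes(theories, examples):
    """Score each example by how many theories cover it, then sort by that score."""
    votes = {ex: sum(theory_covers(t, ex) for t in theories) for ex in examples}
    ranked = sorted(examples, key=lambda ex: votes[ex], reverse=True)
    return ranked, votes


# Hypothetical K = 3 theories with C = 1 or 2 rules each.
theories = [
    [lambda w: "pp" in w],
    [lambda w: "pp" in w, lambda w: "ee" in w],
    [lambda w: "oo" in w],
]
ranked, votes = rank_by_votes(theories, ["pear", "feet", "apple"])
print(ranked, votes)
```

With few rules per theory, many positives receive no votes at all; with many rules, the theories tend to agree and the vote counts stop discriminating between examples.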
![Page 38: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/38.jpg)
Evaluation Metrics
Area Under Recall-Precision Curve (AURPC)
– All curves standardized to cover full recall range
– Averaged AURPC over 5 folds
Number of clauses considered
– Rough estimate of time
[Figure: recall-precision curve with the area underneath shaded; axes Recall and Precision, each up to 1.0]
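As a rough illustration of the metric, the area can be approximated with the trapezoid rule over (recall, precision) points and then averaged across folds. This Python sketch makes two simplifying assumptions: each curve is already supplied with points at recall 0 and recall 1 (how the talk standardizes curves is not specified here), and points are joined by straight lines, which can overestimate the area in recall-precision space. Apart from the point (0.50, 0.70), the values are hypothetical.

```python
def aurpc(points):
    """points: (recall, precision) pairs. Trapezoidal area over the recall axis."""
    pts = sorted(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


def mean_aurpc(curves):
    """Average the area over several curves, e.g. one per cross-validation fold."""
    return sum(aurpc(c) for c in curves) / len(curves)


curve_a = [(0.0, 1.0), (0.5, 0.7), (1.0, 0.1)]   # hypothetical fold
curve_b = [(0.0, 1.0), (1.0, 0.0)]               # hypothetical fold
print(aurpc(curve_a), mean_aurpc([curve_a, curve_b]))
```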
![Page 39: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/39.jpg)
YPD Protein Localization
Hand-labeled dataset (Ray & Craven ’01)
– 7,245 sentences from 871 abstracts
– Examples are phrase-phrase combinations: 1,810 positive & 279,154 negative
1.6 GB of background knowledge
– Structural, Statistical, Lexical and Ontological
– In total, 200+ distinct background predicates
![Page 40: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/40.jpg)
Experimental Methodology
Performed five-fold cross-validation
Variation of parameters
– Gleaner (20 recall bins)
  – # seeds = {25, 50, 75, 100}
  – # clauses = {1K, 10K, 25K, 50K, 100K, 250K, 500K}
– Ensembles (0.75 minacc, 1K and 35K nodes)
  – # theories = {10, 25, 50, 75, 100}
  – # clauses per theory = {1, 5, 10, 15, 20, 25, 50}
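The experimental setup combines a five-fold split with a sweep over the listed parameter values. The Python sketch below enumerates both; the round-robin fold assignment and the placeholder item identifiers are assumptions, since the slide does not say how the data were divided into folds.

```python
from itertools import product


def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; items are dealt round-robin into k folds (assumed scheme)."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Parameter values as listed on the slide.
gleaner_grid = list(product(
    [25, 50, 75, 100],                                          # seeds
    [1_000, 10_000, 25_000, 50_000, 100_000, 250_000, 500_000]  # clauses
))
ensemble_grid = list(product(
    [10, 25, 50, 75, 100],        # theories
    [1, 5, 10, 15, 20, 25, 50],   # clauses per theory
))

items = list(range(10))  # placeholder identifiers
splits = list(k_fold_splits(items))
print(len(splits), "folds;", len(gleaner_grid), "Gleaner settings;",
      len(ensemble_grid), "ensemble settings")
```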
![Page 41: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/41.jpg)
PR Curves - 100,000 Clauses
![Page 42: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/42.jpg)
PR Curves - 1,000,000 Clauses
![Page 43: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/43.jpg)
Protein Localization Results
![Page 44: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/44.jpg)
Genetic Disorder Results
![Page 45: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/45.jpg)
Current Directions
Learn diverse rules across seeds
Calculate probabilistic scores for examples
Directed Rapid Random Restarts
Cache rule information to speed scoring
Transfer learning across seeds
Explore Active Learning within ILP
![Page 46: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/46.jpg)
Take-Home Message
Biology, Gleaner and ILP
– Challenging problems in biology can be naturally formulated for Inductive Logic Programming
– Many rules constructed and evaluated in ILP hypothesis search
– Gleaner makes use of those rules that are not the highest scoring ones for improved speed and performance
![Page 47: Gleaning Relational Information from Biomedical Text](https://reader033.fdocuments.in/reader033/viewer/2022051820/568144d4550346895db1a086/html5/thumbnails/47.jpg)
Acknowledgements
USA DARPA Grant F30602-01-2-0571
USA Air Force Grant F30602-01-2-0571
USA NLM Grant 5T15LM007359-02
USA NLM Grant 1R01LM07050-01
UW Condor Group
David Page, Vitor Santos Costa, Ines Dutra, Soumya Ray, Marios Skounakis, Mark Craven, Burr Settles, Jesse Davis, Sarah Cunningham, David Haight, Ameet Soni