Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF...
![Page 1: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/1.jpg)
Semantic Relation Detection
in Bioscience Text
Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Supported by NSF DBI-0317510 and a gift from Genentech
![Page 2: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/2.jpg)
BioText Project Goals
Provide flexible, intelligent access to information for use in biosciences applications.
Focus on:
  Textual information from journal articles
  Tightly integrated with other resources
    Ontologies
    Record-based databases
![Page 3: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/3.jpg)
Project Team
Project Leaders:
  PI: Marti Hearst
  Co-PI: Adam Arkin
Computational Linguistics
  Barbara Rosario
  Preslav Nakov
Database Research
  Ariel Schwartz
  Gaurav Bhalotia (graduated)
User Interface / IR
  Adam Newberger
  Dr. Emilia Stoica
Bioscience
  Dr. TingTing Zhang
  Janice Hamerja
Supported primarily by NSF DBI-0317510 and a gift from Genentech
![Page 4: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/4.jpg)
BioText Architecture
Sophisticated Text Analysis
Annotations in Database
Improved Search Interface
![Page 5: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/5.jpg)
The Nature of Bioscience Text
Claim: Bioscience semantics are simultaneously easier and harder than general text.
Easier:
  Fewer subtleties
  Fewer ambiguities
  “Systematic” meanings
Harder:
  Enormous terminology
  Complex sentence structure
![Page 6: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/6.jpg)
Sample Sentence
“Recent research, in proliferating cells, has demonstrated that interaction of E2F1 with the p53 pathway could involve transcriptional up-regulation of E2F1 target genes such as p14/p19ARF, which affect p53 accumulation [67,68], E2F1-induced phosphorylation of p53 [69], or direct E2F1-p53 complex formation [70].”
![Page 7: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/7.jpg)
BioScience Researchers
Read A LOT!
Cite A LOT!
Curate A LOT!
Are interested in specific relations, e.g.:
  What is the role of this protein in that pathway?
  Show me articles in which a comparison between two values is significant.
![Page 8: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/8.jpg)
This Talk
Discovering semantic relations
  Between nouns in noun compounds
  Between entities in sentences
Acquiring labeled data
  Idea: use text surrounding citations to documents to identify paraphrases
  A new direction; preliminary work only
![Page 9: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/9.jpg)
Noun Compound Relation Recognition
![Page 10: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/10.jpg)
Noun Compounds (NCs)
Technical text is rich with NCs
  Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.
NC is any sequence of nouns that itself functions as a noun
  asthma hospitalizations
  health care personnel hand wash
![Page 11: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/11.jpg)
NCs: 3 computational tasks
Identification
Syntactic analysis (attachments)
  [Baseline [headache frequency]]
  [[Tension headache] patient]
Our Goal: Semantic analysis
  Headache treatment: treatment for headache
  Corticosteroid treatment: treatment that uses corticosteroid
![Page 12: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/12.jpg)
Descent of Hierarchy
Idea: Use the top levels of a lexical hierarchy to identify semantic relations
Hypothesis: A particular semantic relation holds between all 2-word NCs that can be categorized by a lexical category pair.
![Page 13: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/13.jpg)
Related work (Semantic analysis of NCs)
Rule-based
  Finin (1980): detailed AI analysis, hand-coded
  Vanderwende (1994): automatically extracts semantic information from an on-line dictionary, manipulates a set of handwritten rules. 13 classes, 52% accuracy
Probabilistic
  Lauer (1995): probabilistic model, 8 classes, 47% accuracy
  Lapata (2000): classifies nominalizations into subject/object. 2 classes, 80% accuracy
![Page 14: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/14.jpg)
Related work (Semantic analysis of NCs)
Lexical Hierarchy
  Barrett et al. (2001): WordNet, heuristics to classify a NC given the similarity to a known NC
  Rosario and Hearst (2001): relations pre-defined; MeSH, neural network. 18 classes, 60% accuracy
![Page 15: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/15.jpg)
Linguistic Motivation
Can cast NC into head-modifier relation, and assume head noun has an argument and qualia structure.
  (used-in): kitchen knife
  (made-of): steel knife
  (instrument-for): carving knife
  (used-on): putty knife
  (used-by): butcher’s knife
![Page 16: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/16.jpg)
The Lexical Hierarchy: MeSH
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
![Page 17: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/17.jpg)
The Lexical Hierarchy: MeSH
1. Anatomy [A]
  Body Regions [A01]
  Musculoskeletal System [A02]
  Digestive System [A03]
  Respiratory System [A04]
  Urogenital System [A05]
  ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
![Page 22: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/22.jpg)
Descending the Hierarchy
1. Anatomy [A]
  Body Regions [A01]
    Abdomen [A01.047]
    Back [A01.176]
    Breast [A01.236]
    Extremities [A01.378]
    Head [A01.456]
    Neck [A01.598]
    ….
  Musculoskeletal System [A02]
  Digestive System [A03]
  Respiratory System [A04]
  Urogenital System [A05]
  ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
  Electronics
    Amplifiers
    Electronics, Medical
    Transducers
  Astronomy
  Nature
  Time
  Weights and Measures
    Calibration
    Metric System
    Reference Standard
  ….
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
Homogeneous
Heterogeneous
![Page 23: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/23.jpg)
Mapping Nouns to MeSH Concepts
headache recurrence: C23.888.592.612.441, C23.550.291.937
headache pain: C23.888.592.612.441, G11.561.796.444
![Page 24: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/24.jpg)
Levels of Description
headache pain
Level 0: C23 G11
Level 1: C23.888 G11.561
Level 2: C23.888.592 G11.561.796
…
Original: C23.888.592.612.441 G11.561.796.444
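The levels above can be read as truncating a MeSH tree number after a fixed number of dot-separated fields. A minimal sketch, assuming level 0 keeps only the first field; the helper name `mesh_at_level` is illustrative and not from the talk:

```python
def mesh_at_level(tree_number: str, level: int) -> str:
    """Truncate a MeSH tree number to the given level (0 = first field only)."""
    fields = tree_number.split(".")
    # Tree numbers shorter than the requested level are returned unchanged.
    return ".".join(fields[: level + 1])


headache = "C23.888.592.612.441"
pain = "G11.561.796.444"
for level in range(3):
    print(level, mesh_at_level(headache, level), mesh_at_level(pain, level))
```

Asking for a level deeper than the tree number simply returns the original code, which matches the "Original" row of the slide.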
![Page 25: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/25.jpg)
Descent of Hierarchy
Idea: Words falling in homogeneous MeSH subhierarchies behave “similarly” with respect to relation assignment
Hypothesis: A particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH category pair
![Page 26: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/26.jpg)
Grouping the NCs
CP: A02 C04 (Musculoskeletal System, Neoplasms)
  skull tumors, bone cysts, bone metastases, skull osteosarcoma…
CP: C04 M01 (Neoplasms, Person)
  leukemia survivor, lymphoma patients, cancer physician, cancer nurses…
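Grouping by category pair (CP) can be sketched as a dictionary keyed on the level-0 codes of the two nouns. The example below is only an illustration: it reuses the three tree numbers shown on the "Mapping Nouns to MeSH Concepts" slide, and the names `NOUN_TO_MESH`, `category_pair` and `group_by_cp` are hypothetical.

```python
from collections import defaultdict

# Tree numbers taken from the "Mapping Nouns to MeSH Concepts" slide.
NOUN_TO_MESH = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
}


def category_pair(compound: str) -> tuple:
    """Return the level-0 category pair (CP) for a two-word noun compound."""
    first, second = compound.split()
    return tuple(NOUN_TO_MESH[noun].split(".")[0] for noun in (first, second))


def group_by_cp(compounds):
    """Collect noun compounds under their level-0 category pair."""
    groups = defaultdict(list)
    for nc in compounds:
        groups[category_pair(nc)].append(nc)
    return dict(groups)


print(group_by_cp(["headache recurrence", "headache pain"]))
```

A real system would need a full noun-to-MeSH lookup and a policy for nouns with several MeSH senses; neither is shown here.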
![Page 27: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/27.jpg)
Distribution of Category Pairs
![Page 28: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/28.jpg)
Collection
~70,000 NCs extracted from titles and abstracts of Medline
2,627 CPs at level 0 (with at least 10 unique NCs)
We analyzed
  250 CPs with Anatomy (A)
  21 CPs with Natural Science (H01)
  3 CPs with Neoplasm (C04)
This represents 10% of total CPs and 20% of total NCs
![Page 29: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/29.jpg)
Classification Method
For each CP
  Divide its NCs into “training” and “testing” sets
  “Training”: inspect NCs by hand
    Start from level 0
    While NCs are not all similar, descend one level of the hierarchy
    Repeat until all NCs for that CP are similar
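The descent loop can be sketched in code under two loud assumptions: the by-hand judgment "all NCs are similar" is replaced by a check that all NCs in a group carry the same hand-assigned relation label, and only the second noun's hierarchy is descended. The function names and the `max_level` cutoff are hypothetical; the codes and relation names come from the classification-decision slides.

```python
from collections import defaultdict


def truncate(tree_number, level):
    """Keep the first level + 1 dot-separated fields of a MeSH tree number."""
    return ".".join(tree_number.split(".")[: level + 1])


def descend(ncs, level=0, max_level=4):
    """ncs: list of (first_code, second_code, relation) triples.

    Returns {(first_at_level_0, second_at_level_k): [relations]}, descending
    the second noun's hierarchy until each group carries a single relation.
    """
    groups = defaultdict(list)
    for first, second, relation in ncs:
        key = (truncate(first, 0), truncate(second, level))
        groups[key].append((first, second, relation))

    decisions = {}
    for key, members in groups.items():
        relations = sorted({relation for _, _, relation in members})
        if len(relations) == 1 or level >= max_level:
            decisions[key] = relations  # group is homogeneous, or we stop descending
        else:
            decisions.update(descend(members, level + 1, max_level))
    return decisions


training = [
    ("C04", "M01.643", "Person afflicted by Disease"),
    ("C04", "M01.526", "Person who treats Disease"),
    ("A02", "C04", "Location of Disease"),
]
print(descend(training))
```

On this toy input the pair A02 C04 stays at level 0, while C04 M01 splits into C04 M01.643 and C04 M01.526, mirroring the decisions listed on the slides.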
![Page 30: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/30.jpg)
Classification Decisions
A02 C04
B06 B06
C04 M01
  C04 M01.643
  C04 M01.526
A01 H01
  A01 H01.770
  A01 H01.671
    A01 H01.671.538
    A01 H01.671.868
A01 M01
  A01 M01.643
  A01 M01.526
  A01 M01.898
![Page 32: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/32.jpg)
Classification Decisions + Relations
A02 C04: Location of Disease
B06 B06: Kind of Plants
C04 M01
  C04 M01.643: Person afflicted by Disease
  C04 M01.526: Person who treats Disease
A01 H01
  A01 H01.770
  A01 H01.671
    A01 H01.671.538
    A01 H01.671.868
A01 M01
  A01 M01.643: Person afflicted by Disease
  A01 M01.526
  A01 M01.898
![Page 33: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/33.jpg)
Classification Decision Levels
Anatomy: 250 CPs
  187 (75%) remain first level
  56 (22%) descend one level
  7 (3%) descend two levels
Natural Science (H01): 21 CPs
  1 (4%) remain first level
  8 (39%) descend one level
  12 (57%) descend two levels
Neoplasms (C04): 3 CPs
  3 (100%) descend one level
![Page 34: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/34.jpg)
Evaluation
Test the decisions on “testing” set
Count how many NCs that fall in the groups defined in the classification decisions are similar to each other
Accuracy (for 2nd noun):
  Anatomy: 91%
  Natural Science: 79%
  Neoplasm: 100%
Total Accuracy: 90.8%
Generalization: our 415 classification decisions cover ~46,000 possible CP pairs
![Page 35: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/35.jpg)
Ambiguity – Two Types
Lexical ambiguity: mortality
  state of being mortal
  death rate
Relationship ambiguity: bacteria mortality
  death of bacteria
  death caused by bacteria
![Page 37: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/37.jpg)
Four Cases
Single MeSH senses
  Only one possible relationship: abdomen radiography, aciclovir treatment
  Multiple relationships: hospital databases, education efforts, kidney metabolism
Multiple MeSH senses
  Only one possible relationship: alcoholism treatment
  Ambiguity of relationship
  Multiple relationships: bacteria mortality
Most problematic cases … but rare!
![Page 38: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/38.jpg)
Conclusions on NN Relation Classification
Very simple method for assigning semantic relations to two-word technical NCs
  90.8% accuracy
Lexical resource (MeSH) useful for this task
Probably works because of the relative lack of ambiguity in this kind of technical text.
![Page 39: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/39.jpg)
Entity-Entity Relation Recognition
![Page 40: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/40.jpg)
Problem: Which relations hold between 2 entities?
Treatment Disease
Cure?
Prevent?
Side Effect?
![Page 41: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/41.jpg)
Hepatitis Examples
Cure
  These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135.
Prevent
  A two-dose combined hepatitis A and B vaccine would facilitate immunization programs
Vague
  Effect of interferon on hepatitis B
![Page 42: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/42.jpg)
Two tasks
Relationship Extraction:
  Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text
Entity extraction:
  Related problem: identify such entities
![Page 43: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/43.jpg)
The Approach
Data: MEDLINE abstracts and titles
Graphical models
  Combine in one framework both relation and entity extraction
  Both static and dynamic models
Simple discriminative approach:
  Neural network
Lexical, syntactic and semantic features
![Page 44: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/44.jpg)
Related Work
We allow several DIFFERENT relations between the same entities
Thus differs from the problem statement of other work on relations
Many find one relation which holds between two entities (many based on ACE)
  Agichtein and Gravano (2000), lexical patterns for location of
  Zelenko et al. (2002) SVM for person affiliation and organization-location
  Hasegawa et al. (ACL 2004) Person-Organization -> President “relation”
  Craven (1999, 2001) HMM for subcellular-location and disorder-association
    Doesn’t identify the actual relation
![Page 45: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/45.jpg)
Related work: Bioscience
Many hand-built rules
  Feldman et al. (2002)
  Friedman et al. (2001)
  Pustejovsky et al. (2002)
  Saric et al.; this conference
![Page 46: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/46.jpg)
Data and Relations
MEDLINE, abstracts and titles
3662 sentences labeled
  Relevant: 1724
  Irrelevant: 1771
    e.g., “Patients were followed up for 6 months”
2 types of Entities, many instances
  treatment and disease
7 Relationships between these entities
![Page 47: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/47.jpg)
Semantic Relationships
810: Cure
  Intravenous immune globulin for recurrent spontaneous abortion
616: Only Disease
  Social ties and susceptibility to the common cold
166: Only Treatment
  Fluticasone propionate is safe in recommended doses
63: Prevent
  Statins for prevention of stroke
![Page 48: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/48.jpg)
Semantic Relationships
36: Vague
  Phenylbutazone and leukemia
29: Side Effect
  Malignant mesodermal mixed tumor of the uterus following irradiation
4: Does NOT cure
  Evidence for double resistance to permethrin and malathion in head lice
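As a quick check on the label distribution, the seven counts from the two "Semantic Relationships" slides can be tallied. They sum to 1724, the number of relevant sentences given on the "Data and Relations" slide, and the classes are heavily skewed toward Cure. The variable names are illustrative.

```python
# Sentence counts per relationship, as listed on the "Semantic Relationships" slides.
RELATION_COUNTS = {
    "Cure": 810,
    "Only Disease": 616,
    "Only Treatment": 166,
    "Prevent": 63,
    "Vague": 36,
    "Side Effect": 29,
    "Does NOT cure": 4,
}

total = sum(RELATION_COUNTS.values())
print("sentences labeled with a relationship:", total)
for name, count in RELATION_COUNTS.items():
    # Share of each relationship among the relevant sentences.
    print(f"{name:15s} {count:4d} {100 * count / total:5.1f}%")
```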
![Page 49: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/49.jpg)
Features
Word
Part of speech
Phrase constituent
Orthographic features
  ‘is number’, ‘all letters are capitalized’, ‘first letter is capitalized’ …
MeSH (semantic features)
  Replace words, or sequences of words, with generalizations via MeSH categories
  Peritoneum -> Abdomen
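A rough sketch of two of these feature types follows. The exact definitions used in the project are not given on the slide, so the tests below are assumptions, and the one-entry lookup table stands in for a real MeSH lookup. All names are hypothetical.

```python
def orthographic_features(token: str) -> dict:
    """Three orthographic indicators named on the slide (definitions assumed)."""
    return {
        "is_number": token.replace(".", "", 1).isdigit(),
        "all_caps": token.isupper(),       # every cased character is uppercase
        "first_cap": token[:1].isupper(),
    }


# Hypothetical one-entry lookup; a real system would consult the MeSH hierarchy.
MESH_GENERALIZATION = {"peritoneum": "Abdomen"}


def generalize(tokens):
    """Replace a token with its MeSH generalization when one is known."""
    return [MESH_GENERALIZATION.get(tok.lower(), tok) for tok in tokens]


print(orthographic_features("TJ-135"))
print(generalize(["Peritoneum", "inflammation"]))
```

The sketch handles single words only; replacing sequences of words, as the slide mentions, would need a phrase matcher.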
![Page 50: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/50.jpg)
Models
2 static generative models
3 dynamic generative models
1 discriminative model (neural network)
![Page 51: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/51.jpg)
Static Graphical Models
S1: observations dependent on Role but independent from Relation given roles
S2: observations dependent on both Relation and Role
Figure: models S1 and S2
![Page 52: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/52.jpg)
Dynamic Graphical Models
D1, D2 as in S1, S2
D3: only one observation per state is dependent on both the relation and the role
Figure: models D1, D2 and D3
![Page 53: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/53.jpg)
Graphical Models
Relation node: semantic relation (cure, prevent, none..) expressed in the sentence
![Page 54: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/54.jpg)
Graphical Models
Role nodes: 3 choices: treatment, disease, or none
![Page 55: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/55.jpg)
Graphical Models
Feature nodes (observed): word, POS, MeSH…
![Page 56: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/56.jpg)
Graphical Models
For Dynamic Model D1:
Joint probability distribution over relation, roles and features nodes

$$P(Rela, Role_0, \ldots, Role_T, f_{10}, \ldots, f_{nT}) = P(Rela)\, P(Role_0 \mid Rela) \prod_{t=1}^{T} \Big[ P(Role_t \mid Role_{t-1}, Rela) \prod_{j=1}^{n} P(f_{jt} \mid Role_t) \Big]$$

Parameters estimated with maximum likelihood and absolute discounting smoothing
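The slide names absolute discounting but does not give the variant used. The sketch below shows one common form for a single conditioning context. It assumes a fixed discount of 0.5 and a uniform backoff distribution, and the function name is hypothetical.

```python
from collections import Counter


def absolute_discounting(counts: Counter, vocabulary, discount=0.5):
    """Smoothed P(value | context) from the raw counts of one context.

    Subtracts `discount` from every seen count and spreads the freed
    probability mass uniformly over the vocabulary (assumed backoff).
    """
    total = sum(counts.values())
    seen = sum(1 for v in vocabulary if counts[v] > 0)
    reserved = discount * seen / total  # mass removed from seen values
    return {
        v: max(counts[v] - discount, 0.0) / total + reserved / len(vocabulary)
        for v in vocabulary
    }


vocab = ["treatment", "disease", "none"]
probs = absolute_discounting(Counter({"treatment": 3, "none": 1}), vocab)
print(probs)
```

The unseen value "disease" receives a small non-zero probability and the distribution still sums to one, which is the property that lets the model score feature values missing from the training data.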
![Page 57: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/57.jpg)
Neural Network
Feed-forward network (MATLAB)
Training with conjugate gradient descent
One hidden layer (hyperbolic tangent function)
Logistic sigmoid function for the output layer representing the relationships
Same features
Discriminative approach
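The forward pass of such a network can be sketched in a few lines. This is plain Python rather than the MATLAB code used in the project. The layer sizes and weights are toy values chosen only for illustration, and training with conjugate gradient is not shown.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, logistic sigmoid output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w_out, b_out)
    ]


# Toy sizes and weights, chosen only for illustration.
x = [1.0, 0.0, 1.0]
w_hidden = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
b_hidden = [0.0, 0.1]
w_out = [[0.7, -0.2], [-0.4, 0.6]]
b_out = [0.0, 0.0]
outputs = forward(x, w_hidden, b_hidden, w_out, b_out)
print(outputs)
```

Each output unit yields a value between 0 and 1, one per relationship, so the predicted relationship would be the unit with the largest activation.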
![Page 58: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/58.jpg)
Role extraction
Results in terms of F-measure
Graphical models
  Junction tree algorithm (BNT)
  Relation hidden and marginalized over
Neural Net
  Couldn’t run it (feature vectors too large)
(Graphical models can do role extraction and relationship classification simultaneously)
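For reference, the F-measure combines precision and recall over the extracted roles. The slide does not say which weighting was used, so the balanced form (beta = 1) below is an assumption, as is the function name.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """F-measure from counts; beta = 1 gives the balanced F1 score."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy counts: 8 roles found correctly, 2 spurious, 2 missed.
print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```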
![Page 59: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/59.jpg)
Role Extraction: Results
F-measures
D1 best when no smoothing
![Page 60: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/60.jpg)
Role Extraction: Results
F-measures
D2 best with smoothing, but doesn’t boost scores as much as in relation classification
![Page 61: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/61.jpg)
Role Extraction: Results
Static models better than Dynamic for
Note: No Neural Networks
![Page 62: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/62.jpg)
Relation classification: Results
With Smoothing and Roles, D1 best GM
![Page 63: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/63.jpg)
Features impact: Role Extraction
Most important features: 1) Word, 2) MeSH

| Models | D1 | D2 |
| --- | --- | --- |
| All features | 0.67 | 0.71 |
| No word | 0.58 (-13.4%) | 0.61 (-14.1%) |
| No MeSH | 0.63 (-5.9%) | 0.65 (-8.4%) |

(rel. + irrel.)
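The percentages under each ablated score read as relative changes against the all-features score. A small sketch of that arithmetic, assuming this interpretation; because the slide's F-measures are rounded to two decimals, recomputed values can differ from the slide in the last digit.

```python
def relative_change(baseline: float, ablated: float) -> float:
    """Percentage change of `ablated` relative to `baseline`."""
    return 100.0 * (ablated - baseline) / baseline


# F-measures from the slide: (all features, no word, no MeSH) per model.
scores = {"D1": (0.67, 0.58, 0.63), "D2": (0.71, 0.61, 0.65)}
for model, (full, no_word, no_mesh) in scores.items():
    print(
        model,
        f"no word: {relative_change(full, no_word):+.1f}%",
        f"no MeSH: {relative_change(full, no_mesh):+.1f}%",
    )
```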
![Page 64: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/64.jpg)
Features impact: Relation classification
Most important features: Roles

| Accuracy | D1 | D2 | NN |
| --- | --- | --- | --- |
| All feat. + roles | 91.6 | 82.0 | 96.9 |
| All feat. – roles | 68.9 (-24.7%) | 74.9 (-8.7%) | 79.6 (-17.8%) |
| All feat. + roles – Word | 91.6 (0%) | 79.8 (-2.8%) | 96.4 (-0.5%) |
| All feat. + roles – MeSH | 91.6 (0%) | 84.6 (3.1%) | 97.3 (0.4%) |

(rel. + irrel.)
![Page 65: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/65.jpg)
Relation extraction
Results in terms of classification accuracy (with and without irrelevant sentences)
2 cases:
  Roles hidden
  Roles given
Graphical models:

$$\widehat{Rela} = \arg\max_{Rela_k} P(Rela_k, Role_0, \ldots, Role_T, f_{10}, \ldots, f_{nT})$$

NN: simple classification problem
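For the "roles given" case, the argmax can be sketched by scoring each candidate relation with a D1-style joint probability and keeping the best one. Everything numeric below is invented for illustration. The table layout and function names are hypothetical, and the sketch includes feature terms at every position, including the first, which is an assumption. The "roles hidden" case would need marginalization over roles (for example with the junction tree algorithm) and is not shown.

```python
import math


def d1_log_joint(rela, roles, feats, p_rela, p_role0, p_trans, p_emit):
    """Log joint probability of a relation, a role sequence and observed features."""
    logp = math.log(p_rela[rela]) + math.log(p_role0[(roles[0], rela)])
    for t in range(1, len(roles)):
        logp += math.log(p_trans[(roles[t], roles[t - 1], rela)])
    # Assumption: feature terms are included at every position, including t = 0.
    for role, values in zip(roles, feats):
        for j, value in enumerate(values):
            logp += math.log(p_emit[(j, value, role)])
    return logp


def classify(relations, roles, feats, **tables):
    """Pick the relation that maximizes the joint probability (roles given)."""
    return max(relations, key=lambda r: d1_log_joint(r, roles, feats, **tables))


# Toy probability tables, invented for illustration only.
tables = dict(
    p_rela={"cure": 0.6, "prevent": 0.4},
    p_role0={("treatment", "cure"): 0.5, ("treatment", "prevent"): 0.5},
    p_trans={
        ("none", "treatment", "cure"): 0.5,
        ("none", "treatment", "prevent"): 0.5,
        ("disease", "none", "cure"): 0.2,
        ("disease", "none", "prevent"): 0.6,
    },
    p_emit={
        (0, "statins", "treatment"): 0.1,
        (0, "for", "none"): 0.3,
        (0, "stroke", "disease"): 0.1,
    },
)
roles = ["treatment", "none", "disease"]
feats = [["statins"], ["for"], ["stroke"]]
best = classify(["cure", "prevent"], roles, feats, **tables)
print(best)
```

Working in log space avoids underflow when many small probabilities are multiplied over a long sentence.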
![Page 66: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/66.jpg)
Relation classification: Results
Neural Net always best
![Page 67: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/67.jpg)
Relation classification: Results
With Smoothing and No Roles, D2 best GM
![Page 68: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/68.jpg)
Relation classification: Results
Dynamic models always outperform Static
![Page 69: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/69.jpg)
Relation classification: Results
With no smoothing, D1 best Graphical Model
![Page 70: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/70.jpg)
Relation classification: Confusion Matrix
Computed for the model D2, “rel + irrel.”, “only features”
![Page 71: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/71.jpg)
Features impact: Relation classification
Most realistic case: Roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)

| Accuracy | D1 | D2 | NN |
|---|---|---|---|
| All feat. - roles | 68.9 | 74.9 | 79.6 |
| All feat. - roles - Word | 66.7 (-3.3%) | 66.1 (-11.8%) | 76.2 (-4.3%) |
| All feat. - roles - MeSH | 62.7 (-9.1%) | 72.5 (-3.2%) | 74.1 (-6.9%) |

(rel. + irrel.)
![Page 72: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/72.jpg)
Relation Recognition: Conclusions
• Classification of subtle semantic relations in bioscience text
• Discriminative model (neural network) achieves high classification accuracy
• Graphical models for the simultaneous extraction of entities and relationships
• Importance of lexical hierarchy
Next Step:
• Different entities/relations
• Semi-supervised learning to discover relation types
![Page 73: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/73.jpg)
Acquiring Labeled Data using Citances
![Page 74: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/74.jpg)
A discovery is made …
A paper is written …
![Page 75: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/75.jpg)
That paper is cited …
and cited …
and cited …
… as the evidence for some fact(s) F.
![Page 76: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/76.jpg)
Each of these in turn is cited for some fact(s) …
… until it is the case that all important facts in the field can be found in citation sentences alone!
![Page 77: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/77.jpg)
Citances
• Nearly every statement in a bioscience journal article is backed up with a cite.
• It is quite common for papers to be cited 30-100 times.
• The text around the citation tends to state biological facts. (Call these citances.)
• Different citances will state the same facts in different ways …
• … so can we use these for creating models of language expressing semantic relations?
![Page 78: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/78.jpg)
Using Citances
• Potential uses of citation sentences (citances):
  • creation of training and testing data for semantic analysis,
  • synonym set creation,
  • database curation,
  • document summarization,
  • and information retrieval generally.
• Some preliminary results:
  • Citances to a document align well with a hand-built curation.
  • Citances are good candidates for paraphrase creation.
![Page 79: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/79.jpg)
Citances for Acquiring Examples of Semantic Relations
A relationship type R between entities of type A and B can be expressed in many ways.
Use citances to build a model of the different ways to express the relationship:
Seed learning algorithms with examples that mention A and B, for which relation R holds.
Train a model to recognize R when the relation is not known.
Results may extend to sentences that are not citances as well.
![Page 80: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/80.jpg)
Issues for Processing Citances
• Text span
  • Identification of the appropriate phrase, clause, or sentence that constitutes a citance.
  • Correct mapping of citations when shown as lists or groups (e.g., “[22-25]”).
• Grouping citances by topic
  • Citances that cite the same document should be grouped by the facts they state.
• Normalizing or paraphrasing citances
  • For IR, summarization, learning synonyms, relation extraction, question answering, and machine translation.
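The citation-mapping issue above (a grouped marker such as “[22-25]” standing for several references) can be illustrated with a minimal Python sketch. The function name `expand_citation_group` and the assumption that markers are bracketed, comma-separated numbers or numeric ranges are illustrative choices, not part of the BioText system.

```python
import re


def expand_citation_group(marker: str) -> list[int]:
    """Expand a bracketed numeric citation marker into individual reference numbers.

    Hypothetical helper: assumes markers like "[22-25]" or "[3, 7-9]".
    """
    inner = marker.strip().strip("[]")
    refs: list[int] = []
    for part in re.split(r"\s*,\s*", inner):
        if not part:
            continue
        # A range is two integers joined by a hyphen or an en dash.
        m = re.fullmatch(r"(\d+)\s*[-\u2013]\s*(\d+)", part)
        if m:
            start, end = int(m.group(1)), int(m.group(2))
            refs.extend(range(start, end + 1))
        else:
            refs.append(int(part))
    return refs


print(expand_citation_group("[22-25]"))  # [22, 23, 24, 25]
```

Each expanded number can then be mapped to its own cited document, so that one sentence contributes a citance to every paper in the group.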
![Page 81: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/81.jpg)
Related Work
• Traditional citation analysis dates back to the 1960s (Garfield). Includes:
  • Citation categorization,
  • Context analysis,
  • Citer motivation.
• Citation indexing systems, such as ISI’s SCI, and CiteSeer.
• Mercer and Di Marco (2004) propose to improve citation indexing using citation types.
• Bradshaw (2003) introduces Reference Directed Indexing (RDI), which indexes documents using the terms in the citances citing them.
![Page 82: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/82.jpg)
Related Work (cont.)
• Teufel and Moens (2002) identify citances to improve summarization of the citing paper.
• Nanba et al. (2000) use citances as features for classifying papers into topics.
• A related field to citation indexing is the use of link structure and anchor text of Web pages.
  • Applications include: IR, classification, Web crawlers, and summarization.
![Page 83: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/83.jpg)
Example: protein-protein
![Page 84: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/84.jpg)
Early results: Paraphrase Creation from Citances
![Page 85: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/85.jpg)
Sample Sentences
• NGF withdrawal from sympathetic neurons induces Bim, which then contributes to death.
• Nerve growth factor withdrawal induces the expression of Bim and mediates Bax dependent cytochrome c release and apoptosis.
• The proapoptotic Bcl-2 family member Bim is strongly induced in sympathetic neurons in response to NGF withdrawal.
• In neurons, the BH3 only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by nerve growth factor deprivation.
![Page 86: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/86.jpg)
Their Paraphrases
• NGF withdrawal induces Bim.
• Nerve growth factor withdrawal induces the expression of Bim.
• Bim has been shown to be upregulated following nerve growth factor withdrawal.
• Bim implicated in apoptosis caused by nerve growth factor deprivation.
They all paraphrase: Bim is induced after NGF withdrawal.
![Page 87: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/87.jpg)
Paraphrase Creation Algorithm
1. Extract the sentences that cite the target.
2. Mark the NEs of interest (genes/proteins, MeSH terms) and normalize.
3. Dependency parse (MiniPar).
4. For each parse
   For each pair of NEs of interest
   i. Extract the path between them.
   ii. Create a paraphrase from the path.
5. Rank the candidates for a given pair of NEs.
6. Select only the ones above a threshold.
7. Generalize.
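Step 4.i (extracting the path between two NEs) can be sketched in Python as follows. This is an illustration under stated assumptions, not the BioText implementation: the parse is a hand-written toy tree stored as a list of head indices rather than MiniPar output, and the names `path_to_root` and `dependency_path` are hypothetical.

```python
def path_to_root(heads: list[int], i: int) -> list[int]:
    """Return the chain of token indices from token i up to the root (head == -1)."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads: list[int], a: int, b: int) -> list[int]:
    """Return the token indices on the tree path from token a to token b."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    on_a = set(up_a)
    # The first node on b's chain that also lies on a's chain is the
    # lowest common ancestor of the two tokens.
    lca = next(n for n in up_b if n in on_a)
    left = up_a[: up_a.index(lca) + 1]   # a ... lca
    right = up_b[: up_b.index(lca)]      # b ... (child of lca)
    return left + right[::-1]


# Toy parse (assumed for illustration); heads[i] is the index of token i's head.
TOKENS = ["NGF", "withdrawal", "induces", "the", "expression", "of", "Bim"]
HEADS = [1, 2, -1, 4, 2, 4, 5]

path = dependency_path(HEADS, TOKENS.index("NGF"), TOKENS.index("Bim"))
print(" ".join(TOKENS[i] for i in path))  # NGF withdrawal induces expression of Bim
```

The determiner "the" is not on the path, which is why a later step has to add words back to make the paraphrase grammatical.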
![Page 88: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/88.jpg)
Creating a Paraphrase
Given the path from the dependency parse:
1. Restore the original word order.
2. Add words to improve grammaticality.
• Bim … shown … be … following nerve growth factor withdrawal.
• Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal.
![Page 89: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/89.jpg)
2-word Heuristic Demonstration
• NGF withdrawal induces Bim.
• Nerve growth factor withdrawal induces [the] expression of Bim.
• Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal.
• Bim [is] induced in [sympathetic] neurons in response to NGF withdrawal.
• member Bim implicated in apoptosis caused by nerve growth factor deprivation.
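The slides do not define the 2-word heuristic. One reading consistent with the bracketed examples is: restore the path words to sentence order, then add back (in brackets) any run of at most two skipped words between consecutive path words. The Python sketch below implements that assumed reading; the function name `fill_small_gaps` and the `max_gap` parameter are hypothetical.

```python
def fill_small_gaps(tokens: list[str], path_indices: list[int], max_gap: int = 2) -> str:
    """Rebuild a paraphrase from path words, re-inserting short skipped runs.

    Assumed reading of the 2-word heuristic: a run of skipped words is added
    back (in brackets) only if it is at most max_gap words long.
    """
    kept = sorted(path_indices)  # restore the original word order
    out: list[str] = []
    for prev, cur in zip([None] + kept[:-1], kept):
        if prev is not None:
            gap = range(prev + 1, cur)
            if 0 < len(gap) <= max_gap:
                out.extend(f"[{tokens[i]}]" for i in gap)
        out.append(tokens[cur])
    return " ".join(out)


sentence = "Bim has been shown to be upregulated following nerve growth factor withdrawal"
tokens = sentence.split()
# Indices of the words on the dependency path (taken from the slide's example).
path = [0, 3, 5, 7, 8, 9, 10, 11]
print(fill_small_gaps(tokens, path))
# Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal
```

Longer skipped runs are simply dropped in this sketch, which keeps the paraphrase short at the cost of occasional ungrammatical output.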
![Page 90: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/90.jpg)
Evaluation (1)
• An influential journal paper from Neuron:
  • J. Whitfield, S. Neame, L. Paquet, O. Bernard, and J. Ham. Dominant-negative c-jun promotes neuronal survival by reducing bim expression and inhibiting mitochondrial cytochrome c release. Neuron, 29:629–643, 2001.
• 99 journal papers citing it
• 203 citances in total
• 36 different types of important biological factoids
• But we concentrated on one model sentence:
  • “Bim is induced after NGF withdrawal.”
![Page 91: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/91.jpg)
Evaluation (2)
• Set 1: 67 citances pointing to the target paper and manually found to contain a good or acceptable paraphrase (do not necessarily contain Bim or NGF); (Ideal conditions)
• Set 2: 65 citances pointing to the target paper and containing both Bim and NGF;
• Set 3: 102 sentences from the 99 texts, containing both Bim and NGF (Do citances do better than arbitrarily chosen sentences?)
![Page 92: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/92.jpg)
Correctness (Judgments)
• Bad (0.0), if:
  • different relation (often phosphorylation aspect);
  • opposite meaning;
  • vagueness (wording not clear enough).
• Acceptable (0.5), if it was not Bad and:
  • contains additional terms (e.g., DP5 protein) or topics (e.g., PPs like in sympathetic neurons);
  • the relation was suggested but not stated definitely.
• Else Good (1.0)
![Page 93: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/93.jpg)
Results
• Obtained 55, 65 and 102 paraphrases for sets 1, 2 and 3
• Only one paraphrase from each sentence
  • comparison of the dependency path to that of the model sentence
% - good (1.0) or acceptable (0.5)
![Page 94: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/94.jpg)
Correctness (Recall)
• Calculated on Set 1
• 60 paraphrases (out of 67 citances)
• 5 citances produced 2 paraphrases
• system recall: 55/67, i.e. 82.09%
• 10 of the 67 relevant in Set 1 initially missed by the human annotator
  • 8 good, 2 acceptable.
• human recall is 57/67, i.e. 85.07%
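The recall figures follow from the counts on the slide. A short Python check, with variable names chosen here for illustration and assuming that the 55 comes from 60 paraphrases minus the 5 citances that produced a second paraphrase:

```python
relevant = 67                 # citances in Set 1
paraphrases = 60              # paraphrases the system produced on Set 1
double_hits = 5               # citances that yielded two paraphrases
system_hits = paraphrases - double_hits    # distinct citances covered: 55

human_missed = 10             # relevant citances the annotator initially missed
human_hits = relevant - human_missed       # 57


def recall(hits: int, total: int) -> float:
    """Fraction of relevant items that were found."""
    return hits / total


system_recall = recall(system_hits, relevant)
human_recall = recall(human_hits, relevant)
print(f"system recall: {system_recall:.2%}")  # system recall: 82.09%
print(f"human recall:  {human_recall:.2%}")   # human recall:  85.07%
```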
![Page 95: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/95.jpg)
Misses
• Sample system miss (no NGF):
  • Growth factor withdrawal was shown to cause increased Bim expression in various populations of neuronal cell types.
• Sample human miss:
  • The precise targets of c-Jun necessary for the induction of apoptosis have been the subject of intense interest and recently, Bim and Dp5, both “BH3-domain only” family members, have been identified as pro-apoptotic genes induced in a c-Jun-dependent manner in both sympathetic neurons subjected to NGF withdrawal and in cerebellar granule cells deprived of KCl.
![Page 96: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/96.jpg)
Grammaticality
• Missing coordinating “and”:
  • “Hrk/DP5 Bim [have] [been] found [to] be upregulated after NGF withdrawal”
• Verb subcategorization
  • “caused by NGF role for Bim”
• Extra subject words
  • member Bim implicated in apoptosis caused by NGF deprivation
  • sentence: “In neurons, the BH3-only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by NGF deprivation.”
![Page 97: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/97.jpg)
Related Work
• Word-level paraphrases. Grefenstette uses a semantic parser to compare the distributional similarity of local contexts for synonym extraction.
• Phrase-level paraphrases. Barzilay & McKeown use POS information from the local context and co-training.
• Template paraphrases. Lin & Pantel apply the idea of Grefenstette to dependency tree paths. Later refined by Shinyama et al.
• Sentence-level paraphrases. Barzilay & Lee use multiple sequence alignment. Pang et al. merge parse trees into a transducer.
![Page 98: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/98.jpg)
Relevant Papers
• Citances: Citation Sentences for Semantic Analysis of Bioscience Text, Preslav Nakov, Ariel Schwartz, and Marti Hearst, in the SIGIR'04 workshop on Search and Discovery in Bioinformatics.
• Classifying Semantic Relations in Bioscience Text, Barbara Rosario and Marti Hearst, in ACL 2004.
• The Descent of Hierarchy, and Selection in Relational Semantics, Barbara Rosario, Marti Hearst, and Charles Fillmore, in ACL 2002.
![Page 99: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/99.jpg)
Thank you!
Marti Hearst, SIMS, UC Berkeley
http://biotext.berkeley.edu
![Page 100: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/100.jpg)
Additional slides
![Page 101: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/101.jpg)
Our D1
Thompson et al. 2003
• Frame classification and role labeling for FrameNet sentences
• Target word must be observed
• More relations and roles
![Page 102: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/102.jpg)
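Smoothing: absolute discounting
• Lower the probability of seen events by subtracting a constant ($\alpha$) from their count (ML estimate: $P_{ML}(e) = \frac{c(e)}{\sum_{e} c(e)}$)
• The remaining probability is evenly divided among the unseen events

$$P_{ad}(e) = \begin{cases} P_{ML}(e) - \alpha & \text{if } P_{ML}(e) > 0 \\[4pt] \alpha \, \dfrac{c(\text{seen events})}{c(\text{unseen events})} & \text{if } P_{ML}(e) = 0 \end{cases}$$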
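Absolute discounting as described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: the function name, the explicit `universe` argument (the full set of possible events), and the name `alpha` for the discount constant are assumptions. The discount is subtracted from each seen event's ML probability and the freed mass is shared equally among unseen events.

```python
def absolute_discounting(counts: dict[str, int], universe: list[str], alpha: float) -> dict[str, float]:
    """Smooth ML estimates by moving probability mass from seen to unseen events.

    Assumes every key of `counts` appears in `universe` and at least one count is positive.
    """
    total = sum(counts.get(e, 0) for e in universe)
    seen = [e for e in universe if counts.get(e, 0) > 0]
    unseen = [e for e in universe if counts.get(e, 0) == 0]

    p_ml = {e: counts.get(e, 0) / total for e in universe}
    if not unseen:
        # Nothing to redistribute to, so keep the ML estimates unchanged.
        return p_ml

    if any(p_ml[e] <= alpha for e in seen):
        raise ValueError("alpha must be smaller than every seen event's ML probability")

    # Each unseen event receives an equal share of the discounted mass.
    unseen_share = alpha * len(seen) / len(unseen)
    return {e: (p_ml[e] - alpha if e in seen else unseen_share) for e in universe}


probs = absolute_discounting({"a": 3, "b": 1}, ["a", "b", "c", "d"], alpha=0.1)
print(probs)
```

With these toy counts, the seen events "a" and "b" each give up 0.1, and the unseen events "c" and "d" each receive 0.1, so the distribution still sums to one.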
![Page 103: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/103.jpg)
F-measures for role extraction as a function of smoothing factors
![Page 104: Semantic Relation Detection in Bioscience Text Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510 and a gift from.](https://reader038.fdocuments.in/reader038/viewer/2022110322/56649d375503460f94a103f6/html5/thumbnails/104.jpg)
Relation accuracies as a function of smoothing factors