Using Text to Build Semantic Networks for Pharmacogenomics

George Karystianis

Adrien Coulet, Nigam Shah, Yael Garten, Mark Musen, Russ B. Altman

Journal of Biomedical Informatics (2010)

Motivation

● Manually crafted rules to define relationships between entities.

– Limited scope domains.

● Pharmacogenomics.

– Semantic complexity.

● Enhance PharmGKB.

● Large size of literature.

● NLP techniques promising.

Aim

● Automatic relationship extraction.

● Entity mapping in a schema.

– Semantic network structure.

● Curation of PGx knowledge.

● Resource for knowledge discovery.

However...

What is the meaning of Pharmacogenomics?

Pharmacogenomics (1)

Pharmaco Genomics PGx

Φάρμακο (Greek: "drug") Γίνομαι (Greek: "to become")

Pharmacogenomics (2)

● How genetic variation influences drug response in patients.

● Most of this knowledge is presented as binary relationships.

R(a, b)

R: relationship, a: subject, b: object
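
As a minimal sketch, a binary relationship R(a, b) can be held in a small record type. The class name, field names, and example values below are illustrative assumptions, not identifiers taken from the paper.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """A binary relationship R(a, b): a typed link from a subject to an object."""
    rel_type: str  # R, the relationship type
    subject: str   # a
    obj: str       # b


def render(rel: Relationship) -> str:
    """Format a relationship in the R(a, b) notation used on the slide."""
    return f"{rel.rel_type}({rel.subject}, {rel.obj})"


# Hypothetical example values, chosen only for illustration.
example = Relationship("associated_with", "VKORC1_variant", "warfarin_dose")
print(render(example))
```

A tuple-like record keeps each relationship hashable, so a set of them can serve directly as the edge list of a network.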

Is This Something New?

● Co-occurrence approach:

– Pharmspresso.

– Tri-co-occurrences.

● Syntactic parser approach:

– OpenDMAP.

– Vocabularies.

Complex relationship semantics.

Manual relationship evaluation.

Explicit relationship identification.

Large pattern sets.

Stable ontologies.

So...

Regular gene expression networks

Drug-disease networks

Molecular interaction networks

Gene-disease networks

Method Overview

MEDLINE abstracts

Dependency graphs of sentences

R

Ontology

PGx network

1a. Sentence Parsing

● Implementation of lexicons for sentence retrieval.

● Stanford Parser.

● Focused on sentences with at least 2 key PGx entities.
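
The sentence filter can be sketched as a lexicon lookup. This is a minimal sketch, assuming tiny toy lexicons and single-token matching; the lexicon contents and the function names are hypothetical, and the real lexicons would be far larger.

```python
import re

# Toy lexicons: hypothetical stand-ins for the much larger entity lexicons.
LEXICONS = {
    "gene": {"vkorc1", "cyp2c9"},
    "drug": {"warfarin"},
}


def find_entities(sentence, lexicons=LEXICONS):
    """Return sorted (entity_type, term) pairs whose term occurs as a token."""
    tokens = {tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", sentence)}
    return sorted(
        (etype, term)
        for etype, terms in lexicons.items()
        for term in terms
        if term in tokens
    )


def has_two_key_entities(sentence, lexicons=LEXICONS):
    """Keep a sentence only if it mentions at least two lexicon entities."""
    return len(find_entities(sentence, lexicons)) >= 2


sentences = [
    "Several SNPs in VKORC1 are associated with warfarin dose.",
    "Warfarin is an anticoagulant.",
]
kept = [s for s in sentences if has_two_key_entities(s)]
print(kept)
```

Filtering before parsing keeps the expensive parser away from sentences that cannot yield a relationship between two entities.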

1b. Sentence Parsing

● Querying the sentence index using seeds.

– particular terms corresponding to recognized entities.

– focus on gene-drug/gene-phenotype pairs.

● Reducing set/size of parse trees.

● Parse trees -> dependency graphs.

– rooted, oriented, labelled; easier to read, process, and understand than parse trees.

Parsing Example

“Several single nucleotide polymorphisms (SNPs) in VKORC1 are associated with warfarin dose across the normal dose range”

Dependency Graph
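
A dependency graph for the example sentence can be sketched as a list of labelled edges. The edges below are hand-written for illustration in the style of collapsed Stanford typed dependencies; they are an assumption, not actual parser output, and they leave out "(SNPs)" and "across the normal dose range" to keep the toy graph small.

```python
from collections import defaultdict

# Hand-written (governor, label, dependent) edges for the example sentence.
# They imitate collapsed Stanford typed dependencies but are NOT parser output.
EDGES = [
    ("associated", "nsubjpass", "polymorphisms"),
    ("associated", "auxpass", "are"),
    ("associated", "prep_with", "dose"),
    ("polymorphisms", "amod", "Several"),
    ("polymorphisms", "amod", "single"),
    ("polymorphisms", "nn", "nucleotide"),
    ("polymorphisms", "prep_in", "VKORC1"),
    ("dose", "nn", "warfarin"),
]


def build_graph(edges):
    """Map each governor to its outgoing (label, dependent) edges."""
    graph = defaultdict(list)
    for governor, label, dependent in edges:
        graph[governor].append((label, dependent))
    return dict(graph)


def find_root(edges):
    """The root is the only node that never appears as a dependent."""
    governors = {g for g, _, _ in edges}
    dependents = {d for _, _, d in edges}
    (root,) = governors - dependents
    return root


graph = build_graph(EDGES)
root = find_root(EDGES)
print(root, graph[root])
```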

2a. Relation Extraction

● Sentence analysis for raw relationship extraction.

● Seed recognition:

– through PharmGKB lexicons.

● Seed expansion:

– edge traversal of the dependency graph (DG) to see if the seed is a key entity or a modified entity.

Dependencies for Seed Expansion

● Expand the seed

● End the expansion

● Interrupt the expansion
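
The three behaviours can be sketched as a walk from a seed towards the root of the dependency graph. The slide does not say which dependency labels trigger which behaviour, so the label sets below are hypothetical. The sketch also assumes that "end" means stop and keep the collected words, while "interrupt" means stop and discard them. The edges are hand-written, not parser output.

```python
# Hypothetical grouping of dependency labels (an assumption of this sketch).
EXPAND = {"nn", "prep_in", "prep_of"}
END = {"nsubj", "nsubjpass", "dobj", "prep_with"}
# Any other label is treated as interrupting the expansion.

# Hand-written (governor, label, dependent) edges, not parser output.
EDGES = [
    ("associated", "nsubjpass", "polymorphisms"),
    ("associated", "prep_with", "dose"),
    ("polymorphisms", "prep_in", "VKORC1"),
    ("dose", "nn", "warfarin"),
]


def expand_seed(seed, edges):
    """Walk from a seed towards the root.

    Returns (entity_name, attachment_word) when an END label is reached,
    or None when the expansion is interrupted or runs out of edges.
    """
    governor_of = {dep: (gov, label) for gov, label, dep in edges}
    words = [seed]
    seen = {seed}
    node = seed
    while node in governor_of:
        governor, label = governor_of[node]
        if label in END:
            return "_".join(words), governor
        if label not in EXPAND or governor in seen:
            return None  # interrupted
        words.append(governor)
        seen.add(governor)
        node = governor
    return None


print(expand_seed("VKORC1", EDGES))
print(expand_seed("warfarin", EDGES))
```

With these toy edges, the seed VKORC1 grows into the modified entity VKORC1_polymorphisms, and both seeds end their expansion at the same verb.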

2b. Relation Extraction

● Seed coupling

– Two seeds linked by a normalised verb.

– Relationship creation.
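
Seed coupling can be sketched as pairing entities that attach to the same verb and typing the resulting relationship with the normalised verb. This is a simplified sketch: the verb table, the function name, and the input format are hypothetical, and the first entity in sentence order is simply taken as the subject.

```python
from itertools import combinations

# Toy verb normalisation table (hypothetical); a real system would lemmatise.
VERB_LEMMAS = {"associated": "associate", "inhibits": "inhibit"}


def couple_seeds(expanded_seeds):
    """Pair up entities attached to the same verb.

    expanded_seeds: list of (entity_name, verb) in sentence order.
    Returns (normalised_verb, first_entity, second_entity) triples.
    """
    relationships = []
    for (first, verb_a), (second, verb_b) in combinations(expanded_seeds, 2):
        if verb_a == verb_b:
            relationships.append((VERB_LEMMAS.get(verb_a, verb_a), first, second))
    return relationships


raw = couple_seeds([
    ("VKORC1_polymorphisms", "associated"),
    ("warfarin_dose", "associated"),
])
print(raw)
```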

2c. Relation Extraction

● Evaluation of precision:

– manual evaluation of the precision of raw relationship extraction.

– random selection of 220 raw relationships.

– classification: complete and true, incomplete and true, or false.
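
Given the three manual labels, precision can be computed in a strict form (only "complete and true" counts as correct) or a lenient form (both true classes count). The sketch below assumes these two readings; the label strings and the toy judgements are hypothetical and are not the paper's results.

```python
from collections import Counter

LABELS = ("complete_true", "incomplete_true", "false")


def precision(judgements, strict=False):
    """Fraction of judged relationships that are true.

    strict=True counts only "complete_true"; otherwise "incomplete_true"
    also counts as correct.
    """
    if not judgements:
        raise ValueError("no judgements given")
    unknown = set(judgements) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(judgements)
    correct = counts["complete_true"]
    if not strict:
        correct += counts["incomplete_true"]
    return correct / len(judgements)


# Toy judgements for illustration only; these are not the paper's results.
toy = ["complete_true", "complete_true", "incomplete_true", "false"]
print(precision(toy), precision(toy, strict=True))
```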

3. Ontology Construction

● Identification of relationship (R) types.

● Hierarchical organisation of R types and entities (E).

– 4 lists: the most frequent relationship types, and the most frequent entities modified by genes, drugs, and phenotypes.

● Refine choice available.

4a. Relationship Normalization

● Application of ontology to relationship instances.

● Creation of a set of normalised relationships.

● Normalization of entity names:

– modified entity name returned in normalized form according to the ontology.

– modified entity decomposed and processed iteratively to construct the normalized form.

Example

● Seed: VKORC1_polymorphisms.

● Seed concept: Gene.

● Next word: polymorphism.

– refers to a concept modified by Gene.

– synonym of the concept “variant”.

● Normalised word:

– VKORC1_variant.
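
The example can be sketched as a synonym lookup applied to each word that follows the seed. This is a minimal sketch, assuming underscore-joined entity names and a synonym table limited to the slide's own example; the table and the function name are hypothetical.

```python
# Synonym table limited to the slide's example; a fuller table is assumed
# to come from the ontology.
CONCEPT_SYNONYMS = {"polymorphism": "variant", "polymorphisms": "variant"}


def normalize_entity(modified_entity):
    """Normalize 'SEED_word1_word2...' by mapping each word after the seed
    to its ontology concept when a synonym is known."""
    seed, *modifiers = modified_entity.split("_")
    normalized = [CONCEPT_SYNONYMS.get(word.lower(), word) for word in modifiers]
    return "_".join([seed, *normalized])


print(normalize_entity("VKORC1_polymorphisms"))
```

Words with no known synonym are passed through unchanged, so an unmodified seed such as `VKORC1` comes back as is.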

4b. Relation Normalization

● Normalization of relationship types.

– search for a role label which matches the relationship.

– the identifier of the corresponding role is the normalized type.

– creation of knowledge base of PGx relationships.
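
Role matching can be sketched as a lookup from a raw relationship type to the identifier of the role that carries a matching label. The role identifiers and labels below are hypothetical placeholders, not entries from the actual ontology.

```python
# Hypothetical roles: identifier -> labels that may appear as raw types.
ROLES = {
    "isAssociatedWith": {"associate", "associated", "correlate"},
    "inhibits": {"inhibit", "inhibits"},
}


def normalize_relationship_type(raw_type, roles=ROLES):
    """Return the identifier of the first role whose labels include raw_type,
    or None when no role matches."""
    needle = raw_type.lower()
    for role_id, labels in roles.items():
        if needle in labels:
            return role_id
    return None


def normalize_relationship(raw_relationship, roles=ROLES):
    """Map a raw (type, subject, object) triple to a normalized triple, or None."""
    raw_type, subject, obj = raw_relationship
    role_id = normalize_relationship_type(raw_type, roles)
    return None if role_id is None else (role_id, subject, obj)


print(normalize_relationship(("associated", "VKORC1_variant", "warfarin_dose")))
```

Collecting the normalized triples that survive this step gives the entries of the knowledge base.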

Did it work?

● Input: 17,396,436 MEDLINE abstracts.

● Sentences: 87,806,828.

● Sentences with pairs of PGx entities: 295,569.

● After pruning: 41,134 raw relationships (21,050 gene-drug pairs, 20,084 gene-phenotype pairs).
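
A quick arithmetic check on the reported counts shows how sharply the pipeline narrows the corpus. The variable names are illustrative; the numbers are the ones on the slide.

```python
abstracts = 17_396_436
sentences = 87_806_828
pair_sentences = 295_569
raw_relationships = 41_134
gene_drug = 21_050
gene_phenotype = 20_084

# The two pair counts should add up to the raw relationship total.
assert gene_drug + gene_phenotype == raw_relationships

pair_sentence_share = pair_sentences / sentences
relationships_per_pair_sentence = raw_relationships / pair_sentences
sentences_per_abstract = sentences / abstracts

print(f"sentences with a PGx pair: {pair_sentence_share:.2%}")
print(f"raw relationships per such sentence: {relationships_per_pair_sentence:.2f}")
print(f"sentences per abstract: {sentences_per_abstract:.1f}")
```

Roughly 0.34% of all sentences contain a pair of PGx entities, and the gene-drug and gene-phenotype counts sum exactly to the 41,134 raw relationships.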

Results

● The 200 most frequent raw relationship types:

– 80% of the extracted relationships.

● Creation of an ontology called PHARE (PHArmacogenomics RElationships):

– built from the 200 most frequent relationship types and modified entities.

– 237 concepts and 76 roles.
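
The coverage figure (the share of extracted relationships whose type is among the k most frequent) can be computed with a frequency count. The sketch below uses toy data that is not from the paper, and the function name is hypothetical.

```python
from collections import Counter


def coverage_of_top_k(relationship_types, k):
    """Share of relationship instances whose type is among the k most frequent."""
    counts = Counter(relationship_types)
    if not counts:
        raise ValueError("no relationship types given")
    covered = sum(count for _, count in counts.most_common(k))
    return covered / sum(counts.values())


# Toy data for illustration only; not the paper's counts.
toy_types = ["associate"] * 5 + ["inhibit"] * 3 + ["metabolize"] + ["induce"]
print(coverage_of_top_k(toy_types, 2))
```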

Results (2)

Results (3)

Discussion (1)

● Identification of both PGx entities.

● Identification of PGx modified entities.

● Use of key entity lexicons for discovery and normalization of modified entities.

● Record and recognition of modified entities under very general textual conditions.

● Flexible, precise method.

Discussion (2)

● Concern: lower recall due to the large corpus size.

– improve precision with full text parsing.

● Applicable to other domains.

– Human effort required for the ontology creation.

Conclusions (1)

● New method for PGx relationship extraction.

● Use of key PGx entities to identify modified entities.

● Capture and normalization of raw relationships.

● Automatic labelling of parsed sentences.

Conclusions (2)

● Creation of a knowledge base.

● Creation of relationship summaries between:

– Genes, drugs, phenotypes.

● Novel approach for PGx text processing.

Questions?

Ερωτήσεις; ("Questions?" in Greek)

Questions? (in French ^_^)

¿Preguntas? ("Questions?" in Spanish)

質問? ("Questions?" in Japanese)