SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery

HFE & BCR-ABL In Search of Links © 2014, TopicQuests Foundation Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Jack Park BigData Science Meetup Fremont, CA: 17 May, 2014 Shyam Sarkar, Organizer

description

SolrSherlock's HyperMembrane serves as an associative fabric component of a machine reading platform. The system combines topic maps, NLP, and a society of agents to support hypothesis formation, experiment planning, and Deep QA.

Transcript of SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery

Page 1: SolrSherlock: Linkfinding among Biomolecules with Literature-based Discovery

HFE & BCR-ABL In Search of Links

© 2014, TopicQuests Foundation

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Jack Park

BigData Science Meetup

Fremont, CA: 17 May, 2014

Shyam Sarkar, Organizer

Page 2

Target Benefits

• SolrSherlock will support:

– Hypothesis formation

– Research/Experiment planning

– Deep Question Answering

• Personal medical issues

• … “Therefore psychologically we must keep all the theories in our heads, and every theoretical physicist who is any good knows six or seven different theoretical representations for exactly the same physics.” ―Richard Feynman

“Why, sometimes I've believed as many as six impossible things before breakfast.” ―The White Queen, Through the Looking-Glass

Page 3

What We Have Read: HFE

• Human hemochromatosis protein, also known as the HFE protein, is a protein which in humans is encoded by the HFE gene. The HFE gene is located on the short arm of chromosome 6 at location 6p22.2.*

– Some mutations which are associated with Hereditary Hemochromatosis (a genetic disease)**:

• C282Y

• H63D

*http://en.wikipedia.org/wiki/HFE_%28gene%29

**http://www.genome.gov/10001214

Page 4

What We Have Read: BCR-ABL, aka Philadelphia Chromosome

• The Philadelphia chromosome, or Philadelphia translocation, is a specific chromosomal abnormality that is associated with chronic myelogenous leukemia (CML). It is the result of a reciprocal translocation between chromosomes 9 and 22, and is specifically designated t(9;22)(q34;q11).*

*http://en.wikipedia.org/wiki/Philadelphia_chromosome

Page 5

Are HFE and BCR-ABL Linked?

• One document instance which suggests they are linked:

– “We found that HFE C282Y might be associated with a protective role against CMPD. Because chronic iron deficiency or latent anemia may trigger disease susceptibility for CMPD, HFE C282Y positivity may be a genetic factor influencing this effect.”*

• Note: this response is simply evidence of a link, a signal; it leaves open many questions

CMPD: Chronic Myeloproliferative Disease

* http://www.ncbi.nlm.nih.gov/pubmed/19258483

Page 6

Where do we go from here?

• We have read about some actors

• We seek evidence for relationships between those actors

• We have one small piece of evidence

• We turn to Literature-based Discovery (LBD)

– Read and process many papers

– Assemble an evidence field

– Determine answers and confidence levels

Page 7

Sensemaking In Biological Research

http://www.biomedcentral.com/content/pdf/1471-2105-15-117.pdf Figure 1

© 2014 Mirel and Görg; licensee BioMed Central Ltd (cc by)

Page 8

Literature-based Discovery

• Swanson’s ABC Model

• Two Varieties of LBD

– Closed Discovery

– Open Discovery
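Swanson's ABC model reasons that if term A is linked to B in one body of literature, and B is linked to C in another, then A and C may be linked even if no paper connects them directly. The sketch below illustrates both varieties over a toy link map. The terms and links are placeholders (not real findings), and plain set operations stand in for whatever ranking SolrSherlock actually applies. Closed discovery starts from both A and C and looks for bridging B terms; open discovery starts from A alone and proposes candidate C terms.

```python
from collections import defaultdict

# Hypothetical links mined from the literature: term -> set of directly linked terms.
# Placeholder terms only; these are not real findings.
LINKS: dict[str, set[str]] = {
    "A": {"B1", "B2"},
    "B1": {"A", "C1"},
    "B2": {"A", "C1", "C2"},
    "C1": {"B1", "B2"},
    "C2": {"B2"},
}


def closed_discovery(links: dict[str, set[str]], a: str, c: str) -> set[str]:
    """Given both A and C, return the intermediate B terms linked to each of them."""
    return (links.get(a, set()) & links.get(c, set())) - {a, c}


def open_discovery(links: dict[str, set[str]], a: str) -> dict[str, set[str]]:
    """Given A only, return candidate C terms (not already linked to A) with their B bridges."""
    direct = links.get(a, set())
    candidates: dict[str, set[str]] = defaultdict(set)
    for b in direct:
        for c in links.get(b, set()):
            # Skip A itself and terms already directly linked to A (those are not novel).
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


print(sorted(closed_discovery(LINKS, "A", "C1")))  # ['B1', 'B2']
print({c: sorted(bs) for c, bs in sorted(open_discovery(LINKS, "A").items())})
# {'C1': ['B1', 'B2'], 'C2': ['B2']}
```

In this framing, HFE and BCR-ABL could play the roles of A and C in a closed-discovery search for bridging terms.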

Page 9

SolrSherlock Block Level

• Models

– Process Models

– Conceptual Graphs

– OpenBEL

• Identity

– Topic Map

• Topics

• Relations

• Associations

– Bayes

– DeepLearning

– HyperMembrane

• Interface

[Layer diagram: Interface, Associations, Identity, Models, Data]

Page 10

SolrSherlock’s HyperMembrane

• SolrSherlock Big Picture

– Documents to harvest

– Sentences to parse

• WordGrams from the sentences

– Lenses to interpret the sentences

» NTuples from the WordGrams

– Lenses to interpret whole documents

• HyperMembrane as a fabric woven from the NTuples

– Organizes statements read from literature into a kind of associative fabric, linked into a topic map

Page 11

HyperMembrane Inspiration

http://xanadu.com/zigzag/ZZdnld/zzRefDef/

https://www.flickr.com/photos/portier/2927798222/sizes/s/

Page 12

HyperMembrane Internal Structure

[Diagram: Graph Agent, Structure Agent, Sentence Agent, Document Agent, and Query Agent, together with the Information Fabric]

Page 13

Sentence Parse

• Salient WordGrams in that sentence:

– C282Y

– might be associated with a

– protective role against

• Transforms to: protect against

– CMPD

We found that HFE C282Y might be associated with a protective role against CMPD

[Link Grammar parse diagram of the sentence, tokenized as: we found.p that.c HFE C282Y might.v be.v associated.v with a protective.a role.n against CMPD; link labels shown: Sp, TH, Cet, G, Ss, Ix, Pv, MVp, Js, Ds, A]

Parse produced by a Java implementation of the Link Grammar Parser

Page 14

WordGram instances created while processing the sentence

WordGram Example

• Sentence:

– CO2 causes climate change

• WordGrams

– Terminals

• CO2 • causes • climate • change

– Pairs

• CO2 causes • causes climate • climate change

– Triples

• CO2 causes climate • causes climate change

– Quads

• CO2 causes climate change

• Parsed Result (representation of the sentence):

– CO2 (terminal, noun)

– cause (terminal, verb, transformed: causes → cause)

– climate change (pair, noun phrase)

• Resulting NTuple

– {CO2, cause, climate change}

• Where the names are replaced with topic locators from the topic map

These WordGram instances represent the sentence; they are wired into the fabric.

This NTuple participates in high-level structure formation and in question answering.

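Generating the WordGrams listed on this slide amounts to taking every contiguous n-gram of the sentence, for n = 1 to 4. A minimal sketch, assuming naive whitespace tokenization (the talk does not say how SolrSherlock tokenizes); `wordgrams` is a hypothetical helper name.

```python
def wordgrams(sentence: str, max_n: int = 4) -> dict[int, list[str]]:
    """Return every contiguous n-gram of the sentence, grouped by size n = 1..max_n."""
    tokens = sentence.split()  # naive whitespace tokenization (an assumption)
    return {
        n: [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        for n in range(1, max_n + 1)
    }


grams = wordgrams("CO2 causes climate change")
for n, label in [(1, "Terminals"), (2, "Pairs"), (3, "Triples"), (4, "Quads")]:
    print(label, grams[n])
# Terminals ['CO2', 'causes', 'climate', 'change']
# Pairs ['CO2 causes', 'causes climate', 'climate change']
# Triples ['CO2 causes climate', 'causes climate change']
# Quads ['CO2 causes climate change']
```

The ten WordGrams produced (4 + 3 + 2 + 1) match the terminals, pairs, triples, and quad on the slide.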

Page 15

Lenses

• Simple Interpreters

– Based on Canonical Predicates

– Build structures from parsed sentences and WordGrams

– Examples from biology

• Cause • Bind • Augment • Prevent • Increase • Decrease • Believe
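One way to read "simple interpreters based on canonical predicates" is as a registry holding one lens per predicate, where each lens recognizes its predicate in an already-parsed sentence and emits a {subject, predicate, object} NTuple. The sketch below assumes the sentence has already been reduced to a list of phrases carrying canonical predicates (as in the parsed result for "CO2 causes climate change"); `make_lens`, `LENSES`, and `apply_lenses` are hypothetical names, not SolrSherlock's API.

```python
from typing import Callable, Optional

# An NTuple here is simply a (subject, predicate, object) tuple.
NTuple = tuple[str, str, str]
Lens = Callable[[list[str]], Optional[NTuple]]

CANONICAL_PREDICATES = ["cause", "bind", "augment", "prevent", "increase", "decrease", "believe"]


def make_lens(predicate: str) -> Lens:
    """Build a lens that fires when its canonical predicate sits between two phrases."""

    def lens(phrases: list[str]) -> Optional[NTuple]:
        if predicate not in phrases:
            return None
        i = phrases.index(predicate)
        subject, obj = phrases[:i], phrases[i + 1:]
        if not subject or not obj:
            return None
        return (" ".join(subject), predicate, " ".join(obj))

    return lens


# Hypothetical registry: one lens per canonical predicate.
LENSES: dict[str, Lens] = {p: make_lens(p) for p in CANONICAL_PREDICATES}


def apply_lenses(phrases: list[str]) -> list[NTuple]:
    """Run every lens over the phrases and collect the NTuples they produce."""
    results = []
    for lens in LENSES.values():
        ntuple = lens(phrases)
        if ntuple is not None:
            results.append(ntuple)
    return results


print(apply_lenses(["CO2", "cause", "climate change"]))
# [('CO2', 'cause', 'climate change')]
```

Keeping each lens this small is what makes the list of predicates easy to extend: adding a lens is one more entry in the registry.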

Page 16

Multiple Lenses

• Consider this sentence:

– We believe that A causes B

– Two Lenses in play

• Believe

• Cause

– Result is a nested NTuple

• {We, believe, {A, cause, B}}
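Nesting falls out naturally if a lens whose object is a whole clause (such as Believe) interprets that clause recursively. A minimal sketch, assuming the input is already split into phrases with canonical predicates, that the word "that" introduces an embedded clause, and that the leftmost predicate is the outermost one. These are simplifications for illustration; `interpret` is a hypothetical name.

```python
from typing import Optional

# Subset of canonical predicates, for illustration.
CANONICAL_PREDICATES = {"believe", "cause"}


def interpret(phrases: list[str]) -> Optional[tuple]:
    """Return a (possibly nested) NTuple, or None if no canonical predicate is found.

    The leftmost canonical predicate with phrases on both sides is treated as the
    outermost one; its object is interpreted recursively, so an embedded clause
    becomes a nested NTuple.
    """
    for i, word in enumerate(phrases):
        if word in CANONICAL_PREDICATES and 0 < i < len(phrases) - 1:
            subject = " ".join(phrases[:i])
            rest = phrases[i + 1:]
            if rest[0] == "that":  # complementizer introducing an embedded clause
                rest = rest[1:]
            if not rest:
                return None
            inner = interpret(rest)
            obj = inner if inner is not None else " ".join(rest)
            return (subject, word, obj)
    return None


print(interpret(["We", "believe", "that", "A", "cause", "B"]))
# ('We', 'believe', ('A', 'cause', 'B'))
```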

Page 17

Canonical Predicate

• Results from transformations on predicates

– E.g.

• A causes B, A can cause B, A will cause B → A cause B

• A is caused by B → B cause A
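A rough way to implement this normalization for a single predicate is with surface-form patterns: active forms keep the argument order, while the passive form swaps it. The regular expressions below are illustrative assumptions covering only the examples on this slide; the talk does not describe SolrSherlock's actual transformations, and a real system would need far broader coverage (or would work from the parse rather than raw strings).

```python
import re
from typing import Optional

# Hypothetical surface patterns for the canonical predicate "cause".
ACTIVE = re.compile(r"^(?P<x>.+?) (?:causes|can cause|will cause|cause) (?P<y>.+)$")
PASSIVE = re.compile(r"^(?P<y>.+?) (?:is|are|was|were) caused by (?P<x>.+)$")


def canonicalize(sentence: str) -> Optional[tuple[str, str, str]]:
    """Map surface forms of 'cause' onto the canonical triple (agent, 'cause', patient)."""
    s = sentence.strip().rstrip(".")
    m = PASSIVE.match(s)
    if m:
        # Passive voice: the agent follows "by", so the arguments swap places.
        return (m["x"], "cause", m["y"])
    m = ACTIVE.match(s)
    if m:
        return (m["x"], "cause", m["y"])
    return None


for s in ["A causes B", "A can cause B", "A will cause B", "A is caused by B"]:
    print(s, "->", canonicalize(s))
```

The three active sentences map to ('A', 'cause', 'B') and the passive one to ('B', 'cause', 'A'), as on the slide.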

Page 18

Actors: Named Entities

• For any given named entity, there will be one and only one WordGram

– Issue of Ambiguity

• The same name string can serve different topics in the topic map

– The topic map maintains identity for disambiguation

• Thus, a single WordGram might be associated with more than one individual actor

• This means:

– Fibers (threads) flowing through the fabric must be maintained in bundles according to their context (topic)
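Interning gives the "one and only one WordGram" property, while the topic map side records every topic that uses a given name string. In this hypothetical sketch, `WordGramStore` and the `topic:` locators are made up for illustration; "HFE" is used because, as the HFE slide notes, the name refers both to a gene and to the protein it encodes.

```python
class WordGramStore:
    """Interns name strings so each has one and only one WordGram (hypothetical design)."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}          # name string -> WordGram id
        self._topics: dict[int, set[str]] = {}  # WordGram id -> topic locators using that name

    def wordgram(self, name: str) -> int:
        """Return the WordGram id for name, creating it on first sight."""
        if name not in self._ids:
            self._ids[name] = len(self._ids)
            self._topics[self._ids[name]] = set()
        return self._ids[name]

    def bind(self, name: str, topic: str) -> int:
        """Record that a topic in the topic map is named by this string."""
        wid = self.wordgram(name)
        self._topics[wid].add(topic)
        return wid

    def actors(self, name: str) -> set[str]:
        """All topics (actors) that share this name string."""
        wid = self._ids.get(name)
        return set() if wid is None else set(self._topics[wid])


store = WordGramStore()
# Hypothetical locators: "HFE" names both the gene and the protein it encodes.
a = store.bind("HFE", "topic:HFE-gene")
b = store.bind("HFE", "topic:HFE-protein")
print(a == b)                       # True: one WordGram for the name string
print(sorted(store.actors("HFE")))  # ['topic:HFE-gene', 'topic:HFE-protein']
```

Because both topics hang off the same WordGram, anything flowing through "HFE" has to carry its topic along with it, which is exactly why the fibers must be bundled by context.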

Page 19

Lens Selection and Action

• The Lens:

– ProtectAgainst

• Selected by the WordGram for “protect against”

– Which is a transformation of the WordGram for “protective role against”

• Lens Action:

– Create an NTuple

• {C282Y, protect against, CMPD}

• We will call that NTuple an Assertion

We found that HFE C282Y might be associated with a protective role against CMPD
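Lens selection can be sketched as two table lookups: first map the salient WordGram to its canonical form, then use that form to pick a lens whose action builds the Assertion. The `TRANSFORMS` and `LENSES` tables and the function names are hypothetical, since the talk does not specify how SolrSherlock stores them. This sketch also ignores the hedging WordGram "might be associated with a".

```python
from typing import Callable, NamedTuple, Optional


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical transformation table: salient WordGram -> canonical WordGram.
TRANSFORMS = {"protective role against": "protect against"}


def protect_against_lens(subject: str, obj: str) -> Assertion:
    """Lens action: create the NTuple {subject, protect against, obj}."""
    return Assertion(subject, "protect against", obj)


# Hypothetical registry: canonical WordGram -> lens.
LENSES: dict[str, Callable[[str, str], Assertion]] = {"protect against": protect_against_lens}


def select_and_apply(subject: str, wordgram: str, obj: str) -> Optional[Assertion]:
    """Canonicalize the WordGram, select a lens by it, and run the lens action."""
    canonical = TRANSFORMS.get(wordgram, wordgram)
    lens = LENSES.get(canonical)
    return lens(subject, obj) if lens else None


print(select_and_apply("C282Y", "protective role against", "CMPD"))
# Assertion(subject='C282Y', predicate='protect against', obj='CMPD')
```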

Page 20

Weaving an Information Fabric

• Background:

– One and only one WordGram for each Actor (named entity)

– One and only one WordGram for each canonical Predicate

– One and only one NTuple for each Assertion

• WordGrams which form an NTuple are strung together as beads on a string in the fabric.

– Thus, it is the NTuple structures (Assertions), as they are detected, that form the HyperMembrane’s fabric.

Note: it is next to impossible to diagram the fabric, but it will likely look like a very tangled, knotted structure. https://www.flickr.com/photos/fermicat/273539481/in/set-72157601620157588/
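The three "one and only one" rules suggest interning: WordGrams are deduplicated by text, NTuples by their sequence of WordGrams, and each WordGram records which NTuples are strung through it. Two assertions that share a bead are then connected through the fabric. The `Fabric` class below is a hypothetical in-memory sketch, with integer ids standing in for topic locators; it is not SolrSherlock's storage model, and the terms A, B, C are placeholders.

```python
class Fabric:
    """Toy associative fabric: WordGrams as beads, NTuples as strings of beads."""

    def __init__(self) -> None:
        self.wordgrams: dict[str, int] = {}            # text -> WordGram id (one per text)
        self.ntuples: dict[tuple[int, ...], int] = {}  # bead sequence -> NTuple id (one per assertion)
        self.threads: dict[int, set[int]] = {}         # WordGram id -> ids of NTuples through it

    def _wordgram(self, text: str) -> int:
        if text not in self.wordgrams:
            wid = len(self.wordgrams)
            self.wordgrams[text] = wid
            self.threads[wid] = set()
        return self.wordgrams[text]

    def weave(self, *terms: str) -> int:
        """String the WordGrams for terms into one NTuple; re-reading an assertion reuses it."""
        beads = tuple(self._wordgram(t) for t in terms)
        if beads not in self.ntuples:
            nid = len(self.ntuples)
            self.ntuples[beads] = nid
            for b in beads:
                self.threads[b].add(nid)
        return self.ntuples[beads]

    def through(self, text: str) -> set[int]:
        """Ids of the NTuples strung through the WordGram for text."""
        wid = self.wordgrams.get(text)
        return set() if wid is None else set(self.threads[wid])


fabric = Fabric()
first = fabric.weave("A", "cause", "B")
second = fabric.weave("B", "cause", "C")
again = fabric.weave("A", "cause", "B")  # same assertion read from another document
print(first == again, sorted(fabric.through("B")))  # True [0, 1]
```

The shared bead "B" is what ties the two assertions together, which is the kind of path an A to C link search would follow.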

Page 21

Fabric Example

• Two NTuples

– {Jack Park, AuthoredBook, The Wind Power Book}

– {Jack Park, AuthoredBook, Ohio State University Football Vault}

[Diagram: WordGrams “Jack Park”, “AuthoredBook”, “Wind Power Book”, and “OSU Football…”, with topic identifiers JP101, JP102, Book101, and Book102]

Topic Map organizes fiber bundles
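Reading JP101 and JP102 as two distinct topics that share the name string “Jack Park” (our interpretation of the figure, as is the pairing of JP101 with Book101 and JP102 with Book102), the bundles can be sketched as a two-level map: WordGram text, then topic locator, then the fibers belonging to that topic. Following one bundle keeps the two topics' fibers apart even though both pass through the same WordGram. The data layout and `books_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical reading of the figure: two topics share the name "Jack Park".
NAMES = {
    "JP101": "Jack Park",
    "JP102": "Jack Park",
    "Book101": "The Wind Power Book",
    "Book102": "Ohio State University Football Vault",
}
# Fibers as (subject topic, predicate, object topic); the pairing is assumed.
FIBERS = [("JP101", "AuthoredBook", "Book101"), ("JP102", "AuthoredBook", "Book102")]

# WordGram text -> topic locator -> fibers in that topic's bundle.
bundles: dict[str, dict[str, list[tuple[str, str, str]]]] = defaultdict(
    lambda: defaultdict(list)
)
for fiber in FIBERS:
    subj, _, obj = fiber
    for topic in (subj, obj):
        bundles[NAMES[topic]][topic].append(fiber)


def books_by(topic: str) -> list[str]:
    """Follow only this topic's bundle, not every fiber through the shared WordGram."""
    return [
        NAMES[obj]
        for subj, pred, obj in bundles[NAMES[topic]][topic]
        if subj == topic and pred == "AuthoredBook"
    ]


print(sorted(bundles["Jack Park"]))  # ['JP101', 'JP102']: one WordGram, two bundles
print(books_by("JP101"))             # ['The Wind Power Book']
```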

Page 22

Looking Forward

• Lenses, today, are hardwired

– Opportunity for adaptive learning of new lenses

• Fabric, today, is simple

– Opportunity to use cardinalities and frequency counts in the fabric for:

• Probabilistic modeling

• Topological studies

• Opportunity for a Domain-Specific Language (DSL) to emerge

Page 23

Completed Representation

[Diagram: nodes “Antioxidants”, “Bacterial Infection”, and “Compromised Host”; edge labels “Contraindicates”, “Because”, and “Appropriate For”; supporting statements “antioxidants kill free radicals” and “macrophages use free radicals to kill bacteria”]

Let us co-create Cognitive Agents for Discovery

Thanks to Mei Lin Fung, David Alexander Price, and Patrick Durusau for valuable comments

SolrSherlock at: http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock