A knowledge based approach for representing, reasoning and hypothesizing about biochemical networks Chitta Baral Arizona State University
Posted: 19-Dec-2015

A knowledge based approach for representing, reasoning and hypothesizing about biochemical networks

Chitta Baral

Arizona State University


Three parts to the talk

Prediction, Explanation and Planning with respect to biochemical networks

Hypothesis Generation with respect to biochemical networks

Collaborative BioCuration: CBioC


Motivation: purpose of interaction databases?

Suppose we have an almost exhaustive database of the intracellular interactions (protein-protein, metabolic, etc.) of particular cells.

What next?

How will we use this database?

What if our knowledge is incomplete?


Motivation: Uses of networks & pathways

Visualize the pathways

Analyze the graphs of the networks

Compare graphs of the networks

Use pathway data in conjunction with microarray data analysis

Do system-level simulation

Is that all?


Motivation: ultimate uses!

Prediction/System Simulation (Systems Biology?)
  Impact of particular perturbations (say, caused by a drug that introduces certain proteins to the cell membrane or into the cell)
  Do the perturbations have the desired impact?
  Do they mess up something else? (side effects!)

But that’s not all!


Motivation: Explaining observations

A phenotypical observation (leading to) OR an observation that a particular protein or chemical has abnormally high concentration

What is wrong? What is out of the ordinary?

The cause/explanation will give us approaches to fix the problem.

How deep should the explanations go?

How do we compare explanations?


Motivation: Designing drugs & therapies

What perturbations (when and where) need to be made so as to make the cell behave in a particular way?

In case of cancer: prevent proliferation, induce apoptosis, prevent migration, etc.


What if knowledge is incomplete?

What kind of useful reasoning can we do with incomplete knowledge?

Drug makers don’t wait till full knowledge is available.

Answer: hypothesis formation


Motivation: Use summary

The ultimate uses of signaling (metabolic, etc.) interaction databases are to do:
  Prediction: therapy verification; determining side effects.
  Explanation: diagnosing what is wrong.
  Planning: therapy and drug design.

Intermediate or immediate use
  Generate hypotheses


Initial goal of our research

Use knowledge representation and reasoning techniques to:
  Represent interactions
  Reason about these interactions: prediction, explanation, planning and hypothesis formation.


Some questions

Isn’t it a little premature?
  We know very little about the networks
  New knowledge is being constantly added

Why knowledge representation and reasoning?
  Why not simulation?
  Why not use Petri nets, π-calculus?

Why a knowledge-based approach?
  Why not a database approach?
  What’s the difference?


Our approach: present and future

Yes, prediction is kind of the same as simulation
  Incompleteness of information is an issue though!

But it is hard to do explanation generation, or design of therapies (planning), using simulation
  Guesses can be verified using simulation though

The core database query languages cannot express explanation or planning queries.

Dealing with incompleteness!


Dealing with incompleteness – ongoing and future work

Is one of the key criteria behind a ‘good’ knowledge representation language when building AI systems.
  Needs to be non-monotonic.
  Needs to be elaboration tolerant.

Proper analysis leads to hypothesizing
  If certain observations cannot be satisfactorily explained by the existing knowledge about the network, then use general biological knowledge to hypothesize


Motivation -- summary

Goal: To emulate the abstract reasoning done by biologists, medical researchers, and pharmacology researchers.

Types of reasoning: prediction, explanation, planning and hypothesis formation.

Current systems biology approaches: mostly prediction.

Ongoing issues: Dealing with incomplete knowledge and elaboration tolerance.


Related Works

Quantitative approaches (hybrid systems, use of differential equations)

Graphical representations

Other qualitative approaches
  Petri nets
  π-calculus
  Pathway Logic
  Model checking


Overview of our approach

Represent signal network as a knowledge base that describes
  actions/events (biological interactions, processes).
  effects of these actions/events.
  triggering conditions of the actions/events.

To query using the knowledge base:
  Prediction; explanation; planning; hypothesis generation

BioSigNet-RR (Biological Signal Network - Representation and Reasoning) and BioSigNet-RRH systems.


Foundation behind our approach

Research on representing and reasoning about dynamic systems (space shuttles, mobile robots, software agents)
  causal relations between properties of the world
  effects of actions (when can they be executed)
  goal specification
  action-plans

Research on knowledge representation, reasoning and declarative problem solving – the AnsProlog language.


An NFκB signaling pathway


An NFκB signaling pathway


Syntax by example

bind(TNF-α,TNFR1) causes trimerized(TNFR1)

trimerized(TNFR1) triggers bind(TNFR1,TRADD)
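
The two statements above can be read as data plus a one-step update rule. What follows is a minimal Python sketch, not the BioSigNet-RR implementation: the `Causes` and `Triggers` record types and the `step` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Causes:      # "action causes fluent"
    action: str
    fluent: str

@dataclass(frozen=True)
class Triggers:    # "fluent triggers action"
    fluent: str
    action: str

# The two statements from the slide.
KB = [
    Causes("bind(TNF-α,TNFR1)", "trimerized(TNFR1)"),
    Triggers("trimerized(TNFR1)", "bind(TNFR1,TRADD)"),
]

def step(state, actions):
    """Apply the effects of the occurring actions, then report triggered actions."""
    new_state = set(state) | {r.fluent for r in KB
                              if isinstance(r, Causes) and r.action in actions}
    triggered = {r.action for r in KB
                 if isinstance(r, Triggers) and r.fluent in new_state}
    return new_state, triggered

state, triggered = step(set(), {"bind(TNF-α,TNFR1)"})
print(state)      # {'trimerized(TNFR1)'}
print(triggered)  # {'bind(TNFR1,TRADD)'}
```

Keeping effects and triggers as separate records mirrors the distinction the language draws between what an event does and when it fires.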


General syntax to represent networks

e causes f if f1; …; fk

g1; …; gk causes g

h1; …; hm n_triggers e

k1; …; kl triggers e

r1; …; rl inhibits e

e is an event (also referred to as an action) and the rest are fluents (properties of the cell)

For metabolic interactions:

e converts g1; …; gk to f1; …; fk if h1; …; hm
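
To make the statement forms above concrete, here is a hedged sketch of a parser for them. The `Statement` record and `parse` function are hypothetical helpers, not part of any released tool, and the sketch assumes fluents in a list are separated by `;` as in the general syntax. It does not try to tell the dynamic form (`e causes f`) from the static form (`g1; …; gk causes g`), since both use the same keyword.

```python
import re
from typing import NamedTuple, Tuple

class Statement(NamedTuple):
    kind: str                        # causes | triggers | n_triggers | inhibits | converts
    left: Tuple[str, ...]            # event or fluents before the keyword
    right: Tuple[str, ...]           # fluents or event after the keyword
    conditions: Tuple[str, ...] = () # fluents after "if"
    products: Tuple[str, ...] = ()   # only for "converts ... to ..."

KEYWORDS = r"(.+?)\s+(causes|n_triggers|triggers|inhibits|converts)\s+(.+)"

def _items(text):
    """Split a ';'-separated list and drop surrounding whitespace."""
    return tuple(p.strip() for p in text.split(";") if p.strip())

def parse(line):
    main, _, cond = line.partition(" if ")
    m = re.match(KEYWORDS, main)
    if m is None:
        raise ValueError(f"not a statement: {line!r}")
    left, kind, rest = m.groups()
    products = ()
    if kind == "converts":
        rest, _, out = rest.partition(" to ")
        products = _items(out)
    return Statement(kind, _items(left), _items(rest), _items(cond), products)

print(parse("bind(p53,mdm2) causes bound(dom(p53,N))"))
print(parse("high(p53); high(mdm2) triggers bind(p53,mdm2)"))
print(parse("e converts g1; g2 to f1 if h1"))
```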


Semantics: queries and entailment

Observation part of queries
  f at t
  a occurs_at t

Given the network N and observation O
  Predict if a temporal expression holds.
  Explain a set of observations.
  Plan to achieve a goal.


Importance of a formal semantics

Besides defining prediction, explanation and planning, it is also useful in identifying:
  Under what restrictions the answer given by a given (graph-based) algorithm will be correct. (soundness!)
  Under what restrictions a given (graph-based) algorithm will find a correct answer if one exists. (completeness!)


Utility of declarative programming languages (such as AnsProlog)

Allows for quick implementation of the semantics

The specification or the definition of what is an explanation, or what is a plan, becomes a program that finds explanations and plans respectively.


Prediction

Given some initial conditions and observations, to predict how the world would evolve or predict the outcome of (hypothetical) interventions.


Back to the example

Binding of TNF-α with TNFR1 leads to TRADD binding with one or more of TRAF2, FADD, RIP.

TRADD binding with TRAF2 leads to over-expression of FLIP provided NIK is phosphorylated on the way.

TRADD binding with RIP inhibits phosphorylation of NIK.

TRADD binding with FADD in the absence of FLIP leads to cell death.


Prediction 1.

Binding of TNF-α with TNFR1 leads to TRADD binding with one or more of TRAF2, FADD, RIP.

TRADD binding with TRAF2 leads to over-expression of FLIP provided NIK is phosphorylated on the way.

TRADD binding with RIP inhibits phosphorylation of NIK.

TRADD binding with FADD in the absence of FLIP leads to cell death.

Initial condition: bind(TNF-α,TNF-R1) occurs at t0

Query: predict eventually apoptosis

Answer: Unknown!
  Incomplete knowledge about TRADD’s bindings.
  Depends on whether bind(TRADD,RIP) happened or not!


Prediction 2

Binding of TNF-α with TNFR1 leads to TRADD binding with one or more of TRAF2, FADD, RIP.

TRADD binding with TRAF2 leads to over-expression of FLIP provided NIK is phosphorylated on the way.

TRADD binding with RIP inhibits phosphorylation of NIK.

TRADD binding with FADD in the absence of FLIP leads to cell death.

Initial condition: bind(TNF-α,TNF-R1) occurs at t0

Observation: TRADD’s binding with TRAF2, FADD, RIP

Query: predict eventually apoptosis

Answer: Yes!
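
The difference between the "Unknown" and "Yes" answers can be sketched by checking the query in every world consistent with what is known. This is a simplified Python illustration, not the AnsProlog encoding used by the system. It assumes NIK is phosphorylated unless TRADD binding with RIP inhibits it (the slide only says "provided NIK is phosphorylated on the way"), and all function names are hypothetical.

```python
from itertools import combinations

PARTNERS = ("TRAF2", "FADD", "RIP")

def apoptosis(bound):
    """Outcome of the four rules for one set of TRADD binding partners."""
    nik_phosphorylated = "RIP" not in bound         # RIP binding inhibits NIK phosphorylation
    flip = "TRAF2" in bound and nik_phosphorylated  # FLIP over-expression needs phosphorylated NIK
    return "FADD" in bound and not flip             # FADD binding without FLIP leads to cell death

def possible_bindings(observed=None):
    """Partner sets consistent with the observation (None means nothing was observed)."""
    if observed is not None:
        return [frozenset(observed)]
    # TRADD binds "one or more of" the partners: every non-empty subset is possible.
    return [frozenset(c) for r in (1, 2, 3) for c in combinations(PARTNERS, r)]

def predict_eventually_apoptosis(observed=None):
    outcomes = {apoptosis(b) for b in possible_bindings(observed)}
    if outcomes == {True}:
        return "yes"
    if outcomes == {False}:
        return "no"
    return "unknown"  # the possible worlds disagree

print(predict_eventually_apoptosis())                           # unknown
print(predict_eventually_apoptosis({"TRAF2", "FADD", "RIP"}))   # yes
```

A query is answered "yes" only if it holds in all consistent worlds, which is why the extra observation in Prediction 2 changes the answer.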


Explanation

Given initial condition and observations, to explain why final outcome does not match expectation.


Explanation 1

Binding of TNF-α with TNFR1 leads to TRADD binding with one or more of TRAF2, FADD, RIP.

TRADD binding with TRAF2 leads to over-expression of FLIP provided NIK is phosphorylated on the way.

TRADD binding with RIP inhibits phosphorylation of NIK.

TRADD binding with FADD in the absence of FLIP leads to cell death.

Initial condition: bound(TNF-α,TNFR1) at t0

Observation: bound(TRADD,TRAF2) at t1

Query: explain apoptosis

One explanation:
  Binding of TRADD with RIP
  Binding of TRADD with FADD
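
Explanation can be sketched as a search for unobserved events that, together with the observation, make the outcome follow. The Python below is an illustrative simplification under the same assumption as before (NIK is phosphorylated unless RIP binding inhibits it); `explain_apoptosis` is a hypothetical name, and preferring the smallest explanation is a design choice of this sketch rather than something stated on the slide.

```python
from itertools import combinations

PARTNERS = ("TRAF2", "FADD", "RIP")

def apoptosis(bound):
    """Outcome of the four rules for one set of TRADD binding partners."""
    nik_phosphorylated = "RIP" not in bound
    flip = "TRAF2" in bound and nik_phosphorylated
    return "FADD" in bound and not flip

def explain_apoptosis(observed):
    """Smallest sets of unobserved TRADD bindings that, with the observed ones, yield apoptosis."""
    candidates = [p for p in PARTNERS if p not in observed]
    for size in range(len(candidates) + 1):
        found = [set(extra) for extra in combinations(candidates, size)
                 if apoptosis(set(observed) | set(extra))]
        if found:
            return found
    return []

# Observation: bound(TRADD,TRAF2). Both the RIP and the FADD binding are needed.
print(explain_apoptosis({"TRAF2"}))
```

With TRAF2 bound, FADD binding alone is not enough because FLIP would be over-expressed; RIP binding is what blocks that branch.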


Planning

Given initial conditions, to plan interventions to achieve a goal.

Application in drug and therapy design.


Planning requirements

In addition to the knowledge about the pathway, we need additional information about possible interventions, such as:
  What proteins can be introduced
  What mutations can be forced


Planning example

Defining possible interventions:
  intervention intro(DN-TRAF2)
  intro(DN-TRAF2) causes present(DN-TRAF2)
  present(DN-TRAF2) inhibits bind(TRAF2,TRADD)
  present(DN-TRAF2) inhibits interact(TRAF2,NIK)

Initial condition:
  bound(NFκB,IκB) at 0
  bind(TNF-α,TNF-R1) at 0

Goal: to keep NFκB inactive.

Query: plan always bound(NFκB,IκB) from 0
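
The planning query can be sketched as a search over sets of interventions, simulating each candidate and checking that the goal fluent holds at every time point. The Python below is only an illustration. The intervention statements come from the slide, but the pathway fragment linking TNF-α binding to the release of NFκB (the entries marked "assumed") is a hypothetical simplification, because the full pathway is shown only in the figure.

```python
from itertools import combinations

CAUSES = {  # action -> (fluents made true, fluents made false)
    "intro(DN-TRAF2)":     ({"present(DN-TRAF2)"}, set()),      # from the slide
    "bind(TNF-α,TNF-R1)":  ({"bound(TNF-α,TNF-R1)"}, set()),    # assumed
    "bind(TRAF2,TRADD)":   ({"bound(TRAF2,TRADD)"}, set()),     # assumed
    "interact(TRAF2,NIK)": (set(), {"bound(NFκB,IκB)"}),        # assumed
}
TRIGGERS = {  # fluent -> action it triggers (assumed pathway fragment)
    "bound(TNF-α,TNF-R1)": "bind(TRAF2,TRADD)",
    "bound(TRAF2,TRADD)":  "interact(TRAF2,NIK)",
}
INHIBITS = {  # fluent -> actions it blocks (from the slide)
    "present(DN-TRAF2)": {"bind(TRAF2,TRADD)", "interact(TRAF2,NIK)"},
}
INTERVENTIONS = ["intro(DN-TRAF2)"]

def always_holds(goal, state, actions_at_0, horizon=6):
    """Simulate for a fixed number of steps and check the goal at every time point."""
    state, actions = set(state), set(actions_at_0)
    for _ in range(horizon):
        if goal not in state:
            return False
        for a in actions:
            added, removed = CAUSES[a]
            state = (state | added) - removed
        blocked = set().union(*(INHIBITS.get(f, set()) for f in state))
        actions = {TRIGGERS[f] for f in state if f in TRIGGERS} - blocked
    return goal in state

def plan(goal, initial_state, initial_actions):
    """Smallest set of interventions at time 0 that keeps the goal true, or None."""
    for size in range(len(INTERVENTIONS) + 1):
        for chosen in combinations(INTERVENTIONS, size):
            if always_holds(goal, initial_state, set(initial_actions) | set(chosen)):
                return list(chosen)
    return None

print(plan("bound(NFκB,IκB)", {"bound(NFκB,IκB)"}, {"bind(TNF-α,TNF-R1)"}))
# ['intro(DN-TRAF2)']
```

Without the intervention the assumed chain releases NFκB after three steps, so the empty plan is rejected and introducing DN-TRAF2 is returned.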


Conclusion of part 1

From paper in ISMB 2004:
  Our goal in this paper was to make progress towards developing a system (and the necessary representation language and reasoning algorithms) that can be used to represent signal networks and pathways associated with cells and reason with them.

A start was made.
  Defined a simple language (syntax and semantics)
  Defined prediction, planning and explanation
  A prototype implementation using AnsProlog
  Illustration of its applicability with respect to an NFκB pathway.


Issues with incomplete knowledge

Often one may not be able to do much prediction, explanation or planning.

What then?

Can reasoning help in obtaining new knowledge?
  Yes, through hypothesis generation!
  In fact, hypothesis generation needs reasoning!


Part II: Hypothesis Generation


Hypothesis generation

Our observations cannot be explained by our existing knowledge OR the explanations given by our existing knowledge are invalidated by experiments?

Conclusion: Our knowledge needs to be augmented or revised?

How?
  Can we use a reasoning system to predict some hypothesis that one can verify through experimentation?
  Automate the reasoning in the mind of a biologist; especially helpful when the background knowledge is humongous.


Hypothesis space

[Figure: hypothesis space around the knowledge base. Labels: UV leads_to cancer; High UV; p53; Cancer; No cancer; (K,I) |= O]


Issues in this tiny example

Hypothesis formation: Theory: UV leads to cancer.

Observation: wild-type p53 resists the UV effect.

Hypothesis: p53 is a tumor-suppressor.

Elaboration tolerance: How do we update/revise “UV leads to cancer”?

Default & NM reasoning: Normally UV leads to cancer.

UV does not lead to cancer if p53 is present.
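
The default in this tiny example can be sketched in a few lines. The Python below is only an illustration of non-monotonic behaviour: the fluent names `high(UV)` and `present(p53)` and the function name are assumptions, and a language such as AnsProlog would express the exception with default negation rather than an explicit set lookup.

```python
def cancer_expected(facts):
    """Default: UV normally leads to cancer, unless p53 is known to be present."""
    return "high(UV)" in facts and "present(p53)" not in facts

print(cancer_expected({"high(UV)"}))                  # True
print(cancer_expected({"high(UV)", "present(p53)"}))  # False
```

Adding a fact withdrew a conclusion that held before, which is exactly what a monotonic logic cannot do. The general rule "UV leads to cancer" did not have to be rewritten to accommodate the p53 exception.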


Related Works: some prior mention of hypothesis formation

HYPGENE (Karp, 1991)
TRANSGENE (Darden, 1997)
GenePath (Zupan et al., 2003)
Robot Scientist (King et al., 2004)
Database (Doherty et al., 2004)
BIOCHAM (Calzone et al., 2005)
PathLogic (Karp et al., 2002)
Cytoscape (Shannon et al., 2003)
Integrative Scheme (Su et al., 2003)
Pathway Analysis (Ingenuity)

… do not use the latest advances in knowledge representation and reasoning (e.g., lack of ways to express defaults, non-monotonicity, elaboration tolerance, problem solving rules, etc.)


Hypothesis formation

Knowledge base: K

Set of initial conditions: I

Set of (experimental) observations: O

(K,I) does not entail O

To expand (K,I) to (K’,I’): (K’,I’) entails O

How to expand (hypothesis space)
  Explanation: expand only I
  Diagnosis: normality assumptions about I; minimally abandon the normality assumptions
  Hypothesis formation: expand K


Construction of hypothesis space

Present: manual construction, using research literature

Future: integration of multiple data sources
  Protein interactions
  Pathway databases
  Biological ontologies
  …

Provide cues, hunches such as
  A may interact with B: action interact(A,B)
  A-B interaction may have effect C: interact(A,B) causes C


Generation of hypotheses

Enumeration of hypotheses

Search: computing with Smodels (an implementation of AnsProlog)

Heuristics
  A trigger statement is selected only if it is the only cause of some action occurrence that is needed to explain the novel observations.
  An inhibition statement is selected only if it is the only blocker of some triggered action at some time.
  Maximizing preferences of selected statements


Generation … (cont’): heuristics

Knowledge base K
  a causes g
  b causes g

Initial condition I = { initially f }

Observation O = { eventually g }

(K,I) does not entail O

Hypothesis space: to expand K with rules among
  f triggers a
  f triggers b

Hypotheses: { f triggers a }, or { f triggers b }
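
This small example can be reproduced with a brute-force search over the hypothesis space. The Python below is a sketch, not the Smodels-based search used by BioSigNet-RRH: it replaces the trigger heuristic with plain subset-minimality (a candidate is skipped if a smaller accepted hypothesis is contained in it), and the tuple encoding of rules is an assumption made for illustration.

```python
from itertools import combinations

# Rules are tuples: ("causes", action, fluent) or ("triggers", fluent, action).
K = [("causes", "a", "g"), ("causes", "b", "g")]
I = {"f"}   # initially f
O = "g"     # eventually g
HYPOTHESIS_SPACE = [("triggers", "f", "a"), ("triggers", "f", "b")]

def entails_eventually(rules, initial, goal, horizon=5):
    """Forward-simulate: triggered actions occur and their effects become true."""
    state = set(initial)
    for _ in range(horizon):
        actions = {act for kind, fl, act in rules if kind == "triggers" and fl in state}
        state |= {fl for kind, act, fl in rules if kind == "causes" and act in actions}
    return goal in state

def minimal_hypotheses():
    """Subset-minimal expansions of K that make (K', I) entail O."""
    found = []
    for size in range(1, len(HYPOTHESIS_SPACE) + 1):
        for extra in combinations(HYPOTHESIS_SPACE, size):
            if any(set(h) <= set(extra) for h in found):
                continue  # a smaller hypothesis already suffices
            if entails_eventually(K + list(extra), I, O):
                found.append(extra)
    return found

print(entails_eventually(K, I, O))  # False: (K, I) does not entail O
for h in minimal_hypotheses():
    print(h)
```

The search returns the two single-rule hypotheses from the slide and discards the redundant combination of both rules.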


Case study: p53 network


Tumor suppression by p53

p53 has 3 main functional domains
  N-terminal transactivator domain
  Central DNA-binding domain
  C-terminal domain that recognizes DNA damage

Appropriate binding of N terminal activates pathways that lead to protection of cell from cancer.

Inappropriate binding (say to Mdm2) inhibits p53 induced tumor suppression.


p53 knowledge base

Stress
  high(UV) triggers upregulate(mRNA(p53))

Upregulation of p53
  upregulate(mRNA(p53)) causes high(mRNA(p53))
  high(mRNA(p53)) triggers translate(p53)
  translate(p53) causes high(p53)
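
Read operationally, the four statements above form a chain from UV stress to a high p53 level. A minimal Python sketch of that reading follows. The `trajectory` function and the one-step timing (a triggered action takes effect at the next time point) are assumptions of this illustration, not a statement of the language's formal semantics.

```python
# (fluent, action) pairs for "fluent triggers action"
TRIGGERS = [("high(UV)", "upregulate(mRNA(p53))"),
            ("high(mRNA(p53))", "translate(p53)")]
# (action, fluent) pairs for "action causes fluent"
CAUSES = [("upregulate(mRNA(p53))", "high(mRNA(p53))"),
          ("translate(p53)", "high(p53)")]

def trajectory(initial, steps=3):
    """Return the list of states, one per time point."""
    states = [set(initial)]
    for _ in range(steps):
        now = states[-1]
        occurring = {action for fluent, action in TRIGGERS if fluent in now}
        states.append(now | {fluent for action, fluent in CAUSES if action in occurring})
    return states

for t, s in enumerate(trajectory({"high(UV)"})):
    print(t, sorted(s))
```

Under this timing, `high(p53)` first appears at time 2, after transcription and then translation have each taken one step.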


p53 knowledge base (cont.)

Tumor suppression by p53
  high(p53) inhibits growth(tumor)


p53 knowledge base (cont’)

Interaction between Mdm2 and p53
  high(p53), high(mdm2) triggers bind(p53,mdm2)
  bind(p53,mdm2) causes bound(dom(p53,N))
  bind(p53,mdm2) causes high([p53 : mdm2])
  bind(p53,mdm2) causes ¬high(p53), ¬high(mdm2)


Hypothesis formation

Experimental observation:
  I = { initially high(UV), high(mdm2), high(ARF) }
  O = { eventually ~tumorous }

(K,I) does not entail O

Need to hypothesize the role of ARF.


Constructing hypothesis space

Levels of ARF and p53 correlate
  high(ARF) triggers upregulate(mRNA(p53))
  high(p53) triggers upregulate(mRNA(ARF))


Constructing … (cont’)

Interactions of ARF with the known proteins
  bind(p53,ARF) causes bound(dom(p53,N))


Constructing … (cont’)

Influence of X (=ARF) on other interactions
  high(ARF) triggers upreg(mRNA(p53))
  high(ARF) triggers translate(p53)
  high(ARF) triggers bind(p53,mdm2)


Twelve generated hypotheses, such as

high(UV) triggers upregulate(mRNA(ARF))

high(ARF), high(mdm2) triggers bind(ARF,mdm2)


Conclusion of part 2

Goal: Automation of hypothesis formation (with respect to interactions and pathways)

Approach: Viewed known qualitative aspects of cell activities as a knowledge base

Used a knowledge representation language that
  Can express defaults
  Allows reasoning with incomplete knowledge
  Can express reasoning as well as problem solving rules

Developed a system, BioSigNet-RRH: formalizing and reasoning about hypotheses

Illustration: Hypothesizing the role of ARF protein in the p53 network.


Future Work on Reasoning about Biochemical Networks (Part I and II)

Further development of the language

Validation with respect to larger networks
  Kohn’s map
  Networks in Reactome and other repositories

Going from prototype to deployable systems
  Scaling up challenges
  Recent advances in automatic planning

Integration with BioPAX


Part III: CBioC

http://cbioc.org


Do we have enough knowledge in the various databases?

Some have been curated into databases.

But there is much more in the literature.

So what do we do?


Current status of curation from text

About 15 million abstracts in PubMed

3 million published by US and EU researchers during 1994-2004 (800 articles per day)

300K articles published so far reporting protein-protein interactions in human, yeast and mouse.
  BIND (in 7 yrs): 23K; DIP: 3K; MINT: 2.4K.


Premise: High cost of human curation

Overwhelming cost of large curation efforts may be unsustainable for long periods

BIND: Nov 2005 bad news.
  Operated for 7 years
  Listed over 100 curators & programmers
  CND $29 million received in 2003, plus other funding

Curation efforts of AFCS have recently stopped.

Lack of funding for some genome annotation projects.


Premise: summary

Human curation of text is expensive.

Human curation of text is not scalable.

Human curation of text is not sustainable.


Why not resort to computers? – do automatic extraction

Lessons from DARPA-funded MUCs (Message Understanding Conferences) in the 90s, run for a decade at the cost of tens of millions of dollars.
  Getting to 60% recall and precision is quick.
  Then every 5% improvement is about a year’s work.
  Even when we get to 90% for an individual entity extraction, for recognizing 4 related entities: (0.9)^4 ≈ 0.66

Lessons from biomedical text extraction
  No proper evaluation.
  Recognized that recall and precision are not very good even in the “best” systems.
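
The compounding argument in the MUC lessons can be worked through numerically. The short Python sketch below assumes extraction errors on the four entities are independent and that progress beyond 60% is linear at 5 points per year, as the slide's rule of thumb suggests; both function names are hypothetical.

```python
def joint_accuracy(per_entity, n_entities):
    """Chance that all n entities are right, assuming independent errors."""
    return per_entity ** n_entities

def years_to_reach(target, start=0.60, gain_per_year=0.05):
    """Years of work after the quick start, at one fixed-size gain per year."""
    return round((target - start) / gain_per_year)

print(round(joint_accuracy(0.9, 4), 4))  # 0.6561
print(years_to_reach(0.90))              # 6
```

So even about six further years of steady improvement on single entities would leave roughly one in three four-entity facts wrong.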


What do we do?

How do we curate not only the existing articles, but also the future articles?

Too important to give up!

Need to think of a new way to do it.

Faster computers, better sequencing technology and better algorithms came to the rescue of the Human Genome Project.

Hmm. What resources are we overlooking?


Key Idea

If lots of articles are being written, then a lot of people are writing them and a lot of people are reading them.

If only we could make these people (the authors and the readers) contribute to the curation effort …

Especially the readers; the ones who need the curated data!


Mass collaboration has worked in
  Wikipedia
  Project Gutenberg
  Netflix rating
  Amazon rating
  Etc.


Mass collaborative curation: initial hurdles

An average reader
  (S)he is not normally interested in filling in a blank curation form.
  We cannot make an average reader go through curation training.

So it has to be very different from just making the existing curation tools available to the masses and expecting them to contribute.


Mass collaborative curation: key initial ideas

Make it very easy:
  The user need not remember where (which database, which web page) to put the curated knowledge.
  Curation opportunity should present itself seamlessly.

Curation should not be a burden to an average user
  Make the curated knowledge “thin”.

There should be immediate rewards

Do not start with a blank slate.


Realization of the key ideas: a biologist with a gene name

Goes to PubMed, types the gene name, clicks on one of the abstracts
  Curation panel presents itself automatically

Our approach calls for researchers to contribute to the curation of facts as they read and research over the web

But not with a blank slate
  No one wants to be the first one!
  Automatic extraction jump-starts the process, and then researchers improve upon the extracted data, “ironing out” inconsistencies by subsequent edits on a massive scale.

Thin schemas
  Average users are turned off by traditional wide schemas
  Wide schemas need to be broken down.
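
One way to picture a "thin" schema is a record that carries a single claim plus the votes it has received. The Python below is purely a hypothetical sketch: the field names, the `origin` values, the source identifier format and the scoring rule are all assumptions made for illustration and do not describe the actual CBioC data model.

```python
from dataclasses import dataclass

@dataclass
class ThinFact:
    """One narrow, single-claim record, e.g. 'A binds B' taken from one abstract."""
    agent: str
    relation: str
    target: str
    source: str            # placeholder identifier for the abstract (assumed format)
    origin: str = "auto"   # "auto" = seeded by automatic extraction, "user" = added by a reader
    yes: int = 0
    no: int = 0

    def vote(self, correct: bool) -> None:
        """Record one reader's judgement of the fact."""
        if correct:
            self.yes += 1
        else:
            self.no += 1

    def score(self) -> float:
        """Fraction of votes marking the fact correct; 0.5 when nobody has voted yet."""
        total = self.yes + self.no
        return self.yes / total if total else 0.5

fact = ThinFact("Mdm2", "binds", "p53", source="abstract-0001")
for v in (True, True, False):
    fact.vote(v)
print(round(fact.score(), 2))  # 0.67
```

A reader only has to confirm or reject one small claim at a time, which is the point of breaking wide schemas down.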


Summary

Information/curation window pops up automatically.

Automatic extraction is used as a bootstrap so that no user is working on a blank slate.

Users vote on correctness, make corrections, add facts.

Suppose 60% precision and recall of the automatic extraction system
  A person will have an easier time discarding 40% of wrongly extracted text than identifying 60% of correct entries and entering them!
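
The 60% scenario can be turned into counts. The sketch below assumes an abstract collection containing 100 true facts and uses exact fractions to avoid rounding noise; `curation_workload` and its dictionary keys are hypothetical names.

```python
from fractions import Fraction

def curation_workload(true_facts, precision, recall):
    """Counts a reader faces when extraction seeds the panel vs. a blank slate."""
    correct = recall * true_facts     # true facts the extractor found
    extracted = correct / precision   # everything the extractor proposed
    return {
        "prefilled_correct": int(correct),
        "to_discard": int(extracted - correct),  # wrong extractions to vote down
        "to_add": int(true_facts - correct),     # facts the extractor missed
        "blank_slate_entries": true_facts,       # all facts entered by hand
    }

print(curation_workload(100, Fraction(3, 5), Fraction(3, 5)))
# {'prefilled_correct': 60, 'to_discard': 40, 'to_add': 40, 'blank_slate_entries': 100}
```

With the bootstrap, 60 facts need only a confirming vote and 40 a rejecting one; only the 40 missed facts require the full effort that all 100 would need from a blank slate.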


Very useful byproducts

Avoids some problems with the existing human curation approach
  Curators’ bias
  Curators miss things
  Curators have disagreements
  Slow access to newest findings
  Researchers at large have little or no control over what gets curated and when

A large curated corpus of text gets created
  Very useful to evaluate and improve automated extraction systems.


Current status of CBioC; future plans

Basic system, as described, is ready

Being populated with
  Facts from existing databases (BIND etc.)
  Facts extracted using our extraction system

Querying mechanism

Answer display

Future work
  Voter confidence issues
  …


Conclusion

Collecting what is known

Reasoning with what is known

Hypothesizing what is unknown (based on observations)


Open Invitation

We are building and eager to help other groups build knowledge bases in particular domains to
  Predict impact of interventions
  Plan (therapy design) to make a pathway behave in a desired way
  Explain observations
  Hypothesize new knowledge

Further improvements to and adaptation of CBioC


Acknowledgements

BioSigNet
  Nam Tran, Ph.D. thesis on this, postdoc at Yale
  Karen Chancellor, Ph.D. student
  Michael Berens and his group (Ana Joy, Nhan Tran)
  Lokesh Joshi and his group (Vinay Nagraj)

CBioC: Graciela Gonzalez, Lian Yu, Luis Tari, Tony Gitter, Amanda Ziegler, Ryan Wendt, Prabhdeep Singh.

Other projects:
  BioQA
  Biogenenet


Thank you!