CSE-291: Ontologies in Data Integration Amarnath Gupta Department of Computer Science & Engineering...
CSE-291: Ontologies in Data Integration
Amarnath Gupta
Department of Computer Science & Engineering, University of California, San Diego
Spring 2004
Ontologies and Biological Pathways
So, What is an Ontology Again?
• From previous classes
– [Sowa] The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D… A formal ontology is specified by a collection of names for concept and relation types organized in a partial ordering by the type-subtype relation.
– [Guarino] Theory of formal distinctions
• among things
• among relations
– Basic tools
• Theory of parthood
• Theory of integrity
• Theory of identity
• Theory of dependence
Is this good enough to characterize all concepts and relations?
Description Logics as Ontology Frameworks
• You have learnt about Description Logics
– DLs allow you to do the following:
Property Frames in DLs
• Some Description Logics like SHOQ(D)¹, a progenitor of OWL, allow:
– Roles or properties to be more powerful
– If R and S are roles, one can specify a role box that contains
• role equivalence axioms: ∃component_of.⊤ ≐ ∃part_of.⊤
• role inverses (not present in SHOQ, but present in SHIQ)
• role inclusion axioms: R ⊑ S
• role transitivity axioms: Trans(R)
– Thus one can construct role hierarchies in addition to concept lattices
¹ Ian Horrocks and U. Sattler. “Ontology Reasoning for the Semantic Web”. In B. Nebel, editor, Proc. of the 17th Int. Joint Conf. on Artificial Intelligence (IJCAI'01), Morgan Kaufmann, pages 199-204, 2001.
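To make the role box concrete, here is a small Python sketch that forward-chains one role inclusion axiom and one transitivity axiom over a handful of role assertions. The axioms component_of ⊑ part_of and Trans(part_of), and the individuals nucleolus, nucleus, cell and tissue, are hypothetical examples. This is naive saturation over ground facts, not a DL tableau reasoner of the kind used for SHOQ(D).

```python
from itertools import product

# Hypothetical role box: component_of ⊑ part_of, Trans(part_of)
ROLE_INCLUSIONS = {("component_of", "part_of")}
TRANSITIVE_ROLES = {"part_of"}

# Hypothetical role assertions: (role, subject, object)
ASSERTIONS = {
    ("component_of", "nucleolus", "nucleus"),
    ("part_of", "nucleus", "cell"),
    ("part_of", "cell", "tissue"),
}


def saturate(facts, inclusions, transitive):
    """Apply role inclusion and transitivity axioms until no new fact appears."""
    facts = set(facts)
    while True:
        derived = set()
        for role, a, b in facts:
            # Role inclusion R ⊑ S: every R-pair is also an S-pair.
            for sub, sup in inclusions:
                if role == sub:
                    derived.add((sup, a, b))
        for (r1, a, b), (r2, c, d) in product(facts, facts):
            # Trans(R): R(a, b) and R(b, d) imply R(a, d).
            if r1 == r2 and r1 in transitive and b == c:
                derived.add((r1, a, d))
        if derived <= facts:
            return facts
        facts |= derived


closure = saturate(ASSERTIONS, ROLE_INCLUSIONS, TRANSITIVE_ROLES)
for fact in sorted(closure):
    print(fact)
```

Because component_of is not declared transitive, only part_of facts are chained: the sketch derives part_of(nucleolus, tissue) but never component_of(nucleolus, cell).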
Thing-Centric Ontologies
• Now let’s try these:
1. sky
2. blue_sky ≡ sky ⊓ ∃ has_color.blue
3. cloudy_sky ≡ sky ⊓ ∃ covered_by.cloud
4. rain
5. acid_rain ⊏ rain
6. acid_rain_from_cloudy_sky ≡ acid_rain ⊓ ∃ drops_from.cloudy_sky
• Is this reasonable?
• How about these?
1. year
2. quarter ⊑ ∃⁼4 part_of.year
3. mid_term ⊑ exam ⊓ ¬final_test ⊓ ∃ occurs_in.quarter
• Is it working? Why?
Not every concept and relation is thing-centric!!
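The sky and rain definitions can be checked mechanically. The Python sketch below uses structural subsumption for a tiny fragment (conjunctions of concept names and ∃role.C restrictions, with no negation or number restrictions), an assumption that keeps the example short; it is not a complete DL reasoner. The primitive axiom acid_rain ⊏ rain is encoded by giving acid_rain the told name rain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A conjunction of concept names and existential restrictions ∃role.filler."""
    names: frozenset
    exists: frozenset = frozenset()  # set of (role, Concept) pairs


def atom(name):
    return Concept(frozenset({name}))


def subsumed_by(c, d):
    """Structural test for c ⊑ d: every conjunct of d must be matched by c."""
    if not d.names <= c.names:
        return False
    return all(
        any(r1 == r2 and subsumed_by(f1, f2) for r1, f1 in c.exists)
        for r2, f2 in d.exists
    )


sky, blue, cloud, rain = atom("sky"), atom("blue"), atom("cloud"), atom("rain")
blue_sky = Concept(frozenset({"sky"}), frozenset({("has_color", blue)}))
cloudy_sky = Concept(frozenset({"sky"}), frozenset({("covered_by", cloud)}))
acid_rain = Concept(frozenset({"acid_rain", "rain"}))  # acid_rain ⊏ rain
acid_rain_from_cloudy_sky = Concept(
    acid_rain.names, frozenset({("drops_from", cloudy_sky)})
)

print(subsumed_by(blue_sky, sky))                    # True
print(subsumed_by(acid_rain_from_cloudy_sky, rain))  # True
```

The check confirms, for example, that acid_rain_from_cloudy_sky is subsumed by rain, yet nothing in these concepts says when the rain falls or whether the sky is cloudy at that moment.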
Ontologies for Processes, Events, Time
• Temporal Description Logic²
– Allen’s interval relations
² A. Artale and E. Franconi. “A temporal description logic for reasoning about actions and plans”. Journal of Artificial Intelligence Research, 9:463-506, 1998.
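Allen’s thirteen interval relations can be computed directly from interval endpoints. The Python sketch below assumes intervals are (start, end) pairs with start < end and uses the abbreviations that appear on the temporal DL slides (b, m, o, s, d, f, e, with a for after and an i suffix for the other inverses, e.g. mi); the function name allen is illustrative.

```python
def allen(x, y):
    """Return the Allen relation that holds from interval x to interval y."""
    (xs, xe), (ys, ye) = x, y
    if not (xs < xe and ys < ye):
        raise ValueError("intervals must satisfy start < end")
    if xe < ys:
        return "b"   # x before y
    if xe == ys:
        return "m"   # x meets y
    if ye < xs:
        return "a"   # x after y (inverse of before)
    if ye == xs:
        return "mi"  # x met-by y
    if xs == ys and xe == ye:
        return "e"   # equal
    if xs == ys:
        return "s" if xe < ye else "si"   # starts / started-by
    if xe == ye:
        return "f" if xs > ys else "fi"   # finishes / finished-by
    if ys < xs and xe < ye:
        return "d"   # x during y
    if xs < ys and ye < xe:
        return "di"  # x contains y
    return "o" if xs < ys else "oi"       # overlaps / overlapped-by


print(allen((0, 2), (2, 5)))  # m
print(allen((1, 3), (0, 5)))  # d
```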
Temporal Description Logic
• Ingredients
– non-temporal concepts E
– temporal concepts C
• things that change their state
– temporal qualifier C@X, where X is a temporal variable
– temporal constraints Tc
• (X (R) Y), where
– X is any temporal variable or the “NOW” interval #
– R can be one of Allen’s interval relations or an expression composed from them
– existential quantifiers
• ⋄ (X) Tc.C
– selections p:E, where p is
• an atomic feature f
• a parameterized feature *f
Applying Temporal DL
• Translocation of a protein
– translocation ≐ ⋄(x y)(x m #)(# m y) ((*Protein: InCytoplasm)@x ⊓ (*Protein: InNucleus)@y)
• *Protein is the formal parameter of this action
• States of the *Protein are treated as though they are different type assignments for the same variable
– The above is a definition of the term “translocation”
– Now we can have an assertion (meaning data) of the form
• translocation(tp1, MAPK-translocation), i.e., of the form translocation(Interval, Action), to designate a specific case, thus implying
• translocation(i, a) ⇒ ∃p. *Protein(a, p) ⋀ ∃j, l. (InCytoplasm(j, p) ⋀ InNucleus(l, p) ⋀ m(j, i) ⋀ m(i, l))
[Figure: timeline of intervals x, #, y: in-cytoplasm(protein) during x, translocation during #, in-nucleus(protein) during y]
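Read as data, the implication above says that a translocation interval i must be met by an interval in which the protein is in the cytoplasm and must itself meet an interval in which the protein is in the nucleus. A minimal Python check of that condition, assuming numeric interval endpoints and hypothetical state tables (the MAPK times below are made up for illustration, not measurements):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


def meets(x, y):
    """Allen's m relation: x ends exactly where y starts."""
    return x.end == y.start


def is_translocation(i, protein, in_cytoplasm, in_nucleus):
    """True if some InCytoplasm(j, p) with m(j, i) and some InNucleus(l, p) with m(i, l) exist."""
    before = any(meets(j, i) for j in in_cytoplasm.get(protein, []))
    after = any(meets(i, l) for l in in_nucleus.get(protein, []))
    return before and after


# Hypothetical state intervals per protein (arbitrary time units).
IN_CYTOPLASM = {"MAPK": [Interval(0, 5)]}
IN_NUCLEUS = {"MAPK": [Interval(7, 20)]}

print(is_translocation(Interval(5, 7), "MAPK", IN_CYTOPLASM, IN_NUCLEUS))  # True
print(is_translocation(Interval(6, 7), "MAPK", IN_CYTOPLASM, IN_NUCLEUS))  # False
```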
Applying Temporal DL
• Some identities
– ⋄ x (x a #). C@x ≡ ⋄ xy (y mi #)(x mi y). C@x
– ⋄ x (x d #). C@x ≡ ⋄ xy (y s #)(x f y). C@x
– ⋄ x (x o #). C@x ≡ ⋄ xy (y s #)(x fi y). C@x
• A little more complex case
[Figure: intervals x, y, z, w relative to NOW (#), labeled tyrosin, autophosphorylation, tyrosin_phosphorylated, PTK_ligand_binding, GRB2_binding and GRB2_secondary_response]
tyrosin_p ≐ ⋄ x (x o #). (tyrosin@x ⊓ autophosphorylation)
GRB2_s_r ≐ ⋄ (y z w)(y b w)(z b w) (tyrosin_p@y ⊓ PTK_l_b@z ⊓ GRB2_b@w)
We only really need the relations s, f and mi
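The identities on this slide, which rewrite a, d and o in terms of s, f and mi, can be sanity-checked by brute force. The Python sketch below enumerates every integer interval in a small window (an assumption: a finite discrete time line of 9 points stands in for the temporal DL’s time line) and tests, for every choice of NOW, that both sides of each identity pick out the same intervals x. Passing the check is evidence, not a proof.

```python
from itertools import combinations


def allen(x, y):
    """Allen relation from x to y; intervals are (start, end) with start < end."""
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "b"
    if xe == ys:
        return "m"
    if ye < xs:
        return "a"
    if ye == xs:
        return "mi"
    if xs == ys and xe == ye:
        return "e"
    if xs == ys:
        return "s" if xe < ye else "si"
    if xe == ye:
        return "f" if xs > ys else "fi"
    if ys < xs and xe < ye:
        return "d"
    if xs < ys and ye < xe:
        return "di"
    return "o" if xs < ys else "oi"


# All intervals (s, e) with 0 <= s < e <= 8 on a discrete time line.
INTERVALS = list(combinations(range(9), 2))


def identity_holds(lhs, rel_y_now, rel_x_y):
    """Check ⋄x (x lhs #).C@x ≡ ⋄xy (y rel_y_now #)(x rel_x_y y).C@x for every NOW #."""
    for now in INTERVALS:
        for x in INTERVALS:
            left = allen(x, now) == lhs
            right = any(
                allen(y, now) == rel_y_now and allen(x, y) == rel_x_y
                for y in INTERVALS
            )
            if left != right:
                return False
    return True


for lhs, ry, rx in [("a", "mi", "mi"), ("d", "s", "f"), ("o", "s", "fi")]:
    print(lhs, identity_holds(lhs, ry, rx))
```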
Applying Temporal DL
• More features of the temporal DL
– path p ○ q
• *Protein ○ bound should be interpreted as
• ∃ a, p, i, o1. Protein(a, p, i) ⋀ bound(i, p, o1)
– Agreement operator ↓
• (*Protein ○ bound ↓ *Receptor)@y means that at the interval y, the object to which the Protein is bound is the Receptor
– Substitution
• Suppose A ≐ ⋄ (x y z w)(…) is an axiom and B ≐ ⋄ (x u v)(…) is another axiom whose body is a part of A
• The temporal substitutive qualifier (B[x]@w) renames, within the defined action B, the variable x to w. It is a way of making two temporal variables corefer, while the temporal constraints peculiar to the renamed variable x are inherited by the substituting interval w. This will eliminate x from A.
• This can be used to define one temporal concept in terms of another
And now on to Biological Pathways
The goals are:
1. to comprehend what we need to represent before we think about how to represent them
2. to understand what computations we can do with them
What are Pathways?
• A pathway is a set of linked biological components interacting with each other over time to generate a biological effect
• A component in a pathway can often be broken down into a finer level of interacting components that finally get to single biochemical reactions
• When people talk about pathways they refer to
– signal transduction networks
– metabolic pathways
– gene regulatory pathways
– protein-protein interaction networks
Signal Transduction Networks
What is Signal Transduction?
Process by which a cell converts one kind of signal or stimulus into another
The Big Picture
• How do organisms communicate with their environment?
• How do cells exchange information?
• What information needs to be exchanged?
• What is the currency of information?
Events
• Stimuli
– Synthesis of signaling molecule by the signaling cell.
– Release of signaling molecule by the signaling cell.
– Transport of the signal to the target cell.
– Detection of the signal by a specific receptor protein.
• Responses
– Reception: the first messenger, an extracellular molecule (signal), binds to a receptor.
– Transduction
• Amplification: binding activates the receptor protein, which then activates a relay protein.
• Conversion: the relay protein stimulates another membrane protein, which acts as an effector (effects changes in the cell).
– Induction/Response: the effector protein is an enzyme that produces a second messenger (a cytoplasmic molecule that triggers metabolic and/or structural responses within the cell).
– Removal of the signal, often terminating the cellular response.
Types of Signals
• Extracellular
– Signal molecules are specific to their receptors
– Receptors, usually proteins, have the N-terminus facing outwards and the C-terminus inside the cell.
– When bound to a signal molecule, a receptor changes its conformation
Types of Signals
• Intracellular
– Mostly triggered by the extracellular signal
– Converts the extracellular signal into an intracellular signal
– E.g., G proteins, GTPases, cAMP, Ca++, kinases, phosphatases and many more
– Also called second messengers
Types of Signals
• Intercellular
– Extracellular signalling
– Endocrinology
– Types
• Endocrine – Travel through blood
• Paracrine – In the vicinity
• Autocrine – Same cell type
• Juxtacrine – Along cell membranes
Types of Signals
• Hormones
– Between cells or tissues within an individual
– Process
• Synthesis → storage and secretion → transport → recognition of hormone by its receptor → change in receptor shape → relay and amplification of signal → response
• The sending cell is a specialized cell, while the receiving cell can be of any type
• A single hormone can have many receptors for different pathways, or many hormones can share the same receptor to invoke the same pathway
• Two classes of hormone receptors
– Membrane associated
– Cytoplasmic
Cellular Response
– The response depends on the particular signaling pathway and may involve changes in:
• cell cycle progression
• gene expression
• protein trafficking
• cell migration
• cytoskeleton architecture
• adhesion
• metabolism
• cell survival
Example: RAS-RAF-MEK-MAPK pathways
The RAS-RAF-MEK-MAPK pathway is only one example of the so-called “MAPK (Mitogen-Activated Protein Kinase) pathways”. Two other mammalian MAPK pathways, involving JNK1 and p38, function in stress responses (they are also “MAPK pathways”).
[Figure: RAS-RAF-MEK-MAPK, built up step by step: receptor PTK with phosphotyrosines (P), GRB2 (SH2, SH3), SOS, RAS (GDP/GTP), RAF with 14-3-3, MEK, MAPK and substrates]
• Ligand binds receptor PTK
• Autophosphorylation on tyrosine
• GRB2 (an SH2- and SH3-containing protein) binds to the receptor phosphotyrosine motif Y-V/L-N-X via its SH2 domain
• The SH3 of GRB2 binds constitutively to the proline-rich sequence in the C-terminus of SOS (a guanine nucleotide exchange factor for RAS)
• Recruitment of SOS to the close proximity of RAS in the membrane
• RAS becomes activated by exchanging GDP for GTP
• The RAS-GTP effector domain interacts with the N-terminal regulatory region of RAF (a serine/threonine protein kinase), hence recruiting RAF to the membrane
• Activation of RAF (most likely by phosphorylation of RAF and binding to the scaffold protein 14-3-3)
• Activated RAF in turn activates MEK (also called MAPK kinase; a dual specificity kinase) by phosphorylation on two conserved serine residues in MEK
• Activated MEK activates MAPK (a serine/threonine protein kinase) by phosphorylation of conserved threonine and tyrosine residues
• Activated MAPK phosphorylates a number of substrates in the plasma membrane and the cytoplasm
• It also translocates into the nucleus (within minutes), where it phosphorylates nuclear transcription factors
• Result: transcription of genes important for cell proliferation
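Before choosing a formalism, the cascade above can be written down as plain data. The Python sketch below records each step as a hypothetical directed edge (“upstream leads to downstream”) and replays the order in which components are reached from the ligand. The node names are shorthand for the slide’s components. The encoding deliberately drops the state changes (RAS-GDP to RAS-GTP, phosphorylation sites) and the timing, which are the kind of information the temporal DL constructs discussed earlier can express.

```python
from collections import deque

# Hypothetical edge list: each component leads to the next step of the cascade.
CASCADE = {
    "ligand": ["receptor_PTK"],
    "receptor_PTK": ["GRB2"],      # SH2 binds the autophosphorylated receptor
    "GRB2": ["SOS"],               # SH3 binds SOS constitutively
    "SOS": ["RAS"],                # GDP/GTP exchange on RAS
    "RAS": ["RAF"],                # RAS-GTP recruits RAF to the membrane
    "RAF": ["MEK"],                # phosphorylation on two serines
    "MEK": ["MAPK"],               # phosphorylation on threonine and tyrosine
    "MAPK": ["cytoplasmic_substrates", "nuclear_transcription_factors"],
}


def activation_order(graph, source):
    """Breadth-first order in which components are reached from source."""
    order, seen, queue = [], {source}, deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(" -> ".join(activation_order(CASCADE, "ligand")))
```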
Metabolic Pathways
What is metabolism?
The sum of all the chemical and physical changes that take place within the body and enable its continued growth and functioning.
Metabolism involves the breakdown of complex organic constituents of the body with the liberation of energy, which is required for other processes, and the building up of complex
substances, which form the material of the tissues and organs.
Chemical reactions
• Reactants and products
– together called metabolites
• Free energy change (ΔG) of a reaction A + B ⇌ C + D
ΔG = ΔG° + RT ln([C][D] / [A][B])
– depends on concentrations and nature of metabolites
– ΔG < 0 for a spontaneous (exergonic) reaction
– ΔG > 0 for an endergonic reaction
• Chemical equilibrium
– Same rate of forward and backward reactions
– ΔG = 0; let Keq = [C][D]/[A][B], the ratio of products to reactants at equilibrium
– ΔG° = −RT ln Keq
– Keq = e^(−ΔG°/RT)
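The two relations on this slide can be evaluated numerically. The Python sketch below assumes ΔG° in kJ/mol, R = 8.314 × 10⁻³ kJ/(mol·K) and concentrations in mol/L; the numbers in the usage lines are arbitrary. It also checks the consistency point made on the slide: when the concentration ratio equals Keq, ΔG comes out as zero.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol·K)


def delta_g(delta_g0, temp_k, a, b, c, d):
    """ΔG = ΔG° + RT ln([C][D] / [A][B]) for A + B ⇌ C + D."""
    return delta_g0 + R * temp_k * math.log((c * d) / (a * b))


def k_eq(delta_g0, temp_k):
    """Keq = exp(-ΔG° / RT), from ΔG° = -RT ln Keq."""
    return math.exp(-delta_g0 / (R * temp_k))


T = 298.0
K = k_eq(-10.0, T)                              # exergonic: Keq > 1
print(K)
print(delta_g(-10.0, T, 1.0, 1.0, 1.0, K))      # ≈ 0 at equilibrium
print(delta_g(-10.0, T, 1.0, 1.0, 1.0, 0.01))   # < 0: proceeds forward
```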
Rate Law
• Consider a reaction of overall stoichiometry A → P.
The rate, or velocity, v, of this reaction is the amount of P formed or the amount of A consumed per unit time. Thus:
v = d[P]/dt or v = −d[A]/dt
The rate law states that:
v = −d[A]/dt = k[A]
where k is the rate constant. Because v depends on [A] to the first power, the reaction is first order, and k is called the first-order rate constant.
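Integrating the first-order rate law numerically shows what v = k[A] implies over time. The Python sketch below uses simple forward-Euler steps (an illustrative choice, not a recommended integrator) and compares the result with the standard integrated form [A] = [A]₀e^(−kt); the rate constant and time span are arbitrary.

```python
import math


def simulate_first_order(a0, k, t_end, dt):
    """Forward-Euler integration of A -> P with v = -d[A]/dt = d[P]/dt = k[A]."""
    a, p = a0, 0.0
    for _ in range(round(t_end / dt)):
        v = k * a          # rate law: first order in [A]
        a -= v * dt        # A consumed
        p += v * dt        # P formed at the same rate
    return a, p


def integrated_first_order(a0, k, t):
    """Closed-form solution of d[A]/dt = -k[A]."""
    return a0 * math.exp(-k * t)


a, p = simulate_first_order(a0=1.0, k=0.5, t_end=2.0, dt=1e-4)
print(a, integrated_first_order(1.0, 0.5, 2.0))
print(a + p)  # [A] + [P] stays equal to [A]0
```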
Equilibrium constant and reaction rates
For a reversible reaction A + B ⇌ C + D
the rate will be the difference between the forward and reverse rates
d[C]/dt = kf [A][B] − kr [C][D]
At equilibrium,
kf [A][B] = kr [C][D]
Keq = kf / kr = [C][D] / [A][B]
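The equilibrium condition can be reached by simulation as well as by algebra. The Python sketch below integrates d[C]/dt = kf[A][B] − kr[C][D] with forward-Euler steps (an illustrative choice), starting from arbitrary initial concentrations, and checks that the final ratio [C][D]/[A][B] approaches kf/kr.

```python
def simulate_reversible(a, b, c, d, kf, kr, t_end, dt):
    """Euler integration of A + B ⇌ C + D with net rate kf[A][B] - kr[C][D]."""
    for _ in range(round(t_end / dt)):
        net = kf * a * b - kr * c * d   # forward minus reverse rate
        a -= net * dt
        b -= net * dt
        c += net * dt
        d += net * dt
    return a, b, c, d


kf, kr = 2.0, 1.0
a, b, c, d = simulate_reversible(1.0, 1.0, 0.0, 0.0, kf, kr, t_end=20.0, dt=1e-3)
print((c * d) / (a * b), kf / kr)  # both ≈ 2 at equilibrium
```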
Enzymes
• are usually proteins. A small number of enzymes are made of RNA (ribozymes).
• are usually quite big (compared to the portions of the reactants or substrates which are modified in the reaction to be catalyzed).
[Figure: an enzyme (hexokinase) and a ribozyme (self-splicing intron)]
Enzymes have a substrate binding site which binds the reaction substrates and brings them together in the orientations appropriate for the reaction.
This binding is usually highly specific. Often, one enzyme catalyses only one type of reaction between a specific set of substrates.
Enzymes have an active site—a specialized configuration of side-chain and main-chain atoms located at the substrate binding site which assist in the chemical steps of the reaction.
[Figure: triosephosphate isomerase, with its active site marked]
Active sites
• 3-dimensional cleft
– can be formed by residues that are far apart in the sequence
– Lysozyme’s active site includes residues at positions 35, 52, 62, 63, 101, 108 (out of a total of 129 residues)
• Small fraction of the total volume of an enzyme
• Substrates are bound to enzymes through multiple weak attractions
Regulation of enzymes
• Reversible and irreversible inhibition
• Competitive and allosteric regulation
– Allosteric regulation can be activation or inhibition
– Tense (T) and relaxed (R) states
– Activator binds to R state
– Inhibitor binds to T state
• Different kinetics for each
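The slide does not give the rate laws, so the Python sketch below uses two textbook forms as stand-ins: the Michaelis-Menten equation with a competitive inhibitor (which raises the apparent Km by a factor 1 + [I]/Ki) and the Hill equation, an empirical sigmoidal curve often used for cooperative, allosteric enzymes. It does not model the T/R states explicitly, and the parameter values are arbitrary.

```python
def michaelis_menten(s, vmax, km, inhibitor=0.0, ki=float("inf")):
    """Hyperbolic rate; a competitive inhibitor scales Km by (1 + [I]/Ki)."""
    apparent_km = km * (1 + inhibitor / ki)
    return vmax * s / (apparent_km + s)


def hill(s, vmax, k_half, n):
    """Sigmoidal rate; n > 1 models positive cooperativity between subunits."""
    return vmax * s**n / (k_half**n + s**n)


for s in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(
        s,
        round(michaelis_menten(s, vmax=1.0, km=1.0), 3),
        round(michaelis_menten(s, vmax=1.0, km=1.0, inhibitor=1.0, ki=1.0), 3),
        round(hill(s, vmax=1.0, k_half=1.0, n=4), 3),
    )
```

With the competitive inhibitor the curve still approaches Vmax at high [S], while the Hill curve sits below the hyperbola at low [S] and rises more steeply around k_half.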
Regulatory control of enzymes
• Alteration of enzyme activity
– Enzyme modification
• Covalent modification
• Protein-protein interaction
– Substrate control
– Product control
– Allosteric control
Regulatory control of enzymes
• Alteration of number of enzyme molecules
– Transcription
– Translation
– Control of enzyme degradation
• Compartmentalization
– Example: hexokinase in brain and liver
Enzyme Nomenclature
• Oxidoreductases (EC Class 1)
– Transfer electrons (RedOx reactions)
• Transferases (EC Class 2)
– Transfer functional groups between molecules
• Hydrolases (EC Class 3)
– Break bonds by adding H2O
• Lyases (EC Class 4)
– Elimination reactions to form double bonds
• Isomerases (EC Class 5)
– Intramolecular rearrangements
• Ligases (EC Class 6)
– Join molecules with new bonds
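Because EC numbers are hierarchical, the class of a reaction can be read off the first field. The Python sketch below assumes four dot-separated fields (it also accepts "-" placeholders in later fields, as in partial numbers like 2.3.1.-); the function names are illustrative. The EC numbers in the usage lines appear elsewhere in these slides.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Split an EC number such as '5.3.1.9' into its four hierarchical fields."""
    fields = ec.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return tuple(fields)


def ec_class(ec):
    """Name of the top-level enzyme class (first field)."""
    return EC_CLASSES[int(parse_ec(ec)[0])]


def share_prefix(ec1, ec2, depth):
    """True if two EC numbers agree on their first `depth` fields."""
    return parse_ec(ec1)[:depth] == parse_ec(ec2)[:depth]


print(ec_class("5.3.1.9"))                          # Isomerases
print(ec_class("2.3.1.43"))                         # Transferases
print(share_prefix("3.1.1.4", "3.1.1.5", depth=3))  # True
```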
ID   2.3.1.43
DE   Phosphatidylcholine--sterol O-acyltransferase.
AN   Lecithin--cholesterol acyltransferase.
AN   LCAT.
AN   Phospholipid--cholesterol acyltransferase.
CA   Phosphatidylcholine + sterol = sterol ester +
CA   1-acylglycerophosphocholine.
CC   -!- Palmitoyl, oleoyl, and linoleoyl can be transferred; a number of
CC       sterols, including cholesterol, can act as acceptor.
CC   -!- The bacterial enzyme also catalyses the reactions of EC 3.1.1.4 and
CC       EC 3.1.1.5.
DI   Norum disease; MIM:245900.
DI   Fish-eye disease; MIM:136120.
PR   PROSITE; PDOC00110;
DR   BRENDA; 2.3.1.43.
DR   EMP/PUMA; 2.3.1.43.
DR   WIT; 2.3.1.43.
DR   KYOTO UNIVERSITY LIGAND CHEMICAL DATABASE; 2.3.1.43.
DR   P10480, GCAT_AERHY; P53760, LCAT_CHICK; P04180, LCAT_HUMAN;
DR   P16301, LCAT_MOUSE; Q08758, LCAT_PAPAN; P30930, LCAT_PIG ;
DR   P53761, LCAT_RABIT; P18424, LCAT_RAT ;
//
Example entry from the Enzyme Database at http://www.expasy.ch/enzyme/
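Entries in this flat-file format are line-oriented: a two-letter line code, three spaces, then the value, with // closing the entry. Below is a minimal Python reader under that layout assumption, run on an excerpt of the entry above. It groups values by line code and does not interpret continuation lines or cross-reference syntax.

```python
from collections import defaultdict

ENTRY = """\
ID   2.3.1.43
DE   Phosphatidylcholine--sterol O-acyltransferase.
AN   Lecithin--cholesterol acyltransferase.
AN   LCAT.
DI   Norum disease; MIM:245900.
DI   Fish-eye disease; MIM:136120.
DR   BRENDA; 2.3.1.43.
//
"""


def parse_enzyme_entry(text):
    """Group the values of one entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # end of entry
            break
        code, value = line[:2], line[5:].strip()
        if code.strip():
            fields[code].append(value)
    return dict(fields)


entry = parse_enzyme_entry(ENTRY)
print(entry["ID"])  # ['2.3.1.43']
print(entry["AN"])  # alternative names
mim_ids = [v.split("MIM:")[1].rstrip(".") for v in entry.get("DI", [])]
print(mim_ids)      # ['245900', '136120']
```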
Enzyme Catalytic Mechanisms
• Fundamentally familiar reactions from Organic Chemistry
– Acid Base Catalysis - Donation or abstraction of protons
– Covalent Catalysis - Covalent (co)enzyme-substrate intermediate
– Metal Ion - Substrates and metals positioned for reaction
– Electrostatic - Charge complementarity to transition state
– Proximity and Orientation - Substrates aligned for reaction
– Transition state stabilization - ΔG‡ reduced
Metabolic networks
• Each enzyme/reaction can be an edge between nodes
– Each node is an enzyme substrate (product or reactant)
• Converting individual reactions to edges and nodes
– Produces directed graphs
• Classification of biochemical reactions
– EC numbering system (Enzyme Commission)
– Hierarchical numerical system, e.g., 1.5.3.1
– Based on organic chemistry involved, not proteins
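Turning reactions into a directed graph takes only a few lines. The Python sketch below uses the first three steps of glycolysis as sample data, with each substrate-to-product conversion labelled by its EC number, and finds a shortest route between two metabolites by breadth-first search. Simplifying assumptions: co-substrates such as ATP and ADP are left out (in real networks such shared metabolites create misleading shortcuts), and the reversible isomerase step is stored in the forward direction only.

```python
from collections import deque

# (substrate, product, EC number); ATP/ADP and other co-substrates are omitted.
REACTIONS = [
    ("glucose", "glucose-6-phosphate", "2.7.1.1"),                      # hexokinase
    ("glucose-6-phosphate", "fructose-6-phosphate", "5.3.1.9"),         # glucose-6-phosphate isomerase
    ("fructose-6-phosphate", "fructose-1,6-bisphosphate", "2.7.1.11"),  # phosphofructokinase
]


def build_graph(reactions):
    """Adjacency list: metabolite -> [(next metabolite, EC number), ...]."""
    graph = {}
    for substrate, product, ec in reactions:
        graph.setdefault(substrate, []).append((product, ec))
    return graph


def shortest_route(graph, start, goal):
    """Breadth-first search; returns the EC numbers along a shortest route, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, ecs = queue.popleft()
        if node == goal:
            return ecs
        for nxt, ec in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, ecs + [ec]))
    return None


graph = build_graph(REACTIONS)
print(shortest_route(graph, "glucose", "fructose-1,6-bisphosphate"))
# ['2.7.1.1', '5.3.1.9', '2.7.1.11']
```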
Pain: the Boehringer-Mannheim wallcharts
Gene Regulatory Networks
What is gene regulation?
The primary role of a gene is to be transcribed into mRNA, a copy of a single strand of the gene. Different proteins can control the transcription process by activating, inhibiting, or competitively binding to the promoter region of genes.
Protein Synthesis
• Transcription
– Before the synthesis of a protein begins, the corresponding RNA molecule is produced by RNA transcription. One strand of the DNA double helix is used as a template by the RNA polymerase to synthesize a messenger RNA (mRNA).
– This mRNA migrates from the nucleus to the cytoplasm. In the process, the mRNA goes through different types of maturation, including splicing, in which the non-coding sequences are eliminated. The coding mRNA sequence can be read as a series of three-nucleotide units called codons.
Protein Synthesis
• Translation
– The ribosome binds to the mRNA at the start codon, which is recognized only by the initiator tRNA.
– The ribosome proceeds to the elongation phase of protein synthesis. During this stage, complexes composed of an amino acid linked to a tRNA sequentially bind to the appropriate codon in the mRNA by forming complementary base pairs with the tRNA anticodon.
– The ribosome moves from codon to codon along the mRNA. Amino acids are added one by one, building the polypeptide sequence dictated by the DNA and represented by the mRNA.
– At the end, a release factor binds to the stop codon, terminating translation and releasing the complete polypeptide from the ribosome.
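The translation steps above (initiation at the start codon, codon-by-codon elongation, termination at a stop codon) can be mimicked on a string. The Python sketch below assumes the standard genetic code, builds the codon table from the usual UCAG ordering, and starts at the first AUG; real initiation also depends on sequence context that the sketch ignores.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # a release factor recognizes the stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGG"))  # MFG
```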
Control of Gene Expression
• Gene expression is a term indicating the act of protein synthesis by a gene
– not all genes produce proteins in all cells or in all phases of a cell’s life cycle
• Many control points
– transcription, mRNA processing, mRNA transport, translation, post-translational modifications
• Each gene has its own control regions
– all genes differ slightly in the exact locations of control and the exact set of transcription factors (proteins that control transcription)
• Different combinations of transcription factors, and the relative timing of their binding, create a large space of control signals
– some control signals may control the transcription of more than one gene
Transcription-Initiation Complex
Events Leading to Transcription Initiation
Enhancers can be equally complex
Ontologies and Databases for Biological Pathways
BioPAX
[Figure: BioPAX in relation to other pathway data standards. Labels include database exchange formats (BioPAX, PSI), simulation model exchange formats (SBML, CellML), small molecules (CML), rate formulas, enzymes, molecular interactions (Pro:Pro, All:All), biochemical reactions, metabolic and regulatory pathways (qualitative, quantitative), genetic interactions, and interaction networks (molecular, non-molecular: Pro:Pro, TF:Gene, genetic)]
Design Goals
• Encapsulation: An entire pathway in one record
• Compatible: Use existing standards wherever possible
• Computable: From file reading to logical inference
• OWL (Web Ontology Language)
– Fast
– Complete: all conclusions are guaranteed to be computed
– Decidable: all computations will finish in finite time (with OWL Lite, in a short amount of time)
Requirements Specification
• Accommodate existing database representations: BioCyc, BIND, WIT, aMAZE, KEGG, etc.
– Compatible as a superset of representations
• Support different pathway types:
– Metabolic pathways
– Signaling pathways
– Protein-protein interactions
– Gene regulatory pathways
• OWL used for encoding the ontology
Implementation of BioPAX
• Implemented using the OWL language
• OWL is
– Web Ontology Language
– XML based
– W3C standard www.W3C.org
• Example of a BioPAX Class and Instance in OWL
Example – Class def in OWL
<owl:Class rdf:ID="protein">
  <rdfs:subClassOf>
    <owl:Class rdf:about="#physicalEntity"/>
  </rdfs:subClassOf>
  <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A protein (e.g. The EGFR protein sequence. See Swiss-Prot for more examples.)
  </rdfs:comment>
</owl:Class>
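The class definition is RDF/XML, so even a generic XML parser can recover the subclass link. The Python sketch below wraps the definition (comment omitted) in an rdf:RDF root with the standard RDF, RDFS and OWL namespace URIs, which the slide’s fragment leaves implicit, and reads it with the standard library’s ElementTree. It only understands this one pattern (a named owl:Class with an rdfs:subClassOf pointing at another named class); full BioPAX files are better handled by a dedicated RDF/OWL toolkit.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="protein">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#physicalEntity"/>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def subclass_edges(xml_text):
    """Return (class, superclass) pairs for top-level named owl:Class elements."""
    root = ET.fromstring(xml_text)
    rdf_id = f"{{{NS['rdf']}}}ID"
    rdf_about = f"{{{NS['rdf']}}}about"
    edges = []
    for cls in root.findall("owl:Class", NS):
        for parent in cls.findall("rdfs:subClassOf/owl:Class", NS):
            edges.append((cls.get(rdf_id), parent.get(rdf_about).lstrip("#")))
    return edges


print(subclass_edges(DOC))  # [('protein', 'physicalEntity')]
```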
Example – Instance in OWL
<bpx:protein rdf:ID="biopax-L1v0.5_Instance_42">
  <bpx:NAMES>
    <bpx:namesType rdf:ID="biopax-L1v0.5_Instance_43">
      <bpx:SHORTLABEL>phosphoglucose isomerase</bpx:SHORTLABEL>
    </bpx:namesType>
  </bpx:NAMES>
</bpx:protein>
Current structure of class hierarchy
BioPAX Ontology, Level 1 v0.9 (Dec. 2003)
Metabolic Data in BioPAX
Biochemical Reaction
ID: 1
Full Name: Glucose-6-p to fructose-6-p
Left: <cml>glucose-6-phosphate</cml>
Right: <cml>fructose-6-phosphate</cml>
Delta G: 0.4 kcal/mole
EC: 5.3.1.9
EcoCyc: Reaction BioPAX: Biochemical Reaction
Metabolic Data in BioPAX
Catalysis
ID: 2
Name: Catalysis of glucose-6-p to fructose-6-p
Enzyme: glucose-6-phosphate isomerase
Reaction: BioPAX ID=1
Inhibitors: Low pH
EcoCyc: Enzyme-Catalyzed Reaction BioPAX: Catalysis
Metabolic Data in BioPAX
Pathway
ID: 10
Name: Glycolysis
Interactions:
1. BioPAX ID=2
2. BioPAX ID=4
3. BioPAX ID=6
etc.
EcoCyc: Pathway BioPAX Class: Pathway
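The three metabolic records above reference each other by ID: the pathway lists interactions, and the catalysis points to its reaction. A minimal Python sketch of that linking, using hypothetical class and field names (they are not BioPAX property names) and only the values shown on these slides:

```python
from dataclasses import dataclass, field


@dataclass
class BiochemicalReaction:
    id: int
    name: str
    left: list
    right: list
    delta_g_kcal_per_mol: float
    ec: str


@dataclass
class Catalysis:
    id: int
    name: str
    enzyme: str
    reaction: int                      # BioPAX ID of the catalyzed reaction
    inhibitors: list = field(default_factory=list)


@dataclass
class Pathway:
    id: int
    name: str
    interactions: list                 # BioPAX IDs of member interactions


def pathway_ec_numbers(pathway, objects):
    """Follow Pathway -> Catalysis -> BiochemicalReaction links and collect EC numbers."""
    ecs = []
    for interaction_id in pathway.interactions:
        interaction = objects[interaction_id]
        if isinstance(interaction, Catalysis):
            interaction = objects[interaction.reaction]
        ecs.append(interaction.ec)
    return ecs


reaction_1 = BiochemicalReaction(
    1, "Glucose-6-p to fructose-6-p",
    ["glucose-6-phosphate"], ["fructose-6-phosphate"], 0.4, "5.3.1.9",
)
catalysis_2 = Catalysis(
    2, "Catalysis of glucose-6-p to fructose-6-p",
    "glucose-6-phosphate isomerase", reaction=1, inhibitors=["Low pH"],
)
# Only interaction 2 is spelled out on the slides; IDs 4, 6, ... are not.
glycolysis = Pathway(10, "Glycolysis", interactions=[2])

objects = {1: reaction_1, 2: catalysis_2}
print(pathway_ec_numbers(glycolysis, objects))  # ['5.3.1.9']
```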
Signal Transduction Data in BioPAX
Reaction
ID: 20
Name: Activation of NF-kB
Substrate: NF-kB (inactive)
Product: NF-kB (active)
Enzyme Catalysis
ID: 21
Name: MAP-kinase activates NF-kB
Enzyme: MAP-kinase
Reaction: BioPAX ID=20
CSNDB Signaling Pathway Step
Signal Transduction Data in BioPAX
Pathway
ID: 10
Name: MAPK
Interactions:
1. BioPAX ID=21
2. BioPAX ID=23
3. BioPAX ID=25
etc.
CSNDB Pathway
CSE-291: Ontologies in Data Integration
Descriptions of some databasesDescriptions of some databases
Name:Name: KEGG (Kyoto Encyclopedia of Genes and Genomes)KEGG (Kyoto Encyclopedia of Genes and Genomes)Web:Web: http://www.genome.ad.jp/kegg/http://www.genome.ad.jp/kegg/Owner:Owner: Institute for Chemical Research, Kyoto UniversityInstitute for Chemical Research, Kyoto UniversityDescription:Description: KEGG is an effort to computerize current knowledge of molecular and cellular
biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects. The KEGG project is undertaken in the Bioinformatics Center, Institute for Chemical Research, Kyoto Univ.
Name: PathDB
Web: http://www.ncgr.org/pathdb/index.html
Owner: National Center for Genome Resources
Description: PathDB™ is a functional prototype research tool for biochemistry and functional genomics. A key design principle of the project is to capture discrete metabolic steps, which makes it possible to build tools that construct metabolic networks de novo from a set of defined steps. PathDB is not simply a data repository but a system around which tools can be created for building, visualizing, and comparing metabolic networks.
List of Pathway Databases/Tools (cont.)
Name: GenMAPP (Gene MicroArray Pathway Profiler)
Gladstone Institute, UCSF.
GenMAPP is a computer application designed to visualize gene expression data on maps representing biological pathways and groupings of genes. The first release, GenMAPP 1.0 beta, is available with over 50 mouse and human pathways. The project also provides hundreds of functional groupings of genes derived from the Gene Ontology Project for the human, mouse, Drosophila, C. elegans, and yeast genomes. GenMAPP seeks collaborators in the biological community to help develop a library of pathways that will encompass all known genes in the major model organisms.
Name: SPAD: Signaling PAthway Database
Graduate School of Genetic Resources Technology, Kyushu University.
Living organisms have multiple signal transduction pathways, each a cascade of information from the plasma membrane to the nucleus in response to an extracellular stimulus. An extracellular signal molecule binds a specific receptor and thereby initiates the signaling pathway. A large amount of information is now available about the signaling pathways that control gene expression and cell proliferation. SPAD is an integrated database developed to give an overview of signal transduction. It is divided into four categories according to the extracellular signal molecules that initiate the intracellular signaling pathway, including growth factors, cytokines, and hormones. SPAD describes protein-protein and protein-DNA interactions, as well as DNA and protein sequences.
Specific Pathway Databases
• Cytokine Signaling Pathway DB. Dept. of Biochemistry, Kumamoto Univ.
– The database contains information on signaling pathways of cytokines. It is designed for researchers who work with cytokines and their receptors, and provides biochemical data and references about signaling molecules as well as ligand-receptor relationships.
• EcoCyc and MetaCyc. Stanford Research Institute
– The EcoCyc database describes the genome and the biochemical machinery of E. coli. It contains up-to-date annotations of all E. coli genes and describes all known pathways of E. coli small-molecule metabolism. Each pathway and its component reactions and enzymes are annotated in rich detail, with extensive references to the biomedical literature. The Pathway Tools software provides query and visualization services.
• BIND (Biomolecular Interaction Network Database). UBC, Univ. of Toronto
– BIND is a database designed to store full descriptions of interactions, molecular complexes and pathways, including interactions between any two molecules composed of proteins, nucleic acids and small molecules. Chemical reactions, photochemical activation and conformational changes can also be described. The data are abstracted in such a way that graph-theoretic methods can be applied for data mining. The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations.
Objectives of the KEGG Project
• Pathway Database: Computerize current knowledge of molecular and cellular biology in terms of the pathways of interacting molecules or genes.
– generic metabolic pathways (143)
– inferred pathways for all sequenced genomes (2706)
• Genes Database: Maintain gene catalogs of all sequenced organisms and link each gene product to a pathway component
• Ligand Database: Organize a database of all chemical compounds in living cells and link each compound to a pathway component
• Pathway Tools: Develop new bioinformatics technologies for functional genomics, such as pathway comparison, pathway reconstruction, and pathway design
Data Representation in KEGG
• Entity: a molecule or a gene
• Binary relation: a relation between two entities
• Network: a graph formed from a set of related entities
• Pathway: metabolic pathway or regulatory pathway
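Read as a data model, these notions form a graph: entities are nodes and binary relations are edges. A minimal sketch, assuming (as our interpretation, not KEGG's definition) that "a graph formed from a set of related entities" means one connected component of that graph; the function name `networks` and the entity names are made up for illustration:

```python
from collections import defaultdict


def networks(relations: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into networks: the connected components of the
    undirected graph whose edges are the binary relations."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in relations:
        adj[a].add(b)
        adj[b].add(a)
    seen: set[str] = set()
    comps: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


# Illustrative relations only; not taken from KEGG.
rels = [("geneA", "geneB"), ("geneB", "geneC"), ("cpdX", "cpdY")]
print(networks(rels))
```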
KEGG: query capabilities
• Searching and browsing
• Clickable maps
• Map coloring
– user provides a family of genes from gene expression data
– matching pathways are listed
– genes are colored on pathway maps
• Path finding between compounds
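Map coloring and path finding reduce to simple set and graph operations. A minimal sketch, not necessarily how KEGG implements them, assuming a hypothetical in-memory table `PATHWAY_GENES` (pathway to genes on its map) with invented names, and treating each reaction as a directed substrate-to-product edge. The compound chain in the usage example is a fragment of glycolysis:

```python
from collections import deque
from typing import Optional

# Hypothetical miniature pathway maps: pathway -> genes drawn on the map.
PATHWAY_GENES = {
    "pathway_1": {"g1", "g2", "g3"},
    "pathway_2": {"g3", "g4"},
    "pathway_3": {"g5"},
}


def color_maps(query: set[str]) -> dict[str, set[str]]:
    """Map coloring: list each pathway that contains a query gene,
    together with the genes to highlight on its map."""
    return {p: genes & query for p, genes in PATHWAY_GENES.items() if genes & query}


def find_path(edges: list[tuple[str, str]], src: str, dst: str) -> Optional[list[str]]:
    """Path finding between compounds: breadth-first search over directed
    substrate -> product edges; returns a path with the fewest steps."""
    nxt: dict[str, list[str]] = {}
    for a, b in edges:
        nxt.setdefault(a, []).append(b)
    prev: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for n in nxt.get(node, []):
            if n not in prev:
                prev[n] = node
                queue.append(n)
    return None


glyco_edges = [("glucose", "glucose-6-phosphate"),
               ("glucose-6-phosphate", "fructose-6-phosphate"),
               ("fructose-6-phosphate", "fructose-1,6-bisphosphate")]
print(color_maps({"g3", "g5"}))
print(find_path(glyco_edges, "glucose", "fructose-1,6-bisphosphate"))
```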
Concluding remarks
• We focused on what needs to be represented
• New kinds of queries
– Graph queries
– Comparison of models and traces
– Is flux q possible in steady state for network N?
– Similarity of networks based on the similarity of their flux cones
– Compare networks based on
• Their structure
• Their flux cone
• Their dynamic behavior
– What-if queries
• We did not cover logics for simulation
– linear logic, computation tree logic
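One standard reading of "is flux q possible in steady state for network N?" takes N to be the stoichiometric matrix (rows: internal metabolites; columns: reactions). Then q is feasible exactly when N q = 0 (every internal metabolite is balanced) and q_i ≥ 0 for every irreversible reaction i. The set of all such vectors, {v : N v = 0, v_i ≥ 0 for i in Irr}, is the flux cone. The sketch below checks those two conditions under that assumption; the function name `steady_state_feasible` and the toy network are illustrative:

```python
def steady_state_feasible(N: list[list[float]], q: list[float],
                          irreversible: set[int], tol: float = 1e-9) -> bool:
    """True if flux vector q lies in the flux cone of stoichiometric matrix N:
    every internal metabolite is balanced (N q = 0, within tol) and every
    irreversible reaction carries non-negative flux."""
    if any(len(row) != len(q) for row in N):
        raise ValueError("q must have one entry per reaction (column of N)")
    balanced = all(abs(sum(n * v for n, v in zip(row, q))) <= tol for row in N)
    directions_ok = all(q[i] >= -tol for i in irreversible)
    return balanced and directions_ok


# Toy linear chain: uptake -> A, A -> B, B -> secretion (all irreversible).
# Rows: internal metabolites A, B. Columns: reactions r0, r1, r2.
N = [[1, -1, 0],
     [0, 1, -1]]
irr = {0, 1, 2}
print(steady_state_feasible(N, [1, 1, 1], irr))     # True
print(steady_state_feasible(N, [1, 2, 1], irr))     # False: A is drained
print(steady_state_feasible(N, [-1, -1, -1], irr))  # False: wrong direction
```

This only tests membership of one vector; comparing whole flux cones, as the remarks suggest, needs more machinery than this check.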