BEL language v1.0

Natalie Catlett, May 10, 2013


Biological Expression Language is a knowledge representation language for qualitative causal biology.

Transcript of BEL language v1.0

Page 1: BEL language v1.0

Natalie Catlett, May 10, 2013

Language Overview

Page 2: BEL language v1.0

Contents

• Language Overview
  – Statements
  – Annotations
  – Terms
  – Functions
  – Relationships

• Knowledge Representation Examples

Page 3: BEL language v1.0

Language Overview


Page 4: BEL language v1.0

Language Overview

• BEL statements capture knowledge

• BEL annotations provide information about statements
  – Citation, experimental context, etc.

• BEL terms are composed using BEL functions applied to namespace values

• BEL relationships connect BEL terms

Page 5: BEL language v1.0

BEL Statements

• Basic statement types:
  – Term Expression, Relationship, Term Expression
      p(HGNC:CCND1) directlyIncreases kin(p(HGNC:CDK4))
  – Term Expression
      complex(p(HGNC:CCND1), p(HGNC:CDK4))

• Complex statement type:
  – A causal statement can be used as the target term of a causal statement
  – Term Expression, Causal Relationship, Causal Statement
      p(HGNC:CLSPN) -> (kin(p(HGNC:ATR)) => p(HGNC:CHEK1, pmod(P)))
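The three statement shapes on this slide can be told apart mechanically. The following is a minimal Python sketch, assuming well-formed statements in which the relationship is surrounded by single spaces and a nested statement is wrapped in parentheses. The names `split_statement` and `is_nested` are hypothetical helpers for illustration and are not part of any BEL tool.

```python
# Hypothetical helpers (not from any BEL library): split a BEL statement into
# subject, relationship and object. Only relationships used in these slides
# are listed; extend the set as needed.
RELATIONSHIPS = {"->", "=>", "-|", "=|", "increases", "directlyIncreases",
                 "decreases", "directlyDecreases"}


def split_statement(statement):
    """Return (subject, relationship, object); a lone term gives (term, None, None)."""
    depth = 0
    tokens = statement.strip().split(" ")
    for i, token in enumerate(tokens):
        # Only a relationship at parenthesis depth 0 separates subject and object.
        if depth == 0 and token in RELATIONSHIPS:
            return " ".join(tokens[:i]), token, " ".join(tokens[i + 1:])
        depth += token.count("(") - token.count(")")
    return statement.strip(), None, None


def is_nested(obj):
    """True when the object is itself a parenthesised statement."""
    if obj is None or not (obj.startswith("(") and obj.endswith(")")):
        return False
    return split_statement(obj[1:-1])[1] is not None
```

Applied to the third example, `split_statement` returns `p(HGNC:CLSPN)` as the subject and `->` as the relationship, and `is_nested` reports that the object is itself a causal statement.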

Page 6: BEL language v1.0

BEL Annotations

• Annotations provide information about one or more BEL Statements


SET Citation = {"PubMed", "J Mol Med", "12682725", "2003-03-14","Limbourg FP|Liao JK",""}

SET Evidence = "high-dose steroid treatment decreases vascular inflammation and ischemic tissue damage after myocardial infarction and stroke through direct vascular effects involving the nontranscriptional activation of eNOS"

SET Species = "9606"

SET Tissue = "Vascular System"

SET Disease = "Stroke"

a(CHEBI:corticosteroid) -| bp(MESHD:"Inflammation")
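The Citation annotation above is a brace-delimited list of quoted fields. The sketch below shows one way to split it in Python. The field names (`type`, `name`, `reference`, `date`, `authors`, `comment`) are an assumption based on the order of values in the example, and `parse_citation` is a hypothetical helper.

```python
import csv

# Assumed field order, inferred from the example on this slide.
CITATION_FIELDS = ("type", "name", "reference", "date", "authors", "comment")


def parse_citation(value):
    """Split a '{"a", "b", ...}' citation value into a dict of named fields."""
    inner = value.strip().lstrip("{").rstrip("}")
    # csv handles the quoted, comma-separated fields; skipinitialspace
    # tolerates a space after each comma.
    row = next(csv.reader([inner], skipinitialspace=True))
    citation = dict(zip(CITATION_FIELDS, row))
    if "authors" in citation:
        # Authors are separated by "|" in the example.
        citation["authors"] = [a for a in citation["authors"].split("|") if a]
    return citation
```

Shorter citations, such as the three-field ones in the knowledge capture examples, simply yield fewer keys.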

Page 7: BEL language v1.0

BEL Terms

• BEL terms have the following components:
  – Function
    • Required
    • Can be nested to create complex terms
  – Namespace Abbreviation
    • Optional
  – Value
    • Required
    • If a namespace is given, the value is found in that namespace

• BEL terms from different namespaces are unified during compilation using information in the BEL Namespace Equivalence documents

f(ns:value)
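The `f(ns:value)` pattern can be illustrated with a small Python sketch that splits a non-nested term into its three components. The regular expression and the name `parse_simple_term` are assumptions made for illustration. Nested terms need a real parser and are deliberately rejected here.

```python
import re

# function "(" [namespace ":"] value ")"; the value may be double-quoted.
SIMPLE_TERM = re.compile(
    r'^\s*(?P<function>\w+)\s*\(\s*'
    r'(?:(?P<namespace>\w+):)?'
    r'(?P<value>"[^"]*"|[^\s(),:"]+)\s*\)\s*$'
)


def parse_simple_term(term):
    """Return (function, namespace, value), or None for nested or malformed terms."""
    match = SIMPLE_TERM.match(term)
    if match is None:
        return None
    value = match.group("value").strip('"')
    # The namespace is optional, so it may be None.
    return match.group("function"), match.group("namespace"), value
```

For example, `parse_simple_term('bp(MESHD:"Inflammation")')` yields the function `bp`, the namespace `MESHD` and the value `Inflammation`.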

Page 8: BEL language v1.0

BEL Functions

• Types of functions:
  – Abundances
  – Modifications of abundances
  – Processes
  – Activities
  – Transformations

• Abundances and processes are applied directly to namespace values

• All other functions are applied to abundance functions!

Page 9: BEL language v1.0

BEL Functions

• BEL Functions enable representation of different aspects of a value
  – E.g., AKT1 (EGID:207) can be represented as a
    • Gene
    • RNA
    • Protein
    • Modified Protein
    • Activity
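As a rough illustration of the AKT1 example, the Python sketch below builds one term per aspect from a single namespace value. The choice of `pmod(P)` for the modified protein and `kin()` for the activity is an assumption made for this example, and `representations` is a hypothetical helper.

```python
def representations(namespace, value):
    """Build illustrative BEL terms for different aspects of one namespace value."""
    base = f"{namespace}:{value}"
    protein = f"p({base})"
    return {
        "Gene": f"g({base})",
        "RNA": f"r({base})",
        "Protein": protein,
        # Assumption: phosphorylation stands in for "a modification".
        "Modified Protein": f"p({base}, pmod(P))",
        # Assumption: AKT1 is a kinase, so kin() is used for its activity.
        "Activity": f"kin({protein})",
    }
```

Calling `representations("EGID", "207")` gives five distinct terms that all refer to the same underlying identifier.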

Page 10: BEL language v1.0

Abundance Functions

• Most abundance functions take namespace values
  – complexAbundance() can take a namespace value OR a list of abundance terms
  – compositeAbundance() must take a list of abundance terms

Short Form | Long Form | Example | Example Description
a() | abundance() | a(CHEBI:water) | the abundance of water
p() | proteinAbundance() | p(HGNC:IL6) | the abundance of human IL6 protein
complex() | complexAbundance() | complex(NCH:"AP-1 Complex") | the abundance of the AP-1 complex
complex() | complexAbundance() | complex(p(MGI:Fos), p(MGI:Jun)) | the abundance of the complex composed of mouse Fos and Jun proteins
composite() | compositeAbundance() | composite(p(HGNC:IL6), a(CHEBI:dexamethasone)) | the abundances of IL6 protein and dexamethasone, together
g() | geneAbundance() | g(HGNC:ERBB2) | the abundance of the ERBB2 gene (DNA)
m() | microRNAAbundance() | m(MGI:Mir21) | the abundance of mouse Mir21 microRNA
r() | rnaAbundance() | r(HGNC:IL6) | the abundance of human IL6 RNA
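Because each abundance function has a short and a long form, a term can be rewritten from one to the other by simple substitution. The Python sketch below expands short forms. It assumes function names are always followed directly by `(` and that namespace values contain no parentheses. The name `expand_short_forms` is hypothetical.

```python
import re

# Short-to-long mapping for the abundance functions listed on this slide
# (capitalisation of microRNAAbundance follows the BEL v1.0 long form).
LONG_FORMS = {
    "a": "abundance",
    "p": "proteinAbundance",
    "complex": "complexAbundance",
    "composite": "compositeAbundance",
    "g": "geneAbundance",
    "m": "microRNAAbundance",
    "r": "rnaAbundance",
}

# A run of letters immediately followed by "(" is treated as a function name.
FUNCTION_NAME = re.compile(r'\b([A-Za-z]+)\(')


def expand_short_forms(term):
    """Replace known short function names with their long forms."""
    return FUNCTION_NAME.sub(
        lambda m: LONG_FORMS.get(m.group(1), m.group(1)) + "(", term
    )
```

Names not in the mapping, including long forms that are already expanded, pass through unchanged.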

Page 11: BEL language v1.0

Modification Functions

• Modifications are functions used as arguments within abundance functions
  – Post-translational modifications
  – Sequence variants (mutations, polymorphisms)

Short Form | Long Form | Example | Example Description
pmod() | proteinModification() | p(HGNC:AKT1, pmod(P)) | the abundance of human AKT1 protein modified by phosphorylation
pmod() | proteinModification() | p(MGI:Rela, pmod(A, K)) | the abundance of mouse Rela protein acetylated at an unspecified lysine
pmod() | proteinModification() | p(HGNC:HIF1A, pmod(H, N, 803)) | the abundance of human HIF1A protein hydroxylated at asparagine 803
sub() | substitution() | p(HGNC:PIK3CA, sub(E, 545, K)) | the abundance of the human PIK3CA protein in which glutamic acid 545 has been substituted with lysine
trunc() | truncation() | p(HGNC:ABCA1, trunc(1851)) | the abundance of human ABCA1 protein that has been truncated at amino acid residue 1851 via introduction of a stop codon
fus() | fusion() | p(HGNC:BCR, fus(HGNC:JAK2, 1875, 2626)) | the abundance of a fusion protein of the 5' partner BCR and 3' partner JAK2, with the breakpoint for BCR at 1875 and JAK2 at 2626
fus() | fusion() | p(HGNC:BCR, fus(HGNC:JAK2)) | the abundance of a fusion protein of the 5' partner BCR and 3' partner JAK2
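The `pmod()` arguments follow a fixed order: modification type, then an optional amino acid code, then an optional position. The Python sketch below turns a `pmod()` expression into a phrase. The lookup tables cover only the codes that appear in these slides, and `describe_pmod` is a hypothetical helper.

```python
import re

# Only the codes that appear in these slides are listed.
MODIFICATION_TYPES = {"P": "phosphorylated", "A": "acetylated",
                      "H": "hydroxylated", "G": "glycosylated"}
AMINO_ACIDS = {"K": "lysine", "N": "asparagine", "S": "serine"}

# pmod(type[, residue[, position]])
PMOD = re.compile(r'pmod\(\s*(\w)\s*(?:,\s*(\w)\s*(?:,\s*(\d+)\s*)?)?\)')


def describe_pmod(pmod):
    """Describe a pmod() expression in words."""
    match = PMOD.fullmatch(pmod.strip())
    if match is None:
        raise ValueError(f"not a pmod() expression: {pmod!r}")
    mod, residue, position = match.groups()
    text = MODIFICATION_TYPES.get(mod, f"modified ({mod})")
    if residue is None:
        return text
    name = AMINO_ACIDS.get(residue, residue)
    if position is None:
        return f"{text} at an unspecified {name}"
    return f"{text} at {name} {position}"
```

Unknown modification or residue codes fall back to the raw code rather than raising an error, so the tables can be extended gradually.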

Page 12: BEL language v1.0

Process Functions

• Processes include biological phenomena that occur at the level of the cell or organism

Short Form | Long Form | Example | Example Description
bp() | biologicalProcess() | bp(GO:"cellular senescence") | the biological process cellular senescence
path() | pathology() | path(MESHD:"Pulmonary Disease, Chronic Obstructive") | the pathology COPD

Page 13: BEL language v1.0

Activity Functions

• Applied to protein and complex abundances to specify the frequency of events resulting from the molecular activity of the abundance
  – This distinction is useful for proteins whose activities are regulated by post-translational modification

Short Form | Long Form | Example | Example Description
cat() | catalyticActivity() | cat(p(RGD:Sod1)) | the catalytic activity of rat Sod1 protein
chap() | chaperoneActivity() | chap(p(HGNC:CANX)) | the events in which the human CANX (Calnexin) protein functions as a chaperone to aid the folding of other proteins
gtp() | gtpBoundActivity() | gtp(p(PFH:"RAS Family")) | the GTP-bound activity of RAS Family protein
kin() | kinaseActivity() | kin(complex(NCH:"AMP-activated protein kinase complex")) | the kinase activity of the AMP-activated protein kinase complex
act() | molecularActivity() | act(p(HGNC:TLR4)) | the ligand-bound activity of the human non-catalytic receptor protein TLR4; a more specific activity function is not applicable to TLR4 protein
pep() | peptidaseActivity() | pep(p(RGD:Ace)) | the peptidase activity of the Rat angiotensin converting enzyme (ACE)
phos() | phosphataseActivity() | phos(p(HGNC:DUSP1)) | the phosphatase activity of human DUSP1 protein
ribo() | ribosylationActivity() | ribo(p(HGNC:PARP1)) | the ribosylation activity of human PARP1 protein
tscript() | transcriptionalActivity() | tscript(p(MGI:Trp53)) | the transcriptional activity of mouse TRP53 (p53) protein
tport() | transportActivity() | tport(complex(NCH:"ENaC Complex")) | the frequency of ion transport events mediated by the epithelial sodium channel (ENaC) complex

Page 14: BEL language v1.0

Transformation Functions

• Transformations are events in which one class of abundance is transformed or changed into a second class of abundance

Short Form | Long Form | Example | Example Description
deg() | degradation() | deg(r(HGNC:MYC)) | the degradation of human MYC RNA
sec() | cellSecretion() | sec(p(MGI:Il6)) | the secretion of mouse Il6 protein
surf() | cellSurfaceExpression() | surf(p(RGD:Fas)) | the translocation of Rat Fas protein to the cell surface
tloc() | translocation() | tloc(p(HGNC:NFE2L2), MESHCL:Cytoplasm, MESHCL:"Cell Nucleus") | the event in which human NFE2L2 protein is translocated from the cytoplasm to the nucleus
rxn() | reaction() | rxn(reactants(a(CHEBI:phosphoenolpyruvate), a(CHEBI:ADP)), products(a(CHEBI:pyruvate), a(CHEBI:ATP))) | the event in which the reactants phosphoenolpyruvate and ADP are converted into the products pyruvate and ATP
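The `tloc()` and `rxn()` functions take several arguments, so they are convenient to assemble programmatically. The following Python sketch composes both from already-formed abundance terms. The helper names `reaction` and `translocation` are hypothetical, and no validation of the input terms is attempted.

```python
def reaction(reactants, products):
    """Compose an rxn() term from lists of abundance terms."""
    return (f"rxn(reactants({', '.join(reactants)}), "
            f"products({', '.join(products)}))")


def translocation(term, from_location, to_location):
    """Compose a tloc() term: abundance, source location, destination location."""
    return f"tloc({term}, {from_location}, {to_location})"
```

Passing the reactants and products from the slide to `reaction` reproduces the `rxn()` example term.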

Page 15: BEL language v1.0

BEL Relationships

• Causal relationships
  – increases, directlyIncreases, decreases, directlyDecreases, rateLimitingStepOf, causesNoChange

• Correlative relationships
  – negativeCorrelation, positiveCorrelation, association

• Biomarker relationships
  – biomarkerFor, prognosticBiomarkerFor

• Assignment to groups
  – hasMember, hasComponent, hasMembers, hasComponents

• Other
  – isA, subProcessOf

• Genomic relationships
  – transcribedTo, translatedTo, orthologousTo
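The relationship groups above can be kept in a lookup table so that a statement's relationship can be classified. The Python sketch below also maps the arrow symbols used in the example statements (`->`, `=>`, `-|`, `=|`) to their names. That mapping is inferred from how the symbols are used in these slides, and `categorize` is a hypothetical helper.

```python
RELATIONSHIP_CATEGORIES = {
    "causal": ["increases", "directlyIncreases", "decreases",
               "directlyDecreases", "rateLimitingStepOf", "causesNoChange"],
    "correlative": ["negativeCorrelation", "positiveCorrelation", "association"],
    "biomarker": ["biomarkerFor", "prognosticBiomarkerFor"],
    "assignment to groups": ["hasMember", "hasComponent",
                             "hasMembers", "hasComponents"],
    "other": ["isA", "subProcessOf"],
    "genomic": ["transcribedTo", "translatedTo", "orthologousTo"],
}

# Symbol forms as used in the example statements (inferred mapping).
SYMBOLS = {"->": "increases", "=>": "directlyIncreases",
           "-|": "decreases", "=|": "directlyDecreases"}


def categorize(relationship):
    """Return (relationship name, category) for a name or arrow symbol."""
    name = SYMBOLS.get(relationship, relationship)
    for category, names in RELATIONSHIP_CATEGORIES.items():
        if name in names:
            return name, category
    raise ValueError(f"unknown relationship: {relationship!r}")
```

For instance, `categorize("=|")` resolves the symbol to `directlyDecreases` and reports it as causal.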

Page 16: BEL language v1.0

Knowledge Representation Examples


Page 17: BEL language v1.0

Knowledge Capture – Example 1

• From published paper describing effects of Tnf in rat chondrocytes

Page 18: BEL language v1.0

Knowledge Capture – Example 1


SET Citation = {"PubMed","Arthritis Res Ther.","19144181"}

SET Species = "10116"

SET Cell = "Chondrocytes"

SET Evidence = "we identified the relative changes in transcript levels of the extracellular matrix components Agc1, Hapln1, and Col2a1, proteases Mmp-9 and Mmp-12, as well as the inflammatory cytokine macrophage Csf-1 (Figure 3). TNFα decreased Agc1 and Hapln1 (Figure 3a, b) and increased Mmp-9 and Mmp-12 (Figure 3e, f)"

p(RGD:Tnf) -> r(RGD:Mmp9)
p(RGD:Tnf) -> r(RGD:Mmp12)
p(RGD:Tnf) -| r(RGD:Acan) // Agc1 = Acan
p(RGD:Tnf) -| r(RGD:Hapln1)

• Reference (SET Citation)
• Experimental context = Rat chondrocytes (SET Species, SET Cell)
• Text from paper supporting statements (SET Evidence)
• Perturbation (source term) = Tnf protein
• Measurements (target terms) = RNA abundance
• In-line comment (// Agc1 = Acan)

Page 19: BEL language v1.0

Knowledge Capture – Example 2

• Protein variants and post-translational modifications

SET Citation = {"PubMed", "Anticancer Agents Med Chem. 2010 Oct 1;10(8):617-24.","21182469"}

SET Evidence = "One non-synonymous SNP 538G>A (Gly180Arg) has been found to greatly affect the function and stability of de novo synthesized ABCC11 (Arg180) variant protein. The SNP variant lacking N-linked glycosylation is recognized as a misfolded protein in the endoplasmic reticulum (ER) and readily undergoes proteasomal degradation."

Gly180Arg variant ABCC11 protein lacks glycosylation:

p(HGNC:ABCC11, sub(G,180,R)) =| \
p(HGNC:ABCC11, pmod(G,N))

ABCC11 glycosylation blocks degradation:

p(HGNC:ABCC11, pmod(G,N)) =| deg(p(HGNC:ABCC11))

Page 20: BEL language v1.0

Knowledge Capture – Example 3

• Microarray data – can use probe set ID as identifier

SET Citation = {"PubMed","J Exp Med. 2006 Nov 27;203(12):2763-77.","17116732"}

SET Evidence = "Table S1. Affymetrix U133 Plus 2.0 GeneChip array data showing transcripts in HDLECs up- or down-regulated by a factor of at least twofold (P < 0.1) after stimulation with TNF-α."

SET Tissue = "Endothelium, Lymphatic"

p(HGNC:TNF) -> r(HGU133P2:205476_at)
p(HGNC:TNF) -> r(HGU133P2:215101_s_at)
p(HGNC:TNF) -> r(HGU133P2:214974_x_at)
p(HGNC:TNF) -> r(HGU133P2:203868_s_at)

p(HGNC:TNF) -| r(HGU133P2:235683_at)
p(HGNC:TNF) -| r(HGU133P2:235150_at)
p(HGNC:TNF) -| r(HGU133P2:205258_at)
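Statements like these follow one template per probe set, so they can be generated from a table of differential expression results. The Python sketch below assumes each result is a `(probe_set_id, direction)` pair with direction `"up"` or `"down"`. The name `microarray_statements` and that input format are hypothetical.

```python
def microarray_statements(source_term, namespace, changes):
    """Emit one BEL statement per (probe_set_id, direction) pair.

    direction is "up" (source increases the RNA) or "down" (source decreases it).
    """
    arrows = {"up": "->", "down": "-|"}
    return [
        f"{source_term} {arrows[direction]} r({namespace}:{probe_id})"
        for probe_id, direction in changes
    ]
```

The annotations (Citation, Evidence, Tissue) would still be set once before the generated statements, as on the slide.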

Page 21: BEL language v1.0

Knowledge Capture – Example 4

• Protein modifications and activities

SET Citation = {"PubMed","Proc Natl Acad Sci U S A 2000 Oct 24 97(22) 11960-5","11035810"}

SET Evidence = "GSK-3 activity is inhibited through phosphorylation of serine 21 in GSK-3 alpha and serine 9 in GSK-3 beta."

SET Species = "9606"

p(HGNC:GSK3A,pmod(P,S,21)) =| kin(p(HGNC:GSK3A))
p(HGNC:GSK3B,pmod(P,S,9)) =| kin(p(HGNC:GSK3B))

SET Evidence = "These serine residues of GSK-3 have been previously identified as targets of protein kinase B (PKB/Akt)"

kin(p(PFH:"AKT Family")) => p(HGNC:GSK3A,pmod(P,S,21))
kin(p(PFH:"AKT Family")) => p(HGNC:GSK3B,pmod(P,S,9))

New Evidence Line; Citation and Species still apply to statements that follow
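The scoping rule in this example (a new SET Evidence replaces the old one while Citation and Species stay in force) can be sketched in Python as a running dictionary of annotations. This is a simplified illustration, assuming one SET or statement per line. The helper name `annotate` is hypothetical, and the Citation value is kept as its raw string.

```python
import re

# Matches lines such as: SET Species = "9606"
SET_LINE = re.compile(r'^SET\s+(\w+)\s*=\s*(.+)$')


def annotate(lines):
    """Pair each statement with the annotations in force when it appears."""
    current = {}
    annotated = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SET_LINE.match(line)
        if match:
            name, value = match.groups()
            # A repeated SET replaces only the annotation of the same name;
            # all other annotations keep applying.
            current[name] = value.strip().strip('"')
        else:
            # Copy, so later SET lines do not alter earlier statements.
            annotated.append((line, dict(current)))
    return annotated
```

Run over the lines of Example 4, the two AKT Family statements carry the second Evidence text together with the original Citation and Species.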