Towards semantic systems chemical biology
![Page 1: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/1.jpg)
From Data Integration to Data Mining in the Semantic Web: Systems Chemical Biology as a Case Study
Bin Chen
School of Informatics and Computing, Indiana University at Bloomington
Lecture for S636, Nov 17, 2011
![Page 2: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/2.jpg)
Outline
• Introduction
• RDF (Chem2Bio2RDF)
• OWL (Chem2Bio2OWL)
• Graph mining (SLAP)
![Page 3: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/3.jpg)
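Figure: the layers of systems chemical biology
Chemical (compound, drug) → Biology (protein, gene) → Systems (PPI, metabolic pathway, gene regulatory network) → Phenotype (disease, side effect, toxicity); chemogenomics links the chemical and biology layers; edge labels: interacting, mapping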
What Is Systems Chemical Biology?
Oprea TI, et al. Systems chemical biology. Nat Chem Biol. 2007.
![Page 4: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/4.jpg)
The data are heterogeneous and scattered around the web…
MATADOR
![Page 5: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/5.jpg)
Semantic Web
• an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
Semantic Web stack (http://en.wikipedia.org/wiki/Semantic_Web)
![Page 6: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/6.jpg)
Architecture of Semantic Systems Chemical Biology
• Data: experimental data, text-mining data
• RDF (queried with SPARQL): Chem2Bio2RDF
• Ontology: Chem2Bio2OWL
• Algorithms and tools: path finding; association search; association ranking and prediction
• Applications: polypharmacology; drug side effects
![Page 7: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/7.jpg)
Outline
• Introduction
• RDF (Chem2Bio2RDF)
• OWL (Chem2Bio2OWL)
• Graph mining (SLAP)
![Page 8: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/8.jpg)
RDF (Resource Description Framework)
• A standard model for data interchange on the Web that uses triples (subject, predicate, object) to represent and link data, and URIs to identify resources.
Figure: a triple links a resource (subject) through a property (predicate) to a value (object), e.g., Drug → name → Lipitor.
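<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <!-- ex: is a placeholder vocabulary for the name/company properties -->
  <rdf:Description rdf:about="http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076">
    <ex:name>Lipitor</ex:name>
    <ex:company>Pfizer</ex:company>
  </rdf:Description>
</rdf:RDF>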
The drug resource is identified by the URI http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076; a second property, company, links it to the value Pfizer.
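To make the triple model concrete, here is a minimal Python sketch (standard library only, not an RDF library) that stores the Lipitor example as (subject, predicate, object) tuples. The short predicate names `name` and `company` are simplifications taken from the slide, not real vocabulary URIs, and the helper `objects` is hypothetical.

```python
# Minimal sketch: RDF triples as plain (subject, predicate, object) tuples.
# "name" and "company" are illustrative shorthands, not real vocabulary URIs.
LIPITOR = "http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076"

triples = {
    (LIPITOR, "name", "Lipitor"),
    (LIPITOR, "company", "Pfizer"),
}

def objects(graph, subject, predicate):
    """Return all objects linked to `subject` via `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

if __name__ == "__main__":
    print(objects(triples, LIPITOR, "company"))
```

Because every statement has the same three-part shape, a graph is just a set of tuples, and any question about a resource becomes a filter over that set.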
![Page 9: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/9.jpg)
Use RDF to Integrate Data
Database 1: http://chem2bio2rdf.org/drugbank/DB01076 has name "lipitor" and company "Pfizer".
Database 2: http://chem2bio2rdf.org/drugbank/DB01076 has Molecular_Weight 558.6398 and formula C33H35FN2O5.
Same URI, merged!
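A minimal Python sketch of why a shared URI makes integration cheap: if each source is a set of triples, merging is a set union, and all facts about DB01076 end up on one subject. Predicate names are copied from the slide as plain strings; the helper `describe` is hypothetical and assumes one value per predicate.

```python
# Minimal sketch: two sources describe the same URI; a set union merges them.
# Predicate names are shorthands copied from the slide, not real vocabulary URIs.
DB01076 = "http://chem2bio2rdf.org/drugbank/DB01076"

database_1 = {
    (DB01076, "name", "lipitor"),
    (DB01076, "company", "Pfizer"),
}
database_2 = {
    (DB01076, "Molecular_Weight", "558.6398"),
    (DB01076, "formula", "C33H35FN2O5"),
}

# Both sources use the same subject URI, so no join key or schema mapping is needed.
merged = database_1 | database_2

def describe(graph, subject):
    """Collect predicate -> object for `subject` (assumes one value per predicate)."""
    return {p: o for s, p, o in graph if s == subject}
```

With relational tables the same merge would need an agreed key column and a join; with RDF the URI itself plays that role.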
![Page 10: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/10.jpg)
Use RDF to Link Data
http://chem2bio2rdf.org/drugbank/DB01076 sameAs http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076
http://chem2bio2rdf.org/drugbank/DB01076 cid http://chem2bio2rdf.org/pubchem/resource/pubchem_compound/60823
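Links like sameAs and cid let a client hop from one dataset to another. The sketch below, assuming both links start at the Chem2Bio2RDF DrugBank URI and treating links as undirected, collects every resource reachable from a starting URI with a breadth-first search. The function `linked_resources` is hypothetical, not part of Chem2Bio2RDF.

```python
from collections import deque

# Minimal sketch: follow cross-dataset links from one resource.
# Link predicates are shorthands from the slide; links are treated as undirected.
C2B2R = "http://chem2bio2rdf.org/drugbank/DB01076"
LODD = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076"
PUBCHEM = "http://chem2bio2rdf.org/pubchem/resource/pubchem_compound/60823"

links = [
    (C2B2R, "sameAs", LODD),
    (C2B2R, "cid", PUBCHEM),
]

def linked_resources(start, triples, predicates=("sameAs", "cid")):
    """Breadth-first search over the given link predicates, in both directions."""
    neighbors = {}
    for s, p, o in triples:
        if p in predicates:
            neighbors.setdefault(s, set()).add(o)
            neighbors.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Starting from the LODD DrugBank URI, the search reaches the Chem2Bio2RDF record and, through it, the PubChem compound.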
![Page 11: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/11.jpg)
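Figure: Chem2Bio2RDF system overview
• Data: Chem2Bio2RDF, linked with Bio2RDF, LODD, UniProt and others
• Storage: Virtuoso triple store
• Access: SPARQL endpoints, dereferenceable URIs, browsing
• Tools: PlotViz (visualization), Cytoscape plugin, linked path generation and ranking, third-party tools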
![Page 12: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/12.jpg)
Workflow for RDF conversion
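Figure: external sources (XML, CSV, DB, TXT, …) are downloaded to a local copy and loaded by scripts into a relational DB; a D2R mapping, aligned with the ontology, is served by a D2R server and dumped into a Virtuoso triple store for publishing.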
Chen B, et al. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics. 2010.
![Page 13: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/13.jpg)
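@prefix map: <#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# the drugbank: namespace is not shown on the slide; this IRI is a placeholder
@prefix drugbank: <http://example.org/drugbank/> .

# Table c2b2r_DrugBankDrug (map:database, the d2rq:Database, is defined elsewhere in the file)
map:c2b2r_DrugBankDrug a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@";
    d2rq:class drugbank:DrugBankDrug;
    d2rq:classDefinitionLabel "c2b2r_DrugBankDrug";
    .
map:c2b2r_DrugBankDrug__label a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
    d2rq:property rdfs:label;
    d2rq:pattern "@@c2b2r_DrugBankDrug.Generic_Name@@";
    .
map:c2b2r_DrugBankDrug_DBID a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
    d2rq:property drugbank:DBID;
    d2rq:propertyDefinitionLabel "c2b2r_DrugBankDrug DBID";
    d2rq:column "c2b2r_DrugBankDrug.DBID";
    .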
Figure: Table → D2R mapping → RDF (Exhibit link)
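The D2R mapping on this slide turns each row of `c2b2r_DrugBankDrug` into triples. The following Python sketch emulates that idea with an in-memory SQLite table; it is not D2RQ itself. The base URI is inferred from the DB01076 URI shown earlier, D2RQ's `urlify` is only approximated by percent-encoding with `urllib.parse.quote`, and `rows_to_triples` is a hypothetical helper.

```python
import sqlite3
from urllib.parse import quote

# Minimal sketch of what the ClassMap/PropertyBridges do: one table row -> several triples.
# Emulates the idea only (not D2RQ); BASE is inferred from the slide's example URI.
BASE = "http://chem2bio2rdf.org/drugbank/resource/"

def rows_to_triples(conn):
    triples = []
    for dbid, generic_name in conn.execute(
        "SELECT DBID, Generic_Name FROM c2b2r_DrugBankDrug"
    ):
        # d2rq:uriPattern "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@"
        subject = BASE + "drugbank_drug/" + quote(dbid)
        triples.append((subject, "rdf:type", "drugbank:DrugBankDrug"))  # d2rq:class
        triples.append((subject, "rdfs:label", generic_name))           # label bridge
        triples.append((subject, "drugbank:DBID", dbid))                 # DBID bridge
    return triples

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE c2b2r_DrugBankDrug (DBID TEXT, Generic_Name TEXT)")
    conn.execute("INSERT INTO c2b2r_DrugBankDrug VALUES ('DB01076', 'Atorvastatin')")
    for t in rows_to_triples(conn):
        print(t)
```

The row used in the usage block is only an illustration; in the real workflow the relational copy holds the full DrugBank dump.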
![Page 14: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/14.jpg)
Each node represents a dataset, colored by its RDF vendor; each directed edge shows a link from one dataset to another, colored by link type (e.g., the compound type includes CID, CAS, ChEBI, DBID and so on). Node size reflects the number of triples, and edge width the number of links.
Chem2Bio2RDF Datasets
Legend: Chem2Bio2RDF data, other data vendors; link types: compound, protein/gene, chemogenomics, literature, others
http://chem2bio2rdf.org
![Page 15: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/15.jpg)
http://linkeddata.org
![Page 16: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/16.jpg)
![Page 17: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/17.jpg)
SPARQL
• SQL-like Query Language for RDF
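A SPARQL WHERE clause is a set of triple patterns in which `?x`-style variables stand for unknown terms; the answer is every variable binding that makes all patterns match, joined on shared variables. The toy evaluator below illustrates that idea in Python; it is not how Virtuoso or ARQ work internally, and the sample data in the test is hypothetical.

```python
# Minimal sketch of basic-graph-pattern matching, the core of a SPARQL WHERE clause.
# Terms starting with "?" are variables. A toy evaluator, not a SPARQL engine.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind a fresh variable, or check consistency with an earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def query(patterns, triples):
    """Return all bindings satisfying every pattern (a join on shared variables)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

For example, the patterns `("?d", "name", "Lipitor")` and `("?d", "target", "?t")` find the targets of the drug named Lipitor, much like a two-line SPARQL query.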
![Page 18: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/18.jpg)
Implement cheminformatics and bioinformatics tools in SPARQL
ARQ function extensions expose the Chemistry Development Kit (CDK), BioJava and web services to SPARQL queries.
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
PREFIX f: <java:org.bio2chem2rdf.arq.>
SELECT ?x ?s
WHERE {
  ?x drugbank:smilesStringCanonical ?s .
  FILTER (f:tanimoto('NS(=O)(=O)C1=CC(=C(Cl)C(Cl)=C1)S(N)(=O)=O', ?s, 'MACCS') > 0.9)
}
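f:tanimoto computes the Tanimoto similarity between the query SMILES and each compound's SMILES using MACCS fingerprints, enabling compound similarity search.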
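The filter above keeps compounds whose Tanimoto similarity to the query exceeds 0.9. A minimal Python sketch of that coefficient, assuming fingerprints are already available as sets of "on" bit positions; computing real MACCS keys from SMILES needs a cheminformatics toolkit (presumably CDK behind `f:tanimoto`) and is not shown. The function names and bit sets are hypothetical.

```python
# Minimal sketch of the Tanimoto coefficient behind a filter like
# f:tanimoto(..., 'MACCS') > 0.9. Fingerprints are sets of "on" bit positions.

def tanimoto(fp_a, fp_b):
    """|A ∩ B| / |A ∪ B| for two bit sets; defined as 0.0 when both are empty."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def similar(query_fp, library, threshold=0.9):
    """Names of library compounds whose similarity to the query exceeds the threshold."""
    return [name for name, fp in library.items() if tanimoto(query_fp, fp) > threshold]
```

For instance, bit sets {1, 2, 3, 4} and {1, 2, 3, 5} share 3 of 5 distinct bits, giving 0.6, which a 0.9 threshold would reject.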
![Page 19: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/19.jpg)
Answer scientific questions
• Give me all information about this compound
• Give me all information about this target
• Find chemical-associated genes
• Find gene-associated chemicals
• Find disease-associated chemicals
• Find side-effect-associated chemicals
• Find all the drug-like compounds in PubChem BioAssay that share at least two targets with a drug in DrugBank
• Link KEGG / Reactome pathways and PubChem to identify potential multiple-pathway inhibitors for MAPK
More in http://chem2bio2rdf.wikispaces.com/multiple+sources
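The "share at least two targets" question reduces to set intersection once each compound and drug has a target set. A minimal Python sketch, assuming those sets were already retrieved (for example by SPARQL queries over the BioAssay and DrugBank datasets); the drug-likeness filter is omitted, and the identifiers in the test are hypothetical.

```python
# Minimal sketch: find compound-drug pairs sharing at least `min_shared` targets.
# Target sets are assumed to be retrieved beforehand; drug-likeness filtering is omitted.

def shared_target_hits(assay_targets, drug_targets, min_shared=2):
    """Return (compound, drug, sorted shared targets) for every qualifying pair."""
    hits = []
    for compound, c_targets in sorted(assay_targets.items()):
        for drug, d_targets in sorted(drug_targets.items()):
            shared = c_targets & d_targets
            if len(shared) >= min_shared:
                hits.append((compound, drug, sorted(shared)))
    return hits
```

The nested loop is quadratic; at PubChem scale an inverted index from target to compounds would be the natural refinement.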
![Page 20: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/20.jpg)
![Page 21: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/21.jpg)
Outline
• Introduction
• RDF (Chem2Bio2RDF)
• OWL (Chem2Bio2OWL)
• Graph mining (SLAP)
![Page 22: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/22.jpg)
![Page 23: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/23.jpg)
Ontology workflow
![Page 24: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/24.jpg)
Step 1: Hunt for scientific questions and set target goals
• What are the targets of troglitazone?
• Find PPARG inhibitors with molecular weight smaller than 500 Da.
• Which pathways will be affected by troglitazone?
• Find all the common/unique genes, proteins or drugs between/among two or more nodes.
• Which genes may the compound interact with that are expressed in the liver?
![Page 25: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/25.jpg)
Step 2: Propose the framework and basic classes
• SmallMolecule
• MacroMolecule
• Disease
• SideEffect
• Pathway
• BioAssay
• Literature
• Interaction
![Page 26: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/26.jpg)
Step 3: Define classes, relations and data properties
• Refine classes
  – Subclass
  – Utility class
• Object property
• Data property
http://chem2bio2owl.wikispaces.com/Version+1.0
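Refining classes into subclasses pays off at query time: a reasoner can infer that an instance of a subclass is also an instance of every superclass. The Python sketch below shows only that subclass-transitivity inference (an OWL reasoner such as Pellet does far more). The top-level class names follow Step 2; the subclasses `Drug`, `Protein` and `Enzyme` are hypothetical examples, not the actual Chem2Bio2OWL hierarchy.

```python
# Minimal sketch of subclass reasoning: transitive subClassOf plus type propagation.
# Only a small fragment of what an OWL reasoner does; the hierarchy is hypothetical.

def superclasses(cls, sub_class_of):
    """All (transitive) superclasses of `cls`, given a child -> parents mapping."""
    result, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def inferred_types(asserted_types, sub_class_of):
    """Asserted types plus every superclass implied by them."""
    types = set(asserted_types)
    for cls in asserted_types:
        types |= superclasses(cls, sub_class_of)
    return types

SUB_CLASS_OF = {
    "Drug": {"SmallMolecule"},     # hypothetical subclass
    "Protein": {"MacroMolecule"},  # hypothetical subclass
    "Enzyme": {"Protein"},         # hypothetical subclass
}
```

An individual asserted only as an `Enzyme` is thereby also returned by a query for `MacroMolecule`.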
![Page 27: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/27.jpg)
Step 4: Align with External ontology
• Import BioPAX
• Map disease to Disease Ontology
• Standardize terms
  – OBO Foundry
  – NCBO BioPortal
![Page 28: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/28.jpg)
Chem2Bio2OWL
![Page 29: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/29.jpg)
Step 5: Populate Chem2Bio2OWL
• Identifiers for compound, drug, protein, gene, pathway, side effect and disease
  – Primary source
• Term mapping
  – String similarity match
Figure: Chem2Bio2RDF → Chem2Bio2OWL via the Protégé API and Virtuoso, with Pellet reasoning
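Term mapping by string similarity can be sketched as picking, for each source term, the most similar vocabulary term above a cutoff. The slides do not say which similarity measure was used, so this Python sketch swaps in `difflib.SequenceMatcher.ratio` after light normalization; the cutoff of 0.8 and the vocabulary in the test are assumptions.

```python
from difflib import SequenceMatcher

# Minimal sketch of term mapping by string similarity. difflib's ratio is a stand-in
# for whatever measure Chem2Bio2OWL used; cutoff and vocabulary are assumptions.

def normalize(term):
    """Lower-case and collapse underscores, hyphens and repeated spaces."""
    return " ".join(term.lower().replace("_", " ").replace("-", " ").split())

def best_match(term, vocabulary, cutoff=0.8):
    """Return (best vocabulary term, score), or (None, score) if below the cutoff."""
    scored = [
        (SequenceMatcher(None, normalize(term), normalize(v)).ratio(), v)
        for v in vocabulary
    ]
    score, match = max(scored)
    return (match, score) if score >= cutoff else (None, score)
```

Returning the score alongside the match makes it easy to route borderline mappings to manual review, which fits the expert checking in Step 6.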
![Page 30: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/30.jpg)
Step 6: Evaluation: consistency checking
• Data property
• Manual check of sample reasoning results by domain experts
![Page 31: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/31.jpg)
Step 6: Evaluation: case study
• Drug target identification
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?target
from <http://chem2bio2rdf.org/owl#>
where {
  ?chemical rdfs:label ?drugName ;
            c2b2r:hasInteraction ?interaction .
  ?interaction c2b2r:hasTarget [ bp:name ?target ] ;
               c2b2r:drugTarget true .
  FILTER (str(?drugName) = "Troglitazone")
}
Annotated Chem2Bio2OWL
Mashed Chem2Bio2RDF
![Page 32: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/32.jpg)
Outline
• Introduction
• RDF (Chem2Bio2RDF)
• OWL (Chem2Bio2OWL)
• Graph mining (SLAP)
  – Semantic Link Association Prediction
![Page 33: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/33.jpg)
![Page 34: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/34.jpg)
Two objects are similar if they are related to similar objects
Examples: coauthorship; same target
![Page 35: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/35.jpg)
Two objects are related if they share the same objects or if their related objects are related
Figure: Compound 1 and Compound 2 linked through Protein 1 and Protein 2; Person 1 and Person 2 linked through Computer Science, paper 1 and paper 2 (edges: advisor, major, publish, cite, conference)
![Page 36: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/36.jpg)
Sample patterns
Figure: sample paths from Cmpd1 to Protein1 through Cmpd 2, Protein2, GO:0001, a side effect (hypertension) or a shared substructure; edge types: neighbor, chemogenomics, hasGO, PPI, side effect, substructure
![Page 37: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/37.jpg)
Figure: the neighborhood of a compound-target pair: Compound 1-3, Target 1-4, GO:00001, Side Effect 1 and Gene Family 1, connected by chemogenomics, neighbor, hasGO, hasSideEffect, hasGeneFamily and proteinProteinInteraction edges
Association depends on its neighborhood
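The slides do not show how the paths between a compound and a target are enumerated. Assuming links are treated as undirected and paths are limited to three edges, a depth-first search can list every simple path and summarize each one by its sequence of edge types, its "pattern". The graph below only echoes the figure's node and edge names and is hypothetical.

```python
from collections import Counter

# Minimal sketch: enumerate simple paths (<= max_len edges) between two nodes of a typed
# graph and count their edge-type patterns. The graph is a hypothetical example.
EDGES = [
    ("Compound1", "chemogenomics", "Target2"),
    ("Compound2", "chemogenomics", "Target2"),
    ("Compound2", "chemogenomics", "Target1"),
    ("Compound1", "neighbor", "Compound2"),
    ("Target2", "proteinProteinInteraction", "Target1"),
]

def adjacency(edges):
    """Undirected adjacency list of (relation, neighbor) pairs."""
    adj = {}
    for s, rel, o in edges:
        adj.setdefault(s, []).append((rel, o))
        adj.setdefault(o, []).append((rel, s))
    return adj

def paths(adj, source, target, max_len=3):
    """Yield each simple path as a list of (relation, node) steps."""
    stack = [(source, [], {source})]
    while stack:
        node, steps, visited = stack.pop()
        if node == target and steps:
            yield steps
            continue
        if len(steps) == max_len:
            continue
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                stack.append((nxt, steps + [(rel, nxt)], visited | {nxt}))

def pattern_counts(adj, source, target, max_len=3):
    """Count paths by their tuple of edge types."""
    return Counter(tuple(rel for rel, _ in p) for p in paths(adj, source, target, max_len))
```

Here Compound1 reaches Target1 by four paths with four distinct patterns, e.g. (neighbor, chemogenomics): a similar compound hits the target.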
![Page 38: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/38.jpg)
![Page 39: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/39.jpg)
Statistical model: convert the question to a path-surfing problem
Figure: Gene i and Gene j in a network with PPI, hasGO, hasPathway and chemogenomics edges; the probability of surfing from Gene i to Gene j is P(i → j) = 1/3
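One reading consistent with P(i → j) = 1/3 is a uniform random surfer: from a node, each neighbor is chosen with probability 1/degree. The Python sketch below encodes that assumption with a hypothetical neighbor list in which Gene i has three neighbors; the slides do not confirm that this is the exact SLAP rule.

```python
from fractions import Fraction

# Minimal sketch, assuming a uniform random surfer: P(i -> j) = (#links i-j) / degree(i).
# Neighbor lists are hypothetical, chosen so that Gene i has three neighbors.
NEIGHBORS = {
    "Gene i": ["Gene j", "GO term", "Pathway"],
    "Gene j": ["Gene i", "Compound"],
}

def transition_probability(neighbors, i, j):
    """Probability of stepping from i to j when all of i's links are equally likely."""
    links = neighbors.get(i, [])
    return Fraction(links.count(j), len(links)) if links else Fraction(0)
```

Using `Fraction` keeps results exact, so the value for Gene i to Gene j comes out as exactly 1/3.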
![Page 40: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/40.jpg)
Figure: a path from source Cmpd1 (s) through Protein2 to target Protein1 (t), along edges e1 and e2
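A natural way to score a path s → e1 → e2 → t under the random-surfer view is to multiply the transition probability of each step. The slides show the path but not the formula, so this Python sketch is an assumption, with hypothetical node degrees and uniform transitions.

```python
from fractions import Fraction
from math import prod

# Minimal sketch, assuming uniform transitions: a path's probability is the product
# of 1/degree(u) over every node u it leaves. Degrees are hypothetical.
DEGREE = {"Cmpd1": 4, "Protein2": 5}

def path_probability(path, degree):
    """path is a list of nodes [s, ..., t]; each step leaves node u with probability 1/degree[u]."""
    return prod(Fraction(1, degree[u]) for u in path[:-1])
```

With these degrees, the path Cmpd1 → Protein2 → Protein1 scores 1/4 × 1/5 = 1/20; longer paths through hub nodes are penalized automatically.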
![Page 41: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/41.jpg)
• Randomly sample 100,000 drug-target pairs
• Yielding 453,087 paths and 17 patterns
Pattern samples:
Pattern distribution
![Page 42: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/42.jpg)
Statistical model, step 3: estimating node association
The raw scores of random pairs fit a normal distribution!
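Since raw scores of random pairs look normal, a query pair's raw score can be converted into a z-score and a one-sided p-value against that null distribution. A minimal Python sketch using `statistics.NormalDist` (Python 3.8+); the function name and the random-pair scores in the test are placeholders, not data from the study.

```python
from statistics import NormalDist, mean, stdev

# Minimal sketch: significance of a raw association score against a normal fit to
# random-pair scores. Inputs are placeholders, not data from the study.

def significance(raw_score, random_scores):
    """Return (z-score, one-sided p-value) of raw_score under the fitted null."""
    null = NormalDist(mean(random_scores), stdev(random_scores))
    z = (raw_score - null.mean) / null.stdev
    p = 1.0 - null.cdf(raw_score)
    return z, p
```

A pair scoring at the random-pair mean gets z = 0 and p = 0.5, while scores far in the right tail get small p-values and rank as likely associations.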
![Page 43: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/43.jpg)
• Direct: drug-target pairs with IC50 < 30 µM
• Indirect: drug-target pairs with no interaction
• Random: random pairs
![Page 44: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/44.jpg)
![Page 45: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/45.jpg)
SLAP interface
![Page 46: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/46.jpg)
Acknowledgement
• Cheminformatics/Chemogenomics Group (Dr. David Wild, Indiana University)
  – Xiao Dong, Huijun Wang, Dazhi Jiao, Dr. Qian Zhu, Madhuvanthi Sankaranarayanan, Jaehong Shin
• Semantic Web Lab (Dr. Ying Ding, Indiana University)
  – Yuyin Sun, Bing He, Shanshan Chen
• High performance computing (Indiana University)
  – Jong Youl Choi
• Pfizer CS COE (Dr. Eric Gifford)
![Page 47: Towards semantic systems chemical biology](https://reader033.fdocuments.in/reader033/viewer/2022051413/554e93b3b4c90526358b4fc3/html5/thumbnails/47.jpg)
Thanks!