By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009
![Page 1: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/1.jpg)
By Xianfeng (Jeff) Chen
Computational and Systems Biologist
May 7, 2009
Bioinformatics Cyber-infrastructure for Genomics and Proteomics in Systems Biology
![Page 2: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/2.jpg)
Agenda Today
(1) Cyber-infrastructure and systems biology.
(2) High performance computing and software for peptide/protein identification and quantification, data mining/target discovery, on mass spectrometry generated proteomics data.
(3) Relational database management system, genome annotation methodology, systems biology data integration, biology knowledge generation and augmentation.
![Page 3: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/3.jpg)
Section One: Cyber-infrastructure and Systems Biology
Reductionist approach: one gene, one protein
Systems approach: multiple genes, network analysis
Cutting edge science and technology
![Page 4: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/4.jpg)
Status of Technologies in Systems Biology
![Page 5: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/5.jpg)
Cyber-infrastructure for Systems Biology
• “…. build new types of scientific and engineering knowledge environments and organizations to pursue research in new ways and with increased efficacy.
• …..new NSF funding of $1 billion per year is needed to achieve critical mass …….
2008: Awarded $50 million
http://www.communitytechnology.org/nsf_ci_report/
2004: Awarded $100 million
2004: Awarded $85 million
![Page 6: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/6.jpg)
Supporting Cyber-infrastructure and Systems Biology Workflow
Historic strong area
Supporting
![Page 7: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/7.jpg)
(DOE - Genomics: GTL Roadmap, p.52)
Cyber-knowledge System to Enable Genomics-based Predictive Medicine
![Page 8: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/8.jpg)
System Integration at Systems Biology Center
Core Laboratory Facility: Data Generation
Core Computational Facility: Data Processing, Storage, and Dissemination
Cyber-infrastructure, Data Management, Data Analysis Pipeline, and Data Display
(1) LIMS for raw data & protocol
(2) Preprocessed data management
(3) High throughput computing
(4) Data validation and integration
(5) Knowledge representation
Data Mining and Knowledge Discovery
![Page 9: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/9.jpg)
Cyber-infrastructure Component (1): High Performance Computing
--- Migration of Bio-Computing Capability
Start point: PC single-CPU computing (most labs)
Step 1: Unix multiple-CPU computing (5-10 biological labs in US)
Step 2: Cluster computing (2-4 biological labs)
For large sets of data analysis
![Page 10: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/10.jpg)
Cyber-infrastructure Component (2) : Integrated Knowledgebase System
--- Case Study of National Biodefense Proteomics Data Center
![Page 11: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/11.jpg)
---- System Integration Case 1: UVa Proteomics Data Center
Data Management
Public File Server
Private File Server
Oracle Relational Database
Database query, data upload over HTTP
Batch Processing: (1) Data uploading; (2) Data validation; (3) Data analysis; (4) Data processing
Perl, Java
Web services: data exchange using XML-based SOAP
High Performance and Throughput Computing
Section Two: High Performance Computing and Proteomics
![Page 12: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/12.jpg)
Computational Proteomics Software and Algorithms
Protein Database Search Engines
• Mascot (Matrix Science)
• Sequest / Bioworks (Scripps/Thermo)
• X! Tandem (the GPM)
• Spectrum Mill (Agilent Technologies)
• OMSSA (NCBI)
• PEAKS (Bioinformatics Solutions Inc.)
• Phenyx (GeneBio)
Statistical Validation and Quantitation
• PeptideProphet (Institute for Systems Biology)
• ProteinProphet (Institute for Systems Biology)
• ASAPRatio, XPRESS, Libra (Institute for Systems Biology)
• Scaffold Batch System (Proteome Software, Inc.)
• SIEVE (Thermo)
• Census (Scripps Research Institute)
Open Data Standards
• FuGE and XAR (FHCRC, ICBC, ITMAT, & Manchester)
• MIAPE (HUPO PSI and Collaborators)
• mzXML, pepXML, protXML (Institute for Systems Biology)
• MS1, MS2, SQT (Scripps Research Institute)
Many more...
![Page 13: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/13.jpg)
System Integration Case 2: National Biodefense Proteomics Data Center
http://www.proteomicsresource.org
Awarded $14 million
![Page 14: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/14.jpg)
Proteomics Research Centers (PRC) and Their Major Data Types
PRC Organization: Major Data Types
(1) University of Michigan: Microarray and mass spectrometry
(2) Caprion Pharmaceuticals: Mass spectrometry
(3) Harvard Proteomics Institute: Genomics and protein expression array
(4) Albert Einstein College of Medicine: Mass spectrometry
(5) PNNL: Mass spectrometry
(6) Scripps: NMR structural data, X-ray crystal diffraction data, and mass spectrometry
(7) Myriad Genetics: Yeast two-hybrid system
![Page 15: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/15.jpg)
Proteomics Data Flow
PRCs, VBI, Public
Data Sources
2D Gels, Protein Array, LC, Immunoaffinity Purification, Y2H, MS, MS/MS, NMR, X-Ray Cryoelectron Microscopy, X-Ray Diffraction, etc.
Data Types
QA & QC (Quality Assurance & Quality Control)
Converting to Standard Format
Standard Format for Each Data Type
QA & QC (Quality Assurance & Quality Control)
Data Modeling / Decomposition
Relational Database
MIAME and MIAPE-like Standards/SOP for Data Submission
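One possible reading of the data flow above, as a minimal Python sketch: submitted records pass a QA/QC gate, are converted to a standard form, pass a second gate, and are then decomposed into relational tables. Everything specific here is an assumption made for illustration: the record fields, the QC rules, the two-table schema, and the use of SQLite as a stand-in for the relational database. It is not the data center's actual pipeline.

```python
import sqlite3

# Hypothetical raw submission from a research center; field names are illustrative only.
RAW = [
    {"center": "PRC-1", "type": "MS/MS", "sample": "S1", "peptide": "lvneltefak".replace(" ", ""), "score": "42.5"},
    {"center": "PRC-1", "type": "MS/MS", "sample": "S1", "peptide": "", "score": "17.0"},
]

REQUIRED = ("center", "type", "sample", "peptide", "score")

def passes_qc(rec):
    """First QA/QC gate (assumed rule): every required field must be non-empty."""
    return all(rec.get(key) for key in REQUIRED)

def to_standard(rec):
    """Convert to a made-up standard form: upper-case peptide, numeric score."""
    return {**rec, "peptide": rec["peptide"].upper(), "score": float(rec["score"])}

def load(records, conn):
    """Decompose standardized records into two related tables."""
    conn.execute(
        "CREATE TABLE sample (id INTEGER PRIMARY KEY, center TEXT, name TEXT, "
        "UNIQUE(center, name))"
    )
    conn.execute(
        "CREATE TABLE identification (sample_id INTEGER REFERENCES sample(id), "
        "peptide TEXT, score REAL)"
    )
    for r in records:
        conn.execute(
            "INSERT OR IGNORE INTO sample (center, name) VALUES (?, ?)",
            (r["center"], r["sample"]),
        )
        (sample_id,) = conn.execute(
            "SELECT id FROM sample WHERE center = ? AND name = ?",
            (r["center"], r["sample"]),
        ).fetchone()
        conn.execute(
            "INSERT INTO identification VALUES (?, ?, ?)",
            (sample_id, r["peptide"], r["score"]),
        )

conn = sqlite3.connect(":memory:")
standard = [to_standard(r) for r in RAW if passes_qc(r)]
# Second QA/QC gate (assumed rule): scores must be non-negative after conversion.
standard = [r for r in standard if r["score"] >= 0]
load(standard, conn)
n_rows = conn.execute("SELECT COUNT(*) FROM identification").fetchone()[0]
```

The record with the empty peptide is rejected at the first gate, so only one identification reaches the database.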
![Page 16: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/16.jpg)
Proteomics Database Architecture
![Page 17: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/17.jpg)
Search By Experiment/Sample
Databases in Proteomics Data Center
![Page 18: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/18.jpg)
• Annotation improvement and interaction network analysis
(1) Non-homology-based methods: phylogenetic profiling, Rosetta stone pattern, operon analysis, co-expression profiling, gene neighboring, etc.
(2) Comparative genomics with reference genomes: E. coli, yeast, Arabidopsis, and other model organisms.
• Identifying anchor points for data integration
(1) Known metabolic pathway;
(2) Known signal transduction pathway;
(3) Known gene regulation machinery;
(4) Known protein-protein interaction map.
Strategies for Annotating Raw Data into Meaningful Knowledge
BMC Bioinformatics 2006, 7 (Suppl 4):S18
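Of the non-homology methods listed on this slide, phylogenetic profiling is easy to illustrate: genes whose presence/absence patterns across genomes are nearly identical are nominated as functionally linked. The sketch below is a toy version under stated assumptions. The gene names, the five-genome profiles, and the mismatch threshold are all invented for illustration, and real analyses typically use many more genomes and a statistical score rather than a raw Hamming distance.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene across five reference genomes.
profiles = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (0, 1, 0, 0, 1),
    "geneD": (1, 0, 1, 0, 0),
}

def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))

def linked_pairs(profiles, max_mismatch=1):
    """Gene pairs whose profiles differ in at most max_mismatch genomes."""
    return sorted(
        (g, h)
        for g, h in combinations(sorted(profiles), 2)
        if hamming(profiles[g], profiles[h]) <= max_mismatch
    )

pairs = linked_pairs(profiles)
```

With these toy profiles, geneA, geneB, and geneD are linked to one another, while geneC, which has the opposite pattern, is linked to none.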
![Page 19: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/19.jpg)
Qualitative Data Integration and Knowledge Augmentation Based on Networks Biology
![Page 20: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/20.jpg)
Quantitative Proteome Profiling
--- The field is 2-3 years old
Thermo SIEVE Scatter Plot of 14 UVa Raw Files for Validation of Data Quality and Absolute Quantification.
Scaffold: semi-quantification of the proteome from spectral counts.
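Spectral counts are only semi-quantitative because longer proteins yield more peptides and therefore more spectra. One commonly used correction is the normalized spectral abundance factor (NSAF), which divides each count by protein length and then normalizes within the sample. The sketch below is an illustration with made-up counts and lengths; the slides do not say which normalization Scaffold applies, so this should not be read as a description of that software.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    SAF = spectral count / protein length; NSAF = SAF / sum of all SAFs.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}

# Hypothetical spectral counts and protein lengths (residues).
counts = {"P1": 30, "P2": 10, "P3": 10}
lengths = {"P1": 300, "P2": 100, "P3": 200}

values = nsaf(counts, lengths)
```

Here P1 has three times the raw count of P2 but is also three times as long, so both end up with the same NSAF.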
![Page 21: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/21.jpg)
Search Engine Comparison at UVa Proteomics Data Center (1)
Few common annotations
Low annotation rates
![Page 22: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/22.jpg)
Peptide/Protein Identifications with Various Protein Database Search Engines (2)
X!Tandem missed
OMSSA missed
Sequest over-predicted
![Page 23: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/23.jpg)
UVaPDC, MS/MS Search Engine Comparison (3)
Spectra counts
Common annotations
Statistics on confidence values
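The kind of comparison summarized on these three slides (common annotations, identifications one engine missed, identifications only one engine reported) reduces to set operations over each engine's identification list. The sketch below uses invented peptide identifiers, not the UVa results, and assumes identifications have already been filtered to a comparable confidence level.

```python
# Hypothetical sets of confidently identified peptides per search engine.
ids = {
    "Sequest": {"PEP1", "PEP2", "PEP3", "PEP4"},
    "X!Tandem": {"PEP1", "PEP2"},
    "OMSSA": {"PEP1", "PEP3"},
}

# Identifications every engine agrees on.
common = set.intersection(*ids.values())

# Everything reported by at least one engine.
union = set.union(*ids.values())

# Identifications an engine did not report although another engine did.
missed = {engine: union - found for engine, found in ids.items()}

# Identifications reported by exactly one engine (candidates for over-prediction).
unique = {
    engine: found - set.union(*(other for name, other in ids.items() if name != engine))
    for engine, found in ids.items()
}
```

An identification that appears only in one engine's `unique` set is not necessarily wrong; it is simply the first place to look when one engine appears to over-predict.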
![Page 24: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/24.jpg)
Statistics and Summarization Capability of Scaffold
--- The best feature of the software
![Page 25: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/25.jpg)
![Page 26: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/26.jpg)
Data Mining on Data Processed via Computational Approach
Knowledge-based Discovery
![Page 27: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/27.jpg)
Identified
Identified
Rate-limiting step
Knowledge Inference
Knowledge Inference
Inference on Gene Network in Systems Biology
(1) Y2H, (2) MS pull down assay, (3) Co-expression assay.
Where are the significant regulatory steps impacting pathway expression ?
Target/lead protein
![Page 28: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/28.jpg)
Urinary Biomarker Identification --- EGFR Pathway Related Bladder Cancer
----- Small scale analysis
[Pathway diagram labels: EGFR, Mucin-4*, EPS8L1* or EPS8L2*, EPS15, EDH1, NRas*, GDP, GTP, Raf, MAPK, Gα*, Gβ, Gγ, P, Adenylate Cyclase, ATP, cAMP, Cell Proliferation, MP Formation]
* Differentially expressed
[Workflow diagram labels: Patient with Bladder Cancer, Healthy Individual, Urine, Urine Microparticles, Exosomes, Ectosomes, LC-MS/MS, SEQUEST, Spectral Count Analysis, Western Blotting, EPS8L2]
![Page 29: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/29.jpg)
Pattern Matching on Gene Signatures at Various Biological States
--- Large-scale analysis
*** Query signatures are compared to reference gene/protein expression signatures for known perturbations or disease phenotypes (many-to-many association analysis).
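A many-to-many signature comparison of this kind can be sketched as scoring every query signature against every reference signature and reporting the best match. The example below uses cosine similarity over a shared gene order. The similarity measure, the gene list, and all expression values are assumptions chosen for illustration; the slide does not specify the scoring method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors over the same gene order."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical log fold-changes for four genes (g1..g4) in each signature.
references = {
    "perturbation_X": [2.0, -1.0, 0.5, 0.0],
    "disease_Y": [-2.0, 1.0, -0.5, 0.0],
}
queries = {
    "query_1": [1.8, -0.9, 0.4, 0.1],
    "query_2": [-1.0, 0.5, -0.2, 0.0],
}

def best_matches(queries, references):
    """For each query signature, the name of the most similar reference signature."""
    return {
        q: max(references, key=lambda r: cosine(q_vec, references[r]))
        for q, q_vec in queries.items()
    }

matches = best_matches(queries, references)
```

Cosine similarity ignores overall magnitude, so query_2 still matches disease_Y even though its fold-changes are about half as large.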
![Page 30: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/30.jpg)
Section Three : Knowledge Base Establishment
Database Case 1: Soybean Upstream Regulatory Elements for Ongoing Regulatory Motif Annotation
![Page 31: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/31.jpg)
Nominated Transcription Factor Involved in Stress Response
Group IX
Red Dot = Soybean ERF genes
Implicated in regulating wounding and jasmonate responses
Soybean Promoter :
GmERFs, Gmubis, Gmcons, GmWRKYs
more and more and more……..
10 promoters per month
Promoter
![Page 32: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/32.jpg)
Ongoing Effort on Transcription Factor Binding Motifs
---- Identify genetic circuits of cell wall, starch, and lipid biosynthesis and degradation
![Page 33: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/33.jpg)
Elucidation of Conserved Co-expression Networks via Data Integration with Expression Profiling Data
![Page 34: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/34.jpg)
(1) BMC Bioinformatics. 2007, 8:129.
(2) BMC Bioinformatics. 2008, 9:53.
Database Case 2: CGKB and TOBFAC Knowledge Bases
![Page 35: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/35.jpg)
Genome Annotation Strategy (1) : Homology-based Annotation
263,425 cowpea gene-space sequences (GSS) in total.
High level of coding region detection!
BMC Genomics. 2008, 9:103.
![Page 36: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/36.jpg)
Genome Annotation Strategy (2) : Metabolic Pathway Integration
BMC Bioinformatics. 2007, 8:129.
![Page 37: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/37.jpg)
Genome Annotation Strategy (3) : GO Integration with Distribution of Function Assignments
BMC Genomics. 2008, 9:103.
![Page 38: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/38.jpg)
Genome Annotation Strategy (4): Comparative Genomics at Genome-scale
BMC Genomics. 2008, 9:103.
---- Example of medicago vs cowpea
![Page 39: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/39.jpg)
Genome Annotation Strategy (5): Comparison at Gene Family Level
(1) BMC Genomics. 2008, 9:103.
(2) Plant Physiology. 2008, 147:280-295.
--- WRKY and CONSTANS (CO) and CO-like Gene Families of Cowpea Transcription Factors
![Page 40: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/40.jpg)
Genome Annotation Strategies: (6) Repeat, (7) Domain, (8) Gene Model
BMC Bioinformatics. 2007, 8:129.
Repeat
Domain
Gene Model
![Page 41: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/41.jpg)
Genome Annotation Strategy (9) : Comparative Genomics on Network for Conserved Protein Complexes
Comparative genome analysis
Conserved networks
![Page 42: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/42.jpg)
Published Protein-Protein Interactions (PPI) in Organisms
Example of Yeast PPI
![Page 43: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/43.jpg)
Genome Annotation Strategy (10): Functional Validation of Genes of Interest Through Reverse Genetics Program
My name
2008
![Page 44: By Xianfeng (Jeff) Chen Computational and Systems Biologist May 7, 2009](https://reader034.fdocuments.in/reader034/viewer/2022051623/5681587d550346895dc5deba/html5/thumbnails/44.jpg)
Acknowledgement