NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Molecular Science in NPACI
Russ B. Altman
NPACI Molecular Science Thrust
Stanford Medical Informatics
Stanford Computer Science
NPACI Site Visit
July 21-22, 1999
Overview
• Molecular Science vision and roadmap
• Molecular Science project accomplishments
• Alpha project: Bioinformatics Infrastructure for Large-Scale Analyses
• Overview of plans
Molecular Science Is Changing...
• The genome sequencing project gives us unprecedented access to biological molecular information
• New experimental technologies (gene arrays) giving new access to functional information
• Experiment & theory refining structural data
• Combinatorial chemistry allows design of molecules
• New paradigm: Collect the data first, then mine it later with hypotheses
Vision for Molecular Science Thrust
• Understand how fundamental molecular properties contribute to macroscopic phenomena in chemistry and biology.
• Simulate molecular dynamics for large systems (e.g., biological molecules).
• Port existing codes to parallel machines, test them, and apply to problems not currently within reach (CR, MS, PTE).
• Create databases for molecular systems to support exploratory analysis, hypothesis generation, communication, dissemination.
• Create and populate data schema for critical areas: Biological macromolecules, MD trajectories, quantum computations (DICE, PTE).
• Create visualization technologies for communication/analysis (IE, MS).
• Provide hardened tools to scientific community for use.
• Identify critical algorithms requiring HPC, implement on NPACI hardware.
• Conduct education, outreach, and training of scientists/students (EOT, IE, CR).
Molecular Science: Advancing understanding of biochemical structure and function
[Figure: roadmap timeline, 1999-2002. Labels recoverable from the diagram: Bioinformatics infrastructure; Bioinformatics; Large-scale molecular dynamics; Molecular dynamics; GenBank; PDB; Molecular Trajectory DB; CHARMM; AMBER; Algorithms (comparison, phylogeny, alignment, scanning); DICE; Federated data collections; Remote database analysis; Protein folding; Molecular chemistry; Enhanced molecular chemistry; Quantum chemistry; Transition states; Imaging algorithms. Some items are tagged IE or META.]
Projects and Accomplishments
• Biological Data Representation & Query (SDSC, Rutgers, Stanford, Washington U, U Texas)
  • All-vs.-all comparison of 3-D protein structures (SDSC)
  • Sites scanning code for 3-D features (Stanford)
  • Genetic algorithm code for large phylogenetic trees (U Texas)
  • CORBA for distributed access to ligand DB (Rutgers)
• Enhanced Biological Imaging (U Chicago, U Houston)
  • Port of “optimal line” code for EM reconstruction
Projects and Accomplishments
• Transition States in Complex Systems (UC Berkeley)
  • Wrapped CHARMM, AMBER, CPMD to oversample rare events
• Quantum Reaction Dynamics (Caltech)
  • Ported code for multi-atom reactions to HP Exemplar
• Management (Stanford)
  • New thrust management
  • Thrust meeting in September 1998
  • Two high-profile alpha projects (CHARMM, Analyses)
  • One strategic application collaboration (AMBER)
Alpha Projects in Molecular Science
• Rationale
  • Molecular science computing is, for the most part, workstation-based, and the uses of HPC are limited but critical:
    • Long-time-scale, accurate simulations
    • Large scans over data collections, both O(N) and O(N^2)
    • Global optimizations of structures, alignments, networks
  • The requirements for technology support for all are significant
    • Grid computing = metasystems
    • Movement of large amounts of data = data-intensive computing
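The difference between the O(N) and O(N^2) scans named above can be sketched as follows. This is a minimal illustration, not code from any NPACI project: the function names `linear_scan`, `all_vs_all`, and `pair_count` are hypothetical, and the predicate and score functions stand in for whatever site-detection or comparison method is actually used.

```python
from itertools import combinations


def linear_scan(items, predicate):
    """O(N): examine each item once and keep those that match."""
    return [item for item in items if predicate(item)]


def all_vs_all(items, score):
    """O(N^2): score every unordered pair of items exactly once."""
    return {(a, b): score(a, b) for a, b in combinations(items, 2)}


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

A linear scan over 1,000 entries makes 1,000 calls to the predicate, while an all-vs.-all run over the same entries makes `pair_count(1000)` = 499,500 comparisons, which is why the second kind of analysis outgrows a workstation much sooner.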
Bioinformatics Infrastructure for Large-Scale Analyses
• Need to construct prototype analyses
• Establish feasibility of doing analyses routinely
• Debug infrastructure for supporting analyses
• Provide templates for “copy-and-edit” duplication
Databases and Analyses
• PDB (SDSC, Stanford)
  • Linear scan searching for active sites
  • All-by-all comparisons for clustering
• GenBank (Washington U)
  • All-by-all comparison of sequences over set of alignment parameters, followed by clustering
  • Linear scan through results to find new relations
• Molecular Dynamics Trajectory DB (U Houston)
  • Linear scan through time cuts of trajectory to look for features of interest (e.g., form/unform active site)
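The "all-by-all comparison followed by clustering" pattern can be sketched on toy data. Everything here is an assumption for illustration: `identity` is a crude position-by-position match fraction standing in for a real alignment score, `cluster` uses single-linkage grouping via union-find, and the threshold plays the role of one setting from a sweep over alignment parameters. None of it is the method used by the projects listed above.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions; a toy stand-in for an alignment score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def cluster(seqs, threshold):
    """Single-linkage clustering: join any two sequences scoring >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-by-all pass: N * (N - 1) / 2 comparisons.
    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(seqs):
        groups.setdefault(find(i), []).append(seq)
    return sorted(groups.values())
```

Calling `cluster(["ACGT", "ACGA", "TTTT", "TTTA"], 0.75)` groups the first two and the last two sequences; rerunning with other thresholds mimics repeating the analysis over a set of parameters.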
Required Technologies
• Data-intensive Computing
  • Robust connection to computational grid (Legion)
  • Language for describing data schema to SRB
  • Strategies for moving large amounts of data to NPACI CPUs
• Metasystems
  • Registration of key algorithms within Legion for platforms
  • Robust connection to large data stores (SRB)
  • Reusable scripts for running analyses
Bioinformatics Infrastructure for Large Analyses
Goal: Create reusable templates and demonstrate value
[Figure: roadmap timeline, 1999-2002. Critical databases enabled for grid computing: PDB in SRB; GenBank in SRB; MDTDB in SRB; GeneArrayDB in SRB. Algorithms: Protein Analysis in Legion, O(N); Sequence Analysis in Legion, O(N^2); Phylogeny programs in Legion, O(N^2). Outcomes: full-scale runs of algorithms on databases; templates for large-scale O(N) and O(N^2) analyses; report and evangelize to scientific community.]
FY00 Milestones
• Connect SRB data model to PDB schema (XML)
• Connect SRB data model to GenBank (XML)
• Register linear PDB algorithms in Legion
• Register sequence algorithms for GenBank
• Analyze scheduling challenges for linear scans and all-vs.-all analyses
• Run linear scans on PDB and all-by-all on subset of GenBank
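One scheduling challenge for all-vs.-all analyses can be sketched in a few lines. If entry i is compared only with entries after it, row i of the comparison matrix costs n - 1 - i comparisons, so handing each worker a contiguous block of rows is unbalanced: for n = 8 and two workers, rows 0-3 cost 22 comparisons and rows 4-7 cost 6. The sketch below uses a greedy "largest job to least-loaded worker" rule. It is an illustrative assumption, not the scheduler used by Legion, and `balance_rows` is a hypothetical name.

```python
import heapq


def balance_rows(n, workers):
    """Assign rows of an upper-triangular all-vs.-all run to workers.

    Row i costs n - 1 - i comparisons. Rows are visited in descending
    cost order and each goes to the currently least-loaded worker.
    Returns (rows per worker, comparison count per worker).
    """
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(workers)]
    for i in range(n):
        load, w = heapq.heappop(heap)
        assignment[w].append(i)
        heapq.heappush(heap, (load + (n - 1 - i), w))
    loads = [sum(n - 1 - i for i in rows) for rows in assignment]
    return assignment, loads
```

Linear scans do not have this problem when every item costs about the same, since any even split of the items is then already balanced; uneven item sizes would call for the same kind of load-aware assignment.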
FY01 Milestones
• Connect SRB model to MDTDB
• Run full GenBank all-vs.-all analyses and analysis of MD trajectories
• Register phylogenetic algorithms with Legion
• Optimize analyses with improved scheduling
• Report results to computational science community
• Evangelize capabilities to computational science community
Benefits
• Novel science enabled
  • Comprehensive scans of 3-D structure for functional sites
  • Bird’s-eye understanding of sequence space
  • Improved understanding of protein dynamics
  • Most comprehensive phylogenetic trees ever constructed
• Capabilities made routine and widely available
  • Templates for experiments made available
  • Time, space estimates for computations for those making allocation requests
  • In-house expertise at making these work
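A time and space estimate of the kind mentioned above could be produced by timing a small sample and extrapolating. The sketch below is a back-of-the-envelope illustration only: `time_per_pair` and `estimate_all_vs_all` are hypothetical names, the wall-clock figure assumes perfect scaling across CPUs, and the storage figure assumes one fixed-size result is kept per pair.

```python
import time


def time_per_pair(compare, sample_pairs):
    """Average seconds per comparison, measured on a small sample of pairs."""
    start = time.perf_counter()
    for a, b in sample_pairs:
        compare(a, b)
    return (time.perf_counter() - start) / len(sample_pairs)


def estimate_all_vs_all(n, seconds_per_pair, bytes_per_result, cpus=1):
    """Extrapolate cost of an all-vs.-all run over n items."""
    pairs = n * (n - 1) // 2
    cpu_hours = pairs * seconds_per_pair / 3600.0
    return {
        "pairs": pairs,
        "cpu_hours": cpu_hours,
        "wall_hours": cpu_hours / cpus,  # assumes perfect parallel scaling
        "storage_bytes": pairs * bytes_per_result,
    }
```

For example, `estimate_all_vs_all(1000, 0.001, 8, cpus=10)` reports 499,500 pairs and 3,996,000 bytes of results; real runs will deviate from the time figures because of I/O, scheduling overhead, and uneven comparison costs.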