NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

Molecular Science in NPACI

Russ B. Altman
NPACI Molecular Science Thrust

Stanford Medical Informatics
Stanford Computer Science

NPACI Site Visit
July 21-22, 1999

Overview

• Molecular Science vision and roadmap
• Molecular Science project accomplishments
• Alpha project: Bioinformatics Infrastructure for Large-Scale Analyses
• Overview of plans

Molecular Science Is Changing...

• The genome sequencing project gives us unprecedented access to biological molecular information
• New experimental technologies (gene arrays) are giving new access to functional information
• Experiment and theory are refining structural data
• Combinatorial chemistry allows design of molecules
• New paradigm: Collect the data first, then mine it later with hypotheses

Vision for Molecular Science Thrust

• Understand how fundamental molecular properties contribute to macroscopic phenomena in chemistry and biology.
• Simulate molecular dynamics for large systems (e.g., biological molecules).
• Port existing codes to parallel machines, test them, and apply to problems not currently within reach (CR, MS, PTE).
• Create databases for molecular systems to support exploratory analysis, hypothesis generation, communication, dissemination.
• Create and populate data schema for critical areas: Biological macromolecules, MD trajectories, quantum computations (DICE, PTE).
• Create visualization technologies for communication/analysis (IE, MS).
• Provide hardened tools to scientific community for use.
• Identify critical algorithms requiring HPC, implement on NPACI hardware.
• Conduct education, outreach, and training of scientists/students (EOT, IE, CR).

Molecular Science: Advancing understanding of biochemical structure and function

[Roadmap figure, 1999-2002. Labels shown: IE; META; Bioinformatics infrastructure; Bioinformatics; GenBank; PDB; Molecular Trajectory DB; DICE; Federated data collections; Remote database analysis; Algorithms (comparison, phylogeny, alignment, scanning); Imaging algorithms; Large-scale molecular dynamics; Molecular dynamics; CHARMM; AMBER; Protein folding; Enhanced molecular chemistry; Molecular chemistry; Quantum chemistry; Transition states.]

Projects and Accomplishments

• Biological Data Representation & Query (SDSC, Rutgers, Stanford, Washington U, U Texas)
  • All-vs.-all comparison of 3-D protein structures (SDSC)
  • Site-scanning code for 3-D features (Stanford)
  • Genetic algorithm code for large phylogenetic trees (U Texas)
  • CORBA for distributed access to ligand DB (Rutgers)
• Enhanced Biological Imaging (U Chicago, U Houston)
  • Port of “optimal line” code for EM reconstruction

Projects and Accomplishments

• Transition States in Complex Systems (UC Berkeley)
  • Wrapped CHARMM, AMBER, CPMD to oversample rare events
• Quantum Reaction Dynamics (Caltech)
  • Ported code for multi-atom reactions to HP Exemplar
• Management (Stanford)
  • New thrust management
  • Thrust meeting in September 1998
  • Two high-profile alpha projects (CHARMM, Analyses)
  • One strategic application collaboration (AMBER)

Alpha Projects in Molecular Science

• Rationale
  • Molecular science computing is, for the most part, workstation-based, and the uses of HPC are limited but critical:
    • Long-time-scale, accurate simulations
    • Large scans over data collections, both O(N) and O(N^2)
    • Global optimizations of structures, alignments, networks
  • The requirements for technology support for all are significant
    • Grid computing = metasystems
    • Movement of large amounts of data = data-intensive computing
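
To make the O(N) versus O(N^2) distinction concrete, the following minimal sketch counts the work units for a linear scan and for an all-vs.-all comparison, then converts them to single-CPU hours. The function names, the collection size, and the per-item cost are illustrative assumptions, not figures from the talk.

```python
def linear_scan_ops(n: int) -> int:
    """Work units for an O(N) scan: one evaluation per entry."""
    return n


def all_vs_all_ops(n: int) -> int:
    """Work units for an O(N^2) comparison: each unordered pair once."""
    return n * (n - 1) // 2


def cpu_hours(ops: int, seconds_per_op: float) -> float:
    """Convert a count of work units into single-CPU hours."""
    return ops * seconds_per_op / 3600.0


if __name__ == "__main__":
    n = 10_000   # hypothetical collection size (assumption)
    cost = 0.5   # hypothetical seconds per evaluation or comparison (assumption)
    for label, ops in (("linear scan", linear_scan_ops(n)),
                       ("all-vs.-all", all_vs_all_ops(n))):
        print(f"{label}: {ops} work units, {cpu_hours(ops, cost):.1f} CPU-hours")
```

Because the pair count grows as N(N-1)/2, doubling the collection roughly quadruples the all-vs.-all cost, which is why such scans are singled out as HPC workloads.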

Bioinformatics Infrastructure for Large-Scale Analyses

• Need to construct prototype analyses
  • Establish feasibility of doing analyses routinely
  • Debug infrastructure for supporting analyses
  • Provide templates for “copy-and-edit” duplication

Databases and Analyses

• PDB (SDSC, Stanford)
  • Linear scan searching for active sites
  • All-by-all comparisons for clustering
• GenBank (Washington U)
  • All-by-all comparison of sequences over set of alignment parameters, followed by clustering
  • Linear scan through results to find new relations
• Molecular Dynamics Trajectory DB (U Houston)
  • Linear scan through time cuts of trajectory to look for features of interest (e.g., form/unform active site)
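
The two analysis patterns named on this slide can be sketched generically as follows. This is a toy illustration under stated assumptions: `linear_scan`, `all_by_all_clusters`, and `identity_fraction` are hypothetical names, the similarity measure is a stand-in for a real structure or sequence comparison, and single-linkage clustering is only one possible choice of clustering method.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def linear_scan(entries: Sequence[T], has_feature: Callable[[T], bool]) -> List[int]:
    """O(N): test every entry once and return the indices of the hits."""
    return [i for i, entry in enumerate(entries) if has_feature(entry)]


def all_by_all_clusters(entries: Sequence[T],
                        similarity: Callable[[T, T], float],
                        threshold: float) -> List[List[int]]:
    """O(N^2): compare every unordered pair, then single-linkage cluster."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if similarity(entries[i], entries[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(entries)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def identity_fraction(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions (not a real alignment)."""
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


if __name__ == "__main__":
    seqs = ["ACDEFG", "ACDEFH", "WWWWWW", "WWWWWY", "KLMNPQ"]  # made-up data
    print(linear_scan(seqs, lambda s: "W" in s))
    print(all_by_all_clusters(seqs, identity_fraction, 0.8))
```

The linear scan touches each entry independently, so it parallelizes trivially; the all-by-all step needs every pair, which is what makes its scheduling and data movement harder.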

Required Technologies

• Data-intensive Computing
  • Robust connection to computational grid (Legion)
  • Language for describing data schema to SRB
  • Strategies for moving large amounts of data to NPACI CPUs
• Metasystems
  • Registration of key algorithms within Legion for platforms
  • Robust connection to large data stores (SRB)
  • Reusable scripts for running analyses

Bioinformatics Infrastructure for Large Analyses

Goal: Create reusable templates and demonstrate value

[Timeline figure, 1999-2002. Critical databases enabled for grid computing: PDB in SRB, GenBank in SRB, MDTDB in SRB, GeneArrayDB in SRB. Algorithms: protein analysis in Legion, O(N); sequence analysis in Legion, O(N^2); phylogeny programs in Legion, O(N^2). Also shown: full-scale runs of algorithms on databases; templates for large-scale O(N) and O(N^2) analyses; report and evangelize to scientific community.]

FY00 Milestones

• Connect SRB data model to PDB schema (XML)
• Connect SRB data model to GenBank (XML)
• Register linear PDB algorithms in Legion
• Register sequence algorithms for GenBank
• Analyze scheduling challenges for linear scans and all-vs.-all analyses
• Run linear scans on PDB and all-vs.-all on a subset of GenBank
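
One scheduling challenge for all-vs.-all analyses is splitting the pair space into independent work units. The sketch below shows one common approach, block decomposition of the upper triangle with round-robin assignment to workers. It is an illustration only: the function names are hypothetical, and nothing here reflects the actual Legion scheduler or its API.

```python
from typing import Iterator, List, Tuple

Block = Tuple[range, range]


def pair_blocks(n: int, block_size: int) -> Iterator[Block]:
    """Yield (rows, cols) index blocks that together cover all pairs i < j."""
    starts = range(0, n, block_size)
    for r in starts:
        rows = range(r, min(r + block_size, n))
        for c in starts:
            if c < r:
                continue  # the lower triangle mirrors the upper one
            yield rows, range(c, min(c + block_size, n))


def pairs_in_block(block: Block) -> List[Tuple[int, int]]:
    """Expand one block into the (i, j) pairs a worker would compare."""
    rows, cols = block
    return [(i, j) for i in rows for j in cols if i < j]


def assign_round_robin(blocks: List[Block], workers: int) -> List[List[Block]]:
    """Deal blocks out to workers in turn (a deliberately naive policy)."""
    schedule: List[List[Block]] = [[] for _ in range(workers)]
    for k, block in enumerate(blocks):
        schedule[k % workers].append(block)
    return schedule


if __name__ == "__main__":
    blocks = list(pair_blocks(10, 4))  # 10 entries, blocks of 4 (assumptions)
    for worker, assigned in enumerate(assign_round_robin(blocks, 4)):
        print(worker, [len(pairs_in_block(b)) for b in assigned])
```

Diagonal blocks contain fewer pairs than off-diagonal ones, so even this simple decomposition produces uneven work units, one reason the milestone treats scheduling as a problem to analyze rather than a solved step.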

FY01 Milestones

• Connect SRB model to MDTDB
• Run full GenBank all-vs.-all analyses and analysis of MD trajectories
• Register phylogenetic algorithms with Legion
• Optimize analyses with improved scheduling
• Report results to computational science community
• Evangelize capabilities to computational science community

Bioinformatics Infrastructure for Large Analyses

Goal: Create reusable templates and demonstrate value

[Timeline figure, 1999-2002. Critical databases enabled for grid computing: PDB in SRB, GenBank in SRB, MDTDB in SRB, GeneArrayDB in SRB. Algorithms: protein analysis in Legion, O(N); sequence analysis in Legion, O(N^2); phylogeny programs in Legion, O(N^2). Also shown: full-scale runs of algorithms on databases; templates for large-scale O(N) and O(N^2) analyses; report and evangelize to scientific community.]

Benefits

• Novel science enabled
  • Comprehensive scans of 3-D structure for functional sites
  • Bird’s-eye understanding of sequence space
  • Improved understanding of protein dynamics
  • Most comprehensive phylogenetic trees ever constructed
• Capabilities made routine and widely available
  • Templates for experiments made available
  • Time, space estimates for computations for those making allocation requests
  • In-house expertise at making these work