TERASCALE LINUX CLUSTERS:
SUPERCOMPUTING SOLUTIONS FOR LIFE SCIENCES
MARCH 27, 2003
Dr. Erwin Frise, Berkeley National Laboratory
Dr. Padmanabhan Iyer, Linux Networx
“It took millions of times more computing power to map the human genome than it did to land a man on the moon, and that’s only a fraction of what’s needed right now. The biggest challenge in biology is going to be computing.”
Dr. J. Craig Venter, Chairman of the Board,
The Institute for Genomic Research, Dec. 2001
The Need For Computing
Clusters: What and Why?
What? A collection of computers networked together to perform a particular application in parallel.
Why?
- Clusters can be built to offer performance matching the needs of virtually any application (scalability).
- Clusters can be built out of commodity components (cost effectiveness).
Role of Your Cluster Vendor
- Expertise with cluster management: lets you concentrate on the complexities of your models, not your computer system.
- Expertise with turnkey solutions: open-source distribution, proven compatibility with life sciences applications.
- One-stop support.
GFLOPS to TFLOPS … in 8 Years!
[Timeline graphic, recovered:]
- 1994: Wiglaf cluster (NASA Goddard Space Flight Center), a Beowulf-class cluster built from COTS components; 1 GigaFLOPS (1 billion floating-point operations per second).
- 1996–1997: Hrothgar cluster; 10 GigaFLOPS by joining 2 clusters.
- 1998–1999: Alta Cluster, among the 1st commercial Linux clusters (Linux NetworX, Brookhaven NL, Los Alamos NL); 250 GigaFLOPS.
- 2000–2001: Lawrence Livermore NL PCR Cluster; 1.5 TeraFLOPS (trillions of floating-point operations per second).
- 2002: LLNL MCR Cluster (LNXI Evolocity™ clusters); 11 TeraFLOPS.
Numbers overwhelm size: the trend is to clustered supercomputers.
Clusters are ideally suited for many life science applications:
- High-throughput screening runs involving multiple, repetitive sequence comparisons on vast amounts of data.
- Data or queries can be partitioned and dispatched to the n nodes of a cluster for a direct n-fold speedup and gain in throughput.
- By employing standard parallelization tools such as MPI, even the most compute-intensive fine-grain applications (e.g., molecular chemistry) can be deployed on a cluster equipped with a high-performance interconnect fabric.
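The n-way partitioning described above can be sketched in a few lines. This is a minimal illustration, not code from the talk; the function and variable names are invented for the example.

```python
# Hypothetical sketch: split a batch of query sequences into n chunks,
# one per cluster node, for an n-fold gain in throughput.

def make_chunks(items, n):
    """Partition `items` into n nearly equal contiguous chunks."""
    size, rem = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `rem` chunks absorb one extra item each.
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

# Toy batch of 10 query sequences dispatched to 4 nodes.
queries = [f"seq{i}" for i in range(10)]
for node, chunk in enumerate(make_chunks(queries, 4), start=1):
    print(f"node {node}: {chunk}")
```

Each chunk would then be submitted as an independent job; because the comparisons are independent, no communication between nodes is needed.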
Clusters in the Life Sciences
Tularik, Incorporated San Francisco, California
Case Study
Because the mouse genome is very similar to the human genome, the effect of a specific drug agent can be tested in the mouse; pre- and post-treatment genome comparisons then help gauge the effectiveness of therapeutic agents in regulating specific expression.
- A compute-intensive task, but one that is easily parallelizable and highly suited for the cluster computing model
Situation: the existing infrastructure would have taken 38 years to perform the 22 million genomic sequence comparisons.
Solution: a Linux NetworX cluster of 150 Pentium III nodes with ICEBox and ClusterWorX management.
Results:
- Study completed in 34 days, a roughly 450x acceleration
- Enabled new business opportunities
- Gave a new competitive advantage
Tularik, Incorporated San Francisco, California
Case Study—Business Case
“Cluster management tools from Linux NetworX are setting the standard …. Without cluster management tools, we would be spending five times as long managing the cluster…. ICE tools allow us to concentrate on finding new genes that cause disease and not worry about cluster management”
- Gene Cutler Tularik, Inc.
Tularik, Incorporated San Francisco, California
Case Study—Business Success
Case Study—BDGP
Berkeley Drosophila Genome Project (BDGP)
Building and using a cluster in a major genome center

BDGP Cluster
- Annotation of the Drosophila genome
- Research: transposable elements, aligning several Drosophila species, RNA folding and predictions, the hunt for microRNAs
- Public BLAST server
BDGP Cluster: Annotation
Chris Mungall, ShengQiang Shu, Suzanna Lewis and many curators
[Diagram: sequence finishing feeds the Pipeline and the GadFly database; the pipeline submits jobs through PBS to the compute farm; BOP processes the results, which are curated with Apollo and published to an FTP directory.]
BDGP Cluster: Annotation
[Diagram: pipeline analysis programs feeding BOP and GadFly: RepeatMasker, BLASTN, BLASTX, TBLASTX, Sim4 (via Sim4wrap), Genscan, Genie, tRNAscan-SE.]
Pipeline inputs (category: source):

Drosophila melanogaster specific sequences:
- Release 2 transcripts: Celera
- ESTs: BDGP plus dbEST
- Complete cDNAs: BDGP
- Annotated reference genome sequences (ARGS): GenBank
- Insertion flanking: BDGP
- Non-coding RNA: FlyBase
- Transposable element consensus sequences: BDGP

Peptide sequences from other species:
- Primates, rodents, other vertebrates, other invertebrates: SWISS-PROT/SPTR
- C. elegans, S. cerevisiae, plants: SPTR

Other species EST sequences:
- M. musculus: UniGene
- Insects: dbEST
BDGP Cluster: Transposons
Josh Kaminker
- Finding repetitive regions
- Divide the genome into overlapping pieces
- BLAST everything vs. everything (~1 million blastn jobs)
- Check for “transposon” features
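Dividing the genome into overlapping pieces, as described above, keeps a repeat that straddles a cut point from being missed. A minimal sketch (illustrative only, not the BDGP code; window sizes are arbitrary toy values):

```python
# Cut a sequence into fixed-size windows that overlap by a given number
# of bases; each window can then be dispatched to a node as one blastn job.

def overlapping_windows(seq, size, overlap):
    """Yield (start, piece) windows of `size` bp overlapping by `overlap` bp."""
    step = size - overlap
    for start in range(0, max(len(seq) - overlap, 1), step):
        yield start, seq[start:start + size]

genome = "ACGT" * 25  # 100 bp toy sequence
pieces = list(overlapping_windows(genome, size=40, overlap=10))
for start, piece in pieces:
    print(start, len(piece))
```

A feature found near a window edge is guaranteed to fall entirely inside a neighboring window as long as it is shorter than the overlap.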
BDGP Cluster: GOst
Brad Marshall & the GO consortium
- GO = Gene Ontology
- GO BLAST: a public service for finding BLAST matches to GO-annotated sequences
- BLAST jobs are submitted to the BDGP cluster
- http://godatabase.org
BDGP Cluster
- 2000: 20 nodes (dual 700 MHz PIII)
- 2001: 32 nodes (added 12 dual 1 GHz PIII)
- 2002: 52 nodes (added 20 dual Athlon 1800XP)
- Approximately 8–10 million jobs since 2000
- GadFly Release 3 entirely based on it
- Drosophila transposons
- GOst & future public BLAST/pipeline server
- Lots of future research
Talk Topics
“Beowulf” concept: combine commodity hardware to create a supercomputer. The focus is on long-running large jobs that are split evenly among the nodes, and on total system performance.
Emphasis on modifications of the traditional Beowulf cluster for the requirements of genome science/bioinformatics.
Bioinformatics Applications
- Large jobs the computer can split into equal segments, e.g. data clustering, MEME, BLAST? = the traditional Beowulf model
- Large jobs with prior knowledge of where to split (“pipelines”): e.g. BLAST, tRNAscan, InterPro, ...
- Genome-wide/large-scale application of programs not written for clusters: most programs
- User-submitted “small” jobs, e.g. individual (batch) BLAST, gene finding
Bioinformatics Applications
- Traditional “Beowulf” jobs: Mandelbrot set, sequence alignment (illustrated as painting eggs)
- Large number of small jobs (pipeline), illustrated as painting eggs in different colors: e.g. BLAST/Genscan, repetitive elements
Bioinformatics Application Requirements
A compute farm for:
- Small jobs
- Large jobs
- Single/multithreaded jobs (BLAST)
- MPI/PVM jobs (clustering, MEME)
Usually lots of files/databases.
Cluster Building Problems
- Hardware building (Linux Networx)
- Integration into the infrastructure
- Running the programs on the cluster
- Teaching users/programmers
- Adjusting the software, e.g. creating a cluster-aware pipeline, scripts submitting jobs asynchronously
BDGP Cluster Infrastructure
- Head node (bam, “bag of marbles”): gateway to the cluster; runs maintenance and monitoring software; job distribution and control
- Nodes (52): separate network (security) with NIS replication from the BDGP network; dual CPU, local hard disk with a large swap partition
- Network interconnect: 100BT Ethernet with a Gigabit Ethernet uplink to NAS storage
- Storage: dedicated NAS storage (NetApp and OpenNAS)
BDGP Cluster Structure
[Diagram: the head node (bam) bridges the public network and the private cluster network; nodes are connected by 100BT Ethernet, with Gigabit links to several NAS filers.]
Running programs on the cluster
“Making the cluster available to the users”
On a cluster, programs (jobs) don’t run efficiently in an interactive manner. Use a queuing system to:
- Run jobs in an organized manner
- Handle the requirements of the users and their bioinformatics tasks (large numbers of varying jobs)
- Keep the nodes optimally loaded
Queuing Systems
[Diagram: the server (head node) holds queued jobs from the research and pipeline queues (e.g. tblastx db_A a1–a6, tblastx db_B b1–b3) and dispatches them to nodes N1–N8.]
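The dispatch pattern in the diagram above can be sketched as a toy FIFO scheduler: the head node keeps a queue of jobs and hands the next one to whichever node frees up first. This is a simulation for illustration, not PBS itself; all names are invented.

```python
# Toy FIFO dispatcher: jobs wait in a queue on the head node and are
# assigned to idle nodes; a finishing node immediately gets the next job.
from collections import deque

jobs = deque(f"tblastx db_A a{i}" for i in range(1, 7))  # 6 queued jobs
free_nodes = deque(f"N{i}" for i in range(1, 5))         # 4 idle nodes
running = {}

# Initial dispatch: fill every idle node from the front of the queue.
while jobs and free_nodes:
    running[free_nodes.popleft()] = jobs.popleft()

def node_finished(node):
    """Node reports completion and receives the next queued job, if any."""
    done = running.pop(node)
    if jobs:
        running[node] = jobs.popleft()
    return done

print(running)          # 4 jobs running, 2 still queued
node_finished("N2")     # N2 finishes a2 and picks up the next queued job
print(running["N2"])
```

Real queuing systems add priorities, per-queue policies, and node-failure handling on top of this basic loop, which is exactly where the OpenPBS issues discussed below come in.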
Queuing Systems
- LSF
- Sun Grid Engine
- PBSPro (Veridian) and OpenPBS, used at BDGP:
  - Originally, and still now, the most frequently used system
  - Very stable for a production system
  - Cross-platform
  - Not too robust to node failures (OpenPBS)
  - Open source, but no central repository (OpenPBS)
  - PBSPro is free to academia only
  - Not easy to configure/get running
  - Job submission is hard for non-techies
  - Does not scale very well to many jobs in the queue
OpenPBS Implementation at BDGP
- Several FIFO queues with queue policies (priority): MPI, Pipeline, Research
- Overlapping node allocations for each queue
- Queues for:
  - 1 CPU: 2 jobs per node
  - 2 CPU: 1 multithreaded job per node
  - MPI jobs: the job distributes itself over several nodes
OpenPBS Problems
- Sensitive to node failure because of TCP timeouts
- Does not scale well beyond 10,000 jobs
- Job submission
OpenPBS Node Failure Patches
- Modifications by E.F. to increase scheduler robustness
- Set node “offline” when down
- CPLANT patches: http://www.cs.sandia.gov/cplant/doc/pbs/pbs.html
OpenPBS Scalability Patch
- Written by E.F. for the BDGP pipeline
- Patched PBS scheduler: everything is cached
- Very fast in running lots of small jobs, even when the queues are full
- Scales to >100,000 jobs in the queue on a single 650 MHz PIII with 768 MB RAM; lots more with a more current machine
- Open source
OpenPBS Job Submission
- PBS tools qsub/qstat: awkward for the end user; require a shell script; most options; possibility of timeouts
- pbsrsh/pbsquery: based on a C++ object framework encapsulating the PBS libraries; an rsh-like tool for easy job submission on the command line: pbsrsh -a “blastn mydb myseq”
- Future: pipe jobs into pbsrsh to avoid timeouts
OpenPBS MPI
- Designated MPI queue
- LAM 6.5.4 (http://www.lam-mpi.org)
- mpipbs, a Bourne shell script wrapping mpirun (mpipbs mympi_program):
  - Reads node information from PBS
  - Creates the appropriate LAM configuration
  - Starts LAM on the PBS-allocated nodes
  - Starts the program with the appropriate processor number
  - Stops LAM
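The mpipbs steps above can be illustrated with a dry-run sketch: given the per-CPU node list that PBS exposes through $PBS_NODEFILE, derive the LAM boot schema and processor count, and print the commands that would be run rather than executing them. This is an illustration of the logic, not the real mpipbs (which is a Bourne shell script); the node names are invented.

```python
# Dry-run sketch of the mpipbs logic. PBS writes one line per allocated
# CPU to $PBS_NODEFILE; LAM wants one boot-schema entry per host, and
# mpirun wants the total process count.

nodefile = ["node01", "node01", "node02", "node02", "node03"]  # one line per CPU

hosts = sorted(set(nodefile))   # LAM boot schema: one entry per host
nprocs = len(nodefile)          # MPI processes = allocated CPUs

lam_schema = "\n".join(hosts)
commands = [
    "lamboot hostfile",                    # start LAM on the allocated nodes
    f"mpirun -np {nprocs} mympi_program",  # run with the right process count
    "lamhalt",                             # stop LAM when the program exits
]
print(lam_schema)
for cmd in commands:
    print(cmd)
```

In the real script these commands run inside the PBS job, so LAM is booted and halted on exactly the nodes the scheduler granted.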
BDGP Cluster References
- BDGP: http://www.fruitfly.org
- Presentation & BDGP cluster administration software: http://www.fruitfly.org/~efrise/cluster.html
- FlyBase: http://flybase.org
- GadFly3: http://www.fruitfly.org/cgi-bin/annot/query
- Pipeline, annotation and transposon work: Genome Biology 2002, 3(12), http://genomebiology.com/drosophila/
Acknowledgements
- Gerry Rubin (Director, BDGP, and VP, HHMI)
- Suzanna Lewis (Director, informatics)
- Sue Celniker (Director, genomics)
- Eric Smith (Co-administrator)
- Chris Mungall (Pipeline and GadFly)
- Josh Kaminker (Transposon research)
- Brad Marshall (GOst)
Contact Information
Erwin Frise
Berkeley Drosophila Genome Project
Lawrence Berkeley National Labs
One Cyclotron Road, MS64-121
Berkeley, CA 94720
Tel. (510) 486-7251
http://www.fruitfly.org/~efrise/cluster.html