
Application Driven Design for a Large-Scale, Multi-Purpose Grid Infrastructure

Mary Fran Yafchak, maryfran@sura.org
SURA IT Program Coordinator, SURAgrid project manager

SURA Mission:
• Foster excellence in scientific research
• Strengthen the scientific and technical capabilities of the nation and of the Southeast
• Provide outstanding training opportunities for the next generation of scientists and engineers

About SURA
501(c)3 consortium of research universities
Major programs in:
– Nuclear Physics (“JLab”, www.jlab.org)
– Coastal Science (SCOOP, scoop.sura.org)
– Information Technology
• Network infrastructure
• SURAgrid
• Education & Outreach

http://www.sura.org


Scope of the SURA region
62 diverse member institutions
Geographically: 16 states plus DC
Perspective extends beyond the membership:
– Broader education community
• Non-SURA higher ed, Minority Serving Institutions, K-12
– Economic development
• Regional network development
• Technology transfer
• Collaboration with Southern Governors’ Association

About SURAgrid
An open initiative in support of regional strategy and infrastructure development
– Applications of regional impact are key drivers
Designed to support a wide variety of applications
– “Big science,” but “smaller science” is O.K. too!
– Applications beyond those typically expected on grids
• Instructional use, student exposure
• Open to what new user communities will bring
– On-ramp to national HPC & CI facilities (e.g., TeraGrid)
Not as easy as building a community- or project-specific grid, but it needs to be done…

About SURAgrid

Broad view of grid infrastructure
Facilitate seamless sharing of resources within a campus, across related campuses, and between different institutions
– Integrate with other enterprise-wide middleware
– Integrate heterogeneous platforms and resources
– Explore grid-to-grid integration
Support a range of user groups with varying application needs and levels of grid expertise
– Participants include IT developers & support staff, computer scientists, domain scientists

SURAgrid Goals
To develop scalable infrastructure that leverages local institutional identity and authorization while managing access to shared resources
To promote the use of this infrastructure for the broad research and education community
To provide a forum for participants to share experience with grid technology and participate in collaborative project development

SURAgrid Vision

SURA regional development:
• Develop & manage partnership relations
• Facilitate collaborative project development
• Orchestrate centralized services & support
• Foster and catalyze application development
• Develop training & education (user, admin)
• Other… (community-driven, over time…)

[Diagram: Heterogeneous environment to meet diverse user needs — project-specific, “MySURAgrid,” and sample user portal views sit atop SURAgrid resources and applications, which draw on institutional resources (e.g., current participants), industry partner coop resources (e.g., IBM partnership), VO or project resources (e.g., SCOOP), other externally funded resources (e.g., group proposals), and gateways to national cyberinfrastructure (e.g., TeraGrid).]

SURAgrid Participants (as of November 2006)
[Map, with legend distinguishing SURA members from sites with resources on the grid, plus SURAgrid project-specific tools: GSU, UAB, USC, ULL, TAMU, UFL, MCSR, Tulane, UArk, TTU, SC, LSU, GPN, ODU, UNCC, NCState, Vanderbilt, UAH, UMich, UKY, UVA, TACC, GMU, UMD, Bowie State, LATech, Kennesaw State, Clemson.]

Major Areas of Activity
Grid-Building (gridportal.sura.org)
– Themes: heterogeneity, flexibility, interoperability
Access Management
– Themes: local autonomy, scalability, leveraging enterprise infrastructure
Application Discovery & Deployment
– Themes: broadly useful, inclusive beyond typical users and uses, promoting collaborative work
Outreach & Community
– Themes: sharing experience, incubator for new ideas, fostering scientific & corporate partnerships


SURAgrid Application Strategy
Provide immediate benefit to applications while applications drive infrastructure development
Leverage the initial application set to illustrate benefits and refine deployment
Increase quantity and diversity of both applications and users
Develop processes for scalable, efficient deployment; assist in “grid-enabling” applications

Efforts significantly bolstered through NSF award: “Creating a Catalyst Application Set for the Development of Large-Scale Multi-purpose Grid Infrastructure” (NSF-OCI-054555)

Creating a Catalyst Application Set

Discovery
– Ongoing methods: meetings, conferences, word of mouth
– Formal survey of SURA members to supplement these methods
Evaluation
– Develop criteria to help prioritize and direct deployment efforts
– Determine readiness to deploy and the tools/assistance required
Implementation
– Exercise and evolve existing deployment & support processes in response to lessons learned
– Document and disseminate lessons learned
– Explore means to assist in grid-enabling applications

Some Application Close-ups

In the SURAgrid demo area today:
GSU: Multiple Genome Alignment on the Grid
– Demo’d by Victor Bolet, Art Vandenberg
UAB: Dynamic BLAST
– Demo’d by Enis Afgan, John-Paul Robinson
ODU: Bioelectric Simulator for Whole Body Tissues
– Demo’d by Mahantesh Halappanavar
NCState: Simulation-Optimization for Threat Management in Urban Water Systems
– Demo’d by Sarat Sreepathi
UNC: Storm Surge Modeling with ADCIRC
– Demo’d by Howard Lander

GSU Multiple Genome Alignment

Sequence Alignment Problem
Used to determine biologically meaningful relationships among organisms:
– Evolutionary information
– Diseases, causes and cures
– Information about a new protein
Especially compute-intensive for long sequences:
– Needleman and Wunsch (1970): optimal global alignment
– Smith and Waterman (1981): optimal local alignment
– Taylor (1987): multiple sequence alignment by pairwise alignment
– BLAST trades off optimal results for faster computation

Examples of Genome Alignment

Alignment 1
Sequence X: A T A – A G T
Sequence Y: A T G C A G T
Score:      1 1 -1 -2 1 1 1   Total Score = 2

Alignment 2
Sequence X: A T A A G T
Sequence Y: A T G C A G T
Score:      1 1 -1 -1 -1 -1 -1   Total Score = -3

Based on pairwise algorithm
– Similarity Matrix (SM) built to compare all sequence positions
– Observation: many “alignment scores” are zero-valued
SM reduced by storing only non-zero elements (see the storage sketch below)
– Row-column information stored along with the value
– Block of memory dynamically allocated as each non-zero element is found
– Data structure used to access the allocated blocks
Parallelism introduced to reduce computation

Ahmed, N., Pan, Y., Vandenberg, A., and Sun, Y., “Parallel Algorithm for Multiple Genome Alignment on the Grid Environment,” 6th Intl. Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-05), in conjunction with IPDPS 2005, April 4-8, 2005.
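The non-zero-only storage idea can be sketched in a few lines of Python. This is illustrative only — the actual implementation dynamically allocates memory blocks in C/MPI — and the class and method names are invented for the example.

# Illustrative non-zero-only storage for the similarity matrix (SM).
# The real implementation allocates memory blocks in C; names here are
# invented for the sketch.
class SparseSM:
    def __init__(self):
        self.cells = {}                   # (row, col) -> score, created lazily

    def set(self, i, j, score):
        if score != 0:                    # store only non-zero elements,
            self.cells[(i, j)] = score    # keeping row-column info with value

    def get(self, i, j):
        return self.cells.get((i, j), 0)  # absent entries read back as zero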

Similarity Matrix Generation
Align Sequence X: TGATGGAGGT with Sequence Y: GATAGG
1 = matching; 0 = non-matching; ss = substitution score; gp = gap score
Each SM cell is generated as the max score with respect to its neighbors: diagonal plus the match/substitution score, or the cell above/left plus the gap score (see the fill-step sketch below)
Back-trace the matrix to find sequence matches and trace the aligned sequences
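As a concrete illustration of the max-with-respect-to-neighbors rule, here is a minimal Python sketch of the pairwise fill step. The match score of 1 follows the slide legend; the ss and gp values and function name are assumptions for the example.

def build_sm(x, y, match=1, ss=-1, gp=-2):
    """Fill SM: each cell is the max of diagonal + match/substitution score,
    cell above + gap score, and cell left + gap score."""
    n, m = len(x), len(y)
    sm = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = sm[i-1][j-1] + (match if x[i-1] == y[j-1] else ss)
            sm[i][j] = max(diag, sm[i-1][j] + gp, sm[i][j-1] + gp)
    return sm

sm = build_sm("TGATGGAGGT", "GATAGG")   # the sequences from the slide
# Back-tracing from the maximal cell toward (0, 0) recovers the alignment.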

Parallel distribution of multiple sequences
[Diagram: e.g., sequences 1-6 and 7-12 distributed across processors in pairs (seq 1-2, seq 3-4, seq 5-6).]

Convergence - collaboration
Algorithm implementation
– Nova Ahmed, Masters CS (now PhD student, GT)
– Dr. Yi Pan, Chair, Computer Science
NMI Integration Testbed program
– Georgia State: Art Vandenberg, Victor Bolet, Chao “Bill” Xie, Dharam Damani, et al.
– University of Alabama at Birmingham: John-Paul Robinson, Pravin Joshi, Jill Gemmill
SURAgrid
– Looking for applications to demonstrate value

Algorithm Validation: Shared Memory
Performance validates the algorithm: computation time decreases as the number of processors increases
[Chart: computation time (shared memory) vs. number of processors (2-12).]
SGI Origin 2000: 24 × 250 MHz R10000 processors; 4 GB memory
Limitations:
– Memory (max sequence is 2000 × 2000)
– Processors (policy limits students to 12 processors)
– Not scalable

Shared Memory vs. Cluster, Grid*
* NB: Comparing clusters with shared memory is, of course, relative; the systems are distinctly different.
[Chart: computation time (seconds) vs. number of processors (2-26) for genome length 3000 on grid, cluster, and shared memory.]
UAB cluster: 8-node Beowulf (550 MHz Pentium III; 512 MB RAM)
Clusters retain the algorithm improvement
Grid (Globus, MPICH-G2) overhead is negligible

Advantages of grid-enabled cluster:
– Scalable: can add new cluster nodes to the grid
– Easier job submission: don’t need an account on every node
– Scheduling is easier: can submit multiple jobs at one time
[Chart: computation time (seconds) vs. number of processors (2-26) for genome length 10000 on grid and cluster.]

Computation Time and Speedup (speedup = time on 1 CPU / time on N CPUs)
9 processors available in the multi-clustered grid; 32 processors for the other configurations
[Charts: computation time (sec) and speedup vs. number of processors (0-30) for single cluster, single clustered grid, and multi-clustered grid.]

Interesting: when multiple clusters were used (the application spanned three separate clusters), performance improved even further?!

Grid tools used
Globus Toolkit: built on the Open Grid Services Architecture (OGSA)
Nexus: communication library that allows multi-method communication with a single API for a wide range of protocols; MPICH-G2, a Message Passing Interface implementation built over Nexus, was used in the experiments
Resource Specification Language (RSL): job submission and execution (globus-job-submit, globus-job-run) and status (globus-job-status); see the usage sketch below
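A minimal sketch of driving the GT2 commands named above from Python. The resource contact string, executable path, and input file are placeholders, not actual SURAgrid values.

# Hedged sketch: submit a job via globus-job-submit and poll its state.
import subprocess

contact = "gridnode.example.edu/jobmanager-pbs"      # placeholder contact string
# globus-job-submit queues the job and prints a job-contact URL on stdout.
job = subprocess.run(
    ["globus-job-submit", contact, "/usr/local/bin/genome_align", "seqs.fa"],
    capture_output=True, text=True, check=True,
).stdout.strip()
# globus-job-status reports the job state (e.g., PENDING, ACTIVE, DONE).
print(subprocess.run(["globus-job-status", job],
                     capture_output=True, text=True, check=True).stdout.strip())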

The Grid-enabling Story
Iterative, evolutionary, collaborative:
– 1st: ssh to the resource and get the code working
– 2nd: submit from a local account to a remote Globus machine
– 3rd: run from the SURAgrid portal
SURAgrid infrastructure components provide an improved workflow
Integration with campus components enables more seamless access
The overall structure can be used as a model for campus research infrastructure:
– Integrated authentication/authorization
– Portals for applications
– Grid administration/configuration support

SURAgrid Portal


SURAgrid MyProxy service


Get Proxy

MyProxy… secure grid credential


GSI proxy credentials are loaded into your account…
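Conceptually, the portal’s “Get Proxy” step behaves like a command-line MyProxy retrieval. A hedged sketch follows; the server name and username are placeholders, not SURAgrid values.

# Retrieve a short-lived GSI proxy credential from a MyProxy server.
import subprocess

subprocess.run(
    ["myproxy-logon",
     "-s", "myproxy.example.org",   # MyProxy server (placeholder)
     "-l", "jdoe",                  # grid username (placeholder)
     "-t", "12"],                   # requested proxy lifetime in hours
    check=True,
)
# On success the proxy is written to the standard location (typically
# /tmp/x509up_u<uid>), where grid tools find it for authentication.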

SURAgrid Portal file transfer


Job submission via Portal


Output retrieved


SURAgrid Account Management: list myusers


SURAgrid Account Management: add user


Multiple Genome Alignment & SURAgrid

Collaborative cooperation
Convergence of opportunity
Application / infrastructure drivers interact
Emergent applications:
– Cosmic ray simulation (Dr. Xiaochun He)
– Classification/clustering (Dr. Vijay Vaishnavi, Art Vandenberg)
– Muon detector grid (Dr. Xiaochun He)
– Neuron (Dr. Paul Katz, Dr. Robert Calin-Jageman, Chao “Bill” Xie)
– AnimatLab (Dr. Don Edwards, Dr. Ying Zhu, David Cofer, James Reid)

IBM System p5 575 with Power5+ Processors

BioSim: Bio-electric Simulator for Whole Body Tissues
Numerical simulations for electrostimulation of tissues and whole-body biomodels
Predicts spatial and time-dependent currents and voltages in part- or whole-body biomodels
Numerous diagnostic and therapeutic applications, e.g., neurogenesis, cancer treatment
Fast, parallelized computational approach

Simulation Models
From an electrical standpoint, tissues are characterized by conductivities and permittivities
Whole body discretized within a cubic simulation volume: a Cartesian grid of points along the three axes; thus each node has at most six nearest neighbors (see the neighbor sketch below)
[Figure: whole-body biomodel; dimensions in millimeters.]
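To make the discretization concrete, here is a small Python sketch enumerating a node’s at-most-six nearest neighbors on a Cartesian grid. The 100 × 100 × 10 dimensions reuse the example from the computational-complexity slide; the function name is invented.

# Each node (i, j, k) on the Cartesian grid couples only to its nearest
# neighbors along ±x, ±y, ±z; boundary nodes have fewer than six.
NX, NY, NZ = 100, 100, 10    # example grid dimensions (per a later slide)

def neighbors(i, j, k):
    for di, dj, dk in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
        ni, nj, nk = i + di, j + dj, k + dk
        if 0 <= ni < NX and 0 <= nj < NY and 0 <= nk < NZ:
            yield ni, nj, nk

print(len(list(neighbors(50, 50, 5))))   # interior node: 6 neighbors
print(len(list(neighbors(0, 0, 0))))     # corner node: only 3 neighbors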

Numerical Models
Kirchhoff’s node analysis: at each node, the conduction and displacement currents through the branches to its nearest neighbors sum to zero,
∑ [ (A/ΔL) σ {V} + (A/ΔL) ε d{V}/dt ] = 0
Recast so the matrix is computed only once:
[M] [V]_{t+Δt} = [B(V_t)]
For large models, matrix inversion is intractable
LU decomposition of the matrix is used instead

Numerical Models
Voltage: user-specified time-dependent waveform
Boundary conditions imposed locally
Actual data used for conductivity and permittivity
Results in an extremely sparse (asymmetric) matrix [M]
[Plot: sparsity pattern of [M] — red: total elements in the matrix; blue: non-zero values.]

The Landscape of Sparse Ax=b Solvers
[Diagram: direct methods (A = LU, with pivoting: LU for non-symmetric, Cholesky for symmetric positive definite) vs. iterative methods (y’ = Ay: GMRES, QMR, … for non-symmetric; conjugate gradient for symmetric positive definite). Trade-offs: more robust vs. less storage; more general vs. more robust. Source: John Gilbert, Sparse Matrix Days in MIT 18.337.]

LU Decomposition
[Figures: LU decomposition steps. Source: Florin Dobrian.]

Computational Complexity
100 × 100 × 10 nodes: ~75 GB of memory if stored densely (8-byte floating-point precision)
Sparse data structure: ~6 MB (in our case)
Sparse direct solver: SuperLU-DIST (see the factor-once sketch after the references)
– Xiaoye S. Li and James W. Demmel, “SuperLU-DIST: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems,” ACM Trans. Mathematical Software, June 2003, Volume 29, Number 2, Pages 110-140.
Fill-reducing orderings with METIS
– G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on Scientific Computing, 1999, Volume 20, Number 1.
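The factor-once, solve-per-time-step pattern can be sketched serially with SciPy’s SuperLU bindings (the actual deployment uses the parallel SuperLU-DIST). The matrix below is a toy stand-in for [M], not the biomodel system.

# Factor the sparse matrix once, then reuse the LU factors every time step,
# mirroring the "compute matrix only once" recast above.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 1000                         # toy problem size (assumed)
# Toy asymmetric sparse matrix standing in for [M].
M = sp.diags([1.0, -4.0, 1.2], offsets=[-1, 0, 1], shape=(n, n), format="csc")

lu = splu(M)                     # LU factorization is done exactly once
v = np.zeros(n)
for step in range(100):          # time stepping: only the right side changes
    b = np.random.rand(n)        # stand-in for [B(V_t)]
    v = lu.solve(b)              # two cheap triangular solves per step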

Performance on compute clusters
[Chart: 144,000-node rat model — time in seconds; blue: average iteration time; cyan: factorization time.]

Output: Visualization with MATLAB
[Figures: potential profile at a depth of 12 mm; simulated potential evolution along the entire 51-mm width of the rat model.]

Deployment on Mileva: 4-node cluster dedicated to SURAgrid purposes
Authentication
– ODU root CA
– Cross-certification with the SURA Bridge
– Compatibility of accounts for ODU users
Authorization & accounting
Initial goals:
– Develop larger whole-body models with greater resolution
– Scalability tests

Grid Workflow

Establish user accounts for ODU users
– SURAgrid central user authentication and authorization system
– Off-line/customized (e.g., USC)
Manually launch jobs on the remote resource
– SSH/GSISSH/SURAgrid portal
– PBS/LSF/SGE
Transfer files
– SCP/GSISCP/SURAgrid portal

Conclusions

Science:
– Electrostimulation has a variety of diagnostic and therapeutic applications
– While numerical simulations provide many advantages over real experiments, they can be very arduous
Grid enabling:
– New possibilities with grid computing
– Grid-enabling an application is complex and time-consuming
– Security is nontrivial

Future Steps

Grid-enabling BioSim
– Explore alternatives for grid-enabling BioSim
– Establish new collaborations
– Scalability experiments with large compute clusters accessible via SURAgrid
Future applications:
– Molecular and cellular dynamics
– Computational nano-electronics
– Tools: Gromacs, DL-POLY, LAMMPS

References and Contacts
A. Mishra, R. Joshi, K. Schoenbach and C. Clark, “A Fast Parallelized Computational Approach Based on Sparse LU Factorization for Predictions of Spatial and Time-Dependent Currents and Voltages in Full-Body Biomodels,” IEEE Trans. Plasma Science, August 2006, Volume 34, Number 4.
http://www.lions.odu.edu/~rjoshi/
Ravindra Joshi, Ashutosh Mishra, Mike Sachon, Mahantesh Halappanavar
– (rjoshi, amishra, msachon, mhalappa)@odu.edu

UAB and SURAgrid

Background

UAB's approach to grid deployment

Specific Testbed Application

UAB and SURAgrid Background
The University of Alabama at Birmingham (UAB) is a leading medical research campus in the Southeast
UAB was a participant in the National Science Foundation Middleware Initiative Testbed (NMI Testbed) in 2002-2004
Evaluation of grid technologies in the NMI Testbed sparked an effort to build a campus grid: UABgrid
Collaboration with NMI Testbed team members leveraged synergistic interest in grid deployment

Managing Grid Middleware at UAB
Over 10 years of experience operating an authoritative campus identity management system
All members of the UAB community have a self-selected network identity that is used as an authoritative identity for human resource records, desktop systems, and many other systems on campus
Desire to leverage the same identity infrastructure for grid systems

UABgrid Identity Management

Leverage Pubcookie as the WebSSO system for a web-based interface to the authoritative, LDAP-based identity management system
Create a UABgrid CA to transparently assign grid credentials using a web-based interface
Establish access to UABgrid resources based on UABgrid PKI identity
Establish access to SURAgrid resources based on cross-certification of the UABgrid CA with the SURAgrid Bridge CA

UABgrid Workflow

Establish grid PKI credentials for UAB users
– UAB campus identity management system
– UABgrid CA cross-certified with the SURAgrid Bridge CA
Launch jobs via an application-specific web interface
– User authenticates via campus WebSSO
– Transparent initialization of grid credentials
– Submits job request
Collect results via the application’s web interface

Engaging Campus Researchers

Empower campus research applications with grid technologies
Hide details of grid middleware operation
Select an application specific to the needs of campus researchers

BLAST

Basic Local Alignment Search Tool (BLAST) is an established sequence analysis tool that performs similarity searches, calculating statistical significance between a short query sequence and a large database of infrequently changing information such as DNA and amino acid sequences
Used by numerous scientists in gene discovery, categorization, and simulation

BLAST algorithms

With the rapid development in sequencing technology of large genomes for several species, sequence databases have been growing at exponential rates
Parallel techniques for speeding up BLAST searches:
– Database-splitting BLAST (mpiBLAST, TurboBLAST, psiBLAST)
– Query-splitting BLAST (SS-Wrapper)

Goals of Grid Applications

Use distributed, dynamic, heterogeneous and unreliable resources without direct user intervention

Adapt to environment changes

Provide high level of usability while at least maintaining performance levels

Dynamic BLAST goals

Complement BLAST with the power of the grid

Master-worker application allowing for dynamic load balancing and variable resource availability based on Grid middleware

Focuses on using small, distributed, readily available resources (cycle harvesting)

Works with existing BLAST algorithm(s); fragment creation is sketched below
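To illustrate the fragment-creation idea behind query splitting, here is a small Python sketch. The file names are hypothetical, and the real Dynamic BLAST pipeline is built from Perl scripts and Java over GT4, per the architecture slide.

# Split a multi-FASTA query file into n roughly equal fragments that can be
# dispatched to independent BLAST workers (illustrative, not the real code).
def split_fasta(path, n_fragments):
    with open(path) as f:
        records, current = [], []
        for line in f:
            if line.startswith(">") and current:
                records.append("".join(current))   # close previous record
                current = []
            current.append(line)
        if current:
            records.append("".join(current))
    return [records[i::n_fragments] for i in range(n_fragments)]

fragments = split_fasta("queries.fa", 8)           # hypothetical input file
for i, frag in enumerate(fragments):
    with open(f"fragment_{i}.fa", "w") as out:
        out.writelines(frag)                       # each fragment = one grid job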

Dynamic BLAST architecture
[Diagram: a web interface (MySQL, Perl scripts, Java) drives Dynamic BLAST, which deploys work via GridDeploy and GridWay to GT4-fronted resources running SGE, PBS, or no local scheduler, each executing BLAST.]

Dynamic BLAST workflow
[Diagram: data analysis → file parsing and fragment creation → job submission → job monitor tracking grid jobs (GJ) → post-processing.]

Dynamic BLAST performance
[Chart: comparison of execution time (seconds, 0-160) between query-splitting BLAST (QS-BLAST), Dynamic BLAST, and mpiBLAST for searching 1,000 queries against the yeast.nt database.]
[Chart: execution time (seconds, 0-140) of Dynamic BLAST on a variable number of nodes (8, 16, 26) for 1,000 queries against the yeast.nt database.]

Deployment on SURAgrid
Incorporate ODU resource Mileva: 4-node cluster dedicated to SURAgrid purposes
Local UAB BLAST web interface
– Leverage existing UAB identity infrastructure
– Cross-certification with the SURA Bridge CA
Authorization on a per-user basis
Initial goals:
– Solidify the execution requirements of Dynamic BLAST
– Perform scalability tests
– Engage researchers further in the promise of grid computing

SURAgrid Experience

Combine local needs with regional needs
No need to build a campus grid in isolation
Synergistic collaborations
Leads to solid architectural foundations
Leads to solid documentation
Extends practical experience for UAB students

Next Steps

Improve availability of the web interface to non-UAB users
– Leverage Shibboleth and GridShib to federate the web interface
– Conversation begun on sharing this technology foundation with SURAgrid
Extend the number of resources running BLAST
– ODU integration helps document the steps and is a model for additional sites
Explore other applications relevant to the UAB research community

References and Contacts

Purushotham Bangalore, Computer and Information Sciences, UAB puri@uab.edu

Enis Afgan, Computer and Information Sciences, UAB afgane@uab.edu

John-Paul Robinson, HPC Services, UAB IT jpr@uab.edu

http://uabgrid.uab.edu/dynamicblast

SURAgrid Application Demos

Stop by the demo floor for a closer look at these applications and more!
Application demo team:
– Mahantesh Halappanavar (ODU)
– Sarat Sreepathi (NCSU)
– Purushotham Bangalore (UAB)
– Enis Afgan (UAB)
– John-Paul Robinson (UAB)
– Howard Lander (RENCI)
– Victor Bolet (GSU)
– Art Vandenberg (GSU)
– Kate Barzee (SURA)
– Mary Fran Yafchak (SURA)

Questions or comments?

For more information, or to join SURAgrid: http://www.sura.org/SURAgrid maryfran@sura.org