U.S. Department of Energy’s Office of Science
Midrange Scientific Computing Requirements
Jefferson Lab
Robert [email protected] 21, 2008
Office of Science
U.S. Department of Energy
Jefferson Lab
• Lab doubling beam energy
• Adding new experimental Hall
• CD-3: JLab Receives DOE Approval to Start Construction of $310 Million Upgrade

Science Goals
Understanding structure, spectroscopy and interactions of hadrons from QCD is the central challenge of nuclear physics
• How are charge, current and spin distributed in the nucleon?
• What are the effective degrees of freedom describing the low-energy spectrum of the theory?
• How does the nucleon-nucleon and hadron-hadron interaction arise from QCD?

NP Milestones in Hadron Physics
• NP2009 and NP2012: measurement & determination of mass & electromagnetic properties of low-lying baryons (hadrons)
• New Hall D: seek information about exotic measurements. Flagship of JLab upgrade
• NP2014: determination of Generalized Parton Distributions within nucleons. These characterize how quarks interact with nucleons

Computing efforts in support of mission
• Experimental physics data acquisition, storage, and analysis (farm computing)
• Lattice QCD theory calculations of fundamental quantities

Theory context: USQCD Collaboration
• Consists of nearly all high energy and nuclear physicists in the US involved in lattice QCD. Formed nine years ago to develop infrastructure for these studies
• Research directly in support of DOE experimental facilities at BNL, FNAL, JLab
• SciDAC I & II: software development for lattice QCD
  • FY06-11: ~$2.0M/yr HEP+NP, $0.4M/yr JLab
• USQCD Facilities: LQCD I, midrange computing
  • FY06-09: ~$2.5M/yr HEP+NP
  • Access to USQCD dedicated hardware is allocated within a peer-reviewed process
• LQCD II: FY10-14, proposal submitted & reviewed

Example: Spectroscopy
• Experimental and ab initio N* and Exotic-meson programs aim at discovering effective degrees of freedom of QCD (NP2009 and NP2012 milestones)
• Excited Baryon Analysis Center (EBAC) at Jefferson Lab
• Spectroscopy of Exotic Mesons is a flagship component of CEBAF@12GeV
[Figure: Nucleon spectrum (MeV); vertical axis from 550 to 3300; columns labeled ½+, 3/2+, 5/2+, ½-, 3/2-, 5/2-]

Nature of the LQCD calculations
Calculations are performed in two steps:
• Monte Carlo methods are used to generate gauge configurations with a probability proportional to their weight in the Feynman path integrals that define QCD
• These configurations are stored, and used to calculate a wide variety of physical observables
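
The two-step workflow can be illustrated with a toy model. The sketch below is not lattice QCD: it assumes a one-dimensional periodic lattice of real variables with a free-field action, and the names `action_change`, `generate_configurations`, and `mean_square`, along with every parameter value, are hypothetical. Step 1 uses Metropolis updates to sample configurations with probability proportional to exp(-S); step 2 reuses the stored configurations to estimate an observable.

```python
import math
import random


def action_change(phi, i, new, mass2):
    """Change in S = sum_x [(phi[x+1] - phi[x])^2 / 2 + mass2 * phi[x]^2 / 2]
    when site i is changed from phi[i] to `new` (periodic boundaries)."""
    n = len(phi)
    left, right = phi[(i - 1) % n], phi[(i + 1) % n]

    def local(v):
        # Only the terms of S that involve site i.
        return 0.5 * ((v - left) ** 2 + (right - v) ** 2) + 0.5 * mass2 * v * v

    return local(new) - local(phi[i])


def generate_configurations(n_sites, n_configs, mass2=1.0, step=1.0,
                            n_skip=10, n_therm=200, seed=0):
    """Step 1: Metropolis sampling; keep every n_skip-th sweep after thermalization."""
    rng = random.Random(seed)
    phi = [0.0] * n_sites
    stored = []
    for sweep in range(n_therm + n_configs * n_skip):
        for i in range(n_sites):
            new = phi[i] + rng.uniform(-step, step)
            d_s = action_change(phi, i, new, mass2)
            # Accept with probability min(1, exp(-dS)).
            if d_s <= 0 or rng.random() < math.exp(-d_s):
                phi[i] = new
        if sweep >= n_therm and (sweep - n_therm) % n_skip == n_skip - 1:
            stored.append(list(phi))  # a copy, as if written to disk
    return stored


def mean_square(configs):
    """Step 2: an observable measured on the stored configurations."""
    total = sum(v * v for c in configs for v in c)
    return total / (len(configs) * len(configs[0]))


configs = generate_configurations(n_sites=16, n_configs=200)
print(len(configs), mean_square(configs))
```

Because the stored configurations are independent of the measurement code, other observables can be computed later from the same ensemble without repeating step 1.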

Nature of the LQCD calculations (2)
• Leadership machines: gauge generation uses large core count, single sequence of computing, requires multi-TF sustained on one job
• Midrange machines: analysis jobs typically smaller core count, each configuration an independent job, typically a few 100 GF sustained
• Fine-grained parallelism; regular hypercubic problems
• Computations and communication equally important; low latency required
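
One way to see why communication weighs as heavily as computation on a regular hypercubic problem: if each core holds a local block of L^4 sites and couples only to nearest neighbors, it must exchange 2 x 4 faces of L^3 sites each, so the boundary-to-volume ratio is 8/L. The sketch below works through that arithmetic. It assumes a one-site-deep halo, and the function name and block sizes are illustrative rather than taken from the slides.

```python
def halo_to_volume(local_extent, dims=4):
    """Ratio of boundary sites exchanged to sites computed for a hypercubic
    local block, assuming a one-site-deep nearest-neighbor halo."""
    volume = local_extent ** dims
    halo = 2 * dims * local_extent ** (dims - 1)  # two faces per dimension
    return halo / volume


for extent in (4, 8, 16):
    print(extent, halo_to_volume(extent))
```

Spreading a fixed lattice over more cores shrinks the local extent and raises this ratio, which is consistent with the emphasis on a low-latency network.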

Leadership Machines
• USQCD aggressively pursuing national resources:
  • INCITE 07: ORNL: Cray XT4 - 10M-hrs (largest allocation)
  • INCITE 08-10: yearly allocations, will increase
    • ORNL: Cray XT4 - 7M-hrs
    • ANL: BG/P - 20M-hrs
  • ESP 08: ANL: BG/P - 250M-hrs [11 TF-yr]
  • ESP 09: ORNL: XT-5 ~ 60M-hrs ??? [7 TF-yr]
• Groups (JLab) also pursuing NSF + other resources
  • 2007: 1 TF-yr (PSC+SDSC)
  • 2008: 2 TF-yr (PSC+SDSC+TACC)
  • 2008: 1 TF-yr (LANL/NNSA)
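
The bracketed TF-yr figures can be related to the hour allocations with a short calculation. The sketch below assumes that "M-hrs" means millions of core-hours, that a TF-yr is one sustained teraflop for a full 8760-hour year, and that the function name is hypothetical; none of these definitions is stated on the slide.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def implied_gflops_per_core(core_hours, tflops_years):
    """Sustained GF per core implied by an allocation quoted both in
    core-hours and in TF-yr (both interpretations are assumptions)."""
    core_years = core_hours / HOURS_PER_YEAR
    return tflops_years * 1000.0 / core_years


bgp = implied_gflops_per_core(250e6, 11)  # ESP 08, ANL BG/P
xt5 = implied_gflops_per_core(60e6, 7)    # ESP 09, ORNL XT-5
print(f"BG/P: {bgp:.2f} GF/core, XT-5: {xt5:.2f} GF/core")
```

Under these assumptions the two allocations correspond to roughly 0.4 and 1.0 sustained GF per core respectively.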

LQCD Computing Project Hardware (Midrange)
Year  Computer  Site  Nodes  Performance (TF/s)
2002  QCD       FNAL    127  0.15
2004  4g        JLab    384  0.36
2005  Pion      FNAL    518  0.86
2005  QCDOC     BNL   12288  4.20
2006  6n        JLab    256  0.62
2006  Kaon      FNAL    600  2.56
2007  7n        JLab    396  2.98
2008  J/Psi     FNAL    400  6.00
Funding labels shown beside the table: w/ OASCR, SciDAC I, LQCD I
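
The hardware table also shows how per-node performance has grown. The sketch below divides the quoted performance by the node count for each machine; the figures are copied from the table, while the function name and the choice of GF per node as the metric are illustrative.

```python
# (year, computer, site, nodes, performance in TF/s), copied from the table
machines = [
    (2002, "QCD", "FNAL", 127, 0.15),
    (2004, "4g", "JLab", 384, 0.36),
    (2005, "Pion", "FNAL", 518, 0.86),
    (2005, "QCDOC", "BNL", 12288, 4.20),
    (2006, "6n", "JLab", 256, 0.62),
    (2006, "Kaon", "FNAL", 600, 2.56),
    (2007, "7n", "JLab", 396, 2.98),
    (2008, "J/Psi", "FNAL", 400, 6.00),
]


def gflops_per_node(nodes, tflops):
    """Quoted performance per node, in GF/s."""
    return tflops * 1000.0 / nodes


per_node = {name: gflops_per_node(nodes, tf) for _, name, _, nodes, tf in machines}
for year, name, site, nodes, tf in machines:
    print(f"{year} {name:6s} {site:5s} {per_node[name]:6.2f} GF/node")
```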

Distribution of Jobs
• Leadership machines: few K to 128K cores
• Clusters: currently up to 1K cores. Will grow as lattice sizes scale up on leadership machines

Future Hardware goals (BNL+FNAL+JLab) by fiscal year
• Initial year based upon initial INCITE and leadership class awards. Moore’s law growth thereafter
• Dedicated hardware is for all resources running that year; assumes clusters with 3.5 year life
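
The projection rule described here (Moore's law growth in each year's purchase, with clusters retired after 3.5 years) can be sketched as follows. The 18-month doubling time, the starting capacity of 1 TF, and both function names are assumptions for illustration; the slides do not give these numbers.

```python
def moore_purchases(first_year, first_tf, n_years, doubling_years=1.5):
    """Capacity bought each year, growing by a fixed factor per year.
    The doubling time is an assumed value, not one from the slides."""
    growth = 2 ** (1 / doubling_years)
    return {first_year + k: first_tf * growth ** k for k in range(n_years)}


def installed_capacity(purchases, year, lifetime=3.5):
    """Total capacity of all clusters still running in `year`."""
    return sum(tf for bought, tf in purchases.items()
               if 0 <= year - bought < lifetime)


purchases = moore_purchases(first_year=2010, first_tf=1.0, n_years=5)
for year in range(2010, 2015):
    print(year, round(installed_capacity(purchases, year), 2))
```

With yearly purchases and a 3.5 year life, four generations of clusters are in service at once, so the dedicated capacity in any year is dominated by the most recent purchases.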
Distribution of Resources (2010-2014)
LQCD II budget request

LQCD II review: Jan. 2008
Charges to the collaboration by DOE:
• Why is a new project needed if OASCR is providing access to Leadership Class machines? In particular, is dedicated hardware, such as additional clusters, essential and cost effective in such an environment? What is the optimal mix of machines, given realistic budget constraints?
• What are the plans at FNAL, TJNAF, & BNL for LQCD computing? How are these plans incorporated into your plans for LQCD II?

LQCD II review: Jan. 2008
Review findings:
• The 1-1 mix of Leadership and clusters advocated by LQCD II is the most suitable hardware mix for this project.
• The review committee advocates full funding for LQCD II at the level described in their proposal. The scenario in which the funding for LQCD II would be flat with LQCD I would cause the project to miss opportunities that would otherwise enhance several other fields that the Office of Science supports, such as computer hardware and software, nuclear and astrophysics.

Characteristics of computing:
• Dominant part of calculation: large sparse matrix system solve; application of matrix on vector
• Regular grid per compute core: moving to hybrid threaded + MPI model. SciDAC II software development effort
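
To make "sparse matrix system solve; application of matrix on vector" concrete, here is a minimal sketch of an iterative solver whose only access to the matrix is a routine that applies it to a vector. It uses plain conjugate gradient on a stand-in operator (a one-dimensional periodic Laplacian plus a mass term); the slides do not name the solver or give the operator, so `apply_operator`, `conjugate_gradient`, and all parameter values are assumptions. A real lattice QCD code would apply a Dirac operator on a four-dimensional grid instead.

```python
def apply_operator(x, mass2=0.5):
    """Matrix-free stencil: (-Laplacian + mass2) on a periodic 1-D grid.
    The matrix is never stored; only nearest neighbors are touched."""
    n = len(x)
    return [(2.0 + mass2) * x[i] - x[(i - 1) % n] - x[(i + 1) % n]
            for i in range(n)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given only as apply_a."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_a(p)  # the dominant cost: one operator application
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


b = [1.0] + [0.0] * 31   # point source on a 32-site grid
x = conjugate_gradient(apply_operator, b)
residual = max(abs(u - v) for u, v in zip(apply_operator(x), b))
print(residual)
```

Each iteration costs one stencil application plus a few dot products. On a parallel machine the stencil needs neighbor exchange and the dot products need global reductions, which is where the threaded + MPI model comes in.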

Opportunities for Optimization
• Clusters configured for a single application (LQCD) can be better optimized than those serving a dozen applications
  • Memory and disk lean
  • Pruned fat tree network (e.g. InfiniBand) due to highly local communications pattern
  • Lower aggregate bandwidth to disk compared to checkpoint-intensive simulations
• Overall impact compared to generic clusters: 50% more computing capacity/dollar
• 2x-5x more cost effective for analysis jobs than leadership class machines
[Figure: Performance/$ for LQCD Applications. Vertical axis: Mflops/$ (log scale, 10^-2 to 10^1); horizontal axis: year (1990 to 2010). Labeled points: QCDSP; QCDOC; JLab SciDAC Prototype Clusters; JLab clusters; 2008/9 cluster at FNAL; Japanese Earth Simulator; BlueGene/L; BlueGene/P; supercomputers, leadership class machines. Year labels on the plot: 2002, 2003, 2004.]
• Commodity compute nodes (leverage marketplace & Moore’s law)
• Low latency, high bandwidth network to exploit full I/O capability

NP Approved Missions
Data analysis and support for 12 GeV:
• Why: Needed to support the current 6 GeV and future 12 GeV programs (detector simulation)
• How it’s done today: small cluster / farm, plus all necessary infrastructure (tape library, cache disk, …)
• Status: Constrained by tight budgets, but currently keeping up with requirements (barely)
• Barriers (future): multi-threading needed to prevent memory requirement blow-up on many-core architectures
• Special Features: integer intensive and I/O intensive

NP Approved Missions
Lattice QCD:
• Why: Theory calculations in support of JLab experimental physics program
• How it’s done today: NSF+INCITE computers + midrange computing within USQCD
• Status: need more runs of larger size (more cores), plus more statistics (longer runs)
• Barriers: multi-threading to reduce communications. Need LQCD II.
• Special Features: floating point intensive + balanced communications; fine-grained parallelism

NP Proposed Missions or Initiatives
• No new proposed missions
• Currently fulfilling approved missions, with funding requested to continue those approved missions