Scientific Computing at Fermilab
Our “Mission”
• Provide computing, software tools and expertise to all parts of the Fermilab scientific program, including theory simulations (e.g. Lattice QCD and cosmology) and accelerator modeling
• Work closely with each scientific program – as collaborators (where a scientist/staff from SCD is involved) and as valued customers.
• Create a coherent Scientific Computing program from the many parts and many funding sources – encouraging sharing of facilities, common approaches and re-use of software wherever possible
• Work closely with CCD as part of an overall coherent program
Scientific Computing - Fermilab S&T Review, Sept 5, 2012
A Few points
• We are ~160 strong, made up almost entirely of technically trained staff
• 26 scientists in the Division
• As the lab changes its mission, scientific computing is having to adapt to this new and more challenging landscape
• Scientific Computing is a very “matrixed” organization. I will not try to cover all we do, but will pick and choose things that are on my mind right now…
Scientific Discovery – the reason we are here
• The computing capability needed for scientific discovery is bounded only by human imagination
• The next generation of scientific breakthroughs requires major new advances in computing technology:
  – Energy-efficient hardware, algorithms, applications and systems software
  – Data “explosion” – Big Data is here: observational data, sensor networks, and simulation
  – Computing/data throughput challenges
About to Experience a Paradigm Shift in Computing
• For the last decade, GRID and computing resources have been very stable
• However… the end of Moore’s Law is looming, and new computing technologies are on the near horizon:
  – Phase-change memories
  – Stacked dies
  – Exponential growth in parallelism in HPC – IBM Blue Gene leading the charge
  – Heterogeneous systems delivering higher performance/watt (Titan)
  – Power is a constraint
  – Programmability…
Computing Landscape will change..
• HEP is going to have to adapt to this changing world
• While the future for the next few years is clear, we don’t really know where we will be in 10 or 20 years
• Ultimately market forces will determine the future
• We need to turn this into a positive force for both High Energy Physics and High Performance computing.
Think Back on your Computing Careers…
And Today….
Ask Yourself… Has Computing Gotten any Easier in the last 30 years?
Lattice b_c machine…
Starting to Make the Bridge to the Future
New Funding Initiatives
• COMPASS SciDAC Project (3 year) $2.2M/year
• US Lattice QCD Project (5 year) ~$5M/year
• Geant4 Parallelization – joint with ASCR (2 year) $1M/year
• CMS on HPC machines (1 year) $150k
• PDACS – Galaxy Simulation Portal – joint with Argonne (1 year) $250k
• Science Framework for DES (1 year) $150k
• Tevatron Data Preservation (2 year) $350k/year
• Partnering with NSF through OSG
• Will be aggressive in upcoming data and knowledge discovery opportunities at DOE
Geant4
Workshop Held
• Between HEP and ASCR
• Discussed how to transform Geant4 to run efficiently on modern and future multi-core computers and hybrids
• Workshop chairs were Robert Lucas (USC) and RR.
• Funded at $1M/year for 2 years
Here: algorithmic development to be able to utilize multi-core architectures, and porting of G4 sections to GPUs
CMS
• CMS would like to maintain current trigger thresholds for the 2015 run to allow full Higgs characterization
• Thus the nominal 350 Hz output would increase to ~1 kHz
• Computing budgets are expected to remain constant, not grow
• Need to take advantage of leadership-class computing facilities
• Need to incorporate more parallelism into software
• Algorithms need to be more efficient (faster)
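The rate and budget constraints above can be turned into a rough throughput target; a minimal back-of-the-envelope sketch in Python (the derived speedup factor is our own arithmetic, not a number from the talk):

```python
# Rough arithmetic for the CMS 2015 trigger scenario described above.
# The two rates come from the slide; the "required gain" is a naive
# illustration assuming a flat computing budget and fixed per-event cost.
nominal_rate_hz = 350    # current nominal HLT output rate
target_rate_hz = 1000    # ~1 kHz needed to keep trigger thresholds

rate_increase = target_rate_hz / nominal_rate_hz
print(f"Output rate grows by a factor of ~{rate_increase:.1f}")

# Processing ~2.9x more events per second on the same budget must come
# from some combination of faster algorithms, more parallelism, and
# leadership-class facilities.
```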
PDACS
• Portal for Data Analysis Services for Cosmological Simulations – a joint project of Argonne, Fermilab, and NERSC; Salman Habib (Argonne) is the PI
• A cosmological data/analysis service at scale – a workflow management system
• Portal based on one used for computational biology – the idea is to facilitate the analysis/simulation effort for those not familiar with advanced computing techniques
(Figure: dark energy and matter, cosmic gas, galaxies – simulations connect fundamentals with observables)
Data Archival Facility
• Would like to offer archive facilities for the broader community
• Will require work on front ends to simplify use for non-HEP users
• Have had discussions with IceCube
One of Seven 10k slot tape robots at FNAL
We Can’t forget our day job….
CMS Tier 1 at Fermilab
• The CMS Tier-1 facility at Fermilab and the experienced team who operate it enable CMS to reprocess data quickly and to distribute the data reliably to the user community around the world.
• We lead US CMS and overall CMS in software and computing
Fermilab also operates:
• LHC Physics Center (LPC)
• Remote Operations Center
• U.S. CMS Analysis Facility
Intensity Frontier Program (Diverse)
Intensity Frontier Strategy
• Common approaches/solutions are essential to support this broad range of experiments with limited SCD staff. Examples include artdaq, art, SAM IF, LArSoft, Jobsub, …
• SCD has established liaisons between ourselves and the experiments to ensure communication and to understand needs/requirements
• Completing the process of establishing MOUs between SCD and each experiment to clarify our roles/responsibilities
Intensity Frontier Strategy - 2
• A shared analysis facility where we can quickly and flexibly allocate computing to experiments
• Continue to work to “grid enable” the simulation and processing software
  – Good success with MINOS, MINERvA and Mu2e
• All experiments use shared storage services – for data and local disk – so we can allocate resources when needed
• Perception that intensity frontier will not be computing intensive is wrong
artdaq Introduction
artdaq is a toolkit for creating data acquisition systems that run on commodity servers.
• It is integrated with the art event reconstruction and analysis framework for event filtering and data compression.
• It provides data transfer, event building, process management, system and process state behavior, control messaging, message logging, infrastructure for DAQ process and art module configuration, and writing of data to disk in ROOT format.
• The goal is to provide the common, reusable components of a DAQ system and allow experimenters to focus on the experiment-specific parts of the system: the software that reads out and configures the experiment-specific front-end hardware, the analysis modules that run inside of art, and the online data-quality monitoring modules.
• As part of our work building the DAQ software systems for upcoming experiments such as Mu2e and DarkSide-50, we will be adding more features.
artdaq Introduction
We are currently working with the DarkSide-50 collaboration to develop and deploy their DAQ system using artdaq.
• The DS-50 DAQ reads out ~15 commercial VME modules into four front-end computers using commercial PCIe cards and transfers the data to five event-builder and analysis computers over a QDR InfiniBand network.
• The maximum data rate through the system will be 500 MB/s, and we have achieved a data compression factor of five.
• The DAQ system is being commissioned at LNGS, where it is being used to collect data and monitor the performance of the detector as it is commissioned. (plots of phototube response?)
artdaq will also be used for the Mu2e DAQ, and we are working toward a demonstration system which reads data from the candidate commercial PCIe cards, builds complete events, runs sample analysis modules, and writes the data to disk for later analysis.
• The Mu2e system will have 48 readout links from the detector into commercial PCIe cards, and the data rate into the PCIe cards will be ~30 GB/s. Event fragments will be sent to 48 commodity servers over a high-speed network, and the online filtering algorithms will run on the commodity servers.
• We will develop the experiment-specific artdaq components as part of creating the demonstration system, which will be used to validate the performance of the baseline design in preparation for the CD review early next year.
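The DS-50 and Mu2e numbers quoted above imply some simple derived rates; a minimal sketch (the to-disk and per-link figures are computed here for illustration, not quoted from the slides):

```python
# Back-of-envelope throughput for the two artdaq systems described above.
# Input numbers come from the slides; derived values are illustrative.

# DarkSide-50: 500 MB/s maximum through the system, compression factor 5.
ds50_max_rate_mb_s = 500
ds50_compression = 5
ds50_to_disk_mb_s = ds50_max_rate_mb_s / ds50_compression
print(f"DS-50 rate to disk after compression: ~{ds50_to_disk_mb_s:.0f} MB/s")

# Mu2e: ~30 GB/s total into commercial PCIe cards over 48 readout links.
mu2e_total_gb_s = 30
mu2e_links = 48
per_link_gb_s = mu2e_total_gb_s / mu2e_links
print(f"Mu2e average rate per readout link: ~{per_link_gb_s:.2f} GB/s")
```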
Cosmic Frontier
• Continue to curate data for SDSS
• Support data and processing for Auger, CDMS and COUPP
• Will maintain an archive copy of the DES data and provide modest analysis facilities for Fermilab DES scientists
  – Data management is an NCSA (NSF) responsibility
  – Helping NCSA by “wrappering” science codes needed for second light until NCSA completes its framework
• DES uses Open Science Grid resources opportunistically and will make heavy use of NERSC
• Writing a science framework for DES – hope to extend it to LSST
• DarkSide-50 is writing their DAQ system using artdaq
Tevatron (Data) Knowledge Preservation
• Maintaining full analysis capability for the next few years, though building software to move away from custom systems
• Successful FWP funded; hired two domain-knowledgeable scientists to lead the preservation effort on each experiment (plus 5 FTE of SCD effort)
• Knowledge preservation – need to plan and execute the following:
  – Preserve analysis notes, electronic logs, etc.
  – Document how to do an analysis well
  – Document sample analyses as cross-checks
  – Understand job submission, database, and data-handling issues
  – Investigate/pursue virtualization
• Try to keep the CDF/D0 strategies in sync and leverage common resources/solutions
Synergia at Fermilab
• Synergia is an accelerator simulation package combining collective effects and nonlinear optics
  – Developed at Fermilab, partially funded by SciDAC
• Synergia utilizes state-of-the-art physics and computer science
  – Physics: state of the art in collective effects and optics simultaneously
  – Computer science: scales from desktops to supercomputers
• Efficient running on 100k+ cores
• Best practices: test suite, unit tests
Synergia is being used to model multiple Fermilab machines:
• Main Injector for Project-X and Recycler for ANU
• Booster instabilities and injection losses
• Mu2e: resonant extraction from the Debuncher
(Figure: weak scaling to 131,072 cores)
Synergia collaboration with CERN for LHC injector upgrades
• CERN has asked us to join an informal collaboration to model space charge in the CERN injector accelerators
• Currently engaged in a benchmarking exercise
  – Current status reviewed at the Space Charge 2013 workshop at CERN
  – Most detailed benchmark of PIC space-charge codes to date, using both data and analytic models
Breaking new ground in accelerator simulation
• Synergia has emerged as the leader in fidelity and performance
• PTC-Orbit has been shown to have problems reproducing individual particle tunes
(Figures: individual particle tune vs. initial position – PTC-Orbit displays noise and growth over time, while the Synergia results are smooth and stable; phase space showing trapping in the Synergia benchmark)
SCD has more work than human resources
• Insufficiently staffed at the moment:
  – Improving event generators – especially for the Intensity Frontier
  – Modeling of the neutrino beamline/target
  – Simulation effort – all IF experiments want more resources, both technical and analysis
  – Muon Collider simulation – both accelerator and detector
  – R&D in software-defined networks
Closing Remarks
• SCD has transitioned to fully support the Intensity Frontier
• We also have a number of projects underway to prepare for the paradigm shift in computing
• We are short-handed and are having to make choices
Back Up – At the Moment…
Riding the Wave of Progress…
MKIDs (Microwave Kinetic Inductance Devices)
• Pixelated micro-sized resonator array
• Superconducting sensors with a meV energy gap – not only a single-photon detector:
  – Theoretically allow for an energy resolution (E/ΔE) of about 100 in the visible and near-infrared spectrum
  – Best candidate to provide medium-resolution spectroscopy of >1 billion galaxies, QSOs and other objects from LSST data if the energy resolution is improved to 80 or better (currently ~16). Note that scanning that number of galaxies is beyond the reach of current fiber-based spectrometers.
  – An MKID array of 100,000 pixels will be enough to obtain medium-resolution spectroscopic information for all LSST galaxies up to magnitude 24.5 with an error .
  – High bandwidth: allows for filtering of atmospheric fluctuations at ~100 Hz or faster
Multi-10K-pixel instrument and science with MKIDs
• PPD and SCD teamed up to build an instrument with between 10K and 100K pixels
  – External collaborators: UCSB (Ben Mazin, Giga-Z), ANL, U. Michigan. Potential collaboration: strong coupling with the next CMB instrument proposed by John Carlstrom (U. Chicago) and Clarence Chang (ANL), which also requires the same DAQ readout electronics.
• Steve Heathcote, director of the SOAR telescope, Cerro Tololo, has expressed interest in hosting the MKID R&D instrument in 2016. (Ref. Steve Heathcote letter to Juan Estrada, FNAL)
• SOAR telescope operations in late 2016 – 10 nights × 10 hours/night – would give a limiting magnitude of ~25. Potential science (under consideration): photometric-redshift calibration for DES, clusters of galaxies, supernova host-galaxy redshifts, strong lensing.
• SCD/ESE will design the DAQ for the up-to-100K-pixel instrument: 1000 to 2000 MKIDs per RF feed-line, 50 feed-lines. Input bandwidth: 400 GB/s, triggerless DAQ. Data reduction: ~200 MB/s to storage. Digital signal processing on FPGAs, GPUs, processors, etc.
• Status: adiabatic dilution refrigerator (ADR) functioning at SiDet. Tests of low-noise electronics underway. MKID testing to start this summer. Electronic system design underway.
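The DAQ sizing above can be cross-checked with simple arithmetic; a minimal sketch (the channel totals and reduction factor are derived here, not quoted from the slide):

```python
# Rough sizing of the proposed MKID DAQ described above. Feed-line counts
# and bandwidths come from the slide; the derived ratios are illustrative.
feedlines = 50
mkids_per_feedline = (1000, 2000)   # quoted range per RF feed-line

total_channels = tuple(n * feedlines for n in mkids_per_feedline)
print(f"Total channels: {total_channels[0]:,} to {total_channels[1]:,}")

input_gb_s = 400                    # triggerless input bandwidth
output_mb_s = 200                   # data rate to storage after reduction
reduction_factor = input_gb_s * 1000 / output_mb_s
print(f"Required online data reduction: ~{reduction_factor:.0f}x")

per_feedline_gb_s = input_gb_s / feedlines
print(f"Input bandwidth per feed-line: {per_feedline_gb_s:.0f} GB/s")
```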
Open Science Grid (OSG)
• The Open Science Grid (OSG) advances science through open distributed computing. The OSG is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.
• A total of 95 sites; half a million jobs a day; 1 million CPU hours/day; 1 million files transferred/day.
• It is cost effective, it promotes collaboration, and it is working!
The US contribution to and partnership with the LHC Computing Grid is provided through OSG for CMS and ATLAS.