
Realtime Data Analytics at NERSC

Prabhat
XLDB, May 24, 2016
Lawrence Berkeley National Laboratory

National Energy Research Scientific Computing Center

NERSC is the Production HPC & Data Facility for DOE, the largest funder of physical science research in the U.S. Research areas span biological and environmental systems; applied math and exascale; materials, chemistry, and geophysics; particle physics and astrophysics; nuclear physics; and fusion energy and plasma physics.

Focus on Science

• NERSC supports the broad mission needs of the six DOE Office of Science program offices
• 6,000 users and 750 projects
• Extensive science engagement and user training programs
• 2,078 refereed publications in 2015

NERSC - 2016 (system overview)

• Cori: Cray XC-40. Phase 1: 1,630 nodes with 2.3 GHz Intel “Haswell” cores and 203 TB RAM; Phase 2: >9,300 nodes with >60 cores, 16 GB HBM, and 96 GB DDR per node. 28 PB local scratch at >700 GB/s (32x FDR IB) plus a 1.5 PB “DataWarp” burst buffer at >1.5 TB/s.
• Edison: Cray XC-30. 5,576 nodes (133K 2.4 GHz Intel “IvyBridge” cores), 357 TB RAM. 7.6 PB local scratch at 163 GB/s (16x FDR IB).
• Shared storage: 3.6 PB global scratch (5 x SFA12KE), 5 PB /project (DDN9900 & NexSAN), 250 TB /home (NetApp 5460), and HPSS with 50 PB stored and 240 PB capacity; filesystem bandwidths of 80, 50, 12, and 5 GB/s.
• Data-intensive systems (PDSF, JGI, KBASE, HEP; 14x QDR), vis & analytics, data transfer nodes, advanced architecture testbeds, and science gateways, all on a shared Ethernet & IB fabric.
• Software-defined networking to the WAN (2 x 10 Gb and 1 x 100 Gb links), science-friendly security, production monitoring, and power efficiency.

The Cori System

• Cori will transition HPC and data-centric workloads to energy-efficient architectures
• The system is named after Gerty Cori, biochemist and the first American woman to receive a Nobel Prize in science

DOE facilities are facing a data deluge

• Astronomy, physics, light sources, genomics, climate


4 V’s of Scientific Big Data

• Astronomy: Variety – multiple telescopes, multi-band/spectra; Volume – O(100) TB; Velocity – 100 GB/night to 10 TB/night; Veracity – noisy, acquisition artefacts
• Light Sources: Variety – multiple imaging modalities; Volume – O(100) GB; Velocity – 1 Gb/s to 1 Tb/s; Veracity – noisy, sample preparation/acquisition artefacts
• Genomics: Variety – sequencers, mass-spec, proteomics; Volume – O(1-10) TB; Velocity – TB/week; Veracity – missing data, errors
• High Energy Physics: Variety – multiple detectors; Volume – O(100) TB to O(10) PB; Velocity – 1-10 PB/s reduced to GB/s; Veracity – noisy, artefacts, spatio-temporal
• Climate Simulations: Variety – multi-variate, spatio-temporal; Volume – O(10) TB; Velocity – 100 GB/s; Veracity – ‘clean’, but multiple sources of uncertainty must be accounted for

Why Real-time Analytics? Why Now?

• Large instruments are producing massive data streams
  – Fast, predictable turnaround is integral to the processing pipeline
  – Traditional HPC systems use batch queues with long or unpredictable wait times
• Computational steering <-> experimental steering
  – Change the experimental configuration during your precious beam-time!
• Follow-on analysis might be time critical
  – Supernova candidates, asteroid detection

Real-time Use Cases

• Realtime interaction with experimental facilities – light sources: ALS, LCLS
• Realtime jobs driven by web portals – OpenMSI, MetAtlas
• Computational steering – the DIII-D reactor
• Experimental steering – iPTF follow-on

Real-time Queue at NERSC

• NERSC has made a small pool of nodes available for immediate-turnaround, “realtime” computing
  – Up to 32 nodes (1,024 cores) in the realtime queue
  – Realtime jobs have higher priority than other queues
  – The pool can shrink or grow as needed based on demand
• Approved projects have a small number of nodes available on demand without queue wait times
  – Configured on a per-repo basis: maximum number of jobs, maximum number of cores, wallclock limit, etc. (a minimal submission sketch follows)
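As a minimal sketch only – assuming a Slurm scheduler with a realtime QOS, and using a hypothetical account name and analysis script – submitting an on-demand job might look like this in Python:

```python
import subprocess
import textwrap

# Hypothetical job script: the account "m0000", node count, time limit,
# and analysis command are placeholders; actual limits are set per-repo.
job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --qos=realtime
    #SBATCH --account=m0000
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00
    srun python analyze_latest_scan.py
""")

# sbatch reads the batch script from stdin when no filename is given.
result = subprocess.run(["sbatch"], input=job_script, text=True,
                        capture_output=True, check=True)
print(result.stdout.strip())   # e.g. "Submitted batch job 123456"
```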


Usage (12/2015 – 04/2016)

• Totals: 332,625 hours used, 23,244 jobs
• (Usage and distribution plots not reproduced here)

Science Use Case: iPTF

PIs: Kasliwal, Nugent, Cao

• Nightly images are transferred to NERSC
• Subtractions are performed
• Candidates are inserted into a database
• Typical turn-around time < 5 minutes

Discoveries include Yi Cao et al. (2015), Nature, “A strong ultraviolet pulse from a newborn Type Ia supernova”.
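For intuition only, here is a toy Python sketch of the nightly subtraction-and-candidate step; the noise model, threshold, and synthetic images are illustrative assumptions, not the actual iPTF pipeline code.

```python
import numpy as np

def find_candidates(new_image: np.ndarray, ref_image: np.ndarray,
                    nsigma: float = 5.0):
    """Subtract a reference image and flag pixels that brightened
    significantly (a stand-in for the real subtraction pipeline)."""
    diff = new_image - ref_image
    threshold = nsigma * diff.std()          # illustrative noise model
    ys, xs = np.where(diff > threshold)      # candidate pixel positions
    return [(int(x), int(y), float(diff[y, x])) for x, y in zip(xs, ys)]

# Illustrative usage with synthetic data standing in for nightly images.
rng = np.random.default_rng(0)
ref = rng.normal(100.0, 5.0, size=(256, 256))
new = ref + rng.normal(0.0, 5.0, size=(256, 256))
new[128, 64] += 200.0                        # a fake transient
print(find_candidates(new, ref)[:5])
```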

Science Use Case: Advanced Light Source

Production running at ALS beamlines: 24x7 operation, 176,293 datasets, 155 beamline users, 1,050 TB of data stored, 2,379,754 jobs at NERSC.

• Image reconstruction algorithms run on Cori
• 3D volumes are rendered on the SPOT web portal
• ALS beamline users receive instant feedback
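As a rough illustration of the reconstruction step (the slides do not name the software stack; TomoPy is assumed here purely for demonstration), a gridrec reconstruction of a synthetic projection stack might look like this:

```python
import numpy as np
import tomopy   # assumed for illustration; not named in the slides

# Synthetic stand-in for a beamline projection stack:
# 180 projections of a 128x128 Shepp-Logan phantom.
phantom = tomopy.shepp2d(size=128)               # shape (1, 128, 128)
theta = np.linspace(0.0, np.pi, 180)
proj = tomopy.project(phantom, theta)            # simulated projections

# Reconstruct the volume slice-by-slice with the gridrec algorithm.
center = tomopy.find_center(proj, theta)         # estimate rotation center
recon = tomopy.recon(proj, theta, center=center, algorithm='gridrec')
print(recon.shape)                               # (nslices, ny, nx)
```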

Science Use Case: Metabolite Atlas

Ben Bowen, LBL

• Pre-computed fragmentation trees for 10,000+ compounds
• The real-time queue is used to compare raw spectra against the trees to obtain possible matches
• Results obtained in minutes
• iPython interface to NERSC
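A toy illustration of the matching idea – the data layout, tolerance, and scoring rule are assumptions, not the actual Metabolite Atlas code – is to score an observed spectrum against pre-computed fragment m/z lists and rank compounds by the match fraction:

```python
def match_spectrum(peaks, fragment_trees, tol=0.01):
    """Score each compound by the fraction of its pre-computed fragment
    m/z values that appear in the observed peak list (toy model)."""
    scores = {}
    for compound, fragments in fragment_trees.items():
        hits = sum(any(abs(p - f) <= tol for p in peaks) for f in fragments)
        scores[compound] = hits / len(fragments)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical fragment lists and an observed peak list (m/z values).
trees = {"compound_A": [91.05, 119.08, 147.04],
         "compound_B": [72.08, 130.09]}
observed = [91.06, 147.05, 200.10]
print(match_spectrum(observed, trees)[:3])
```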

Science Use Case: Cryo-Electron Microscopy

• Structure determination of TFIID
• 10-100 GB image stacks; image classification
• The real-time queue is used for:
  – Assessment of data quality during electron microscopy data collection
  – Rapid optimization of data-processing strategies

3D structure of a TFIID-containing complex, Nogales Lab: Louder et al. (2016), Nature 531 (7596): 604-619.
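Purely as a sketch of the kind of on-the-fly quality check this enables (not the Nogales Lab workflow), one could flag micrographs whose intensity statistics fall outside an expected range while collection is still running:

```python
import numpy as np

def flag_bad_micrographs(stack, mean_range=(0.3, 0.7), min_std=0.05):
    """Return indices of frames whose mean or contrast falls outside
    illustrative acceptance windows (thresholds are assumptions)."""
    bad = []
    for i, frame in enumerate(stack):
        m, s = float(frame.mean()), float(frame.std())
        if not (mean_range[0] <= m <= mean_range[1]) or s < min_std:
            bad.append(i)
    return bad

# Synthetic stand-in for a (frames, y, x) image stack.
rng = np.random.default_rng(1)
stack = rng.uniform(0.4, 0.6, size=(10, 64, 64))
stack[3] = 0.95                     # an overexposed frame to be flagged
print(flag_bad_micrographs(stack))  # -> [3]
```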

LCLS Workflow Today: 150 TB Analysis in 5 days

• Data streams in XTC format from the Cornell-SLAC Pixel Array diffraction detector and injector, through the DAQ (a multilevel data acquisition and control system), and over a ~2 GB/s link into the NERSC Science DMZ
• Data lands on HPSS, global scratch, and /project (NGF)
• Many parallel psana workers on the Cray XC30 compute engine each run hitfinder, spotfinder, index, and integrate; reconstruction turns the results into actionable knowledge for the next beamtime
• Prompt analysis requires fast networks & real-time HPC queues (a psana-style sketch of one worker follows)
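A minimal sketch of one such worker loop, assuming the psana (v1) Python API; the experiment/run string and the analysis stages are placeholders, not the production LCLS code:

```python
from psana import DataSource   # LCLS psana (v1) Python API

# Placeholder stages: the real hitfinder/spotfinder/index/integrate codes
# are experiment-specific and not shown in the slides.
def hitfinder(evt):  return True
def spotfinder(evt): return []
def index(spots):    return "lattice" if spots else None
def integrate(evt, lattice): return []

# Hypothetical experiment/run string; ':smd' selects small-data (fast) mode.
ds = DataSource('exp=cxitut13:run=10:smd')

for nevt, evt in enumerate(ds.events()):
    if not hitfinder(evt):                  # discard blank shots early
        continue
    spots = spotfinder(evt)                 # locate Bragg peaks
    lattice = index(spots)                  # determine crystal orientation
    if lattice is None:
        continue
    intensities = integrate(evt, lattice)   # integrate reflections
    print(nevt, len(intensities))
```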

LCLS-II 2019: Nanocrystallography Pipeline

• Streaming data from the detector to HPC at 100-1000x today’s data rates
• Indexing, classification, and reconstruction via an on-the-fly veto system
• Quasi real-time response (<10 min)
• Terabit/s throughput from the front-end electronics
• Petaflop-scale analysis on demand
• (Diagram: indexed diffraction image and reconstructed structure)

Key Takeaways

• Data streaming and real-time analytics are emerging requirements at NERSC

• Experimental facilities are the heaviest users – light sources, telescopes

• SDN capabilities are needed to enable data flows directly between compute nodes and workflow DBs

• Users would like to use realtime nodes to do more long-running interactive work/debugging

• Provisioning resources for real-time queue is an ongoing exercise


Acknowledgments

• Shreyas Cholia • Doug Jacobsen (NERSC) • NERSC Real-time queue users!


Thanks!
