SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest...

8
Website: https://press3.mcs.anl.gov/cpac/projects/scidac Software portal: http://www.hep.anl.gov/cosmology/CosmicEmu/emu.html Workshop: Argonne, Sep 24-25, 2018 – “Advanced Statistics Meets Machine Learning” (https://indico.fnal.gov/event/18318/overview) SciDAC: Accelerating HEP Science — Inference and Machine Learning at Extreme Scales 1 Team: P. Balaprakash, M. Binois, S. Habib (PI), K. Heitmann (Argonne PI), E. Kovacs, N. Ramachandra, S. Wild (Argonne); A. Fadikar, R. Gramacy, D. Higdon (Va Tech PI) (Va Tech); E. Lawrence (LANL, Dep. PI); Y. Lin, A. Slosar (BNL PI), S. Yoo (BNL); Z. Lukic (LBNL PI), D. Morozov (LBNL) Focus Areas: • Cosmology: Unique arena for advanced stats/ML applications — big data, big compute, large-scale inverse problems ‘Stats/ML at Scale’: Need to speed up methods by many orders of magnitude to enable dealing with datasets and science requirements in the multi-PB to EB era • Accuracy: Many problems in a regime where statistical errors are subdominant — need to understand how to deal with modeling/ mitigating systematics

Transcript of SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest...

Page 1: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Website: https://press3.mcs.anl.gov/cpac/projects/scidac Software portal: http://www.hep.anl.gov/cosmology/CosmicEmu/emu.html Workshop: Argonne, Sep 24-25, 2018 – “Advanced Statistics Meets Machine Learning” (https://indico.fnal.gov/event/18318/overview)

SciDAC: Accelerating HEP Science — Inference and Machine Learning at Extreme Scales

!1

Team: P. Balaprakash, M. Binois, S. Habib (PI), K. Heitmann (Argonne PI), E. Kovacs, N. Ramachandra, S. Wild (Argonne); A. Fadikar, R. Gramacy, D. Higdon (Va Tech PI) (Va Tech); E. Lawrence (LANL, Dep. PI); Y. Lin, A. Slosar (BNL PI), S. Yoo (BNL); Z. Lukic (LBNL PI), D. Morozov (LBNL)

Focus Areas: • Cosmology: Unique arena for advanced stats/ML applications —

big data, big compute, large-scale inverse problems • ‘Stats/ML at Scale’: Need to speed up methods by many orders of

magnitude to enable dealing with datasets and science requirements in the multi-PB to EB era

• Accuracy: Many problems in a regime where statistical errors are subdominant — need to understand how to deal with modeling/mitigating systematics

Page 2: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Science with Surveys as an Inverse Problem: Extreme-Scale Computing meets Statistics and Machine Learning

!2

StatsML+Stats ML+Stats

ML+Stats

ML+Stats

▪ Use of HPC resources as high-fidelity, large data-volume sources for state-of-the-art data-intensive statistical and machine learning (ML) methods

▪ Need to speed up the forward modeling process, deal with ‘curse of dimensionality’ in the inverse problem

▪ How to control errors if the modeling and measurement error PDFs are uncertain?

Cosmologicalscientificinferenceprocessshowingforwardmodelingandsystematicerrorexploration/controlloop

Page 3: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

New Techniques for Photometric Redshift Estimation

!3

Scientific Achievement Estimationofgalaxyredshiftdistributionusingphotometricinformation,morphology,andspatialcorrelations;applicationtoLSST

Significance and Impact Characterizationandreductionofphotometricredshiftestimationerrorsessentialforsuccessofimagingsurveys

Research Details • Largesyntheticdatasetbasedonrealistictemplatesfor

spectralenergydistributions(SEDs)ofdifferentgalaxytypes

• Machinelearningtechniquesforclassification(hiddenspacevariables),useofmixturemodels;BayesianlearningforposteriorPDFs

• Earlyresultsshowgreatpromiseforphotometricredshiftestimationapplicationsanderrormitigation

Multi-GaussianProcessapproachtoobtainestimatedredshiftPDFsandcomparisonstotrainingsetvalues(redverticallines)

True redshift

Imagecredit:A.Fadikar

Page 4: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Precision Emulation of CMB Power Spectra

!4

Scientific Achievement Fast,accuratepredictionofcosmicmicrowavebackground(CMB)variables(~2000Xspeedup)with0.2%errorsoverthedesireddynamicrange

Significance and Impact Predictions/forecastsfornext-generationCMBsurveys(CMB-S4),analysisofcurrent-generationdata(ACTPol,Planck,SPT-3G,—)

Research Details • Largetraining/validationdatasetgeneratedusing

theCAMBcodewithsymmetricLatinhypercubesampling

• Dimensionalreductionviaunsupervisedlearning(comparisonofvariationalautoencodersandPCA)

• Non-parametric,GaussianProcess-basedinterpolation;errorestimatesviaMCMC

PlanckvsWMAPcosmologicalparametersusingtheemulator

CMBTTpowerspectrumwitherrorestimates(lowerpanel)

Imagecredit:M.Binois

Imagecredit:N.Ramachandra

Page 5: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Neural Network Prediction of CMB Dust Foreground

!5

Scientific Achievement SignificantimprovementinpredictionfordustforegroundusingconvolutionalneuralnetworksandGalacticneutralhydrogendata

Significance and Impact Currentworkusesintensitydata,thenextgenerationwillfocusonpolarizationtohelpwithoptimalfieldselectionanddataanalysisforasmallapertureCMB-S4experiment

Research Details • Inputdataare50velocityslicesingalacticneutralhydrogen

astracedbythe21cmline • OutputdataarethedifferenceinPlanck353GHzand143GHz

datawhichisdominatedbythedustsignal • Trainedonsoutherngalactichemisphere,validated/tested

onthenortherngalactichemisphere • Optimallinearmodelgivesnegligibleimprovement:neural

netispickingupnontrivialinformation

Improvementincross-correlationcoefficientwithtargetmapcomparedtonaivetotalintensitymap;abovered-lineindicatesimprovement,belowindicatesdeterioration(Greenisfor~1degscales,blueisfor~10degscales)

Exampleprediction:Planckdifferencemap(left)andmodelprediction(right);blackcirclesarepoint-sourcemask

Imagescredit:G.Zhang

Page 6: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Basic Emulation for Ly-alpha Forest Statistics

!6

Scientific Achievement HPCframeworktoinfercosmologicalandthermalparametersusingLy-alphapowerspectrumandselectedcomputationalmodelruns

Significance and Impact Ly-alphaforestobservationsarethemainwindowintostructureformationathighredshifts(2<z<5)andasensitiveprobeofnon-CDMcosmologies.P(k)emulationisnecessaryforrecoveryofcosmologicalparametersfromobservations

Research Details • Automatedsystemforiterativelyrunningcosmological

simulationsandanalysistasksonHPCsystems • Newiterativemethodtodeterminemostinformative

pointsinparameterspaceforrunningthenextbatchofsimulations

• MultiplewaystodoGPemulationofvectorsummarystatistics,i.e.,exploringwaysofcombiningk-andcosmologicalparameterdependenceofemulatedP(k)

Spaceofcosmologicalparameters(θ) Expensive3Dsimulation

Posteriorprobability

Summarystatistic

Inferringcosmologicalparametersina3-parametertestcase:Simulationsproducematterconfigurationsthatdependoncosmologicalparameters;Ly-alphaanalysisproducesoutputscomparabletoskysurveymeasurements.Combiningpredictionsandmeasurementsweinfer“currentbest”parameterprobabilitiesaswellasthe“promising”pointsforthenextsetofsimulationsinouriterativeprocedure.

Imagecredit:Z.Lukic

Page 7: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Image Classification/Regression for Strong Lensing

!7

Scientific Achievement Fast(10microseconds/image),robust(80-90%accuracy)classificationofstronglylensedbackgroundgalaxies

Significance and Impact LSSTwillhavetensofbillionsofobjectswith~100Kstronglylensedsources—automatedsourcedetection/filteringisessential

Research Details • Largesyntheticdatasetbasedonfullray

tracingalgorithmwith1)modelhalomassdistributionaslensesand2)halosfromcosmologicalsimulations,realistictelescopeproperties;singleandstackedimages

• DeepCNNclassification/regression • GANsforfastgenerationofnewimages

SingleandstackednoisylensedtrainingimagesforLSST;acompanionsetofnon-lensedimagesisnotshown

Performance:ModernGPUsaresignificantlyfasterthanmanycorearchitectures

Imagecredit:N.Li

Imagecredit:N.Ramachandra

Page 8: SciDAC: Accelerating HEP Science — Inference and ......Basic Emulation for Ly-alpha Forest Statistics!6 Scientific Achievement HPC framework to infer cosmological and thermal parameters

Future Work

!8

▪Emulation Landscape: ▪Extend work on summary statistics to problems with significantly higher

dimensionality, O(10) to O(100) ▪Multi-fidelity emulation ▪Develop new methods for applications to likelihood-free scenarios (e.g.,

semi-analytic galaxy modeling) ▪Fast generation of multiple realizations of ‘raw’ sky data; develop

techniques for ensuring dynamic consistency (causality vs. correlations) ▪ Image Applications: Image cross-validation, source de-blending algorithms,

application to calibration studies ▪ML/DL Methods on HPC Platforms: Work on scaling up ML and statistical

methods on HPC platforms with GPU acceleration (e.g., Cooley@ALCF, Summit@OLCF) ▪Stats meets ML: Improve methods by incorporating model information into

‘black box’ techniques; incorporate optimization methods into Bayesian calibration, many other topics —