A Vision for Exascale, Simulation, and Deep Learning


"A Vision for Exascale: Simulation, Data and Learning"

Rick Stevens, Argonne National Laboratory

The University of Chicago

Crescat scientia; vita excolatur

Data-Driven Science Examples: for many problems there is a deep coupling of observation (measurement) and computation (simulation)

Cosmology: the study of the universe as a dynamical system

Materials science: diffuse scattering to understand disordered structures. (Figure: sample and material composition, La 60% / Sr 40%, with experimental scattering compared against the simulated structure and simulated scattering.)

Images from Salman Habib et al. (HEP, MCS, etc.) and Ray Osborne et al. (MSD, APS, etc.)

How Many Projects?

By 2020, the market for machine learning will reach $40 billion, according to market research firm IDC.

The deep learning market is projected to be ~$5B by 2020.

Markets are Developing at Different Rates (~2020)

• HPC (Simulation) → ~$30B @ 5.45%
• Data Analysis → ~$200B @ 11.7%
• Deep Learning → ~$5B @ 65%

• DL > HPC in 2024
• DL > DA in 2030

(A back-of-envelope check of these crossover years follows.)
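A quick sanity check of those crossover years, a minimal sketch assuming the ~2020 figures above and treating the quoted percentages as constant compound annual growth rates (a simplification for illustration, not IDC's methodology):

```python
# Back-of-envelope projection of the three markets from the ~2020 figures above,
# assuming constant compound annual growth (an illustrative simplification).
markets = {
    "HPC (Simulation)": (30e9, 0.0545),
    "Data Analysis":    (200e9, 0.117),
    "Deep Learning":    (5e9, 0.65),
}

def size(base, rate, years):
    """Market size after `years` of compound growth from the 2020 baseline."""
    return base * (1 + rate) ** years

for year in range(2020, 2031):
    dl  = size(*markets["Deep Learning"], year - 2020)
    hpc = size(*markets["HPC (Simulation)"], year - 2020)
    da  = size(*markets["Data Analysis"], year - 2020)
    print(f"{year}: DL ${dl/1e9:6.1f}B  HPC ${hpc/1e9:5.1f}B  DA ${da/1e9:6.1f}B")

# With these assumptions DL passes HPC around 2024 and Data Analysis around 2030,
# matching the crossover years quoted on the slide.
```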

Big Picture

• Mix of applications is changing
• HPC "Simulation", "Big" Data Analytics, Machine Learning "AI"

• Many projects are combining all three modalities
– Cancer
– Cosmology
– Materials Design
– Climate
– Drug Design

Deep Learning in Climate Science

• Statistical Downscaling
• Subgrid-Scale Physics
• Direct Estimate of Climate Statistics
• Ensemble Selection
• Dipole/Antipode Detection

Deep Learning in Genomics

Predicting Microbial Phenotypes

Classification of Tumors

Using deep learning to enhance cancer diagnosis and classification, ICML 2013

High-Throughput Drug Screening

Deep Learning as an Opportunity in Virtual Screening, NIPS 2014

Deep Networks Screen Drugs

Deep Learning and Drug Discovery

Deep Learning in Disease Prediction

Learning Climate, Disease, and Environment Associations

Big Data Opportunities for Global Infectious Disease Surveillance. Simon I. Hay, Dylan B. George, Catherine L. Moyes, John S. Brownstein

Neural Networks in Materials Science

• Estimate Materials Properties from Composition Parameters
• Estimate Processing Parameters for Synthesis
• Materials Genome

Searching for Lensed Galaxies

15 TB/night; use a CNN to find gravitational lenses

Deep learning is becoming a major element of scientific computing applications

• Across the DOE lab system hundreds of examples are emerging
– From fusion energy to precision medicine
– Materials design
– Fluid dynamics
– Genomics
– Structural engineering
– Intelligent sensing
– Etc.

We estimate that by 2021 one third of the supercomputing jobs on our machines will be machine learning applications.

Should we consider architectures that are optimized for this type of work?

How do we leverage exascale?

The New HPC "Paradigm"

SIMULATION | DATA ANALYSIS | LEARNING | VISUALIZATION


The Critical Connections I

• Embedding Simulation into Deep Learning
– Leveraging simulation to provide "hints" via the Teacher-Student paradigm for DNNs
– DNN invokes "simulation training" to augment training data or to provide supervised "labels" for generally unlabeled data
– Simulations could be invoked millions of times during training runs
– Training rate limited by simulation rates
– Ex. Cancer Drug Resistance (a minimal sketch of the pattern follows)
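The sketch below is a minimal illustration of that coupling, not the actual cancer drug-resistance workflow: a hypothetical `run_simulation` stand-in plays the teacher, labeling otherwise unlabeled samples on demand inside the training loop, so the achievable training rate is bounded by the simulation rate.

```python
import numpy as np
import tensorflow as tf  # one of many tensor engines; used here only for brevity

# Hypothetical stand-in for an expensive mechanistic simulation that can label
# an input configuration (e.g. a drug / cell-line pair) on demand.
def run_simulation(x):
    return np.tanh(x.sum())  # placeholder "physics"; the real simulator is the bottleneck

# Small student network that learns from simulation-generated labels.
student = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(1),
])
student.compile(optimizer="adam", loss="mse")

# Training loop: the DNN requests simulation "labels" to augment its training
# data.  If each label costs seconds of simulation time, the simulation rate,
# not the SGD rate, limits training throughput.
for step in range(100):
    batch = np.random.rand(32, 16).astype("float32")                    # unlabeled samples
    labels = np.array([[run_simulation(x)] for x in batch], "float32")  # simulation as teacher
    student.train_on_batch(batch, labels)
```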

Hybrid Models in Cancer

Teacher-Student Network Model

Teacher-Student Network Model: Simulation-Based Predictions

Integrating ML and Simulation

The Critical Connections II

• Embedding Machine Learning into Simulations
– Replacing explicit first-principles models with learned functions
– Faster, lower power, lower accuracy (?)
– Functions in simulations accessing ML models at high throughput
– On-node invocation of dozens or hundreds of models millions of times per second?
– Ex. Nowcasting in Weather (a sketch of the call pattern follows)
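Conversely, a minimal sketch of the inverse coupling, again with hypothetical pieces (the `surrogate` network and the toy update rule are placeholders, not a real nowcasting code): a time-stepping loop that calls a trained network in place of an explicit first-principles term, batching the calls so that millions of per-cell evaluations per second become feasible.

```python
import numpy as np
import tensorflow as tf

# Hypothetical learned closure: maps a local state vector to a source term that
# would otherwise come from an expensive first-principles model.
surrogate = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

state = np.random.rand(10_000, 8).astype("float32")  # e.g. one row per grid cell
dt = 0.01

# Simulation loop: every step performs one batched inference over all cells.
# Batching (rather than one model call per cell) is what makes the
# "millions of invocations per second" pattern feasible on a node.
for step in range(100):
    source = surrogate(state, training=False).numpy()  # learned physics term
    state[:, 0] += dt * source[:, 0]                   # toy explicit update
```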

Algorithm Approximation

Neural Acceleration for General-Purpose Approximate Programs. Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger (University of Washington, Microsoft Research)

Replacing imperative code with NN-computed approximations: 2.3x speedup, 3x power reduction, ~7% error.

Joint Design of Advanced Computing Solutions for Cancer: a DOE-NCI partnership to advance cancer research and high performance computing in the U.S.

NCI (National Cancer Institute) and DOE (Department of Energy)

Cancer driving computing advances; computing driving cancer advances

DOE Secretary of Energy and Director of the National Cancer Institute

Scalable Data Analytics

Deep Learning

Large-Scale Numerical Simulation

DOE Objective: drive integration of simulation, data analytics, and machine learning

CORAL Supercomputers and Exascale Systems

Traditional HPC Systems

Exascale Node Concept Space

Abstract Machine Models and Proxy Architectures for Exascale Computing, Rev 1.1. Sandia National Laboratory and Lawrence Berkeley National Laboratory

Leverage Resources on the Die, in Package, or on the Node

• Local high-bandwidth memory stacks
• Node-based non-volatile memory
• High-bandwidth, low-latency fabric
• General-purpose cores
• Dynamic power management

What Kind of Accelerator(s) to Add?

• Vector processors
• Dataflow engines
• Patches of FPGA
• Many "nano" cores (<5 MTr each?)

Hardware and systems architectures are emerging for supporting deep learning

• CPUs – AVX, VNNI, KNL, KNM, KNH, …
• GPUs – NVIDIA P100, V100, AMD Instinct, Baidu GPU, …
• ASICs – Nervana, DianNao, Eyeriss, GraphCore, TPU, DLU, …
• FPGAs – Arria 10, Stratix 10, Falcon Mesa, …
• Neuromorphic – TrueNorth, Zeroth, N1, …

Aurora 21

• Argonne's exascale system
• Balanced architecture to support three pillars:
– Large-scale Simulation (PDEs, traditional HPC)
– Data-Intensive Applications (science pipelines)
– Deep Learning and Emerging Science AI
• Enable integration and embedding of pillars
• Integrated computing, acceleration, storage
• Towards a common software stack

Argonne Targets for Exascale

Deep Learning Applications
• Drug Response Prediction
• Scientific Image Classification
• Scientific Text Understanding
• Materials Property Design
• Gravitational Lens Detection
• Feature Detection in 3D
• Street Scene Analysis
• Organism Design
• State Space Prediction
• Persistent Learning
• Hyperspectral Patterns

Simulation Applications
• Materials Science
• Cosmology
• Molecular Dynamics
• Nuclear Reactor Modeling
• Combustion
• Quantum Computer Simulation
• Climate Modeling
• Power Grid
• Discrete Event Simulation
• Fusion Reactor Simulation
• Brain Simulation
• Transportation Networks

Big Data Applications
• APS Data Analysis
• HEP Data Analysis
• LSST Data Analysis
• SKA Data Analysis
• Metagenome Analysis
• Battery Design Search
• Graph Analysis
• Virtual Compound Library
• Neuroscience Data Analysis
• Genome Pipelines


Differing Requirements?

Deep Learning Applications
• Lower precision (fp32, fp16)
• FMAC @ 32 and 16 okay
• Inferencing can be 8-bit (TPU)
• Scaled integer possible
• Training dominates development
• Inference dominates production
• Reuse of training data
• Data pipelines needed
• Dense FP, typically SGEMM
• Small DFT, CNN
• Ensembles and search
• Single models small
• I more important than O
• Output is models

Simulation Applications
• 64-bit floating point
• Memory bandwidth
• Random access to memory
• Sparse matrices
• Distributed-memory jobs
• Synchronous I/O, multi-node
• Scalability limited by communication
• Low latency, high bandwidth
• Large coherency domains help sometimes
• O typically greater than I
• O rarely read
• Output is data

Big Data Applications
• 64-bit and integer important
• Data analysis pipelines
• DB including NoSQL
• MapReduce / SPARK
• Millions of jobs
• I/O bandwidth limited
• Data management limited
• Many-task parallelism
• Large data in and large data out
• I and O both important
• O is read and used
• Output is data


Aurora 21 Exascale Software

• Single unified stack with resource allocation and scheduling across all pillars, and the ability for frameworks and libraries to seamlessly compose
• Minimize data movement: keep permanent data in the machine via distributed persistent memory while maintaining availability requirements
• Support standard file I/O and a path to a memory-coupled model for simulation, data, and learning
• Isolation and reliability for multi-tenancy and combining workflows

Towards an Integrated Stack

The New HPC "Paradigm": SIMULATION | DATA ANALYSIS | LEARNING | VISUALIZATION

Acknowledgements

Many thanks to DOE, NSF, NIH, DOD, ANL, UC, the Moore Foundation, the Sloan Foundation, Apple, Microsoft, Cray, Intel, NVIDIA, and IBM for supporting our research group over the years.

End!

Our Vision: Automate and Accelerate

The CANDLE Exascale Project


Drug Response: CANDLE General Workflow


ECP-CANDLE: CANcer Distributed Learning Environment

CANDLE Goals
• Develop an exascale deep learning environment for cancer
• Build on open-source deep learning frameworks
• Optimize for CORAL and exascale platforms
• Support all three pilot projects' needs for deep learning
• Collaborate with DOE computing centers, HPC vendors, and ECP co-design and software technology projects


CANDLE Software Stack

• Workflow: Hyperparameter Sweeps, Data Management (e.g. DIGITS, Swift, etc.)
• Scripting: Network description, Execution scripting API (e.g. Keras, Mocha)
• Engine: Tensor/Graph Execution Engine (e.g. Theano, TensorFlow, LBANN-LL, etc.)
• Optimization: Architecture-Specific Optimization Layer (e.g. cuDNN, MKL-DNN, etc.)
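A rough illustration of how the workflow layer sits on top of the scripting and engine layers, assuming a plain grid sweep for simplicity (CANDLE's actual sweeps are driven by tools such as Swift; the `build_and_train` body and the grid below are illustrative only):

```python
import itertools
import numpy as np
import tensorflow as tf

def build_and_train(hidden_units, learning_rate, x, y):
    """Scripting/engine layers: describe a small network in Keras and train it."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_units, activation="relu", input_shape=(x.shape[1],)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    history = model.fit(x, y, epochs=5, batch_size=64, verbose=0)
    return history.history["loss"][-1]

# Workflow layer: a (very small) hyperparameter sweep over the grid below,
# launching one training run per grid point and recording the final loss.
x = np.random.rand(1024, 20).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
for hidden_units, lr in itertools.product([64, 256], [1e-2, 1e-3]):
    loss = build_and_train(hidden_units, lr, x, y)
    print(f"hidden={hidden_units:4d} lr={lr:.0e} final_loss={loss:.4f}")
```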


DL Frameworks ("Tensor Engines")
• TensorFlow (C++, symbolic diff +)
• Theano (C++, symbolic diff +)
• Neon (integrated) (Python + GPU, symbolic diff +)
• MXNet (integrated) (C++)
• LBANN (C++, aimed at scalable hardware)
• pyTorch / Torch7 THTensor (C layer, symbolic diff -, pkgs)
• Caffe (integrated) (C++, symbolic diff -)
• Mocha backend (Julia + GPU)
• CNTK backend (Microsoft) (C++)
• PaddlePaddle (Baidu) (Python, C++, GPU)

CANDLE Benchmarks: Representative Problems

• Variational AutoEncoder: learning (non-linear) features of core data types
• AutoEncoder: molecular dynamics trajectory state detection
• MLP + LCN Classification: cancer type from gene expression / SNPs
• MLP + CNN Regression: drug response (gene expression, drug descriptors); a minimal sketch follows
• CNN: cancer pathology report term extraction
• RNN-LSTM: cancer pathology report text analysis
• RNN-LSTM: molecular dynamics simulation control
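For concreteness, a minimal Keras sketch in the spirit of the drug-response regression benchmark, with placeholder feature sizes and layer widths rather than the real P1:B3 network: gene-expression features and drug descriptors are concatenated and regressed onto a single response value.

```python
import numpy as np
import tensorflow as tf

n_expr, n_desc = 2000, 500  # placeholder feature counts, not the benchmark's ~1e5 inputs

# Two-input functional model: expression profile plus drug descriptors.
expr = tf.keras.Input(shape=(n_expr,), name="gene_expression")
desc = tf.keras.Input(shape=(n_desc,), name="drug_descriptors")
h = tf.keras.layers.Concatenate()([expr, desc])
for units in (1000, 500, 100):          # a few dense layers; the benchmark uses 8 layers
    h = tf.keras.layers.Dense(units, activation="relu")(h)
response = tf.keras.layers.Dense(1, name="drug_response")(h)  # response in [-100, 100]

model = tf.keras.Model([expr, desc], response)
model.compile(optimizer="adam", loss="mse")

# Tiny synthetic batch, just to show the call signature.
x = [np.random.rand(32, n_expr).astype("float32"),
     np.random.rand(32, n_desc).astype("float32")]
y = np.random.uniform(-100, 100, size=(32, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```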

Progress in Deep Learning for Cancer

• AutoEncoders: learning data representations for classification and prediction of drug response, molecular trajectories
• VAEs and GANs: generating data to support methods development, data augmentation and feature-space algebra, drug candidate generation
• CNNs: type classification, drug response, outcomes prediction, drug resistance
• RNNs: sequence, text, and molecular trajectory analysis
• Multi-Task Learning: terms (from text) and feature extraction (data), data translation (RNA-seq <-> uArray)

CANDLE FOM: Rate of Training

• "Number of networks trained per day"
– Depends on the size and type of network, amount of training data, batch size, number of epochs, and type of hardware
• "Number of 'weight' updates per second"
– Forward pass + backward pass
• Training Rate = Σ_{i=1..n} a_i R_i, where R_i is the rate for benchmark i and a_i is a weight
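A trivial illustration of that figure of merit; the per-benchmark rates and equal weights below are made up for the example, while the real a_i and R_i would come from the seven benchmarks listed next.

```python
# Figure of merit: weighted sum of per-benchmark training rates.
# The rates (networks trained per day) and the equal weights are illustrative only.
rates = {"P1:B1": 40, "P1:B2": 55, "P1:B3": 12, "P2:B1": 30,
         "P2:B2": 25, "P3:B1": 80, "P3:B2": 60}
weights = {name: 1 / len(rates) for name in rates}  # a_i, summing to 1

training_rate = sum(weights[b] * rates[b] for b in rates)
print(f"FOM (weighted training rate): {training_rate:.1f} networks/day")
```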

7 CANDLE Benchmarks

Benchmark | Type | Data | ID | OD | Sample Size | Size of Network | Additional (activation, layer types, etc.)
1. P1:B1 Autoencoder | MLP | RNA-Seq | 10^5 | 10^5 | 15K | 5 layers | Log2(x+1) → [0,1]; KPRM-UQ
2. P1:B2 Classifier | MLP | SNP → Type | 10^6 | 40 | 15K | 5 layers | Training set balance issues
3. P1:B3 Regression | MLP + LCN | expression; drug descriptors | 10^5 | 1 | 3M | 8 layers | Drug response [-100, 100]
4. P2:B1 Autoencoder | MLP | MD K-RAS | 10^5 | 10^2 | 10^6-10^8 | 5-8 layers | State compression
5. P2:B2 RNN-LSTM | RNN-LSTM | MD K-RAS | 10^5 | 3 | 10^6 | 4 layers | State to action
6. P3:B1 RNN-LSTM | RNN-LSTM | Path reports | 10^3 | 5 | 5K | 1-2 layers | Dictionary 12K + 30K
7. P3:B2 Classification | CNN | Path reports | 10^4 | 10^2 | 10^5 | 5 layers | Biomarkers

Benchmark Owners:
• P1: Fangfang Xia (ANL)
• P2: Brian Van Essen (LLNL)
• P3: Arvind Ramanathan (ORNL)


https://github.com/ECP-CANDLE

Typical Performance Experience

CANDLE: predicting drug response of tumor samples
• MLP/CNN on Keras
• 7 layers, 30M-500M parameters
• 200 GB input size
• 1 hour/epoch on a DGX-1; 200 epochs take 8 days (200 GPU hrs)
• Hyperparameter search: ~200,000 GPU hrs or 8M CPU hrs

Protein function classification in genome annotation
• Deep residual convolution network on Keras
• 50 layers
• 1 GB input size
• 20 minutes/epoch on a DGX-1; 200 epochs take 3 days (72 GPU hrs)
• Hyperparameter search: ~72,000 GPU hrs or 2.8M CPU hrs
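The hyperparameter-search totals quoted above follow from the per-run numbers once a configuration count is assumed; the ~1,000 configurations used below are inferred for illustration, not stated on the slide.

```python
# Sanity check on the hyperparameter-search costs quoted above.  The number of
# configurations (~1000) is an assumption made for illustration.
def search_cost(hours_per_epoch, epochs, n_configs):
    run = hours_per_epoch * epochs   # GPU hours for one full training run
    return run, run * n_configs      # (single run, whole sweep)

drug = search_cost(1.0, 200, 1000)        # -> (200, 200000) GPU hrs, matching the slide
protein = search_cost(20 / 60, 200, 1000) # -> (~67, ~67000) GPU hrs, close to the quoted 72 / 72000
print(drug, protein)
```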

GitHub and FTP

• ECP-CANDLE GitHub organization: https://github.com/ECP-CANDLE
• ECP-CANDLE FTP site (hosts all the public datasets for the benchmarks from the three pilots): http://ftp.mcs.anl.gov/pub/candle/public/

Things We Need

• Deep Learning Workflow Tools
• Data Management for Training Data and Models
• Performance Measurement, Modeling and Monitoring of Training Runs
• Deep Network Model Visualization
• Low-level Solvers, Optimization and Data Encoding
• Programming Models / Runtimes to support next-generation Parallel Deep Learning with sparsity
• OS Support for High-Throughput Training