A Vision for Exascale, Simulation, and Deep Learning
Transcript of A Vision for Exascale, Simulation, and Deep Learning
"A Vision for Exascale: Simulation, Data and Learning"
Rick Stevens, Argonne National Laboratory / The University of Chicago
Crescat scientia; vita excolatur ("Let knowledge grow from more to more; and so be human life enriched")
Data-Driven Science Examples
For many problems there is a deep coupling of observation (measurement) and computation (simulation).
Cosmology: the study of the universe as a dynamical system.
Materials science: diffuse scattering to understand disordered structures.
[Figure: sample experimental scattering, material composition (La 60%, Sr 40%), simulated structure, simulated scattering]
Images from Salman Habib et al. (HEP, MCS, etc.) and Ray Osborne et al. (MSD, APS, etc.)
How Many Projects?
By 2020, the market for machine learning will reach $40 billion, according to market research firm IDC. The deep learning market is projected to be ~$5B by 2020.

Markets are Developing at Different Rates (~2020)
• HPC (simulation) → [email protected]%
• Data analysis → [email protected]%
• Deep learning → ~$5B @ 65%
• DL > HPC in 2024
• DL > DA in 2030
Big Picture
• The mix of applications is changing: HPC "simulation", "big" data analytics, machine learning "AI"
• Many projects are combining all three modalities:
– Cancer
– Cosmology
– Materials design
– Climate
– Drug design
Deep Learning in Climate Science
• Statistical downscaling
• Subgrid-scale physics
• Direct estimate of climate statistics
• Ensemble selection
• Dipole/antipode detection
Deep Learning in Genomics
Predicting Microbial Phenotypes
Classification of Tumors
"Using deep learning to enhance cancer diagnosis and classification," ICML 2013
High-Throughput Drug Screening
"Deep Learning as an Opportunity in Virtual Screening," NIPS 2014
Deep Networks Screen Drugs
Deep Learning and Drug Discovery
Deep Learning in Disease Prediction
Learning Climate-Disease-Environment Associations
"Big Data Opportunities for Global Infectious Disease Surveillance," Simon I. Hay, Dylan B. George, Catherine L. Moyes, John S. Brownstein
Neural Networks in Materials Science
• Estimate materials properties from composition parameters
• Estimate processing parameters for synthesis
• Materials Genome
Searching for Lensed Galaxies
15 TB/night; use a CNN to find gravitational lenses.
Deep learning is becoming a major element of scientific computing applications
• Across the DOE lab system hundreds of examples are emerging:
– From fusion energy to precision medicine
– Materials design
– Fluid dynamics
– Genomics
– Structural engineering
– Intelligent sensing
– Etc.
WE ESTIMATE THAT BY 2021 ONE THIRD OF THE SUPERCOMPUTING JOBS ON OUR MACHINES WILL BE MACHINE LEARNING APPLICATIONS.
SHOULD WE CONSIDER ARCHITECTURES THAT ARE OPTIMIZED FOR THIS TYPE OF WORK?
HOW TO LEVERAGE EXASCALE?
The New HPC "Paradigm"
SIMULATION – DATA ANALYSIS – LEARNING – VISUALIZATION
The Critical Connections I
• Embedding simulation into deep learning
– Leveraging simulation to provide "hints" via the teacher-student paradigm for DNNs
– The DNN invokes "simulation training" to augment training data or to provide supervised "labels" for generally unlabeled data
– Simulations could be invoked millions of times during training runs
– The training rate is limited by simulation rates
– Ex.: cancer drug resistance
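A minimal sketch of this simulation-in-the-loop idea: a toy "simulation" supplies supervised labels for an otherwise unlabeled pool, and a simple "student" model is trained on the simulation-derived labels. Everything here (the threshold simulator, the logistic-regression student) is illustrative, not CANDLE code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x):
    """Stand-in for an expensive simulation invoked to label a sample.
    In the real setting this could run millions of times per training run."""
    return 1.0 if x[0] + x[1] > 0 else 0.0

# Unlabeled pool; the "teacher" simulation is invoked once per sample.
X = rng.normal(size=(200, 2))
y = np.array([simulate(x) for x in X])

# Tiny logistic-regression "student" trained on the simulated labels.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass
    grad_w = X.T @ (p - y) / len(X)          # backward pass
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Agreement of the trained student with the teacher simulation.
acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
```

Note how the loop structure makes the point from the slide concrete: each training example costs one simulation call, so the achievable training rate is bounded by the simulation rate.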
Hybrid Models in Cancer
Teacher-Student Network Model: Simulation-Based Predictions
Integrating ML and Simulation
The Critical Connections II
• Embedding machine learning into simulations
– Replacing explicit first-principles models with learned functions
– Faster, lower power, lower accuracy(?)
– Functions in simulations accessing ML models at high throughput
– On-node invocation of dozens or hundreds of models millions of times per second?
– Ex.: nowcasting in weather
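As a sketch of this pattern (not the weather code referenced above), the loop below replaces a "first-principles" kernel inside a toy time-stepping simulation with a cheap surrogate, here a precomputed lookup table standing in for a trained model, and measures the accuracy cost. All names and the physics are invented for illustration.

```python
import numpy as np

def first_principles_rate(T):
    """The 'expensive' explicit model: an Arrhenius-style rate exp(-1/T)."""
    return np.exp(-1.0 / np.clip(T, 1e-3, None))

# Build the surrogate offline (a lookup table here; a trained DNN in practice).
grid = np.linspace(0.01, 10.0, 1024)
table = first_principles_rate(grid)
surrogate = lambda T: np.interp(T, grid, table)

def step(T, rate_fn, dt=0.01):
    """One explicit Euler step; rate_fn is invoked once per step."""
    return T + dt * rate_fn(T)

T_exact, T_surr = 1.0, 1.0
for _ in range(1000):                 # high-throughput surrogate invocation
    T_exact = step(T_exact, first_principles_rate)
    T_surr = step(T_surr, surrogate)

rel_err = abs(T_surr - T_exact) / T_exact
```

The design point matches the slide's trade-off: the surrogate call is far cheaper than the explicit model, at the cost of a small, quantifiable error in the simulated state.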
Algorithm Approximation
"Neural Acceleration for General-Purpose Approximate Programs," Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger* (University of Washington; *Microsoft Research)
Replacing imperative code with NN-computed approximations: 2.3x speedup, 3x power reduction, ~7% error.
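A toy illustration of the paper's core idea, approximating an imperative code region with a small neural network trained on its input/output behavior. The network, sizes and learning rate below are our own choices; the actual work uses compiler transformation plus a hardware neural processing unit, not plain numpy.

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    return np.sin(x)   # the imperative code region to be approximated

# Sample the region's input distribution to build training data.
X = rng.uniform(-np.pi, np.pi, size=(256, 1))
Y = target(X)

# One-hidden-layer MLP trained with full-batch gradient descent.
H = 32
W1 = rng.normal(scale=0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - Y                          # gradient of 0.5 * MSE
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)        # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Residual error of the neural approximation, analogous to the ~7% figure.
mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2))
```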
Joint Design of Advanced Computing Solutions for Cancer
A DOE-NCI partnership to advance cancer research and high-performance computing in the U.S.
NCI: National Cancer Institute; DOE: Department of Energy.
Cancer driving computing advances; computing driving cancer advances.
DOE Secretary of Energy; Director of the National Cancer Institute.
Scalable Data Analytics + Deep Learning + Large-Scale Numerical Simulation
DOE objective: drive integration of simulation, data analytics and machine learning.
CORAL supercomputers and exascale systems; traditional HPC systems.
Exascale Node Concept Space
"Abstract Machine Models and Proxy Architectures for Exascale Computing," Rev 1.1, Sandia National Laboratory and Lawrence Berkeley National Laboratory
Leverage resources on the die, in the package, or on the node:
• Local high-bandwidth memory stacks
• Node-based non-volatile memory
• High-bandwidth, low-latency fabric
• General-purpose cores
• Dynamic power management
What Kind of Accelerator(s) to Add?
• Vector processors
• Data-flow engines
• Patches of FPGA
• Many "nano" cores (<5 MTr each?)
Hardware and system architectures are emerging to support deep learning:
• CPUs – AVX, VNNI, KNL, KNM, KNH, …
• GPUs – NVIDIA P100, V100, AMD Instinct, Baidu GPU, …
• ASICs – Nervana, DianNao, Eyeriss, GraphCore, TPU, DLU, …
• FPGAs – Arria 10, Stratix 10, Falcon Mesa, …
• Neuromorphic – TrueNorth, Zeroth, N1, …
Aurora 21
• Argonne's exascale system
• Balanced architecture to support three pillars:
– Large-scale simulation (PDEs, traditional HPC)
– Data-intensive applications (science pipelines)
– Deep learning and emerging science AI
• Enable integration and embedding of the pillars
• Integrated computing, acceleration, storage
• Towards a common software stack
Deep Learning Applications
• Drug response prediction
• Scientific image classification
• Scientific text understanding
• Materials property design
• Gravitational lens detection
• Feature detection in 3D
• Street scene analysis
• Organism design
• State space prediction
• Persistent learning
• Hyperspectral patterns

Argonne Targets for Exascale: Simulation Applications
• Materials science
• Cosmology
• Molecular dynamics
• Nuclear reactor modeling
• Combustion
• Quantum computer simulation
• Climate modeling
• Power grid
• Discrete event simulation
• Fusion reactor simulation
• Brain simulation
• Transportation networks

Big Data Applications
• APS data analysis
• HEP data analysis
• LSST data analysis
• SKA data analysis
• Metagenome analysis
• Battery design search
• Graph analysis
• Virtual compound library
• Neuroscience data analysis
• Genome pipelines
Differing Requirements?

Deep Learning Applications
• Lower precision (fp32, fp16)
• FMAC @ 32- and 16-bit okay
• Inferencing can be 8-bit (TPU)
• Scaled integer possible
• Training dominates development
• Inference dominates production
• Reuse of training data
• Data pipelines needed
• Dense FP, typically SGEMM
• Small DFT, CNN
• Ensembles and search
• Single models are small
• I more important than O
• Output is models

Simulation Applications
• 64-bit floating point
• Memory bandwidth
• Random access to memory
• Sparse matrices
• Distributed-memory jobs
• Synchronous multi-node I/O
• Scalability limited by communication
• Low latency, high bandwidth
• Large coherency domains help sometimes
• O typically greater than I
• Output rarely read
• Output is data

Big Data Applications
• 64-bit and integer important
• Data analysis pipelines
• DBs, including NoSQL
• MapReduce/Spark
• Millions of jobs
• I/O bandwidth limited
• Data management limited
• Many-task parallelism
• Large data in and large data out
• I and O both important
• Output is read and used
• Output is data
Aurora 21 Exascale Software
• A single unified stack, with resource allocation and scheduling across all pillars and the ability for frameworks and libraries to seamlessly compose
• Minimize data movement: keep permanent data in the machine via distributed persistent memory while maintaining availability requirements
• Support standard file I/O and a path to a memory-coupled model for simulation, data and learning
• Isolation and reliability for multi-tenancy and combined workflows
Towards an Integrated Stack
The New HPC "Paradigm"
SIMULATION – DATA ANALYSIS – LEARNING – VISUALIZATION
Acknowledgements
Many thanks to DOE, NSF, NIH, DOD, ANL, UC, the Moore Foundation, the Sloan Foundation, Apple, Microsoft, Cray, Intel, NVIDIA and IBM for supporting our research group over the years.
End!
Our Vision: Automate and Accelerate
The CANDLE Exascale Project
Drug Response: CANDLE General Workflow
ECP-CANDLE: CANcer Distributed Learning Environment
CANDLE Goals
• Develop an exascale deep learning environment for cancer
• Build on open-source deep learning frameworks
• Optimize for CORAL and exascale platforms
• Support all three pilot projects' needs for deep learning
• Collaborate with DOE computing centers, HPC vendors, and ECP co-design and software technology projects
CANDLE Software Stack
• Workflow – hyperparameter sweeps, data management (e.g. DIGITS, Swift, etc.)
• Scripting – network description, execution scripting API (e.g. Keras, Mocha)
• Engine – tensor/graph execution engine (e.g. Theano, TensorFlow, LBANN-LL, etc.)
• Optimization – architecture-specific optimization layer (e.g. cuDNN, MKL-DNN, etc.)
DL Frameworks ("Tensor Engines")
• TensorFlow (C++, symbolic diff+)
• Theano (C++, symbolic diff+)
• Neon (integrated) (Python + GPU, symbolic diff+)
• MXNet (integrated) (C++)
• LBANN (C++, aimed at scalable hardware)
• pyTorch / Torch7 THTensor (C layer, symbolic diff-, pkgs)
• Caffe (integrated) (C++, symbolic diff-)
• Mocha backend (Julia + GPU)
• CNTK backend (Microsoft) (C++)
• PaddlePaddle (Baidu) (Python, C++, GPU)
CANDLE Benchmarks: Representative Problems
• Variational autoencoder – learning (non-linear) features of core data types
• Autoencoder – molecular dynamics trajectory state detection
• MLP + LCN classification – cancer type from gene expression/SNPs
• MLP + CNN regression – drug response (gene expression, descriptors)
• CNN – cancer pathology report term extraction
• RNN-LSTM – cancer pathology report text analysis
• RNN-LSTM – molecular dynamics simulation control
Progress in Deep Learning for Cancer
• Autoencoders – learning data representations for classification and prediction of drug response and molecular trajectories
• VAEs and GANs – generating data to support methods development, data augmentation, feature-space algebra, drug candidate generation
• CNNs – type classification, drug response, outcomes prediction, drug resistance
• RNNs – sequence, text and molecular trajectory analysis
• Multi-task learning – term (from text) and feature extraction (data), data translation (RNA-seq <-> uArray)
CANDLE FOM: Rate of Training
• "Number of networks trained per day" – depends on the size and type of network, amount of training data, batch size, number of epochs, and type of hardware
• "Number of 'weight' updates per second" – forward pass + backward pass
• Training rate = Σ(i=1..n) a_i R_i, where R_i is the rate for benchmark i and a_i is a weight
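Spelled out with made-up numbers (the rates and weights below are purely illustrative, not measured CANDLE values), the figure of merit is just the weighted sum:

```python
# Hypothetical measured rates R_i (networks trained per day) for a few
# benchmarks, and weights a_i reflecting their relative importance.
rates   = {"P1:B1": 120.0, "P1:B2": 80.0, "P3:B2": 40.0}
weights = {"P1:B1": 0.5,   "P1:B2": 0.3,  "P3:B2": 0.2}

# Training-rate FOM = sum_i a_i * R_i
fom = sum(weights[b] * rates[b] for b in rates)
print(fom)  # 0.5*120 + 0.3*80 + 0.2*40 = 92.0
```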
7 CANDLE Benchmarks

Benchmark               | Type     | Data                   | ID   | OD   | Sample Size | Network Size | Additional (activation, layer types, etc.)
1. P1:B1 Autoencoder    | MLP      | RNA-Seq                | 10^5 | 10^5 | 15K         | 5 layers     | log2(x+1) → [0,1]; KPRM-UQ
2. P1:B2 Classifier     | MLP      | SNP → type             | 10^6 | 40   | 15K         | 5 layers     | training-set balance issues
3. P1:B3 Regression     | MLP+LCN  | expression; drug descs | 10^5 | 1    | 3M          | 8 layers     | drug response [-100,100]
4. P2:B1 Autoencoder    | MLP      | MD K-RAS               | 10^5 | 10^2 | 10^6-10^8   | 5-8 layers   | state compression
5. P2:B2 RNN-LSTM       | RNN-LSTM | MD K-RAS               | 10^5 | 3    | 10^6        | 4 layers     | state to action
6. P3:B1 RNN-LSTM       | RNN-LSTM | path reports           | 10^3 | 5    | 5K          | 1-2 layers   | dictionary 12K+30K
7. P3:B2 Classification | CNN      | path reports           | 10^4 | 10^2 | 10^5        | 5 layers     | biomarkers

Benchmark owners:
• P1: Fangfang Xia (ANL)
• P2: Brian Van Essen (LLNL)
• P3: Arvind Ramanathan (ORNL)
https://github.com/ECP-CANDLE
Typical Performance Experience
CANDLE: predicting drug response of tumor samples
• MLP/CNN on Keras
• 7 layers, 30M-500M parameters
• 200 GB input size
• 1 hour/epoch on a DGX-1; 200 epochs take ~8 days (200 GPU-hrs)
• Hyperparameter search: ~200,000 GPU-hrs or ~8M CPU-hrs

Protein function classification in genome annotation
• Deep residual convolution network on Keras
• 50 layers
• 1 GB input size
• 20 minutes/epoch on a DGX-1; 200 epochs take ~3 days (72 GPU-hrs)
• Hyperparameter search: ~72,000 GPU-hrs or ~2.8M CPU-hrs
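The arithmetic behind these estimates is easy to reproduce. The helper below is ours (the function name and the assumed 1,000-configuration search size are illustrative, chosen so the search cost matches the ~200,000 GPU-hr drug-response figure above):

```python
def training_cost(epochs, hours_per_epoch, n_configs=1):
    """GPU-hours for one training run, and for a hyperparameter
    search treated as n_configs independent full runs."""
    one_run = epochs * hours_per_epoch
    return one_run, one_run * n_configs

# Drug-response model: 1 hour/epoch, 200 epochs -> 200 GPU-hrs (~8 days
# of wall clock), and an assumed 1,000-config search -> 200,000 GPU-hrs.
run_hrs, search_hrs = training_cost(200, 1.0, n_configs=1000)
print(run_hrs, run_hrs / 24, search_hrs)
```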
GitHub and FTP
• ECP-CANDLE GitHub organization: https://github.com/ECP-CANDLE
• ECP-CANDLE FTP site, hosting all the public datasets for the benchmarks from the three pilots: http://ftp.mcs.anl.gov/pub/candle/public/
Things We Need
• Deep learning workflow tools
• Data management for training data and models
• Performance measurement, modeling and monitoring of training runs
• Deep network model visualization
• Low-level solvers, optimization and data encoding
• Programming models/runtimes to support next-generation parallel deep learning with sparsity
• OS support for high-throughput training