Transcript of "The Convergence of HPC and Big Data"
The Convergence of HPC and Big Data

Pete Beckman
Senior Scientist, Argonne National Laboratory
Co-Director, Northwestern/Argonne Institute for Science and Engineering (NAISE)
Senior Fellow, University of Chicago Computation Institute
Argonne National Laboratory
§ $675M/yr budget
§ 3,200 employees
§ 1,450 scientists/engineers
§ 750 Ph.D.s
Argonne's Next Big Supercomputer: Aurora
Following the International Exascale Software Initiative (IESP, 2008-2012) → Big Data and Extreme Computing (BDEC) workshops

http://www.exascale.org/bdec/

Overarching goals:
1. Create an international collaborative process focused on the co-design of software infrastructure for extreme scale science, addressing the challenges of both extreme scale computing and big data, and supporting a broad spectrum of major research domains
2. Describe funding structures and strategies of public bodies with Exascale R&D goals worldwide
3. Establish and maintain a global network of expertise and funding bodies in the area of Exascale computing

Europe-USA-Asia international series of workshops on extreme scale scientific computing:
1 – BDEC Workshop, Charleston, SC, USA, April 29-May 1, 2013
2 – BDEC Workshop, Fukuoka, Japan, February 26-28, 2014
3 – BDEC Workshop, Barcelona, Spain, January 28-30, 2015
4 – BDEC Workshop, Frankfurt, Germany, June 15-17, 2016
Courtesy: Mark Asch
[Figure: software stack layers: Applications, Shared, Hardware & OS]
Why Converge? Independent paths mean more cost and less science:
• $ multiple hardware/software infrastructures
• $ developing software for two communities
• $ learning two computing models
• $ smaller discovery community, fewer ideas
• Less science
[Figure: software stack layers: Applications, Shared, Hardware & OS]
Argo: An Exascale Operating System and Runtime Software Research & Development Project

Developing vendor-neutral, open-source OS/R software.

ANL: Pete Beckman (PI), Marc Snir (Chief Scientist), Pavan Balaji, Rinku Gupta, Kamil Iskra, Franck Cappello, Rajeev Thakur, Kazutomo Yoshii
LLNL: Maya Gokhale, Edgar Leon, Barry Rountree, Martin Schulz, Brian Van Essen
PNNL: Sriram Krishnamoorthy, Roberto Gioiosa
UC: Henry Hoffmann
UIUC: Laxmikant Kale, Eric Bohm, Ramprasad Venkataraman
UO: Allen Malony, Sameer Shende, Kevin Huck
UTK: Jack Dongarra, George Bosilca, Thomas Herault

See http://www.argo-osr.org/ for more information.
What OS/R Gaps Must We Address?
• Extreme in-node parallelism
  – Poor mechanisms for precise resource management (cores, power, memory, network)
  – Legacy threads/tasks implementations perform poorly at scale
• Dynamic variability of platform; power is constrained
  – Poor runtime mechanisms for managing dynamic overclocking, provisioning power, adjusting workloads
  – No mechanisms for managing power dynamically, globally, and in cooperation with user-level runtime layers
• Hierarchical memory
  – Poor interfaces/strategies for managing deepening memory
• New modes for HPC
  – No portable interfaces for easily building workflows, in-situ analysis, coupled physics, advanced I/O, application resilience

10/28/16, Argo OSR, Pete Beckman
Argo Explorations to Address Exascale Gaps
• Hierarchy of enclaves connected via a backplane
• Elastic intra-node containers with resource knobs
• Lightweight threads/tasks designed for containers, messaging, and memory hierarchy
• Adaptive, learning, integrated control system
[Figure: software stack layers: Applications, Shared, Hardware & OS]
Understanding Cities
• 80% of GDP
• 70% of energy
• 70% of GHG emissions

Why measure cities?
PM 2.5, 10, 100
A collaborative project: Argonne National Laboratory, the University of Chicago, and the City of Chicago. Supported by collaborating institutions and the U.S. National Science Foundation. Industry in-kind partners: AT&T, Cisco, Intel, Microsoft, Motorola Solutions, Schneider Electric, Zebra.
Waggle: An Open Platform for Intelligent Sensors
Exploiting disruptive technology: edge computing, resilient design
• Machine learning / computer vision
• Novel sensors: nano/MEMS
• Low-power CPUs: GPU/smartphones
Powerful, Resilient & Hackable
• Relays, current sensors
• Control processor: real-time clock & internal sensors, heartbeat monitors, reset pins, multiple boot media (μSD/eMMC)
• Node control & communications: 4-core ARM
• In-situ/edge processing: 4+4-core ARM, 8-core GPU
• "Deep space probe" design; Linux development environment
Waggle/AoT Robust Testing
Edge Computing: Analysis and Feature Recognition, Preserving Privacy
• Parallel computing
• Open platform
• Deep learning
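The edge-computing pattern on this slide can be sketched in a few lines: raw images are analyzed in-situ on the node, and only derived features (here, an object count) leave the device, so privacy is preserved. This is an illustrative sketch, not Waggle code; `detect_objects` is a hypothetical stand-in for a real vision model, and the "frame" is a nested list standing in for pixels.

```python
def detect_objects(frame):
    """Hypothetical detector: returns locations of objects in a frame."""
    # Stand-in logic: treat any pixel value above a threshold as one "object".
    return [(r, c) for r, row in enumerate(frame)
            for c, px in enumerate(row) if px > 200]

def process_frame(frame):
    """Reduce a raw frame to a privacy-preserving summary at the edge."""
    boxes = detect_objects(frame)
    summary = {"object_count": len(boxes)}   # only this is sent upstream
    del frame                                # raw pixels are discarded
    return summary

frame = [[0, 250, 0], [0, 0, 240], [10, 20, 30]]
print(process_frame(frame))  # -> {'object_count': 2}
```

The key design point is that the network link only ever carries the summary dictionary; the raw frame never leaves the node.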
Waggle Machine Learning & Edge Computing
• We are exploring Caffe & OpenCV
  – Convolutional neural networks
• Training will be done on systems at Argonne
• Classification on Waggle
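The split described above (train on datacenter systems, classify on the node) can be sketched as follows. The talk mentions Caffe and convolutional networks; this toy nearest-centroid classifier is only an illustration of the train-centrally / classify-at-the-edge workflow, where just the small trained parameters are shipped to the edge node.

```python
def train(samples):
    """'Datacenter' step: compute one centroid per class label."""
    centroids = {}
    for label, features in samples:
        sums, count = centroids.setdefault(label, ([0.0] * len(features), 0))
        centroids[label] = ([s + f for s, f in zip(sums, features)], count + 1)
    return {lab: [s / n for s in sums] for lab, (sums, n) in centroids.items()}

def classify(model, features):
    """'Edge' step: label a new observation with the nearest centroid."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist(model[lab], features))

# Train on labeled sensor readings, then classify a new reading on the node.
model = train([("quiet", [0.1, 0.2]), ("quiet", [0.2, 0.1]),
               ("loud", [0.9, 0.8]), ("loud", [0.8, 0.9])])
print(classify(model, [0.85, 0.9]))  # -> loud
```

In the real system the `model` object would be a trained network's weights rather than centroids, but the deployment shape is the same: training is expensive and centralized, inference is cheap and local.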
The Data
Examples: PM 2.5 alert; power outage; Damen & Ashland
Pete Beckman, Charlie Catlett, Rajesh Sankaran (ANL)
Waggle: A Platform for Research
• Open source / open platform
  – Reusable, extensible software communities
• Machine learning: computer vision
  – Data must be reduced in-situ
• Novel sensors: nano/MEMS/μfluidics
  – Explosion of nano/MEMS & imaging tech
• Low-power CPUs: GPU/smartphones
  – Powerful, low-power smartphone CPUs

Opportunity: Big Data + predictive models. Smart sensors + supercomputers/cloud computing = predictions and analysis.
Why HPC Geeks Should Care
• New sensors are programmable parallel computers
  – Multicore + GPUs & OpenCL or OpenMP
  – New algorithms for in-situ data analysis, feature detection, compression, deep learning
  – Need a new programming model for "stackable" in-situ analysis (for sensors and HPC)
  – Need advanced OS/R resilience, cybersecurity, networking, over-the-air programming
• 1000s of nodes make a distributed computing "instrument"
  – New streaming programming model needed
  – New techniques for machine learning on scientific data required, both within a "node" and collectively across time series
• How will HPC streaming analytics and simulation be connected to live data?
  – Can we trigger HPC simulations after first approximations? (weather, energy, transportation)
  – Unstructured database with provenance and metadata for QA/collaboration
• Use novel HPC hardware to solve the power issue?
  – Can we use neuromorphic chips or FPGAs to reduce power for in-situ analysis & compression?
• We are trading precision & cost for greater spatial resolution: what is possible?
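One way to picture the "stackable" in-situ analysis idea above: each analysis stage consumes a stream and yields a reduced stream, so stages compose the same way on a sensor node or inside an HPC job. A minimal sketch, assuming made-up stage names and thresholds (nothing here is a Waggle or Argo API):

```python
def denoise(stream, lo=0.0, hi=100.0):
    """Drop readings outside the physically plausible range."""
    return (x for x in stream if lo <= x <= hi)

def window_mean(stream, size=3):
    """Reduce the stream to per-window averages (in-situ compression)."""
    buf = []
    for x in stream:
        buf.append(x)
        if len(buf) == size:
            yield sum(buf) / size
            buf = []

def threshold_alerts(stream, limit=35.0):
    """Feature detection: emit only windows exceeding an alert threshold."""
    return (m for m in stream if m > limit)

def pipeline(stream, *stages):
    """Stack stages: the output of one is the input of the next."""
    for stage in stages:
        stream = stage(stream)
    return stream

readings = [10.0, 12.0, 11.0, 40.0, 42.0, 44.0, -999.0, 36.0, 38.0, 37.0]
alerts = list(pipeline(readings, denoise, window_mean, threshold_alerts))
print(alerts)  # -> [42.0, 37.0]
```

Because every stage shares the same stream-in, stream-out shape, the same stack can run against a live sensor feed at the edge or against an archived time series on a supercomputer.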
[Figure: data flow: parallel computation at the edge and new edge algorithms → data aggregation from multiple sources → cloud database → data analysis and near-real-time HPC simulations]
RISC version of the convergence story: start by enabling... remove roadblocks (then everyone's wish list follows)
• Applications (science drivers): software needs & workflow patterns
• Operations
  – Support real-time and streaming from fast networks
  – Support node sharing, long-lived services, storage requests lasting years...
• Architecture
  – Mothball current parallel file systems; replace with persistent storage services (databases, KV stores, etc.)
  – Accelerate the move of storage into the compute infrastructure
• Software
  – Linux software development environment
  – Native support for low-level infrastructure: Docker, VMs, Mesos, etc.
  – New focus on QoS; software-defined storage, on-demand services, etc.
Courtesy: Mark Asch (BOF, SC2016)
2016 Phase 1 Pilots: Chicago, Seattle, Bristol, Newcastle
2016-17 Phase 2 Pilots: Pittsburgh, New York, Portland, Atlanta, Boston, Delhi, Chattanooga
Initial discussions: Denver

Developing a pilot project strategy aimed at empowering partner universities and national laboratories to work with their local cities.
Questions?