Simulation on Reconfiguraable Heterogeneous Architectures · Expansion of the...
Transcript of Simulation on Reconfiguraable Heterogeneous Architectures · Expansion of the...
Simulation on ReconfiguraSimulation on ReconfiguraSimulation on ReconfiguraDipl Inf Alexander Schöll Prof Dr Hans JoDipl.-Inf. Alexander Schöll, Prof. Dr. Hans-JopI tit t f C t A hit t d C tInstitute of Computer Architecture and Computp p
MotivationMotivationHeterogeneous computer architectures are integrated into single chipsHeterogeneous computer architectures are integrated into single chips
Heterogeneous ArchitecturesHeterogeneous Architectures
Latency‐optimized Throughput‐optimized Reconfigurabley p g p p g
GPU FPGACPU GPUGraphics
FPGAField
CPUCentral Graphics
ProcessingField
ProgrammableCentral
Processing ProcessingUnit
ProgrammableGate Array
Processing Unit y
CPU GPU CPU FPGACPU GPUCPU GPU FPGACPU GPU CPU FPGACPU GPUCPU GPU FPGA
x86‐64ARM
Intel MIC AMD PiledriverI t l H ll
AMD RXN idi K l
Xilinx Zynq Xilinx VirtexAlt St tiARM
MIPSIntel HaswellNvidia Maxwell
Nvidia Kepler Altera StratixMIPSSPARC
Nvidia MaxwellSPARC
Overview of current heterogeneous architecturesOverview of current heterogeneous architectures
Upcoming: Reconfigurable Heterogeneous Computer ArchitecturesUpcoming: Reconfigurable Heterogeneous Computer Architectures
GoalGoal
Development of new methods that enable the direct mapping of− Development of new methods that enable the direct mapping of simulation applications to innovative reconfigurable heterogeneoussimulation applications to innovative reconfigurable heterogeneous computer architecturesp
Current CollaborationsCurrent CollaborationsExpansion of the Pro-Apoptotic Mapping of DomaExpansion of the Pro-Apoptotic Mapping of Doma
Receptor Clustering Model the Nonlinear WReceptor Clustering Model the Nonlinear WH tCooperation with M. Daub • G. Schneider Heterogeneoug
C ti ithInclusion of the extracellular space Cooperation withpProblem: Solve theSimulation of random impact of Ligands on Problem: Solve the Simulation of random impact of Ligands on
th ll b on large domains wthe cell membrane on large domains wnodes
Simulation on multiple time scales for optimal nodes
p pbalance between biological process u u=balance between biological process tt xxu u= −resolution and computation costsresolution and computation costs
Mapping of equationMapping of particle simulation to
Mapping of equationMapping of particle simulation to heterogeneous archheterogeneous architectures
heterogeneous archheterogeneous architectures
SimTech Phase I: Simulation Applications andSimTech Phase I: Simulation Applications andppParallel Simulation of Pro-Apoptotic Acceleration of MParallel Simulation of Pro-Apoptotic Acceleration of M
Receptor Clustering on GPGPU Carlo MoleculaReceptor Clustering on GPGPUM C A hit t [1]
Carlo MoleculaH b id C tMany-Core Architectures[1] Hybrid Computy y p
C. Braun • M. Daub • A. Schöll • G. Schneider • H.-J. Wunderlich C. Braun • S. Holst • J. Cas
Apoptosis: Structured decomposition of Markov-Chain MontApoptosis: Structured decomposition of d d i f t d ll i lti ll l
Markov Chain Monti l ti thdamaged or infected cells in multi-cellular simulations are the
organisms thermodynamicsorganisms thermodynamics
Initiated by signal competent clusters of Problem: Inherent seInitiated by signal competent clusters of Problem: Inherent setumor necrosis factor receptors (TNFR) and this method very hartumor necrosis factor receptors (TNFR) and this method very harcorresponding TNF Ligands on the cell Typically coarse grap g gmembrane
Typically coarse-gramembrane. distribution of simulaP ti l i l ti h b d t
distribution of simulak t ti idParticle simulation has been mapped to or workstation grids pp
GPGPU many core architectureg
GPGPU many-core architecture. MCMC molecular simR d ti f i l ti ti f d t GPGPUReduction of simulation times from mapped to GPGPUsmonths to hours enables practical energy calculationsmonths to hours enables practical energy calculations application of the model in research evaluation of Monte-application of the model in research (S d 300 )
evaluation of Monte(Speedup: 300x) Speedups of up to( p p ) Speedups of up to
[1] C Braun M Daub A Schöll G Schneider and H J Wunderlich "Parallel Sim[1] C. Braun, M. Daub, A. Schöll, G. Schneider, and H.-J. Wunderlich, Parallel Simi P IEEE I t ti l C f Bi i f ti d Bi di i BIBin Proc. IEEE International Conference on Bioinformatics and Biomedicine, BIB
[2] C Braun S Holst J Castillo J Groß and H J Wunderlich "Acceleration of[2] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, Acceleration ofi P 30th IEEE I t ti l C f C t D i ICCD'12 Min Proc. 30th IEEE International Conference on Computer Design, ICCD'12, Mo
able Heterogeneous Architecturesable Heterogeneous Architecturesable Heterogeneous Architecturesoachim Wunderlichoachim Wunderlich
E i i U i it f St tt t Ger Engineering, University of Stuttgart, Germanyg g, y g , y
ChallengesChallengesgMapping of simulation applicationsMapping of simulation applications
− Simulation applications contain algorithms from distinct problem classesSimulation applications contain algorithms from distinct problem classesDense
Spectral StructuredSparseLinear
SpectralM h d
Structured Grids U t t d
pLinearAlgebra
MethodsN‐Body
Grids Unstructuredd
Linear Algebra
gAlgebra N BodyGrids
gAlgebra
Many different programming paradigms and languages− Many different programming paradigms and languages
Achieving optimal performanceAchieving optimal performance
P i l f d l i h d b id d− Particular aspects of underlying hardware structures must be consideredPerformance depends on the combination of implementation and utilized− Performance depends on the combination of implementation and utilized architecturearchitecture
CPU
GPUOptimal
GPUMapping?pp g
FPGA
Simulation Mapping Heterogeneous Reliability Applications
Mapping gArchitecturesReliability Applications Architectures
Simulation applications are executed for days and months− Simulation applications are executed for days and months− Fault-tolerant execution memories and communication requiredFault tolerant execution, memories and communication required
Current WorkCurrent WorkImplementation of specific compute modulesin Decomposition for Implementation of specific compute modules in Decomposition for for heterogeneous architecturesWave Equation to for heterogeneous architecturesWave Equation to
A hit t − Modules perform basic tasks in targetedus Architectures Modules perform basic tasks in targeted h M D b G S h id simulation applicationsh M. Daub • G. Schneider ppnonlinear wave equation − Implementation in distinctive realizations fornonlinear wave equation Implementation in distinctive realizations for
diff t t f tiwith a huge amount of different types of computing resourceswith a huge amount of
− Goal: Abstract and hide the actual execution f d l f th i l ti li ti
3d u n u+ of modules from the simulation applicationc cd u n u− +
n solver to reconfigurablen solver to reconfigurable CPU CPU CPU CPUFFT
hitecturesCPU CPU CPU CPUFFT
hitecturesMM
GPU GPU GPU GPUMapping
CFD
d Collaborations FPGA FPGA FPGA FPGA…d CollaborationsSimulation Modules Heterogeneous
Markov-Chain Monte- Applications ArchitectureMarkov-Chain Monte- pp
ar Simulations onDevelopment of resource management and
ar Simulations on t A hit t [2] Development of resource management and ter Architectures[2]
load-balancing mechanismsload balancing mechanismsstillo • J. Groß • H.-J. Wunderlich
− Investigation of efficient module organizationInvestigation of efficient module organization and execution strategieste-Carlo (MCMC) gte Carlo (MCMC)
f t k i − Goal: Optimization of simulation performancecore of many tasks in Goal: Optimization of simulation performance d d ti t h i t ditiand adaption to changing system conditions
during runtimeerial dependencies make during runtime erial dependencies make rd to parallelizerd to parallelize
GPUCPU …
ined parallelization andined parallelization and ation instances on clustersation instances on clusters i li d
FFT MM CFD
is appliedDevelopment of fault tolerance measures for
ppDevelopment of fault-tolerance measures for mulation has been modulesl iti ll l moduless, exploiting parallel
− Repeated or redundant execution of modulesand speculative − Repeated or redundant execution of modules and speculative and comparison of results-Carlo moves pCarlo moves
− Algorithm-based fault-tolerance400x Algorithm based fault tolerance 400x
mulation of Apoptotic Receptor Clustering on GPGPU Many Core Architectures"mulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures ,BM 2012 Phil d l hi USA 4 7 O t b 2012 155 160BM 2012, Philadelphia, USA, 4.-7. October, 2012, pp. 155-160.f Monte Carlo Molecular Simulations on Hybrid Computing Architectures"f Monte-Carlo Molecular Simulations on Hybrid Computing Architectures ,
t l C d 30 9 3 10 2012 207 212ontreal, Canada, 30.9.-3.10.2012, pp. 207-212.