System Simulation Of 1000-cores Heterogeneous SoCs
![Page 1: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/1.jpg)
System Simulation Of 1000-cores Heterogeneous SoCs
Shivani Raghav
Embedded Systems Laboratory (ESL)
École Polytechnique Fédérale de Lausanne (EPFL)
![Page 2: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/2.jpg)
ESL Work on Energy-Aware Datacenter Design
[Figure: datacenter infrastructure overview. Labels: System Simulation for many-core (SIMinG-1k); PMSM: Power/Thermal Manager; New server cooling technology; network and communication load profiles (1, 2, 3, ..., N); price profiles (1 to N); Internet; Grid; IPS.]
![Page 3: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/3.jpg)
Emerging Data-Intensive Workloads
Cloud Servers
Molecular Dynamics
Monte Carlo Simulations
Gene Sequencing
Online Gaming Services
Financial Simulations
Medical Imaging
![Page 4: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/4.jpg)
Demand for Hardware Acceleration
Tile-based manycores: Intel SCC, TILE64 (integrated)
GPU clusters (off-chip accelerators)
Hybrid cores: AMD Fusion (on-chip)
![Page 5: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/5.jpg)
Urgent Need for Simulation of Heterogeneous SoCs
[Diagram: "Simulation" at the center, connected to the following activities.]
Thermal & Power Evaluations
Benchmarking
Profiling
Debugging
Design Space Exploration
Early Software Development
![Page 6: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/6.jpg)
How to Design a Fast and Scalable Many-Core Simulator?
Parallel Target
Parallel Simulator
Parallel Host
![Page 7: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/7.jpg)
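Simulating Parallel Target on Parallel Host is an Old Technology…
[Figure: parallel host platforms and existing simulators. Host classes: Large Parallel Systems, FPGA, GPGPU. Names shown: Flexus, RAMP, WWT II, Graphite, Cotson, OVPSim. Also labeled: Opportunity.]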
![Page 8: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/8.jpg)
Target Architecture
Data-Parallel Coprocessors
Simple In-order Cores
1000s of cores in a tile network
Fine-grain parallelism
[Figure labels: Core, Caches, Memory, Switch]
![Page 9: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/9.jpg)
Solution – Accelerating Simulation using GPGPUs
Target Architecture Host Platform
A Perfect Match
![Page 10: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/10.jpg)
Outline
• Problem Overview: Simulation of Heterogeneous SoCs
• Solution: SIMinG-1k, a GPU-accelerated simulator
• Evaluation
• Summary
![Page 11: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/11.jpg)
Overall Simulation Framework
[Figure: overall simulation framework. Labels: Application (Sequential Code, Data Parallel Code); Target Architecture (General Purpose CPU, Many-Core Accelerator); Simulator SIMinG-1k; Host Platform.]
![Page 12: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/12.jpg)
SIMinG-1k - Features
• Instruction Accurate
• Inexpensive and Easily Available
• Fast Development Cycle
• Equation Performance Model
• Portability (Target Independent)
• Interpretation-based core simulation
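To make "instruction accurate" and "interpretation-based" concrete, here is a minimal sketch in Python. It assumes a hypothetical four-opcode toy ISA (not the ARM ISA used in the evaluation), and the names `Core`, `step`, `run`, and `simulate` are illustrative only. Each simulated core is interpreted one instruction at a time, and the retired-instruction count is the quantity an instruction-accurate simulator reports.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA (an assumption for illustration, not ARM).
# Each instruction is a tuple (opcode, a, b, c).
ADDI, ADD, BNEZ, HALT = range(4)


@dataclass
class Core:
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    halted: bool = False
    retired: int = 0  # number of instructions executed so far


def step(core, program):
    """Fetch, decode, and execute exactly one instruction on one core."""
    op, a, b, c = program[core.pc]
    core.pc += 1
    core.retired += 1
    if op == ADDI:      # regs[a] = regs[b] + immediate c
        core.regs[a] = core.regs[b] + c
    elif op == ADD:     # regs[a] = regs[b] + regs[c]
        core.regs[a] = core.regs[b] + core.regs[c]
    elif op == BNEZ:    # jump to absolute address b if regs[a] != 0
        if core.regs[a] != 0:
            core.pc = b
    elif op == HALT:
        core.halted = True


def run(cores, program):
    """Advance every live core by one instruction per round until all halt."""
    while not all(core.halted for core in cores):
        # On a GPU host, each core could plausibly map to one thread.
        for core in cores:
            if not core.halted:
                step(core, program)


# Each core sums 1..n, with r1 as the counter and r2 as the accumulator.
PROGRAM = [
    (ADD, 2, 2, 1),    # 0: r2 += r1
    (ADDI, 1, 1, -1),  # 1: r1 -= 1
    (BNEZ, 1, 0, 0),   # 2: loop back to 0 while r1 != 0
    (HALT, 0, 0, 0),   # 3: stop this core
]


def simulate(counts):
    """Run PROGRAM on one core per entry of counts (each entry must be >= 1)."""
    cores = [Core() for _ in counts]
    for core, n in zip(cores, counts):
        core.regs[1] = n
    run(cores, PROGRAM)
    return cores
```

Calling `simulate([1, 2, 3, 4])` runs four simulated cores with different loop counts; because every core executes the same interpreter loop over its own state, the structure is data-parallel, which is what makes a GPU host attractive.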
![Page 13: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/13.jpg)
Challenges of using GPU as a host
• SIMT (single instruction, multiple threads)
• Divergent code is a problem
• Synchronization outside thread block
• Slow CPU-GPU communication
• Global memory is slow and limited
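Divergence matters for an interpreter because simulated cores that sit on different instructions take different branches of the decode step. The rough cost model below is a sketch under stated assumptions: 32-thread warps as on NVIDIA GPUs, and one serialized pass per distinct opcode within a warp. The function names are illustrative, and the numbers are not measurements from SIMinG-1k.

```python
def warp_passes(opcodes_in_warp):
    """Serialized passes for one warp in one simulation step.

    Simplified SIMT model (an assumption): threads that decode the same
    opcode execute together, and each distinct opcode costs one pass.
    """
    return len(set(opcodes_in_warp))


def simt_cost(opcodes, warp_size=32):
    """Total passes across all warps for one step of all simulated cores."""
    total = 0
    for start in range(0, len(opcodes), warp_size):
        total += warp_passes(opcodes[start:start + warp_size])
    return total


# 64 simulated cores all executing "add": one pass per warp.
uniform = ["add"] * 64

# 64 simulated cores cycling through four opcodes: four passes per warp.
divergent = [("add", "load", "store", "branch")[i % 4] for i in range(64)]
```

Under this model `simt_cost(uniform)` is 2 and `simt_cost(divergent)` is 8, so the divergent step costs four times as much even though the same number of instructions is simulated.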
![Page 14: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/14.jpg)
Outline
• Problem Overview: Simulation of Heterogeneous SoCs
• Solution: SIMinG-1k (GPU-accelerated simulator)
• Evaluation
• Summary
![Page 15: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/15.jpg)
Results – Architecture 1
MIPS: millions of simulated instructions per second of host wall-clock time
Single tile of target accelerator: ARM ISA, Inst Scratchpad, Data Scratchpad
[Figure: S-MIPS (y-axis, 0 to 700) versus number of simulated cores (x-axis, 128 to 4096) for benchmarks MM, NCC, IDCT, EPCC, DQ, FFT, and SYNC1.]
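The MIPS metric on this slide can be made concrete with a short calculation. This is a minimal sketch in Python, assuming per-core instruction counts and a measured wall-clock time are available; the function name and the example numbers are illustrative, not results from the talk.

```python
def simulated_mips(instructions_per_core, wall_clock_seconds):
    """Millions of simulated instructions per second of host wall-clock time.

    instructions_per_core: instructions retired by each simulated core.
    wall_clock_seconds: host time taken to simulate all of them.
    """
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    total_instructions = sum(instructions_per_core)
    return total_instructions / wall_clock_seconds / 1e6


# Hypothetical run: 1024 cores, one million instructions each, 2 s of host time.
example = simulated_mips([1_000_000] * 1024, 2.0)
```

Because the numerator sums over all simulated cores, the metric rewards a simulator whose host time grows more slowly than the number of simulated cores.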
![Page 16: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/16.jpg)
Speed Up – Architecture 1
Speedup compared to simulation on OVPSim (thousands of ARM cores)
[Figure: MIPS (y-axis, 0 to 1000) versus number of simulated cores (x-axis, 32 to 4096) for Matrix Multiply, comparing SIMinG-1k and OVP.]
![Page 17: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/17.jpg)
Results – Architecture 2
Single tile of data-parallel accelerator (cores, caches, on-chip interconnect)
[Figure labels: Core, Caches, Memory, Switch]
[Figure: S-MIPS (y-axis, 0 to 50) versus number of simulated cores (x-axis, 128 to 4096) for benchmarks NCC, MM, IDCT, DQ, FFT, EPCC, and SYNC1. Values printed on the plot: 0.180, 0.077, 0.026, 0.006, 0.002, 0.001.]
![Page 18: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/18.jpg)
Speed Up – Architecture 2
Speedup compared to serial simulation on QEMU
![Page 19: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/19.jpg)
Outline
• Problem Overview: Simulation of Heterogeneous SoCs
• Solution: SIMinG-1k (GPU-accelerated simulator)
• Evaluation
• Summary
![Page 20: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/20.jpg)
Conclusion
Challenge: Fast and parallel simulator for heterogeneous SoCs
Solution: Parallelize 1000-core simulation using GPUs
Design: Full-system simulation using QEMU and SIMinG-1k
Results: High scalability and speedup up to 4096 cores
Future Work
• Extend the simulator for thermal and power evaluations
• Complete simulation of cloud data centers
![Page 21: System Simulation Of 1000-cores Heterogeneous SoCs](https://reader036.fdocuments.in/reader036/viewer/2022081507/56816222550346895dd24d52/html5/thumbnails/21.jpg)
Thanks!
Questions?