Architectures and Compilers for Embedded Systems (ACES) Laboratory Center for Embedded Computer Systems
University of California, Irvine
http://www.cecs.uci.edu/~dutt
Architectural Exploration for Programmable Embedded Systems
With Contributions from the EXPRESSION team: Peter Grun, Ashok Halambi, Nick Savoiu, Radu Cornea, Prabhat Mishra, Aviral Shrivastava, Partha
Biswas, Srikanth Srinivasan, Ilya Issenin, Marcio Buss, Dr. Hiroyuki Tomiyama, and Prof. Alex Nicolau
Work Partially Supported by NSF, ONR, and DARPA
Nikil D. Dutt
Copyright © 2002 Nikil Dutt ACES Laboratory www.cecs.uci.edu/~aces IITD ASIP Wkshp 2
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
Traditional Processor-Centric Designs
Performance-driven designs
Limitations:
- from application: only limited by available parallelism
- from architecture: widening processor-memory gap => memory bottleneck
Solution:
- expose maximally the available parallelism in application (compiler)
- devise memory hierarchy to exploit effectively this parallelism
Can increase performance by
- explicit exploitation of available parallelism
- implicit exploitation of parallelism to mask operations and memory latencies
Match processor architecture w/ memory configuration for application suite(s)
Embedded System-on-Chip (SOC) Designs
One or few dedicated applications
- Opportunity to customize design
Diverse requirements
- (Real-time) performance, power, data/code density, testability, ...
Approach: aggressively exploit application behavior
- Use coarse-grain and fine-grain compiler techniques
- Evaluate different architectures and memory organizations
Need for exploration capability without loss of efficiency
- Rapid software toolkit generation (compiler, simulator, debugger, ...)
Embedded S-O-C Design Issues
Technology Trends
- 1G transistor chips by ~2010 (SIA Roadmap)
- Faster processors => Migration of functionality from HW to SW
- Reconfigurable logic => SW
- DRAM merged with logic (plus analog, RF, etc.)
Market Trends
- Shrinking time-to-market
- Design Reuse
- Componentization, decreasing time between design starts
- Product “versioning”
- New standards, but unique implementations (e.g., Bluetooth, 3G)
Result: Intense pressure to rapidly innovate, explore, and differentiate, while meeting complex design constraints
What to do with all these transistors?
New processor architectures
- E.g., Ultra Large Instruction Word Machines (i.e., VLIW-like)
- Aggressive use of compiler technology (speculation, sophisticated disambiguation)
Multiprocessors on a chip
- Heterogeneous processors tuned for specific tasks/functions
- Enhanced compiler technology for better communication/synchronization
- Integration of OS/Multithreading
Novel memory organizations and hierarchies
- Different types of on-chip memories: multiple cache hierarchies, frame buffers, stream buffers, etc.
- Need “memory-aware” compiler, and processor-memory coexploration
RESULT: Software issues WILL dominate, requiring rapid generation of software toolkits to support design
Programmable Embedded Systems: Boards to SOCs
Past
- Board-level ICs
Present
- System-on-a-chip (SOC) and IP “cores”
- Core types
  - Hard: layout
  - Firm: structural HDL
  - Soft: RT-synthesizable HDL
[Figure: Board (Processor, Memory, Peripheral) versus SOC built from IP cores; core library: PeripheralA, PeripheralB, ProcessorX]
[Source: F. Vahid]
Networked Embedded System
[Figure: block diagram of a networked embedded system]
- Power Supply: Battery, DC-DC Converter
- Communication: Radio Modem, RF Transceiver, Baseband DSP (signaling protocols, choice of modulation, TX/RX architecture, RF/IF circuits)
- Processing: Programmable μPs & DSPs (apps, protocols etc.), Memory, ASICs
- Peripherals: Disk, Display
[Courtesy: R. Gupta]
Programmable SOC Platforms
Domain-specific Parameterized Cores
Sample Parameters:
- Voltage scale
- Size, line, associativity
- Bus width, encoding (gray, invert)
- UART tx/rx buffer size
- DCT resol.
Configurations impact power/performance
[Figure: System-on-a-Chip (SOC) with MIPS, I-Cache, D-Cache, Memory, DMA, Bridge, Peripheral Bus, UART, DCT CODEC]
[Source: T. Givargis]
Why Explore Architectures?
[Figure: JPEG, power (uW) versus execution time (usec) across platform configurations]
[Source: T. Givargis]
Example: JPEG implemented on prog. SOC platform
Tremendous Variation in Power/Performance!
Variations: 5.10x exe., 7.51x power, 2.73x energy
Philips Velocity SoC Platform
Configurable Processor Platform : Tensilica Xtensa
[Figure: configurable processor blocks: MMU, ALU, Pipe, Cache, I/O, Timer, Register File, Controller]
Fixed Programmable SOC Template
Programmable Architectural Trends
Recent advances in System-On-Chip Technology
- customizable processor cores, coprocessors, multiple processors on SOC
- novel on-chip/off-chip memory hierarchies, heterogeneous memory organizations
- mixed memory/logic fabrication (on-chip DRAM)
Customization of SOC architectures for specific embedded applications/tasks.
Software content of SOCs increasing rapidly
Tune SOC for diverse goals: power, code size, area, ...
Shrinking time-to-market + short product lifetimes
Need: rapidly evaluate SOC architectures
- Design Space Exploration (DSE)
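A design space exploration loop of the kind this slide calls for can be sketched as follows. This is only an illustration under stated assumptions: the `Config` parameters, the analytic `evaluate` cost model, and every numeric constant are hypothetical stand-ins for a real compiler-plus-simulator run, and are not taken from the talk.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """One candidate architecture (hypothetical parameters)."""
    cache_kb: int
    bus_width: int


def evaluate(cfg):
    """Hypothetical cost model; a real flow would compile and simulate here.

    Returns (execution time in usec, power in uW).
    """
    exec_time_us = 2000.0 / (1 + 0.5 * cfg.cache_kb ** 0.5) + 4000.0 / cfg.bus_width
    power_uw = 100.0 + 30.0 * cfg.cache_kb + 5.0 * cfg.bus_width
    return exec_time_us, power_uw


def pareto_front(results):
    """Keep configurations that no other configuration beats on both axes."""
    front = []
    for cfg, (t, p) in results.items():
        dominated = any(
            t2 <= t and p2 <= p and (t2 < t or p2 < p)
            for other, (t2, p2) in results.items()
            if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return front


configs = [Config(c, b) for c, b in product([1, 4, 16], [8, 16, 32])]
results = {cfg: evaluate(cfg) for cfg in configs}
front = pareto_front(results)

for cfg in front:
    t, p = results[cfg]
    # uW * usec = pJ
    print(cfg, f"time={t:.1f}us power={p:.1f}uW energy={t * p:.1f}pJ")
```

Only the Pareto-optimal points are worth carrying forward to more detailed (and slower) evaluation, which is why fast estimators matter in the loop.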
Architecture-Compiler Coupling
Parameters:
- no., size of units
- no., size, ports of reg files
- caches
- memory hierarchy
Instruction Set Definition:
- basic instructions
- sub-word parallelism
- application-specific instructions
- cache control instructions
- ...
Compiler-Architecture-CAD Coupling
Parameters:
- no., size of units
- no., size, ports of reg files
- caches
- memory hierarchy
Instruction Set Definition:
- basic instructions
- sub-word parallelism
- application-specific instructions
- cache control instructions
- ...
Tasks:
- estimate global memory
- identify bottlenecks
- reduce memory traffic
- partition and organize memories
- ...
Hardware/Software Partitioning
Memory-related Optimizations
Programmable Arch’s: Traditional Design Flow
[Figure: design flow. Labels: Design Specification, HW/SW Partitioning, HW (VHDL, Verilog), SW (C), Synthesis, Compiler, Cosimulation, Estimators, Processor Core, On-Chip Memory, Off-Chip Memory, Synthesized HW, Interface]
- Application-to-architecture mapping
- Early HW/SW partitioning
- Ensuing tasks of synthesis, SW compilation
Programmable Arch’s: Traditional Design Flow
[Figure: design flow. Labels: Design Specification, HW/SW Partitioning, HW (VHDL, Verilog), SW (C), Synthesis, Compiler, Cosimulation, Estimators, Processor Core, On-Chip Memory, Off-Chip Memory, Synthesized HW, Interface]
Issues:
-- Multiple specifications: Functional, IS, RT (synthesis)
-- Software after Hardware
-- Limited Exploration Space: need compiler/simulator in-the-loop
-- Consistency and Validation
-- Verification and Testing
Traditional Design Flow
[Figure: design flow. Labels: Design Specification, HW/SW Partitioning, HW (VHDL, Verilog), SW (C), Synthesis, Compiler, Cosimulation, Estimators, Processor Core, On-Chip Memory, Off-Chip Memory, Synthesized HW, Interface]
Predefined Architectural Model
IP-Centric Design Flow
Increasing use of IP blocks
- COTS => IP, Soft/Hard IP blocks
- Processor Core Families
  - RISC, DSP, VLIW, ASIPs: many attributes parametrizable
- Custom Memory Configurations
- Special-purpose HW blocks
  - (video/audio compression/decompression engines, encryption engines, etc.)
Design Reuse
- Leveraged through predesigned, preverified blocks
- Customization, adaptation
Reduce time-to-market
- Key Bottleneck: lack of software tools to support use of IP
- Again, urgent need to rapidly generate optimized software toolkits
Main Bottleneck
SOC Customization with IP Blocks
- COTS: SW tools available (already developed)
- IP Blocks: no support tools, huge time lag until SW tools are generated/modified
Need rapid generation of SW toolkit for Embedded SOC (compilers, simulators, debuggers, etc.)
Language-Based Design Methodology for Embedded SOC
- Application => Specification Language
- Architecture => ADL (drives SW tools generation)
ADL-Driven Design Flow
[Figure: ADL-driven design flow. Labels: Design Specification, ADL Specification, IP Library (P1, P2, M1), HW/SW Partitioning, HW (VHDL, Verilog), SW (C), Synthesis, Compiler, Cosimulation, Estimators, Verification, Processor Core, On-Chip Memory, Off-Chip Memory, Synthesized HW, Interface]
Rapid design space exploration
Quality tool-kit generation
Design reuse
Architecture Description Languages
Specify architecture templates of SOCs
- Blocks/components which reside on the SOC
- How they are connected or interact
- Functionality of each component
Support automated SW toolkit generation
- ILP compilers
- Simulators (instruction-set-, cycle-, phase-accurate)
- Debuggers
- Real-time OSs
Verification / Validation
ADL-Based SOC Codesign Flow
[Figure: ADL-based SOC codesign flow. Labels: Application, ADL Specification, IP Library, System on Chip (Processors, ASICs, Memories, IFs, Interconnection), HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate; actions: Specify, Generate, Synthesize, Estimate, Modify, Reuse]
Survey of ADLs
Classification Based on Type of Information Captured
- Behavior-centric ADLs
- Structure-centric ADLs
- Mixed-level ADLs
Classification Based on Their Main Objective
- Synthesis-Oriented ADLs
- Compiler-Oriented ADLs
- Simulation-Oriented ADLs
- Validation-Oriented ADLs
Behavior Centric ADLs
Primarily capture Instruction Set (IS)
Provide programmer’s view
Organized in a hierarchical manner for conciseness
Advantages:
- Capture easily available information
- Good for regular architectures
Disadvantages:
- Tedious for irregular architectures
- Hard to specify pipelining
- Contain an implicit architecture model
[Figure: outline of an instruction-set description: Arithmetic Operations (Addition, ...), Memory Operations, Constraints]
Examples: nML, ISDL, Valen-C, CSDL
Structure Centric ADLs
Provide net-list view of the architecture
Advantages:
- Common specification for both software toolkit generation and hardware synthesis
- Can capture detailed pipelining information
Disadvantages:
- Hard to extract IS view
[Figure: outline of an instruction-set description: Arithmetic Operations (Addition, ...), Memory Operations, Constraints]
Examples: MIMOLA, COACH
Mixed-Level ADLs
Capture Instruction Set view
Capture high-level architecture view
Combine benefits of both
Advantages:
- Common specification for both software toolkit generation and hardware synthesis
- Can validate/verify structure versus behavior (and vice-versa)
Disadvantages:
- May require specification of redundant information
[Figure: outline of an instruction-set description: Arithmetic Operations (Addition, ...), Memory Operations, Constraints]
Examples: MDES, LISA/RADL, EXPRESSION
Synthesis-Oriented ADLs
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Enable early synthesis of architectures
Synthesis-Oriented ADLs
MIMOLA (Univ. of Dortmund, Germany)
- Synthesizable HDL
- Mainly targeted to DSPs with tightly constrained datapaths
- Used in the MSSQ and RECORD compiler systems
- Capture the structure (RT-level netlist) of the target processor
- Behavior (instruction set) is automatically extracted
- ILP constraints are automatically detected
COACH (Kyushu Univ., Japan)
- CAD system for ASIPs
- Mainly targeted to simple RISC processors without ILP
- Use the UDL/I HDL for processor description
- Capture the structure
- Behavior is automatically extracted
- Generate compilers and instruction-set simulators
Synthesis-Oriented ADLs
Summary
- Synthesis and simulation tools available
- Capture only the structural aspect (RT-level netlist) of the processors
- Low abstraction level => not suited to early and rapid DSE of SOCs
- Behavior extraction and compiler generation are successful for a limited class of processor architectures
Compiler-Oriented ADLs
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Support automatic generation of compilers
Compiler-Oriented ADLs
nML (TU Berlin, Germany)
- Mainly targeted to DSPs and ASIPs
- Generate compilers, instruction-set simulators, and assemblers at TU Berlin, IMEC, Cadence, etc.
- Capture the behavior (instruction set) of the processors as an attribute grammar
- ILP constraints are described as a set of legal combinations of operations
ISDL (MIT, USA)
- Mainly targeted to VLIW processors
- Generate compilers, assemblers, and cycle-accurate simulators
- Capture the behavior
- ILP constraints are described as a set of Boolean rules, all of which must be satisfied
- Can be translated to synthesizable Verilog code
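The two styles of ILP constraint description mentioned on this slide (an enumerated set of legal operation combinations versus a set of Boolean rules that must all hold) can be contrasted with a small sketch. It uses neither nML nor ISDL syntax; the operation names and the two rules are hypothetical and chosen only so that both descriptions encode the same machine.

```python
from itertools import combinations

OPS = ("add", "mul", "load")  # hypothetical operations of a toy machine

# Style 1: enumerate every legal combination of operations per instruction.
LEGAL_COMBINATIONS = {
    frozenset({"add"}),
    frozenset({"mul"}),
    frozenset({"load"}),
    frozenset({"add", "load"}),
    frozenset({"mul", "load"}),
}


def legal_by_enumeration(ops):
    """An instruction is legal if its operation set was listed explicitly."""
    return frozenset(ops) in LEGAL_COMBINATIONS


# Style 2: Boolean rules, all of which must be satisfied.
RULES = [
    lambda ops: not ("add" in ops and "mul" in ops),  # assumed shared ALU
    lambda ops: len(ops) <= 2,                        # assumed two issue slots
]


def legal_by_rules(ops):
    """An instruction is legal if it violates none of the rules."""
    op_set = set(ops)
    return all(rule(op_set) for rule in RULES)


# Compare the two descriptions on every non-empty operation combination.
for size in range(1, len(OPS) + 1):
    for combo in combinations(OPS, size):
        print(combo, legal_by_enumeration(combo), legal_by_rules(combo))
```

Enumeration is simple to check but grows quickly with the number of operations; rules stay compact but have to be written so that they cover every illegal case.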
Compiler-Oriented ADLs
MDES (HPLabs & UIUC, USA)
- Used for design space exploration of high-performance processors in the Trimaran system
- Generate compilers and cycle-accurate simulators
- Retargetability of cycle-accurate simulators is limited to the HPL-PD processor family
- Mainly captures the behavior (instruction set)
- ILP constraints are described as reservation tables
EXPRESSION (UC Irvine, USA)
- Targeted to a wide range of architectures (e.g., RISC, VLIW, SS, DSP)
- Generate compilers and cycle-accurate simulators
- Capture both the behavior and the structure (high-level netlist)
- Models complex memory organizations/hierarchies
- ILP constraints are automatically detected through reservation tables
- Graphical front-end for specification and analysis
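How a reservation table exposes ILP constraints can be illustrated with a generic textbook-style check. This is not the MDES or EXPRESSION format: the pipeline, the resource names, and the three operation classes below are assumptions made up for the example.

```python
# Hypothetical reservation tables: for each operation class, the set of
# resources it occupies in each cycle after issue (index 0 = issue cycle).
RESERVATION = {
    "alu":  [{"fetch"}, {"decode"}, {"alu"}, {"wb"}],
    "mul":  [{"fetch"}, {"decode"}, {"mult"}, {"mult"}, {"wb"}],
    "load": [{"fetch"}, {"decode"}, {"alu"}, {"mem"}, {"wb"}],
}


def conflicts(first, second, distance):
    """True if `second`, issued `distance` cycles after `first`,
    needs a resource in a cycle where `first` still holds it."""
    table_a = RESERVATION[first]
    table_b = RESERVATION[second]
    for cycle_b, used_b in enumerate(table_b):
        cycle_a = cycle_b + distance
        if cycle_a < len(table_a) and table_a[cycle_a] & used_b:
            return True
    return False


def forbidden_latencies(first, second):
    """Issue distances at which `second` cannot follow `first`."""
    return [d for d in range(len(RESERVATION[first]))
            if conflicts(first, second, d)]


for a in RESERVATION:
    for b in RESERVATION:
        print(f"{b} after {a}: forbidden issue distances {forbidden_latencies(a, b)}")
```

A scheduler can consult such forbidden distances instead of a hand-written list of illegal combinations, which is the sense in which the constraints are detected rather than described manually.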
Compiler-Oriented ADLs
Other Compiler-Oriented ADLs
- The FlexWare CAD system supporting compiler and simulator generation for DSPs and ASIPs (TIMA, France)
- The Valen-C compiler system supporting bit-width optimization of RISC-like ASIPs (Kyushu Univ., Japan)
- The Zephyr compiler system supporting development of custom compilers (Univ. of Virginia, USA)
Summary
- Most compiler-oriented ADLs mainly capture the behavior of the target processor. In addition, manual description of ILP constraints is needed for ILP scheduling.
- EXPRESSION captures both the behavior and the structure, enabling automatic detection of ILP constraints
Simulator-Oriented ADLs
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Support automatic generation of simulators
Simulator-Oriented ADLs
LISA (RWTH Aachen, Germany)
- Mainly targeted to DSPs
- Generate bit-true cycle-accurate compiled simulators
- Explicit support for modeling pipeline behaviors such as interlocking, bypassing, stalls, flushes, etc.
- No support for compiler generation
RADL (Rockwell Semiconductor, USA)
- Extension of the LISA approach
- Mainly targeted to DSPs
- Generate phase-accurate simulators
- Explicit support for modeling delay slots, interrupts, zero-overhead loops, hazards and multi-pipelines in addition to features of LISA
- No support for compiler generation
Simulator-Oriented ADLs
Summary
- Capture both the structural and architectural aspects of the processors
- Explicit support for modeling pipeline behaviors such as stalls and flushes
- No explicit support for ILP compiler generation
Validation-Oriented ADLs
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Enable early verification/validation of architectures
Validation-Oriented ADLs
AIDL (Univ. of Tsukuba, Japan)
- Targeted to high-performance superscalar processors
- Describe timing behavior of pipelines (e.g., data-forwarding, out-of-order completion, etc.) using temporal logic
- The timing behavior is validated/verified through simulation
- No support for SW toolkit generation
- Can be translated to synthesizable VHDL code
Summary
- Limited previous work
- Few properties can be validated
- No support for SW toolkit generation
Future Directions for ADLs
Formal Verification
- Detection of pipeline conflicts (resource, data, and control conflicts)
- Consistency checking between the behavior and the structure
SOC Architecture Synthesis from ADL Specifications
Automatic Generation of Real-Time OSs
- Optimization of task scheduling, interrupt handling, memory management, etc.
IP Libraries
- Standard mechanisms to specify SOC architectures
- Standard mechanisms to encapsulate design attributes (such as performance, power consumption, feature size, etc.)
Support for Future SOC Architectures
- Heterogeneous multi-processors with multi-threaded architectures
- On-chip memory hierarchies with various memory types (e.g., DRAM, flash memories, etc.)
- On-chip reconfigurable devices
Software Toolkits for Processor Cores
SOC designers using processor cores.
- Major bottleneck: lack of supporting software tools (compiler, simulator, …)
- Traditionally: toolkit built at later stages of system design
- Design Space Exploration meaningless w/o toolkit support
Solution: Generate Toolkit from a Target machine specification
- Architecture Description Language (ADL) used to define architectural template
- ADL is used to drive generation of compiler, simulators, validation/verification, and synthesis
- Approach allows compiler-in-the-loop architectural exploration
Architecture Description Languages (ADLs)
Objectives: Support automated SW toolkit generation
- exploration (through parametrization & generality)
- production quality SW tools (cycle-accurate simulator, memory-aware compiler, ...)
Specify from a variety of architecture classes (VLIWs, DSP, RISC, ASIPs, ...)
Specify novel memory organizations
Specify pipelining and resource constraints
[Figure: an ADL Architecture Description File and the Architecture Model drive the Compiler, Simulator, Synthesis, and Formal Verification]
Software Tools
Estimators
- Code Size, Memory Requirements, Performance, Power etc.
Compilers
- Coarse-grain (task-level) and ILP (microarchitecture-level)
Assembler, Linker, Loader
Profiler, Debugger, Code Development Environment
Simulators
- Bus-functional, instruction-, cycle-, and phase-accurate, structural
Real Time Operating Systems (RTOS)
Validation/Verification
Compiler Issues for Embedded SOC
Traditional ES Software
- Handcoded in assembly
  - Poor code quality from compilers
  - Idiosyncratic architectural features (specialized IS, register banks, etc.)
Embedded SOC
- Widely heterogeneous, customized processors
- Multiple levels of parallelism
- Complex, non-traditional memory organization/hierarchy
- Complex constraints (hard RT, code size, power, cost, ...)
Embedded SOC Software
- Cannot do handcoding
- Need powerful retargetable compiler technology
- Must fully exploit unique/non-traditional IS or architecture features
- Compiler is CRITICAL for Embedded SOC
Compiler Issues for Embedded SOC
Language-driven Software Toolkit Generation
Architectural Exploration of Embedded SOC
Compiler as an Exploration Tool
Analysis Phase of Compiler: Estimation
- Memory size
- parallelism
- resources
“Fast” Compiler Algorithms to Evaluate Tradeoffs
- on-chip parallelism vs. memory
- effect on speed, power, code size
“Fast” Simulator to evaluate architectural modifications/enhancements
- Customized instructions
- customized units
- data path size (bitwidth)
- customized memory organization/hierarchy
Compiler Critical for Embedded SOC exploration
Retargetable Compilers
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Automatic generation of compilers from ADLs
Retargetable Compilers
Issues:
Produce efficient code for a wide variety of processor architectures
- DSP, VLIW, RISC, Superscalar
- Multi-processor/Multi-threaded architectures
Need efficient code optimization techniques
- ILP, Predicated Execution
- Techniques for novel instruction-sets, architectures
  - Multimedia instructions, cache control instructions
  - Specialized addressing modes, specialized functional units
Need dynamic phase ordering capability
Produce code that satisfies varied constraints
- Instruction Memory size, Data Memory size
- Power, Performance
Compiler Flow (Front-End)
[Figure: front-end flow from Program to High-level IR]
- Lexical Analysis, Semantic Analysis
- Analysis: Data dependence; Array, Pointer; Loop
- Memory/Power: Estimation, Loop/Array optimizations
- Parallelization: Task-level, Loop-level
- Inputs from ADL: Multi-processor/Multi-threading Info., Memory Subsystem/Power Info.
Compiler Flow (Back-End)
[Figure: back-end flow from High-Level IR through Medium-level IR to Low-level IR]
- Lowering: Complex Expressions, Array Subscripts
- Pre-scheduling optimizations: Dead code removal, Induction Variable Elimination, Partial Redundancy Elimination, ...
- Memory/Power: Initial memory assignment, Data-Cache Optimizations, Loop blocking, skewing, etc.
- Transformations: Software Pipelining, Instruction Selection, Register Allocation, Scheduling (ILP)
- Optimizations: Tree Height Reduction, Strength Reduction, Spill code optimization
- Inputs from ADL: Memory Subsystem/Power Info., Memory Subsystem/Resource Info., Operation Behavior, Register File Info., Pipeline Conflicts/Constraints Info.
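Tree height reduction, one of the optimizations named on this slide, can be shown on a chain of additions. The sketch below is a generic illustration, not the EXPRESS implementation; the tuple-based expression tree and the variable names are assumptions. It relies on the operator being associative, so it is exact for integer addition but would change rounding for floating point.

```python
def chain(operands):
    """Left-leaning tree, as a front end would build for a + b + c + ..."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = ("+", tree, operand)
    return tree


def balanced(operands):
    """Rebalanced tree: independent subtrees can be evaluated in parallel."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balanced(operands[:mid]), balanced(operands[mid:]))


def height(tree):
    """Length of the longest dependence chain, in operations."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))


def evaluate(tree, env):
    """Evaluate a tree given integer values for the leaf variables."""
    if not isinstance(tree, tuple):
        return env[tree]
    return evaluate(tree[1], env) + evaluate(tree[2], env)


names = ["a", "b", "c", "d", "e", "f", "g", "h"]
env = {name: index + 1 for index, name in enumerate(names)}

before = chain(names)
after = balanced(names)
print("height before:", height(before), "after:", height(after))
print("same value:", evaluate(before, env) == evaluate(after, env))
```

The operation count is unchanged; only the critical path shrinks, which pays off when the target has enough parallel functional units to issue the independent additions together.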
Compiler Flow (Back-End)
[Figure: back-end flow, continued, from Low-level IR to Object Code]
- Post-scheduling optimizations: Peephole Optimizations, Machine Specific Optimizations
- Memory/Power: Block reordering, Instruction-Cache Optimizations, Final memory assignment
- InterProcedural: Register Allocation, Call convention implementation, Global references aggregation
- Code Generation
- Inputs from ADL: Memory Subsystem/Power Info., Operation Format/Image Info., Call Convention/Register Info.
Retargetable Compilers Survey (1)
CHESS (using nML ADL)
- Mainly targeted to fixed-point DSPs and ASIPs
- Performs instruction selection, register allocation, and scheduling.
- Fixed phase ordering
- ILP constraints described as a set of legal combinations of operations
AVIV (using ISDL ADL)
- Mainly targeted to VLIW processors
- Optimizes for minimal code size
- Branch-and-bound techniques for concurrent scheduling, resource allocation
- ILP constraints described as a set of Boolean rules which must be satisfied
Retargetable Compilers Survey (2)
ELCOR (using MDES ADL)
- Mainly targeted to VLIW architectures with speculative execution
- Used for design space exploration of high-performance processors in the Trimaran system
- ILP constraints are explicitly described as reservation tables
EXPRESS (using EXPRESSION ADL)
- Targeted to a wide range of processor architectures such as RISC, VLIW, Superscalar, and DSP
- Mutation-Scheduling based dynamic phase ordering capability
- ILP constraints are automatically detected using reservation tables
Retargetable Compilers Survey (3)
Other Retargetable Compilers
- The FlexWare CAD system: supports compiler generation for DSPs and ASIPs (TIMA, France)
- The Valen-C compiler system: supports bit-width optimization of RISC-like ASIPs (Kyushu Univ., Japan)
- The Zephyr compiler system: supports development of custom compilers (Univ. of Virginia, USA)
- SUIF Compiler Infrastructure: open compiler infrastructure (Stanford Univ., USA)
Other Efforts discussed at this workshop
- Dortmund, EPFL, IITB, IITD, ...
Simulators/Simulator Generators
[Figure: ADL-based SOC codesign flow (Application, ADL Specification, IP Library, System on Chip, HW/SW Partitioning, Synthesis, Compiler, Cosimulation, Estimator, Verify/Validate)]
Support automatic generation of simulators
Simulators/Simulator Generators
Issues:
Level of abstraction (ordered from faster, less detail to slower, more detail)
- Functional (no timing information)
- Cycle-accurate (cycle level timing information)
- Bit-, Phase-accurate (detailed timing information)
Simulation model
- Interpretation based (easy to generate, flexible but slower)
- Compilation based (fast but not very flexible)
  - Static compiled simulation
  - Dynamic compiled simulation
Interoperability (the ability to integrate with other tools)
Ability to simulate a wide variety of architectures
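The difference between interpretation-based and compilation-based simulation can be seen on a toy functional simulator. Everything here is hypothetical (the three-instruction ISA, the textual encoding, the register names); it only illustrates moving decode work from run time to a one-time translation step, and does not model timing.

```python
import operator

# Hypothetical toy program: load-immediate, add, multiply.
PROGRAM = ["li r1 5", "li r2 7", "add r3 r1 r2", "mul r4 r3 r3"]
BINOPS = {"add": operator.add, "mul": operator.mul}


def run_interpreted(program, repeat=1):
    """Interpretive model: every instruction is decoded each time it runs."""
    regs = {}
    for _ in range(repeat):
        for text in program:
            op, dst, *src = text.split()  # decode at simulation time
            if op == "li":
                regs[dst] = int(src[0])
            elif op in BINOPS:
                regs[dst] = BINOPS[op](regs[src[0]], regs[src[1]])
            else:
                raise ValueError(f"unknown opcode: {op}")
    return regs


def make_li(dst, value):
    def step(regs):
        regs[dst] = value
    return step


def make_binop(dst, a, b, fn):
    def step(regs):
        regs[dst] = fn(regs[a], regs[b])
    return step


def compile_program(program):
    """Compiled model: decode once, ahead of time, into callable steps."""
    steps = []
    for text in program:
        op, dst, *src = text.split()
        if op == "li":
            steps.append(make_li(dst, int(src[0])))
        elif op in BINOPS:
            steps.append(make_binop(dst, src[0], src[1], BINOPS[op]))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return steps


def run_compiled(steps, repeat=1):
    """Run pre-decoded steps; no parsing happens inside the loop."""
    regs = {}
    for _ in range(repeat):
        for step in steps:
            step(regs)
    return regs


print(run_interpreted(PROGRAM))
print(run_compiled(compile_program(PROGRAM)))
```

The compiled variant pays the decode cost once, so it wins when the same code executes many times, but the translation has to be redone whenever the program or the machine description changes, which is the flexibility cost noted on the slide.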
Simulators/Simulator Generators Survey (1)
GENSIM/XSIM (using ISDL ADL)
- Mainly targeted to VLIW architectures
- Generate cycle-accurate, bit-true Instruction Level Simulator
- Interpretation based, but perform disassembly off-line to improve speed
- Used for architecture evaluation
SIMPRESS (using EXPRESSION ADL)
- Targeted to wide range of processor architectures such as RISC, VLIW, Superscalar, and DSP
- Generate cycle-accurate, structural simulator
- Interpretation based.
- Used for design space exploration and architecture evaluation
Simulators/Simulator Generators Survey (2)
LISA/S (using LISA ADL)
- Mainly targeted to DSPs
- Generate bit-true, cycle-accurate, static compiled simulators
- Explicit support for modeling pipeline behaviors such as interlocking, bypassing, stalls, flushes, etc.
RADL (Rockwell Semiconductor, USA)
- Extension of the LISA approach
- Mainly targeted to DSPs
- Generate phase-accurate simulators
- Explicit support for modeling delay slots, interrupts, zero-overhead loops, hazards and multi-pipelines in addition to features of LISA
Simulators/Simulator Generators Survey (3)
Other Retargetable Simulators/Simulator Generators:
- HPL-PD simulator (using the MDES ADL): limited retargetability in the form of parameters such as number of FUs, etc.
- MIMOLA ADL: convert the processor description into a simulatable HDL model
- Insulin: uses a VHDL model of a generic parameterizable machine
Several Commercial Offerings
- Axys, Lisa, Vast, ...
Software Tools
Estimators: Code Size, Memory Requirements, Performance, Power, etc.
Compilers: Coarse-grain (task-level) and ILP (microarchitecture-level)
Assembler, Linker, Loader
Profiler, Debugger, Code Development Environment
Simulators: Bus-functional, instruction-, cycle-, and phase-accurate, structural
Real Time Operating Systems (RTOS)
Validation/Verification
ADL-driven Validation/Verification
[Figure: design flow diagram. Labels: Application; HW/SW Partitioning (HW, SW); Synthesis; Compiler; Cosimulation; System on Chip (Processors, ASICs, Memories, IFs, Interconnection); IP Library; ADL Specification; Estimator; Generate, Estimate, Modify, Synthesize, Verify/Validate, Reuse]
Support validation/verification of architecture spec and implementation
Bottom-up Validation Approach
[Figure: validation flow. Labels: Specification (English Document); Manual Verification; High Level Description; Reverse Engineering; RTL; Property Checking]
ADL-driven Validation
[Figure: validation flow extended with an ADL. Labels: Specification (English Document); ADL Description in EXPRESSION; Manual Verification; High Level Description; Reverse Engineering; RTL; Property Checking; Equivalence Checking]
Ref: papers from EXPRESSION group at HLDVT99-01, VLSI02, DATE02 (Mishra et al.)
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
EXPRESSION ADL Toolkit/Framework
EXPRESSION: Our ADL Approach
[Figure: exploration flow. Labels: Application; EXPRESSION ADL; Processor Libraries (VLIW, DSP, ASIP); Memory Libraries (Cache, SRAM, Prefetch Buffer, Frame Buffer, EDO, On-chip RD RAM, SD RAM); Toolkit Generator; EXPRESS; SIMPRESS; Profiler; Exploration Phase; Generation Phase; Verification; Feedback]
EXPRESSION, EXPRESS, and SIMPRESS comprise the toolkit to aid the System Designer.
Compiler-in-the-loop architectural exploration
System-Level Exploration
[Figure: system-level flow. Labels: Alg. spec; C implementation; Proced. code; Coarse-grain & alg transformations; Cost estimation (mem, ...); Perf. estimation; H/S Partitioning (HW, SW); HLS; EXPRESS Compiler; Target Code; RTOS Kernel; ROM; Proc; ASIC; Controller; Datapath; On-chip Memory; Main Memory]
MEMOREX: Memory Exploration Environment
[Figure: MEMOREX flow. Labels: System spec in C; Parser, FG Generator w/ Semantics Retention; Memory Disambiguation, Multi-dim DF analysis; Control/DF Graph; Hw/Sw Partitioning (SpecSyn); SW Synthesis (EXPRESS); HW Synthesis (ISE, Synopsys); HW/SW Codesign; MEMOREX (Memory Estimation, Transformations, Memory Optimizations, Virtual Memory Mapping, Physical Memory Mapping); User Interface; Memory Library; CDFG with real memories]
Software Toolkit for the System Designer
EXPRESS - An Extensible, Retargetable, Instruction-Level Parallelizing (ILP) Compiler
  State-of-art ILP techniques: Resource Directed Loop Pipelining (RDLP), Trailblazing Percolation Scheduling (TiPS)
  Mutation Scheduling: Framework for dynamically exploring tradeoffs between transformations.
  Detailed architecture model (for enhanced retargetability and optimizing capability)
  Automatic generation of operation conflict information (as Reservation Tables) from EXPRESSION
  Very general speculation/predication
SIMPRESS - A Retargetable, Cycle-accurate simulator
  Runs on EXPRESS IR. (Compiler designers can use the simulator as a debugging tool)
  Structural Simulation. (Provides System Designer with detailed statistics)
  Highly retargetable. (Can be used to simulate VLIWs, DSPs, etc.)
V-SAT - A Visual Architecture Specification and Analysis Tool
  Visual Tool for easy specification of Structural and Instruction-Set Information.
  Interfaces with SIMPRESS to collect detailed statistical information about the architecture
  Visual display of the statistics in an intuitive manner to aid architecture evaluation.
EXPRESS: Compiler Environment for Embedded Processors
[Figure: compiler environment. Labels: GCC + Semantics Retention; Analysis; Mutating Transformations; Memory Hierarchy Transformations; Retargetable Back End; Proc 1, Proc 2, ..., Proc n; Control; EXPRESSION (ADL); Simulation, Visualization, Interaction (SIMPRESS & VSAT)]
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
Experiments:
  - Pipelining
  - Memory-aware compilation
  - Memory arch exploration
The DLX Example Architecture
Design Space Exploration
Designer targets various goals (power, area, perf)
  Often conflicting
DSE allows trade-offs between these goals. Explore changes to:
  processor/memory system architecture
    changing the pipeline structure
    changing the data path structure
    increasing parallelism
    changing the memory components
  instruction set
    adding new operations (e.g., MAC)
DLX simulation
  Pipeline stalled 53% of time, due to RAW data hazards
  INT and FP Adder units are the most utilized
  Explored several forwarding path placements
Example Design Space Exploration: Pipelining
1. Forwarding path from All_mem_Latch to A1: exploits (mpy,fp_add) sequences
2. Forwarding path from Mem_WB_Latch to INT: exploits (ld,int_add) sequences
3. Both (1) and (2): exploits (mpy,fp_add) and (ld,int_add) sequences
4. Forwarding path from All_mem_Latch to INT and (1): exploits (mpy,fp_add) and (mpy,int_add) sequences
5. Forwarding path from Mem_WB_Latch to A1 and (1): exploits (mpy,fp_add) and (ld,fp_add) sequences
DLX Pipeline DSE Results
[Figure: results chart for the benchmarks Innerp, Linear_eq, State_eq, Integrate, 1D_particle, GLR]
DLX Pipelining Experiments Summary
Forwarding paths added:
  average performance improvement: 15%
Reduced the number of pipeline stages
  Multiply from 7 to 5 stages
  FP Adder from 4 to 3 stages
  average performance improvement: 6%
Forwarding paths + reduced number of pipeline stages:
  average performance improvement: 25.9%
Multi-issue version of DLX:
  4 instructions issued every cycle
  average performance improvement: 11.7%
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
Experiments:
  - Pipelining
  - Memory-aware compilation
  - Memory arch exploration
Memory-Aware Compilation
Traditionally, memory system transparent to compiler:
  Scheduled all loads/stores assuming a uniform behavior
However, memory operations intrinsically non-uniform:
  Modern DRAMs: Page-mode, burst-mode accesses, banking, pipelining
  Caches: cache hits and misses have very different timing
Our Approach: TIMGEN
  Provide accurate memory timing information to compiler
  Allow compiler to globally hide latencies of lengthy memory operations.
  Generate significant performance improvements
  Two instances:
    DRAM Efficient Access Modes (page, burst-mode accesses)
    In the presence of caches: Cache Miss Traffic Management
Exploiting DRAM Access Modes in Memory-Aware Compiler
Allow Compiler to exploit page-mode, burst-mode accesses
DRAM access:
  Row-decode, Column-decode, Precharge
Page-mode access:
  Consecutive accesses to the same row
  Row-decode and precharge can be omitted.
Burst-mode access:
  Starting from an initial address, a number of words are clocked out on consecutive cycles
[Figure: timing diagrams. Labels: Normal DRAM access; Page-mode DRAM access; 5 cycles; 8 cycles]
Example Exploiting DRAM Access Modes in Memory-Aware Compiler
for(i=0;i<9;i++){
  a = a + x[i] + y[i];
  b = b + z[i] + u[i];
}
No efficient access modes (180 cyc)
Access mode optimization (84 cyc): 114% gain
Memory-aware compiler (60 cyc): 40% further gain
[Figure: the three schedules drawn along a time axis]
Experiments exploiting DRAM access modes
Dynamic cycle counts exploiting page-mode and burst-mode accesses in the compiler.
Presented at Design Automation Conference (DAC) 2000.
MIST: Cache miss traffic management
Cache misses: most time-consuming operations
Traditionally, compiler assumed all memory accesses as cache hits, relying on the memory controller to account for the cache misses.
However, hiding latency of cache misses is crucial
Our approach: MIST.
  Allow compiler to perform global optimizations, and hide the latency of the cache misses.
[Figure legend: Cache miss (20 cyc); Cache hit (2 cyc); Add (1 cyc)]
Cache miss traffic management Example
Cache line size: 4

Original loop: 120 cyc
  for(i=0;i<16;i++){
    s=s+a[i];
  }

Isolate cache misses: 108 cyc (11% gain)
  for(i=0;i<16;i+=4){
    s=s+a[i];     <== MISS
    s=s+a[i+1];   <== HIT
    s=s+a[i+2];   <== HIT
    s=s+a[i+3];   <== HIT
  }

Shift cache miss to previous iteration: 87 cyc (37% gain)
  ...
  for(i=0;i<12;i+=4){
    s=s+temp;
    s=s+a[i+1];   <== HIT
    s=s+a[i+2];   <== HIT
    s=s+a[i+3];   <== HIT
    temp=a[i+4];  <== MISS
  }
  ...

[Figure: schedules of the three versions, with cache dependences marked]
Miss Traffic Management Experiments
Dynamic cycle counts for MIST: Memory Miss Traffic Management Algorithm.
Proc. International Conference on Computer Aided Design (ICCAD) 2000
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
Experiments:
  - Pipelining
  - Memory-aware compilation
  - Memory arch exploration
Embedded memories: the programmer’s viewpoint
Register files: explicit usage in instruction set
Caches, TLBs: fully implicit
RAM buffers: explicitly controlled through special LD, ST instructions
Reconfigurable memories: explicitly controlled through control instructions
For embedded systems: expose memory architecture to the compiler
Memory Organizations and Architectures
Traditional memory hierarchies
  Caching: spatial and temporal locality
Embedded memories
  Architectural and circuit techniques
Custom memory architectures
Other storage optimization examples
  Spatial locality (multiple banks)
  Parsimony (compression)
  Scratch-pad memories, register files, ...
Custom Memory Architectures
Disk File systems:
  Parsons et al., Patterson et al.: use file access patterns to improve file system.
High-level Synthesis
  Catthoor et al.: memory allocation, packing data structures into memories
  Panda et al.: Scratch-pad on-chip SRAM together with cache
  Bakshi et al.: memory exploration combining different port configurations
Computer Architectures
  Jouppi, Kessler et al.: hardware stream buffers to enhance memory perf.
  Graphics processors: frame buffers, FIFOs, etc.
APEX: Access Pattern based Memory Exploration
Motivation:
  Majority of memory accesses generated by a few instructions
    e.g., Vocoder, 15k LOC: only 15 instructions account for 62% of memory accesses
  Customize memory architecture for these accesses
APEX Approach (Grun et al., ISSS 2001)
  Extract, analyze and cluster the most active Access Patterns in the application
  Use heuristic to prune the design space
    many possible mappings with different power/perf/costs
  Avoid simulation of the entire design space
[Grun ISSS2001]
Customizing Memory Architectures
Opportunity for wide range of power, cost, performance
Analyze application behavior (compile-time)
Map memory accesses to structures supporting access patterns
[Figure: two memory architectures. Customized: CPU, Cache, Stream buffer, Linked-list buff, SRAM, DRAM. Cache-only: CPU, Cache, DRAM]
Motivating Example
Illustrative example: 2 cases
1. Traditional Cache-only Memory Architecture
   All data structures handled by the cache
2. APEX: Access Pattern-based Memory Customization
   Access Patterns go to Stream buffers, SRAMs, Linked-list, and self-indirect Memory Modules.

for(i=0;i<1000;i++){
  … = a[i] + …;
}
…
for(i=0;i<1000;i++){
  code = codetab[code];
}
…
while(…){
  …
  p = p->next;
}
…
for(i=0;i<1000;i++){
  for(j=0;j<10;j++){
    … = coeff[j] + …;
  }
}
1. Traditional Cache-only Memory Arch.
All data structures handled by the cache
[Figure: the same four loops as in the motivating example, next to a CPU, Cache, DRAM hierarchy. Labels: a[], codetab[], Heap, coeff[]]
2. APEX: Access Pattern-based Memory Customization
Mapping data structures to memories supporting their access modes: stream buffer, linked-list buffer, SRAM, and cache
[Figure: the same four loops as in the motivating example, next to a customized hierarchy. Labels: CPU; Cache; Stream buffer; Linked-list buff; SRAM; DRAM; a[], codetab[], Heap, coeff[]]
[Grun ISSS2001]
Cost/Perf Exploration: Compress
Memory Exploration: Compress (Perf. Paretos)
Perf/Power Exploration: Compress
Memory Exploration: Compress (Power Paretos)
Memory Organizations and Architectures
Traditional memory hierarchies
  Caching: spatial and temporal locality
Embedded memories
  Architectural and circuit techniques
Custom memory architectures
Other storage optimization examples
  Spatial locality (multiple banks)
  Parsimony (compression)
  Scratch-pad memories, registers, ...
Outline
Methodology for Architectural Exploration
Survey of Architectural Description Languages (ADLs)
Software Toolkit Generation
Architectural Exploration
Summary and Conclusions
Summary
Today we reviewed ADL-driven architectural exploration of programmable embedded systems
  methodology, ADL survey, toolkit generation, sample experiments
Tremendous opportunity for architectural exploration
  Application-specific customization
    Performance, power, size variations
  Processor, coprocessor, memory co-exploration
Key technologies required
  ADL as an executable specification of the architecture
    toolkit generation, validation/verification, ...
  Highly tunable/retargetable compiler technology
    Compiler-in-the-loop architectural evaluation
  Application-Architecture co-evolution
Outlook
Current Focus:
  Language-driven SW toolkit generation (ADL => compiler, simulator, ...)
  Memory issues for embedded systems-on-chip: organization, exploration
    performance, power, size
  Flexible, powerful compilation environment for processor-core based designs
    compiler as an exploration tool, and as a software synthesis tool
  Data and Instruction cache sizing for embedded applications
  Estimators, tight bounds on WCET for real-time applications using caches
Future Directions
  Memory/S-O-C architectures for Embedded DRAM/embedded logic
  Simulation/compilation environment for multiprocessors and novel memory hierarchies on chip
  Customized OS support
  Tight coupling between arch, compiler, CAD, PP and OS