CUSTOMIZABLE EMBEDDED PROCESSORS - GBV
CUSTOMIZABLE EMBEDDED PROCESSORS DESIGN TECHNOLOGIES
AND APPLICATIONS
Paolo Ienne École Polytechnique Fédérale de Lausanne (EPFL)
Rainer Leupers RWTH Aachen University
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
MORGAN KAUFMANN PUBLISHERS Morgan Kaufmann is an imprint of Elsevier
CONTENTS
In Praise of Customizable Embedded Processors i
List of Contributors xix
About the Editors xxvii
Part I: Opportunities and Challenges
1 From Prêt-à-Porter to Tailor-Made Paolo Ienne and Rainer Leupers 3
1.1 The Call for Flexibility 4 1.2 Cool Chips for Shallow Pockets 5 1.3 A Million Processors for the Price of One? 5 1.4 Processors Coming of Age 7 1.5 This Book 7 1.6 Travel Broadens the Mind 9
2 Opportunities for Application-Specific Processors: The Case of Wireless Communications Gerd Ascheid and Heinrich Meyr 11
2.1 Future Mobile Communication Systems 12 2.2 Heterogeneous MPSoC for Digital Receivers 14
2.2.1 The Fundamental Tradeoff between Energy Efficiency and Flexibility 14
2.2.2 How to Exploit the Huge Design Space? 17 2.2.3 Canonical Receiver Structure 19 2.2.4 Analyzing and Classifying the Functions of a Digital Receiver 21 2.2.5 Exploiting Parallelism 25
2.3 ASIP Design 26 2.3.1 Processor Design Flow 26
2.3.2 Architecture Description Language Based Design 28
2.3.3 Too Much Automation Is Bad 29 2.3.4 Processor Design: The LISATek Approach 30 2.3.5 Design Competence Rules the World 33 2.3.6 Application-Specific or Domain-Specific Processors? 35
3 Customizing Processors: Lofty Ambitions, Stark Realities Joseph A. Fisher, Paolo Faraboschi, and Cliff Young 39
3.1 The "CFP" Project at HP Labs 41 3.2 Searching for the Best Architecture Is Not a Machine-Only Endeavor 45 3.3 Designing a CPU Core Still Takes a Very Long Time 46 3.4 Don't Underestimate Competitive Technologies 48
3.5 Software Developers Don't Always Help You 49 3.6 The Embedded World Is Not Immune to Legacy Problems 51 3.7 Customization Can Be Trouble 52 3.8 Conclusions 53
Part II: Aspects of Processor Customization
4 Architecture Description Languages Prabhat Mishra and Nikil Dutt 59
4.1 ADLs and Other Languages 60 4.2 Survey of Contemporary ADLs 62
4.2.1 Content-Oriented Classification of ADLs 62 4.2.2 Objective-Based Classification of ADLs 72
4.3 Conclusions 75
5 C Compiler Retargeting Rainer Leupers 77
5.1 Compiler Construction Background 79 5.1.1 Source Language Frontend 79 5.1.2 Intermediate Representation and Optimization 80 5.1.3 Machine Code Generation 83
5.2 Approaches to Retargetable Compilation 91 5.2.1 MIMOLA 92 5.2.2 GNU C Compiler 94
5.2.3 Little C Compiler 94 5.2.4 CoSy 95
5.3 Processor Architecture Exploration 98 5.3.1 Methodology and Tools for ASIP Design 98 5.3.2 ADL-Based Approach 100
5.4 C Compiler Retargeting in the LISATek Platform 104 5.4.1 Concept 104 5.4.2 Register Allocator and Scheduler 105 5.4.3 Code Selector 107 5.4.4 Results 111
5.5 Summary and Outlook 113
6 Automated Processor Configuration and Instruction Extension David Goodwin, Steve Leibson, and Grant Martin 117
6.1 Automation Is Essential for ASIP Proliferation 118 6.2 The Tensilica Xtensa LX Configurable Processor 119 6.3 Generating ASIPs Using Xtensa 121 6.4 Automatic Generation of ASIP Specifications 123
6.5 Coding an Application for Automatic ASIP Generation 125 6.6 XPRES Benchmarking Results 126 6.7 Techniques for ASIP Generation 128
6.7.1 Reference Examples for Evaluating XPRES 128
6.7.2 VLIW-FLIX: Exploiting Instruction Parallelism 129
6.7.3 SIMD (Vectorization): Exploiting Data Parallelism 131
6.7.4 Operator Fusion: Exploiting Pipeline Parallelism 133
6.7.5 Combining Techniques 134 6.8 Exploring the Design Space 136 6.9 Evaluating XPRES Estimation Methods 137
6.9.1 Application Performance Estimation 139 6.9.2 ASIP Area Estimation 139 6.9.3 Characterization Benchmarks 140 6.9.4 Performance and Area Estimation 141
6.10 Conclusions and Future of the Technology 142
7 Automatic Instruction-Set Extensions Laura Pozzi and Paolo Ienne 145
7.1 Beyond Traditional Compilers 144 7.1.1 Structure of the Chapter 147
7.2 Building Block for Instruction Set Extension 147 7.2.1 Motivation 148 7.2.2 Problem Statement: Identification and Selection 148 7.2.3 Identification Algorithm 152 7.2.4 Results 155
7.3 Heuristics 160 7.3.1 Motivation 160 7.3.2 Types of Heuristic Algorithms 161 7.3.3 A Partitioning-Based Heuristic Algorithm 162 7.3.4 A Clustering Heuristic Algorithm 162
7.4 State-Holding Instruction-Set Extensions 163 7.4.1 Motivation 164 7.4.2 Local-Memory Identification Algorithm 165 7.4.3 Results 167
7.5 Exploiting Pipelining to Relax I/O Constraints 170 7.5.1 Motivation 171 7.5.2 Reuse of the Basic Identification Algorithm 173 7.5.3 Problem Statement: Pipelining 174 7.5.4 I/O Constrained Scheduling Algorithm 176 7.5.5 Results 177
7.6 Conclusions and Further Challenges 183
8 Challenges to Automatic Customization Nigel Topham 185
8.1 The ARCompact Instruction Set Architecture 186 8.1.1 Mechanisms for Architecture Extension 190 8.1.2 ARCompact Implementations 190
8.2 Microarchitecture Challenges 191 8.3 Case Study—Entropy Decoding 193
8.3.1 Customizing VLD Extensions 195 8.4 Limitations of Automated Extension 203 8.5 The Benefits of Architecture Extension 205
8.5.1 Customization Enables CoDesign 205 8.5.2 Customization Offers Performance Headroom 206 8.5.3 Customization Enables Platform IP 206 8.5.4 Customization Enables Differentiation 207
8.6 Conclusions 207
9 Coprocessor Generation from Executable Code Richard Taylor and David Stewart 209
9.1 Introduction 209 9.2 User Level Flow 210
9.3 Integration with Embedded Software 214 9.4 Coprocessor Architecture 215 9.5 ILP Extraction Challenges 218 9.6 Internal Tool Flow 220 9.7 Code Mapping Approach 225 9.8 Synthesizing Coprocessor Architectures 228 9.9 A Real-World Example 229 9.10 Summary 231
10 Datapath Synthesis Philip Brisk and Majid Sarrafzadeh 233
10.1 Introduction 233 10.2 Custom Instruction Selection 234 10.3 Theoretical Preliminaries 236
10.3.1 The Minimum Area-Cost Acyclic Common Supergraph Problem 236
10.3.2 Subsequence and Substring Matching Techniques 237
10.4 Minimum Area-Cost Acyclic Common Supergraph Heuristic 238 10.4.1 Path-Based Resource Sharing 238
10.4.2 Example 238 10.4.3 Pseudocode 240
10.5 Multiplexer Insertion 246 10.5.1 Unary and Binary Noncommutative Operators 246 10.5.2 Binary Commutative Operators 247
10.6 Datapath Synthesis 249 10.6.1 Pipelined Datapath Synthesis 249 10.6.2 High-Level Synthesis 249
10.7 Experimental Results 250 10.8 Conclusion 255
11 Instruction Matching and Modeling Sri Parameswaran, Jörg Henkel, and Newton Cheung 257
11.1 Matching Instructions 259 11.1.1 Introduction to Binary Decision Diagrams 259 11.1.2 The Translator 261 11.1.3 Filtering Algorithm 264 11.1.4 Combinational Equivalence Checking Model 265 11.1.5 Results 265
11.2 Modeling 268 11.2.1 Overview 269 11.2.2 Customization Parameters 270
11.2.3 Characterization for Various Constraints 271 11.2.4 Equations for Estimating Area, Latency, and Power Consumption 273 11.2.5 Evaluation Results 274
11.3 Conclusions 277
12 Processor Verification Daniel Große, Robert Siegmund, and Rolf Drechsler 281
12.1 Motivation 281 12.2 Overview of Verification Approaches 282
12.2.1 Simulation 282 12.2.2 Semiformal Techniques 284 12.2.3 Proof Techniques 284 12.2.4 Coverage 285
12.3 Formal Verification of a RISC CPU 285 12.3.1 Verification Approach 286 12.3.2 Specification 287 12.3.3 SystemC Model 288 12.3.4 Formal Verification 289
12.4 Verification Challenges in Customizable and Configurable Embedded Processors 293
12.5 Verification of Processor Peripherals 294 12.5.1 Coverage-Driven Verification Based on Constrained-Random Stimulation 294 12.5.2 Assertion-Based Verification of Corner Cases 297 12.5.3 Case Study: Verification of an On-Chip Bus Bridge 298
12.6 Conclusions 302
13 Sub-RISC Processors Andrew Mihal, Scott Weber, and Kurt Keutzer 303
13.1 Concurrent Architectures, Concurrent Applications 303 13.2 Motivating Sub-RISC PEs 306
13.2.1 RISC PEs 307 13.2.2 Customizable Datapaths 311 13.2.3 Synthesis Approaches 311 13.2.4 Architecture Description Languages 311
13.3 Designing TIPI Processing Elements 316 13.3.1 Building Datapath Models 317 13.3.2 Operation Extraction 318 13.3.3 Single PE Simulator Generation 318 13.3.4 TIPI Multiprocessors 319
13.3.5 Multiprocessor Simulation and RTL Code Generation 321
13.4 Deploying Applications with Cairn 321 13.4.1 The Cairn Application Abstraction 323 13.4.2 Model Transforms 325 13.4.3 Mapping Models 325 13.4.4 Code Generation 326
13.5 IPv4 Forwarding Design Example 327 13.5.1 Designing a PE for Click 327 13.5.2 ClickPE Architecture 328 13.5.3 ClickPE Control Logic 329 13.5.4 LuleaPE Architecture 330
13.6 Performance Results 331 13.6.1 ClickPE Performance 332 13.6.2 LuleaPE Performance 333 13.6.3 Performance Comparison 334 13.6.4 Potentials for Improvement 335
13.7 Conclusion 335
Part III: Case Studies
14 Application Specific Instruction Set Processor for UMTS-FDD Cell Search Kimmo Puusaari, Timo Yli-Pietilä, and Kim Rounioja 339
14.1 ASIP on Wireless Modem Design 340 14.1.1 The Role of ASIP 340 14.1.2 ASIP Challenges for a System House 343 14.1.3 Potential ASIP Use Cases in Wireless Receivers 344
14.2 Functionality of Cell Search ASIP 346
14.2.1 Cell Search-Related Channels and Codes 346 14.2.2 Cell Search Functions 347 14.2.3 Requirements for the ASIP 347
14.3 Cell Search ASIP Design and Verification 348 14.3.1 Microarchitecture 348 14.3.2 Special Function Units 350 14.3.3 Instruction Set 353 14.3.4 HDL Generation 354 14.3.5 Verification 355
14.4 Results 356 14.4.1 Performance 356 14.4.2 Synthesis Results 357
14.5 Summary and Conclusions 359
15 Hardware/Software Tradeoffs for Advanced 3G Channel Decoding Daniel Schmidt and Norbert Wehn 361
15.1 Channel Decoding for 3G Systems and Beyond 361 15.1.1 Turbo-Codes 363
15.2 Design Space 366 15.3 Programmable Solutions 368
15.3.1 VLIW Architectures 369 15.3.2 Customizable Processors 370
15.4 Multiprocessor Architectures 374 15.5 Conclusion 379
16 Application Code Profiling and ISA Synthesis on MIPS32 Rainer Leupers 381
16.1 Profiling of Application Source Code 384 16.1.1 Assembly and Source Level Profiling 385 16.1.2 Microprofiling Approach 387 16.1.3 Memory Access Microprofiling 391 16.1.4 Experimental Results 391
16.2 Semi-Automatic ISA Extension Synthesis 394 16.2.1 Sample Platform: MIPS CorExtend 394 16.2.2 CoWare CorXpert Tool 395 16.2.3 ISA Extension Synthesis Problem 395 16.2.4 Synthesis Core Algorithm 402 16.2.5 ISA Synthesis Based Design Flow 406 16.2.6 Speedup Estimation 408 16.2.7 Exploring the Design Space 410
16.2.8 SW Tools Retargeting and Architecture Implementation 412 16.2.9 Case Study: Instruction Set Customization for Blowfish Encryption 414
16.3 Summary and Outlook 422
17 Designing Soft Processors for FPGAs Göran Bilski, Sundararajarao Mohan, and Ralph Wittig 425
17.1 Overview 425 17.1.1 FPGA Architecture Overview 426 17.1.2 Soft Processors in FPGAs 428 17.1.3 Overview of Processor Acceleration 429
17.2 MicroBlaze Soft Processor Architecture 430 17.2.1 Short Description of MicroBlaze 430 17.2.2 Highlights of Architectural Features 431
17.3 Discussion of Architectural Design Tradeoffs in MicroBlaze 432 17.3.1 Architectural Building-Blocks and Their FPGA Implementation 432 17.3.2 Examples of Architectural Decisions in MicroBlaze 434 17.3.3 Tool Support 441
17.4 Conclusions 441
Chapter References 443
Bibliography 465
Index 485