Efficient Design of Variation-Resilient Ultra-Low Energy Digital Processors

Hans Reyserhove • Wim Dehaene

Hans Reyserhove
ESAT-MICAS
KU Leuven
Seattle, USA

Wim Dehaene
KU Leuven
Heverlee, Belgium

ISBN 978-3-030-12484-7
ISBN 978-3-030-12485-4 (eBook)
https://doi.org/10.1007/978-3-030-12485-4

Library of Congress Control Number: 2019931385

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book describes the continuation of the work in the KU Leuven-MICAS division on ultra-low energy processors. In previous work (Reynders N., Dehaene W., Ultra-Low-Voltage Design of Energy-Efficient Digital Circuits, Springer, 2015), we showed that using transmission gates can result in variation-resilient energy-efficient digital signal processing blocks. However, at that time, these techniques could only be used in a handcrafted way on relatively regular data paths. Besides further optimizing the circuit techniques, we take this technique a step further in this work. Our transmission gates end up in a library that is compatible with regular digital design flows. An extension to the flow, to deal with the differential nature of the transmission gate-based logic, is also described. This results in an ARM Cortex-M0 as a demonstrator of the excellent energy efficiency these techniques allow.

The circuits presented run at a supply voltage below 500 mV. This calls for large design margins, even if intra-die variability is properly dealt with. These margins cancel part of the energy improvements that come with low supply voltages. To deal with this, we introduce in situ timing detection in the system. Late transitions on paths with small timing slack are detected, and by means of special soft-edge flip-flops, timing errors are avoided. This results again in an ARM Cortex-M0, which can now be operated at very low energy without the need for a large margin on the power supply.

This book is the result of 5 years of PhD work: a close cooperation between a young researcher and his advisor. As you will read, it was a very fruitful cooperation which we enjoyed a lot. We hope that sharing our results with you also brings you the professional achievements you strive for.

Seattle, USA         Hans Reyserhove
Heverlee, Belgium    Wim Dehaene
December 2018

Contents

1 Energy-Efficient Processors: Challenges and Solutions
  1.1 Introduction
  1.2 Energy-Efficient Operation
  1.3 CMOS Technology
  1.4 Efficient VLSI Design
  1.5 Tackling Variations
  1.6 Applications
  1.7 Goals of This Work
  1.8 Outline
  References

2 Near-Threshold Operation: Technology, Building Blocks and Architecture
  2.1 Technology
    2.1.1 Weak Inversion
    2.1.2 Functionality
    2.1.3 Speed
    2.1.4 Leakage
    2.1.5 Process Variations
    2.1.6 Temperature
    2.1.7 VT and Technology Flavour
    2.1.8 Application in Prototypes
  2.2 Logic Gates
    2.2.1 Standard CMOS Gates
    2.2.2 Transmission Gates
    2.2.3 Application in Prototypes
  2.3 Sequential Gates
    2.3.1 Latch
    2.3.2 Flip-Flop
    2.3.3 Application in Prototypes
  2.4 Architecture
    2.4.1 Logic Depth and Pipelining Depth
    2.4.2 Application in Prototypes
  2.5 CMOS Technology Advancements
    2.5.1 Fully Depleted Silicon-on-Insulator Technology
    2.5.2 FinFET CMOS Technology
    2.5.3 Application in Prototypes
  2.6 Conclusion
  References

3 Efficient VLSI Design Flow
  3.1 Introduction
  3.2 VLSI Design Overview
    3.2.1 Physical Architecture
    3.2.2 Logic Synthesis
    3.2.3 Typical Physical Design Flow
    3.2.4 Typical Standard Cell Library
  3.3 Goal
  3.4 Proposed Design Flow Approach
    3.4.1 Library Level
    3.4.2 Logic Level
    3.4.3 Physical Level
  3.5 Proof-of-Concept
  3.6 Conclusion
  References

4 Ultra-Low Voltage Microcontrollers
  4.1 Introduction
  4.2 Architecture
    4.2.1 Microcontroller System
    4.2.2 Microcontroller Framework
    4.2.3 Simulation and Testing
    4.2.4 Benchmarking
  4.3 Implementation
    4.3.1 RTL
    4.3.2 Voltage and Power
    4.3.3 Timing
    4.3.4 Memory
    4.3.5 Synthesis and Place-and-Route
    4.3.6 Simulations
  4.4 Measurements
    4.4.1 Silicon
    4.4.2 Measurement Setup
    4.4.3 Speed
    4.4.4 Power and Leakage
    4.4.5 Energy
    4.4.6 Variations
    4.4.7 Simulation Mismatch
    4.4.8 Full System
  4.5 Energy Optimization
    4.5.1 Energy Breakdown
    4.5.2 fmax vs. ftarget
    4.5.3 Active vs. Standby Power
  4.6 State-of-the-Art Comparison
  4.7 Conclusion
  References

5 Error Detection and Correction
  5.1 Predictability-Enabling Strategies
    5.1.1 Silicon Lottery
    5.1.2 Worst Case Strategy
    5.1.3 Binning
    5.1.4 Replica Monitoring
    5.1.5 In Situ Monitoring
  5.2 System Overview
  5.3 History
    5.3.1 Time Redundancy Based Detection
  5.4 EDAC Concepts
    5.4.1 Error Strategy
    5.4.2 Sequential Element
    5.4.3 Error Detection Techniques
    5.4.4 Error Correction Techniques
    5.4.5 PoFF Operation and DVS
  5.5 ULV Variation-Resilient Operation
    5.5.1 Variation Triggers
    5.5.2 Variation Mitigation
    5.5.3 Ultra-Low Voltage
  5.6 Baseline Comparison
    5.6.1 Ideal Baseline Design
    5.6.2 Other Baseline Comparisons
  5.7 Conclusion
  References

6 Timing Error-Aware Microcontroller
  6.1 Architecture
    6.1.1 Transition Detection
    6.1.2 Soft-Edge Flip-Flop
    6.1.3 Timing/Control
    6.1.4 Error Latch
    6.1.5 Error Processor
    6.1.6 Timing Error Masking and Aware Operation
  6.2 Circuitry
    6.2.1 Timing
    6.2.2 ULV Implementation
    6.2.3 Results
  6.3 Modelling and Design Flow
    6.3.1 Standard Cell Description
    6.3.2 Augmented Design Flow
    6.3.3 Gate Level Simulation
  6.4 Timing Error Masking-Aware Microcontroller
    6.4.1 Overview
    6.4.2 Design Trade-Offs
    6.4.3 SRAM Interface
  6.5 Measurements
    6.5.1 Dynamic Voltage Scaling and PoFF Performance
    6.5.2 Baseline Comparison
    6.5.3 Replica Circuit Comparison
    6.5.4 Variation Resilience
  6.6 State-of-the-Art Comparison
  6.7 Conclusion
  References

7 Conclusion
  7.1 General Conclusions
  7.2 Future Work
  References

Index

Abbreviations and Symbols

AHB      Advanced high-performance bus
ALU      Arithmetic logic unit
AMBA     Advanced microcontroller bus architecture
APB      Advanced peripheral bus
BTWC     Better-than-worst-case
CCS      Composite current source model
CMOS     Complementary metal-oxide-semiconductor
CPF      Common Power Format
CSCD     Current-sensing completion detection
CTS      Clock tree synthesis
DIBL     Drain-induced barrier lowering
DMIPS    Dhrystone million instructions per second
DS       Double sampling
DSP      Digital signal processing
DVFS     Dynamic voltage frequency scaling
DVS      Dynamic voltage scaling
DW       Detection window
ECG      Electrocardiogram
ECSM     Effective current source model
EDA      Electronic design automation
EDAC     Error detection and correction
EDP      Energy-delay product
FD-SOI   Fully depleted silicon-on-insulator
FF       Fast nMOS fast pMOS process corner
FS       Fast nMOS slow pMOS process corner
FTDI     Future Technology Devices International
FO4      Fan-out 4
GALS     Globally asynchronous locally synchronous
GP       General purpose technology type
GPIO     General purpose input/output signal
HDL      Hardware description language
HVT      High threshold voltage transistor
I/O      Input/output
IoT      Internet-of-Things
IP       Intellectual property
IRQ      Interrupt request
ISR      Interrupt service routine
JTAG     Joint Test Action Group
LEF      Library exchange format
LVT      Low threshold voltage transistor
LP       Low power technology type
μP       Microprocessor
MC       Monte Carlo analysis
MDP      Minimum delay point
MEP      Minimum energy point
MMMC     Multi-mode multi-corner
MOSFET   Metal-oxide-semiconductor field-effect transistor
MTBPF    Mean time between potential failures
NFC      Near-field communication
NLDM     Non-linear delay model
NMI      Non-maskable interrupt
nMOS     n-channel MOS transistor
NoC      Network-on-chip
NVIC     Nested vectored interrupt controller
OCV      On-chip variation
PCB      Printed circuit board
pMOS     p-channel MOS transistor
PMU      Power management unit
PoFF     Point-of-first-failure
PVT      Process-voltage-temperature condition
PWM      Pulse width modulation
RDF      Random dopant fluctuations
RFID     Radio-frequency identification
ROM      Read-only memory
RSCE     Reverse short channel effect
RTL      Register-transfer language
RVT      Regular threshold voltage transistor
SBOCV    Stage-based on-chip variation
SDC      Synopsys design constraints
SDL      Set-dominant latch
SF       Slow nMOS fast pMOS process corner
SoC      System-on-chip
SOI      Silicon-on-insulator
SRAM     Static random access memory
SS       Slow nMOS slow pMOS process corner
SSTA     Statistical static timing analysis
TD       Transition detection
TG       Transmission gate logic gate
TT       Typical nMOS typical pMOS process corner
TTM      Time-to-manufacturing
UART     Universal asynchronous receiver-transmitter
ULV      Ultra-low voltage operation
UTBB     Ultra-thin buried box
UVFR     Unified voltage frequency regulation
VCO      Voltage-controlled oscillator
VLSI     Very-large-scale-integration
VR       Virtual reality
WIC      Wake-up interrupt controller

Symbols

C              Capacitance
Ctotal         Total capacitance
E              Energy
Edynamic       Dynamic energy
Emargined      Margined energy
Eoptimal       Optimal energy
EPoFF          Point-of-first-failure energy
Estatic        Static energy
Etot           Total energy
epi            Number of endpoints in bin i
fclk           Clock frequency
fmax           Maximum clock frequency
ftarget        Target clock frequency
FFi            Flip-flop in stage i
Id             Drain current of a transistor
Ids            Drain-source current of a transistor
Ileak          Leakage current
Ioff           Off-current or leakage current of a transistor
Ioff,n         Off-current of the nMOS transistor
Ioff,p         Off-current of the pMOS transistor
Ion            On-current or drive current of a transistor
Ion,n          On-current of the nMOS transistor
Ion,p          On-current of the pMOS transistor
Lgate          Length of the gate of the transistor
n              Process-dependent parameter
NML/H          Low/high noise margin
Pdynamic       Dynamic power
Pleak          Leakage or static power
Pstatic        Static or leakage power
Ptot           Total power
pi             Non-EDAC monitored logic path
qi             EDAC monitored logic path
RNM            Relative noise margin
Ron            On-resistance of a (combination of) transistors
S              Subthreshold slope
tborrow        Time borrowing amount
Tclk           Clock period
Tsystem        System clock period
tclk−q         Clock-to-output delay
tc,clk−q       Contamination clock-to-output delay
td−q           Input-to-output delay
tDW            Detection window delay
thold          Hold time
tp,logic,i     Propagation delay of logic stage i
tprop          Propagation delay
tsetup         Setup time
tslack         Slack delay
VBB            Body voltage of a transistor
Vdd            Supply voltage
Vdd,MEP        Minimum energy point supply voltage
Vdd,margined   Margined supply voltage
Vdd,min        Minimum supply voltage
Vdd,nom        Nominal supply voltage
Vdd,PoFF       Point-of-first-failure supply voltage
Vdd,step       Supply voltage step size
Vds            Drain-source voltage of a transistor
Vgs            Gate-source voltage of a transistor
VI,L/H         Low/high input voltage
VIO            Input/output domain supply voltage
VO,L/H         Low/high output voltage
Vsb            Source-bulk voltage of a transistor
Vsd            Source-drain voltage of a transistor
Vsg            Source-gate voltage of a transistor
Vss            Ground voltage
VT             Threshold voltage of a transistor
VT0            Threshold voltage of a transistor for Vsb=0
WpMOS          Width of the gate of the pMOS transistor
α              Activity factor
γ              Body effect coefficient
Δ              Delay or difference
η              DIBL coefficient
μx             Mean value of x
σx             Standard deviation of x
Φ              Cumulative distribution function
ψ0             Surface potential

List of Figures

Fig. 1.1 Energy delay trade-off in design optimization . . . . . . . . . . . . . . . . . . . . . . 4Fig. 1.2 Energy breakdown and speed performance of the

microcontroller prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Fig. 1.3 Near-threshold operation in sub-micron CMOS technology

results in functional problems, speed degradation andvariability sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Fig. 1.4 Near-threshold system variations can be overcome byautonomous in-circuit timing monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Fig. 2.1 Device cross section of an nMOS and a pMOS transistor ina typical CMOS technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Fig. 2.2 Normalized drain–source current Ids for increasing Vgs for atypical low-VT nMOS device in 40 nm CMOS technology . . . . . . . . . 19

Fig. 2.3 Inversion operation of an nMOS MOSFET . . . . . . . . . . . . . . . . . . . . . . . . . 19Fig. 2.4 On and off current for increasing Vds for a minimal low-VT

nMOS and pMOS in 40 nm CMOS technology . . . . . . . . . . . . . . . . . . . . . 20Fig. 2.5 Logic functionality of the FO4 inverter is realized by

balancing drive and leakage current . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21Fig. 2.6 Simulation of the FO4 inverter nMOS and pMOS Ion/Ioff

ratio and the resulting noise margin for different supply voltages. . 22Fig. 2.7 Larger relative pMOS sizing balances Ion/Ioff ratios and

restores relative noise margin of the FO4 inverter . . . . . . . . . . . . . . . . . . 22Fig. 2.8 Propagation delay of a FO4 inverter as function of Vdd . . . . . . . . . . . . . 23Fig. 2.9 Relative leakage power of a FO4 inverter as function of Vdd . . . . . . . 24Fig. 2.10 Cumulative distribution function of FO4 inverter

propagation delay intra-die variation for 0.2 V and 0.9 V. . . . . . . . . . . 25Fig. 2.11 Relative intra-die variation as function of Vdd of a FO4 inverter . . . 26Fig. 2.12 Relative propagation of FO4 inverter as function of Vdd for

different process corners or inter-die variations . . . . . . . . . . . . . . . . . . . . . 26Fig. 2.13 Relative propagation delay and leakage power for different

ambient temperatures as function of Vdd for the FO4 inverter. . . . . . 27

xvii

xviii List of Figures

Fig. 2.14 Relative propagation delay as function of Vdd for the FO4inverter implemented in different VT’s and technology flavours . . . 28

Fig. 2.15 Relative leakage power as function of Vdd for the FO4inverter implemented in different VT’s and technology flavours . . . 28

Fig. 2.16 FO4 inverter intra-die variation as function of Vddimplemented in different VT device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Fig. 2.17 Energy/cycle of the M0 core as predicted by the FO4inverter propagation delay and leakage power . . . . . . . . . . . . . . . . . . . . . . 30

Fig. 2.18 Energy/cycle of the M0 core as predicted by the FO4inverter propagation delay and leakage power . . . . . . . . . . . . . . . . . . . . . . 31

Fig. 2.19 Energy-delay product of the M0 core as predicted by theFO4 inverter propagation delay and leakage power . . . . . . . . . . . . . . . . . 32

Fig. 2.20 Stacked nMOS inverter topology
Fig. 2.21 Simulation on the effect of stacking on Ion and Ioff across different Vdd
Fig. 2.22 Simulation of the noise margin of an nMOS stacked inverter for optimal relative pMOS sizing
Fig. 2.23 Propagation delay variation due to intra-die variation and noise margin sensitivity to inter-die variation of the stacked nMOS inverter
Fig. 2.24 Schematic of the differential transmission gate building block
Fig. 2.25 Schematic of the cascaded transmission gate principle
Fig. 2.26 Comparison of the relative noise margin of a single-ended stacked nMOS inverter and a differential transmission gate logic inverter as a function of Vdd
Fig. 2.27 Propagation delay variation due to intra-die variation and noise margin sensitivity due to inter-die variation for the NOR gate implementation
Fig. 2.28 Normalized leakage power and propagation delay of an nMOS stacked inverter with 40 nm and 60 nm gate length, low VT and regular VT devices
Fig. 2.29 Normalized leakage power and propagation delay for 40 nm and 60 nm gate length logic gates
Fig. 2.30 Transistor implementation of the positive edge triggered flip-flop
Fig. 2.31 Block diagram of the error detection flip-flop
Fig. 2.32 Bar graph showing the absolute and relative contribution to the leakage power of sequential logic to both prototypes
Fig. 2.33 Different pipeline depths result in different activities and maximum clock periods
Fig. 2.34 Energy/cycle of the M0 core as predicted by the FO4 inverter with varying activity factor
Fig. 2.35 Cross section of the UTBB FD-SOI transistor device

Fig. 2.36 Chip micrograph of the 16-bit MAC implementation in 28 nm UTBB FD-SOI

Fig. 3.1 Y-chart showing a common design partitioning strategy
Fig. 3.2 Overview of the typical top-down design flow
Fig. 3.3 Overview of the physical design flow
Fig. 3.4 Multi-mode multi-corner optimization concept
Fig. 3.5 Concept of statistical static timing analysis
Fig. 3.6 Concept of stage-based OCV
Fig. 3.7 Full standard cell description of a simple inverter cell
Fig. 3.8 Characterization approach in NLDM and CCS/ECSM models
Fig. 3.9 Proposed differential transmission gate design flow chart
Fig. 3.10 Standard cell deconstruction into transmission gate and inverters
Fig. 3.11 Example logic implementations using 1, 2 or 3 levels of transmission gates
Fig. 3.12 Example characterization setup of a differential AND2 cell
Fig. 3.13 Differential to single-ended mapping of characterized data, applied on an AND2 gate
Fig. 3.14 Standard cell library single-ended transformation flow
Fig. 3.15 Logic function usage for the microcontroller systems
Fig. 3.16 Gate-level netlist differential transformation flow
Fig. 3.17 Cross domain interfacing between differential and single-ended domain occurs through power intent specification
Fig. 3.18 A SKILL representation of the inverter and transmission gate is used to build a full standard cell library
Fig. 3.19 Maximum routing length mismatch between differential pins in the microcontroller implementation
Fig. 3.20 Principle of the applied hold time optimization

Fig. 4.1 Applications in the ultra-low voltage microcontroller domain constrained by three key factors: energy, performance and form factor
Fig. 4.2 Typical microcontroller system and subdivision in its most important building blocks
Fig. 4.3 Overview of the full microcontroller system with interfacing
Fig. 4.4 Block diagram of the M0 core internal subsystems
Fig. 4.5 Block diagram of an AHB-lite bus system
Fig. 4.6 Memory map prescription of the M0 architecture
Fig. 4.7 Typical microcontroller program development and mapping flow
Fig. 4.8 System floor plan with power domain subdivision and power grid
Fig. 4.9 Comparison between conventional clock tree and clock mesh
Fig. 4.10 Clock mesh implementation of the ARM Cortex-M0 core
Fig. 4.11 Overview of the full system clock hierarchy

Fig. 4.12 Bus and memory read and write operation on the negative edge of the clock in relation to the AHB protocol
Fig. 4.13 Speed performance from timing analysis of both prototypes at different process corners and supply voltages
Fig. 4.14 Power consumption from power analysis of both prototypes at different process corners and supply voltages
Fig. 4.15 Leakage contribution from power analysis of both prototypes at different process corners and supply voltages
Fig. 4.16 Energy/cycle from power analysis of both prototypes at different process corners and supply voltages
Fig. 4.17 Chip micrograph of both prototypes, fabricated in 40 nm CMOS technology
Fig. 4.18 Measurement setup for prototype measurement
Fig. 4.19 Mean and boxplot of the measured maximum operating frequency as a function of supply voltage for both prototypes
Fig. 4.20 Mean and boxplot of the measured M0 core power as a function of supply voltage for both prototypes at the reported operating speeds
Fig. 4.21 Leakage contribution to M0 core energy consumption as a function of supply voltage for both prototypes
Fig. 4.22 M0 core energy consumption per cycle as a function of supply voltage
Fig. 4.23 M0 core energy-delay product as a function of supply voltage
Fig. 4.24 Histogram of Vdd,min and Vdd,MEP of 25 measured dice of prototype 1
Fig. 4.25 Histogram of Vdd,min and Vdd,MEP of 8 measured dice of prototype 2
Fig. 4.26 Histogram of energy/cycle at Vdd,min and Vdd,MEP of 25 measured dice of prototype 1
Fig. 4.27 Histogram of energy/cycle at Vdd,min and Vdd,MEP of 8 measured dice of prototype 2
Fig. 4.28 Energy consumption of the M0 core of prototype 2 as a function of supply voltage for a 0–70 °C temperature range
Fig. 4.29 Total, static and dynamic energy trade-off comparison for both prototypes
Fig. 4.30 Extensive overview comparing MEP voltage, speed, energy and voltage of the demonstrated prototypes with state-of-the-art ULV microcontroller implementations
Fig. 4.31 Speed–energy combination of the demonstrated prototypes and other state-of-the-art prototypes

Fig. 5.1 Conceptual performance qualification probability density function of a large distribution of dice
Fig. 5.2 Worst case based strategy to enable predictable performance
Fig. 5.3 Binning strategy to enable predictable performance

Fig. 5.4 Example replica circuit implementation
Fig. 5.5 Replica circuit strategy for performance qualification
Fig. 5.6 In situ monitoring performance
Fig. 5.7 System overview of a typical EDAC system
Fig. 5.8 Current-sensing completion detection (CSCD)
Fig. 5.9 Muller C-element
Fig. 5.10 Time-domain majority voter example implementations
Fig. 5.11 Time-domain perturbation detector implementations
Fig. 5.12 Schematic overview of error prediction, detection and masking with timing diagram implemented with double sampling
Fig. 5.13 Double sampling principle overview and timing diagram
Fig. 5.14 Transition detection principle overview and timing diagram
Fig. 5.15 Virtual supply node monitoring transition detection
Fig. 5.16 Additional hold time constraint due to error detection after the clock edge
Fig. 5.17 Short path padding to overcome hold time problem due to error detection after the clock edge
Fig. 5.18 Example timing histogram showing all the path endpoints with their most critical path timing slack
Fig. 5.19 Data reinsertion in the main flip-flop
Fig. 5.20 Data reinsertion in the data-path after the main flip-flop
Fig. 5.21 Trade-off in error resilient systems with voltage scaling
Fig. 5.22 Classification of variations
Fig. 5.23 Incident variation resulting in a possible timing error
Fig. 5.24 Diagram of clock and detection window generation at the root node with timing diagram
Fig. 5.25 Diagram of clock generation at the root node and detection window generation at the leaf node, with timing diagram
Fig. 5.26 Diagram of error signal propagation to the clock root node with timing diagram
Fig. 5.27 Energy breakdown of baseline comparison EDAC implementations
Fig. 5.28 Summary of the EDAC properties, considerations and challenges

Fig. 6.1 Diagram of the proposed EDAC architecture
Fig. 6.2 Circuit of the proposed EDAC flip-flop
Fig. 6.3 Timing diagram of the proposed EDAC flip-flop
Fig. 6.4 Functional description of the timing error flip-flop used for characterization
Fig. 6.5 Flow chart of the differential standard cell design flow augmented with timing error detection insertion
Fig. 6.6 System overview of the microcontroller system equipped with timing error detection

Fig. 6.7 Error processor enabled as a peripheral in the microcontroller system
Fig. 6.8 Error detection as a result of DVS step size and detection window
Fig. 6.9 System clock period as a function of supply voltage for different process corners
Fig. 6.10 Histogram of the path with the smallest timing slack at each endpoint flip-flop
Fig. 6.11 Boxplot of the slack distribution of a subset of timing paths
Fig. 6.12 Slack analysis of EDAC equipped paths vs. non-EDAC equipped paths
Fig. 6.13 Die micrograph of the EDAC-enabled microcontroller system
Fig. 6.14 Layout of the microcontroller system floor plan with 224 placed timing error flip-flops highlighted
Fig. 6.15 Measurement of the PoFF curve for a wide frequency range, showing required supply voltage and energy consumption for the achieved performance
Fig. 6.16 Histogram of Vdd,min and Vdd,MEP of 14 measured dice
Fig. 6.17 Histogram of energy/cycle at Vdd,min and Vdd,MEP of 14 measured dice
Fig. 6.18 PoFF supply voltage and average error rate of monitored path groups for 6 different dice
Fig. 6.19 PoFF supply voltage and average error rate of monitored path groups for a single die across a 0–70 °C temperature range
Fig. 6.20 Energy measurement of the PoFF curve for both the EDAC and the baseline design
Fig. 6.21 Ring oscillator speed related to critical path speed at different process corners and under intra-die variations
Fig. 6.22 Energy measurement of the PoFF curve for both the EDAC and the baseline design

List of Tables

Table 2.1 Microcontroller operating conditions, silicon measurement results and derived parameters
Table 2.2 Microcontroller operating conditions, silicon measurement results and derived parameters
Table 2.3 Application of near-threshold building blocks in the microcontroller prototypes
Table 2.4 Comparison of a 16-bit MAC block in 90 nm bulk CMOS and 28 nm UTBB FD-SOI
Table 3.1 Architectural techniques to improve power with their impact on the physical design
Table 3.2 Application summary of the proposed design flow on a proof-of-concept microcontroller system
Table 4.1 The two microcontroller implementations of this work with their key differences
Table 4.2 Synthesis logic mapping results of the ARM Cortex-M0 system
Table 4.3 Comparison of the M0 core performance simulation and measurement results
Table 4.4 Same operating condition comparison of the M0 core performance simulation and measurement results
Table 4.5 Full system performance of both prototypes at Vdd,min and Vdd,MEP
Table 4.6 Performance summary and state-of-the-art comparison
Table 6.1 Transition detector logic value truth table
Table 6.2 Overview of possible transition cases resulting in a transition detection
Table 6.3 Setup time for normal flip-flop vs. EDAC flip-flop
Table 6.4 Normal vs. EDAC flip-flop comparison under typical conditions

Table 6.5 Detection window vs. system clock period under process variations
Table 6.6 Performance summary and state-of-the-art comparison of different EDAC implementations
