
UNIVERSITÀ DEGLI STUDI DI PERUGIA

MASTER THESIS

Power consumption verification for a new generation pixel readout chip

in High Energy Physics

Author:

Andrea MARCOTULLI

Supervisor:

Ph.D. Eng. Pisana PLACIDI

Assistant Supervisor:

Ph.D. student Sara MARCONI

A thesis submitted in fulfillment of the requirements for the degree in Electronics and Telecommunication Engineering

in the

Electronics Research Group

Engineering Department

Academic year 2015/2016


Acknowledgements

I would first like to express my sincere gratitude to my supervisor Pisana Placidi for helping me whenever I needed it, with continuous support from the beginning to the end of this work.

Moreover, this thesis would not have been possible without my assistant supervisor Sara Marconi and her precious comments and teaching during the learning process of this master thesis.

Furthermore, I cannot forget to thank Daniel Magalotti for introducing me to the topic, and for his continuous encouragement and interest in my work.

My sincere thanks also go to Giuseppe Baruffa for his enthusiasm and knowledge, which guided me in the last part of this work.

I am also very grateful to all my supervisors and to all the other people, among whom Daniele Passeri and Gianmario Bilei, who helped me to have the opportunity of working in an international and stimulating environment.

Finally, I must express my very profound gratitude to my mother, my girlfriend and my friends for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them.


Contents

Acknowledgements
Contents
List of Figures
List of Tables
List of Abbreviations
Introduction
1 Front-end Electronics for Pixel Detectors in Particle Physics
1.1 Hybrid pixel detectors
1.2 The Front End Electronics
1.2.1 Generic Pixel Unit Cell
1.2.2 Readout Architectures
1.3 The RD53 collaboration
2 Low Power Design
2.1 CMOS Power Consumption
2.1.1 Dynamic Power
(Dis)charging capacitors
Glitch
Short-circuit currents
2.1.2 Static Power
Sub-threshold leakage
Gate leakage
Junction leakage
2.2 Low power Techniques
2.2.1 At functional block level
Clock Gating
Operand isolation
Pin swapping
2.2.2 At system level
Power Gating
Multi VDD Design
2.3 Low Power within RD53 Collaboration
3 Low Power Design Flow
3.1 ASIC design methodology
3.1.1 Front-end digital design stage
3.1.2 Back-end digital design stage
Floorplan
Power plan
Hierarchical partitioning
Top level placement and routing
3.2 Power analysis flow
3.3 VEPIX53 simulation and verification framework
3.3.1 System Verilog and UVM
3.3.2 Overall architecture
Top module
Testbench
Test scenario
4 Device Under Test: CHIPIX65
4.1 Pixel Region architecture
4.1.1 Pixel Logic
4.1.2 Region Digital Logic
4.2 EOC readout
5 Power analysis results
5.1 Simulation flow
5.1.1 Pixel Chip Harness
5.2 Input files for power analysis
5.2.1 SPEF extraction
5.2.2 DEF generation
5.2.3 VCD generation
5.3 Average power consumption
5.3.1 Influence of signal digitization
5.3.2 Clock tree analysis
5.4 Peak power analysis
5.5 Outcome of the power analysis and optimisation
6 On chip data clustering
6.1 RD53A prototype
6.2 Data compression techniques
6.2.1 Huffman encoding
6.2.2 Run-Length encoding
6.3 Compression strategy
6.3.1 Run-Level clustering
6.3.2 Modified Run-Level clustering
6.3.3 Run-Level clustering with Huffman
6.4 Data compression results
Conclusions
References


List of Figures

1.1 Topology of a short lived particle decay together with ordinary particles emerging from the same collision
1.2 Traditional MAPS building block
1.3 Basic building block of a HPD
1.4 Sketch of a "blow-up" hybrid pixel detector
1.5 Geometry of a generic PC
1.6 Circuit blocks of a generic PUC
1.7 Comparison between charge digitization methods: ADC and TOT approaches
1.8 Response time of a discriminator as function of the input charge
1.9 LHC/HL-LHC Plan
2.1 Dynamic charging in a common CMOS inverter
2.2 Basic circuit showing the formation of glitches
2.3 Short-circuit current in CMOS inverter
2.4 Short-circuit current function of input/output signals’ slopes
2.5 Leakage currents in MOS transistor
2.6 Impact of reduced threshold voltage on leakage
2.7 Evolution of the gate thickness and the gate leakage over various technology nodes
2.8 Clock gating implementation
2.9 Local and Global clock gating
2.10 Design with operand isolation
2.11 Pin swapping applied on a NAND gate
2.12 Power gating implementation
2.13 Multi voltage design techniques: a) static voltage scaling, b) multi voltage scaling, c) dynamic frequency and voltage scaling, d) adaptive frequency and voltage scaling
2.14 Serial powering scheme
2.15 Illustration of constant current mode of serial powering with varying power consumption of pixel chip
3.1 ASIC design flow
3.2 Schematic of a synthesized pixel logic
3.3 Setup and hold slack
3.4 Setup-hold window
3.5 Post place&route physical view of CHIPIX65
3.6 General and power ASIC design flow with related Cadence software packages used
3.7 Inputs and accuracy of Cadence Voltus engine
3.8 Partial UVM class library
3.9 Block diagram of the VEPIX53 simulation and verification environment
3.10 Unified Modeling Language (UML) diagrams of the transaction classes defined for VEPIX53
3.11 Classes of hits generated by VEPIX53: a) single charged particles, b) jets, c) loopers, d) machine background particles
3.12 Output example of the scoreboard object
4.1 CHIPIX65 demonstrator layout
4.2 Analog islands in a PR
4.3 Block diagram of a distributed pixel region buffering architecture
4.4 Pixel Region architecture
4.5 Simulation results on hit loss due to pixel deadtime for both the slow and fast frontend modes
4.6 Block diagram of a digital PUC
4.7 Timing diagram of ToT and fixed deadtime counters
4.8 Block diagram of a shared region logic
4.9 Histogram of number of hit pixels per pixel region (4x4) in the extreme scenario at the edges of the barrel
4.10 Hit loss due to buffer overflow for increasing values of buffer depth
4.11 Arbitration logic among pixels in a PR to access the bus
4.12 Example of bus contention among consecutive pixels
4.13 Block diagram of the EOC readout logic
5.1 Multi Snapshot Incremental Elaboration flow
5.2 Connection between DUT and the simulation environment
5.3 Timing diagram of ToT counter (a) without SDF annotation (b) with SDF annotation
5.4 Restore design
5.5 Save SPEF
5.6 Save DEF
5.7 Floorplan of the final RD53 prototype
5.8 Clock debugging
5.9 Clock tree within a PR
5.10 Decoupling capacitances at different levels of pixel chip
5.11 Output file from dynamic power analysis with the resolution of 1 ns
5.12 Power profiles in the activity condition with hits and triggers (zoom on shorter simulation window)
5.13 Power profiles in the activity condition with only clock sent to the logic (zoom on shorter simulation window)
5.14 Power profiles in the activity condition with hits and no triggers (zoom on shorter simulation window)
5.15 Power histogram related to a PR tested with extreme hit rate and trigger rate
5.16 Timing diagram of ToT counter
5.17 Impact of clock tree on power consumption
6.1 Pixel readout system with E-links to opto-conversion modules
6.2 Pixel matrix architecture for the RD53A prototype
6.3 Example of Huffman tree
6.4 Application of Run Level encoding
6.5 AFE behaviours
6.6 Histograms of ToT values probability in a) slowA, b) slowB and c) slowC modes
6.7 Application of modified Run Level encoding


List of Tables

1.1 Pixel chip generations
5.1 Digital average power estimations for the logic relating to the AFE_TO in the TYPICAL case
5.2 Digital average power estimations for the logic relating to the AFE_PV in the TYPICAL case
5.3 Digital average power estimations for the logic relating to the AFE_TO in the WORST case (powerwise)
5.4 Digital average power estimations for the logic relating to the AFE_PV in the WORST case (powerwise)
5.5 Digital average power estimations for the full pixel matrix (400x192 pixels)
5.6 Comparison between the digital average power estimations considering 4-bit and 5-bit ToT for the logic relating to the AFE_TO in the TYPICAL case
5.7 Comparison between the digital average power estimations considering 4-bit and 5-bit ToT for the logic relating to the AFE_PV in the TYPICAL case
5.8 Post-placement density check
5.9 Digital average power estimations due to buffers within clock tree in the TYPICAL case
6.1 Current 32-bit output packet
6.2 Run Level Encoding
6.3 Modified Run Level Encoding
6.4 Example of Huffman dictionary
6.5 Data compression efficiencies on data from PROCs with 50 × 50 µm² pixels
6.6 Data compression efficiencies on data from PROCs with 25 × 100 µm² pixels


List of Abbreviations

ADC Analog to Digital Converter
AFE Analog Front End
ALICE A Large Ion Collider Experiment
ASIC Application Specific Integrated Circuit
ATLAS A Toroidal LHC ApparatuS
AVFS Adaptive Voltage and Frequency Scaling
BVR Bandgap Voltage Reference
CERN European Organization for Nuclear Research
CMS Compact Muon Solenoid
CVD Chemical Vapor Deposition
CTS Clock Tree Synthesis
DAC Digital to Analog Converter
DEF Design Exchange Format
DIBL Drain Induced Barrier Lowering
DUT Device Under Test
DVFS Dynamic Voltage and Frequency Scaling
DVS Dynamic Voltage Scaling
EOC End Of Column
EOT Equivalent Oxide Thickness
GBT GigaBit Transceiver
GUI Graphical User Interface
HL-LHC High Luminosity - Large Hadron Collider
HPD Hybrid Pixel Detector
IC Integrated Circuit
INFN Istituto Nazionale di Fisica Nucleare
IPC Inter Process Communication
LEF Library Exchange Format
LHC Large Hadron Collider
LHCb Large Hadron Collider beauty
MAPS Monolithic Active Pixel Sensor
MCD Macro Column Drainer
MMMC Multi Mode Multi Corner
MSIE Multi Snapshot Incremental Elaboration
MSV Multiple Supply Voltage
MVS Multi Voltage Scaling
OOP Object Oriented Programming
OVM Open Verification Methodology
PC Pixel Chip
PDN Power Distribution Network
PSO Power Shut Off
PUC Pixel Unit Cell
ROC Read Out Chip
RTL Register Transfer Level
SDC Synopsys Design Constraints
SDF Standard Delay Format
SLVS Scalable Low Voltage Signaling
SPEF Standard Parasitics Exchange Format
SPI Serial Peripheral Interface
STA Static Timing Analysis
SVS Static Voltage Scaling
TCL Tool Command Language
TLM Transaction Level Modeling
TOT Time Over Threshold
UVM Universal Verification Methodology
VCD Value Change Dump
VLSI Very Large Scale Integration
WSN Wireless Sensor Network


Introduction

New generation pixel detector systems and ASICs for High Energy Physics (HEP) applications will be a big step forward and will have to face many technical challenges: smaller pixels to improve tracking resolution, much higher hit rates (3 GHz/cm²), unprecedented radiation tolerance (10 MGy), much higher output bandwidth, and low power consumption. The RD53 collaboration was established by the ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid) experiments at CERN to jointly develop the next generation of hybrid pixel readout chips for the phase 2 pixel upgrades. This formal collaboration includes universities and research institutes from Europe and the USA.

In my thesis work, I performed the digital power consumption analysis of the readout prototype designed in the framework of the CHIPIX65 and RD53 projects. This analysis was carried out to drive the power-oriented optimization of the system architecture.

In the remainder of this section I describe the contents and the organization of the thesis. Chapter 1 introduces the state of the art of such detectors, focusing on the front-end electronics for pixel readout and on the requirements of the next generation. Because the front-end electronics will be developed in a commercial 65 nm CMOS technology, widely adopted in current VLSI systems, Chapter 2 reports the main sources of power consumption in CMOS circuits. Design techniques for power reduction at system and functional block level are also introduced, with attention to their power results, in order to choose the best solution. To carry out the analysis, it is necessary to understand and follow the design flow of the ASIC, from synthesis to the physical implementation of the circuit, through an initial placement of all components followed by the routing of power and signal wires. Each step, together with the pixel simulation platform Verification Environment for PIXel chips (VEPIX53), is described in Chapter 3. The digital architecture of the Device Under Test (DUT) is described in Chapter 4, with a focus on the basic module called Pixel Region (PR), made of 4 by 4 pixels. In Chapter 5, the results of the average and peak power consumption analysis under different corners and activity conditions are presented and discussed in order to assess the power impact of different aspects. Finally, Chapter 6 describes the results obtained with the proposed on-chip data clustering algorithms.


Chapter 1

Front-end Electronics for Pixel Detectors in Particle Physics

The development of pixel detectors in particle physics has been primarily driven by two specific requirements, which both have to be met simultaneously: the possibility to study short-lived particles and the capability to cope with the increasing interaction rates and energies (and therefore the number of particles produced per collision) of modern particle accelerators. Accelerators generate elementary particle collisions at rates of 10-100 MHz, with 10-100 particles emerging from every collision. Some particles live about 1 ps and then decay into a few daughter particles [1]. The topology of such a decay is sketched in Figure 1.1: the collision vertex where the particle is created is labelled "V", and the secondary vertex where the particle decays is labelled "D". Therefore, in particle tracking, pixel detectors must have high granularity to provide unambiguous track reconstruction and precise 3D measurements in the harsh environment close to the interaction point.
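As a rough, illustrative check of why such granularity is needed, the mean flight distance of a particle with lifetime τ is L = βγcτ. The Python sketch below assumes the 1 ps lifetime quoted above and a few arbitrary boost values βγ; with βγ = 1 the secondary vertex D lies only about 300 µm from V.

```python
# Mean decay length of a short-lived particle: L = beta*gamma * c * tau
C_M_PER_S = 299_792_458.0  # speed of light in vacuum [m/s]


def mean_decay_length_um(tau_s: float, beta_gamma: float = 1.0) -> float:
    """Return the mean flight distance before decay, in micrometres."""
    return beta_gamma * C_M_PER_S * tau_s * 1e6


if __name__ == "__main__":
    tau = 1e-12  # ~1 ps lifetime, as quoted in the text
    for bg in (1.0, 3.0, 10.0):  # illustrative boosts (assumed values)
        print(f"beta*gamma = {bg:4.1f}: L = {mean_decay_length_um(tau, bg):7.1f} um")
```

Even with a sizeable boost the flight distance stays in the millimetre range, which is why pixel-scale segmentation near the interaction point is required to separate V from D.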

Particle detection is based on the production of electron-hole pairs along the track of the charged particle; in semiconductor physics this corresponds to electrons being excited from the valence band to the conduction band. The number of electron-hole pairs generated can be estimated approximately as the ratio between the energy released by the particle and the mean energy needed to create one pair, which exceeds the band-gap energy Ebg of the semiconductor (for silicon, Ebg ≈ 1.12 eV). The main advantage of semiconductor detectors lies in the small ionization energy: in silicon it is 3.6 eV, compared with about 30 eV required to create an electron-ion pair in typical gas-filled detectors [2]. Therefore, for a given energy deposited in the detector, the number of charge carriers is approximately 10 times greater in the semiconductor case, which brings two benefits for the achievable energy resolution: a) lower statistical fluctuation in the number of carriers per pulse and b) a better signal-to-noise ratio (S/N) at low energies, where the resolution may be limited by electronic noise.
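The two benefits can be made concrete with a back-of-the-envelope sketch: the mean number of carriers is N = E/w, where w is the pair-creation energy (3.6 eV for silicon, about 30 eV for a gas, as quoted above), and the relative statistical fluctuation is estimated here as 1/√N. This simple Poisson estimate ignores the Fano factor, which in practice reduces the fluctuations further in semiconductors; the 10 keV deposit is an arbitrary example value.

```python
import math

# Mean energy needed to create one carrier pair (values quoted in the text)
W_SILICON_EV = 3.6  # electron-hole pair in silicon
W_GAS_EV = 30.0     # electron-ion pair in a typical gas (approximate)


def n_pairs(e_dep_ev: float, w_ev: float) -> float:
    """Mean number of carrier pairs produced by a deposited energy."""
    return e_dep_ev / w_ev


def relative_fluctuation(n: float) -> float:
    """Poisson estimate of sigma_N / N = 1 / sqrt(N); the Fano factor is ignored."""
    return 1.0 / math.sqrt(n)


if __name__ == "__main__":
    e_dep = 10e3  # 10 keV, arbitrary example deposit
    for name, w in (("silicon", W_SILICON_EV), ("gas", W_GAS_EV)):
        n = n_pairs(e_dep, w)
        print(f"{name:8s}: N = {n:7.1f}, sigma_N/N = {relative_fluctuation(n):.4f}")
```

The ratio of carrier numbers is simply 30/3.6 ≈ 8.3, independent of the deposited energy, and the relative fluctuation improves by its square root.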

Figure 1.1: Topology of a short lived particle decay together with ordinary particles emerging from the same collision

Different approaches exist for pixel detectors, depending on how the sensitive part (i.e. the sensor) is linked to the readout Application Specific Integrated Circuit (ASIC). In Monolithic Active Pixel Sensors (MAPS), whose basic block is shown in Figure 1.2, both the electronics and the sensor are integrated in the same substrate. Mostly used for the detection of visible photons in CMOS technology, they feature a reverse-biased photodiode as sensing element; therefore they are often called CMOS image sensors. Merging sensor and readout electronics into a single detection device, with the charge generation volume integrated into the ASIC itself, aims at lower cost, higher resolution and lower mass, but also has drawbacks, starting from the fact that the low-resistivity silicon substrate used for electronics chips is in most cases not ideal for a silicon detector, for which high resistivity would be the optimal choice.

The connection to the detecting volume in the substrate is also critical for obtaining sufficient charge collection efficiency and speed. For these reasons, the state-of-the-art baseline for running detectors is the Hybrid Pixel Detector (HPD), described in Section 1.1. This technique is employed in all the Large Hadron Collider (LHC) experiments at the European Organization for Nuclear Research (CERN), namely A Large Ion Collider Experiment (ALICE), A Toroidal LHC ApparatuS (ATLAS), Compact Muon Solenoid (CMS) and Large Hadron Collider beauty (LHCb), as well as in some fixed-target experiments.

Figure 1.2: Traditional MAPS building block [3]

Concerning sensor materials, it should be underlined that the most commonly used is silicon. Nevertheless, there are other options, such as diamond, for which the conduction band is almost empty at room temperature (no depletion zone is required) while the band gap is still small enough for ionisation to create a sufficient number of electron-hole pairs. For these reasons, with its energy band gap of about 5.5 eV, diamond is an optimal material to meet the two contradictory requirements of large signal and low noise; however, even artificial diamonds, such as Chemical Vapor Deposition (CVD) diamonds, are too expensive for large-area detectors.

1.1 Hybrid pixel detectors

The intensive growth of HPDs was initiated, and is still driven, by the development of the LHC detectors, where very fast and radiation-hard devices are required. The basic building block of such a pixel detector is sketched in Figure 1.3: a thin, segmented silicon sensor is connected to its own electronics. This arrangement is called a hybrid detector because the pixel Read-Out Chips (ROC) and the sensors are fabricated separately and then mated with the bump-bonding technique.

Planar integration technology allows several thousands of these building blocks to be put together in a matrix covering a few square centimeters; several such matrices can then be combined to cover larger surfaces. The ionizing particle crosses the sensor and generates charge which, drifting in the depletion region under the action of an electric field, produces signals. These are amplified, and the hit pixels are identified and stored by the electronics.

Figure 1.3: Basic building block of a HPD [3]

The two-dimensional, high-density connectivity is the key characteristic of hybrid pixel detectors and has three main consequences, illustrated in Figure 1.4:

• the connection between the sensor and the electronics chip must be vertical;

• the size of the pixel usually matches the size of the front-end electronics channel;

• the electronics chip must be very close (10 µm to 20 µm) to the sensor.

Figure 1.4: Sketch of a "blow-up" hybrid pixel detector

A sufficiently high bias voltage (~100 V) must be applied to the backside plane of the sensor to deplete it, while all the pixels are grounded.
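The order of magnitude of this bias voltage can be checked with the standard planar-diode estimate V_dep = q|N_eff|d²/(2ε0εr). In the sketch below the effective doping N_eff = 10¹² cm⁻³ and the sensor thicknesses are assumed example values, not parameters taken from this work; the quadratic dependence on the thickness d is the point to note.

```python
# Full-depletion voltage of a planar sensor with uniform effective doping:
#   V_dep = q * |N_eff| * d**2 / (2 * eps0 * eps_r)
Q_E = 1.602176634e-19    # elementary charge [C]
EPS0 = 8.8541878128e-12  # vacuum permittivity [F/m]
EPS_R_SI = 11.7          # relative permittivity of silicon


def depletion_voltage(n_eff_per_cm3: float, thickness_um: float) -> float:
    """Voltage [V] needed to fully deplete a sensor of the given thickness."""
    n_eff_per_m3 = abs(n_eff_per_cm3) * 1e6  # cm^-3 -> m^-3
    d_m = thickness_um * 1e-6                # um -> m
    return Q_E * n_eff_per_m3 * d_m ** 2 / (2.0 * EPS0 * EPS_R_SI)


if __name__ == "__main__":
    n_eff = 1e12  # assumed doping of a high-resistivity bulk [cm^-3]
    for d in (100, 200, 300):  # example thicknesses [um]
        print(f"d = {d} um: V_dep ~ {depletion_voltage(n_eff, d):.1f} V")
```

With these assumptions a 300 µm sensor needs roughly 70 V, consistent with operating at around 100 V to ensure full depletion.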


1.2 The Front End Electronics

One of the biggest challenges in the implementation of a pixel detector is the design of a dedicated readout chip with several thousand electronic channels. These pixel chips (PC) contain a regular arrangement of pixel unit cells (commonly denoted as PUC) that must be as small as possible, because their area dictates the sensor pixel size and therefore the spatial resolution. As shown in Figure 1.5, the chip can be divided into an active area, which contains a repetitive matrix of nearly identical rectangular (or square) PUCs, and the chip periphery, which controls the active part.
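The link between PUC size and spatial resolution can be made quantitative in the simplest case: with a binary (hit or no hit) readout and particles crossing the pixel at uniformly distributed positions, the RMS position error is p/√12, where p is the pixel pitch. The sketch below evaluates this textbook relation for 50 µm and 25 µm as example pitches; charge sharing between neighbouring pixels, not modelled here, can improve on this figure.

```python
import math


def binary_resolution_um(pitch_um: float) -> float:
    """RMS position error of a binary (hit / no-hit) readout with uniform
    illumination across the pixel: pitch / sqrt(12)."""
    return pitch_um / math.sqrt(12.0)


if __name__ == "__main__":
    for pitch in (50.0, 25.0):  # example pitches [um]
        print(f"pitch {pitch:4.0f} um -> sigma ~ {binary_resolution_um(pitch):.1f} um")
```

Halving the pitch halves the resolution figure, but quadruples the number of PUCs per unit area that the chip must fit.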

Figure 1.5: Geometry of a generic PC [1]

Furthermore, PUCs are grouped in columns because the area of the PUC is mainly occupied by bus signals: power, bias and control signals and the output data flow are routed vertically, and only very few signals run horizontally. The chip periphery is the bottom part of the chip and is usually organized with repetitive circuits that interface the columns. It contains buffer memory to send out data, the global control, which is responsible for communication outside the chip, the bias section and the wire-bond pads.


1.2.1 Generic Pixel Unit Cell

Several circuit components are nearly always present in the elementary pixel unit cells in the active area of the chip; the common circuit blocks are shown in Figure 1.6. The first one is the bump pad, a very sensitive node, connected to the input of an inverting amplifier that converts the input charge into a voltage by means of a feedback capacitor. A feedback circuit is also required to define the DC operating point of the charge-sensitive preamplifier and to remove the signal charge from the input node, so that the preamplifier output voltage returns to its initial value.
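For an ideal charge-sensitive amplifier (infinite open-loop gain), the output step is Q/C_f, independent of the detector capacitance. The sketch below evaluates this relation; the 6000-electron signal and the 10 fF feedback capacitor are assumed example values, and the sign inversion of the amplifier is ignored.

```python
E_CHARGE = 1.602176634e-19  # elementary charge [C]


def csa_output_mv(n_electrons: float, c_feedback_ff: float) -> float:
    """Ideal charge-sensitive amplifier: |V_out| = Q / C_f, returned in mV."""
    q_coulomb = n_electrons * E_CHARGE
    c_farad = c_feedback_ff * 1e-15
    return q_coulomb / c_farad * 1e3


if __name__ == "__main__":
    # Assumed example: 6000 electrons collected, 10 fF feedback capacitor
    print(f"{csa_output_mv(6000, 10.0):.1f} mV")
```

A smaller feedback capacitor gives a larger voltage per unit charge, which is one of the knobs available when trading gain against dynamic range in the PUC.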

Figure 1.6: Circuit blocks of a generic PUC [1]

A band-pass filter (shaper) is often included to limit the bandwidth of the preamplifier output signal, reducing high- and low-frequency noise contributions. To detect hits with a sufficiently large input charge, a discriminator compares the shaper output with a threshold value, which is distributed globally to all pixels and set as low as possible in order to maximize the detection efficiency. To produce a digital output to be read out by the digital front-end, different approaches can be evaluated, e.g. the use of Analog-to-Digital Converters (ADC, individual or shared among different PUCs) or the measurement of the Time Over Threshold (TOT). The key concept behind each of them can be observed in Figure 1.7. While the former is a quite generic way of achieving analog-to-digital conversion, the latter is particularly used in HEP applications: the TOT is the time during which the signal stays above the discriminator threshold, and it can be measured using a clock signal and a digital counter.

The accuracy with which the arrival time of a hit is determined is important in particle physics experiments at the LHC, where hits must be associated with one particular bunch crossing with a precision better than 25 ns; for this reason the response time of the discriminator is crucial.
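The TOT measurement described above can be sketched as sampling the shaper output at every clock edge and counting the edges at which it is above threshold. The pulse model here is hypothetical (linear rise, then linear discharge at a constant rate, as with a constant-current feedback), and all amplitudes, times and the threshold are illustrative; the 25 ns sampling period matches the bunch-crossing interval quoted in the text.

```python
def shaper_output_mv(charge_ke, t_ns, rise_ns=20.0, gain_mv_per_ke=25.0,
                     fall_mv_per_ns=1.0):
    """Hypothetical shaper pulse: linear rise to a peak proportional to the
    charge, then linear discharge at a constant rate."""
    peak = gain_mv_per_ke * charge_ke
    if t_ns < rise_ns:
        return peak * t_ns / rise_ns
    return max(0.0, peak - fall_mv_per_ns * (t_ns - rise_ns))


def tot_counts(charge_ke, threshold_mv=30.0, clock_period_ns=25.0,
               max_cycles=64):
    """Count the clock edges at which the pulse is above threshold
    (first contiguous run only), i.e. a digital TOT measurement."""
    count = 0
    for k in range(max_cycles):
        if shaper_output_mv(charge_ke, k * clock_period_ns) > threshold_mv:
            count += 1
        elif count > 0:
            break  # signal back below threshold: stop the counter
    return count


if __name__ == "__main__":
    for q in (1, 2, 4, 8):  # input charge in thousands of electrons
        print(f"Q = {q} ke -> TOT = {tot_counts(q)} clock cycles")
```

With a constant discharge rate the TOT grows roughly linearly with the charge, which is why a simple counter is enough to encode the pulse amplitude; the coarse clock also shows the quantization that limits the charge resolution.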

Figure 1.7: Comparison between charge digitization methods: ADC and TOT approaches [4]

The "time walk" problem can be explained by considering the curve in Figure 1.8: particles that deposit a high charge produce a faster response than those that deposit a lower charge, with amplitude just above the threshold QThr. Considering a time window ∆T < 25 ns wide (some jitter must be allowed for other system components), ∆Q is the range of charge detectable "in time": hits with charge close to the threshold may instead be associated with the following collision.
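The time-walk trade-off can be explored with a simple model. The sketch below assumes a hypothetical discriminator delay t(Q) = t_min + k/(Q - Q_thr), which diverges as the charge approaches the threshold, and computes the smallest charge whose response still fits in the window ∆T; hits between Q_thr and this value risk being assigned to the following collision. The parameters t_min, k and Q_thr are invented for illustration and are not read from Figure 1.8.

```python
import math


def response_time_ns(q_e, q_thr_e=1000.0, t_min_ns=5.0, k_e_ns=20000.0):
    """Hypothetical discriminator delay that diverges as Q approaches Q_thr."""
    if q_e <= q_thr_e:
        return math.inf  # below threshold: the hit is never detected
    return t_min_ns + k_e_ns / (q_e - q_thr_e)


def in_time_threshold_e(window_ns=25.0, q_thr_e=1000.0, t_min_ns=5.0,
                        k_e_ns=20000.0):
    """Smallest charge whose response time fits inside the time window."""
    if window_ns <= t_min_ns:
        raise ValueError("window shorter than the fastest response")
    return q_thr_e + k_e_ns / (window_ns - t_min_ns)


if __name__ == "__main__":
    q_it = in_time_threshold_e()
    print(f"in-time threshold ~ {q_it:.0f} e (threshold 1000 e)")
    for q in (1200.0, 2000.0, 5000.0):
        print(f"Q = {q:.0f} e -> delay {response_time_ns(q):.1f} ns")
```

In this model, shrinking the usable window or slowing the discriminator (larger k) raises the in-time threshold, so more low-charge hits fall out of their bunch crossing.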

Figure 1.8: Response time of a discriminator as function of the input charge [1]

The maximum possible hit rate is limited by the time required to process one hit, called "dead time", which is mainly dominated by the analog section but may also be determined by the time the following readout circuitry needs to process the hit.
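A quick estimate links dead time and hit loss. Assuming a uniform particle flux, a 50 µm × 50 µm pixel sees 3 GHz/cm² × (50 µm)² = 75 kHz at the hit rate quoted in the Introduction; with a non-paralyzable dead time τ the lost fraction is rτ/(1 + rτ). The 400 ns dead time used in the example is an assumed value, not a figure from this work.

```python
def pixel_hit_rate_hz(flux_hz_per_cm2: float, pitch_x_um: float,
                      pitch_y_um: float) -> float:
    """Mean hit rate of one pixel under a uniform particle flux."""
    area_cm2 = (pitch_x_um * 1e-4) * (pitch_y_um * 1e-4)  # um -> cm
    return flux_hz_per_cm2 * area_cm2


def dead_time_loss(rate_hz: float, dead_time_s: float) -> float:
    """Fraction of hits lost with a non-paralyzable dead time."""
    x = rate_hz * dead_time_s
    return x / (1.0 + x)


if __name__ == "__main__":
    r = pixel_hit_rate_hz(3e9, 50.0, 50.0)  # 3 GHz/cm^2, 50 um x 50 um pixel
    tau = 400e-9                             # assumed dead time [s]
    print(f"pixel rate = {r / 1e3:.0f} kHz, lost fraction = {dead_time_loss(r, tau):.2%}")
```

Because the loss grows with the product rτ, shrinking the pixel (lower per-pixel rate) and shortening the dead time are two equivalent levers.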


1.2.2 Readout Architectures

The readout of the digital hit signals from the discriminators depends on the target application. In particular, for particle physics applications, the position, time and possibly the corresponding pulse amplitude of all hits belonging to an interaction must be provided. This requires a timing precision at least as fine as the bunch crossing interval (25 ns) for the detectors at the LHC. The readout of the data produced by each collision can follow two different approaches:

• triggerless: the readout of every single event starts immediately after the interaction;

• triggered: a trigger system selects only a fraction of the events for readout, in order to reduce the data volume sent to the data acquisition.

The trigger signal from other detector components (e.g. calorimeters or muon chambers) is often available only after a fixed time interval called latency. Generating the trigger signal typically requires a few microseconds, which corresponds to ~100 bunch crossings at the LHC, where the bunch crossing rate is 40 MHz. Some storage logic is therefore required to buffer the hits until the trigger signal arrives. The chip must also be able to accept new triggers while it is still sending data out, because several nearly consecutive readout requests may occur before the data of a previous request has been completely sent out.
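The relation between latency and buffering can be checked with simple arithmetic. The sketch below uses the 25 ns LHC bunch-crossing period; the 25 µs latency and the 2 MHz per-unit hit rate are hypothetical values chosen only to show the scaling.

```python
import math

BX_PERIOD_NS = 25.0  # LHC bunch-crossing period (40 MHz)


def latency_in_bx(latency_us):
    """Number of bunch crossings that elapse during the trigger latency,
    i.e. how many crossings' worth of hits must be kept in the buffers."""
    return math.ceil(latency_us * 1000.0 / BX_PERIOD_NS)


def expected_hits_to_buffer(hit_rate_hz, latency_us):
    """Average number of hits a readout unit must hold while waiting for the trigger."""
    return hit_rate_hz * latency_us * 1e-6


print(latency_in_bx(2.5))    # 100 bunch crossings
print(latency_in_bx(25.0))   # a hypothetical 10x longer latency: 1000 crossings
print(expected_hits_to_buffer(2e6, 2.5))  # hypothetical 2 MHz hit rate per unit
```

Both quantities grow linearly with the latency, which is why longer trigger latencies translate directly into larger on-chip buffers.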

Several readout architectures that have been implemented to solve the problem of hit buffering during the trigger latency can be found in [1]. The "time stamp" approach, for example, is based on recording the arrival time of each hit, the so-called time stamp. When the trigger signal selects a certain collision for readout, the time stamps of all accumulated hits are compared with the triggered time stamp, and only the hits with the matching arrival time are read out.
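The time-stamp approach can be sketched as a buffer of tagged hits. This is a behavioural illustration only, not the architecture of any specific chip: it assumes a fixed latency expressed in bunch crossings, hits stored in arrival order, and a hypothetical `Hit` record.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Hit:
    pixel: int      # pixel address (hypothetical encoding)
    timestamp: int  # bunch-crossing counter value when the hit arrived
    tot: int        # time over threshold, in clock cycles


class TimeStampBuffer:
    """Hits wait here, tagged with their time stamp, until a trigger arrives.

    Hits are assumed to be stored in arrival order (non-decreasing time stamp)."""

    def __init__(self, latency_bx):
        self.latency_bx = latency_bx
        self.hits = deque()

    def store(self, hit):
        self.hits.append(hit)

    def _expire(self, now_bx):
        # Hits older than the trigger latency can never be selected: drop them.
        while self.hits and now_bx - self.hits[0].timestamp > self.latency_bx:
            self.hits.popleft()

    def trigger(self, now_bx):
        """A trigger received at `now_bx` selects the crossing now_bx - latency_bx;
        hits whose time stamp matches are read out and removed from the buffer."""
        self._expire(now_bx)
        target = now_bx - self.latency_bx
        selected = [h for h in self.hits if h.timestamp == target]
        self.hits = deque(h for h in self.hits if h.timestamp != target)
        return selected


buf = TimeStampBuffer(latency_bx=100)
for pixel, ts in [(1, 10), (2, 10), (3, 11)]:
    buf.store(Hit(pixel, ts, tot=4))
print([h.pixel for h in buf.trigger(110)])  # [1, 2]
```

Only the hits from crossing 10 are read out; the hit from crossing 11 stays buffered until it is either triggered or expires.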

1.3 The RD53 collaboration

The LHC community tends to refer to pixel ROCs in terms of generations. The present ATLAS and CMS pixel detectors are equipped with the so-called 1st and 2nd generation chips. The 2nd generation chips became operative with the Phase 0 (ATLAS) and Phase 1 (CMS) upgrades, installed during the first Long Shutdown of the LHC (LS1, 2013-2014) and during the extended technical stop of winter 2016/2017, respectively.

Figure 1.9: LHC/HL-LHC plan [5]

The 3rd generation chips will become operative with the Phase 2 upgrade after the third Long Shutdown (LS3). The development of this new generation of pixel ROCs, driven by unprecedented requirements that are similar for the ATLAS and CMS experiments, is the task of the R&D collaboration named RD53. The three long shutdowns of the LHC, reported in Figure 1.9, have been planned in view of the luminosity upgrade: the goal of the High Luminosity LHC (HL-LHC) project is to reach 3000 fb$^{-1}$ of integrated luminosity in ~10-12 years after the last upgrade [5].

Table 1.1: Pixel chip generations [6]


As shown in Table 1.1, the 3rd generation ASICs will have to support high hit rates (2-3 GHz/cm²) and extremely hostile radiation conditions of 1 Grad. Comparing the third generation parameters with the previous ones, we can observe the increase of the trigger latency (by a factor of ~10), the use of the 65 nm technology node, and three basic pixel sizes, 25 × 100 µm², 50 × 50 µm² and 100 × 100 µm², to be used in different parts of the detector (e.g. elongated pixels at the ends of the barrel and square pixels in the middle of the barrel).


Chapter 2 Low Power Design

Power dissipation is a very critical parameter that has to be taken into account during the design of very large scale integration (VLSI) circuits. While in the early years of circuit design the main concerns were performance and die area, submicrometer and nanometer technologies have made power consumption a primary concern. Some of the related problems are the heating of high-performance systems, which requires an adequate cooling system; battery lifetime in portable devices or in Wireless Sensor Networks (WSN), where one of the most critical issues is the limited energy available at the sensor nodes; and the high power budget (e.g. the cost of running a data center is increasingly dominated by the monthly power bill rather than by the cost of hardware or maintenance). In this chapter we focus our attention on CMOS devices, this technology being the most widely adopted in current VLSI systems.

2.1 CMOS Power Consumption

Power consumption in CMOS circuits is a function of switching activity, capacitance, voltage, and the transistor structure itself. It can be divided into two categories, called dynamic (proportional to activity) and static (independent of activity) components. A careful balancing between the two is one of the subtleties of advanced low-power design.


In general, the total consumption in a CMOS circuit can be expressed as:

$$P = P_{dyn} + P_{leak} \qquad (2.1)$$

2.1.1 Dynamic Power

The charging and discharging of capacitances is the main source of dynamic power dissipation; these operations are indeed at the core of MOS digital circuit operation. Other contributions come from parasitic effects such as short-circuit currents and dynamic hazards, or glitches (fast, usually unwanted spikes).

(Dis)charging capacitors

For simplicity, we consider a CMOS inverter, where the PMOS and NMOS transistors form the resistive charge and discharge networks and the total capacitance of the network is lumped into the output capacitance $C_L$ of the gate, as shown in Figure 2.1. As the input switches from high to low, the NMOS pull-down network is cut off and the PMOS pull-up network is activated, charging the load capacitance $C_L$ up to $V_{dd}$. This charging process draws from the supply an amount of energy equal to $C_L V_{dd}^2$. Half of this energy is stored on the capacitor and the other half is dissipated as heat in the resistance of the charging network.

Then, when the input returns to Vdd the process is reversed and the capacitance

is discharged, its energy being dissipated in the NMOS network.

Figure 2.1: Dynamic charging in a common CMOS inverter

To convert the derived energy per operation into power, it must be multiplied by the frequency $f_{0\to1}$ of the "power-consuming" transitions of the output node, i.e. the switching probability $p_{0\to1}$ of the output multiplied by the clock frequency $f$ of the circuit. This probability is the switching activity factor $\alpha$, a function of the circuit topology and of the activity of the input signals; how well it is known strongly affects the accuracy of the power estimation. The average capacitive power is expressed as:

$$P_{dyn} = C_L V_{dd}^2 \alpha f, \qquad 0 \le \alpha \le 1 \qquad (2.2)$$

Here $\alpha C_L$ is called the effective capacitance of the module, and equals the average amount of capacitance that is charged in the module every clock cycle.
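Eq. 2.2 can be evaluated directly. The sketch below is a plain calculator for the formula; the capacitance, supply, activity and frequency values are hypothetical and are not taken from any chip discussed in this work.

```python
def dynamic_power(c_load, vdd, alpha, f_clk):
    """Average capacitive switching power of Eq. 2.2: P = alpha * C_L * Vdd^2 * f.

    c_load in F, vdd in V, f_clk in Hz; returns W."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("switching activity alpha must lie in [0, 1]")
    return alpha * c_load * vdd ** 2 * f_clk


def energy_per_charge(c_load, vdd):
    """Energy drawn from the supply for one 0->1 transition (C_L * Vdd^2);
    half is stored on C_L, half is dissipated in the pull-up network."""
    return c_load * vdd ** 2


# Hypothetical block: 10 pF total switched capacitance, 1.2 V, alpha = 0.1, 160 MHz
p = dynamic_power(c_load=10e-12, vdd=1.2, alpha=0.1, f_clk=160e6)
print(f"{p * 1e3:.3f} mW")  # 0.230 mW
```

The quadratic dependence on $V_{dd}$ is why supply reduction is so effective, while $\alpha$ is the term that clock gating and operand isolation act upon.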

Glitch

An additional source of dynamic power dissipation (i.e. proportional to the clock frequency) is the occurrence of events known as glitches. They occur in CMOS circuits when the differential delay at the inputs of a gate is greater than its inertial delay, which results in increased gate switching and hence a notable amount of power consumption. Such a mismatch in signal timing is typically the result of different path lengths from the primary inputs of the network.

Consider Figure 2.2: in the circuit we can see the unbalanced arrival times of the inputs caused by the inverter in the lower input path of the NAND gate, so that the differential delay at the NAND gate inputs is 2 units. This differential delay makes the NAND gate switch more often than its logic function requires, producing spurious transitions at the output which consume dynamic power, called glitch power.
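The mechanism can be reproduced with a unit-delay simulation. As a hypothetical stand-in for the circuit of Figure 2.2 (which is not reproduced exactly), the sketch models y = NAND(a, NOT a), whose correct output is constantly 1, with the inverter path 2 time units slower than the direct path, and counts output transitions on a discrete time grid.

```python
def delay(signal, d, init):
    """Delay a sampled signal by d unit steps, filling the start with `init`."""
    return [init] * d + signal[:len(signal) - d]


def nand(a, b):
    return [int(not (x and y)) for x, y in zip(a, b)]


def transitions(signal):
    """Number of output toggles: each one costs dynamic energy."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev != cur)


a = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
b_ideal = [1 - x for x in a]          # zero-delay inverter
b_real = delay(b_ideal, 2, init=1)    # inverter path 2 units slower
print(transitions(nand(a, b_ideal)))  # 0: the logic function is constant 1
print(transitions(nand(a, b_real)))   # 2: a spurious 0-pulse (glitch)
```

When `a` rises, the delayed `NOT a` is still high for two units, so the NAND output briefly drops to 0 and returns to 1: two transitions that the logic function does not require.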

Figure 2.2: Basic circuit showing the formation of glitches


Glitch power becomes more prominent in smaller technology nodes and can be a dominant component in long chains of gates. The solution is to balance the data paths, for example by inserting buffers (delay elements) on the faster inputs of the logic gate.

Short-circuit currents

The other parasitic effect mentioned above is the short-circuit current. The mechanism of short-circuit power dissipation is depicted in Figure 2.3 for a CMOS inverter. During an input transition there is a time period in which both the NMOS and the PMOS conduct, causing a short-circuit current to flow from supply to ground. This current flows within the time window in which the input voltage is higher than the NMOS threshold voltage $V_{Tn}$ (keeping the NMOS on) and lower than $V_{dd} - |V_{Tp}|$ (keeping the PMOS on).

Figure 2.3: Short-circuit current in CMOS inverter [7]

The power associated with these short-circuit, or crowbar, currents depends on the switching activity, similarly to the capacitive power dissipation $P_{dyn}$, and also on the ratio between the slopes of the input and output signals. The latter relationship is illustrated in Figure 2.4, where the best case (left side) and the worst case (right side) are considered.

In the best case the load capacitance is very large, and the output fall time is significantly larger than the input rise time because of the slower discharging of the output node. The input moves through the transition region, where PMOS and NMOS are simultaneously on, before the output starts to change. Hence the $V_{DS}$ of the PMOS is approximately zero, and the PMOS shuts off without delivering any significant current. In the worst case, with a small capacitance, the output fall time is smaller than the input rise time, and the $V_{DS}$ of the PMOS is close to $V_{dd}$, causing a maximal short-circuit current $I_{SC}$.

Figure 2.4: Short-circuit current as a function of the input/output signals' slopes [7]

The short-circuit power can be modeled through an equivalent capacitance $C_{SC}$, a function of the input ($\tau_{in}$) and output ($\tau_{out}$) transition times:

$$C_{SC} = k\left(a\,\frac{\tau_{in}}{\tau_{out}} + b\right) \qquad (2.3)$$

where:

a, b = technology parameters

k = function of supply and threshold voltages, and transistor sizes

Therefore the short-circuit power will be expressed by:

$$P_{SC} = C_{SC} V_{dd}^2 f \qquad (2.4)$$

This analysis may lead to the erroneous conclusion that the short-circuit dissipation is minimized by making the output rise/fall time substantially larger than the input rise/fall time; however, slow output edges become the slow inputs of the following gates and increase their short-circuit dissipation. A more practical rule, which optimizes the power consumption globally, is to match the rise/fall times of the input and output signals. This limits the short-circuit power $P_{SC}$ to 10-15% of the dynamic dissipation $P_{dyn}$ [8].
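The short-circuit model can be evaluated as a function of the slope ratio. The sketch assumes Eq. 2.3 reads $C_{SC} = k(a\,\tau_{in}/\tau_{out} + b)$, and the values of `a`, `b` and `k` are hypothetical placeholders, since real values are technology parameters not given here.

```python
def short_circuit_capacitance(tau_in, tau_out, a, b, k):
    """Equivalent short-circuit capacitance, read as C_SC = k * (a * tau_in / tau_out + b)."""
    return k * (a * tau_in / tau_out + b)


def short_circuit_power(tau_in, tau_out, vdd, f, a, b, k):
    """P_SC = C_SC * Vdd^2 * f (Eq. 2.4)."""
    return short_circuit_capacitance(tau_in, tau_out, a, b, k) * vdd ** 2 * f


# Hypothetical technology parameters, chosen only to show the trend
A, B, K = 1.0, 0.05, 1e-15  # K in farads
for ratio in (0.5, 1.0, 2.0):  # tau_in / tau_out
    c_sc = short_circuit_capacitance(ratio, 1.0, A, B, K)
    print(f"tau_in/tau_out = {ratio}: C_SC = {c_sc * 1e15:.2f} fF")
```

The equivalent capacitance grows with $\tau_{in}/\tau_{out}$, i.e. when a slow input drives a fast output, which is the worst case of Figure 2.4.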

2.1.2 Static Power

Although dynamic power has traditionally dominated the power budget, static power, which essentially consists of the power used when the transistor is not switching, has become an increasing concern when scaling below 100 nm. With the scaling of devices, threshold voltages and gate oxide thicknesses decrease, and the probability of tunneling increases, resulting in ever larger leakage currents. As shown in Figure 2.5, leakage can be divided into sub-threshold leakage, junction leakage and gate leakage.

Figure 2.5: Leakage currents in MOS transistor

Sub-threshold leakage

Sub-threshold current between source and drain in an MOS transistor occurs when

gate voltage is below VTh. In fact, the current does not drop abruptly to 0 at VGS =

VTh but the MOS transistor is already partially conducting. This effect is called sub-

threshold or weak inversion conduction.

With transistor scaling, the threshold voltage decreases, forced by the lowering of the supply voltage, and the drain-source leakage increases. Figure 2.6 shows this dependence: the leakage current at $V_{GS} = 0$ goes up exponentially with a linear reduction in threshold voltage.

Figure 2.6: Impact of reduced threshold voltage on leakage [7]

An additional factor that affects the off-current is the drain-induced barrier lowering (DIBL) effect: increasing the drain voltage decreases the potential barrier in the channel, lowering the threshold voltage. This reduction is approximately linear with $V_{DS}$, as shown by the expression $V_{Th} = V_{Th0} - \lambda_d V_{DS}$. Therefore the sub-threshold current depends exponentially on both $V_{Th0}$ and $V_{DS}$:

$$I_{leak} \propto 10^{\frac{-V_{Th0} + \lambda_d V_{DS}}{S}} \qquad (2.5)$$

where $S$ is the sub-threshold slope, which measures how much $V_{GS}$ has to be reduced for the drain current to drop by a factor of 10. It is the inverse of the slope of the (logarithmic) current-voltage characteristic of the MOSFET in the sub-threshold region; its ideal value at room temperature is about 60 mV/decade, and real devices show somewhat larger values.
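Eq. 2.5 translates a threshold reduction into a leakage multiplication factor of $10^{\Delta V_{Th0}/S}$. The sketch below computes that factor, optionally including the DIBL term; the 100 mV threshold reduction, the 100 mV/decade slope and the DIBL coefficient are hypothetical values.

```python
def leakage_increase(delta_vth0_mv, s_mv_per_dec, lambda_d=0.0, delta_vds_mv=0.0):
    """Factor by which I_leak grows according to Eq. 2.5 when V_Th0 is lowered
    by delta_vth0_mv and V_DS is raised by delta_vds_mv (DIBL term)."""
    return 10 ** ((delta_vth0_mv + lambda_d * delta_vds_mv) / s_mv_per_dec)


print(leakage_increase(100, 60))    # ideal 60 mV/decade slope: about 46x
print(leakage_increase(100, 100))   # a less steep 100 mV/decade slope: 10x
print(leakage_increase(0, 100, lambda_d=0.1, delta_vds_mv=200))  # hypothetical DIBL
```

The smaller $S$ is, the more leakage a given threshold reduction costs, which is why threshold scaling cannot follow supply scaling indefinitely.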

Furthermore, the dependence of sub-threshold leakage on temperature is not negligible: an increase in temperature reduces the threshold voltage and, since the leakage depends exponentially on the threshold voltage (Eq. 2.5), the sub-threshold leakage is exponentially dependent on temperature.

Gate leakage

Another leakage effect that is becoming significant in the sub-100 nm era is gate leakage. One of the attractive properties of the MOS transistor has always been its very high (though not infinite) input resistance; however, scaling leads to gate-oxide thicknesses of just a few molecular layers, which reduces the gate resistance of the transistor as current starts to leak through the dielectric.

The gate oxide is scaled to increase the process transconductance parameter $k' = \mu C_g$ and consequently to maintain the current drive of short-channel transistors, in which carrier velocity saturation reduces the effective mobility. As shown in Figure 2.7, a way to increase $k'$ while keeping the gate leakage under control is to replace SiO$_2$ with materials with a higher permittivity, the so-called high-k dielectrics, which increase $C_g$ without requiring a thinner physical layer.

Figure 2.7: Evolution of the gate thickness and the gate leakage over various technology nodes [7]

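The effectiveness of such dielectrics is measured by the equivalent oxide thickness (EOT), which equals $T_g(\varepsilon_{ox}/\varepsilon_g)$, where $T_g$ is the physical thickness and $\varepsilon_g$ the permittivity of the gate dielectric; the advantages are a reduction of gate leakage and faster transistors.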
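The EOT formula shows why a physically thicker high-k layer can behave electrically like a much thinner oxide. The sketch uses the standard relative permittivity of SiO$_2$ (3.9); the 3 nm thickness and the relative permittivity of 20 are hypothetical values for illustration.

```python
EPS_R_SIO2 = 3.9  # relative permittivity of SiO2


def equivalent_oxide_thickness(t_phys_nm, eps_r_dielectric):
    """EOT = T_g * (eps_ox / eps_g): the SiO2 thickness giving the same
    gate capacitance per unit area as the given dielectric layer."""
    return t_phys_nm * EPS_R_SIO2 / eps_r_dielectric


# Hypothetical high-k layer: 3 nm thick, relative permittivity 20
print(f"{equivalent_oxide_thickness(3.0, 20.0):.3f} nm")  # 0.585 nm
```

The layer offers the gate capacitance of a sub-nanometer oxide while keeping a 3 nm physical barrier against tunneling.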

Junction leakage

Another leakage contribution is junction leakage, which is substantially smaller than the previously mentioned ones. These currents are due to the diffusion of minority carriers through the reverse-biased junctions formed between the substrate and the source and drain regions.


2.2 Low power Techniques

From a chip-engineering perspective, effective energy management for a SoC must be built into the design starting at the architecture stage, and low-power techniques need to be employed at every stage of the design, from RTL to GDSII (the database file format that is the de facto industry standard for the exchange of integrated circuit layout data). For this reason it is important to plan, in the first stages of the design, a low-power design methodology capable of covering the following issues [9]:

• power characterization and modeling (each cell must have a power model in

addition to the usual functional and timing models);

• power analysis method (when and how often to analyze, which modes of the

chip to check, how to use the obtained data for optimization);

• power reduction efforts (which the power targets are and what priority they have, also with respect to other design parameters like die size, performance, design time, etc.);

• power integrity (analysis of limits due to the power delivery network like elec-

tromigration, instantaneous peaks, IR drops).

2.2.1 At functional block level

Clock Gating

Clock gating is the most popular technique to reduce dynamic power dissipation in synchronous circuits, where the clock net is responsible for a significant part of the power dissipation (up to 40% [10]). It reduces power by disabling the switching of the clock tree in the parts of the circuit that are inactive at a particular time.

As shown in Figure 2.8, a clock gating element can be built as a simple AND gate: one input is the clock, while the second input is an enable signal that controls the output. The gated clock toggles only when the enable signal is true, and is held steady when the enable signal is false.
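The effect of clock gating on clock switching can be simulated cycle by cycle. The AND gate of Figure 2.8 is the textbook form; practical clock-gating cells usually latch the enable while the clock is low, so that a change of enable during the high phase cannot truncate or create a clock pulse. The sketch assumes that latch-based variant and uses the number of gated rising edges as a proxy for clock-pin switching power; the enable pattern is hypothetical.

```python
def gated_clock(clk, enable):
    """Latch-based clock gate: enable is captured while clk is low and held
    while clk is high, then ANDed with the clock."""
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:          # latch transparent during the low phase
            latched = en
        out.append(c & latched)
    return out


def rising_edges(signal):
    return sum(1 for p, c in zip(signal, signal[1:]) if p == 0 and c == 1)


clk = [0, 1] * 8               # 8 clock cycles, two samples per cycle
enable = [1] * 6 + [0] * 10    # block active only for the first 3 cycles
gclk = gated_clock(clk, enable)
print(rising_edges(clk), rising_edges(gclk))  # 8 3
```

Only 3 of the 8 clock edges reach the gated registers, so their clock-pin switching power drops in the same proportion while the block is idle.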


Figure 2.8: Clock gating implementation

Two different flavors of clock gating are commonly used: local clock gating, which gates individual registers or banks of registers, and global clock gating, which gates all the registers within a block of logic (Figure 2.9).

Figure 2.9: Local and Global clock gating [7]

Modern electronic design automation (EDA) tools support the automatic insertion of clock gating deep into the design hierarchy.

Operand isolation

Designs which do not fully utilize their arithmetic datapath components typically

exhibit a significant overhead in power consumption. Whenever a module performs

an operation whose result is not used in the downstream circuit, it unnecessarily

consumes power. Hence the idea of operand isolation is to identify such redundant operations and minimize their power overhead by selectively blocking the propagation of switching activity through the circuit.

A basic application of the operand isolation technique is shown in Figure 2.10. If the output C of the adder a0 is not stored in registers r0 and r1 because of a certain configuration of the multiplexer and register enable signals, a0 still computes a new output whenever there is switching activity at its inputs A and B, thus consuming power by executing redundant computations.

Figure 2.10: Design with operand isolation [11]

In such a situation, to minimize the power overhead, operand isolation is applied, for example using transparent latches that "freeze" the inputs of a0, effectively preventing the propagation of switching activity into the module.
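The saving can be estimated by counting the bit toggles that reach the adder inputs, a common proxy for switching activity. The sketch assumes the isolation latches hold the last operands whenever the adder result is unused; the operand sequence and the "result used" pattern are hypothetical.

```python
def toggles(prev, cur):
    """Number of bits that change between two operand words."""
    return bin(prev ^ cur).count("1")


def adder_input_activity(operands, result_used, isolate):
    """Total input-bit toggles seen by the adder over a sequence of cycles.

    operands: list of (A, B) tuples; the first entry is the initial state.
    result_used: one bool per cycle; with isolate=True the latches stay
    closed (inputs frozen) in cycles whose result is not used downstream.
    """
    held_a, held_b = operands[0]
    activity = 0
    for (a, b), used in zip(operands[1:], result_used[1:]):
        if isolate and not used:
            continue  # latches closed: adder inputs frozen, no toggles
        activity += toggles(held_a, a) + toggles(held_b, b)
        held_a, held_b = a, b
    return activity


ops = [(0b0000, 0b0000), (0b1111, 0b0001), (0b0000, 0b1110),
       (0b1010, 0b0101), (0b1111, 0b1111)]
used = [True, False, False, True, True]
print(adder_input_activity(ops, used, isolate=False))  # 22
print(adder_input_activity(ops, used, isolate=True))   # 8
```

The area and delay of the isolation latches are the price paid for this reduction, so the technique pays off only for datapath modules whose results are frequently discarded.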

Pin swapping

The pin swapping technique reduces dynamic power consumption by assigning a net with a higher switching rate to a pin with a lower capacitance. In fact, some cells have input pins that are symmetric with respect to the logic function but have different capacitance values.

Consider, for example, a 4-input NAND gate with a different capacitance value at each pin, as shown in Figure 2.11. The high-activity net is connected to pin no. 4, i.e. pin "d", which has the maximum input capacitance. Hence, pins "a" and "d" are swapped so that the high-activity net is connected to the pin with the minimum input capacitance.

Figure 2.11: Pin swapping applied on a NAND gate [12]
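Pin swapping amounts to pairing net activities with pin capacitances so that the activity-weighted capacitance is minimal, which is achieved by matching the most active nets with the least capacitive pins. The sketch below assumes all pins are logically equivalent and ignores timing; the pin capacitances and activities are hypothetical.

```python
def assign_pins(net_activity, pin_capacitance):
    """Assign the most active nets to the least capacitive pins.

    Only valid for logically equivalent (swappable) pins, such as the inputs
    of a NAND gate. Pairing sorted activities with sorted capacitances in
    opposite order minimizes sum(activity * capacitance).
    """
    nets = sorted(net_activity, key=net_activity.get, reverse=True)
    pins = sorted(pin_capacitance, key=pin_capacitance.get)
    return dict(zip(nets, pins))


def switched_capacitance(assignment, net_activity, pin_capacitance):
    """Activity-weighted input capacitance, proportional to dynamic power."""
    return sum(net_activity[n] * pin_capacitance[p] for n, p in assignment.items())


# Hypothetical 4-input NAND: pin capacitances in fF, net switching activities
pin_cap = {"a": 1.0, "b": 1.2, "c": 1.4, "d": 1.6}
activity = {"n1": 0.5, "n2": 0.1, "n3": 0.1, "n4": 0.05}
original = {"n4": "a", "n2": "b", "n3": "c", "n1": "d"}  # busy net n1 on pin d
swapped = assign_pins(activity, pin_cap)                  # n1 moves to pin a
print(switched_capacitance(original, activity, pin_cap))
print(switched_capacitance(swapped, activity, pin_cap))
```

Here the optimal assignment is exactly the a/d swap described in the text, and the activity-weighted capacitance drops from about 1.11 fF to about 0.84 fF.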

2.2.2 At system level

Power Gating

Power gating, or power shut-off (PSO), is the most effective technique for reducing leakage power. It consists of adding a sleep transistor between the actual ground rail and the circuit ground, called virtual ground, as illustrated in Figure 2.12. In sleep mode the transistor is turned off and the leakage path is cut off, with a substantial reduction in leakage: the virtual ground rail charges up to a steady-state value close to $V_{DD}$.

Figure 2.12: Power gating implementation

However, power gating also has a drawback: when switching back from sleep mode to active mode, the virtual ground rail takes a long time to discharge through the sleep transistor. This results in a significant wake-up latency and limits the overall leakage savings by limiting how often a logic block can go in and out of sleep mode.


Multi VDD Design

Multi-voltage design techniques reduce dynamic power by reducing the $V_{DD}^2$ term. The idea is to have a design with different power domains, also called voltage islands or power islands, each with its own supply voltage and clock frequency. Several variants are possible, some of which are reported in Figure 2.13.

The simplest case of multi-voltage design is static voltage scaling (SVS), where different power islands run at different fixed supply voltages. Its extension, called multi voltage scaling (MVS), allows a power domain to be switched between two or more fixed, discrete voltage levels. Another variant, known as dynamic voltage scaling (DVS), uses time-varying voltages: the supply is kept at a high value when maximum performance is needed and reduced at other times. If the clock frequency is adjusted along with the supply voltage, the approach is called dynamic voltage and frequency scaling (DVFS).
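The benefit of scaling voltage and frequency together follows directly from Eq. 2.2, assuming activity and capacitance stay unchanged. The operating points below are hypothetical and serve only to compare frequency scaling alone with combined voltage and frequency scaling.

```python
def power_ratio(v_new, v_nom, f_new, f_nom):
    """P_new / P_nom for P = alpha * C * V^2 * f with alpha and C unchanged."""
    return (v_new / v_nom) ** 2 * (f_new / f_nom)


def energy_per_cycle_ratio(v_new, v_nom):
    """E = P / f = alpha * C * V^2: only the supply voltage changes energy per cycle."""
    return (v_new / v_nom) ** 2


# Hypothetical operating points for one power island
print(power_ratio(1.2, 1.2, 80e6, 160e6))   # frequency only: 0.5
print(power_ratio(0.9, 1.2, 80e6, 160e6))   # DVFS: about 0.28
print(energy_per_cycle_ratio(0.9, 1.2))     # about 0.56
```

Halving the frequency alone halves the power but leaves the energy per operation unchanged; lowering the supply as well reduces the energy per operation quadratically.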

Figure 2.13: Multi voltage design techniques: a) static voltage scaling, b) multi voltage scaling, c) dynamic frequency and voltage scaling, d) adaptive frequency and voltage scaling


In adaptive voltage and frequency scaling (AVFS), the frequency/voltage levels are controlled by performance monitors implemented in hardware. The performance of the silicon, which depends on process and temperature variations, is measured and fed back to a power controller that adjusts the voltage and frequency levels to the actual needs, minimizing the power consumption of each processor domain.

These techniques reduce dynamic power at the cost of an increased complexity of design (e.g. of the power supply grid) and verification. However, they are not applicable to the RD53 pixel chip, because only one digital supply is available and the frequency in the matrix is fixed. The same holds for the power gating technique discussed above: the system is continuously acquiring data, and power gating would not provide any gain in a serially powered system.

2.3 Low Power within RD53 Collaboration

It is important to highlight that, in the framework of the RD53 collaboration, the target of low power design is not only to reduce the average power dissipation, i.e. to consume less energy over the whole operation, but also to know the maximum current/power consumed by the chip. In fact, the specific scheme used to power the pixel chip modules is the Serial Powering scheme [13], in which a chain of pixel chip modules is powered in series by a constant current, as shown in Figure 2.14.

At the module level, the required supply voltages (1.2 V analog, 1.2 V digital) are generated redundantly from the current supply by several parallel shunt-LDO regulators, which combine the capability of Low Drop-Out (LDO) regulators to generate a constant supply voltage with the feature of shunt regulators of ensuring a constant current flow through the device, burning the "surplus" current not drawn by the load.

This approach was chosen instead of the usual scheme of powering the modules in parallel at constant voltage in order to save cable material, with a resulting decrease of the power losses in the cables. For a chain of $N$ modules, while in the classic parallel powering scheme the current through the cables is $I = N I_{mod}$, with serial powering it is only $I = I_{mod}$, leading to a reduction of the cable power loss by a factor of $1/N^2$:

$$\frac{P_{cable}(\text{serial powering})}{P_{cable}(\text{parallel powering})} = \frac{R_{cable} I_{mod}^2}{R_{cable} (N I_{mod})^2} = \frac{1}{N^2} \qquad (2.6)$$

Figure 2.14: Serial powering scheme [14]
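Eq. 2.6 can be checked numerically. The sketch assumes, as the equation does, the same cable resistance in both schemes; the 1 Ω resistance and the chain of 8 modules are illustrative values, and 2 A per module is the order of magnitude quoted below for the maximum module current.

```python
def cable_loss(r_cable, i_mod, n_modules, serial):
    """Ohmic power lost in the supply cable feeding a chain of n_modules.

    Parallel powering: the cable carries N * I_mod; serial powering: only I_mod."""
    current = i_mod if serial else n_modules * i_mod
    return r_cable * current ** 2


# Illustrative values: 1 ohm cable, 2 A per module, 8 modules in the chain
p_serial = cable_loss(1.0, 2.0, 8, serial=True)
p_parallel = cable_loss(1.0, 2.0, 8, serial=False)
print(p_serial, p_parallel, p_serial / p_parallel)  # 4.0 256.0 0.015625
```

The ratio 0.015625 is exactly $1/8^2$, as predicted by Eq. 2.6.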

As can be seen from Figure 2.15, each module of the serial power chain is supplied with a constant current $I_{mod}$ large enough to deliver the current required by the front-end electronics at its maximum consumption (approx. 2 A [15]); the current/power not needed by the front-end is burned by the shunt regulators.

Figure 2.15: Illustration of constant current mode of serial powering with varying power consumption of pixel chip [15]

The major concern is the presence of unavoidable digital power variations, which could couple into the analog domain and, if higher than the current provided to the serial chain, would cause chip failure, as highlighted in Figure 2.15. This must be handled by an appropriate amount of decoupling capacitance, which filters the dynamic current peaks.
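The constant-current behaviour of the shunt regulators can be sketched as a simple current balance: whatever the load does not draw is burned in the shunt, and a load sample exceeding the supplied current corresponds to the failure case described above. This ignores the decoupling capacitance and regulator dynamics; the current trace is hypothetical.

```python
def shunt_currents(i_supply, load_trace):
    """Current burned by the shunt regulators for each sample of the load current.

    Raises if the load ever exceeds the constant supply current, which in the
    serial chain would starve the module."""
    shunt = []
    for t, i_load in enumerate(load_trace):
        if i_load > i_supply:
            raise RuntimeError(f"sample {t}: load {i_load} A exceeds supply {i_supply} A")
        shunt.append(i_supply - i_load)
    return shunt


# Hypothetical load current samples (A) for a module fed with 2 A
print(shunt_currents(2.0, [1.5, 1.75, 1.25, 2.0]))  # [0.5, 0.25, 0.75, 0.0]
```

This is why the maximum instantaneous current, and not only the average one, must be known: every current peak has to fit below the constant supply current.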


In this context, low power techniques such as clock gating, operand isolation and pin swapping can be used to minimize the maximum current peaks. Other techniques, such as power gating and multi-voltage design (described above), are instead not applicable in this case.


Chapter 3 Low Power Design Flow

Before analyzing the power consumption of the new generation pixel readout chip named CHIPIX65 (CHIP for a PIXel detector in 65 nm CMOS technology), which will be treated in Chapter 4, it is necessary to understand and follow the ASIC design flow shown in Figure 3.1.

3.1 ASIC design methodology

The design of an Application Specific Integrated Circuit (ASIC) can be divided into two phases: the front end, which is the starting point of the design process, and the back end, where the physical implementation is carried out. This flow has been reproduced using Tool Command Language (TCL) scripts executed from within a makefile.

3.1.1 Front-end digital design stage

In the digital front end, the synthesis of the design is performed. This process consists of the generation of the gate-level netlist of the ASIC from its RTL description. After the initial translation into a netlist consisting of generic logic functions and storage elements (compilation phase), the netlist is mapped to the target technology using the elements available in the standard cell library (mapping phase). As an example, the post-synthesis schematic of a Pixel Region is reported in Figure 3.2.


Figure 3.1: ASIC design flow

The technology library consists of a set of standard logic cells (AND, OR, etc.) and storage elements (flip-flops, latches) that have been designed for a specific fabrication process. It contains different types of information describing each standard cell:

• structural: description of the connectivity of each cell to the outside world,

including information about cell sizes, buses, and pins;

• timing: description of the parameters for pin-to-pin timing relationships and

delay calculation for each cell in the library. This information ensures accurate

static timing analysis (STA) and timing optimization of a design;

• power: description of leakage and standard cell internal power;

• functional: description of the logical function of every output pin depending on

the cell’s inputs, so that the synthesis program can map the logic of a design to

the actual ASIC technology;


• environmental: description of the manufacturing process, operating tempera-

ture, supply voltage variations, all of which directly affect the efficiency of

every design.

Figure 3.2: Schematic of a synthesized pixel logic

The design is then optimized for area, speed and design rule violations, such as maximum transition time or maximum capacitance violations (optimization phase). The transition time of a net is the longest time required for its driving pin to change logic value; the maximum transition constraint is specified for input pins, while the maximum (or minimum) capacitance constraint, i.e. the capacitive load that a pin can drive, is specified only for output pins.

Figure 3.3: Setup and hold slack

The timing optimization in particular requires the definition of timing constraints (Synopsys Design Constraints, SDC), written in TCL syntax, for the software tools. These files specify the periods of all clock signals and the delays related to identified asynchronous paths. In a synchronous digital system, where data is stored in flip-flops or latches, two types of violations are possible: a "setup violation", due to the signal arriving too late to be captured within the setup time $t_{SU}$ before the clock edge, and a "hold violation", due to the signal changing before the hold time $t_{HD}$ after the clock edge has elapsed.

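These violations are detected through the hold slack and the setup slack, defined as:

$$\begin{aligned}\text{hold slack} &= \text{data arrival time} - \text{data required time},\\ \text{setup slack} &= \text{data required time} - \text{data arrival time}.\end{aligned} \qquad (3.1)$$

As shown in Figure 3.3, the data arrival time is the time required for the data, launched by a clock edge, to travel through the data path, while the data required time is derived from the time taken by the clock to traverse the clock path to the capturing flip-flop (plus one clock period minus $t_{SU}$ for the setup check, or plus $t_{HD}$ for the hold check).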
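The slack definitions of Eq. 3.1 can be sketched for a single register-to-register path. The split into launch clock latency, data path delay and capture clock latency follows the usual static timing analysis convention; all delay values below are hypothetical.

```python
def setup_slack(launch_clk, data_path, capture_clk, period, t_su):
    """Setup slack = data required time - data arrival time (all times in ns).

    arrival  = launch clock latency + data path delay (clock-to-Q + logic + wires)
    required = capture clock latency + one clock period - setup time
    """
    arrival = launch_clk + data_path
    required = capture_clk + period - t_su
    return required - arrival


def hold_slack(launch_clk, data_path, capture_clk, t_hd):
    """Hold slack = data arrival time - data required time,
    with required = capture clock latency + hold time (same clock edge)."""
    arrival = launch_clk + data_path
    required = capture_clk + t_hd
    return arrival - required


# Hypothetical path clocked at 40 MHz (25 ns period)
print(setup_slack(0.5, 3.0, 0.75, 25.0, 0.25))  # 22.0: setup met
print(hold_slack(0.5, 3.0, 0.75, 0.125))        # 2.625: hold met
print(hold_slack(0.5, 0.25, 1.0, 0.125))        # -0.375: hold violation
```

The clock period enters only the setup check: a hold violation, like the last one, cannot be fixed by slowing the clock and requires adding delay to the data path.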

A negative slack therefore implies that data is not stable within the setup-hold window (Figure 3.4): a negative setup slack means that the circuit will not work at the given clock frequency, while a negative hold slack causes failures independently of the clock frequency. The last phase of the digital front end consists of generating the top-level netlist of the chip, which includes the analog blocks of the project.

Figure 3.4: Setup-hold window

3.1.2 Back-end digital design stage

The digital back end consists of the physical implementation of the circuit, through an initial placement of all components followed by the routing of power and signal wires. The software used for this task is Cadence SoC Encounter, which provides a complete hierarchical design solution, including:


• floorplanning;

• hierarchical partitioning of the chip;

• block placement;

• logic and timing optimisation;

• signal wire and power routing;

• geometry, connectivity and process antenna verification;

• generation of stream data (GDSII).

The files supplied by the standard cell vendor, which have to be imported into the tool, are a) technology libraries for I/O cells and standard cells and b) a library exchange format (LEF or OpenAccess) database, which contains process details relevant to the digital back end, such as the preferred routing directions for the different metal layers, the minimum metal wire width and routing pitch, as well as the rules for via generation between layers. Moreover, the following data has to be generated:

• synthesized gate-level netlist of the chip (output of digital front end);

• consistent LEF data for the standard cells, digital macros and the analog blocks

as well as the according technology library information. The same data has to

be available for I/O cells that are to be used for the pad connections of the chip;

• an SDC file which includes the timing, power and area constraints for the design (output of digital front end);

• a CapTable file which serves as a basis for the RC calculations. This file in-

cludes values of metal-metal capacitances and resistances for various configu-

rations, stored within look-up tables (LUTs);

• a very initial floorplan specifying the coordinates of the die boundary, the I/O

ring and the core area.


Floorplan

Floorplanning involves the definition of placement regions for modules, defining

boundaries for top-level modules that will serve as partition boundaries, and macro

cell placement. The quality of the floorplan and the macro cell placement is cru-

cial to the result of the following timing driven placement of the standard cells and

the subsequent timing optimizations. An optimal macro cell and block floorplan

not only speeds up this placement, but also guarantees superior results in terms of

timing and reduced congestion [16].

Power plan

After the floorplan has been completed, the power distribution network (PDN) for

the complete chip is generated. Wide metal structures are generated around the

core (core rings) and around macro cells to predefine a coarse structure for the PDN.

These structures should be planned so that the final automatic power routing only has to make straightforward connections, which can then be routed in a straight manner without excessive via insertion by the power routing software.

Hierarchical partitioning

The design is subdivided into a set of smaller sub-systems, so-called partitions, that are then implemented separately. This reduces the complexity of the system, speeding up the design process. Partition boundaries are drawn as module boundaries in the top-level floorplan, and the connectivity, more precisely the location of the different partition pins, is determined in two steps. First, a timing-driven placement of the whole design is performed. Second, a trial-route run is performed and the partition pins are generated where the signal wires cross the partition boundary.

During the partitioning process itself, the timing of all signals crossing partition

boundaries is analyzed and the timing budget within the partitions is derived by

estimating parasitic capacitances and routing delays based upon the results of the

initial placement and trial routing. The design data of each partition, including the derived timing constraints, the netlist of the partition, the macro cell placement and the power routing, are then stored in separate directories for implementation. For each partition, two output files are generated: a GDSII stream (the database file format that is the de facto industry standard for exchanging integrated circuit layout artwork) for inclusion in the top-level GDSII, and a Design Exchange Format (DEF) file, which represents the physical layout of an IC in ASCII format and is later used to un-partition the design and perform full-chip timing analysis.
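How a timing budget can be derived for a signal crossing a partition boundary is illustrated by the minimal sketch below. It is only an assumption made for illustration: the CrossingPath model, its field names and the policy of sharing the available time in proportion to the estimated internal delays are hypothetical and are not necessarily what the place&route tool does internally.

```python
from dataclasses import dataclass


@dataclass
class CrossingPath:
    """Hypothetical model of a register-to-register path crossing one partition boundary."""
    name: str
    delay_in_source: float  # estimated delay inside the driving partition [ns]
    delay_top_level: float  # estimated routing delay between the two partition pins [ns]
    delay_in_sink: float    # estimated delay inside the receiving partition [ns]


def derive_budgets(path: CrossingPath, clock_period: float) -> tuple[float, float]:
    """Share the time left after top-level routing between source and sink partitions,
    in proportion to their estimated internal delays (an assumed budgeting policy)."""
    available = clock_period - path.delay_top_level
    internal = path.delay_in_source + path.delay_in_sink
    source_budget = available * path.delay_in_source / internal
    return source_budget, available - source_budget


# Made-up numbers: a 10 ns clock period.
p = CrossingPath("example_path", delay_in_source=3.0, delay_top_level=1.0, delay_in_sink=1.5)
print(derive_budgets(p, clock_period=10.0))  # (6.0, 3.0)
```

Each partition is then implemented against its own budget; if the estimates from trial routing are poor, the budgets have to be re-derived.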

Top level placement and routing

In the top-level placement and routing phase, the following steps are performed to establish the final placement and routing of the standard cell netlist and its connectivity to the IO pads:

1. timing-driven placement: since the digital part is spatially restricted to a certain area of the chip, placement blockages are created within the remaining areas prior to the actual placement. After placement, a trial-route run is performed to determine whether the placement leads to highly congested areas;

2. pre-Clock Tree Synthesis (pre-CTS): nets that violate the maximum transition time and/or maximum capacitance design rules are fixed. Furthermore, the timing is optimized for positive slack by means of gate resizing (changing the drive strength of standard cells) or buffer insertion (buffers are inserted if the maximum drive strength is not sufficient);

3. Clock Tree Synthesis (CTS): clock networks are heavily loaded by the clock pins of the flip-flops they drive. In synchronous designs, the propagation delay through the clock network to the clock pins needs to be the same for all destination pins (i.e. the clock skew has to be minimized), which requires building a balanced buffer tree. This is done by a dedicated algorithm in this step. Its capabilities include the definition of clock groups, in which a set of clocks is balanced for equal delay. On the presented ASIC, this is used both to synthesize the different clock signals with equal delay and to balance the path lengths from the clock source to the clock sinks;


4. post-CTS: the design containing the synthesized clock tree is optimized again. To obtain good routing results, hold violations are also fixed in this step, after setup violations have been fixed. The outcome of this step is the pre-final placed, but not yet routed, design;

5. routing: the design is now being routed in detail while taking into account

the timing constraints. The completion of this step without any design rule

or process antenna (isolated gate) violations finalizes the design of the digital

part.

Figure 3.5: Post place&route physical view of CHIPIX65

6. post-Route: this step only slightly improves the timing, as the routing is not substantially changed anymore. It is nevertheless necessary to fix hold violations possibly introduced by the router;

7. stream out: the top-level GDSII file is generated by including the GDSII files of all instantiated standard cells, macro cells and the analog part into the top-level design. The generated GDSII file can then be read back into the Cadence analog environment (stream in) to visually inspect the design;

8. sign-off RC extraction: for each process corner available in the technology li-

braries, a detailed parasitics extraction is performed. Delay calculation and

the Static Timing Analysis (STA) are performed and final timing reports are

written.
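The quantities minimized during CTS and checked in the optimization and sign-off steps (clock skew, setup and hold slack) follow the standard static timing analysis definitions. The sketch below computes them for a single launch/capture flip-flop pair; function names and all numeric values are illustrative assumptions, not CHIPIX65 data.

```python
def clock_skew(latencies: dict[str, float]) -> float:
    """Global skew: spread between the latest and earliest clock arrival at the sinks [ns]."""
    return max(latencies.values()) - min(latencies.values())


def setup_slack(period: float, launch_lat: float, capture_lat: float,
                data_delay: float, t_setup: float) -> float:
    """Data launched by one clock edge must settle before the next capturing edge."""
    required = period + capture_lat - t_setup
    arrival = launch_lat + data_delay
    return required - arrival


def hold_slack(launch_lat: float, capture_lat: float,
               data_delay: float, t_hold: float) -> float:
    """Data launched by an edge must not corrupt the value captured by the same edge."""
    arrival = launch_lat + data_delay
    required = capture_lat + t_hold
    return arrival - required


# Illustrative clock latencies after CTS and a very short data path.
lat = {"ff_a": 0.50, "ff_b": 0.75, "ff_c": 0.625}
print(clock_skew(lat))                             # 0.25
print(setup_slack(10.0, 0.50, 0.75, 0.125, 0.25))  # 9.875 -> setup met
print(hold_slack(0.50, 0.75, 0.125, 0.0625))       # -0.1875 -> hold violation
```

The negative hold slack in the example arises because the capturing flip-flop receives its clock later than the launching one while the data path is short; this is the kind of violation addressed by hold fixing in the post-CTS and post-Route steps, typically by adding delay on the data path.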

3.2 Power analysis flow

Low-power design requires an approach that considers power in all steps of the design flow. Digital design tools allow the generation of power models starting at the RTL level and moving to a detailed post-layout power estimation with parasitics annotation.

Figure 3.6: General and power ASIC design flow with related Cadence software packages used [17]

Figure 3.6 shows the power analysis flow in parallel to the general design flow described in Section 3.1, together with the Cadence software packages used in each step. Two types of analysis can be performed:


1. RTL/gate-level power analysis (i.e. after synthesis without full layout place&route

information) used to drive architectural choices before going into a full de-

tailed design;

2. post place&route power analysis including:

• average power estimations under different corners and activity conditions

to assess power impact of different factors, important to understand vari-

ations in different operation modes and guide design choices;

• dynamic power under a variety of operating conditions and at different time constants. It is the parameter to watch when designing the complex power supply delivery networks for integrated circuits and systems.
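As a reference for the quantities such an analysis reports, average power is commonly decomposed into leakage, cell internal power and net switching power, the latter following the first-order model P_sw = ½·C·V_DD²·(toggles per second). The sketch below applies this decomposition to two nets; the net names, capacitances, energies, toggle rates and the 40 MHz clock are made-up assumptions, and the tool's actual models (e.g. state-dependent Liberty tables) are considerably more detailed.

```python
from dataclasses import dataclass


@dataclass
class Net:
    """A net and the cell driving it, with illustrative (made-up) characteristics."""
    name: str
    cap_f: float        # total load capacitance [F]
    toggle_rate: float  # transitions (0->1 plus 1->0) per second, e.g. derived from a VCD
    internal_j: float   # internal energy of the driving cell per output toggle [J]


def switching_power(net: Net, vdd: float) -> float:
    # A rising plus a falling edge draw C*V^2 from the supply: 0.5*C*V^2 per toggle.
    return 0.5 * net.cap_f * vdd ** 2 * net.toggle_rate


def average_power(nets: list[Net], vdd: float, leakage_w: float) -> dict[str, float]:
    """First-order average power: leakage + cell internal + net switching [W]."""
    p_int = sum(n.internal_j * n.toggle_rate for n in nets)
    p_sw = sum(switching_power(n, vdd) for n in nets)
    return {"leakage": leakage_w, "internal": p_int, "switching": p_sw,
            "total": leakage_w + p_int + p_sw}


# Hypothetical values: a clock leaf net (two toggles per cycle of an assumed
# 40 MHz clock) and a rarely toggling counter bit, at a 1.2 V supply.
nets = [Net("clk_leaf", 20e-15, 80e6, 5e-15),
        Net("tot_cnt_0", 5e-15, 1e6, 2e-15)]
report = average_power(nets, vdd=1.2, leakage_w=1e-6)
print(report)
```

The model makes explicit why realistic activity matters: switching and internal power scale linearly with the toggle rate of every net.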

The Cadence tool used for the post place&route power analysis is "Voltus IC Power Integrity Solution". The inputs to be provided, shown in Figure 3.7, are: the netlist, the parasitics information from the Standard Parasitic Exchange Format (SPEF) file, the Liberty models (dotlib or .lib, which also contain the leakage and standard cell internal power descriptions) and the SDC.

Figure 3.7: Inputs and accuracy of Cadence Voltus engine [17]


Furthermore, in order to perform power analysis under realistic operating conditions, the final netlist of the DUT has been simulated by means of the VEPIX53 framework to annotate its full activity into a value change dump (VCD) file. Such a file includes information about the activity propagation through combinational cells, sequential cells (based upon the activity of input, set/reset and enable pins), macros, the clock network and clock gating cells (based upon the activity of clock enable signals). If only the activity on the primary inputs of the chip were specified, the results would be optimistic and inaccurate.
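To make the role of the VCD file concrete, the sketch below counts the 0↔1 transitions of each scalar signal in a dump, which is the raw information from which per-net toggle rates (toggles divided by simulated time) are obtained. It is a minimal parser written only for illustration, not the reader used by Voltus: it skips vector and real value changes, ignores transitions involving x/z, and the sample dump is invented.

```python
def count_toggles(vcd_text: str) -> dict[str, int]:
    """Count 0<->1 transitions of every scalar signal in a VCD dump."""
    names: dict[str, str] = {}    # identifier code -> signal name
    last: dict[str, str] = {}     # identifier code -> last known value
    toggles: dict[str, int] = {}
    tokens = vcd_text.split()
    i, in_header = 0, True
    while i < len(tokens):
        tok = tokens[i]
        if in_header:
            if tok == "$var":
                # $var <type> <width> <identifier> <reference> [range] $end
                ident, name = tokens[i + 3], tokens[i + 4]
                names[ident] = name
                toggles[name] = 0
                while tokens[i] != "$end":
                    i += 1
            elif tok == "$enddefinitions":
                in_header = False
            i += 1
            continue
        if tok[0] in "bBrR":          # vector/real change: value token, then identifier
            i += 2
            continue
        if tok[0] in "01xXzZ" and tok[1:] in names:
            value, ident = tok[0].lower(), tok[1:]
            if {last.get(ident), value} == {"0", "1"}:
                toggles[names[ident]] += 1
            last[ident] = value
        i += 1                        # timestamps (#t) and $keywords are skipped
    return toggles


SAMPLE = """
$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$var wire 1 " hit $end
$var wire 4 # tot [3:0] $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0"
b0000 #
$end
#5
1!
#10
0!
1"
b0011 #
#15
1!
#20
0!
0"
"""
print(count_toggles(SAMPLE))
```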

3.3 VEPIX53 simulation and verification framework

VEPIX53 stands for Verification Environment for PIXel chips and it is developed in

the framework of the RD53 Collaboration (simulation working group), using the

SystemVerilog language and the Universal Verification Methodology (UVM) class

library, to provide a versatile pixel simulation platform that supports pixel chips at

different levels of description.

Its reusable components, described below, support the generation of different classes of parameterized input hits to the pixel matrix, the monitoring of the pixel chip inputs and outputs, conformity checks between predicted and actual outputs, and the collection of statistics on system performance.

3.3.1 SystemVerilog and UVM

SystemVerilog is often referred to as a hardware description and verification lan-

guage (HDVL). It is a superset of Verilog that provides many features to create com-

plete verification environments at a higher level of abstraction than what is possible

to achieve with a standard HDL. Some of the typical features that distinguish this HDVL from an HDL are summarized in the list below [18]:

• constrained-random stimulus generation;

• high-level structures, especially Object-Oriented Programming (OOP) and Transaction Level Modeling (TLM): a transaction (object) is a group of information shared between different components using methods (e.g. get(), put()) called through dedicated interfaces;

• multi-threading and inter-process communication (IPC);

• assertions primarily used to validate the behaviour of a design;

• functional coverage, which measures the progress of all tests in fulfilling the verification plan requirements.
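The combination of transaction-level communication and inter-process communication listed above can be illustrated outside SystemVerilog. The sketch below is a Python analogy, not UVM code: a producer thread put()s transactions into a bounded FIFO channel and a consumer thread get()s them, with blocking on a full or empty channel loosely mirroring a TLM FIFO. Class, method and field names are chosen only for illustration.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Transaction:
    bx: int       # time reference (bunch crossing index)
    payload: str  # abstract content of the transaction


class TlmFifo:
    """Python analogue of a bounded TLM FIFO: put() blocks when full, get() when empty."""

    def __init__(self, depth: int = 1) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=depth)

    def put(self, tr: Transaction) -> None:
        self._q.put(tr)

    def get(self) -> Transaction:
        return self._q.get()


def producer(ch: TlmFifo, n: int) -> None:
    for bx in range(n):
        ch.put(Transaction(bx, f"hit@{bx}"))


def consumer(ch: TlmFifo, n: int, out: list) -> None:
    for _ in range(n):
        out.append(ch.get())


received: list = []
ch = TlmFifo(depth=2)
threads = [threading.Thread(target=producer, args=(ch, 5)),
           threading.Thread(target=consumer, args=(ch, 5, received))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([tr.bx for tr in received])  # [0, 1, 2, 3, 4]
```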

On top of SystemVerilog, specific verification methodologies have also been de-

fined based on industry practices. The most recent ones are the Open Verification

Methodology (OVM) [19] and the Universal Verification Methodology (UVM) [20],

where the latter is becoming a mature standard used by a growing community. Compared to standard SystemVerilog, UVM offers a more solid and better documented set of base classes for all the building blocks of the environment, as can be seen in Figure 3.8. Therefore, VEPIX53 is based on SystemVerilog, which supports multiple chip description levels, and on UVM, for the development of reusable verification environments to be used by multiple designers with different needs and goals.

Figure 3.8: Partial UVM class library [21]


3.3.2 Overall architecture

A block diagram of the first version of the VEPIX53 environment is reported in Figure 3.9. The environment is divided into three main parts [22]:

• a top module, which contains the DUT and hooks it with the rest of the envi-

ronment through interfaces;

• a layered testbench, which includes all the UVM Verification Components (UVC)

and the virtual sequencer;

• a test scenario portion, which defines the configuration of the UVCs and de-

scribes the tests that are performed during simulations by specifying the con-

straints to the input stimuli to be sent to the DUT.

Figure 3.9: Block diagram of the VEPIX53 simulation and verification environment [22]

While it is possible to have different levels of description for the DUT, the test-

bench is implemented at TLM and the UVCs together with the components in the

test scenario are inherited from the UVM class library. TLM is a high-level approach

to modeling digital systems where details of communication among modules are

separated from the details of the implementation of functional units or of the com-

munication architecture.


In fact, the communication is not modeled through signals resembling the hardware,

but by high level channels where information is passed through transactions and

specific interfaces which adapt the environment with the DUT.

Top module

In VEPIX53 all interfaces contain signals used for synchronization of the DUT and

the environment itself, i.e. mainly the clock. The corresponding transactions are

reported in Figure 3.10.

The hit interface hit_if includes the charge signal generated in the pixel sensor

matrix due to particles crossing the detector. Hit transactions that transit in the cor-

responding TLM channel are composed of a time reference field (which refers to the

25 ns bunch crossing cycle in the LHC) and an array of incoming hits described by

the following fields: charge, delay (with respect to the bunch crossing cycle) and

identification of the pixel in the matrix, given through column and row addresses.

The second interface, trigger_if, is provided in the case the DUT is a pixel chip that

performs selection of events of interest by using a trigger signal. The corresponding

trigger transaction contains just a time reference.

A separate interface named output_data_if is available for the DUT output. Dif-

ferent subcategories of output transaction can be identified (e.g. processed hits or

register data) and the currently defined one, focused on the core task of the pixel

chip, contains the same data members as the incoming hit transaction.

Figure 3.10: Unified Modeling Language (UML) diagrams of the transaction classes defined for VEPIX53 [22]

For monitoring the internal status of the DUT and collecting statistics on chip performance, an optional interface named analysis_if was defined, containing internal DUT information to be monitored. The corresponding analysis transaction includes internal information and is therefore DUT-specific.


Testbench

In the testbench, as shown in Figure 3.9, four different UVCs are defined, each devoted to the respective interface, together with a virtual sequencer. The latter generates stimuli that create transaction-level traffic to the DUT by running UVM sequences derived from the uvm_sequence class, named hit_cluster_sequence and trigger_sequence.

Different classes of hits are generated by VEPIX53 based on the study of the shape of their related clusters of fired pixels [22]. In particular, the following classes of hits have been identified (the first four are illustrated in Figure 3.11):

• single charged particles which fire a variable number of pixels, generating a

cluster of pixel hits. The average size of each cluster depends on the angle

formed by the particle with respect to the sensor;

• loopers, soft charged particles that become curling tracks in the solenoidal magnetic field [23];

• jets, collimated bunches of final state hadrons coming from hard interactions

[23];

• machine background particles, not really a special phenomenon for physics, but a very extreme input to be used by engineers for extensive verification of the design;

• noise hits, phenomena not directly associated with tracks, intended to add

background noise.

Figure 3.11: Classes of hits generated by VEPIX53: a) single charged particles, b) jets, c) loopers, d) machine background particles

To provide constrained random stimuli to the DUT (pixel chip matrix and trig-

ger logic), the hit and trigger UVCs include objects inherited from the uvm_sequencer


and uvm_driver classes. Hence they are driven by the virtual sequencer, which co-

ordinates the stimulus across different interfaces and the interactions between them

(the adjective virtual is related to the fact that the component is not directly linked

to a DUT interface). Moreover, the sequencer is connected to the UVM driver that

converts the transactions to signal-level stimulus at the DUT interface.
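The split between the sequencer side, which produces constrained-random transactions, and the driver side, which converts them into signal-level stimulus, can be sketched as follows. Everything here is a hypothetical Python illustration, not VEPIX53 code: the charge and delay ranges, the maximum number of hits, the 64 × 64 matrix size (borrowed from the CHIPIX65 demonstrator described in Chapter 4) and the fixed seed used for reproducibility are assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Hit:
    col: int
    row: int
    charge: int  # arbitrary charge units (assumed range)
    delay: int   # ns with respect to the bunch crossing (assumed range)


@dataclass
class HitTransaction:
    bx: int                                   # time reference: bunch crossing index
    hits: list[Hit] = field(default_factory=list)


def random_transaction(rng: random.Random, bx: int, n_cols: int = 64,
                       n_rows: int = 64, max_hits: int = 8) -> HitTransaction:
    """'Sequencer' side: constrained-random hits, at most one per pixel."""
    n = rng.randint(0, max_hits)
    pixels = rng.sample([(c, r) for c in range(n_cols) for r in range(n_rows)], n)
    hits = [Hit(c, r, charge=rng.randint(600, 30000), delay=rng.randint(0, 24))
            for c, r in pixels]
    return HitTransaction(bx, hits)


def drive(tr: HitTransaction, n_cols: int = 64, n_rows: int = 64) -> list[list[int]]:
    """'Driver' side: convert the transaction into a row x column bit matrix of
    pulses for that bunch crossing (the sub-bunch-crossing delay is ignored here)."""
    digital_hit = [[0] * n_cols for _ in range(n_rows)]
    for h in tr.hits:
        digital_hit[h.row][h.col] = 1
    return digital_hit


rng = random.Random(53)           # fixed seed: reproducible "random" test
tr = random_transaction(rng, bx=0)
matrix = drive(tr)
print(len(tr.hits), sum(map(sum, matrix)))  # the two numbers are equal
```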

The output UVC includes an entity, inherited from the uvm_monitor class, which

reads the DUT outputs through the output_data_if interface, recognizes signal-level

activity and converts it into transactions that are provided to the analysis UVC.

The analysis UVC collects transactions from all the other components. Two main

objects, inherited from the UVM class library, are therein defined: a reference model

and a scoreboard. The former predicts the expected DUT output from hit and trigger

transactions, while the latter compares the predicted and actual DUT outputs pro-

viding error/warning/information messages, depending on the verification process

result, together with a final summary of the total observed matches and mismatches. Figure 3.12 shows an example of simulation output, which provides information about an array of incoming hits, such as the charge of the particle, the delay with respect to the bunch crossing cycle and the identification of the fired pixels in the matrix (given through column and row addresses). In addition to the total numbers of matches and mismatches, the numbers of single-hit matches/mismatches are also reported, since a sequence of hits may be only partially detected correctly.
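The comparison performed by the scoreboard, including the distinction between whole-transaction and single-hit matches, can be sketched as below. This is a simplified Python illustration under assumptions of our own: hits are represented as (column, row, ToT) tuples grouped by bunch crossing, and a transaction counts as a match only if all of its hits match.

```python
from collections import Counter

Hit = tuple[int, int, int]  # (column, row, ToT): an assumed, simplified hit record


def score(expected: dict[int, list[Hit]], actual: dict[int, list[Hit]]) -> dict[str, int]:
    """Compare predicted and observed hits, grouped by bunch crossing (bx)."""
    stats = {"matches": 0, "mismatches": 0, "hit_matches": 0, "hit_mismatches": 0}
    for bx in sorted(set(expected) | set(actual)):
        exp, act = Counter(expected.get(bx, [])), Counter(actual.get(bx, []))
        matched = sum((exp & act).values())  # hits present in both outputs
        stats["hit_matches"] += matched
        # missing hits plus unexpected hits
        stats["hit_mismatches"] += sum(exp.values()) + sum(act.values()) - 2 * matched
        stats["matches" if exp == act else "mismatches"] += 1
    return stats


expected = {0: [(1, 2, 5), (1, 3, 7)], 1: [(4, 4, 3)]}
actual = {0: [(1, 2, 5), (1, 3, 6)], 1: [(4, 4, 3)]}
print(score(expected, actual))
# {'matches': 1, 'mismatches': 1, 'hit_matches': 2, 'hit_mismatches': 2}
```

In the example, the wrong ToT of pixel (1, 3) makes bunch crossing 0 a mismatch and is counted as one missing plus one unexpected hit, while the other hits of the same transaction still count as single-hit matches.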

Figure 3.12: Output example of the scoreboard object


Test scenario

At the top of the framework block diagram there is the test scenario, defined by the uvm_test class, which enables the configuration of the verification components. The particular feature of this class is that it can be launched directly using command-line options and is automatically instantiated. It is then extended to obtain scenario-specific configurations of the environment. Different tests are provided; for the purpose of this work, top_test3 and triggerless will be used to generate transactions with the different classes of hits listed above, respectively with and without sending the trigger signal to the logic.

In addition to the statistically generated hits, it is also possible to import particle hits from external full detector/experiment Monte Carlo simulations, by directly interfacing the hit generator with the ROOT data analysis framework, in order to observe the real behaviour of the pixel readout chip.


Chapter 4 Device Under Test: CHIPIX65

The purpose of this work is estimating the digital power consumption of the readout

ASIC prototype designed in the framework of CHIPIX65 project [24]. Hence, before

presenting the obtained results, it is important to introduce the architecture of the

system that has been simulated to extract its full activity.

The demonstrator has been designed to guarantee an efficiency higher than 99% under the nominal HL-LHC conditions of 3 GHz/cm² hit rate and 1 MHz trigger rate with 12.5 µs trigger latency. The building blocks of the system (see Figure 4.1) are:

• a matrix of 64 × 64 pixels, each of 50 × 50 µm², grouped into 4 × 4 pixel regions;

• the column-based bias network of the pixels, based on a current mirror configuration;

• the End Of Column (EOC) readout logic, based on replicated modules referred to as Macro-Column Drainers (MCD). The MCDs store the triggered data coming from the pixel regions; the dispatcher module then pushes their contents into a common output FIFO, after which the data are 8b/10b encoded, assembled and fed to a high-speed serializer (integrated as a standalone macro);

• programmable 10-bit DACs, which provide all the reference voltages/currents required by the two different architectures of Analogue Front-Ends (AFEs) integrated on the chip, a Bandgap Voltage Reference (BVR) that generates a 4 µA reference current distributed to the global DACs, and a 12-bit ADC for monitoring the bias/reference currents and the bandgap reference voltage;

• Scalable Low-Voltage Signaling (SLVS) transmitters/receivers used to interface the core logic with the I/O pads. The transmitters convert Serial Peripheral Interface (SPI) signals into differential signals according to the SLVS standard, which allows high-speed (high-bandwidth), low-power and low-noise communication.

Figure 4.1: CHIPIX65 demonstrator layout [25]

The pixel array has been divided into 16 macro-columns, each of 16 Pixel Regions (PRs), with a dedicated readout stage in the periphery. Moreover, the matrix integrates two different AFE architectures (so-called flavors) working in parallel, one with an asynchronous and one with a synchronous hit discriminator. These are characterized by: a) very low power consumption (5 µA target current at 1.2 V power supply), b) low noise, c) compact design and d) 5-bit ToT computation [26].
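The organisation of the 64 × 64 matrix into 16 macro-columns of 16 pixel regions, each containing 4 × 4 pixels, implies a simple address arithmetic. The sketch below adopts the conventions of the harness script presented in Section 5.1.1 (row = region·4 + pixel row, column = macro-column·4 + pixel column, pixel index = pixel row·4 + pixel column, flavor TO on macro-columns 0 to 7 and PV on 8 to 15); these conventions come from that simulation script and may differ from the official chip addressing.

```python
PIX_PER_SIDE = 4     # pixels per pixel-region side (4 x 4 region)
REGIONS_PER_MC = 16  # pixel regions stacked in one macro-column
MCS_PER_FLAVOR = 8   # macro-columns per AFE flavor (TO: 0-7, PV: 8-15)


def to_hierarchy(row: int, col: int) -> tuple[str, int, int, int]:
    """Matrix (row, col) -> (flavor, macro-column within flavor, region, pixel index)."""
    mc, pixc = divmod(col, PIX_PER_SIDE)
    reg, pixr = divmod(row, PIX_PER_SIDE)
    flavor = "TO" if mc < MCS_PER_FLAVOR else "PV"
    return flavor, mc % MCS_PER_FLAVOR, reg, pixr * PIX_PER_SIDE + pixc


def to_matrix(flavor: str, mc: int, reg: int, pixel: int) -> tuple[int, int]:
    """Inverse mapping: hierarchy -> matrix (row, col)."""
    pixr, pixc = divmod(pixel, PIX_PER_SIDE)
    col_offset = 0 if flavor == "TO" else MCS_PER_FLAVOR
    return reg * PIX_PER_SIDE + pixr, (col_offset + mc) * PIX_PER_SIDE + pixc


n_rows = PIX_PER_SIDE * REGIONS_PER_MC      # 64
n_cols = PIX_PER_SIDE * 2 * MCS_PER_FLAVOR  # 64
# Round-trip check over all 4096 pixels.
assert all(to_matrix(*to_hierarchy(r, c)) == (r, c)
           for r in range(n_rows) for c in range(n_cols))
print(to_hierarchy(37, 42))  # ('PV', 2, 9, 6)
```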


Both solutions, which cover half of the matrix each, were successfully validated on silicon using dedicated small prototypes and tested after X-ray irradiation.

4.1 Pixel Region architecture

The minimum synthesizable entity of the digital architecture is the pixel region, designed to comply with the RD53 specifications, which includes 16 AFEs (one per pixel) grouped in 4 analog islands, as shown in Figure 4.2.

Figure 4.2: Analog islands in a PR

The advantage offered by a region-based digital architecture lies in the possibility of sharing common functionalities and temporary data-storage capabilities among pixels. For this reason, this architecture has been called centralized.

Figure 4.3: Block diagram of a distributed pixel region buffering architecture [17]


Hence, data from the pixels are stored in a single common FIFO and not in independent memories, as in the distributed architecture of the FE65P2 demonstrator, also developed in the framework of the RD53 Collaboration (see Figure 4.3). The considered architecture (reported in Figure 4.4) includes:

• 16 independent pixels;

• a "region digital writer";

• a "region digital buffer";

• a "region digital trigger matcher";

• a "region digital output".

The behaviour of each component is described below, focusing on the main signals involved.

Figure 4.4: Pixel Region architecture [17]

4.1.1 Pixel Logic

The pixel region, as mentioned above, is composed of 16 pixel entities which contain

both the AFE macro and a dedicated digital interface between the AFE in/out ports

and the rest of the logic. The task of the pixel logic is to compute the ToT information

within a fixed deadtime and to flag the end of processing to the shared digital logic.

If the ToT count ends before that fixed deadtime, pixels can neither write to the


shared buffer nor start a new count until the time runs out. As shown in Figure 4.5, this affects the hit loss depending on the length of such a time window, which can be set either to 5 or to 16 clock cycles.

Figure 4.5: Simulation results on hit loss due to pixel deadtime for both the slow and fast front-end modes [17]

In the fast front end, the total analog and digital deadtime equals only 5 clock cycles and the percentage of hit loss is smaller than 1%, whereas for the slow front end, whose deadtime is 15 clock cycles, it is greater than 2% [17].

Figure 4.6: Block diagram of a digital PUC

The block diagram of a digital Pixel Unit Cell (PUC) is reported in Figure 4.6. It

contains:

• a 5-bit ToT counter, driven by the output of the discriminator;

• a 4-bit deadtime counter, used to synchronize the write operation;

• a control logic for the ToT and deadtime counters.

The outputs of the "Pixel Logic" circuit are the ToT value and the hitReady signal

which flags the end of the process to the shared digital logic. In particular, it is sent

to the "Region Digital Writer", which reads the ToT value and resets the counter by sending back the "hitReset" signal. In this way, every pixel is synchronized and the shared logic can process the ToTs belonging to the same event together.

An example of ToT counting result is shown in Figure 4.7, where the differential

outputs of the AFE are highlighted in red. These signals enable the control logic of

the ToT counter to set the hit_disc signal that drives the count. Once the hit_disc is set

high, the ToT and deadtime counts (the latter set to 15 clock cycles) start. At the end, when the deadtime has elapsed, both counters are reset and are enabled again when the next hit arrives.
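The interplay between the ToT counter and the fixed deadtime counter can be captured by a small cycle-based model. The sketch below is a simplified behavioural illustration, not the CHIPIX65 RTL: the saturation of the 5-bit ToT, the rule that only the first discriminator pulse is counted, and the exact cycle at which hitReady is raised are assumptions.

```python
def pixel_tot(disc: list[int], deadtime: int = 15, tot_bits: int = 5) -> list[tuple[int, int]]:
    """Return (cycle at which hitReady is raised, ToT) for every accepted hit.

    disc holds the discriminator output sampled once per clock cycle (0 or 1).
    """
    tot_max = (1 << tot_bits) - 1
    results: list[tuple[int, int]] = []
    busy = counting = False
    tot = dead = prev = 0
    for cycle, d in enumerate(disc):
        if not busy and d and not prev:      # rising edge: start ToT and deadtime counters
            busy, counting, tot, dead = True, True, 0, 0
        if busy:
            if counting and d:
                tot = min(tot + 1, tot_max)  # ToT = cycles with the discriminator high
            else:
                counting = False             # first falling edge stops the ToT count
            dead += 1
            if dead == deadtime:             # fixed window elapsed: hitReady, then reset
                results.append((cycle, tot))
                busy = False
        prev = d
    return results


# A 3-cycle pulse followed, after three idle cycles, by a 2-cycle pulse.
disc = [0] + [1] * 3 + [0] * 3 + [1] * 2 + [0] * 20
print(pixel_tot(disc, deadtime=5))   # [(5, 3), (11, 2)]
print(pixel_tot(disc, deadtime=15))  # [(15, 3)]: the second pulse is lost
```

The second call reproduces, on a toy input, the mechanism behind the higher hit loss of the long deadtime setting: a pulse arriving while the pixel is still waiting for its window to elapse cannot start a new count.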

Figure 4.7: Timing diagram of ToT and fixed deadtime counters

4.1.2 Region Digital Logic

The regional logic is shared among pixels. It is composed of 4 modules working together to send hit data to the readout block at the chip periphery. The block diagram is reported in Figure 4.8.

Figure 4.8: Block diagram of a shared region logic


The module directly interfacing the pixels is the "region digital writer" (rdw). It synchronously checks the pixel ready flags and saves a reduced information packet into the region buffer: only the first 6 ToTs per event are saved, using a priority queue. The choice of limiting the number of stored hits to 6 has been made according to the histogram reported in Figure 4.9.

Figure 4.9: Histogram of number of hit pixels per pixel region (4×4) in the extreme scenario at the edges of the barrel [17]

The histogram represents the distribution of the number of pixels fired per region

in the worst case, i.e. in the extreme case of the detectors sitting at the edges of the

barrel, where elongated clusters cause bigger cluster sizes. Since the average number of pixels fired per region is lower than 4 even in this case, saving 6 ToTs per event is justified. With such an architecture, called zero-suppressed FIFO architecture, the memory usage is optimized because no unnecessary zeroes are written.

Figure 4.10: Hit loss due to buffer overflow for increasing values of buffer depth [17]


These packets are then saved into a "shared region digital buffer" (rdb) composed of 16 rows, which guarantees an overall inefficiency below 0.1% [17]. Each word consists of the following fields:

• 10-bit timestamp;

• 16-bit ToT map;

• 6 × 5-bit ToT values;

• 1 valid bit.

The timestamp gives information about the arrival time of the hits related to a certain collision and is compared with the trigger timestamp, sent by the EOC, in order to select the data of interest. The 16-bit memoryToTMap vector corresponds to the ready flags coming from the 16 pixels and is required to associate the saved ToTs with the proper pixels after data reduction.
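The resulting 57-bit buffer word (10 + 16 + 6 × 5 + 1 bits) and the zero suppression performed by the region digital writer can be sketched as below. The field order, the MSB-first packing, the choice of giving priority to the lowest pixel index and the convention that the ToT map flags only the stored pixels are all assumptions made for illustration; the actual RTL may differ in each of them.

```python
TS_BITS, MAP_BITS, TOT_BITS, N_TOTS = 10, 16, 5, 6
WORD_BITS = TS_BITS + MAP_BITS + N_TOTS * TOT_BITS + 1  # 57 bits


def pack_region_word(timestamp: int, tots: dict[int, int]) -> int:
    """Pack one region-buffer row: timestamp | ToT map | ToT[0..5] | valid (MSB first).

    tots maps pixel index (0-15) -> ToT of the ready pixels of this event.
    """
    stored = sorted(tots)[:N_TOTS]  # zero suppression: keep at most 6 ToTs (assumed priority)
    tot_map = sum(1 << p for p in stored)
    word = timestamp & ((1 << TS_BITS) - 1)
    word = (word << MAP_BITS) | tot_map
    for k in range(N_TOTS):
        value = tots[stored[k]] if k < len(stored) else 0
        word = (word << TOT_BITS) | (value & ((1 << TOT_BITS) - 1))
    return (word << 1) | 1          # valid bit set


def unpack_region_word(word: int) -> tuple[int, dict[int, int], int]:
    valid = word & 1
    word >>= 1
    values = []
    for _ in range(N_TOTS):
        values.append(word & ((1 << TOT_BITS) - 1))
        word >>= TOT_BITS
    values.reverse()                # the first packed ToT is the most significant one
    tot_map = word & ((1 << MAP_BITS) - 1)
    timestamp = word >> MAP_BITS
    pixels = [p for p in range(MAP_BITS) if (tot_map >> p) & 1]
    return timestamp, dict(zip(pixels, values)), valid


w = pack_region_word(513, {0: 3, 5: 12, 9: 31})
print(WORD_BITS, w.bit_length(), unpack_region_word(w))
```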

Figure 4.11: Arbitration logic among pixels in a PR to access the bus [25]

The task of selecting the ToTs of interest and marking the FIFO rows as "valid" or not (if the corresponding valid trigger window has elapsed) is performed by the "shared digital trigger matcher" (rdtm) module.

At the end, the "region digital output" (rdo) selects the triggered entries and sends them down to the macro-column. The column arbiter, shown in Figure 4.11, is based on a busy signal in a fast-or configuration: a region sends its triggered data as soon as every previous PR has finished transmitting.

An example of this behaviour is shown in Figure 4.12, where the input/output signals of the "region digital output" modules of 3 consecutive regions are highlighted with different colours: red for the PR with address 0 (PR#0), blue for the PR with address 1 (PR#1) and purple for the PR with address 2 (PR#2). We can observe that PR#2 accesses the shared bus because the regionBusyIn signal, coming from PR#3, is set to zero.

Figure 4.12: Example of bus contention among consecutive pixels

Once PR#2 has transmitted its data, it releases the bus and PR#1 gets access to it. The data packet regionOut consists of:

• 4-bit region address;

• 16-bit ToT map;

• 6 × 5-bit ToT values.
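The access order produced by the fast-or busy chain in the example above can be reproduced with a short model. In this sketch, which is an assumption-based illustration rather than the RTL, the regionBusyIn of a region is the OR of the pending requests of all regions with a higher address, and one regionOut packet (4 + 16 + 6 × 5 = 50 bits) is transferred per bus cycle.

```python
REGION_OUT_BITS = 4 + 16 + 6 * 5  # region address + ToT map + six 5-bit ToTs = 50 bits


def drain_order(pending: dict[int, int]) -> list[int]:
    """Order in which pixel regions access the macro-column bus.

    pending maps region address -> number of triggered packets to send.
    """
    pending = dict(pending)
    order: list[int] = []
    while any(pending.values()):
        for addr in sorted(pending):
            # fast-or of the requests coming from all higher-address regions
            busy_in = any(pending[a] for a in pending if a > addr)
            if pending[addr] and not busy_in:
                order.append(addr)  # this region drives the bus for one cycle
                pending[addr] -= 1
                break
    return order


print(REGION_OUT_BITS, drain_order({0: 1, 1: 2, 2: 1}))  # 50 [2, 1, 1, 0]
```

As in Figure 4.12, PR#2 is served first and PR#1 gains the bus only once PR#2 has released it.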

4.2 EOC readout

The readout of the zero-suppressed hit data coming from the PRs is performed by the "EOC readout" logic, a digital block placed at the chip periphery. As shown in Figure 4.13, the readout of triggered data is based on replicated modules referred to as Macro-Column Drainers (MCD).


Figure 4.13: Block diagram of the EOC readout logic [27]

Each MCD module includes two FIFOs. The "trigger FIFO" temporarily stores triggers and trigger timestamps to be sent to the PRs while a macro-column readout is still ongoing. Being 16 rows deep, it supports a maximum of 16 consecutive triggers. The second FIFO, the so-called "drain FIFO", is also 16 rows deep and is used to buffer hit packets drained from the PRs upon a trigger request.

The timestamp of each collision and the trigger timestamp, both distributed as buses to the macro-columns, are generated by binary counters that take into account the trigger latency of 12.5 µs and the fixed deadtime.
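With the 25 ns bunch crossing period, a trigger latency of 12.5 µs corresponds to 500 bunch crossings, which fits within one wrap-around of a 10-bit timestamp (1024 values). The sketch below shows the resulting modular arithmetic; it assumes that both counters advance once per bunch crossing, and extra_offset is a hypothetical placeholder for fixed corrections such as the deadtime, whose exact treatment in the EOC logic is not reproduced here.

```python
BX_NS = 25.0                         # LHC bunch-crossing period [ns]
TS_BITS = 10                         # timestamp width stored in the region buffer
LATENCY_BX = round(12.5e3 / BX_NS)   # 12.5 us trigger latency -> 500 bunch crossings


def trigger_timestamp(current_ts: int, extra_offset: int = 0) -> int:
    """Timestamp of the collision selected by a trigger arriving now (wrapping counter)."""
    return (current_ts - LATENCY_BX - extra_offset) % (1 << TS_BITS)


print(LATENCY_BX, trigger_timestamp(700), trigger_timestamp(100))  # 500 200 624
```

The second example shows why the counter wrap-around must be handled: a trigger arriving at timestamp 100 refers to a collision stamped 624 in the previous counter cycle.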

Data stored in the replicated MCDs are polled and read by a finite state machine, which loops over the MCD FIFOs and pushes their contents into a common output "dispatcher FIFO" (32 rows deep). Finally, data packets with 8b/10b encoding are assembled and fed to a high-speed serializer (320 MHz clock) integrated as a standalone macro.
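The polling performed by the dispatcher state machine can be sketched as a round-robin loop over the MCD FIFOs. The policy shown here (one word moved per visit, empty MCDs skipped, polling halted when the 32-row output FIFO is full and no downstream read is modelled) is an assumption for illustration; the actual state machine may, for instance, drain a whole MCD before moving on.

```python
from collections import deque


def dispatch(mcd_fifos: list[deque], out_depth: int = 32) -> list:
    """Round-robin polling of Macro-Column Drainer FIFOs into the dispatcher FIFO."""
    out: list = []
    idx = 0
    while len(out) < out_depth and any(mcd_fifos):
        fifo = mcd_fifos[idx]
        if fifo:
            out.append(fifo.popleft())    # move one word from this MCD
        idx = (idx + 1) % len(mcd_fifos)  # then poll the next MCD
    return out


mcds = [deque(["a0", "a1"]), deque(), deque(["c0"])]
print(dispatch(mcds))  # ['a0', 'c0', 'a1']
```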


Chapter 5 Power analysis results

The power consumption analysis, described in Section 3.2, starts with the simulation of the DUT using the VEPIX53 framework. This is necessary to verify the proper behaviour of the circuit and then to annotate its full activity. Before presenting the results, therefore, we summarize the simulation flow.

5.1 Simulation flow

The main steps required for the simulation are:

1. the initial compilation of the project library;

2. the elaboration of the design hierarchy;

3. the simulation.

After the compilation of the project library, which contains all the source files of the DUT, the elaboration step creates the simulatable model of the complete design and testbench environment, called snapshot. Two snapshots are generated: a primary snapshot considering only the top-level DUT and a secondary snapshot with the whole class-based verification environment.

Such an approach, reported in Figure 5.1, is called Multi-Snapshot Incremental Elaboration (MSIE). In this approach it is essential to separate the fixed part of the design from the one that changes often, e.g. the type of test to be performed. In this way, a minor change in the secondary snapshot enables a fast rebuild of the environment, without re-elaborating the whole complex DUT.


Figure 5.1: Multi Snapshot Incremental Elaboration flow [28]

The simulation flow is handled through the make command. This command automatically builds executable programs and libraries from source code by reading files called makefiles, which specify how to derive the target program. A specific script (target) is executed by typing make followed by its name in a terminal. In the work directory, i.e. the location where the simulation is supposed to be run, two main files (linked together) can be found:

1. the Makefile, which defines the variables used in the compiling and running scripts contained in the .defs file. This file defines:

• a pointer to all the source code related to the DUT, pixel_chip_harness;

• a variable that points to the top level module of the project, called test-

bench, which instantiates the harness (containing the DUT);

• a variable that points to all the packages of the different components in-

cluded in the VEPIX53 framework (see Section 3.3);

• a pointer to the package that holds all the tests;

• a variable gathering the paths to the directories to be included when the

verification environment source code is compiled;


• a variable that specifies the test to be run. For our purposes it is set as:

RUNTESTS_OPTS = +UVM_TESTNAME=top_test3

2. the Makefile.defs, which contains scripts that simply refer to the previously listed variables. In detail, it contains:

• the variables related to the considered UVM version (v1.1) and pointing

to the UVM library location;

• the variable containing the installation path of the Cadence Incisive sim-

ulation tool;

• a script (launched with make clean) for cleaning all the results of a previous simulation;

• a script for compiling the project library, that creates the library where

all the source files related to the DUT are contained (and to which the

following commands refer). For example, to perform a post place&route

simulation of the chip, the plib_pnr_pnr script has been defined pointing

to the post place&route netlists of the DUT;

• a script (started with make run1) for generating the primary snapshot with the top-level module of the project;

• a script for generating the secondary snapshot and running the simulation. In this step, all the packages of the verification environment and the specified test are compiled. To open the Graphical User Interface (GUI), use make rungui_bak; otherwise, use make run2 to see the simulation results in the shell.

During the execution of the scripts, output .log files are produced (namely compile.log, irun1.log and irun2.log) and can be found in the work directory. They allow the user to verify the correct outcome of each step through the warning and error messages, which can be found by searching for the keywords *W (for warnings) and *E (for errors).
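Searching the three logs by hand after every run is repetitive, so the check can be automated. The sketch below is a helper of our own, not part of the VEPIX53 flow: it scans compile.log, irun1.log and irun2.log in the work directory for the *W and *E keywords mentioned above and prints a per-file summary. The function names are hypothetical, and any lines used to test it should be considered invented examples.

```python
import re
from pathlib import Path

# Keywords mentioned above: "*W" marks a warning and "*E" an error.
PATTERN = re.compile(r"\*([WE])\b")


def summarize_log(text: str) -> dict[str, list[str]]:
    """Collect warning (W) and error (E) lines from the text of a log file."""
    found: dict[str, list[str]] = {"W": [], "E": []}
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            found[m.group(1)].append(line.strip())
    return found


def check_work_dir(work_dir: str = ".") -> int:
    """Print a summary for each log of the flow; return the total number of errors."""
    errors = 0
    for name in ("compile.log", "irun1.log", "irun2.log"):
        path = Path(work_dir) / name
        if not path.exists():
            print(f"{name}: not found")
            continue
        found = summarize_log(path.read_text(errors="replace"))
        print(f"{name}: {len(found['W'])} warnings, {len(found['E'])} errors")
        for line in found["E"]:
            print("   ", line)
        errors += len(found["E"])
    return errors


if __name__ == "__main__":
    check_work_dir()
```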


5.1.1 Pixel Chip Harness

The pixel_chip_harness module interfaces the device under test with the VEPIX53 simulation framework. As shown in Figure 5.2, this module contains:

• the DUT instantiation;

• the generation of clock/reset signals;

• the connection of the UVM interfaces with the DUT performing the communi-

cation with VEPIX53.

Figure 5.2: Connection between DUT and the simulation environment

Moreover, in this module we can set the value of the configuration signals that control the behaviour of the chip, such as the "HIGH_DEADTIME" and the "TRIGGERLESS" signals. For our purposes we have:

wire HIGH_DEADTIME = 1'b1;

wire TRIGGERLESS = 1'b0;

In this way, we use a deadtime of 15 clock cycles (the time available for each pixel logic to provide the ToT value) and a triggered approach to read out the data produced by each collision.


To guarantee a correct simulation, the Standard Delay Format (SDF) annotation of the timing delays among standard cells has been performed. Such a file normally includes path delays, timing constraint values, interconnect delays and high-level technology parameters. The annotation is performed through the $sdf_annotate system task. As an example, the code needed to annotate the SDF file of the EOC module is reported below:

initial begin
  $sdf_annotate("pnr_sdf/CHIPIX_EOC.sdf",                 // SDF file
                pixel_chip_tb.harness.pixel_chip.CORE.EOC, // annotated instance
                ,                                          // no SDF configuration file
                "sdf.log",                                 // log file
                "MAXIMUM");                                // delay selection
end

In this code we can observe the following parameters:

• the name of the SDF file;

• the hierarchical name of the module instance that we want to annotate;

• an empty argument, since no SDF configuration file is used;

• the name of the log file containing the status information, warning and error messages from the "SDF Annotator";

• the delay value that we want to annotate: "MAXIMUM" stands for the annotation of the maximum delay values.

To show the effect of the SDF annotation, the timing diagram of the ToT counter with and without such annotation is reported in Figure 5.3.

The last important step inside the harness is to assign the output values of the ToT converters to each pixel of the matrix. After that, the DUT can be tested under different operating conditions. The ToT_converter module, which emulates the AFE behaviour for simulation purposes, generates the discriminator output "disc_out", i.e. the input pulse for the digital part of the PUC. This input corresponds to the AFE output called "pixel_in", which signals to the digital logic that the pixel has been hit.

These assignments are performed by defining a text macro called LIST, whose body is produced by a Python script using print, as reported below:


Figure 5.3: Timing diagram of ToT counter (a) without SDF annotation (b) with SDF annotation

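#!/usr/bin/env python3
# Print the body of the LIST text macro: one assign per pixel, driving the AFE
# output "pixel_in" of each pixel with the corresponding digital_hit entry.
for flavor in ["TO", "PV"]:
    col_offset = 0 if flavor == "TO" else 8  # PV macro-columns follow the TO ones
    for col in range(8):                     # macro-column within the flavor
        for reg in range(16):                # pixel region within the macro-column
            for pixr in range(4):            # pixel row inside the 4x4 region
                for pixc in range(4):        # pixel column inside the 4x4 region
                    # "\Name[i] " is a Verilog escaped identifier (ended by a space);
                    # the trailing backslash continues the text macro on the next line.
                    print("assign pixel_chip_tb.harness.pixel_chip.CORE.MATRIX."
                          "\\MacroColumn_%s[%d] .\\PixelRegion[%d] .\\pixel[%d] "
                          ".AFE.pixel_in = digital_hit[%d][%d]; \\"
                          % (flavor, col, reg, pixr * 4 + pixc,
                             reg * 4 + pixr, (col_offset + col) * 4 + pixc))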

In this way, the script generates the assignments of the input pulses for all 4096 pixels of the matrix (16 PRs × 16 MCs × 16 pixels). These assignments are saved in a file included in the harness. To produce such a file, the following command must be run from the terminal:

./<file.py> > <out_file.res>

This approach is convenient given the large number of assignments to be produced.
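Because the index arithmetic of the generator is easy to get wrong, a quick consistency check is useful. The sketch below uses a hypothetical helper, pixel_mapping, which reuses the same index expressions as the generator script. It verifies that the 4096 assignments drive each digital_hit[row][column] entry exactly once, i.e. that the mapping covers a 64 × 64 array without collisions.

```python
def pixel_mapping():
    """Yield (flavor, col, reg, pixel, row, column) as indexed by the generator."""
    for flavor in ["TO", "PV"]:
        col_offset = 0 if flavor == "TO" else 8
        for col in range(8):
            for reg in range(16):
                for pixr in range(4):
                    for pixc in range(4):
                        yield (flavor, col, reg, pixr * 4 + pixc,
                               reg * 4 + pixr, (col_offset + col) * 4 + pixc)

entries = list(pixel_mapping())
# Distinct digital_hit[row][column] targets reached by the assignments
targets = {(row, column) for *_, row, column in entries}
print(len(entries), "assignments,", len(targets), "distinct digital_hit entries")
```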


5.2 Input files for power analysis

As mentioned in Section 3.2, several files are needed to perform power analysis with the Cadence Voltus engine: the netlist, the SPEF files, the DEF files, the VCD files (which provide the activity of the DUT) and the technology libraries.

5.2.1 SPEF extraction

A SPEF (Standard Parasitic Exchange Format) file contains the parasitic information of the module under test. It is extracted with "Cadence Encounter", the tool used for the place and route step of the design.

First of all, the last version of the design, saved after the placement and routing steps, has to be restored. It should be underlined that several files can be obtained, containing the design after each step of the flow. Hence it is important to choose the right one; otherwise the names of the modules/nets saved into the SPEF file will not match the names that the tool finds in the netlist.

To restore the design, we click the "File" tab and then "Restore Design...". This opens the dialog box in Figure 5.4. At this point we select "Encounter" as data type and insert the .enc file of interest.

Figure 5.4: Restore design

Now it is possible to extract the SPEF and DEF files (the latter are discussed in Section 5.2.2). The extraction can be done through the Graphical User Interface (GUI) or with the following text command in the Encounter command line:

rcOut -spef ./PixelRegion_tc.spef -rc_corner RC_TYPICAL


The required parameters are:

• the name of the output file;

• the RC corner that is used to generate the parasitics output file.

The "RC corner" contains the Resistance-Capacitance (interconnect) model provided by the "qrcTechFile". Such corners are available for the worst case, the typical case and the best case (here, the best case means the lowest parasitic values).

The same operation can be done through the GUI by clicking the "Timing" tab and then "Extract RC...". This opens the dialog box in Figure 5.5, where we can write the name of the SPEF file and set the parasitics corner to be used.

Figure 5.5: Save SPEF

5.2.2 DEF generation

A DEF (Design Exchange Format) file represents the physical layout of an IC in ASCII format. It can be generated from Encounter after restoring the DUT as explained above. In this case too, we can use either the GUI or the following text command:

defOut -floorplan -netlist ./PixelRegion.def

In addition to the filename, the following options should be set:

• -floorplan to write the floorplan data in the DEF file. This information includes

chip size and all placed standard cells;

• -netlist to write the netlist (i.e. the routing connectivity information) in the DEF

file.


Using the GUI, we click the "File" tab, then the "Save" button, and select "DEF". The dialog box in Figure 5.6 then appears, where the name of the file and the extraction options can be chosen.

Figure 5.6: Save DEF

5.2.3 VCD generation

The most important inputs for obtaining correct power results from the Voltus engine are the VCD files, in which the full activity of the DUT is annotated. To create such files, we have to run the simulation for the desired time window with the "Cadence Incisive" simulation tool. Before running the simulation, the following steps are performed in the Incisive command line:

1. open a VCD database specifying the .vcd filename:

database -open db -vcd -into ./PixelRegion.vcd -default

2. probe the object (module) to save into the database, i.e. the signals which will be traced:

probe -create pixel_chip_tb.harness.pixel_chip.CORE.PixelRegion

3. include all "child" scopes in the VCD, i.e. annotate all module signals, by adding the following options to the probe command:

-database db -depth all

To check the annotation, the VCD file can be converted into a file with .trn extension and inspected with the Cadence SimVision debug tool.
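Conceptually, what the power engine extracts from a VCD file is the switching activity of each net. As an illustration only (this is not how Voltus parses VCD internally), the following Python sketch counts the bit toggles per signal in a small VCD dump. It assumes one $var declaration per line and handles only scalar, vector (b...) and real (r...) value changes. The sample dump and the function name are hypothetical.

```python
def count_toggles(vcd_text):
    """Count bit toggles per signal in a VCD dump (one $var declaration per line)."""
    names, last, toggles = {}, {}, {}
    in_header = True
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if in_header:
            if tok[0] == "$var":  # $var <type> <width> <id> <name> [range] $end
                names[tok[3]] = tok[4]
                toggles.setdefault(tok[4], 0)
            elif tok[0] == "$enddefinitions":
                in_header = False
            continue
        kind = tok[0][0]
        if kind in "#$":  # timestamps and $dumpvars/$end keywords
            continue
        if kind in "bB":  # vector change: b<bits> <id>
            value, ident = tok[0][1:].lower(), tok[1]
        elif kind in "rR":  # real change: r<number> <id>
            value, ident = tok[0][1:], tok[1]
        else:  # scalar change: <0|1|x|z><id>
            value, ident = kind.lower(), tok[0][1:]
        prev = last.get(ident)
        if prev is not None:
            if kind in "rR":
                flips = int(prev != value)
            else:  # compare bit by bit, left-padding with 0 (VCD rule for 0/1 values)
                width = max(len(prev), len(value))
                flips = sum(a != b for a, b in zip(prev.zfill(width), value.zfill(width)))
            toggles[names[ident]] += flips
        last[ident] = value
    return toggles

SAMPLE = """$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$var reg 4 % cnt [3:0] $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 %
$end
#12
1!
#25
0!
b1 %
#37
1!
#50
0!
b10 %
"""

print(count_toggles(SAMPLE))  # -> {'clk': 4, 'cnt': 3}
```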


5.3 Average power consumption

The first step of this work is the analysis of the average power consumption of the digital pixel array of the CHIPIX65 chip, considering the final netlist (post place&route) of a PR (the basic block of the matrix). Hence, a "signoff" analysis has been performed to verify the RD53 specification on digital power consumption (< 5 µW/pixel [29]) and to study the impact of the different parts of the logic on power consumption.

The presented results have been obtained under different corners and activity

conditions in order to assess power impact of different aspects. The two considered

corners, taken from the technology libraries, are:

1. the typical corner: Resistance-Capacitance (interconnects) typical model (qrcTechFile), typical standard cell technology library, operating voltage equal to 1.2 V and operating temperature equal to 25 °C;

2. the worst corner (powerwise): Resistance-Capacitance (interconnects) best model (qrcTechFile), best (i.e. fast) standard cell technology library, operating voltage equal to 1.32 V and operating temperature equal to 0 °C.

The activity conditions taken into account are: (a) the operating condition with extreme hit rate (3 GHz/cm²) and trigger rate (1 MHz); (b) the same operating condition as (a) but without triggers; (c) the situation in which only the clock signal is sent to the logic. These activities correspond to different VCD files (see Section 5.2.3), extracted for the logic related to both AFEs, developed by the Torino (AFE_TO) and Pavia (AFE_PV) INFN institutes (present in equal parts in the pixel array).

Once the design (netlist, DEF, SDC, SPEF, technology files) has been loaded into the Cadence Voltus tool, we need:

1. to set the corner (analysis view) for the analysis and the output directory:

set_power_analysis_mode -analysis_view MMMC_TYPICAL

set_power_output_dir $REPORT_OUTPUT_DIR

2. to read the activity of the DUT saved in a VCD file and to set the simulation window of 500 µs over which the power is averaged:

read_activity_file -format VCD $VCD_FILE_PATH_NAME \
    -scope pixel_chip_tb.harness.pixel_chip.CORE.PixelRegion \
    -start 250 -end 500000

3. to generate the power report, setting the name of the output file:

report_power -outfile power.rpt -view MMMC_TYPICAL

In this file, in addition to the power results which will be reported below, we can

check:

• the technology library used;

• the power domain:

Rail: VDD Voltage: 1.2

• the parasitics (SPEF) and DEF files;

• the switching activity file, the time window and the design annotation coverage. The coverage between the signals saved into the VCD file and the signals of the DUT has to be 100 % to ensure the correctness of the results.

Tables 5.1 and 5.2 report the power results under different activity conditions for the logic related to the AFE_TO and to the AFE_PV, respectively.

Table 5.1: Digital average power estimations for the logic relating to the AFE_TO in the TYPICAL case

Activity               Logic                Clock Tree           Total       per Pixel
Hits and Triggers      42.34 µW (31.38 %)   92.61 µW (68.62 %)   134.9 µW    8.43 µW
Only Clock             35.6 µW (27.95 %)    91.7 µW (72.05 %)    127.3 µW    7.95 µW
Hits and No Triggers   37.83 µW (29.18 %)   91.77 µW (70.82 %)   129.6 µW    8.1 µW

These results, extracted in the typical case, show the power contributions of the clock tree and of the rest of the logic. It should be underlined that the power consumption per pixel has been obtained simply by dividing the total power of the PR by the number of pixels it contains (16).
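The per-pixel figures and percentages in Table 5.1 follow from simple arithmetic on the Logic and Clock Tree contributions. The sketch below reproduces them to within rounding. The function name summarize is hypothetical, and the input values are those of Table 5.1.

```python
PIXELS_PER_PR = 16

# (logic, clock tree) contributions in µW, AFE_TO logic, typical corner (Table 5.1)
TABLE_5_1 = {
    "Hits and Triggers": (42.34, 92.61),
    "Only Clock": (35.6, 91.7),
    "Hits and No Triggers": (37.83, 91.77),
}

def summarize(logic_uw, clock_uw, pixels=PIXELS_PER_PR):
    """Total PR power, power per pixel and clock-tree share of the total."""
    total = logic_uw + clock_uw
    return {
        "total_uW": total,
        "per_pixel_uW": total / pixels,
        "clock_share_pct": 100 * clock_uw / total,
    }

for condition, (logic, clock) in TABLE_5_1.items():
    s = summarize(logic, clock)
    print(f"{condition:22s} total={s['total_uW']:.1f} µW  "
          f"per pixel={s['per_pixel_uW']:.2f} µW  clock tree={s['clock_share_pct']:.1f} %")
```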


Table 5.2: Digital average power estimations for the logic relating to the AFE_PV in the TYPICAL case

Activity               Logic     Clock Tree   Total       per Pixel
Hits and Triggers      35 %      65 %         121.8 µW    7.61 µW
Only Clock             30.92 %   69.08 %      112 µW      7 µW
Hits and No Triggers   33 %      67 %         115.77 µW   7.24 µW

The same analysis has been performed in the worst case (see Tables 5.3 and 5.4). This corner should not be considered realistic for all the standard cells in the design. It must nevertheless be taken into account to assess the worst possible impact of technology variations.

Table 5.3: Digital average power estimations for the logic relating to the AFE_TO in the WORST case (powerwise)

Activity               Logic     Clock Tree   Total       per Pixel
Hits and Triggers      33.67 %   66.33 %      176.2 µW    11 µW
Only Clock             29.5 %    70.5 %       164.2 µW    10.26 µW
Hits and No Triggers   31.25 %   68.75 %      168.5 µW    10.53 µW

Table 5.4: Digital average power estimations for the logic relating to the AFE_PV in the WORST case (powerwise)

Activity               Logic     Clock Tree   Total       per Pixel
Hits and Triggers      38.72 %   61.28 %      167.6 µW    10.48 µW
Only Clock             34.5 %    65.5 %       153.2 µW    9.58 µW
Hits and No Triggers   36.78 %   63.22 %      159.2 µW    9.95 µW

In both corners, the power consumption decreases by about 6-8 % going from the activity condition where hits and triggers are enabled to the case in which only the clock is sent to the logic. It decreases by about 4-5 % when only the hits are enabled (no triggers). The most noticeable result, however, is that the power burnt by the clock tree is dominant. The clock tree is the most active part of the design, it has to drive long wires (and thus large capacitances), and it often needs buffers with relatively high driving strength. This issue is becoming typical in modern microelectronics processes: scaled transistors tend to consume less and less power by themselves, whereas the interconnects (above all the clock network, at around 50 %) play a significant role in increasing power consumption.

Figure 5.7: Floorplan of the final RD53 prototype [30]

Before investigating this high contribution of the clock tree, the per-pixel results have been scaled to the full pixel matrix of the final RD53A prototype. This matrix consists of 400 × 192 pixels (20 × 9.6 mm²), as shown in Figure 5.7. The resulting data are reported in Table 5.5.

Table 5.5: Digital average power estimations for the full pixel matrix (400 × 192 pixels)

Activity            typ           worst
Hits and Triggers   0.32 W/cm²    0.43 W/cm²
Only Clock          0.3 W/cm²     0.4 W/cm²
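The values in Table 5.5 can be reproduced from the per-pixel results under two assumptions that are not stated explicitly above. First, a 50 µm × 50 µm pixel pitch, which is consistent with 400 × 192 pixels covering 20 × 9.6 mm². Second, an equal share of AFE_TO and AFE_PV logic, so that the two per-pixel values are averaged. A minimal sketch follows; the printed total matrix power is a derived quantity.

```python
PIXEL_PITCH_CM = 50e-4                # assumed 50 µm pixel pitch, in cm
PIXEL_AREA_CM2 = PIXEL_PITCH_CM ** 2  # 2.5e-5 cm²
COLS, ROWS = 400, 192                 # RD53A matrix size quoted in the text

def density_w_per_cm2(per_pixel_uw):
    """Convert a per-pixel power in µW into a power density in W/cm²."""
    return per_pixel_uw * 1e-6 / PIXEL_AREA_CM2

# Per-pixel power (µW) of the (AFE_TO, AFE_PV) logic, from Tables 5.1-5.4
PER_PIXEL = {
    ("typ", "Hits and Triggers"): (8.43, 7.61),
    ("typ", "Only Clock"): (7.95, 7.0),
    ("worst", "Hits and Triggers"): (11.0, 10.48),
    ("worst", "Only Clock"): (10.26, 9.58),
}

for (corner, activity), (to, pv) in PER_PIXEL.items():
    avg = (to + pv) / 2  # assumption: TO and PV logic in equal parts
    matrix_w = avg * 1e-6 * COLS * ROWS
    print(f"{corner:5s} {activity:18s} {density_w_per_cm2(avg):.2f} W/cm², "
          f"{matrix_w:.2f} W for the whole matrix")
```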


Recall that the power budget (not only digital) at the HL-LHC for the innermost layer of the detector in ATLAS and CMS is less than 0.5 W/cm² [31].

Two studies have been carried out to understand the reasons for such a high power consumption per pixel. The first evaluates the influence of the number of bits chosen to digitize the signal coming from the AFE of each pixel. The second analyses the clock tree implemented within a PR.

5.3.1 Influence of signal digitization

When a particle hits the pixels, the charge collected by the sensor is digitized as a ToT value, provided by a counter in the hit logic of each pixel. In this prototype the number of bits chosen for such digitization is 5.

To quantify the difference in power consumption with respect to a 4-bit digitization, the design flow of the chip has been repeated considering 4-bit ToTs. The results of this analysis are reported in Tables 5.6 and 5.7.

Table 5.6: Comparison between the digital average power estimations considering 4-bit and 5-bit ToT for the logic relating to the AFE_TO in the TYPICAL case

                    4-bit ToT              5-bit ToT
Activity            Total      per Pixel   Total      per Pixel
Hits and Triggers   128.1 µW   8 µW        134.9 µW   8.43 µW
Only Clock          120.4 µW   7.53 µW     127.3 µW   7.96 µW

Table 5.7: Comparison between the digital average power estimations considering 4-bit and 5-bit ToT for the logic relating to the AFE_PV in the TYPICAL case

                    4-bit ToT              5-bit ToT
Activity            Total      per Pixel   Total      per Pixel
Hits and Triggers   114.3 µW   7.14 µW     121.8 µW   7.61 µW
Only Clock          105.3 µW   6.58 µW     112 µW     7 µW


We can observe that the power reduction obtained by using one flip-flop less in each ToT counter is only about 5 % for both logics under test. For this reason, such a reduction has not been applied.

Table 5.8: Post-placement density check

             Density
             TO        PV
4-bit ToT    71.14 %   66.12 %
5-bit ToT    77.47 %   72.3 %

Table 5.8 reports the post-placement density, i.e. the percentage of the area dedicated to each PR that is occupied by macros and standard cells after placement. With 4-bit ToT counters, the density decreases by about 6 %.

5.3.2 Clock tree analysis

Because the power burnt by the clock tree is dominant with respect to the rest of the logic, the clock tree within a PR has been analysed in detail. The Cadence Encounter tool can display the clock tree architecture. This shows a large number of buffers, needed to drive long wires and therefore high capacitances (see Figure 5.9). After restoring the design, we click the "Clock" tab and then "Debug Clock Tree...". This opens the dialog box in Figure 5.8, where we can choose the name of the clock which drives the clock tree.

Figure 5.8: Clock debugging


The buffers have different strengths and are placed automatically by Encounter. They are taken from the standard cell library provided by the foundry, which reports information such as the cell size, fan-out and leakage power consumption.

Figure 5.9: Clock tree within a PR

Once the clock has been selected, the contribution of the buffers to the power consumption can be analysed. As reported in Table 5.9, in the normal activity condition (with both hits and triggers), about 50 % of the power can be associated with buffers.

Table 5.9: Digital average power estimations due to buffers within the clock tree in the TYPICAL case

              TO                   PV
Clock Tree    92.61 µW (68.62 %)   79.2 µW (65 %)
CT Buffers    68 µW (50.4 %)       56.92 µW (46.71 %)


5.4 Peak power analysis

The goal of power optimization is not only to reduce the average energy consumption but also to quantify and limit digital power fluctuations as much as possible. Indeed, a major concern is the presence of unavoidable digital power variations. These could couple into the analog domain and, if higher than the current provided to the serial chain, would cause chip failure (see Section 2.3).

For this reason, it is necessary to provide information about the dynamic power dissipation of the chip. This information can be used to size the decoupling capacitances, inside the chip and on the pixel module, needed to filter these short dynamic current peaks. Figure 5.10 shows the low-pass filtering from the chip to the serial power network provided by on-chip decoupling, shunt-LDO decoupling and on-module decoupling.
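To a first approximation, each decoupling level behaves as a low-pass filter on the current drawn by the logic. The following sketch models a single level as a first-order RC filter. The time constant and the current profile are hypothetical: a burst lasting one 25 ns clock period on top of a flat load. The sketch shows how short peaks are attenuated while slow variations pass through. It is not a model of the actual RD53 decoupling network.

```python
def first_order_lowpass(samples, dt, tau):
    """Discrete first-order RC low-pass: y[n] = y[n-1] + dt/(tau+dt) * (x[n] - y[n-1])."""
    alpha = dt / (tau + dt)
    y = samples[0]  # start from the steady state of the first sample
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

# Hypothetical current profile (arbitrary units) sampled every 1 ns:
# a flat load with a burst lasting one 25 ns clock period.
profile = [1.0] * 100 + [5.0] * 25 + [1.0] * 100
filtered = first_order_lowpass(profile, dt=1.0, tau=50.0)
print("peak before filtering:", max(profile), "after:", round(max(filtered), 2))
```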

Figure 5.10: Decoupling capacitances at different levels of pixel chip

To perform such an analysis the report_vector_profile command, characterized by

the following parameters, has to be executed by using the "Voltus Command Line":

• -event_based_peak_power, to capture power profiles using very small resolutions

(step in ns);

• -write_profiling_db, to write the profiling databases;

• -outfile, to define the name of a report where time windows with maximum

power consumption are provided;

• -step, to set the analysis resolution (expressed in nanoseconds);

• -nworst, to set the number of time windows reported in the output file.


An example of the output file is reported in Figure 5.11, where the 20 time windows with maximum power have been saved (-nworst 20).
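The meaning of the -step and -nworst options can be illustrated with a short sketch. A piecewise-constant power trace, as produced by an event-based analysis, is averaged over windows of fixed length, and the windows with the highest average power are listed. This only illustrates the concept and is not the algorithm implemented by report_vector_profile. The function names and the example trace are hypothetical, and times are integers in ns.

```python
import math

def window_profile(events, step, t_end):
    """Average power in consecutive windows of `step` ns.

    events: time-sorted (time_ns, power) pairs with integer times; each power
    value holds until the next event (piecewise-constant trace).
    """
    energy = [0.0] * math.ceil(t_end / step)
    for (t0, power), (t1, _) in zip(events, events[1:] + [(t_end, None)]):
        t = t0
        while t < t1:  # split the segment at window borders
            w = t // step
            seg_end = min(t1, (w + 1) * step)
            energy[w] += power * (seg_end - t)
            t = seg_end
    return [e / step for e in energy]

def nworst(profile, step, count):
    """Return (window start in ns, average power) for the `count` highest windows."""
    ranked = sorted(range(len(profile)), key=lambda w: profile[w], reverse=True)
    return [(w * step, profile[w]) for w in ranked[:count]]

# Hypothetical trace (µW): a 2 ns spike at 30 ns and a 1 ns burst at 75 ns
EVENTS = [(0, 10.0), (30, 50.0), (32, 10.0), (75, 40.0), (76, 10.0)]
profile = window_profile(EVENTS, step=25, t_end=100)
print(nworst(profile, step=25, count=2))
```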

Figure 5.11: Output file from dynamic power analysis with a resolution of 1 ns

In this work we focused on the PR with the logic dedicated to interface the

AFE_TO (characterized by the higher average power consumption), considering the

same activity conditions and the same simulation window of 500 µs used for the

previous analysis.

Figure 5.12: Power profiles in the activity condition with hits and triggers (zoom on shorter simulation window)

The power peaks have been studied at different time scales:

• power variations at short time constants, within the clock cycle (~ps to 25 ns), which should be filtered out by on-chip decoupling;

• power variations at longer time constants (~1-10 µs), seen at the powering system level.

After the extraction of the profiling databases, the .trn files can be opened with the SimVision interface through the command simvision power_profile.trn. This produces the histograms reported in Figures 5.12, 5.13 and 5.14.

Figure 5.13: Power profiles in the activity condition with only clock sent to the logic (zoom on shorter simulation window)

Figure 5.14: Power profiles in the activity condition with hits and no triggers (zoom on shorter simulation window)

The right side of each figure reports the maximum percentage of the average power reached by the peaks. The upper right corner of each histogram shows the maximum power required by the PR under test. As expected, the hits (coming from each collision) have the largest impact on the peaks. In the activity condition without hits and triggers, the peaks are lower, as only the clock is sent to the logic.


5.5 Outcome of the power analysis and optimisation

From the obtained results, we can evaluate which techniques could be used to re-

duce the average power consumption of the digital pixel array. Moreover these

techniques have to be selected based on the requirements of the target application.

As shown in the power histogram in Figure 5.15 the power consumption of the

design can be divided into different components:

• the internal power, based on multi-variable models of the standard cells. It includes both P_short-circuit, dissipated by the instantaneous current between supply and ground during a switch of state (see 2.1.1), and P_switching of the internal nodes;

• the leakage power, due to a combination of parasitic currents of the CMOS device;

• the switching power, due to the charge and discharge of load capacitances.

It is evident that power consumption is dominated by the dynamic component,

due to the high rates and continuous operation in nominal conditions. Therefore,

power reduction techniques discussed are mainly targeted to dynamic power opti-

misation.

Figure 5.15: Power histogram related to a PR tested with extreme hit rate and trigger rate

Some potentially highly effective design techniques for dynamic power reduction, also useful for leakage, cannot be used for the RD53 chip. Examples are multiple supply voltages and dynamic voltage and frequency scaling (see 2.2.2). Because of system considerations, only one digital supply (1.2 V) is planned, and the frequency is fixed to 40 MHz in the pixel matrix. It should also be highlighted that such a prototype will operate in a very hostile radiation environment, causing considerable performance degradation. For this reason, the supply voltage has been chosen to limit performance degradation after irradiation, and design simplicity has been preferred. Should radiation effects prove less critical than currently expected, a lower voltage for the digital logic (1-0.8 V) may be tested for potential use in future versions.

Figure 5.16: Timing diagram of ToT counter

Therefore, a suitable technique for reducing dynamic power in these digital circuits is clock gating, i.e. masking the clock of synchronous circuits during the idle state in order to avoid unnecessary switching. In fact, the hit rate per pixel (75 kHz) is significantly lower than the clock frequency of 40 MHz. This technique can be applied manually at RTL level or by the synthesis tool (e.g. "Cadence RTL Compiler"), which can implement clock gating by recognising sequential logic that features enabling logic. Clock gating can be implemented inside the pixel logic to prevent the clock from being propagated to the ToT counter when the counter is not enabled. Indeed, the timing diagram in Figure 5.16 shows the behaviour of the ToT counter without such a technique.
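A rough upper bound on the benefit can be estimated from the rates quoted above. The sketch assumes non-overlapping hits, each keeping the ToT counter enabled for a given number of clock cycles. These cycle counts are hypothetical, since the ToT duration depends on the charge. The sketch computes the fraction of clock edges that would still reach the counter with clock gating. It ignores the power of the added gating cells, which is the trade-off discussed next.

```python
F_CLK_HZ = 40e6     # pixel matrix clock frequency (from the text)
HIT_RATE_HZ = 75e3  # hit rate per pixel (from the text)

def gated_clock_fraction(hit_rate_hz, f_clk_hz, active_cycles_per_hit):
    """Fraction of clock edges still reaching the ToT counter with clock gating.

    Assumes non-overlapping hits, each enabling the counter for
    `active_cycles_per_hit` clock cycles; without gating the fraction is 1.
    """
    return min(1.0, hit_rate_hz * active_cycles_per_hit / f_clk_hz)

for cycles in (4, 8, 16):  # hypothetical average ToT durations, in clock cycles
    kept = gated_clock_fraction(HIT_RATE_HZ, F_CLK_HZ, cycles)
    print(f"{cycles:2d} cycles/hit: {100 * kept:.2f} % of the clock edges kept")
```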

When the synthesis tool is used, the number of gateable flip-flops can be set. It is important to find a trade-off between the power saved thanks to the reduced activity and the power consumed by the additional cells.

Moreover, because of the high consumption of the clock tree (see the histogram in Figure 5.17), clock tree optimisation can be an important means of power reduction. Power savings can be achieved by modifying the Clock Tree Synthesis (CTS) options, i.e. the target clock skew and the maximum transition time allowed. This task is performed after placement by the Cadence Encounter tool. Without skew and transition time constraints, the tool buffers the clock nets in order to balance the skew and minimise the insertion delay. Hence, by default, the clock tree engine is mostly timing driven.

Figure 5.17: Impact of clock tree on power consumption


Chapter 6 On chip data clustering

An extremely challenging requirement for the HL-LHC is the development not only of a low-power system but also of a low-mass one. The aim is highly "transparent" detectors, maximizing the interaction of particles with the active part of the sensors while minimizing similar interactions with auxiliary material. In fact, each detector contains a huge amount of electronic components, cables, cooling and mechanical infrastructure, which impact the material budget.

In the context of the RD53 Collaboration (i.e. at readout chip level), it is possible to contribute to a low-mass system by compressing the readout data, in order to reduce the output rate and therefore the number of links used. It should be recalled that the phase 2 pixel detector will have to sustain pixel hit rates of up to 3 GHz/cm² in the innermost layer. Moreover, for a 2 cm × 2 cm pixel chip and a 1 MHz trigger rate, a readout bandwidth of up to 4 Gbit/s will be required [32].

Figure 6.1: Pixel readout system with E-links to opto-conversion modules [32]


Since electrical links (E-links) with a bandwidth of 1.28 Gbit/s are used for data readout, 3-4 E-links are needed for each Pixel Read-Out Chip (PROC). These links are then connected to opto-conversion modules, the so-called GigaBit Transceivers (GBT). These modules convert the data into high-rate optical links at 10 Gbit/s to cover the large distance (> 100 m) and carry the very high data rates towards the Data AcQuisition (DAQ) system.
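The number of E-links follows directly from the bandwidth figures quoted above. The sketch also shows how a given bandwidth reduction from data compression could lower the link count. The 25 % reduction in the example is hypothetical.

```python
import math

ELINK_GBPS = 1.28  # bandwidth of one E-link (from the text)

def elinks_needed(bandwidth_gbps, link_gbps=ELINK_GBPS, reduction=0.0):
    """Number of links for a readout bandwidth, after an optional fractional reduction."""
    return math.ceil(bandwidth_gbps * (1.0 - reduction) / link_gbps)

print(elinks_needed(4.0))                  # -> 4 (4 / 1.28 = 3.125)
print(elinks_needed(4.0, reduction=0.25))  # -> 3, hypothetical 25 % reduction
```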

6.1 RD53A prototype

The search for an efficient data compression algorithm starts from knowing how the input data come out of the pixel matrix of the full-scale demonstrator chip RD53A, where the algorithm could be implemented. In fact, given the finite storage capability at the periphery of the chip, such data must be processed before new inputs arrive, since these cannot be lost.

As shown in Figure 6.2, the matrix is composed of 400 × 384 pixels divided into 50 columns of 48 basic modules, the so-called digital cores. Each core contains 16 Pixel Regions (PRs) of 4 pixels.

Figure 6.2: Pixel matrix architecture for the RD53A prototype

Digital cores within a column share the same bus, and access to the bus is handled by a priority scheme. At digital core level, the priority decreases from top to bottom; within a digital core, it decreases from the PR with address 0 to the PR with address 15. Once a PR takes control of the bus, the following information is sent through the column:

1. 16 bits of ToT (4 bits × 4 pixels);

2. 5 bits of trigger identifier;

3. 10 bits identifying the core address (the 6 MSBs) and the PR address (the 4 LSBs).

These fields form a 31-bit word, which is then stored in one of the 50 FIFOs (one per column) downstream of the matrix. At this point, the idea is to implement a data compression algorithm in the module downstream of each FIFO, processing one word at a time.
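The content of one column word can be sketched as a bit-packing routine. The field widths (16 + 5 + 10 = 31 bits) come from the list above. The order of the fields within the word is not specified here, so the MSB-to-LSB order used below (ToT, trigger identifier, core, PR) is an assumption. The function names are also hypothetical.

```python
# Field widths from the text; the MSB-to-LSB order is an assumption.
FIELDS = (("tot", 16), ("trigger_id", 5), ("core", 6), ("pr", 4))

def pack_word(tots, trigger_id, core, pr):
    """Pack one PR record (4 ToT values + identifiers) into a 31-bit integer."""
    if len(tots) != 4 or not all(0 <= t < 16 for t in tots):
        raise ValueError("expected four 4-bit ToT values")
    tot_field = 0
    for t in tots:  # pixel A ends up in the most significant nibble
        tot_field = (tot_field << 4) | t
    values = {"tot": tot_field, "trigger_id": trigger_id, "core": core, "pr": pr}
    word = 0
    for name, width in FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name}={values[name]} does not fit in {width} bits")
        word = (word << width) | values[name]
    return word

def unpack_word(word):
    """Inverse of pack_word."""
    fields = {}
    for name, width in reversed(FIELDS):  # least significant field first
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    tot = fields.pop("tot")
    fields["tots"] = [(tot >> shift) & 0xF for shift in (12, 8, 4, 0)]
    return fields

word = pack_word([13, 15, 15, 15], trigger_id=3, core=30, pr=12)
print(f"{word:031b}", unpack_word(word))
```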

6.2 Data compression techniques

The aim of data compression is to reduce redundancy in stored or communicated data, in order to increase the effective data density. In general, data compression consists of two steps:

• the encoding phase, where the "compressed" representation of the original message is generated;

• the decoding phase, where the original message is reconstructed from the compressed representation.

Among the several algorithms that can be found in the literature, "minimum redundancy encoding" (also known as Huffman encoding) and Run-Length Encoding (RLE) have been investigated.

6.2.1 Huffman encoding

Huffman encoding belongs to a class of data compression algorithms, called lossless, which allow the perfect reconstruction of the original information from the compressed data.


The key idea of the algorithm is the assignment of variable-length codes to the input symbols that form the source alphabet. The length of each codeword is based on the frequency (weight) of the corresponding character. The most frequent character is encoded with the smallest number of bits, whereas the least frequent character gets the longest codeword.

Moreover, the Huffman dictionary is prefix-free: the code assigned to one character is not a prefix of the code assigned to any other character [33]. In this way, the Huffman algorithm allows perfect decoding of the compressed bitstream.

Figure 6.3: Example of Huffman tree

The Huffman code (see Fig. 6.3) can be generated by building the Huffman tree and traversing it from the root to each symbol. A 0 is output every time a left-hand branch is taken, and a 1 every time a right-hand branch is taken. Considering the probabilities reported beside each symbol, we can observe that the most likely symbol is encoded with the smallest number of bits.
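The tree construction described above can be sketched in a few lines of Python with a priority queue. The two least probable subtrees are repeatedly merged, prepending 0 to the codewords of one and 1 to those of the other. Ties may be broken differently from Figure 6.3, so individual codewords can differ while the average codeword length remains optimal. The weights in the usage example are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Build a prefix-free Huffman code {symbol: bitstring} from {symbol: weight}."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the single symbol a 1-bit code
        return {sym: "0" for sym in weights}
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)   # the two least probable subtrees
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}        # left branch -> 0
        merged.update({s: "1" + c for s, c in right.items()})  # right branch -> 1
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

def encode(message, code):
    return "".join(code[s] for s in message)

def decode(bits, code):
    inverse, out, buf = {c: s for s, c in code.items()}, [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # prefix-freeness makes the first match the right one
            out.append(inverse[buf])
            buf = ""
    return out

code = huffman_code({"a": 4, "b": 3, "c": 2, "d": 1})
print(code)  # the most frequent symbol "a" gets the shortest codeword
```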

6.2.2 Run-Length encoding

Run-length encoding is a very simple form of lossless data compression. To explain the concept behind this technique, let p be the probability of a favorable event and q = 1 − p the probability of an unfavorable event. If p = q = 1/2, the two possible outcomes can be represented using 0 and 1. However, if p ≫ q, this "direct coding" method is less efficient than encoding the run lengths between successive unfavorable events [34]. In general, therefore, the technique consists of counting the number of consecutive identical data values.
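The idea can be sketched as follows. A binary sequence of favorable (0) and unfavorable (1) events is encoded as the lengths of the runs between unfavorable events. A generic form that counts consecutive identical values is also shown. Function names and the example sequence are hypothetical.

```python
from itertools import groupby

def run_lengths(bits):
    """Lengths of the runs of favorable events ('0') closed by unfavorable ones ('1').

    The last element is the trailing run, which is not closed by a '1'.
    """
    runs, current = [], 0
    for b in bits:
        if b == "1":
            runs.append(current)
            current = 0
        else:
            current += 1
    runs.append(current)
    return runs

def expand(runs):
    """Inverse of run_lengths: rebuild the original sequence."""
    return "1".join("0" * r for r in runs)

def rle(values):
    """Generic run-length encoding: (value, number of consecutive repeats) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

print(run_lengths("0000010000000110000"))  # -> [5, 7, 0, 4]
print(rle("aaabcc"))                        # -> [('a', 3), ('b', 1), ('c', 2)]
```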


6.3 Compression strategy

In the current version of the RD53A readout chip, the data coming from the matrix and stored in the downstream FIFOs are packed into 32-bit words before being sent to the "Aurora_TX_frame" module. This module implements Aurora, a link-layer communication protocol developed by Xilinx that moves data point-to-point across one or more high-speed serial links. Since each frame is composed of 66-bit data blocks, the Aurora transmitter module puts together two consecutive 32-bit words and encodes them into a 66-bit block. For this reason, the data compression algorithms presented below maintain the 32-bit output data format. This allows their possible implementation on the prototype and a correct evaluation of their efficiency with respect to the current output packet (see Tab. 6.1).

Table 6.1: Current 32-bit output packet

Symbol       Bit configuration   Description
(COL)        cccccc              column address
(CORE,REG)   ccccccrrrr          core address, region address
(A B C D)    aaaabbbbccccdddd    PR’s ToT values

6.3.1 Run-Level clustering

Within the 32-bit output data packet, 10 bits are spent to identify the PR to which the ToT values are related. The idea of the algorithm we call Run-Level clustering is to compress this information by forming clusters of PRs, each identified by the address of its first PR. In particular, data bits are saved by identifying the following PRs of the same cluster through the number of empty PRs (unfavorable events) since the previous full PR (favorable event).

As shown in the dictionary reported in Table 6.2, the number of bits used to encode the runs of empty PRs has been chosen as 3. This limits the run value to at most 5, because the words "0111" and "0110" define the end of cluster and the end of column, respectively. Hence, when the run counter reaches the maximum value, a new cluster starts.

The information about the end of column is necessary to control the intra-column scanning of the data related to a certain trigger (or cluster). The end of trigger, instead, is produced by the last "Run-Level concentrator" containing data from the selected collision. A 32-bit word identifying the next trigger is then generated.

Table 6.2: Run Level Encoding

Symbol        Codeword            Description
(COL)         cccccc              column address
(CORE,REG)    ccccccrrrr          new cluster address
(1,A B C D)   1aaaabbbbccccdddd   PR’s ToT values
(0,RUN)       0rrr                runs of empty PRs
(EOK)         0111                end of cluster
(EOC)         0110                end of column
(EOT)         0000                end of trigger

An example of this technique is shown in Figure 6.4, where the digital core contains two full PRs. Instead of encoding the address of the last PR with 10 bits, only 4 bits are used. These define the number of empty PRs between the two full PRs (in this case 2).

Symbol        Codeword
(COL)         000000
(CORE,REG)    011110 1100
(1,A B C D)   1 1101111111111111
(0,RUN)       0 010
(1,A B C D)   1 1111111111111100
(EOK)         0111

Figure 6.4: Application of Run Level encoding
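The encoding of Table 6.2 can be sketched as a short Python function that reproduces the bitstream of Figure 6.4. Two details are assumptions not fixed by the text. First, runs are counted on the linear PR index core × 16 + region. Second, a new cluster is started when the run exceeds 5, the largest encodable value. The function name is hypothetical, and the end-of-column and end-of-trigger words are left to the caller.

```python
EOK = "0111"   # end of cluster (Table 6.2)
MAX_RUN = 5    # largest run value: 0110 and 0111 are reserved

def bits(value, width):
    return format(value, f"0{width}b")

def run_level_encode(column, hits):
    """Run-Level clustering of the full PRs read out from one column.

    hits: (core, region, [ToT A, B, C, D]) tuples in readout-priority order.
    Assumption: runs are counted on the linear index core * 16 + region.
    """
    out = [bits(column, 6)]                                  # (COL)
    prev = None
    for core, region, tots in hits:
        index = core * 16 + region
        run = None if prev is None else index - prev - 1     # empty PRs skipped
        if run is None or run > MAX_RUN:
            if prev is not None:
                out.append(EOK)                              # close previous cluster
            out.append(bits(core, 6) + bits(region, 4))      # (CORE,REG)
        else:
            out.append("0" + bits(run, 3))                   # (0,RUN)
        out.append("1" + "".join(bits(t, 4) for t in tots))  # (1,A B C D)
        prev = index
    if prev is not None:
        out.append(EOK)
    return "".join(out)

hits = [(30, 12, [13, 15, 15, 15]), (30, 15, [15, 15, 15, 12])]  # Figure 6.4
stream = run_level_encode(0, hits)
print(len(stream), "bits vs", 32 * len(hits), "bits in the current format")
```

With the inputs of Figure 6.4, the function returns 58 bits, against 2 × 32 = 64 bits in the current format of Table 6.1.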

The correctness of the decoding has been verified with Matlab scripts which emulate both the encoding and decoding steps and calculate the efficiency with respect to the current data format. The data coming out of a column of cores have been extracted from realistic simulations of the circuit and saved into Comma-Separated Values (CSV) files. Particle hits have been imported from detailed physics/sensor Monte Carlo simulations related to pixel readout chips of the CMS experiment (with 50 × 50 µm² and 25 × 100 µm² pixels).

Figure 6.5: AFE behaviours

To emulate the functionality of the AFE in generating the output signal "Disc_out", needed for calculating the ToT value (charge digitization), three possible behaviours have been taken into account (see Fig. 6.5), all featuring a linear charge-to-ToT conversion:

• the slowA mode: charge range of 4500 electrons;

• the slowB mode: charge range of 7500 electrons;

• the slowC mode: charge range of 35000 electrons.

As expected and shown by histograms in Figure 6.6, the occurrence probability

of small ToT values increases with higher ranges due to the shorter duration of the

discriminator pulse given the same input charge.

Figure 6.6: Histograms of ToT values probability in a) slowA, b) slowB and c) slowC modes

In the results which will be presented in Section 6.4, two additional compression

approaches have been analysed. Both of them are related to the Run-Level clustering

technique and based on the occurrence probability of ToT values.


6.3.2 Modified Run-Level clustering

The approach we call modified Run-Level clustering maps the ToT values so that fewer than 4 bits are used to encode the minimum and maximum ToT values. After the information about the column and the PR address, the flag bit "H" indicates the start of the hitmap. Moreover, if the cluster is made of more than one PR, the run sequence has to be followed by the flag bit "F", which signals the presence of a new PR. The description of each symbol of the dictionary is reported in Table 6.3.

Table 6.3: Modified Run Level Encoding

Symbol | Codeword | Description
(COL) | cccccc | column address
(CORE,REG) | ccccccrrrr | new cluster address
(F) | 1 | new PR
(H) | 0 | start hitmap
(Z) | 0 | zero ToT
(M) | 11 | max ToT
(10,VAL) | 10vvvv | ToT value
(0,RUN) | 0rrr | runs of empty PRs
(EOK) | 0111 | end of cluster
(EOC) | 0110 | end of column
(EOT) | 0000 | end of trigger

To highlight the difference with respect to the classic Run-Level encoding, the same example is reported in Figure 6.7.

Symbol Codeword
(COL) 000000
(CORE,REG) 011110 1100
(H) 0
(10,VAL) 10 1101
(M) 11
(M) 11
(M) 11
(0,RUN) 0 010
(F) 1
(H) 0
(M) 11
(M) 11
(M) 11
(10,VAL) 10 1110
(EOK) 0111

Figure 6.7: Application of modified Run Level encoding
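The encoding of Figure 6.7 can be reproduced with a minimal Python sketch of a modified Run-Level encoder and decoder, in the spirit of the Matlab emulation used for verification (the original scripts are not reproduced here). Codeword widths come from Table 6.3 and the symbol order follows Figure 6.7. Several details are assumptions of this sketch: the function names, the (empty_before, tots) input format, the rule that adjacent PRs are separated only by (F), and the limitation of runs to 1 to 6.

```python
# Hypothetical sketch of the modified Run-Level encoding (Table 6.3).
# Field widths come from Table 6.3; the symbol order follows Figure 6.7.
F, H, EOK = "1", "0", "0111"  # new PR, start of hitmap, end of cluster


def encode_tot(tot: int) -> str:
    """(Z) for ToT 0, (M) for ToT 15, (10,VAL) with a 4-bit value otherwise."""
    if not 0 <= tot <= 15:
        raise ValueError("ToT must fit in 4 bits")
    if tot == 0:
        return "0"                      # (Z)
    if tot == 15:
        return "11"                     # (M)
    return "10" + format(tot, "04b")    # (10,VAL)


def encode_cluster(col, core, reg, regions):
    """Encode one cluster.

    regions: list of (empty_before, tots). tots holds the 4 ToT values of a
    hit PR; empty_before counts the empty PRs skipped since the previous hit
    PR (ignored for the first PR, which is addressed by (CORE,REG)).
    Assumptions: adjacent PRs are separated only by (F); runs are limited to
    1..6 because 0111 is the (EOK) codeword.
    """
    bits = [format(col, "06b"), format(core, "06b") + format(reg, "04b")]
    for i, (empty_before, tots) in enumerate(regions):
        if len(tots) != 4:
            raise ValueError("each PR carries 4 ToT values")
        if i > 0:
            if empty_before:
                if not 1 <= empty_before <= 6:
                    raise ValueError("run length not supported by this sketch")
                bits.append("0" + format(empty_before, "03b"))  # (0,RUN)
            bits.append(F)
        bits.append(H)
        bits.extend(encode_tot(t) for t in tots)
    bits.append(EOK)
    return "".join(bits)


def decode_cluster(bits):
    """Inverse of encode_cluster under the same assumptions."""
    col, core, reg = int(bits[0:6], 2), int(bits[6:12], 2), int(bits[12:16], 2)
    pos, empty_before, regions = 16, 0, []
    while True:
        if bits[pos] != H:
            raise ValueError("expected (H) at bit %d" % pos)
        pos += 1
        tots = []
        for _ in range(4):
            if bits[pos] == "0":            # (Z)
                tots.append(0)
                pos += 1
            elif bits[pos + 1] == "1":      # (M)
                tots.append(15)
                pos += 2
            else:                           # (10,VAL)
                tots.append(int(bits[pos + 2:pos + 6], 2))
                pos += 6
        regions.append((empty_before, tots))
        if bits[pos] == F:                  # an adjacent PR follows
            empty_before, pos = 0, pos + 1
            continue
        word, pos = bits[pos:pos + 4], pos + 4
        if word == EOK:
            return col, core, reg, regions
        empty_before = int(word[1:], 2)     # (0,RUN)
        if bits[pos] != F:
            raise ValueError("expected (F) after a run")
        pos += 1
```

For example, `encode_cluster(0, 30, 12, [(0, [13, 15, 15, 15]), (2, [15, 15, 15, 14])])` reproduces the 51-bit sequence of Figure 6.7, and `decode_cluster` returns the same arguments, which mirrors the encode/decode round-trip check described above.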


6.3.3 Run-Level clustering with Huffman

Another variant of the Run-Level clustering applies Huffman coding to the 16 possible ToT values. The difference with respect to the modified Run-Level encoding is that the set of codewords is not fixed: it is derived from the distribution of the input data.
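How a data-dependent dictionary is obtained can be sketched with the textbook Huffman construction: repeatedly merge the two least frequent subtrees. The minimal Python version below is an illustration, not the procedure used to produce Table 6.4. The weights would be ToT occurrence counts from the Monte Carlo data, and the function name is hypothetical. Because ties can be broken differently, the resulting codewords may differ from Table 6.4 even for the same data, although the codeword lengths remain optimal.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Build a binary Huffman code for {symbol: weight}; returns {symbol: bits}."""
    if not weights:
        return {}
    if len(weights) == 1:                     # degenerate case: a single symbol
        return {sym: "0" for sym in weights}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)     # the two least frequent subtrees
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]
```

The most frequent ToT values receive the shortest codewords. For weights {15: 3, 0: 2, 14: 1, 3: 1}, for instance, the codeword lengths are 1, 2, 3 and 3 bits.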

Table 6.4: Example of Huffman dictionary

ToT value | Codeword
0 | [1 0 1 0 1 0 1]
1 | [1 0 1 0 1 0 0]
2 | [1 0 0 1 0]
3 | [1 0 1 1]
4 | [0 1 1 0]
5 | [0 1 1 1]
6 | [0 1 0 1]
7 | [1 0 0 0]
8 | [0 1 0 0 1]
9 | [1 0 1 0 0]
10 | [1 0 0 1 1]
11 | [0 1 0 0 0 0]
12 | [1 0 1 0 1 1]
13 | [0 1 0 0 0 1]
14 | [1 1]
15 | [0 0]
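Because a Huffman dictionary is prefix-free, a receiver can decode the bit stream without separators by matching one codeword at a time. The minimal Python sketch below transcribes the example dictionary of Table 6.4 and uses it. The helper names are hypothetical, and a hardware decoder would more likely use a lookup table than string matching.

```python
# ToT -> codeword, transcribed from the example dictionary of Table 6.4.
TABLE_6_4 = {
    0: "1010101", 1: "1010100", 2: "10010", 3: "1011",
    4: "0110", 5: "0111", 6: "0101", 7: "1000",
    8: "01001", 9: "10100", 10: "10011", 11: "010000",
    12: "101011", 13: "010001", 14: "11", 15: "00",
}


def encode_tots(tots, code=TABLE_6_4):
    """Concatenate the codewords of a sequence of ToT values."""
    return "".join(code[t] for t in tots)


def decode_tots(bits, code=TABLE_6_4):
    """Decode a bit string produced with a prefix-free dictionary."""
    inverse = {c: t for t, c in code.items()}
    tots, current = [], ""
    for b in bits:
        current += b
        if current in inverse:    # prefix-free: the first match is the symbol
            tots.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return tots
```

With this dictionary the four ToT values 15, 15, 15, 14 take 8 bits ("00000011") instead of the 16 bits needed with fixed 4-bit ToTs.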

6.4 Data compression results

These three data compression proposals, which could be implemented within the EOC of RD53A, have been compared with the current data formatting. From the available Monte Carlo data, about 800 triggers have been considered to evaluate the data compression efficiency η defined in Eq. 6.1, and both the encoding and decoding steps have been implemented.

$$\eta = \frac{\text{bits after compression} - \text{bits before compression}}{\text{bits before compression}} \times 100\% \tag{6.1}$$
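Eq. 6.1 amounts to a one-line computation. The small Python helper below is a sketch with a hypothetical name. It is applied to the four ToT values of the first PR in Figure 6.7 (13, 15, 15, 15), which take 16 bits as fixed 4-bit ToTs and 6 + 2 + 2 + 2 = 12 bits with the codewords of Table 6.3.

```python
def compression_efficiency(bits_before: int, bits_after: int) -> float:
    """Eq. 6.1: relative change of the bit count, in percent (negative = saving)."""
    if bits_before <= 0:
        raise ValueError("bits_before must be positive")
    return (bits_after - bits_before) / bits_before * 100.0
```

For the 16-bit to 12-bit example, the helper returns −25.0. For a whole data set, the bit counts would be summed over all triggers before the ratio is taken.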

The efficiencies for the two pixel sizes and the different charge-to-ToT relations are reported in Tables 6.5 and 6.6; negative values indicate a reduction of the number of bits.


We can observe that dedicated ToT codewords achieve larger bit reductions than the basic Run-Level clustering approach, whose saving depends strictly on the cluster size. In fact, the basic approach saves bits only by replacing the core and PR address (10 bits) with the smaller number of bits (4 bits) needed to encode the run of empty PRs between consecutive hit PRs.
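The dependence on cluster size can be made explicit with a rough Python model that counts only the address fields. It assumes that the current format repeats the 10-bit core and PR address for every hit PR, whereas Run-Level clustering sends it once and then at most a 4-bit run for each following PR. Flags, ToT payload and terminators are ignored, and the function name is hypothetical.

```python
ADDRESS_BITS = 10  # core + PR address, per the text
RUN_BITS = 4       # (0,RUN) codeword, per the text


def addressing_bits(n_prs: int) -> tuple:
    """Address bits for a cluster of n_prs hit PRs: (full address per PR, Run-Level).

    Simplified model (assumption): every PR after the first costs one 4-bit run
    instead of a full 10-bit address; all other fields are ignored.
    """
    if n_prs < 1:
        raise ValueError("a cluster has at least one PR")
    full = ADDRESS_BITS * n_prs
    run_level = ADDRESS_BITS + RUN_BITS * (n_prs - 1)
    return full, run_level
```

Under this model the saving is 6 × (n_prs − 1) bits: a single-PR cluster gains nothing, and a four-PR cluster needs 22 address bits instead of 40.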

Table 6.5: Data compression efficiencies on data from PROCs with 50 × 50 µm² pixels

Approach | SlowA | SlowB | SlowC
Run-Level clustering | −3.8% | −3.9% | −4%
Run-Level clustering modified | −8.4% | −8.5% | −11.9%
Run-Level clustering + Huffman | −18% | −15.9% | −20.1%

Moreover, both variants exploit the occurrence probability of the ToT values. In particular, the modified Run-Level approach assigns short codewords to the minimum (0) and maximum (15) ToT values because, according to the Monte Carlo data, they are the most likely values. It can be seen as a simplified Huffman approach, in which the codeword dictionary is defined in advance and is therefore easier to implement in hardware.
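The trade-off behind the fixed dictionary can be quantified through the expected codeword length per ToT. The Python sketch below uses the lengths of Table 6.3: 1 bit for ToT 0, 2 bits for ToT 15, and 6 bits for any other value. It evaluates them on hypothetical ToT counts that do not come from the Monte Carlo data. The function names are also hypothetical.

```python
def modified_length(tot: int) -> int:
    """Codeword length of a ToT value in the modified Run-Level dictionary."""
    if tot == 0:
        return 1   # (Z)
    if tot == 15:
        return 2   # (M)
    return 6       # (10,VAL)


def average_bits(counts, length):
    """Expected bits per ToT for {tot: occurrence count} and a length function."""
    total = sum(counts.values())
    return sum(n * length(t) for t, n in counts.items()) / total


# Hypothetical distribution: 30% zeros, 30% saturated ToTs, the rest spread evenly.
counts = {0: 42, 15: 42, **{t: 4 for t in range(1, 15)}}
```

For these counts `average_bits(counts, modified_length)` gives 3.3 bits per ToT, compared with 4 bits for a fixed-length code. More generally, with probabilities p0 and p15 the expected length is 6 − 5·p0 − 4·p15, so this dictionary beats fixed 4-bit ToTs only when 5·p0 + 4·p15 > 2. This explains why it relies on the minimum and maximum values being the most likely ones.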

Table 6.6: Data compression efficiencies on data from PROCs with 25 × 100 µm² pixels

Approach | SlowA | SlowB | SlowC
Run-Level clustering | −8.9% | −9% | −9.3%
Run-Level clustering modified | −16.9% | −17% | −20.6%
Run-Level clustering + Huffman | −33.3% | −30.3% | −30.7%


Conclusions

The main focus of this thesis has been the analysis of the power consumption of the digital pixel array of the CHIPIX65 prototype, developed in the framework of the RD53 Collaboration. First, the design flow of the pixel readout chip has been reproduced to extract the files to be imported into the simulation and power analysis tools. The required files are the SPEF files with parasitic information, the SDC files containing the timing constraints, and the netlist of the DUT at different levels of the flow. Secondly, the simulation and verification framework VEPIX53 has been used to check the correct behaviour of the logic under different activity conditions and for different corners of the CMOS technology. The DUT has been linked to the simulation platform through an appropriate harness module and stimulated with random hits and trigger signals (3 GHz/cm² hit rate and 1 MHz trigger rate). After this verification, the full activity of the chip has been annotated into VCD files, which form the basis of the final power estimations. Simulation results show that the average power consumption of the pixel array is dominated by the dynamic component, due to the high rates and continuous operation in nominal conditions. It should be underlined, however, that the clock tree accounts for more than 60% of the total power required by the pixel array. This is explained by the fact that the clock tree is the most active part of the system and that the clock distribution drives long wires and large capacitive loads.

Because the goal of the optimization is not only to reduce the average power consumption but also to limit the fluctuations of the digital logic, a power peak analysis has been performed. To this purpose, power profiles indicating the maximum current drawn by the logic at different timing resolutions have been extracted. This information has been used by the power network designers to filter the current peaks with proper decoupling.

In conclusion, once the main power consumption factors had been identified, a bibliographic research has been carried out to find the power design techniques most suitable for this context. Taking the paper [35] as a reference, effective techniques to reduce the power dissipated by the clock net have also been suggested. These include the "custom" clock gating technique, which avoids unnecessary switching by disabling the clock during idle states and can be implemented at RTL level or automatically by the synthesis tool. Moreover, the clock tree could be optimised by exploring different constraints for the CTS, e.g. the maximum allowed transition time, the target clock skew or the type of buffers used within the clock network. Finally, results for the proposed on-chip data clustering algorithms have been described.


References

[1] L. Rossi, P. Fischer, T. Rohe and N. Wermes, Pixel Detectors: From Fundamentals

to Applications. Berlin, DE: Springer, 2006.

[2] G. F. Knoll, Radiation detection and measurement, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2010.

[3] T. Hemperek, "Hybrid or Monolithic? Pixel detectors for future LHC experiments", Univ. Bonn, DE, December 2013. [Online]. Available: https://indico.cern.ch/event/273886/

[4] J. Christiansen, "TDC architectures in ASIC’s", CERN, Geneva, CH, November 2011. [Online]. Available: https://indico.cern.ch/event/122027/contributions/88189/attachments/69314/99333/ASIC_TDC_FEE2011.pdf

[5] CERN, The HL-LHC project. [Online]. Available: https://hilumilhcds.web.cern.ch/about/hl-lhc-project

[6] J. Christiansen, "ATLAS/CMS/LCD RD53 collaboration: Pixel readout integrated circuits for extreme rate and radiation. ATLAS and CMS phase 2 pixel upgrades", CERN, Geneva, CH, April 2015. [Online]. Available: https://indico.cern.ch/event/381514/contributions/901442/attachments/760389/1043057/RD53_overview_april_2015.pdf

[7] J. Rabaey, Low Power Design Essentials. Berlin, DE: Springer, 2009.


[8] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits", IEEE J. Solid-State Circuits, pp. 468-473, Aug. 1984.

[9] J. Rabaey and M. Pedram, Low power design methodologies, Kluwer interna-

tional series in engineering and computer science. Kluwer Academic Publish-

ers, 1996.

[10] D. Dobberpuhl and R. Witek, "A 200 MHz 64b Dual-Issue CMOS Microprocessor", IEEE J. Solid-State Circuits, pp. 106-107, 1992.

[11] M. Munch, B. Wurth, R. Mehra, J. Sproch and N. Wehn, "Automating RT-Level Operand Isolation to Minimize Power Consumption in Datapaths", presented at 2000 Design, Automation and Test in Europe. [Online]. Available: https://ems.eit.uni-kl.de/fileadmin/ems/pdf/date00_09a_4.pdf

[12] T. Kapilachander, I. Hameem Shanavas and V. Venkataraman, "Technical Study on Low Power VLSI methods", I.J. Information Engineering and Electronic Business, 2012. [Online]. Available: http://www.mecs-press.org/ijieeb/ijieeb-v4-n1/IJIEEB-V4-N1-8.pdf

[13] D. Ta, T. Stockmanns, F. Hügging, P. Fischer, J. Grosse-Knetter, O. Runolfsson and N. Wermes, "Concept, realization and characterization of serially powered pixel modules (serial powering)", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 565, no. 1, pp. 113-118, 2006.

[14] L. Feld, "Novel Powering Schemes for SLHC Tracking Detectors", IEEE Special Focus Workshop Detector Developments for the SLHC, 2008. [Online]. Available: https://web.physik.rwth-aachen.de/service/wiki/pub/Feld/FeldVortraege/IEEE_2008_SLHC_Powering_LF.pdf

[15] S. Marconi, "Simulation, optimization and design of hybrid pixel array logic for extreme hit and trigger rates for the High Luminosity - Large Hadron Collider", Ph.D. First-Year report, Dept. Elect. Eng., Univ. Perugia, IT, 2015.


[16] H. Bhatnagar, Advanced ASIC Chip Synthesis Using Synopsys Design Compiler and PrimeTime, 2nd ed. Kluwer Academic Publishers, 2002.

[17] S. Marconi, "Simulation, optimization and design of hybrid pixel array logic for extreme hit and trigger rates for the High Luminosity - Large Hadron Collider", Ph.D. Second-Year report, Dept. Elect. Eng., Univ. Perugia, IT, 2016.

[18] C. Spear, SystemVerilog for Verification: A Guide to Learning the Testbench Language Features, 2nd ed. Berlin, DE: Springer, 2008.

[19] M. Glasser, "Open Verification Methodology cookbook". Berlin, DE: Springer, 2009.

[20] Verification Academy, UVM Cookbook. [Online]. Available: https://verificationacademy.com/cookbook/uvm

[21] Accellera, Universal Verification Methodology (UVM) 1.1 User’s Guide, May 2011. [Online]. Available: https://www.cadence.com/rl/resources/white_papers/max_metric_driven_ver_wp.pdf

[22] S. Marconi, E. Conti, P. Placidi, J. Christiansen and T. Hemperek, "The RD53

collaboration’s SystemVerilog-UVM simulation framework and its general ap-

plicability to design of advanced pixel readout chips", Journal of Instrumenta-

tion, vol. 9, no. 10, p. P10005, 2014.

[23] T. Binoth, C. Buttar, P. Clark and E. Glover, LHC physics. Boca Raton, FL, USA:

CRC Press, 2012.

[24] CHIPIX65, Innovative electronics in CMOS 65nm technology for a new generation pixel chip at future High Energy Physics colliders. [Online]. Available: http://chipix65.to.infn.it/

[25] E. Monteil, L. Pacher, A. Paternò, N. Demaria, A. Rivetti, M. Da Rocha Rolo, F. Rotondo, C. Leng and J. Chai, "A synchronous analog very front-end in 65nm CMOS with local fast ToT encoding for pixel detectors at HL-LHC", Topical Workshop on Electronics for Particle Physics, 26–30 September 2016.

[26] L. Pacher, "Results from CHIPIX65 Prototype of a New Generation Pixel Readout ASIC in 65 nm CMOS for HL-LHC experiments", 12th Trento Workshop on Advanced Silicon Radiation Detectors, February 2017. [Online]. Available: https://indico.cern.ch/event/587631/contributions/2471728/attachments/1415455/2167011/PACHER_CHIPIX65_TREDI2017.pdf

[27] A. Paternò, "A Prototype of a New Generation Readout ASIC in 65nm CMOS for Pixel Detectors at HL-LHC", Topical Workshop on Electronics for Particle Physics, 26–30 September 2016.

[28] E. Conti, "Design of Dedicated Electronic Systems for the Readout of Pixel Radiation Sensors", Ph.D. Thesis, Dept. Elect. Eng., Univ. Perugia, IT, 2015.

[29] M. Garcia-Sciveres, "RD53A Integrated Circuit Specifications", CERN-RD53-NOTE-15-001, 2015. [Online]. Available: https://cds.cern.ch/record/2113263

[30] F. Loddo, "RD53A status and activities", CERN, Geneva, CH, October 2016. [Online]. Available: https://indico.cern.ch/event/572325/contributions/2330016/attachments/1353926/2045369/Loddo_RD53A_Tk13october2016.pdf

[31] L. Gaioni, "RD53 status and plans", The 25th International Workshop on Vertex Detectors VERTEX 2016, 25-30 September 2016. [Online]. Available: https://indico.cern.ch/event/452781/contributions/2297462/attachments/1344178/2026138/talk_vertex_gaioni.pdf

[32] J. Christiansen, "Outline and requirements of Phase2 Pixel system and Read-Out Chip", CERN, Geneva, CH, December 2014. [Online]. Available: https://indico.cern.ch/event/379891/contributions/903791/attachments/758020/1039818/CMS_pixel_elec_requirements_v1.0.pdf

[33] A. Moffat and A. Turpin, "On the Implementation of Minimum Redundancy

Prefix Codes", IEEE Transactions on Communications, vol. 45, no. 10, October

1997.


[34] S. W. Golomb, "Run-length encodings", IEEE Trans. Inform. Theory, pp. 399-401,

July 1966.

[35] S. Marconi, T. Hemperek, P. Placidi and E. Conti, "Low-power optimisation of

a pixel array architecture for next generation High Energy Physics detectors",

accepted to Prime Conference 2017 (to be published).
