2012 Annual Research Report


  • 8/9/2019 2012 Annual Research Report


    Welcome

    CEA is a French government-funded technological research organization. Drawing on its excellence in fundamental research, its activities cover three main areas: Energy, Information and Health Technologies, and Defense and Security. As a prominent player in the European Research Area, with an internationally acknowledged level of expertise in its core competencies, CEA is involved in setting up collaborative projects with many partners around the world.

    Within the CEA Technological Research Division, three institutes conduct research to increase industrial competitiveness through technological innovation and transfer: CEA-LETI, focused on microelectronics and information and healthcare technologies; CEA-LIST, dedicated to technologies for digital systems; and CEA-LITEN, devoted to new energy technologies.

    CEA-LETI is focused on micro and nanotechnologies and their applications, from wireless devices and systems to biology, healthcare and photonics. Nanoelectronics and Microsystems (MEMS) are at the core of its silicon activities. As a major player in the MINATEC innovation campus, CEA-LETI operates 8,000 m² of state-of-the-art clean rooms, running 24/7, on 200 mm and 300 mm wafer platforms. With 1,700 employees, CEA-LETI trains more than 240 Ph.D. students and hosts 200 assignees from partner companies. Strongly committed to creating value for industry, CEA-LETI puts a strong emphasis on intellectual property and owns more than 1,880 patent families. For more information, visit http://www.leti.fr.

    CEA-LIST is a key player in Information and Communication Technologies. Its research activities are focused on Digital Systems with major societal and economic stakes: Embedded Systems, Ambient Intelligence and Information Processing. With its 650 researchers, engineers and technicians, CEA-LIST performs innovative research in partnership with major industrial players in the fields of ICT, Energy, Transport, Security & Defence, Medical and Industrial Process. For more information, visit http://www-list.cea.fr.

    The Design Architectures & Embedded Software research activity is shared between CEA-LETI and CEA-LIST through a dedicated division. More than 240 people focus on RF, digital and SoC, imaging circuits, design environment and embedded software. These researchers work for both internal clients and outside customers, including Nokia, STMicroelectronics, Sofradir, MicroOLED, Cassidian, Trixell, Kalray, Delphi, Renault, Airbus, Schneider Electric and Magillem, among others.


    Contents

    Interview with Thierry Collette, Head of Architecture & IC Design, Embedded Software Division

    Key Figures

    Scientific Activities

    Architecture & IC Design for RF & mmW

    Architecture & IC Design for Image Sensors

    Architecture, IC Design & Control for Digital SoCs

    Architecture & IC Design for Emerging Technologies

    Embedded Software

    Reliability & Test

    PhD Degrees Awarded


    A Wide Spectrum for…

    Interview with Thierry Collette, Head of Architecture & IC Design, Embedded Software Division

    Dear reader,

    Thirty years ago, the microelectronics revolution ushered in a new age of communication. In parallel, another revolution began: that of computing science. We are now entering a third, which is the synthesis of both. With the success of the Internet and new needs in areas such as health, transportation and security, more and more computing devices are becoming smart and connected. This leads to new research fields: efficient big data management, and the integration of hardware and software know-how inside integrated embedded systems. Furthermore, what we are witnessing today with smartphones, tablets and onboard computers will spread to many kinds of devices: the Internet of Things is emerging.

    Our multidisciplinary platform dedicated to Integrated Circuit Design and Embedded Software allows us to address this new trend. By joining these two fields of know-how, CEA is one of the first research organizations in Europe to offer industry such an original and comprehensive range of capabilities, oriented towards application analysis and the exploration of integrated embedded architectures.

    This platform includes the tools, methods and human competencies ranging from front-end and back-end integrated circuit design (digital, analog and mixed-signal) in the most advanced technologies, through complex circuit emulation and hardware/software integration, to industrial test and reliability.

    We hope that reading this scientific report will convince you that this wide-spectrum platform brings specific innovations, creating new opportunities to fulfill our primary mission: supporting and promoting industry through innovation and technology transfer.

    Thierry Collette

    © CEA-Leti / L. Godart


    2012 Key Figures

    160 permanent researchers, 65 PhDs and post-docs

    34 M€ budget, 85% funding from contracts

    37 granted patents

    29 papers, journals & books

    136 conferences & workshops

    3 locations: MINATEC campus (Grenoble), Integration Research Center (Gières), PARIS-SACLAY Campus (Palaiseau)

    Full suite of IC CAD tools, hardware emulators & test equipment for analog, RF & digital circuits

    Credits © CEA-Leti / CEA-List


    Scientific Activity

    Publications

    165 publications in 2012, including journals and top conferences such as ISSCC, VLSI Circuits Symposium, DAC, DATE, PIERS, ESSCIRC, RTSS and ESWeek.

    Prizes and Awards

    IEEE SOI Conference 2012 Best Paper Award granted to Olivier Thomas et al.

    HIPEAC Paper Award granted to Antoine Joubert et al. for their DAC 2012 paper

    ATC 2012 Best Student Paper Award granted to Ngoc-Mai Nguyen

    Experts

    31 CEA experts: 2 research directors, 2 international experts

    9 Researchers with habilitation qualification (to independently supervise doctoral candidates)

    2 IEEE Senior Members

    Scientific Committees

    Editorial Boards: Journal of Low Power Electronics

    19 members of Technical Program and Steering Committees in major conferences: ISSCC, ESSCIRC, DAC, DATE, ESWEEK, RTNS, IJCNN, IWANN, EMSOFT…

    Standardization committee: AUTOSAR (Automotive Open System Architecture)

    Organization of Conferences and Workshops

    MPSoC 2012, DTC 2012, D43D 2012, VARI 2012, ICE 2012

    International Collaborations

    Collaborations with more than 20 universities and institutes worldwide

    Caltech, University of Berkeley, University of Columbia, Carnegie Mellon University, EPFL, CSEM, UCL, Polito Torino, KIT, Chalmers University, Tongji, …


    1. Architecture & IC Design for RF & mmW

    Wireless Sensor Node

    UWB Localization

    Power Amplifiers

    RF & mmW Passives

    RF BIST


    Simulation Infrastructure for Energy Autonomous Wireless Sensor Networks with Sense & React Capability

    Research topics: Wireless Sensor Networks, Energy Harvesting, Sense & React

    C. Bernier, A. Didioui, D. Morche, O. Sentieys (IRISA)

    ABSTRACT: Papers [1,2] present the simulation framework developed within the GRECO (GREen Communicating Objects) project. The aim of the GRECO project is to design an energy-efficient wireless platform that is fully autonomous thanks to energy harvesting capabilities and adaptive power management. To reach this goal, the GRECO partners are developing a simulation framework that allows complete modeling of the platform in order to evaluate different power optimization strategies leading to energy neutral operation.

    The huge variety of Wireless Sensor Network (WSN) applications, ranging from environmental monitoring to healthcare and smart homes, requires modular and reconfigurable platforms. Additionally, these systems have to be low cost, comply with severe size constraints, and demonstrate very high energy efficiency, since energy consumption remains the most important constraint in WSNs. Several technologies have been developed for harvesting energy from our surroundings, such as solar, wind and vibration energy. Since environmental energy can be scavenged for as long as desired, the system can theoretically reach an infinite lifetime if a Power Manager (PM) is designed such that the consumed energy remains lower than the harvested energy over a long period, a condition known as energy neutral operation (ENO).

    Figure 1: Generic architecture of an energy autonomous wireless sensor node

    As the energy used by the radio transceiver represents the major part of the energy consumed by a WSN node, adaptive MAC layer policies are an active field of research. For example, in Fig. 1, the power manager, considered as the core of the energy harvesting wireless sensor node, controls the wake-up period of the microcontroller and the radio transceiver according to the harvested energy, hence keeping the node in energy neutral operation.

    Clearly, the development of novel applications and deployment scenarios based on such adaptive platforms must be assisted by the simultaneous development of a dedicated simulation framework. Contrary to existing network simulators, this framework must be able to model (1) the energy harvesting subsystem, which is highly dependent on time and environmental factors, (2) the cross-layer adaptive power management techniques and their associated power cost, and (3) a detailed interference model for the radiofrequency (RF) environment. This last point is crucial in the context of a physical (PHY) layer with sense & react capability: considerable power savings can be obtained when an RF transceiver is able to instantaneously adapt its level of performance to the time-varying conditions of the propagation channel.

    Since interference (Fig. 2) can lead to packet data loss, missed alarms, delay, loss of synchronization, etc., many authors have investigated its impact on WSNs. However, none has studied the problem of interference due to intermodulation, which is caused by the nonlinearity of the RF receiver. Unfortunately, linearity typically comes at the cost of increased power consumption.

    Figure 2: Interference between the desired user and nodes of adjacent networks.

    We therefore propose a new SINR model [2] for investigating the performance degradation of WSNs under intermodulation interference.

    This model has been implemented in the GRECO simulation platform, enabling the study of different dynamic power/performance tradeoff strategies with the aim of specifying a new sense & react transceiver for perpetually powered autonomous sensor networks.

    References:

    [1] Berder O., Sentieys O., Le T. N., Fontaine R., Pegatoquet A., Belleudy C., Auguin M., Tatinian W., Jacquemod G., Broekaert F., Didioui A., Bernier C., Benchehida K., Bourdel S., Barthelemy H., Ciais P. & Barratt C., "GRECO: GREen communicating objects," 2012 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1-2.

    [2] Didioui A., Bernier C., Morche D. & Sentieys O., "Impact of RF front-end nonlinearity on WSN communications," 2012 9th International Symposium on Wireless Communication Systems (ISWCS 2012), 28-31 August 2012, pp. 875-879.
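    The energy neutral operation condition described in this article can be illustrated with a minimal sketch. It assumes a deliberately simple node model that is not taken from the GRECO framework: each wake-up costs a fixed energy, the node otherwise sleeps at constant power, and the harvested power is constant over the averaging window. The function names and the numerical values are hypothetical.

```python
import math


def average_power_w(e_wakeup_j, p_sleep_w, period_s):
    """Average node power when one wake-up of energy e_wakeup_j occurs every period_s.

    Assumption: the active time is negligible compared with the period,
    so the sleep power is counted over the whole period.
    """
    return p_sleep_w + e_wakeup_j / period_s


def min_wakeup_period_s(e_wakeup_j, p_sleep_w, p_harvest_w):
    """Shortest wake-up period that keeps average consumption <= harvested power."""
    margin_w = p_harvest_w - p_sleep_w
    if margin_w <= 0:
        # Harvested power does not even cover sleep: no period is energy neutral.
        return math.inf
    return e_wakeup_j / margin_w


if __name__ == "__main__":
    # Hypothetical figures: 1 mJ per wake-up, 10 uW sleep power, 110 uW harvested.
    period = min_wakeup_period_s(1e-3, 10e-6, 110e-6)
    print(f"minimum wake-up period: {period:.1f} s")
    print(f"average power at that period: {average_power_w(1e-3, 10e-6, period) * 1e6:.1f} uW")
```

    A power manager built on this idea would recompute the period whenever its estimate of the harvested power changes, lengthening it when less energy is available.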


    Robust and Precise Localization with Double Quadrature Receivers

    Research topics: UWB, Localization, Beamforming, Antennas

    F. Bautista, D. Morche, G. Masson, F. Dehmas, S. Bories

    ABSTRACT: In this work, several refinements have been added to impulse radio localization techniques in order to improve their precision, range and robustness. The receiver architecture exploited in this approach is the double quadrature. This solution has shown its capability to reach fine ranging precision in the cm range [1]. A multi-antenna approach has then been exploited to extend the range of the receiver and to extract the Angle-of-Arrival information. More recently, we have shown that the double quadrature architecture is more robust to antenna characteristics than the classical single quadrature [4]. Lastly, a new approach has been proposed to recover the same performance with single quadrature.

    The need for surveyed positions in civil safety and military applications requires a new generation of Impulse Radio Ultra Wide Band (IR-UWB) technology with a range of up to several km, capable of communication, precise localization and low power consumption. Up to now, most IR-UWB localization solutions were based on non-coherent receivers with poor performance. In [1], we presented LORELEI, to our knowledge the first IR-UWB receiver working in the authorized 3-5 GHz frequency band and reaching a ranging accuracy better than 10 cm. The fine localization is obtained thanks to the double quadrature architecture. The sampled baseband architecture provides the flexibility to cope with various channel conditions and to shorten the synchronization phase. Even though a range of several hundred meters can be obtained with this solution, it may be desirable to further extend the range as well as the localization performance.

    Figure 1: LORELEI architecture (block diagram labels: LNA; local oscillators LO1I, LO1Q, LO2I, LO2Q; four integrator/ADC branches with outputs Out_II, Out_IQ, Out_QQ, Out_QI)

    In [2] and [3] we exploited a multi-antenna scheme to enhance performance. By using four antennas and LORELEI ICs, beamforming can be achieved with a simple digital algorithm. It increases the range and can be exploited to reduce the power of unwanted blockers. It can also be exploited to extract independently the Angle-of-Arrival of each path of the impulse channel response. The error is lower than 2 degrees over a wide angle range. This functionality opens the door to new localization algorithms which can combine Angle-of-Arrival and time of arrival for all distinguishable paths. This can undoubtedly improve the performance of the existing localization system.

    LORELEI's performance opens the door to a new kind of application where a small RFID tag (with energy scavenging or remote power) can be precisely identified and localized. Developing such applications requires tags and readers small enough not to be noticed by users. The bottleneck in size reduction is the antenna. Reducing its size far below the wavelength degrades its radiation efficiency, and its tight integration in the device disturbs its omni-directionality in magnitude and in phase. As can be seen in Figure 2, the radiated signal becomes dependent on the elevation angle. This may impact the performance of the ranging system.

    Figure 2: Received signal for 40° and 145° elevations

    In [4], we have shown that the performance of the classical single quadrature receiver degrades when faced with such phenomena. In the LORELEI receiver, on the other hand, the signal is projected on an orthogonal basis of two signals. As a consequence, with a 0.6 cm worst-case error, the system appears to be very robust against deviations in the antenna characteristics. This emphasizes the key impact of dedicated and innovative architectures [5] in reaching high performance in IR-UWB systems. Thanks to this approach, the obtained performance is among the best in the state of the art [6].

    More recently, we have shown that by modifying the processing done in single quadrature receivers, it is possible to reach the same robustness and precision, at the cost of increased complexity. The next step will be to reach mm ranging precision, in order to address a wider range of applications.

    References:

    [1] G. Masson et al., “A 1 nJ/b 3.2-to-4.7 GHz UWB 50 Mpulses/s Double Quadrature Receiver for Communication and Localization,” ESSCIRC 2010.

    [2] Farid Bautista et al., “UWB Beamforming Architecture for RTLS applications using Digital Phase-Shifters,” ISCAS 2011, Rio de Janeiro.

    [3] Farid Bautista, Dominique Morche, François Dehmas and Gilles Masson, “Low power beamforming RF architecture enabling fine ranging and AOA techniques,” ICUWB 2011.

    [4] Farid Bautista, Dominique Morche, Serge Bories, Gilles Masson, “Antenna Characteristics and Ranging Robustness with Double Quadrature Receiver and UWB Impulse Radio,” ICUWB 2012.

    [5] D. Morche, M. Pelissier, G. Masson, P. Vincent, “UWB: Innovative Architectures Enable Disruptive Low Power Wireless Applications,” DATE 2012.

    [6] S. Bourdel, G. Gielen, S. D’amico, D. Wisland, B. Busze, D. Neyrinck, J. Jantunen, D. Morche, “Advanced Tutorial on UWB Circuits and Systems,” Workshop at ESSCIRC 2012.
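    The combination of time of arrival and Angle-of-Arrival mentioned in this article can be sketched for a single direct path. This is only a geometric illustration under stated assumptions (one-way time of flight, free-space propagation, a 2-D scene with the reader at the origin); it is not the algorithm of [2] or [3], and all function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def range_from_tof(tof_s):
    """Distance covered by a pulse during a one-way time of flight."""
    return C0 * tof_s


def timing_error_for_range_error(range_err_m):
    """Timing uncertainty that corresponds to a given ranging uncertainty."""
    return range_err_m / C0


def position_from_range_aoa(range_m, aoa_deg):
    """2-D tag position from one range and one angle of arrival.

    The angle is measured from the x axis of the reader (assumption).
    """
    theta = math.radians(aoa_deg)
    return (range_m * math.cos(theta), range_m * math.sin(theta))


if __name__ == "__main__":
    # A 10 cm ranging accuracy corresponds to a sub-nanosecond timing accuracy.
    print(f"{timing_error_for_range_error(0.10) * 1e12:.0f} ps for 10 cm")
    print(position_from_range_aoa(range_from_tof(100e-9), 30.0))
```

    With a single reader, one range and one angle are enough to place the tag in the plane, which is why per-path angle extraction is attractive for localization.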


    Design of a Fully Integrated CMOS Self-Testable RF Power Amplifier Using a Thermal Sensor

    Research topics: CMOS power amplifiers, RF built-in self-test, temperature sensors

    J.L. González, J. Altet (UPC), N. Deltimple (IMS), Y. Luque (IMS), E. Kerhervé (IMS)

    ABSTRACT: This research work presents a wideband RF power amplifier (PA) dedicated to 2 GHz applications that integrates a contactless temperature sensor allowing on-chip observation and testing of the PA. Based on the static and dynamic local temperature changes caused by the PA operation, the thermal sensor can sense parameters such as output power or efficiency. This principle is applied to a 65 nm CMOS PA with an OCP1 of 21 dBm. We demonstrate that the output voltage of the thermal sensor follows the PA efficiency under single-tone and multi-tone input signal conditions.

    Testing, and in particular its cost, is becoming crucially important for the success of RF SoC products for mass markets. Test cost is directly related to testing time and to the cost of test equipment. One strategy to enhance yield and ease RF test consists of incorporating on-chip sensors that measure the operation of the circuit under test (CUT). In most cases, the sensors require contact with electrical nodes of the RF circuit and high-frequency signal processing, at least at the input section of the sensor. In this work we propose an alternative sensing strategy that requires no contact with the CUT, since it is based on measuring the temperature variations in the vicinity of the circuit [1]. This technique is especially well suited to the observation of power amplifier characteristics [2,3] such as the 1 dB compression point or bandwidth, which can be used for testing or for implementing self-calibration loops.

    Figure 1: Layout of the CMOS PA including a differential temperature sensor.

    The idea behind this technique is that any change in the balance between the power drawn from the supply and the power delivered to the load (or to the next stage) results in a variation of the dissipated power, which can be detected as a local temperature increase in the vicinity of the active devices of the CUT. We have applied this technique to compare RF measurements with the results obtained from an integrated temperature sensor for several figures of merit of a 2.5 GHz PA fabricated in a 65 nm CMOS process, shown in Fig. 1.

    A first set of measurements is shown in Fig. 2 for a single-tone input signal of fixed frequency and varying power. When this type of signal is applied to the input of the PA, the variation of the local temperature of the PA active devices (shown in black in Fig. 2, right, as measured by the sensor) tracks the variation of on-chip power, and therefore of the power delivered to the load. The plot shows how the DC value of the sensor closely follows the PA efficiency figure of merit.

    Figure 2: Left: output power and gain of the PA obtained by RF measurements. Right: comparison of efficiency, PAE and temperature sensor output as a function of input power.

    A second set of measurements is shown in Fig. 3. There, the input signal consists of two tones with fixed spacing (10 kHz) and varying frequency. As the two tones are swept over the PA bandwidth, the thermal signal observed at the beat frequency of the two tones tracks the PA bandwidth, with accuracy comparable to the conventional RF measurement. These experiments demonstrate the potential of non-invasive, temperature-based observation techniques for BIST or self-healing of RF circuits.

    Figure 3: PA frequency domain characteristics extracted using RF measurements and using the on-chip temperature sensor

    References:

    [1] D. Gómez, C. Dufis, J. Altet, D. Mateo, J. L. González, “Electro-thermal coupling analysis methodology for RF circuits,” Microelectronics Journal, Vol. 43, No. 9, September 2012, pp. 633–641.

    [2] J.L. González, B. Martineau, D. Mateo, and J. Altet, “Non-invasive monitoring of CMOS power amplifier operating at RF and mmW frequencies using an on-chip thermal sensor,” 2011 IEEE Radio Frequency Integrated Circuits (RFIC) Symposium Digest of Papers, pp. 1-4, 5-7 June 2011.

    [3] Deltimple N., González J.L., Altet J., Luque Y. & Kerhervé E., "Design of a fully integrated CMOS self-testable RF power amplifier using a thermal sensor," in Proceedings of the 38th European Solid State Circuits Conference, ESSCIRC 2012, 17-21 September 2012, pp. 398-401.
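    The power balance that this article relies on can be written down with the usual textbook definitions of drain efficiency and power-added efficiency (PAE). The sketch below is an illustration under those standard definitions, not the authors' measurement procedure; the function names and the operating point are hypothetical.

```python
def dbm_to_w(p_dbm):
    """Convert a power in dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)


def pa_figures(p_dc_w, p_in_dbm, p_out_dbm):
    """Efficiency figures and dissipated power of a PA from its power balance.

    Dissipated power is what enters the amplifier (supply plus RF input)
    minus what leaves it towards the load; this is the quantity that
    heats the active devices and that a nearby thermal sensor can observe.
    """
    p_in_w = dbm_to_w(p_in_dbm)
    p_out_w = dbm_to_w(p_out_dbm)
    return {
        "drain_efficiency": p_out_w / p_dc_w,
        "pae": (p_out_w - p_in_w) / p_dc_w,
        "p_diss_w": p_dc_w + p_in_w - p_out_w,
    }


if __name__ == "__main__":
    # Hypothetical operating point: 0.5 W from the supply, 10 dBm in, 20 dBm out.
    for name, value in pa_figures(0.5, 10.0, 20.0).items():
        print(f"{name}: {value:.3f}")
```

    For a fixed supply power, a higher output power means less dissipated power, which is why a temperature reading can follow efficiency.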


    SOI CMOS RF Power Amplifier and Tunable Matching Network for Integrated RF Front-Ends

    Research topics: SOI, CMOS, Power Amplifier, Tunable Matching, RF Front-End

    A. Giry, G. Tant

    ABSTRACT: A high integration level and tunable RF functions in SOI CMOS technology are key enablers for smaller and more cost-effective RF front-ends. In this work, a two-stage SOI LDMOS linear PA and an SOI CMOS Tunable Matching Network have been designed and characterized. The results obtained represent a new step towards high-efficiency integrated RF front-ends for future multimode, multiband cellular applications.

    Next-generation wireless terminals and access points will have to handle an increased number of standards and frequency bands, which translates into great challenges and stringent requirements for the RF front-end (RFFE) section. Multiple Power Amplifiers (PA), RF switches and filters will be needed, which will increase the size and cost of the RFFE section, especially if the multiple technologies (GaAs, SAW, IPD) currently required to achieve adequate performance cannot be circumvented. A higher integration level is the key to smaller and more cost-effective RFFEs, and SOI CMOS technology, which provides an attractive trade-off among performance, cost and integration capability, appears today as a key technology. In addition, size constraints lead to intrinsically small antennas, which are very sensitive to their environment and experience wide impedance variations, leading to large mismatch losses and a significant degradation of RFFE energy efficiency.

    The proposed research work aims at investigating SOI CMOS technology for the design of highly integrated RFFEs with reduced power consumption. To meet the needs of future cellular RFFEs, a watt-level SOI LDMOS PA with high linearity and efficiency has been developed [1], together with a low-loss SOI CMOS Tunable Matching Network (TMN) allowing improved energy efficiency under various mismatch conditions. The proposed PA and TMN have been implemented in a 0.13 µm SOI CMOS industrial process with a high-resistivity substrate.

    Fig. 1 shows a micrograph of the two-stage PA, which occupies an area of 0.84 mm² and uses a high-voltage LDMOS power device for high efficiency and Through Silicon Vias (TSV) [2] for efficient ground connection. At 900 MHz under a 3.6 V supply voltage, the LDMOS power stage delivers up to +33.2 dBm of peak power with a maximum efficiency of 60%. When tested with a 10 MHz bandwidth 16QAM uplink LTE signal, the two-stage PA provides a high linear output power of +27 dBm with less than 3% EVM.

    Figure 1: SOI LDMOS PA micrograph

    Fig. 2 shows a micrograph of the SOI CMOS TMN, which is based on integrated high-power tunable capacitors consisting of arrays of binary-weighted switched capacitors. Each tunable capacitor exhibits 32 states and has been designed to cover the range 0.7-2.8 pF with a minimum quality factor of 40 at 2.7 GHz and a maximum power rating of +36 dBm. Control logic is integrated on-chip and allows the selection of appropriate capacitance values through an integrated SPI interface. The TMN circuit occupies an area of 1.6 mm² and operates from a 2.5 V supply. As can be seen in Fig. 2, the TMN is centered on 50 Ohms and provides good impedance coverage at 1.95 GHz. When combined with a miniature dual-band antenna [3], the TMN reduces the reflection losses to less than 0.5 dB and maintains a fairly constant radiated power even under strong perturbations created by a metallic plane close to the antenna.

    Figure 2: SOI CMOS Tunable Matching Network micrograph (left) and measured Smith chart coverage at 1.95 GHz (right)

    References:

    [1] A. Giry, G. Tant, Y. Lamy, C. Raynaud, P. Vincent, “A Monolithic Watt-level SOI LDMOS Linear Power Amplifier with Through Silicon Via,” 2013 IEEE Topical Conference on Power Amplifiers for Wireless and Radio Applications (PAWR), 20-23 Jan. 2013.

    [2] http://www.leti.fr/en/How-to-collaborate/Collaborating-with-Leti/Open-3D

    [3] L. Dussopt, M.A.C Niamien, A. Giry, A. Chebihi, S. Contal, F. Fraysse, S. Aissa, O. Perrin, C. Delaveaud, "Enhanced-efficiency front-end module with multi-standard impedance-tunable antenna," IEEE 17th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), pp. 328-332, 17-19 Sept. 2012.
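    The 32-state tunable capacitor described in this article can be modeled as an idealized binary-weighted array. The sketch below assumes a 5-bit control word and perfectly uniform steps between 0.7 pF and 2.8 pF; the real step sizes, parasitics and SPI word format are not given in the report, so every name and the linear mapping are illustrative assumptions.

```python
C_MIN_PF = 0.7   # capacitance with all switches off (from the stated range)
C_MAX_PF = 2.8   # capacitance with all switches on (from the stated range)
N_STATES = 32    # 32 states, taken here as a 5-bit binary-weighted array
LSB_PF = (C_MAX_PF - C_MIN_PF) / (N_STATES - 1)  # assumed uniform step


def capacitance_pf(code):
    """Ideal capacitance for a control code between 0 and 31."""
    if not 0 <= code < N_STATES:
        raise ValueError("code must be in the range 0..31")
    return C_MIN_PF + code * LSB_PF


def switch_bits(code):
    """Control word as a 5-character bit string, most significant bit first."""
    return format(code, "05b")


def nearest_code(target_pf):
    """Code whose ideal capacitance is closest to the target, clamped to the range."""
    code = round((target_pf - C_MIN_PF) / LSB_PF)
    return max(0, min(N_STATES - 1, code))


if __name__ == "__main__":
    code = nearest_code(1.0)
    print(code, switch_bits(code), f"{capacitance_pf(code):.3f} pF")
```

    In a binary-weighted array each bit switches a capacitor twice as large as the previous one, so five switches are enough to address all 32 states.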


    Slow-Wave CPW and CPW in CMOS 65 nm SOI Technology: A Benchmark

    Research topics: CMOS, 60 GHz, transmission lines, CPW, quality factor

    X.L. Tang (IMEP), A.L. Franc (IMEP), E. Pistono (IMEP), A. Siligaris, P. Vincent, P. Ferrari (IMEP), and J.M. Fournier (IMEP)

    ABSTRACT: In this work, slow-wave coplanar transmission lines (S-CPW) and standard CPW are compared through measurements up to 65 GHz. Both S-CPW and CPW lines are fabricated in an industrial 65 nm CMOS SOI process with a high-resistivity substrate. Due to the slow-wave effect, S-CPW lines achieve a high effective permittivity that reduces the wavelength. As a result, very high quality factors are achieved, which makes these lines attractive for millimeter-wave (mmW) circuit design.

    Millimeter-wave CMOS circuits have been intensively developed in the past decade in order to respond to a growing demand for mass-market, high-throughput wireless applications. The most popular approach for the design of matching networks and inductive components is the use of microstrip (MS) and CPW lines [1]. However, at high frequency, CPW and MS lines suffer from high losses and low quality factor because of the thin metallic layers in CMOS technologies. The S-CPW concept responds to this problem by exploiting the slow-wave phenomenon, which artificially increases the effective permittivity (εeff). As a result, the wavelength is decreased and thus the physical line length corresponding to a given phase shift is reduced. This is illustrated in equation (1):

    (1)

    where α is the attenuation constant, β is the phase constant and c0 is the speed of light in vacuum.

    Figure 1 shows a 3-D schematic of an S-CPW line integrated in a CMOS back-end with six copper metal layers and one aluminum top layer. It consists of a conventional CPW line with a patterned metallic shield placed between the CPW and the silicon substrate.

    Figure 1: (a) 3-D schematic view of the S-CPW structure. (b) Schematic cross section of the 65 nm SOI CMOS back-end.

    Transmission lines with two characteristic impedances (28 Ω and 65 Ω) were fabricated in both S-CPW and conventional CPW. Measurements were carried out up to 65 GHz using a two-port VNA. The extracted quality factor for each measured line is shown in Figure 2.

    Figure 2: Measured comparison of the quality factor Q of S-CPW and CPW. Experimental CPW results at 60 GHz from [1]: square for 70 Ω CPW and triangle for 38 Ω CPW.

    Thanks to the enhanced effective dielectric permittivity, the quality factors of the S-CPW lines are significantly improved compared with those of CPW lines. The Q-factors are increased by a factor of 4 and 2.5 for the 28 Ω and 65 Ω S-CPW, respectively.

    In this work, high-performance slow-wave lines fabricated in an advanced 65 nm HR-SOI CMOS technology were characterized and optimized. Experimental results show that the performance improvement of S-CPW over conventional CPW is mainly due to the increase of the effective permittivity. At 60 GHz, the attenuation constant of S-CPW is reduced by almost 40% and the effective relative permittivity is two to six times higher, leading to a quality factor almost two to four times higher.

    References:

    [1] A. Siligaris, C. Mounet, B. Reig, and P. Vincent, "CPW and discontinuities modeling for circuit design up to 110 GHz in SOI CMOS technology," in IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, pp. 295-298, 2007.

    [2] Xiao-Lan Tang, A.-L. Franc, E. Pistono, A. Siligaris, P. Vincent, P. Ferrari, and J.M. Fournier, "Performance Improvement Versus CPW and Loss Distribution Analysis of Slow-Wave CPW in 65 nm HR-SOI CMOS Technology," IEEE Transactions on Electron Devices, vol. 59, no. 5, pp. 1279-1285, May 2012.
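    The link between effective permittivity, attenuation and quality factor discussed in this article can be made concrete with the usual transmission-line definition Q = β / (2α), where β = 2πf·√εeff / c0. This is the common textbook form and is assumed here; the report's own equation is not reproduced in this transcript. The function names and the numerical values are hypothetical.

```python
import math

C0 = 299_792_458.0             # speed of light in vacuum, m/s
NP_PER_DB = math.log(10) / 20  # nepers per decibel


def phase_constant(freq_hz, eps_eff):
    """Phase constant beta in rad/m for a line of effective permittivity eps_eff."""
    return 2 * math.pi * freq_hz * math.sqrt(eps_eff) / C0


def guided_wavelength(freq_hz, eps_eff):
    """Wavelength on the line in meters; it shrinks as eps_eff grows."""
    return C0 / (freq_hz * math.sqrt(eps_eff))


def quality_factor(freq_hz, eps_eff, alpha_db_per_mm):
    """Q = beta / (2 * alpha), with alpha converted from dB/mm to Np/m."""
    alpha_np_per_m = alpha_db_per_mm * 1e3 * NP_PER_DB
    return phase_constant(freq_hz, eps_eff) / (2 * alpha_np_per_m)


if __name__ == "__main__":
    # Hypothetical CPW reference: eps_eff = 5 and 1 dB/mm at 60 GHz.
    q_ref = quality_factor(60e9, 5.0, 1.0)
    # Slow-wave line with doubled eps_eff and attenuation reduced by 40%.
    q_slow = quality_factor(60e9, 10.0, 0.6)
    print(f"Q ratio: {q_slow / q_ref:.2f}")
```

    Under this definition, an attenuation reduced by 40% combined with an effective permittivity two to six times higher gives a Q ratio of roughly √2/0.6 ≈ 2.4 to √6/0.6 ≈ 4.1, which is broadly consistent with the two-to-four-fold improvement reported above.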


    On the Electrical Properties of Slotted

    Metallic Planes in CMOS Processes for

    RF and Millimeter-Wave Applications

    Research topics : Interconnections, RF and mmW ICs

    this way the electrical properties of the transmission lines.Figure 3 shows the impact of different hole and basic cellsized in the capacitance of a line to the plane, where relative

    changes by a factor of 3 are observed.

    Figure 2 : Effective conductivity values obtained by simulation

    (symbols) and comparison with the analytical model (lines).

    Figure 3 : Comparison between EM simulation results and predictions for capacitance of a M6 interconnection to a slotted plane in several metal layers from the analytical model

    The observed significant change in the plane andtransmission line properties observed must be taken intoaccount for an accurate design of this type of structures. In[3] a simulation strategy is proposed that goes in thatdirection.

    J.L. González, B. Martineau (STMicroelectronics), D. Belot (STMicroelectronics)

    ABSTRACT: This research work is focused in the effects of slotted metallic planes in passive structures builtusing CMOS processes for RF and millimeter-wave (mmW) applications. The impact of holes on the referenceplane resistance and in the capacitance of any surrounding structure to the plane are investigated throughelectromagnetic (EM) simulations. Two analytical expressions are derived that capture the holes impact on theplane resistivity and on the dielectric constant of the materials found between the plane and the surroundings.These expressions are used to propose a simplified EM simulation methodology for on-chip microstriptransmission lines.

    Recent realizations of integrated radios operating at millimeter-wave frequencies (several tens of GHz) [1,2], and to a lesser extent at RF frequencies (several GHz), require the use of distributed passives such as transmission lines. These structures must be fabricated by respecting strict manufacturing rules imposed by the semiconductor processing tools and procedures. For the large-area metallic surfaces that are required to build the reference ground planes of microstrip lines, for example, the manufacturing rules impose a maximum density of metal, so that such planes must be pierced with holes, as indicated in Figure 1.a. Up to now, little attention has been paid to the impact of these modified planes with respect to the ideal, continuous metallic plane that should be used if possible. Figure 1.b shows the basic parameters of a section of a slotted plane. A basic cell consisting of a section of the plane with a single hole can be defined, and the plane can be considered as a 2D repetition of this basic cell. The relative size of the hole with respect to the size of the basic cell sets the basic parameter for the plane: the metal density (or its complement, the hole density).
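As a minimal sketch of how the metal density follows from the basic cell geometry, assuming a square hole inside a square basic cell (the text does not state the hole shape; the function name and the dimensions below are illustrative only):

```python
def metal_density(cell_side_um: float, hole_side_um: float) -> float:
    """Fraction of the basic cell area that is covered by metal.

    Assumption: a square hole inside a square basic cell. This is an
    illustrative geometry, not the layout rule of a specific process.
    """
    if not 0 <= hole_side_um <= cell_side_um:
        raise ValueError("the hole must fit inside the basic cell")
    cell_area = cell_side_um ** 2
    hole_area = hole_side_um ** 2
    return (cell_area - hole_area) / cell_area


# Hypothetical dimensions: a 10 um basic cell pierced by a 5 um hole.
density = metal_density(10.0, 5.0)
hole_density = 1.0 - density  # hole density taken as the complement
print(f"metal density = {density:.0%}, hole density = {hole_density:.0%}")
```

Because the plane is a 2D repetition of the basic cell, the density of one cell is also the density of the whole plane in this simplified picture.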

    Figure 1 : (a) 3D view of an example microstrip transmission line structure with slotted ground plane. (b) Details of the slotted metallic plane.

    In this research work we have analyzed the modification of the electrical properties of the plane caused by the presence of the holes, in comparison to an ideal, continuous plane without holes (i.e. with a 100% metal density). Figure 2 shows how the conductivity of the plane is reduced by a factor of 5 if the metal density is reduced to 40%. This modification of the plane conductivity is observed for different plane thicknesses, such as those obtained by using the various metallization levels available in CMOS processes. The holes opened in the plane also modify the electric fields between the plane and the surrounding structures, and hence, for example, the capacitance of a line to the plane, modifying in

    References :
    [1] J.L. Gonzalez, F. Badets, B. Martineau, D. Belot, “A 56-GHz LC-tank VCO with 17% tuning range in 65-nm bulk CMOS for wireless HDMI,” IEEE Trans. Microwave Theory Tech., 58 (2010).
    [2] B. Martineau, V. Knopik, A. Siligaris, F. Gianesello, D. Belot, “A 53-to-68 GHz 18 dBm power amplifier with an 8-way combiner in standard 65nm CMOS,” in Proceedings of the IEEE International Solid-State Circuits Conference, February 2010, pp. 428–429.
    [3] José Luis González, Baudouin Martineau, Didier Belot, “On the electrical properties of slotted metallic planes in CMOS processes for RF and millimeter-wave applications,” Microelectronics Journal, Volume 43, Issue 8, August 2012, Pages 582-591.


    BAW Filters for Ultra-Low Power

    Narrow-Band Applications

    Research topics : BAW, Ultra-Low-Power, Narrow-band

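    (Equation: the voltage gain Gv, proportional to -Gm and expressed as a ratio of products of the filter impedance parameters Z11, Z12, Z21, Z22 and the terminating impedances Zo and Zout. The typeset expression is not recoverable here.)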

    In the particular case where Gm=1, this expression is homogeneous to an impedance, an “equivalent load”, which is directly related to the S21 scattering parameter of the BAW. We converged to a filter response which has both a narrow bandwidth and a large equivalent load, by drastically reducing the frequency spacing between the series and parallel resonators, to the overlap area (see Fig. 2), which is forbidden in classical power transmit BAW filter design.

    Figure 2 : BAW filter Equivalent Load response for classical and new frequency offset

    Note that these filter responses are obtained with resonator parameters extracted from state-of-the-art devices. The only difference with respect to existing process flows is the loading layer thickness, which must be modified to create smaller frequency offsets.
    The filter design methodology described in this work has been explored in a High-IF architecture for the ISM band. It can also be used to design an extremely selective, ultra-low power RF gain stage for a wake-up radio where typical input and output impedances are small capacitances (e.g. 100fF), allowing the design of a lattice filter with equivalent load greater than 1kΩ and BW-3dB


    A Frequency Measurement BIST

    Implementation Targeting

    GigaHertz Applications

    Research topics : Design for test, Radiofrequency measurements

    A close ground line could increase crosstalk in the reception path. Therefore, only the first stage is placed very close to the SRO, whereas the next stages are located as far as possible from the output of the SRO.
    Figure 1 represents the SRO resonance frequency obtained by measurement with the ATE resources (FMeas) and with the BIST technique (FBIST). This scatter plot shows the excellent correlation between both measurements.

    Figure 2: Picture of the chip connected to the ATE Verigy 93K.

    Figure 2 shows the chip connected in its socket. The SRO output is connected to the mixer through the coaxial cable, whereas the BIST uses only digital pins of the tester.
    We proposed a complete BIST technique for measuring the high oscillation frequency of a fully integrated front-end designed for UWB transmission systems. To achieve this, we first derive from the high-frequency oscillation a proportional, lower-frequency clock signal. This clock is then used to increment an asynchronous counter. The final counter state enables a direct computation of the oscillation frequency. Experimental results are excellent and confirm the results expected from thorough electrical simulations. The comparison of the BIST technique with the standard test setup shows a negligible difference in the frequency measurement, for a test time reduced by a factor of 20.
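The computation of the oscillation frequency from the final counter state can be sketched as follows. The division ratio of the first stage, the gate time and the counter value used below are hypothetical, since the text does not give them; only the principle (divide, count during a known time window, scale back) comes from the description above.

```python
def oscillation_frequency_hz(counter_state: int,
                             division_ratio: int,
                             gate_time_s: float) -> float:
    """Recover the oscillator frequency from the final counter state.

    The counter is clocked by the oscillator signal divided by
    `division_ratio`, so during `gate_time_s` it counts
    f_osc * gate_time_s / division_ratio periods.
    """
    if division_ratio <= 0 or gate_time_s <= 0:
        raise ValueError("division ratio and gate time must be positive")
    return counter_state * division_ratio / gate_time_s


def frequency_resolution_hz(division_ratio: int, gate_time_s: float) -> float:
    """Frequency step that corresponds to a single count."""
    return division_ratio / gate_time_s


# Hypothetical settings: divide-by-64 first stage, 10 us counting window.
f_bist = oscillation_frequency_hz(1230, 64, 10e-6)
step = frequency_resolution_hz(64, 10e-6)
print(f"F_BIST = {f_bist / 1e9:.3f} GHz, resolution = {step / 1e6:.1f} MHz")
```

In this simplified model a longer counting window or a smaller division ratio gives a finer frequency step, at the cost of test time or of a faster counter clock.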

    M. Dubois, E. de Foucauld, C. Mounet, S. Dia (Presto Engineering) and C. Mayor (Presto Engineering)

    ABSTRACT: We propose a Built-In Self-Test (BIST) technique for measuring the natural resonance frequency of oscillators that run much faster than the working speed of current Automated Test Equipment (ATE). Based on an asynchronous counter, the BIST response is a digital output code proportional to the frequency of the oscillator under test. The efficiency of the proposed BIST is demonstrated on an Ultra-Wide-Band transceiver, whose communication frequency lies in the 7.25GHz to 8.5GHz band.

    We present a BIST dedicated to the measurement of a high-frequency transceiver based on the Super Regenerative Oscillator (SRO) principle. The proposed BIST architecture relies on an asynchronous counter that deduces the frequency by counting the number of oscillation periods within a given period of time. The digital output test response suits any digital communication and processing system, which can use the information for test purposes and/or self-calibration or compensation techniques.
    To comply with standardization rules, the oscillator frequency is set to F=7.875GHz, the center of the 7.25GHz to 8.5GHz UWB band. For this application, the optimal BER is reached when the oscillator resonance frequency matches the central frequency of the input signal.
    The super-regenerative receiver with its BIST is implemented in a 0.13µm CMOS technology. The connection between the output of the SRO and the input of the BIST has to be as short as possible to limit the parasitic capacitance of the net connection and crosstalk with other signals. On the other hand, placing the BIST close to this critical path of the system increases the risk of degrading the system performance, even when the BIST is switched off.

    Figure 1: Frequency measurement of the SRO resonance with the ATE resources and the BIST technique.

    Reference : [1] Dubois, M.; De Foucauld, E.; Mounet, C.; Dia, S.; Mayor, C., “A frequency measurement BIST implementation targeting gigahertz application”, 2012 IEEE International Test Conference (ITC), pp. 1-8


    2. Architecture & IC Design for Image Sensors

    High Performance IR Imagers
    3D Integration for Imagers
    Advanced Integrated Algorithms


    An 88dB SNR, 30µm Pixel Pitch

    Infra-Red image Sensor

    With a 2-step 16 bit A/D Conversion

    Research topics : CMOS image sensors, Infra-Red, pixel-level ADC

    charge packets so that, at the end of the integration time, the pixel counter contains a digital value proportional to the total integrated charge and the residue remains on the integration capacitance (Cint). GS (Global Shutter) is a global signal, while RS (Row Select) is a line-wise signal that allows the pixel to write on the digital bus on the one hand and on the analog bus on the other. For a fixed resolution of 16 bits, the number of bits at the pixel level can be assessed on an area criterion. For the 0.18µm process we used, Fig. 2 shows that there is a tradeoff between the integration capacitance and the counter depth.

    Fig 3 : 2-step 16 bit ADC principle (in the pixel: MCT photodiode connected through an indium bump, integration capacitance CINT, monostable and 11 bit counter; at the bottom of the column: 5 bit flash ADC; 16 bit output towards SRAM)

    The test chip was fabricated in a 1P6M 0.18µm standard CMOS process. This 320x256 hybrid HgCdTe sensor demonstrates how the 3Ge- full well capacity associated with a 16-bit ADC resolution paves the way for a breakthrough in thermal sensitivity. In electro-optical tests, a peak SNR of 88dB has been reached with power consumption below 72mW.

                     [this work]   [4]       [1]      [2]
    CMOS process     0.18µm        0.35µm    0.18µm   0.18µm
    Pixel pitch      30µm          50µm      30µm     50µm
    Peak SNR         88dB          85dB      75dB     70dB
    Power/pixel      0.5µW         1.7µW     10µW     9.7µW
    Format           320x256       128x128   16x1     64x64

    Table 1 : Summary of the sensor features vs other works

    A. Peizerat, J-P. Rostaing, N. Zitouni, N. Baier, F. Guellec, R. Jalby, M. Tchagaspanian

    ABSTRACT: A new readout IC (ROIC) with a 2-step A/D conversion for cooled infrared image sensors is presented in this paper. The sensor operates at a 50Hz frame rate in an Integrate-While-Read snapshot mode. The 16-bit ADC resolution preserves the excellent detector SNR at full well (~3Ge-). The ROIC, featuring a 320x256 array with 30µm pixel pitch, has been designed in a standard 0.18µm CMOS technology. The IC has been hybridized (indium bump bonding) to a LWIR (Long Wave Infra Red) detector fabricated using our in-house HgCdTe process. The first measurement results of the detector assembly validate both the 2-step ADC concept and its circuit implementation. This work sets a new state-of-the-art SNR of 88dB.

    Used in security and defense applications, cooled (77K) infrared HgCdTe (Mercury Cadmium Telluride) hybrid sensors (detector bump-bonded over the CMOS IC) are very demanding in terms of SNR (typical state-of-the-art values are in the 70-80dB range). The detector sensitivity can be limited either by the number of incident photons during one frame or by the charge handling capacity of the CMOS readout IC (ROIC). In many thermal imaging conditions, this second point predominates. This limited charge well capacity is determined by two CMOS process constraints: the integration capacitance that has to fit in a given pixel area, and the voltage range. To overcome this limitation, new ROIC architectures must be developed. Pixel-level analog-to-digital conversion is a very attractive solution that enables high dynamic range imaging and breakthrough SNR performance, while being compatible with an IR pixel size.

    The overall architecture of the sensor is given in Fig. 1. At the end of the integration time, the global shutter pixel delivers an 11 bit digital output as well as its analog residue. This residue is then converted using a 5 bit flash ADC, which gives a final 16-bit digital output. In order to make the Integrate While Read (IWR) feature possible, a whole image memory (SRAM) is needed.
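The way the 11 bit pixel output and the 5 bit residue conversion combine into one 16 bit word can be sketched with an idealized, noiseless model. The assumption that one counted charge packet equals the full well divided by 2^11, as well as the function name, are illustrative choices and not taken from the circuit itself.

```python
COARSE_BITS = 11  # pixel-level counter depth (from the text)
FINE_BITS = 5     # flash ADC resolution at the bottom of the column (from the text)


def two_step_convert(charge_e: float, full_well_e: float) -> int:
    """Ideal 2-step conversion of an integrated charge into a 16 bit code.

    Assumption: one charge packet equals full_well_e / 2**COARSE_BITS and
    both conversion steps are ideal (no noise, no mismatch).
    """
    if not 0 <= charge_e < full_well_e:
        raise ValueError("charge must lie in [0, full well)")
    packet = full_well_e / 2 ** COARSE_BITS
    coarse = int(charge_e // packet)         # packets counted in the pixel
    residue = charge_e - coarse * packet     # charge left on Cint
    fine = int(residue / packet * 2 ** FINE_BITS)  # flash conversion of the residue
    fine = min(fine, 2 ** FINE_BITS - 1)
    return (coarse << FINE_BITS) | fine      # 11 MSBs from the pixel, 5 LSBs from the flash


full_well = 3e9  # ~3 Ge-, as quoted in the text
print(two_step_convert(1.5e9, full_well))
print(two_step_convert(full_well - 1, full_well))
```

In this model the pixel counter supplies the most significant bits and the column flash ADC only has to resolve a single packet, which is why a 5 bit converter is enough to reach 16 bits overall.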

    Fig 1 : overall block diagram (9.6mm x 7.2mm chip: row decoder, 320x256 pixel array, column charge amplifiers, 5 bit flash ADCs, SRAM of 320*256 16 bit words, 16 bit output shift register, data out)
    Fig 2 : Cint trade off (area in µm² versus number of bits in the pixel: counter area, CINT area, counter area + CINT area, and pixel area)

    As illustrated in Fig. 3, the pixel uses a pixel-level ADC technique that is described in [1]. It consists in counting

    References :
    [1] A. Peizerat, M. Arques and J.-L. Martin, “Pixel-level A/D conversion: comparison of two charge packets counting techniques,” in Proc. 2007 International Image Sensor Workshop.
    [2] A. Peizerat, J. Rostaing, N. Zitouni, N. Baier, F. Guellec, R. Jalby, M. Tchagaspanian, “An 88dB SNR, 30µm pixel pitch Infra-Red image sensor with a 2-step 16 bit A/D conversion”, 2012 Symposium on VLSI Circuits (VLSIC), pp. 128-129


    Linear Photon-Counting

    with HgCdTe APDs

    Research topics : photon counting, image sensor, infrared

    Figure 2 shows the Probability Density Function (PDF) obtained from more than 10000 samples. The probability of generating a laser pulse with an n-photon state follows a Poisson distribution:

    $$P(n) = \frac{\mu^{n}}{n!} \times e^{-\mu}$$ where $\langle n \rangle = \mu$
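As a small numerical illustration of this Poisson law, the probabilities of the first few n-photon states can be evaluated directly. The mean photon number used below is a hypothetical value chosen for the example, not a parameter of the experiment.

```python
import math


def poisson_pmf(n: int, mu: float) -> float:
    """Probability of an n-photon state for a mean photon number mu."""
    return mu ** n * math.exp(-mu) / math.factorial(n)


mu = 1.0  # hypothetical mean photon number per laser pulse
for n in range(4):
    print(f"P({n}) = {poisson_pmf(n, mu):.4f}")
```

A photon-counting detector with a linear response should reproduce these relative weights in the peaks of its measured amplitude distribution.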


    A low-noise, 15µm pixel-pitch,

    640x512 hybrid InGaAs image sensor

    for night-vision

    Research topics : image sensor, infrared, night vision

    Experimental results are in good agreement with simulated values. In high gain configuration (17.6µV/e-), a read noise of 30e- has been reached for a dynamic range of 71dB. In low gain configuration (1.9µV/e-), we get 108e- and 79dB respectively. As expected, the lower noise floor in high gain is obtained at the expense of the dynamic range. This trade-off should be adjusted according to application needs. The dual gain of the pixel allows use in both night and day conditions, as well as image fusion if needed. The 640x512 image sensor operates at a frame rate of up to 120fps with a total power consumption of 150mW.
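The noise floor versus dynamic range trade-off can be checked with a short calculation. It assumes the usual definition DR = 20·log10(maximum signal / read noise), which the text does not state explicitly, so the maximum signal and output swing derived below are estimates rather than measured values.

```python
def max_signal_electrons(read_noise_e: float, dynamic_range_db: float) -> float:
    """Maximum signal implied by DR = 20*log10(max signal / read noise)."""
    return read_noise_e * 10 ** (dynamic_range_db / 20)


def output_swing_v(gain_uv_per_e: float, signal_e: float) -> float:
    """Output voltage swing for a given conversion gain and signal charge."""
    return gain_uv_per_e * 1e-6 * signal_e


# (conversion gain in uV/e-, read noise in e-, dynamic range in dB), from the text
configs = {"high gain": (17.6, 30.0, 71.0), "low gain": (1.9, 108.0, 79.0)}
for name, (gain, noise, dr) in configs.items():
    q_max = max_signal_electrons(noise, dr)
    swing = output_swing_v(gain, q_max)
    print(f"{name}: max signal ~ {q_max:.3g} e-, output swing ~ {swing:.2f} V")
```

Under this assumption both configurations correspond to an output swing between 1.8 V and 1.9 V, which is what one would expect if the same output voltage range limits the signal in the two gain modes.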

    Figure 2 : View of the packaged hybrid image sensor and picture taken with the developed camera.

    Further work is being carried out to reduce the pixel pitch to 10µm while maintaining good noise performance, with the aim of developing a future 1280x1024 detector.

    F. Guellec

    ABSTRACT: This paper presents the design of a 15µm pixel-pitch, 640x512 CMOS readout IC. A careful noise analysis of the C-TIA pixel circuit is necessary to achieve low noise performance with a high conversion gain. A 30e- read noise for a 71dB Dynamic Range (DR) has been reached with the developed hybrid InGaAs image sensor operated in rolling shutter with Correlated Double Sampling. These state-of-the-art results demonstrate that this detector is well suited for night vision in the Short Wave Infrared band, where it can take advantage of the airglow. The dual gain functionality of the pixel furthermore enables both night and day use. In low conversion gain configuration, the noise floor vs. dynamic range trade-off is different and we get a DR of 79dB.

    Hybrid InGaAs infrared detectors allow easy and compact camera integration as cooling is not needed. They are sensitive from the Short Wave Infrared (SWIR) (λ=1.7µm) down to the visible (λ=0.4µm) when the substrate is thinned. The SWIR band presents some key advantages for night vision. In this band, haze offers good transmission and an optical phenomenon occurring in the atmosphere (called airglow or nightglow) causes a weak generation of light.

    In this context, we developed, in collaboration with the III-V Lab, a low-noise, 15µm pixel-pitch, 640x512 hybrid InGaAs image sensor for night vision [1, 2]. We were in charge of the readout IC design in a standard 0.18µm CMOS technology. The pixel is based on a dual gain C-TIA circuit with an anti-blooming function. The image sensor is operated in rolling shutter with an optional correlated double-sampling mode, which is useful to reduce the noise in high-gain configuration. Thanks to a thorough noise analysis (taking into account power supply noise and CDS filtering) and careful circuit optimization with respect to area and power consumption constraints, state-of-the-art performance has been reached.

    Figure 1 : Simplified pixel architecture and modeled noise spectral density (dashed blue: input noise, bold blue: output noise, red: output noise after CDS)

    References :
    [1] F. Guellec, S. Dubois, E. de Borniol et al., “A low-noise, 15µm pixel-pitch, 640x512 hybrid InGaAs image sensor for night vision,” Proc. SPIE 8298, Sensors, Cameras, and Systems for Industrial and Scientific Applications XIII, 82980C, February 2012.
    [2] E. de Borniol, F. Guellec, P. Castelein, A. Rouvié, J.-A. Robo and J.-L. Reverchon, “High-performance 640x512 pixel hybrid InGaAs image sensor for night vision,” Proc. SPIE 8353, Infrared Technology and Applications XXXVIII, 835307, May 2012.


    High Dynamic Range Image Sensor

    with Self Adapting Integration Time

    in 3D Technology

    Research topics: 3D Technology, HDR, Image Sensor, Integration Time

    Figure 1: Architecture of the pixel and the integration time feed-back loop

    Figure 2: WTA output voltage versus pixel voltage

    The characteristic equation of the resulting curve shows a gain of about 0.973 and a 321mV offset voltage. These values are consistent with the analytical expressions of the offset (Eq.1) and the gain (Eq.2):

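    (Eq. 1: offset voltage, an expression in the gate-source voltages V_gs, the factor n and ln(N). Eq. 2: gain G = V_c / V_inw, an expression in the transconductance g_mw and the conductances g_ds. The typeset forms of both equations are not recoverable here.)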

    F. Guezzi-Messaoud, A. Dupret, A. Peizerat, Y. Blanchard (ESIEE Paris)

    ABSTRACT: This paper presents a High Dynamic Range (HDR) image sensor architecture that uses the capabilities of three-dimensional integrated circuits (3D IC) to reach a dynamic range over 120 dB without modifying the classic (3T or 4T) pixel architecture. The integration time is evaluated on subsets of pixels on the lower IC of the stack and then sent back through vertical interconnections to the sensor array. This work evaluates the performance of an analog Winner Take All circuit, used to detect the maximum exponent corresponding to the optimum integration time chosen for every group of pixels.

    Integrating more complex functions within the same circuit is one of the main quests of the microelectronics industry. Three-dimensional integration by circuit stacking (3D stacking) is a promising way to achieve this goal. It notably pushes back some of the limitations that circuits have reached nowadays. The main motivation is to take advantage of the 3D topology to exceed the limited dynamic range of standard image sensors while keeping the classic 3T or 4T pixel architecture. This work presents a new image sensor architecture that reaches a dynamic range over 120dB without modifying that pixel architecture. It takes advantage of the emergence of dense vertical interconnection technologies, Through Silicon Vias (TSV), to locally adapt the integration time of a group of pixels. Coding a high dynamic range, high PSNR image increases the data throughput at the IOs of the circuit. The HDR architecture is therefore coupled to a two-level compression system [1, 2].

    To cope with the limited pixel area and TSV pitches of about tens of microns, the circuit proposed in this work takes advantage of the 3D stacking of two integrated circuits. The circuit consists of two stacked dies vertically interconnected by TSVs. The upper die performs image acquisition and is based on the architecture of a classical 2D image sensor, with 3T or 4T pixels. The processing performed on the lower die contains two stages. First, it estimates the best-suited integration time for every macro-pixel; then, it generates the command signal that adjusts the integration time. To deduce the optimal integration time, we use the circuit architecture presented in Fig. 1.

    In every macro-pixel, the maximal voltage drop ΔV, corresponding to the minimum integration time, is determined by means of a Winner Take All (WTA) circuit. We have designed a WTA circuit in 32nm double oxide CMOS technology. Due to its analog nature, the transfer function of the WTA has an offset and a gain error. The output voltage as a function of different sets of input voltages has been simulated (Fig. 2).
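One way to picture this decision is the following sketch, in which the Winner Take All is modeled as an ideal maximum and the integration time is chosen among power-of-two multiples of a short reference time. The saturation level, the exponent range and the power-of-two coding are assumptions made for illustration; the text only states that the maximal voltage drop of the macro-pixel sets its integration time.

```python
def winner_take_all(voltage_drops):
    """Ideal WTA: returns the largest voltage drop in the macro-pixel."""
    return max(voltage_drops)


def integration_exponent(voltage_drops, v_sat=1.0, max_exponent=7):
    """Largest exponent e such that the brightest pixel stays below v_sat
    when the integration time is 2**e times the reference time.

    Assumptions: voltage drops are measured after the reference time and
    grow linearly with integration time; v_sat and max_exponent are
    hypothetical values.
    """
    dv_max = winner_take_all(voltage_drops)
    exponent = 0
    while exponent < max_exponent and dv_max * 2 ** (exponent + 1) < v_sat:
        exponent += 1
    return exponent


# Hypothetical macro-pixels: a dark one and a bright one (drops in volts).
print(integration_exponent([0.01, 0.05, 0.03]))
print(integration_exponent([0.30, 0.60, 0.45]))
```

In this picture a bright macro-pixel keeps a short integration time while a dark one integrates longer, which is how the dynamic range is extended without touching the 3T or 4T pixel itself. The gain and offset errors of the real WTA would shift the decision thresholds and are ignored here.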

    References :
    [1] F. Guezzi Messaoud, A. Dupret, A. Peizerat and Y. Blanchard, “A novel 3D architecture for High Dynamic Range image sensor and on-chip data compression”, Proceedings of the Sensors, Cameras, and Systems for Industrial, Scientific, and Consumer Applications XII, San Francisco, SPIE 2011.
    [2] F. Guezzi Messaoud, A. Dupret, A. Peizerat and Y. Blanchard, “On-chip compression for HDR image sensors”, Proc. DASIP, 90-96, October 2010.
    [3] F. Guezzi Messaoud, A. Dupret, A. Peizerat and Y. Blanchard, “High Dynamic Range Image Sensor with Self Adapting Integration time in 3D Technology”, IEEE International Conference on Electronics, Circuits, and Systems (ICECS), December 9-12, Seville, Spain, 2012


    Computational SAR ADC for

    a 3D CMOS Image Sensor

    Research topics: 3D integration, CMOS image sensor, image descriptor

    The C-SAR is conceived as the building block of the readout circuit of a 3D CMOS image sensor. The top tier (Fig. 1) embeds a 32×32 macropixel array. Each macropixel is composed of a square of 8×8 back-illuminated pixels with a 10µm pitch, read locally in rolling shutter. On the second tier, an array of 32×32 C-SAR cells is implemented to compute the binary weighted sum of pixels. Each C-SAR cell is associated with a macropixel and is thus shared by its block of 8×8 pixels. The connection between tier 1 and tier 2 is realized by direct metal bonding, in a face-to-face configuration. Only one interconnection is required for each 8-pixel column of a macropixel. The readout pipeline of a macropixel is presented in Fig. 2. Each column of the macropixel in tier 1 is associated with a sample and hold circuit in tier 2. A bank of 2×8 capacitors thus stores the reference (black level) and pixel signal of a line of 8 pixels being read in rolling shutter mode. The sampled data are then multiplexed towards the analog-to-digital processing unit to be computed.
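The quantity produced by one C-SAR cell can be sketched in software as a signed, weighted sum over the 8×8 pixels of a macropixel followed by a quantization to 9 signed bits. The normalization by the sum of absolute weights (mimicking the averaging effect of charge sharing), the ±0.5V full scale and the two-rectangle Haar-like kernel are assumptions chosen for the example, not details taken from the circuit.

```python
def weighted_sum(block, weights):
    """Normalized signed weighted sum of an 8x8 block of pixel voltages.

    Assumption: the result is divided by the sum of absolute weights,
    as an idealized stand-in for charge sharing between capacitors.
    """
    total = sum(w * p for row_p, row_w in zip(block, weights)
                for p, w in zip(row_p, row_w))
    norm = sum(abs(w) for row_w in weights for w in row_w)
    return total / norm


def quantize_signed(value_v, full_scale_v=0.5, bits=9):
    """Uniform quantization to a signed code, clipped to the code range."""
    levels = 2 ** (bits - 1)        # 256 levels per polarity for 9 signed bits
    lsb = full_scale_v / levels
    code = round(value_v / lsb)
    return max(-levels, min(levels - 1, code))


# Two-rectangle Haar-like kernel: left half weighted +1, right half -1.
kernel = [[1] * 4 + [-1] * 4 for _ in range(8)]
# Hypothetical macropixel: bright left half (0.4 V), dark right half (0.1 V).
block = [[0.4] * 4 + [0.1] * 4 for _ in range(8)]

descriptor_v = weighted_sum(block, kernel)
print(descriptor_v, quantize_signed(descriptor_v))
```

A uniform block gives a zero descriptor with this kernel, while a vertical edge gives a large positive or negative code, which is the property exploited by Haar-like features.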

    Figure 2 : Readout pipeline of a macropixel

    A low power consumption architecture has been simulated for a processing resolution of 9 signed bits, reaching a FOM of 6.25pJ/pixel. Compared to standard processing architectures, no additional time is required, the processing being performed together with the conversion. This C-SAR is suitable for high frame rates of up to 2200fps.

    A. Verdant, A. Dupret, M. Tchagaspanian, A. Peizerat

    ABSTRACT: The architecture and simulation of a Computational SAR ADC (C-SAR) dedicated to the processing of image descriptors for a 3D CMOS image sensor are reported here. The differential charge sharing architecture makes it possible to A/D convert the convolution of multiple binary weighted pixel signals on multi-scale kernels. The CMOS image sensor consists of two tiers (two 3D layers). An array of C-SAR cells is implemented on the bottom layer. Each C-SAR is associated with a square of 8×8 pixels on the top layer, with a pitch of 10µm and a fill factor of 80%. The total noise of 460µVRMS, simulated at transistor level in a 65nm technology, makes it possible to reach a processing resolution of 9 signed bits on a 0.5V pixel dynamic, with a FOM of 6.25pJ/pixel.

    In automotive applications, driver drowsiness detection is extremely constrained in terms of processing bandwidth. The eye blinking analysis is based on high frame rate video (200fps). Part of the method used to extract blink features from video relies on face detection with the Viola-Jones algorithm using Haar-like descriptors. Although high-throughput architectures associated with standard CMOS image sensors allow spatial weighted sums (convolutions) to be computed, integrated processing features are mandatory to reduce power and silicon area costs. Hence, to overcome the limitations associated with the use of a DSP, processing features have been successfully implemented in CMOS image sensors [1].

    The Computational-SAR (C-SAR) architecture allowing the calculation of the Haar descriptors is presented here. This topology benefits from the high bandwidth of the SAR ADC together with its low power consumption. Indeed, successive approximation converters are known to provide the best FOM in terms of energy per step. This C-SAR processing unit is presented in the context of its implementation in a two-tier 3D CMOS image sensor, chosen to preserve the fill factor of the sensor array.

    Figure 1 : 3D CMOS image sensor embedding C-SAR

    References :
    [1] L. Alacoque, L. Chotard, M. Tchagaspanian, J. Chossat, “A small footprint, streaming compliant, versatile wavelet compression scheme for cameraphone imagers”, in International Image Sensor Workshop, IISW’09, Bergen, Norway.
    [2] A. Verdant, A. Dupret, M. Tchagaspanian and A. Peizerat, “Computational SAR ADC for a 3D CMOS image sensor”, IEEE 10th International New Circuits and Systems Conference (NEWCAS), 2012, pp. 337-340


    Design and Optimization of

    Two Motion Detection Circuits

    for Video Monitoring

    Research topics : Smart image sensor, visual perception

    their functionality.

    Figure 2 : Second motion detection circuit. Two half hysteresis voltage comparators are used for negative and positive voltage comparison. The logic NOR is used as the output stage, in which the active load (M9) is shared among the circuits of the same column or the same row.

    Figure 3 : Simulation results of Fig. 2 with Vr=1.65V and a variable Vt to obtain different threshold levels, which can be interpreted as a symmetrical window around Vr that achieves absolute value comparison. The symmetry of the window comparison is simulated for a fixed central voltage and variable window widths ranging from 10mV to 500mV.
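The window comparison performed by the two comparators can be summarized behaviorally as follows. This is a functional sketch only: it is parameterized directly by the window width, since the relation between Vt and the window width is not given here, and it returns an active-high flag where the circuit output stage is a NOR.

```python
def motion_detected(v_pixel: float, v_ref: float = 1.65,
                    window_width: float = 0.010) -> bool:
    """True when v_pixel lies outside a window centred on v_ref.

    Equivalent to the absolute value comparison
    |v_pixel - v_ref| > window_width / 2.
    """
    half_window = window_width / 2
    below = v_pixel < v_ref - half_window  # negative-side comparator
    above = v_pixel > v_ref + half_window  # positive-side comparator
    return below or above


# Window widths at both ends of the simulated range (10 mV and 500 mV).
for width in (0.010, 0.500):
    print(width, motion_detected(1.70, window_width=width))
```

With the narrow window a 50mV deviation from Vr is flagged as motion, whereas the wide window ignores it, which is how the threshold level trades sensitivity against false detections.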

    Table 1.
    Parameters              Circuit 1     Circuit 2
    Number of transistors   17            14
    Power consumption       1µW           0.7µW
    Window width            10mV/300mV    8mV/500mV
    Non homogeneity


    Towards a Real Time Sensor for

    Focusing Through Scattering Media

    Research topics : Image sensor, Wavefront correction

    the parallel processing at the pixel level allows a dramatic acceleration of processing [1]. A major challenge is to make the implementation compatible with the pixel level. In that scope, we present a pyramidal genetic algorithm (GA) that can be implemented within a CMOS image sensor.

    Figure 2: 2D and 1D images of the intensity before (a) and after(b) optimization with our genetic algorithm.

    The standard optical setup corresponding to this model, used for testing the algorithms and simulating their implementation, is shown in Fig. 1. A laser source illuminates a reflective SLM array. Each element of the SLM array shifts the phase of its incident light by 0 to 2π. Next, the light beam is scattered by the medium, and finally the transmitted intensity is recorded on an image sensor.

    In order to compare our implementation to the state of the art, we consider the previously used enhancement criterion, defined as the transmitted intensity in the chosen target (focus point) over the averaged transmitted intensity before optimization. This criterion is measured with regard to the number of frames acquired by the image sensor. An example of transmitted light obtained when running our genetic algorithm is shown in Fig. 2.

    Results show a gain of at least a factor of 10 with our algorithm compared to the state of the art. Moreover, the pyramidal approach allows a gain of at least a factor of 2 compared to the classical one. Finally, our genetic algorithm has been evaluated with different noise levels and compared to the state of the art. Results show that our algorithm converges at high noise levels whereas the state of the art does not.
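To make the optimization loop and the enhancement criterion concrete, here is a minimal sketch of a plain genetic algorithm on a toy scattering model. It is not the pyramidal algorithm of this work: the random complex transmission coefficients, the population size, the crossover and mutation rules and all names are assumptions made for illustration, and noise is ignored.

```python
import cmath
import math
import random


def focus_intensity(phases, t_row):
    """Intensity at the target for a given set of SLM phases."""
    field = sum(t * cmath.exp(1j * p) for t, p in zip(t_row, phases))
    return abs(field) ** 2


def genetic_focus(t_row, pop_size=20, generations=100,
                  mutation_rate=0.05, seed=0):
    """Plain GA: keep the best half, refill by crossover and mutation."""
    rng = random.Random(seed)
    n = len(t_row)

    def new_phase():
        return rng.uniform(0.0, 2.0 * math.pi)

    population = [[new_phase() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Each fitness evaluation stands for one frame read on the sensor.
        population.sort(key=lambda p: focus_intensity(p, t_row), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [new_phase() if rng.random() < mutation_rate else ph
                     for ph in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda p: focus_intensity(p, t_row))


# Toy scattering medium: random complex coefficients towards one target pixel.
rng = random.Random(1)
n_slm = 32
t_row = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_slm)]

# Averaged target intensity before optimization (random phase patterns).
random_patterns = [[rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_slm)]
                   for _ in range(200)]
baseline = sum(focus_intensity(p, t_row) for p in random_patterns) / 200

best = genetic_focus(t_row)
enhancement = focus_intensity(best, t_row) / baseline
print(f"enhancement = {enhancement:.1f}")
```

Since every fitness evaluation corresponds to one acquired frame, the number of generations times the population size gives a rough frame budget, which is the quantity that has to fit within the persistence time of the medium.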

    T. Laforest, A. Verdant, A. Dupret, S. Gigan (CNRS UMR 7587), F. Ramaz (CNRS UMR 7587)

    ABSTRACT: Materials such as milk, paper, white paint and biological tissue scatter light. As a result, the light intensity transmitted through these materials is a speckle pattern, which often has a short persistence time. Recent advances in optics to control light through disordered media have reported an increasing efficiency. Consequently, we can foresee a real time sensor that achieves such a task in an integrated way. In this perspective, we propose a genetic algorithm implemented with a pyramidal approach in a CMOS image sensor, which matches integrated data processing and short persistence time. Our algorithm has been simulated with a faithful model. Results show a gain of at least a factor of 10 compared to the state of the art.

    Materials such as milk, paper, white paint and biological tissue are opaque due to multiple scattering of light. Consequently, the interaction between the medium and the light beam causes phase changes of the light. Recently, many works have been reported on controlling coherent light through scattering media. The principle consists in correcting the phase perturbations produced by the medium, thereby achieving inverse diffusion. Indeed, the use of phase-only Spatial Light Modulators (SLM) for wavefront correction is a promising way to focus coherent light. Wavefront correction can be achieved by finding the optimal set of phases with an SLM whose phases can be adjusted. This task constitutes an optimization problem.

    Turbid media, especially biological tissue, often feature a short persistence time of a few milliseconds. Hence, the corrected wavefront must be computed within the persistence time. This complicates the optimization process, which must therefore be robust to high noise levels. Some works propose sequential focusing algorithms or the measurement of the transmission matrix, which allows generating the correct phase set that, in turn, focuses the light beam.

    All these algorithms are time consuming or suffer from a lack of robustness in noisy environments. Recently, an efficient genetic algorithm has been presented.

    Figure 1 : Standard optical setup. M: mirror; SLM: spatial light modulator; SIS: smart image sensor.

    For instance, considering a 256x256 pixel image array, a persistence time of 2 ms, and assuming that the algorithm needs 250 frames to converge, the image sensor has to capture 125 000 frames per second (fps), corresponding to a transfer rate of about 8.2 Giga-pixels per second. The standard approach, i.e. a camera and a processor, suffers from limitations: the delay due to frame transfer and centralized data processing. Therefore, we aim at developing a dedicated smart image sensor that improves the focusing convergence time with regard to the persistence time in biological media. Indeed,

    References :
    [1] J.-M. Tualle, A. Dupret, and M. Vasiliu, “Ultra-compact sensor for diffuse correlation spectroscopy,” Electronics Letters, vol. 46, no. 12, pp. 819–820, 2010.
    [2] T. Laforest, A. Verdant, A. Dupret, S. Gigan and F. Ramaz, “Towards a real time sensor for focusing through scattering media”, 2012 IEEE Sensors, October 28-31, Taipei, Taiwan, 2012


    Perceptual Image Quality Assessment Metric

    Research topics: motion blur, digital photography

    normalized with respect to both the JND and MOS databases to provide a direct human perceptual value that is not limited to the usual cases of straight-line blur and Gaussian blur. Our metric is also based on the circle of confusion, which can take into account the final user viewing conditions (such as web applications, display, printing…). The metric is validated by user tests such as image comparison, and it fits the experimental trends of other databases both in the case of linear motion blur (Fig. 1) and arbitrary motion blur (Fig. 2).
    To the best of our knowledge, this is the first metric that can measure all types of arbitrary blur. This metric makes it possible to specify image-based electronic image stabilization systems and can quantify the subjective final gain of the overall IS.
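    The blur model on which the metric relies (angular tremor converted to pixel motion, integrated into a PSF, then convolved with the reference image) can be sketched as follows. This is an illustrative sketch only: the pinhole conversion, the histogram-based PSF, the function names and the numerical values are assumptions, not the authors' implementation, and the perceptual metric itself is not reproduced.

```python
import numpy as np

def angle_to_pixels(theta_rad, focal_length_mm, pixel_pitch_um):
    """Convert angular tremor to image-plane displacement in pixels (pinhole model)."""
    return focal_length_mm * 1e-3 * np.tan(theta_rad) / (pixel_pitch_um * 1e-6)

def psf_from_trajectory(x_pix, y_pix, size):
    """Integrate a uniformly sampled trajectory into a normalized PSF (size must be odd)."""
    half = size // 2
    edges = np.arange(-half - 0.5, half + 1.0)  # bins centered on integer pixel offsets
    psf, _, _ = np.histogram2d(y_pix, x_pix, bins=(edges, edges))
    return psf / psf.sum()

def blur(image, psf):
    """Convolve the reference image with the PSF (edge-replicated borders)."""
    ph, pw = psf.shape
    h, w = image.shape
    padded = np.pad(image.astype(float), ((ph // 2,) * 2, (pw // 2,) * 2), mode="edge")
    out = np.zeros((h, w))
    for dy in range(ph):
        for dx in range(pw):
            # Flipped kernel indices make this a true convolution, not a correlation.
            out += psf[ph - 1 - dy, pw - 1 - dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Assumed example: a horizontal drift of +/-0.5 mrad during the exposure,
# seen through a 5 mm lens on 1.4 um pixels (values chosen for illustration).
theta = np.linspace(-0.5e-3, 0.5e-3, 1001)
x = angle_to_pixels(theta, focal_length_mm=5.0, pixel_pitch_um=1.4)
y = np.zeros_like(x)
psf = psf_from_trajectory(x, y, size=9)
reference = np.random.default_rng(0).random((32, 32))
blurred = blur(reference, psf)
```

    A straight-line trajectory is used here only because it is easy to check; an arbitrary two-dimensional tremor trajectory can be passed to `psf_from_trajectory` in the same way.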

    Figure 1: Comparison of quality prediction with the ground truth in the case of linear blur

    Figure 2: Comparison of quality prediction with the ground truth in the case of arbitrary motion blur

    F. Gavant, A. Dupret, L. Alacoque, D. David

    ABSTRACT: Image sensor stabilization is usually based on accelerometers. To reduce the number of external components of digital image sensors, an integrated image stabilization system is envisaged. Such a system requires modeling the blur due to hand tremor and a general sharpness metric to quantify the gain of such a stabilization system. We aim at providing an accurate model of the hand tremor and its impact as a Point Spread Function. In order to define the specification of the image-based image stabilization, we have derived a perceptual visual quality sharpness metric for camera shake blur. This sharpness metric is based on visual blur tests. It proves to fit well ground truths such as a mean opinion score database and quality ruler measures of blur.

    The digital imaging market is characterized by conflicting demands: smaller pixels, in order to attain large formats and to reduce the cost of the die, and high sensor performance in terms of sensitivity and signal-to-noise ratio (SNR). To keep a reasonably high SNR, longer integration times are required. Yet, a longer integration time makes the quality of the resulting image sensitive to motion blur. Since hand tremor is more pronounced for lighter devices, these problems are even more severe for compact cameras and camera phones. Therefore, an image stabilization (IS) mechanism is to be used to reduce the blur due to camera shake. In order to get rid of the classical mechanical accelerometers used in IS, our approach is to develop an integrated image-based motion detection. The specifications of this integrated image-based motion detection derive from the impact of hand tremor blur on the quality of the image. Our work thus leads to a faithful model of hand tremor and a metric to measure the impact of blur on the quality of images.
    The angular variations between camera and scene caused by hand tremor are characterized by a power spectral density (PSD). The characteristics of the camera (focal length, pixel pitch, etc.) determine the conversion of angular tremor into translational motion of pixels on the image sensor. The Point Spread Function (PSF) then results from the integration of the motion signal. The motion-blurred image is generated by convolving the PSF with the reference image.
    The particular blur induced by hand tremor in the resulting image has not been well characterized regarding its impact on human perception. Yet, two particular types of blur (Gaussian blur and straight-line motion blur) have been studied. Gaussian blur is found in defocus conditions, while straight-line blur is generally used as a simplified model of the motion tremor. For Gaussian blur, some publicly available databases of subjective quality data already exist. These databases use several distortions, such as Gaussian blur, and provide the Mean Opinion Score (MOS). Regarding straight-line blur, quality rulers based on the just noticeable difference (JND) have been studied. Yet, due to the complexity of camera shake, these particular results are not suitable for complex blurs.
    Thus we developed a sharpness quality metric based on the PSF of the camera shake. The result of the metric is then

    References:
    [1] Gavant, F.; Alacoque, L.; Dupret, A.; Ho-Phuoc, T. & David, D. (2012), “Perceptual image quality assessment metric that handles arbitrary motion blur”, SPIE Conference on Image Quality and System Performance IX, Burlingame, CA, January 24-26, 2012.


    Saliency-Based Data Compression for Image Sensors

    Research topics: visual attention compression, architecture-algorithm co-design

    since it features a very compact physical implementation; the second is Itti's model, which usually serves as a reference. The best performance is obtained with our model.

    Fig. 3 represents several frames, their saliency maps and compressed versions. This framework is particularly effective with scenes containing locally distributed motion. Indeed, in this case the moving regions - very well predicted by our model and actually fixated by observers - conserve all information, while large non-fixated regions are reconstructed only from low-frequency information.

    Figure 2 : Frame memory required for motion computation

    The proposed framework presents an original, compact yet efficient, saliency-based data compression model for image sensors. It is flexible and so might be improved by adding filtering operators.

    Figure 3: Other frames (first row) and their saliency maps (third row) provided by the model BSM1. Compressed frames (second row)

    Tien Ho-Phuoc, L. Alacoque, A. Dupret, A. Guérin-Dugué, (GIPSA-LAB)

    ABSTRACT: Saliency models have shown their ability to predict where observers fixate during scene exploration. Embedding a saliency model into an image sensor for data compression allows allocating the bit-rate budget according to the saliency level of a region. This paper presents an original implementation of a saliency-based data compression algorithm and architecture. First, a video-rate compliant, compact saliency model is designed to allow its integration within an image sensor. It shows better performance in predicting human fixations than state-of-the-art models. A simpler version of our proposed model requires 256 times less memory. Second, a Haar wavelet-based compression is applied according to the saliency of regions in each frame.

    Lossy compression algorithms enable higher compression ratios than their lossless counterparts at the expense of artifacts that are visually disturbing, especially in salient regions and when high compression ratios are used. An image sensor integrating a saliency model is able to adapt the compression ratio according to saliency. Its implementation within the image sensor must comply with strong hardware constraints, i.e. limited memory and processing elements within the image sensor. We first propose a very compact - yet efficient - video saliency model that complies with the low-complexity requirement of image sensors. The proposed model combines - through the “OR” operation - motion saliency with the central fixation bias, a human viewing tendency. Motion saliency is computed in blocks thanks to an adaptive threshold (Fig. 1), resulting in little required memory (Fig. 2). The central fixation bias is constant for all frames and is stored within a look-up table. Second, the compression step is applied to each block. If a block is salient, all its information is conserved. By contrast, non-salient blocks are reconstructed from only the LL (approximation) component of their Haar wavelet transform. Only compact operators are used in the proposed model.
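    The two steps described above can be sketched in a few lines. This is a hedged illustration, not the published model: the adaptive threshold rule (mean plus `k` standard deviations of the block motion energy), the one-level Haar transform, the block size and all function names are assumptions made for the example.

```python
import numpy as np

def motion_saliency(prev, curr, b, k=1.0):
    """Per-block motion saliency from the frame difference (threshold rule is assumed)."""
    h, w = curr.shape
    diff = np.abs(curr.astype(float) - prev.astype(float))
    energy = diff.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
    threshold = energy.mean() + k * energy.std()  # adapts to the current frame
    return energy > threshold

def haar_ll_only(block):
    """Reconstruct a block from the LL band of a one-level 2D Haar transform.
    Dropping LH, HL and HH leaves each 2x2 neighbourhood equal to its mean."""
    n = block.shape[0]
    means = block.reshape(n // 2, 2, n // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(means, 2, axis=0), 2, axis=1)

def compress(prev, curr, center_bias, b=8):
    """Keep salient blocks intact and low-pass the others.
    center_bias is a boolean look-up table with one entry per block."""
    salient = motion_saliency(prev, curr, b) | center_bias  # the "OR" combination
    out = curr.astype(float).copy()
    for i, j in zip(*np.nonzero(~salient)):
        window = (slice(i * b, (i + 1) * b), slice(j * b, (j + 1) * b))
        out[window] = haar_ll_only(out[window])
    return out, salient

# Assumed test scene: a 32x32 frame where only the top-left block moves.
rng = np.random.default_rng(0)
prev = rng.random((32, 32)) * 10
curr = prev.copy()
curr[0:8, 0:8] += 50
center_bias = np.zeros((4, 4), dtype=bool)
center_bias[1:3, 1:3] = True
out, salient = compress(prev, curr, center_bias)
```

    Only the previous frame and one boolean per block are needed here, which is in the spirit of the low memory footprint the text emphasizes; the actual memory figures of the model are those reported by the authors, not those of this sketch.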

    Figure 1: Illustration of the motion saliency extraction by adaptive threshold

    Fig. 3 illustrates the saliency map of the proposed saliency model - exploiting motion and the central fixation bias - for a given frame. It is also compared with the saliency maps of two other algorithms: the first one is the Sigma-Delta algorithm,

    References:
    [1] Tien Ho-Phuoc, Alacoque L., Dupret A., Guerin-Dugue A., Verdant A., "A compact saliency model for video-rate implementation", 45th Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2011, pp. 244-248, 6-9 Nov. 2011.
    [2] Tien Ho-Phuoc, Laurent Alacoque, Antoine Dupret, Anne Guérin-Dugué, Arnaud Verdant, “A unified method for comparison of algorithms of saliency extraction”, Proc. SPIE 8293, Image Quality and System Performance IX, 829315 (January 22, 2012).
    [3] Tien Ho-Phuoc, Laurent Alacoque, Antoine Dupret, “Compact saliency model and architectures for image sensors”, IEEE Workshop on Signal Processing Systems 2012.
    [4] Tien Ho-Phuoc, Antoine Dupret, Laurent Alacoque, “Saliency-Based Data Compression for Image Sensors”, IEEE Sensors 2012, Oct. 28-31, Taipei, Taiwan.


    A New Approach of Smart Vision Sensors

    Research topics: smart imagers, adaptive processing, feedback

    the image capture parameters (exposure time, conversion gain and pixel reset) during the photon integration time. This introduces the use of frame sub-exposures to construct a full frame. These sub-exposures may be considered as a sampled continuous readout. To deal with the control needed for our approach, we propose a hardware architecture adaptation relying on 3D stacking technologies to process pixels quickly enough to enable capture control – by feedback – during the image construction. It associates a 2D matrix of preprocessing elements with the photo-sensitive layer, which is divided into pixel blocks. These preprocessing elements are designed to do generic vision pre-computing in order to provide a preprocessed image, or specific image features, to the associated high-level processing unit. The innovative purpose of this layer is to locally control the photo-sensitive layer by processing incoming pixel values on the fly and sending back adapted capture parameters.
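    The principle of building a frame from sub-exposures under per-block feedback can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the proposed architecture: the scene flux is constant during the frame, the only controlled parameter is the exposure time of each pixel block, and the stopping rule, the function name `capture_with_feedback` and all numerical values are hypothetical.

```python
import numpy as np

def capture_with_feedback(flux, n_sub=8, t_sub=1.0, full_well=255.0, block=4):
    """Integrate a frame as n_sub sub-exposures; each block stops integrating
    when one more sub-exposure is predicted to saturate its brightest pixel."""
    h, w = flux.shape
    bh, bw = h // block, w // block
    acc = np.zeros((h, w))                   # charge integrated so far
    n_used = np.zeros((bh, bw), dtype=int)   # sub-exposures granted to each block
    active = np.ones((bh, bw), dtype=bool)   # blocks still integrating
    for _ in range(n_sub):
        mask = np.repeat(np.repeat(active, block, axis=0), block, axis=1)
        acc[mask] += flux[mask] * t_sub
        n_used += active
        # Feedback step: extrapolate each block's peak to the next sub-exposure.
        peak = acc.reshape(bh, block, bw, block).max(axis=(1, 3))
        predicted = peak * (n_used + 1) / np.maximum(n_used, 1)
        active &= predicted <= full_well
    return acc, n_used * t_sub

# Assumed highly contrasted scene: one bright block among dim ones.
flux = np.ones((8, 8))
flux[:4, :4] = 100.0
acc, exposure = capture_with_feedback(flux)

# Dividing by the per-block exposure time recovers a common radiometric scale.
per_pixel_exposure = np.repeat(np.repeat(exposure, 4, axis=0), 4, axis=1)
estimate = acc / per_pixel_exposure
```

    With these values the bright block stops after two sub-exposures while the dim blocks integrate for all eight, so no pixel saturates even though the flux spans a 100:1 range.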

    Figure 2: Multi-exposure adaptation and resulting motion-related Region-of-Interest delivered by the sensor integrating fast feedback adaptation.

    This work was presented in [1], showing the first results of a feedback-controlled design. Fig. 2 shows an application of our approach to motion detection in a highly contrasted environment. As image processing algorithms are designed for traditional architectures that process images after their acquisition, new algorithms must be considered in order to benefit from this smart sensor architecture. Further work will investigate such designs and enhance the adaptation capabilities and flexibility of our smart sensor.

    J. Bezine, M. Thévenin, R. Schmit, M. Duranton, M. Paindavoine (LEAD)

    ABSTRACT: Today’s digital image sensors are used as passive photon integrators, and image processing is essentially performed by digital processors separated from the image sensing parts. This approach forces the processing part to deal with definitive pictures with possibly unadjusted capture parameters. This work presents a self-adaptable preprocessing architecture concept with fast feedback controls at the sensing level. These feedbacks are controlled by digital processing in order to adapt the exposure and processing parameters to the captured scene parameters. This innovative way of designing smart vision sensors, integrating fast feedback control, enables new approaches for machine vision architectures and their applications.

    Nowadays, in most image processing systems, the sensor is separated from the image processing part, pixel values being sent serially. First, photons are integrated for a predefined exposure time; next, a control circuit reads and sequentially converts the pixel values from analog to digital. Finally, pixel values are sent to an image processor for image enhancement or computer vision applications. Thus, image processing systems consider pixel values after the end of the full exposure. As a result, corrections such as dynamic range enhancement or image stabilization need to be added in order to suppress the effects of unadjusted image capture parameters. This is particularly true in vision applications such as obstacle detection or target tracking, where the image sensor is used on moving vehicles, suffers from their vibrations and often analyzes difficult scenes (highly contrasted or bad weather conditions).

    During the last decade, image processing systems have tended to link sensing parts to the processing units. Near-pixel processing was introduced in smart sensors, at the analog or digital level, in order to refine or adapt captured images before final processing, thus optimizing it. To further improve silicon and energy efficiency, this work proposes to associate image capture and image processing even more closely by adding fast and local feedback controls to the usual image capture process.

    Figure 1: Schematic of the feedback integration approach in the image processing flow.

    This adaptation of the usual image capture process is presented in Fig. 1. It is firstly based on the close control of

    [1] J. Bézine, M. Thévenin, R. Schmit, M. Duranton, M. Paindavoine, “A New Approach of Smart Vision Sensors”, Proc. SPIE 8436, Optics, Photonics, and Digital Technologies for Multimedia Applications II, 84360I (June 1, 2012).


    Architecture, IC Design & Control for Digital SoCs

    3D Architectures & Circuits
    Manycores
    FDSOI Circuits & Memories
    Asynchronous Design
    Exploration & Estimation
    Adaptive Control

    3


    Platform 2012, a 3D-ready Many-Core Computing Accelerator with Power, Thermal and Variability Management

    Research topics: Many-core architecture, low-power, System-on-Chip, 3D stacking

    bits data-wide asynchronous IO ports driven by micro-buffers and tied to micro-pads for die stacking. In addition (not shown in the figure), power and ground are also delivered through a “vertical plug”. In this configuration the die will be flipped and stacked on top of a host SoC with CPU, peripherals, standard IOs and DRAM interfaces.

    Figure 1 : Block diagram of the flexible SoC.

    A second, 2D configuration is supported by the static MUXes. In this mode, a traditional board-level high-speed interface (denoted 2D SNoC) links the fabric with the external host and main memory. This interface is physically driven through a smaller number of standard IO pads (two 81-pin ports). The 2D configuration allows simple interfacing with on-board FPGA-based hosts.

    The SoC is being implemented in STMicroelectronics’ low-power 28nm CMOS process. Target chip area is below 26 mm². The power distribution grid of the SoC is designed to handle power delivery in both 3D and 2D configurations. The chip power consumption under heavy workload is upper-bounded at 4 W (at 1.1 V, 125 °C), but its aggressive power management features enable energy-proportional operation up to a few hundred mW average power.

    L. Benini (UNIBO), D. Melpignano (ST), E. Flamand (ST), B. Jego (ST), T. Lepley (ST), G. Haugou (ST), F. Clermidy, D. Dutoit

    ABSTRACT: P2012 is an area- and power-efficient many-core computing accelerator based on multiple processor clusters implemented with independent power and clock domains, enabling aggressive fine-grained power, reliability and variability management. Clusters are connected via a high-performance fully-asynchronous Network-on-Chip (NoC) and feature up to 16 processors. The SoC is being implemented in STMicroelectronics’ low-power 28nm CMOS process and is 3D stacking ready. Target chip area is below 26 mm² for a 4-cluster version.

    The Platform 2012 (P2012 [1]) project aims at moving a significant step forward in programmable accelerator architectures for next-generation data-intensive embedded applications such as multimodal sensor fusion, image understanding and mobile augmented reality. P2012 is an area- and power-efficient, process-aware many-core computing fabric, and it provides an architectural harness that eases the integration of hardwired IPs. P2012 can be described as a Globally Asynchronous Locally Synchronous (GALS) fabric of tiles, called clusters, connected through an asynchronous global NoC [2] (G-ANoC). The P2012 cluster aggregates a multi-core computing engine (ENCore) and a cluster controller (CC). The ENCore cluster can host a number of processors varying from 1 to 16.

    Power, thermal and variability management are essential features in computing architectures targeting deep-submicron CMOS implementation. P2012 makes use of several hardware-assisted control loops to reduce design-time margins and to improve energy efficiency. Each cluster has a local clock, generated with a small and highly reactive Frequency-Locked Loop (FLL). Clock speed can be adjusted in a few cycles on a per-cluster basis with no inter-cluster constraints. The fabric interconnect is fully asynchronous, hence no global chip-wide clock distribution is required. Static and dynamic variability are managed through a number of distributed sensors, both direct (critical path monitors, both embedded and replica-based) and indirect (thermal sensors, both absolute and relative). Sensors are accessible through memory-mapped registers clustered in the Clock Variability and Power (CVP) module, which controls process, variability and temperature sensors. Hence feedback-based software policies can be implemented for operating point selection.
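    To make the idea of a feedback-based software policy concrete, here is a purely hypothetical sketch of per-cluster operating point selection. Nothing in it comes from the P2012 software stack: the operating point table, the `SensorReadings` fields, the thresholds and the function name are all assumptions, and the sensor values are passed in directly instead of being read from the CVP memory-mapped registers.

```python
from dataclasses import dataclass

# Hypothetical operating points: (clock frequency in MHz, supply voltage in V).
OPERATING_POINTS = [(200, 0.8), (400, 0.9), (600, 1.1)]

@dataclass
class SensorReadings:
    temperature_c: float  # thermal sensor reading (assumed unit)
    slack: float          # critical-path monitor margin, as a fraction of the clock period

def select_operating_point(current, readings, t_max=110.0, slack_min=0.05, slack_ok=0.15):
    """Return the index of the next operating point for one cluster."""
    if readings.temperature_c > t_max or readings.slack < slack_min:
        return max(current - 1, 0)  # too hot or too little timing margin: step down
    if readings.slack > slack_ok and readings.temperature_c < t_max - 10.0:
        return min(current + 1, len(OPERATING_POINTS) - 1)  # comfortable margin: step up
    return current  # otherwise hold the current point

# Each cluster can run this loop independently, since clusters have local clocks.
point = 1
point = select_operating_point(point, SensorReadings(temperature_c=70.0, slack=0.20))
```

    The gap between `slack_min` and `slack_ok` acts as hysteresis, so a cluster whose margin sits between the two thresholds keeps its operating point instead of oscillating.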

    The first silicon embodiment of P2012 is the flexible SoC depicted in Figure 1. One key innovation in the physical implementation of the SoC is its flexibility in off-chip connectivity. The die can be configured as an “accelerator chiplet” for three-dimensional die-stacking by appropriately setting the static MUXes shown on the right-hand side of Figure 1. In this 3D mode (denoted 3D ANoC) the fabric interface to host and main memory goes through three 32

    References:
    [1] Melpignano D., Benini L., Flamand E., Jego B., Lepley T., Haugou G., Clermidy F. & Dutoit D., "Platform 2012, a many-core computing accelerator for embedded SoCs: Performance evaluation of visual analytics applications", 49th Annual Design Automation Conference, DAC '12, 3 June 2012 - 7 June 2012: 1137-1142.
    [2] Y. Thonnart, P. Vivet, F. Clermidy, "A fully-asynchronous low-power framework for GALS NoC integration", DATE 2010.


    Enhancing Cache Coherent Architectures with Access Patterns for Embedded Manycore Systems

    Research topics: shared memory, coherence protocols, manycores, memory access patterns

    manycore systems, a hardware structure and a specific protocol were designed to handle pattern-based accesses. An example comparing the same series of data accesses for both protocols can be seen in Figure 2.

    Figure 2: Comparison between the baseline protocol and the pattern approach (speculative-hybrid protocol).

    Even for such a simple pattern with only 3 elements (the difference grows linearly with the size of the pattern), the number of messages is reduced, and a speculative prefetch is done: once the first element of the pattern is detected, the remaining parts of the pattern are fetched and updated without waiting for any other memory access. Hence, future accesses are automatically prefetched and ready for use, reducing both message throughput and memory latency.
    A first real-size evaluation of the expected advantage was done on a simple simulation instrumented with an in-house modified version of the pinatrace Pintool memory analyzer, from Intel's Pin framework. On a two-pass image filter, chosen because it stresses memory accesses, we obtained a 37% reduction in message throughput and an acceleration of the application by more than 50% with respect to the baseline protocol alone.
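    The linear growth of the message savings can be reproduced with a toy cost model. This sketch does not reproduce the message sequences of Figures 1 and 2: it simply assumes that a baseline miss costs one request and one data reply, and that a miss on the first address of a known pattern costs one request followed by one reply per pattern element. The function name and the pattern table format are hypothetical.

```python
def count_messages(accesses, patterns=None):
    """Count coherence messages for a sequence of read accesses (toy model).

    patterns maps the first address of a pattern to the list of all its addresses.
    """
    patterns = patterns or {}
    cache = set()
    messages = 0
    for addr in accesses:
        if addr in cache:
            continue  # already present, possibly thanks to a speculative prefetch
        if addr in patterns:
            messages += 1 + len(patterns[addr])  # one request, one reply per element
            cache.update(patterns[addr])
        else:
            messages += 2  # one request, one reply
            cache.add(addr)
    return messages

# A 3-element pattern, as in the example discussed in the text (addresses are arbitrary).
pattern = [0x100, 0x140, 0x180]
baseline = count_messages(pattern)
with_pattern = count_messages(pattern, {pattern[0]: pattern})

# Under these assumptions the saving is n - 1 messages for an n-element pattern.
long_pattern = [0x1000 + 0x40 * i for i in range(10)]
saving = count_messages(long_pattern) - count_messages(long_pattern, {long_pattern[0]: long_pattern})
```

    In this model the second and third accesses hit in the cache without any further message, which is the effect the speculative prefetch is meant to achieve.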

    Hence, this protocol, taking advantage of regular memory accesses (patterns), was validated on a program representative of embedded applications. The results show that such an apparatus significantly reduces message and memory throughput and accelerates applications. Such a breakthrough can be vital for the future of manycore systems, their programmability and their performance.

    J. Marandola (USP), S. Louise, L. Cudennec, J-T Acquaviva, D.A. Bader (GATech)

    ABSTRACT: One of the key challenges in advanced micro-architecture is to provide high performance hardware-components that work as application accelerators. In this paper [1], we present a Cache Coherent Architecture that optimizes memory accesses to patterns using both a hardware component and specialized instructions. The high performance hardware-component in our context is aimed at CMP (Chip Multi-Processing) and MPSoC (Multiprocessor System-on-Chip). We also provide a first evaluation of the proposal on a representative embedded benchmark program, which shows that we can achieve over 50% computing speedup and reduce memory throughput by nearly 40%.

    Shared memory paradigms are gaining interest for programming multicore systems: the main C compilers already embed support for OpenMP. Indeed, such programming concepts allow improving on legacy code to obtain reasonable and efficient multicore support. But the age of simple multicores is reaching an end: as the number of cores grows, single buses are replaced by Networks-on-Chip (NoCs), distributed memory, and distributed data-paths, so the bus snooping techniques used to ensure cache coherence are no longer applicable. With distributed caches and NoCs, the usual MESI (Modified, Exclusive, Shared, Invalid) protocol for cache coherence must be modified to refer to a given (reference) core called the Home Node (HN), which tracks the MESI state of a given cache line for the whole chip. But this technique does not scale well, and is not adapted to embedded devices and applications. First, it can be very talkative, as seen in Figure 1, and, second, it does not take advantage of regular memory accesses.

    Figure 1: A write message transaction with the baseline protocol.

    Such regular accesses can be represented as memory access patterns, and a research effort was undertaken which led to a patent filing [2]. Improving on the baseline protocol, which is the state of the art of shared memory mechanisms for

    References:
    [1] Marandola, J.; Louise, S.; Cudennec, L.; Acquaviva, J.-T. & Bader, D. A., “Enhancing Cache Coherent Architectures with access patterns for embedded manycore systems System on Chip (SoC)”, Proc. of 2012 International Symposium on SOC, Tampere, Finland, 1-7, 2012.
    [2] L. Cudennec, J. Marandola, J-Th Acquaviva and J-S Camier, “Multi-core System and Method of Data Consistency”, FR2970794 (A1), CEA, January 2011.


    Adaptive Stackable 3D Cache Architecture for Manycores

    Research topics: 3D, cache, NUCA, manycore

    by the Operating System, to allocate a larger private cache quantity to a given application. Moreover, the operating system can also decide to share a given cache tile between one or several applications running in various memory segments to reduce the overall miss rate, at the cost of losing exclusive access to this cache tile. In this case, the highly accessed memory segments will occupy a larger storage capacity in the 3D cache.
    By allowing the OS to control the cache resource allocation