TKT-1212 Digitaalijärjestelmien toteutus (Implementation of Digital Systems)
Lecture 15 - System design trends & challenges
Erno Salminen, TUT, 2011
Acknowledgements
Most slides were made by Ari Kulmala
The International Technology Roadmap for Semiconductors
M. Keating and P. Bricaud, "Reuse Methodology Manual for System-on-a-Chip Designs, 3rd Edition"
Outline
Challenges in digital systems
Introduction to system design
Challenges in digital system design
High-level challenges, not taking into account physical and manufacturing issues
1. Managing design complexity
2. Minimizing power consumption
3. Verifying the functionality
4. (Optimizing chip area & performance)
Design complexity
Good news: "Over the past ten years, reuse leverage more than doubled, and more reuse tends to translate into less project effort, shorter cycle times as well as fewer spins and less schedule slip."
Bad news: "85%-89% of IC projects miss their original schedules… Schedule slip is 5-30% [Accenture's report] or even 44% [Numetrics' report]"
  One reason is that reusable components are not, after all, easy to integrate
[EETimes] http://eetimes.eu/semi/showArticle.jhtml?articleID=204702114 http://www.eetasia.com/ART_8800520301_480200_NT_68f71562.HTM
Design challenge #1: Complexity
#1 - Increased complexity
Fig 1. TI OMAP3430 top-level block diagram
Fig 2. SoC size as a function of time (CAGR = compound annual growth rate)
Parallel computing
A few cores on desktop processors
Several cores on embedded devices
Tens of cores in research embedded systems
  E.g. 35 processors, 23 other IP components, 3 FPGA boards used
How to write software?
[Figure: SoC architecture built on the HIBI on-chip network and its mapping to an FPGA prototype of three FPGA boards (#0, #1, #2, each a Stratix II S180) whose HIBI segments are connected through bridges with handshaking; components attach through HIBI wrappers. Legend: M = master CPU, S = slave CPU, RM = resource manager, ME = full-pixel motion estimation, DQ = DCT-Q-IDCT-IQ, SD = SDRAM control, HM = HIBI monitor]
A. Kulmala et al., SAMOS 2007
IBM/Sony/Toshiba CELL BE
Synergistic processor elements (SPE)
Dual-threaded Power processor element (PPE)
Element interconnect bus (EIB) (actually a ring)
Heterogeneous processors: 1 PowerPC, 8 SPEs
New architectures: Intel Terascale/Polaris
80 cores (small processors)
Interconnected with a mesh network-on-chip
Stacked chip: to solve local memory problems
System-on-chip (SoC)
Purely:
  Integrating a whole system on a single chip
  Chip complexity increases
  Processors, memories, hardware accelerators, I/Os, analog RF, …
Loosely defined
  A highly complex chip full of digital logic
  Interfaces to external memories, analog devices, etc.
Two main types: power-efficient (PE) and high-performance (HP, later a.k.a. CS, consumer stationary)
Target is to reduce cost
  In 1990-1992 mobile phones included 15 ICs and 800 other discrete components, and in 2002 3-4 ICs and 200 discrete components
Y. Neuvo, "Cellular phones as embedded systems", IEEE International Solid-State Circuits Conference, Digest of Technical Papers, 2004, pp. 32-37.
Power-efficient SoCs (SoC-PE)
Its typical application area is electronic equipment categorized as "Mobile Consumer Platforms"
  This application area will make rapid progress in the foreseeable future across semiconductor technology generations.
Very high performance is required while the power consumption is strictly limited by the battery (lifetime).
  Advanced power consumption reduction techniques
  As a result, the requirement for processing power will be 1000× in the next ten years, while the requirement for dynamic power consumption will not change noticeably.
The life cycle of "Mobile Consumer Platform" products is short, and will stay short in the future
  The design effort cannot be increased; it needs to stay at the current level for the foreseeable future.
Die size of around 64 mm²
ITRS 2005, http://www.itrs.net/Links/2005ITRS/SysDrivers2005.pdf
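To put the "1000× in the next ten years" requirement into perspective, the sketch below converts a total growth factor into the equivalent constant yearly factor, which is the same arithmetic that lies behind a compound annual growth rate (CAGR). The function names are illustrative only and are not taken from the ITRS documents.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over the given years."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


def cagr_percent(total_factor: float, years: float) -> float:
    """The same growth expressed as a compound annual growth rate in percent."""
    return (annual_growth_factor(total_factor, years) - 1.0) * 100.0


if __name__ == "__main__":
    factor = annual_growth_factor(1000.0, 10.0)
    print(f"yearly factor: {factor:.3f}, CAGR: {cagr_percent(1000.0, 10.0):.1f} %")
```

Under this reading, 1000× in ten years means that processing power roughly doubles every year. If the dynamic power budget stays flat, the performance-per-watt has to grow by about the same factor.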
Trends on SoC-PEs #2
ITRS 2005, http://www.itrs.net/Links/2005ITRS/SysDrivers2005.pdf
SoC-PE design complexity trends
SoC Consumer Stationary (SoC-CS)
E.g. a high-end game machine (like PS3)
Processing performance is the most important differentiator. Required processing performance in year 2020 will be more than 70 TFLOPS.
As functions will be implemented and realized mainly by software, high processing power is required, and hence this SoC needs many data processing engines (DPE).
Compared with SoC-PE, it has lower performance-per-power, but it is better in terms of functional flexibility when adding or modifying functions.
The life cycle of SoC-CS is relatively long, because it is easy to add or modify functions, and as a result the application area is wide.
Fewer processing engines than in SoC-PE but the beasts are mightier in SoC-CS
Die size of around 220 mm²
SoC-CS
DPE = data processing engine
ITRS 2006 update, http://www.itrs.net/Links/2006Update/FinalToPost/01_SysDrivers_2006UPDATE.pdf
Power consumption
Chip power consumption can be defined as
  $P_{avg} = P_{dynamic} + P_{short} + P_{leakage} + P_{static}$
The traditional view of CMOS transistors is that they do not consume power while static ($P_{static}$)
However, at 90 nm and below, leakage becomes an increasingly important factor ($P_{leakage}$)
A large proportion of power is consumed by dynamic operations and switching (next slide)
$P_{short}$ = short-circuit power, e.g. when a gate switches state, both transistor types are conducting at the same time for some time
  ~10% of total chip power
Benini: dynamic power management
Dynamic power consumption
  $P_{dynamic} = K \cdot C_{out} \cdot V_{dd}^{2} \cdot f$
K = average number of transitions of the output node every cycle divided by two (e.g. ½ means that there is a single transition each cycle)
  Glitches etc.
$V_{dd}$ = supply voltage
f = clock frequency
$C_{out}$ = output capacitance
Note the square-law dependence on $V_{dd}$
Typically, the higher the f, the higher the $V_{dd}$ required
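The square-law dependence is easiest to see with numbers. The following minimal sketch evaluates the dynamic power formula above. The activity factor, the lumped capacitance of 1 nF, and the two operating points are made-up illustration values, not measurements from any real chip.

```python
def dynamic_power(k: float, c_out: float, v_dd: float, f: float) -> float:
    """Dynamic power in watts: P = K * C_out * V_dd^2 * f."""
    return k * c_out * v_dd ** 2 * f


if __name__ == "__main__":
    # Hypothetical operating points: (supply voltage in V, clock frequency in Hz).
    nominal = dynamic_power(k=0.5, c_out=1e-9, v_dd=1.2, f=500e6)
    scaled = dynamic_power(k=0.5, c_out=1e-9, v_dd=1.0, f=400e6)
    print(f"nominal: {nominal:.3f} W, scaled: {scaled:.3f} W")
    print(f"power ratio after voltage and frequency scaling: {scaled / nominal:.2f}")
```

Because a lower clock frequency typically allows a lower supply voltage, scaling both together reduces power faster than the frequency reduction alone would suggest.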
SoC-CS
Power consumption per DPE itself will be reduced
Leakage power will be much more than the calculated value shown in the figure because of variability and temperature effects
ITRS 2006 update, http://www.itrs.net/Links/2006Update/FinalToPost/01_SysDrivers_2006UPDATE.pdf
SoC-PE power
A larger fraction of power is static than in SoC-CS
SoC-CS power consumption trends
Unlike the SoC-PE, the SoC-CS is generally free from the battery life issue; however, rapid power consumption growth has a critical impact on chip packaging and cooling.
Leakage power will be much more than the calculated value shown in the last slide because of variability and temperature effects.
Power consumption per DPE itself will be reduced because the decreasing factors such as Vdd and insulator dielectric constant become dominant.
Cost of designing a SoC-PE
Blue line: costs nowadays
Purple: cost without the inventions in design productivity
http://www.itrs.net/Links/2005ITRS/Design2005.pdf
Numerical values for the previous table
Simplified Electronic Product Development Cost Model
http://www.itrs.net/Links/2005ITRS/Design2005.pdf
Design development costs
Manufacturing non-recurring engineering (NRE) costs are on the order of millions of dollars (mask set + probe card) for high-end chips
Design NRE costs routinely reach tens of millions of dollars
Design shortfalls are responsible for silicon re-spins that multiply manufacturing NRE.
Rapid technology change shortens product life cycles and makes time-to-market a critical issue for semiconductor customers.
Manufacturing cycle times are measured in weeks, with low uncertainty.
Design and verification cycle times are measured in months or years, with high uncertainty.
Software can account for 80% of embedded-systems development cost
Test cost has grown exponentially relative to manufacturing cost
Verification engineers outnumber design engineers on microprocessor project teams
http://www.itrs.net/Links/2005ITRS/Design2005.pdf
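The effect of NRE and re-spins on the price of a single chip can be illustrated with simple amortization. This is not the ITRS cost model, only a back-of-the-envelope sketch. The function name and all dollar figures are hypothetical and merely chosen to match the orders of magnitude mentioned above.

```python
def cost_per_chip(design_nre: float, mask_nre: float, spins: int,
                  unit_cost: float, volume: int) -> float:
    """Amortize one-time costs over the production volume.

    design_nre: one-time design cost
    mask_nre:   manufacturing NRE (mask set + probe card) paid once per silicon spin
    spins:      number of silicon spins, 1 means first-time-right
    unit_cost:  recurring manufacturing and test cost per chip
    volume:     number of chips sold
    """
    if spins < 1 or volume <= 0:
        raise ValueError("spins must be >= 1 and volume must be positive")
    total_nre = design_nre + mask_nre * spins
    return total_nre / volume + unit_cost


if __name__ == "__main__":
    # Hypothetical project: 20 M$ design, 2 M$ per mask set, 5 $ per chip.
    for volume in (100_000, 1_000_000, 10_000_000):
        first_time_right = cost_per_chip(20e6, 2e6, 1, 5.0, volume)
        two_respins = cost_per_chip(20e6, 2e6, 3, 5.0, volume)
        print(f"{volume:>10} chips: {first_time_right:8.2f} $ vs {two_respins:8.2f} $ with two re-spins")
```

With these assumed numbers the NRE dominates at low volumes, which is one way to see why high-end chips need very high volumes or high prices.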
ITRS 2006 update, http://www.itrs.net/Links/2006Update/FinalToPost/02_Design_2006Update.pdf
http://www.itrs.net/Links/2005ITRS/SysDrivers2005.pdf
ITRS 2005: Interconnect
[Figure: delay of gate, local signals, global signals, and global signals with repeaters]
Delay of global wires does not scale with technology
Courtesy of Erno Salminen
Note on high-end processors
Really, really complex and exotic structures
Parallel development projects
  Intel has around 400-500 engineers for a new CPU architecture project
Development flow (simplistic)
  1. High-level modeling
  2. Functional models with RTL
  3. Analysis of bottlenecks and microarchitectural choices
    Don't forget the market pressure (e.g. compromise performance to get high frequencies)
  4. Implementation of critical blocks in low-level custom
    Even single transistors tweaked, delays very carefully calculated etc.
    Very time consuming, not doable with HDL
Formal methods used in critical parts
Very high volume
  Speed binning: chips are priced according to their frequency
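Speed binning can be pictured as a simple lookup: each manufactured chip is tested for its maximum stable frequency and then sold at the highest frequency grade it passes. The sketch below is only an illustration of that idea. The bin frequencies and the function name are invented and do not describe any vendor's actual test flow.

```python
import bisect
from typing import Optional, Sequence


def speed_bin(f_max_mhz: float, bins_mhz: Sequence[float]) -> Optional[float]:
    """Return the highest bin frequency not exceeding f_max_mhz, or None if the chip fails all bins."""
    ordered = sorted(bins_mhz)
    index = bisect.bisect_right(ordered, f_max_mhz)
    return ordered[index - 1] if index > 0 else None


if __name__ == "__main__":
    bins = [2400, 2800, 3200]  # hypothetical product grades in MHz
    for measured in (2300, 2450, 3050, 3400):
        print(f"measured {measured} MHz -> sold as {speed_bin(measured, bins)}")
```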
Moore's law and more
SiP: many ICs in a single package (system-in-package)
Teaching in DCS
[Figure: simplified course prerequisites 2011/12, compiled by ES. Course chart for the B.Sc. degree (25 cr) and the M.Sc. (DI) degree (30 cr) with degree-programme-specific prerequisites; legend: mandatory, recommended. Courses shown: TKT-1101 DigTeknPer., TKT-1202 DigSuunn, TKT-1212 DigJärjTot, TKT-1220 Aritmetiikka, TKT-1230 Laboratorio, TKT-1110 Mikroprosess., TKT-1400 ASIC I, TKT-1410 SuunnVarm, TKT-1527 DigSysDesIss., TKT-1540/1550 DI-työ semin., TKT-1570 Kandityösemin., TKT-2301 Lang. sens.vsov., TKT-2431 SoC-Suunn, TKT-2456 Wireless sens., TKT-2526 Project work, TKT-2530 SatellPaikann, TKT-2556 Inertial nav., TKT-2566 GNSS, TKT-3200 Tietokonetekn. I, TKT-3400 Tietokonetekn. II, TKT-3500 Mikrokontroll., TKT-3526 Proc. Design, TKT-3541 SoC-Alustat, TKT-9617 ScientificPubl, TKT-9626/9636 Seminar, TKT-9646 Colloquium; alternatives ELE1010 and ELE-2300]
TKT-1202 or TKT-1212 is accepted as a prerequisite.
Check the exact prerequisite requirements in the study guide.
System design process
System development back in the days
Traditional waterfall model just does not work in large designs
  Serialized HW-SW development
Time-to-market pressure
=> Parallelize everything possible
  HW development (prototypes, emulation)
  SW development
  Verification (verification environment)
  HW/SW integration
System development in the 2000s
"Spiral flow"
  Parallel all the time
Iterations after iterations
  Inevitable
Physical issues taken into account early
"Always in a hurry to get somewhere" ("aina kiire jonnekin on, on, on")
Design and verification cycle interlock
[Figure: design cycle duration. Design track: functional specification, high-level design, design implementation, final physical design, tape-out. Verification track: create verification plan, evolve verification plan, implement verification environment from plan, debug HDL and environment, regression. Checkpoints: plan review checkpoint, tape-out readiness checkout]
System design
Blocks preferably reusable
  IP
Blocks implemented as in earlier lectures with reusable macros
System Design Process (2)
1. System specification
  Identify the system requirements (engineering, marketing)
  Formulate the preliminary specification
2. Develop a behavioural model
  Basic algorithms, their usability (e.g. good enough video encoding quality)
  Executable specification, "golden reference"
3. Model refinement and test
  Verification environment for verifying the functionality and performance of the design
  Floating-point model -> fixed-point model -> cycle-accurate and bit-accurate model
4. HW/SW partitioning (decomposition)
  Largely a manual process guided by experience and understanding of tradeoffs (area (cost) vs. performance)
  Define the interfaces between HW and SW, communication protocols
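Step 3 above, refining a floating-point model into a fixed-point one and checking it against the golden reference, could look roughly like the sketch below. The 3-tap filter, the 16-bit word length, and all function names are assumptions made for illustration and do not come from the course material.

```python
from typing import List, Sequence


def to_fixed(x: float, frac_bits: int, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer with saturation."""
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def fir_float(samples: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Golden reference: FIR filter in floating point."""
    return [sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(samples))]


def fir_fixed(samples: Sequence[float], coeffs: Sequence[float], frac_bits: int) -> List[float]:
    """Refined model: same filter with quantized inputs and integer arithmetic."""
    qs = [to_fixed(s, frac_bits) for s in samples]
    qc = [to_fixed(c, frac_bits) for c in coeffs]
    out = []
    for n in range(len(qs)):
        acc = sum(qc[k] * qs[n - k] for k in range(len(qc)) if n - k >= 0)
        out.append(acc / float(1 << (2 * frac_bits)))  # products carry 2*frac_bits fraction bits
    return out


def max_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest deviation between the refined model and the golden reference."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    coeffs = [0.25, 0.5, 0.25]
    samples = [0.0, 0.5, 1.0, -0.75, 0.3]
    reference = fir_float(samples, coeffs)
    for bits in (4, 8, 12):
        print(bits, "fraction bits, max error:", max_abs_error(reference, fir_fixed(samples, coeffs, bits)))
```

The same comparison can be repeated at each refinement step, so that every more detailed model is checked against the executable specification.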
System Design Process (3)
5. Specify and develop a hardware architectural model
  Memory architecture
  Interconnection structure, bandwidth, latency
  Start from high-level models, transaction-level modeling
  Refine the architecture until it meets the requirements
6. Refine and test architectural model (co-simulation)
  A behavioural model of the HW
  A prototype version of the SW
  Key to success: efficient HW-SW co-design
System Design Process (4)
7. Specify implementation blocks
  HW specification:
    Basic functions
    Timing, area, and power requirements
    Physical and SW interfaces
    Descriptions of the I/O pins and register map
Integrating macros into a SoC
Selecting the IP, criteria
General
  Quality of the documentation
  Robustness of design
    "Proven in silicon"
1. For hard macro
  Completeness of the design and verification environment
    Functional, timing, synthesis, floorplanning models
    If CPU, compilers, debuggers
  Physical design limitations
    Aspect ratio, blockage and porosity of the macro (how much it blocks routing)
2. For soft macro
  Robustness of verification environment
    Rich set of models and monitors for automated stimulus and checkers
  Ease of use
    Interfacing the macro to the rest of the design
    User-friendly installation and synthesis scripts, tools in general
Problems in integrating IP
Interfaces do not work as documented
  For example, some pin is inverted
Misunderstanding of the block's function
Functional bugs (…)
Someone needs to get familiar with the IP
Documentation is incomplete
Interface of the IP is proprietary (does not match used bus)
Verification models poor (abstract, fast models)
Limited support from IP provider
Examples of integration cost
Integration costs!
The used IP may be lightning fast, but a proprietary interface may incur substantial overhead
  E.g. data needs to be fetched from somewhere
  E.g. data permutation
Examples from MPEG-4 encoder
[Figure: execution time in clock cycles of DCT-Quant.-IDCT-IQuant and of the motion estimator (ME), each measured as HW in simulation, HW in simple test, HW in encoder, and SW in encoder]
[Figure: where time is spent? Clock cycles of ME and DCTQ split into HW execution time, software, data delivery, and contention]
[Figure: area as logic cell usage and memory bit usage (10³ bits) of Nios II, DCT-Q-IDCT, DCT-Q-IDCT wrapper, ME, ME wrapper, HIBI wrapper, HW monitor, RM, and SDRAM controller]
Antti Rasmus, Ari Kulmala, Erno Salminen, Timo D. Hämäläinen, "IP Integration Overhead Analysis in System-on-Chip Video Encoder", IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS) 2007, Krakow, Poland, April 11-13, 2007, pp. 333-336.
Summary
Increasingly complex systems need new methodologies
  Hierarchical, gradual refinement ("Spiral flow")
  Reuse everything you can. Pay attention that your own work is reusable
    Accessible, easy to start with, well commented, tool-independent…
  Invest in executable specifications
Divergence to two types of SoCs: high-performance & low-power
Several advances and active research required in order to keep on pushing the technology to its limits
Parallel processing seems the best way to increase performance
  New methodologies for SW programmers need to be adapted
  Currently, tool support for parallelization is weak
Extra
SystemC
Higher abstraction level language for system modeling
Sidenote: technology node
We speak about 90 nm, 65 nm etc. What exactly does that mean?
  It depends
For MPU/ASIC it is typically gate length, isolated feature size
  Or channel length
DRAM half pitch is roughly the minimum distance between two wires
Note that there is some tolerance between manufacturers, e.g. a 90 nm process might actually be like 85-100 nm
[Figure: N-type CMOS transistor with gate, source, drain, channel, and substrate]
Fundamentals of SystemC
SystemC is based on C++
Primary goal of SystemC is to enable system-level modeling
  Systems implemented in SW, HW, or some combination of those
Requirements for a system-level design language
  Specification and design at various levels of abstraction
  Fast simulation speed to enable design-space exploration
  Incorporation of embedded software (SW) code
  Creation of executable specification of design intent
  Creation of executable platform models
  Constructs allowing the separation of computation and communication
Needs to support a wide range of models of computation and communication, levels of abstraction, and methodologies used in system design
  E.g. DSP problems naturally map to dataflow or Kahn process network (KPN) models
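A Kahn process network consists of sequential processes that communicate only through FIFO channels with blocking reads, which makes the result independent of scheduling. The sketch below imitates this in plain Python with threads and queues. It is not SystemC code, and the three processes and the `None` end-of-stream token are conventions invented for this example.

```python
import queue
import threading
from typing import List, Sequence


def producer(out_q: queue.Queue, values: Sequence[int]) -> None:
    """Source process: writes every value and then an end-of-stream token."""
    for v in values:
        out_q.put(v)
    out_q.put(None)


def scale(in_q: queue.Queue, out_q: queue.Queue, gain: int) -> None:
    """Computation process: blocking read, multiply, write."""
    while True:
        v = in_q.get()  # blocks until a token is available
        if v is None:
            out_q.put(None)
            return
        out_q.put(v * gain)


def accumulate(in_q: queue.Queue, result: List[int]) -> None:
    """Sink process: stores the running sum of its input stream."""
    total = 0
    while True:
        v = in_q.get()
        if v is None:
            return
        total += v
        result.append(total)


def run_network(values: Sequence[int], gain: int) -> List[int]:
    """Connect producer -> scale -> accumulate with unbounded FIFOs and run to completion."""
    q1, q2 = queue.Queue(), queue.Queue()
    result: List[int] = []
    threads = [
        threading.Thread(target=producer, args=(q1, values)),
        threading.Thread(target=scale, args=(q1, q2, gain)),
        threading.Thread(target=accumulate, args=(q2, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(run_network([1, 2, 3, 4], gain=10))
```

Computation lives in the process functions and communication in the queues, which mirrors the separation of computation and communication listed among the requirements above.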
SystemC (2)
Compiles to an .exe (i.e. built-in simulator)
  Own debug printf()'s required for feedback
Core language includes:
  Modules, ports, processes, events, interfaces, channels
  Event-driven simulation kernel
Functional modeling and transaction-level modeling enable hiding "uninteresting" details at an early stage of development
  Increased simulation speed and faster design space exploration
Not very well supported for synthesis
  May lead to problems of keeping two separate models up-to-date (SystemC and VHDL of a block)
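The idea of an event-driven simulation kernel can be shown in a few lines: events are kept in a time-ordered queue, and simulated time jumps directly from one event to the next. The sketch below is a toy kernel in Python written for this note. It does not reproduce the SystemC API, and the class and function names are hypothetical.

```python
import heapq
import itertools
from typing import Callable, List


class Kernel:
    """Toy event-driven kernel: a priority queue of (time, sequence number, callback)."""

    def __init__(self) -> None:
        self.now = 0
        self._queue: list = []
        self._order = itertools.count()  # tie-breaker keeps same-time events in FIFO order

    def schedule(self, delay: int, callback: Callable[[], None]) -> None:
        """Request that callback runs delay time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), callback))

    def run(self, until: int) -> None:
        """Process events in time order until the queue is empty or time exceeds until."""
        while self._queue and self._queue[0][0] <= until:
            time, _, callback = heapq.heappop(self._queue)
            self.now = time
            callback()


def simulate_clock(period: int, until: int) -> List[int]:
    """Model a free-running clock process and return the times of its ticks."""
    kernel = Kernel()
    ticks: List[int] = []

    def tick() -> None:
        ticks.append(kernel.now)
        kernel.schedule(period, tick)  # the process re-arms itself, like waiting on an event

    kernel.schedule(0, tick)
    kernel.run(until)
    return ticks


if __name__ == "__main__":
    print(simulate_clock(period=10, until=35))
```

Nothing is evaluated between events, which is one reason why higher abstraction levels with fewer events simulate faster.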