Seminario utovrm

System level design

Agenda

Part 1
Microprocessor and silicon technology evolution
Memories
Bus architectures
System on Chip

Part 2
GPU
FPGA
Architectural design
Collaboration tools
Open Source

15/04/2023

System level design
Part 1

Introduction

Electronic System Design
Architectural design
• Break down the design into subsystems
• Define subsystems functionality
• Define interfaces (Busses, protocols, APIs etc)

Methodologies
• Implement collaboration tools for large, distributed teams
• Define version control strategies
• Define verification methodologies

Experience is key
NEVER reinvent the wheel

History

In the beginning…

Discrete components (resistors, transistors)
Multiple transistors to form a single gate
Even a simple counter required multiple boards

History

The integrated circuit (1950-1960)
Invented in the 1950s
Technology available in 1960
Multiple logic gates in a single device (chip)

Notes

• Great inventions are visionary

• An invention is not necessarily feasible immediately

• Designers shall foresee technological breakthroughs

History

Intel 8086 (1978)
29K transistors, 3 µm
10 MHz
First “true” 16 bit microprocessor
Required several external peripherals
• Interrupt controller
• DMA
• Timer

Backwards compatible with 8080 (1974)
• First “usable” microprocessor, 4500 transistors, 10 µm
• Required +12V and -5V supplies
• 2 MHz

Notes

• First microprocessors are 40 years old

• The latest x86 Core i7 is backwards compatible with something from 36 years ago!

History

Programmable Logic Array (1977)
Fuse based
Can implement any combinatorial logic
Programmable at production time

History

PLD (1983)
More complex than PLA, reprogrammable
Introduces macrocell concept
• Small PLA with a flip-flop
• External interconnect

History

LCA (1985)
Programmable sea of gates, 1 µm
RAM based with external configuration memory
Up to 7600 logic gates (484 CLBs)

Notes

• Moving from OTP to reprogrammable made a big difference

• Programmable logic is key when ASSPs are not available

• Same design flow as ASICs

History

Dynamic RAM
Invented in 1966, first useful device in 1973
Drastically reduces transistor count compared to SRAM
Requires refresh
Multiplexed addressing
• Reduces access time
• Increases latency

Banking
• Read/write multiple rows at the same time
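
Multiplexed addressing and banking can be sketched as an address decode: a flat address is split into bank, row and column fields, and row and column are driven on the same address pins in two phases. The field widths and the names `decode_address` and `address_phases` are assumptions chosen for illustration, not the geometry of any specific device.

```python
# Hypothetical geometry: 8 banks, 2^14 rows, 2^10 columns (illustrative only).
BANK_BITS = 3
ROW_BITS = 14
COL_BITS = 10


def decode_address(addr):
    """Split a flat word address into (bank, row, column) fields."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col


def address_phases(addr):
    """Return the two values driven in turn on the shared address pins.

    Phase 1 carries the row, phase 2 the column, so the device needs
    only max(ROW_BITS, COL_BITS) address pins instead of their sum.
    """
    bank, row, col = decode_address(addr)
    return [("RAS", bank, row), ("CAS", bank, col)]


example = decode_address((5 << 13) | (3 << 10) | 7)
```

Placing the bank bits just above the column bits, as assumed here, makes consecutive rows of data land in different banks, which is one way to keep several rows open at the same time.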

History

DRAMs or HDDs can have very long access times
Random access to high latency devices kills performance
Access to adjacent data requires very low latency

Cache memory
Whenever random data is requested, the cache stores adjacent locations in «cache lines»
Subsequent accesses to data in the cache have low latency
Multiple levels of cache improve performance
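
The effect of cache lines can be illustrated with a toy model. The sketch below is a minimal direct-mapped cache that only counts hits and misses; the class name, the 64 lines of 64 bytes and the two access patterns are assumptions for illustration, not a model of a real processor.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks hits and misses."""

    def __init__(self, num_lines=64, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # memory line currently held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line_addr = addr // self.line_size   # memory line that contains addr
        index = line_addr % self.num_lines   # cache slot used by that line
        if self.tags[index] == line_addr:
            self.hits += 1
        else:
            self.misses += 1                 # miss: the whole line is fetched
            self.tags[index] = line_addr


def hit_rate(addresses):
    cache = DirectMappedCache()
    for addr in addresses:
        cache.access(addr)
    return cache.hits / (cache.hits + cache.misses)


sequential = hit_rate(range(0, 4096, 4))          # adjacent 4-byte words
strided = hit_rate(range(0, 4096 * 64, 4 * 64))   # jumps past a line every time
```

With adjacent accesses only the first word of each 64-byte line misses, while the strided pattern never reuses a fetched line, which is the performance gap the slide describes.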

Notes

• Thinking outside the box allows revolutionary solutions

• Tradeoffs are acceptable when benefits prevail

• Limitations of new technologies stimulate more innovation

History

Intel 80486 (1989)
1.2M transistors, 1 µm
50 MHz / 40 MIPS
First to embed cache (16KB)
• Reduce DRAM latency penalty

First to embed FPU
32 bit data bus

Notes

• 10x technology shrink in 15 years (10 → 1 µm)

• Embedding of widely used external coprocessors

History

Pentium (1993)
3.3M transistors, 800 nm
66 MHz, no direct connection to memory
In 1996 introduced MMX

Distributed architectures
Microprocessor
Memory interface bridge
Peripheral and bus bridge
External superIO

Notes

• The processor doesn’t interface directly with memory anymore

• The northbridge routes processor accesses to memory and high speed busses, abstracting them

• Peripherals get integrated in the southbridge

History

Parallel bus topology (ISA, PCI, AGP)
Separate or multiplexed address/data
Limited by signaling technology
• Few MHz with TTL
• Up to 150 MHz with LVCMOS

Dual and quad data rate to further improve bandwidth (mainly on memories)
• Some use of differential signaling

History

Peripheral Component Interconnect

Configuration space
• Automatic card detection and configuration
• Extended card information
• Standardized register set

Dynamic device address mapping
• No more conflicts among multiple cards on the same bus

Introduces bursting
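
The standardized register set is what makes automatic detection possible: every PCI function exposes the same first 16 bytes of configuration space. The sketch below decodes that common header from raw bytes. The function name and the sample bytes are hypothetical (0x8086 is Intel's vendor ID, the device ID 0x1234 is made up), and reading the bytes from real hardware is outside the scope of the example.

```python
import struct


def parse_config_header(raw):
    """Decode the first 16 bytes of a PCI configuration space header."""
    (vendor_id, device_id, command, status,
     revision_id, prog_if, subclass, base_class,
     cache_line_size, latency_timer, header_type, bist) = struct.unpack(
        "<HHHHBBBBBBBB", raw[:16])  # all fields are little-endian
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "revision_id": revision_id,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
        "cache_line_size": cache_line_size,
        "latency_timer": latency_timer,
        "header_layout": header_type & 0x7F,          # 0 = ordinary device
        "multi_function": bool(header_type & 0x80),   # bit 7 of header type
        "bist": bist,
    }


# Hypothetical sample: vendor 0x8086, made-up device ID, class 0x02 (network).
sample = struct.pack("<HHHHBBBBBBBB",
                     0x8086, 0x1234, 0x0007, 0x0010,
                     0x01, 0x00, 0x00, 0x02,
                     0x10, 0x00, 0x00, 0x00)
header = parse_config_header(sample)
```

An enumerator would read this header for every slot, skip slots whose vendor ID reads back as 0xFFFF (no device present), and use the class code to pick a driver.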

History

Serial busses (PCIe, USB, HDMI, etc)
Smaller number of traces
Very low voltage differential signaling
Clocked or self clocking
Multi Gbit per lane
PCIe
• 1.0 – 2 Gbit/sec per lane
• 2.0 – 4 Gbit/sec per lane
• 3.0 – 8 Gbit/sec per lane

USB
• 1.0 – 12 Mbit/sec
• 2.0 – 480 Mbit/sec
• 3.0 – 5 Gbit/sec

History

PCI Express
Introduces layered, packetized bus
Star connection rather than one to many
Allows tree configuration via PCI-PCI bridges
Scalable bandwidth
• Pin compatible connectors from 1 to 16 lanes
• Increasing bit rate at each generation

Overhead
• Protocol & flow control
• Encoding
  – 20% on Gen1&2 (8b/10b)
  – 1.54% on Gen3 (128b/130b)
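
The encoding overhead figures follow directly from the line codes. The sketch below recomputes them, assuming raw line rates of 2.5, 5 and 8 GT/s per lane for Gen1, Gen2 and Gen3 (these raw rates come from the PCI Express specifications, not from the slide). Protocol and flow control overhead is not modeled, and the function names are illustrative.

```python
# Raw line rate in GT/s, payload bits and coded bits per PCIe generation.
GENERATIONS = {
    "1.0": (2.5, 8, 10),      # 8b/10b
    "2.0": (5.0, 8, 10),      # 8b/10b
    "3.0": (8.0, 128, 130),   # 128b/130b
}


def effective_gbit_per_lane(gen):
    """Data rate per lane after removing the line-encoding overhead."""
    raw, payload_bits, coded_bits = GENERATIONS[gen]
    return raw * payload_bits / coded_bits


def encoding_overhead_percent(gen):
    """Share of transmitted bits spent on encoding rather than data."""
    _, payload_bits, coded_bits = GENERATIONS[gen]
    return 100.0 * (coded_bits - payload_bits) / coded_bits


def link_gbyte_per_sec(gen, lanes):
    """Aggregate one-direction bandwidth of a link, in GByte/s."""
    return effective_gbit_per_lane(gen) * lanes / 8
```

For example, `effective_gbit_per_lane("1.0")` gives the 2 Gbit/sec listed for PCIe 1.0, while Gen3 comes out slightly under 8 Gbit/sec per lane because of the 128b/130b code.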

Notes

• Communication between subsystems is key

• Bandwidth can be increased without brute force

• High speed, low voltage serial is faster and more energy efficient than parallel LVCMOS

System on Chip

System on Chip
Microprocessor plus multiple peripherals and memory in a single chip
IP blocks from multiple vendors are integrated in a single device

Typical Smartphone SoC
Processor from ARM
GPU from Qualcomm (Adreno)
Peripherals from Synopsys
etc

Interconnect

Interconnection between IP blocks
Ensure interoperability
Maximize performance

Address system complexity
Multiple masters
Locked transfers
Cache coherency

Testability
Multicore debugging
System trace
Performance counters

Interconnect

AXI
Interconnect processors and high performance peripherals
Multilayer matrix configuration

AXI-Stream
Streaming interface for packetized data flow
Multiple data widths within same interconnect
Backpressure support

APB
Interconnect low speed peripherals

ATB
Add tracing capability to any peripheral
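
Backpressure on AXI-Stream rests on a two-signal handshake: a data beat moves only in a clock cycle where the source asserts TVALID and the sink asserts TREADY. The sketch below is a cycle-by-cycle software model of that rule, not RTL; the function name and the ready pattern are assumptions for illustration.

```python
def simulate_stream(data, ready_pattern):
    """Cycle-by-cycle model of a valid/ready handshake.

    data: items the source wants to send, in order.
    ready_pattern: one boolean per clock cycle, True when the sink can accept.
    Returns (received_items, cycles_used).
    """
    received = []
    pending = list(data)
    cycles = 0
    for ready in ready_pattern:
        if not pending:
            break                    # nothing left to send
        cycles += 1
        valid = True                 # the source has data in every cycle here
        if valid and ready:          # transfer only when both are high
            received.append(pending.pop(0))
        # otherwise the source holds the same item: this is the backpressure
    return received, cycles


result = simulate_stream([1, 2, 3], [True, False, False, True, True, True])
```

In this run the sink stalls for two cycles, so three items take five cycles and none is lost or duplicated.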

Notes

• Single chip integrates all peripherals except memories

• IP blocks from different vendors

• Interconnect standardization (AMBA)

• Test and debugging challenges

Today

Sample Automotive SoC

Questions

Thank you

System level design
Part 2

Today

Sample Automotive SoC

Today

Systems are not only made of Hardware

Notes

• big.LITTLE architecture

• Heterogeneous processors for different tasks

• Codecs implemented in software

• Application specific interfaces

Quiz Time

Most common performance bottlenecks

Bottlenecks

Memory Latency
DDR clock speeds exceed multiple GHz
Column access time in the order of 5 ns
Row access time still in the order of 50 ns
Can be worked around with multiple levels of cache memory

Bandwidth
Clock frequency is limited by technology
Large busses are expensive
Can be worked around with distributed memory
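
The latency workaround can be quantified with the usual average access time estimate. The sketch below uses the 5 ns and 50 ns figures from the slide; the 1 ns cache hit time, the single cache level and the function names are assumptions for illustration.

```python
COLUMN_ACCESS_NS = 5.0   # from the slide: access within an already open row
ROW_ACCESS_NS = 50.0     # from the slide: access that must open a new row
CACHE_HIT_NS = 1.0       # assumption: hit time of an on-chip cache


def dram_latency_ns(row_hit_rate):
    """Average DRAM latency given the fraction of accesses to an open row."""
    return row_hit_rate * COLUMN_ACCESS_NS + (1.0 - row_hit_rate) * ROW_ACCESS_NS


def average_access_ns(cache_hit_rate, row_hit_rate):
    """Average access time with one cache level in front of the DRAM."""
    miss_penalty = dram_latency_ns(row_hit_rate)
    return CACHE_HIT_NS + (1.0 - cache_hit_rate) * miss_penalty
```

Under these assumptions, `average_access_ns(0.95, 0.5)` is about 2.4 ns against 27.5 ns for the uncached DRAM, which is why even a small cache with a high hit rate hides most of the row access penalty.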

GPUs

Graphics Processing Unit
Started as dedicated vertex processors
Evolved towards SW programmable shaders
Now used for massively parallel computation
• OpenCL
• CUDA

Massively parallel
Hundreds of parallel processors
Multiple chips can be teamed for increased performance

Today

GPU Architecture
Each core has high speed memory
Cores grouped in clusters
Each cluster has local memory
Each cluster can access device memory
Host memory can be transferred to device memory via DMA

Different levels of memory latency
Each core in a cluster executes the same code
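
"Each core in a cluster executes the same code" means that only the data index differs from core to core. The sketch below mimics that model in plain Python: one kernel function is invoked once per (cluster, core) pair. The names `saxpy_kernel` and `launch` are hypothetical, and the sequential loops merely stand in for hardware that runs the cores in parallel.

```python
def saxpy_kernel(cluster_id, core_id, cluster_size, a, x, y, out):
    """Code executed by every core; only the computed index differs."""
    i = cluster_id * cluster_size + core_id
    if i < len(x):                 # guard for the last, partly used cluster
        out[i] = a * x[i] + y[i]


def launch(kernel, num_clusters, cluster_size, *args):
    """Sequential stand-in for the hardware scheduler (illustration only)."""
    for cluster_id in range(num_clusters):
        for core_id in range(cluster_size):
            kernel(cluster_id, core_id, cluster_size, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
launch(saxpy_kernel, 2, 4, 2.0, x, y, out)   # 2 clusters of 4 cores, 5 items
```

Eight cores are launched for five elements, so the bounds check inside the kernel is what keeps the three surplus cores from writing out of range.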

Today

GK110 Kepler (Nvidia)
7G transistors (28 nm)
1.5MB on chip L2 cache
15 SMX units (64KB RAM each)
• 192 single precision cores
• 64 double precision cores
• 32 special function units + 32 load/store units

External GDDR5
6x64 bit memory controllers
Up to 6 GHz clock speed
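
The memory figures above translate into a peak bandwidth estimate. The sketch below assumes that the "6 GHz" figure is the effective data rate per pin (6 Gbit/s), which is an interpretation of the slide rather than something it states; the function name is illustrative.

```python
def peak_bandwidth_gbytes(controllers, bits_per_controller, gbit_per_pin):
    """Peak memory bandwidth in GByte/s for a parallel memory interface."""
    bus_width_bits = controllers * bits_per_controller
    return bus_width_bits * gbit_per_pin / 8.0


# 6 controllers x 64 bit = 384 bit bus, assumed 6 Gbit/s per pin.
gk110_peak = peak_bandwidth_gbytes(6, 64, 6.0)
```

Under that assumption the result is 288 GByte/s, a peak that real workloads approach only with well-ordered access patterns.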

Notes

• Peripherals may be more complex than main processor

• Eliminating bottlenecks by architecture, not just brute force

• Transforming dedicated HW into SW programmable devices creates value

Today

SoC FPGA
High gate count FPGA + dual core Cortex A9
• Lower integration than ASSPs
• Higher flexibility than ASSPs

Direct interconnection between FPGA and processor
• Possibility to accelerate software with FPGA IP
• Lower system cost by implementing less critical IPs in software
• On the fly reprogramming to repurpose hardware on demand

Trends

High density FPGA+SoC
14 nm trigate
Embedded 64 bit quad core Cortex A53
1 GHz system speeds
56 Gbps transceivers
Support for 2.7 Tbps HMC
Support for 1.3 Tbps DDR4

Trends

Silicon shrink not favorable anymore

Notes

• Integration of hard IP with programmable fabric

• Skyrocketing ASIC design costs favor FPGAs

• Large library of IP cores (including open source)

• FPGA to accelerate critical algorithms

Trends

Silicon feature size reaching single atom level

Quantum effects not negligible anymore
Light sources for lithography unavailable

New technologies to increase density

Trends

Stacked die, MCM
Multiple dies in a single package
Wired interconnect

3D interconnect
Interposers
Through silicon vias

Trends

Hybrid memory cube
Integrate memory + controller in a single package
Optimize memory performance (speed/power)
Connect multiple concurrent processors to a single device

Summary

• Chip density is increasing regardless of physical limits

• Systems are gradually being condensed to a single chip

• Chips requiring multiple technologies are manufactured with MCM or 3D processes

• System components from multiple vendors are integrated in single chips

• Software IS a system component

What we learnt

Summary

• Whenever Moore’s law hits a wall, breakthroughs keep it going

• Thinking outside the box is vital for innovation

• System design requires knowledge of leading edge technologies

• System optimization requires in-depth knowledge of IP block functionality at all levels

Questions

System Design Methodologies

Theory

Electronic System Design
Methodologies
• Implement collaboration/knowledge management tools
• Define version control strategies
• Define verification methodologies

Architectural design
• Break down the design into subsystems
• Define subsystems functionality
• Define interfaces (Busses, protocols, APIs etc)

Subsystem implementation
• Unit test design
• RTL coding
• Simulation
• Synthesis and timing closure (ASIC/FPGA)

Methodologies

System design requires organization
Even small groups can have communication issues
Even a single developer can miss information

Collaboration/Knowledge management tools
Requirement and bug tracking
Revision control
Project planning
Build automation

Methodologies

Requirement tracking
Keep track of specifications
Clearly define dependencies
Help partition work into smaller tasks

Bug tracking
Track issues and their solutions
• Solutions to old problems can shorten the time to solve new ones
• Knowledge of issues can prevent repeating mistakes

Clearly identify which changes have been adopted for a specific issue
• Regression testing

Methodologies – Version Control

Version Control
Keep track of modifications and their reasons
• Always comment your commits
• Possibly reference bug tracker

Allow multiple developers to work concurrently
Branch/tag
• Branching allows separate development environments for each developer
• Developers can commit broken code in branches
• Merge branches only when code is reliable
• Tag when code is stable or on milestones

Methodologies - verification

Verification
Testing is crucial to ensure quality
Each IP shall include its test unit
Systems shall have test benches
Test cases shall be carefully planned
Coverage shall be known

Coding tests can take more time than coding IP block
Plan testing before coding

Better specifications and clearer requirements

Quality is NOT a cost
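
"Each IP shall include its test unit" can be illustrated on a software reference model. The sketch below uses Python's `unittest` with a hypothetical saturating counter: the block, its 4-bit width and the four test cases are assumptions chosen to show planned test cases, not a real IP or a complete verification plan.

```python
import unittest


class SaturatingCounter:
    """Reference model of a hypothetical n-bit saturating up-counter."""

    def __init__(self, width=4):
        self.max_value = (1 << width) - 1
        self.value = 0

    def reset(self):
        self.value = 0

    def tick(self, enable=True):
        """Advance one clock cycle and return the new count."""
        if enable and self.value < self.max_value:
            self.value += 1
        return self.value


class SaturatingCounterTest(unittest.TestCase):
    """One test per planned case: reset, count, hold, saturate."""

    def test_reset_value(self):
        counter = SaturatingCounter()
        counter.tick()
        counter.reset()
        self.assertEqual(counter.value, 0)

    def test_counts_when_enabled(self):
        counter = SaturatingCounter()
        self.assertEqual([counter.tick() for _ in range(3)], [1, 2, 3])

    def test_holds_when_disabled(self):
        counter = SaturatingCounter()
        counter.tick()
        self.assertEqual(counter.tick(enable=False), 1)

    def test_saturates_at_max(self):
        counter = SaturatingCounter(width=2)
        for _ in range(10):
            counter.tick()
        self.assertEqual(counter.value, 3)


def run_tests():
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SaturatingCounterTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Listing the four cases before writing the counter is a small instance of planning tests before coding: each case pins down one line of the block's specification.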

Methodologies - Documentation

Always document your work!
Sharing knowledge improves teamwork
Documentation adds value to your work
You can’t remember everything

Issues
Synchronization with artifact versions
Completeness

Methodologies - Documentation

Doxygen
Document your code within the code
Automatic hierarchy documentation
Can be used with most programming languages
• Can be extended to any file with plugins

Can generate graphs with dot plugin
Multiple outputs (PDF, Word, HTML, etc)
Automatic generation always in sync with code
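
Documenting "within the code" looks like the sketch below. For Python, Doxygen recognizes its special commands in comment blocks that start with `##`; the module name, the function and its behavior are hypothetical and serve only to show the comment style.

```python
## @file checksum_example.py
#  @brief Hypothetical module used only to illustrate Doxygen comments.


## Compute a simple 8-bit additive checksum.
#
#  @param data  Iterable of integers in the range 0..255.
#  @return The sum of all bytes modulo 256.
def checksum8(data):
    total = 0
    for byte in data:
        total = (total + byte) & 0xFF   # keep only the low 8 bits
    return total
```

Running `doxygen -g` creates a default configuration file and `doxygen Doxyfile` then generates the output, so the documentation can be rebuilt on every commit together with the code.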

Architectural design

Before you start…
Search for existing solutions
• Literature
• Patents
• Open source

List requirements
• Define inputs and outputs
• Clearly understand criticalities

List use cases
• Define what resources are required for each scenario

Architectural design

Partition design in independent units
Small
• Easy to maintain
• Simple to understand

Reusable
• Generalize a problem whenever possible
• Create a library of tested, robust building blocks

Documented
• Possibly use self documentation tools
• Test bench with use cases

Architectural design - Interfaces

Always try to use standard interfaces
Blocks can be reused more easily
Understand implications of an interface architecture

If a non standard interface is required…
Define a standard (and document it)
Check it against known use cases
Explore interface weak points and benefits

Open Source

Benefits
Collaborative design improves quality, stability through peer review
Huge code base for software and hardware IP

Drawbacks
Heterogeneous code styles and interfaces
No warranty on quality/functionality
Limited support from community

Open Source

Business models
Open Source libraries and interfaces
• Company releases parts of code to community
• Community improves code functionality and reliability
• Establish trust with customers and partners

Open Source applications and platforms
• Sell support and customization services
• Sell HW products
• Gain visibility and business opportunities
• Possibility of mixed Open/Closed source approach

Questions

Thank you