Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design...

13
Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan ( RIVAI ) Xing Wang (SYNOPSYS)

Transcript of Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design...

Page 1: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu

Cissy Yuan ( RIVAI ) Xing Wang (SYNOPSYS)

Page 2: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 2

AIoT – Fast Growing, Fragmented Market

o AIo Smart Homeo Smart Cityo Securityo Autonomous Drivingo Socialo Educationo Manufacture

Customized architectures achieve much better compute efficiency

$212B 2019 $1.6T 2025

[email protected]

Page 3: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 3

Fast Evolution of AI Calls for SW HW Co-design

SW HW Co-design can bring best cost-performance

Memory CentricCustomized ASICComputing close to memory3-10x faster

Computing Centric TPUCustomized ASIC for Tensor Flow15-30x faster than GPGPU, GPCPU

Save wasted computing/memorySW HW Co-design3-7x energy efficiency

[email protected]

Page 4: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 4

SW HW Co-design on Pygmy, Heterogenous Multicore AI SoCSh

ared

Mem

ory

LPDDR DRAM Controller + Scatter-Gather DMA

IO (USB3, SDIO, SPIO, I2S, I2C, UART, GPIO)

AXI L

ite /

Test

I/FC

ache

TC

MC

ache

TC

M

MMUAI Engine

AI Engine

AI Engine

AI Engine

RV64GCV CoresC

ache

TC

MC

ache

TC

MPLL

JTAG DFT

Vector AI engines have customized RISC-V ISA64bit multicores boot full Linux with SMP

FastInterrupt

RV32

MBIST

Compile application to customized ISA.AI applications directly program vector engines.

Pygmy

OS (Linux)

NN Models

Tensor Flow Lite, Glow

Frontend Compiler

Backend Compiler(LLVM, GCC)

Manually Mapping

RISC-V extended ISA enabled full stack SW HW [email protected]

Page 5: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 5

Co-design Requires Challenging Full Stack Simulation & Debug

Zebu is a powerful weapon to overcome these challenges

Enormous flow effort at functional, performance, power simulation & debug

vLong RuntimeBoot Linux kernel at block-level takes 24 hours, 20 times slower at chip level.Co-Simulation with precise CPU behavioral model is 1.5 times slower.Hours to days manually port ASIC RTL to FPGA.2-6 hours just to generate bit file.

vHard to debugDifficult to dump out at useful time period.

Hard to reproduce bugs.Visibility is very low.

[email protected]

Page 6: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 6

Differentiated ZeBu Technologies drive emulation efficiencyZeBu Supports All High Value Emulation Use Cases

ZeBu Server 4Unified, Parallel and Incremental Compile

Hybrid / Virtual Host

Virtual Integrations

Transactors and Models Speed Adaptors

Streaming

Replay

Triggers

Performance

ZeBu Technologies

ZeBu Use Cases OS Boot and Driver/FW for

IP and systemsFull Chip RTL Verification

System Validation using ICE or Virtual

IntegrationGate-Level Emulation

Power Management Validation (UPF)

Software-driven Power Analysis

Performance Validation

Simulation Acceleration

Page 7: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 7

AI Engine

ORV RISCV Core

Cac

he

DD

R c

ontro

ller

Perip

hera

ls

LPD

DR

4

AXI4

Lite

/ Te

st I/

F

AXI4

Lite

Pygmy

ZeBu Server

ZeBu Runtime Server

AI Engine

AI Engine

AI EngineCac

he

Cac

heC

ache

UARTXTOR

UART Terminal

USB 3.0 DevXTOR

SDIOXTOR

SPI Mst./Slv.XTOR

I2CXTOR

I2SXTOR

USB 3.0 Dev TB/TC

SDIO TB/TC

SPI TB/TC

I2C TB/TC

I2S TB/TC

RIS

C-V

Deb

ugge

r

ZeBu Smart Z-ICE

ZeBu Pygmy Emulation Environment

AXI4MasterXTOR

zRciMain

ProcessUser shared objects

User C program

Page 8: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 8

Pygmy RISC-V AI Chip on ZeBu Demonstration

Page 9: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 9

Linux Boot test (billions of cycles)

Pygmy Linux Boot Debug Case Study

Stimuli

Frame 0

HWState

DPI (trackers/monitors) streaming

Multiple Level critical signal streaming

Stimuli

Frame N

HWState

Stimuli

Frame N-1

HWState

Verdi DebugReplay & Waveform Capture

Debug Window

Stimuli

Frame 1

HWState

Stimuli

Frame N

HWState

Stimuli

Frame N-1

HWState

Test Failure

CPU Hang

Linux Boot test (billions of cycles)

Linux Boot test (billions of cycles)

CPU Hang

Linux Boot test (billions of cycles)

Triggered

Full Visibility Signals

Page 10: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 10

ZeBu Runtime Server

10

Pygmy Performance Debug Case Study

10

With ZEMI3, performance issues can be debugged at higher level more efficiently

ZEMI-3 Hardware

Part

ZeBu

SystemVerilog DPICPU CoreL2B Module PMU Module

AI Engine

AI Engine

AI Engine

AI Engine

ORV64 RISCV Cores

Cac

heC

ache

DD

R

cont

rolle

rPe

riphe

rals

AXI4

Lite

Bus

Pygmy

Cac

heC

ache

ZEMI-3 Hardware

Part

Time

Num

of I

nstru

ctio

ns

Page 11: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 11

Analyze Power using RTL: Hours instead of WeeksRTL average power for millions of cycles expands beyond legacy emulation flows for 10,000s of cycles

High Performance SW Waveform

Reconstruction

Essential Signals N x Grid Jobs

Dump Waveform for all power essential signals (Registers)

Waveform reconstruction for all signals and generate SAIF Average power for millions of cycles

RTL Power Estimation

Average Power

1 Hr x 150 Grid Jobs*20khz (18min)* RTL Power Estimation for Average Power

*1 Manhattan frame, 2B gates design on ZS4

RTL ZTDB SAIF

Flops + hierarchy boundaries

Page 12: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

© 2019 Synopsys, Inc. 12

Ø SW HW co-design brings best cost-performance in fast growing AIoT market.

ØZeBu delivered highest efficiency for full stack Functionality Verification, Performance Debug and Power Analysis.

ØRISC-V extended ISA enabled full stack SW HW co-design flow.

Ø软硬件联合开发能获得最好的性价比, Zebu助力实现最快的全栈迭代,这一切得益于RISC-V

Conclusions

磨刀不误砍柴工

Page 13: Full Stack Software Hardware Co-design - GitHub Pages · Full Stack Software Hardware Co-design Pygmy RISC-V AI SoC on Zebu Cissy Yuan( RIVAI ) Xing Wang ... o Smart City o Security

Thank YouWe are hiring J [email protected]