Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs,...
Transcript of Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs,...
![Page 1: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/1.jpg)
Formally Verifying Many RISC-V Implementations with One Page of Code
(seriously, we did!)Steve HooverRedwood EDA
1/25/2019 1
![Page 2: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/2.jpg)
Agenda
• WARP-V
• Design Methodology
• Comparison
• Formal Verification
• Methodology
• Analysis
• Wrap-Up
21/25/2019
![Page 3: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/3.jpg)
WARP-V
Goal: To showcase the flexibility of “transaction-level
design”
IP needs to be flexible, but:
● Small conceptual variation requires extensive RTL
parameterization of:○ staging (flip flops)
○ stitching through hierarchy
○ clock gating/enabling
○ etc.
● SystemC+HLS is not ideal for CPUs, with tight
cycle-level interactions
Low-Power1-Stage
FPGA
Mid-Range7-Stage
ASIC
![Page 4: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/4.jpg)
Approach
Macro Preprocessor (M4) provides:
● parameterization (incl. staging)
● component selection
● code generation
Transaction-Level Verilog (TL-X.org)implies from context:
● staging (flip flops)
● stitching through hierarchy
● clock gating/enabling
⇒ No need to parameterize details
(or code them at all)
“Swiss Cheese” CPU
Config
ISA = ...BrPred = ......
Params
WordWidth = ...MemSize = …...
Staging
Fetch = 1Decode = 2Execute = 3...
BrPred
Dec Exe WB
TL-VerilogVerilog
ISA
![Page 5: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/5.jpg)
Developed Online (makerchip.com)
![Page 6: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/6.jpg)
WARP-V
ALU
IMem Rd DMem Rd/Wr
RF Rd
+
pend. replay PC
reg. byp.
RF Wr
ld rtn
br. target
+
Dec
PC
pend
+1
@rslt @wb@exe@rd@tgt@dec@fet@pc
DMem Rd/Wr
![Page 7: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/7.jpg)
WARP-V
ALU
IMem Rd DMem Rd/Wr
RF Rd
+
pend. replay PC
reg. byp.
RF Wr
ld rtn
br. target
+
Dec
PC
pend
+1
DMem Rd/Wr
@0 @1 @2 @3 @4 @5
![Page 8: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/8.jpg)
Logic comparison of main CPU pipeline (as apples-to-apples as possible)
• All three express RTL detail
• WARP-V in TL-Verilog is slightly smaller and much more flexible
picorv32 Rocket WARP-V
Language Verilog Chisel TL-Verilog
Construct. Language Verilog preproc. Scala M4 (plus Perl)
Core pipeline ~5 stages (unpipelined) 5 stages 1-7 stages
Lines of code ~983 ~944 ~811
Comparison
![Page 9: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/9.jpg)
Verification Methodology
● First demonstration of TL-Verilog for verification modeling.
● WARP-V brought to life in 1.5 wks. using an 11-instruction test program
w/ assembler and test program also in M4+TL-Verilog. (Jan 2018)
● Remaining verification done by Ákos Hadnagy in Google Summer of
Code 2018
● Uses open-source formal verification tools: RISCV-Formal by Clifford
Wolf
![Page 10: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/10.jpg)
Verification Modeling
ALU
IMem Rd DMem Rd/Wr
RF Rd
+
pend. replay PC
reg. byp.
RF Wr
ld rtn
br. target
+
Dec
PC
pend
+1
@rslt @wb@exe@rd@tgt@dec@fet@pc
DMem Rd/Wr
riscv-form
al
ld rtn
![Page 11: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/11.jpg)
As Promised, 1 Page
9/11/2018 11
![Page 12: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/12.jpg)
Code Size Comparison
![Page 13: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/13.jpg)
Wrap-Up
● Methodology implications
○ Single, small codebase provides logic and verification of flexible IP
○ We focused on verifying the 5-stage version, others just worked (mostly)
● Futures
○ Many-core
○ Opportunities again in Google Summer of Code 2019 (ask me)
![Page 14: Formally Verifying Many RISC-V ... - bostonarch.github.io · SystemC+HLS is not ideal for CPUs, with tight cycle-level interactions Low-Power 1-Stage FPGA Mid-Range 7-Stage ASIC.](https://reader033.fdocuments.in/reader033/viewer/2022041421/5e1f8a777dc604107a4738c1/html5/thumbnails/14.jpg)
Other Things I Love to Discuss Lately
● Why open-source hardware is poised to explode
● Cloud FPGAs and hardware-accelerated web applications
● Start-up life
● Internship/co-op opportunities