Basic ARM Modules and Systemsaccess.ee.ntu.edu.tw/course/soc2003/workshop slides/NCTU_Slides... ·...
Transcript of Basic ARM Modules and Systemsaccess.ee.ntu.edu.tw/course/soc2003/workshop slides/NCTU_Slides... ·...
Basic ARM Modules and Systems
Speaker: Kun-Bin LeeDirected by Prof. Chein-Wei Jen
Department of Electronics EngineeringNational Chiao Tung University
{kblee, cwjen}@twins.ee.nctu.edu.tw
Dec. 5, 2002
1/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
SoC Development
High Level Algorithm ModelC/C++/COSSAP/VCC/MATLAB
Hardware/Software PartitionN2C/VCC
Communication RefinementN2C/Port-C/VCC
Front End
Back EndHar
dwar
eD
evel
opm
ent
Syst
em L
evel
Des
ign
Hardware/Software CoverificationN2C/Seamless/"Q/Bridge"
Specification
Chip
Software
Developm
ent
RTOSWinCE/VxWorks
Device DriverDriveway
API EmbeddedSoftware
2/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Organization (Current Status in NCTU)
Getting Start with ADSWorking with AXD Software Quality Measurement
ARM Integrator Environment
µC/OS-II
Virtual Prototyping
Profiling
Application
Digital IP Authoring RTOSµHAL Driver
Rapid Prototyping
Embedded SW Authoring
3/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Semi- HW/SW Co-verification
Spec.
SW+HW modelingARMulator/Mentor CVE
Semi- HW/SW Coverification
HW/SWCoverificationMentor CVE
FPGAARM Integrator/
CIC DBTapeout
SW modeling + HWHDL Simulator
Apps + APIs + HALs + HW modelsHW models are less accurate (cycle approximate)Verify driver and HW/SW synchronization
Host Commander instead of ISS(host commands) + (System infra.+ DUT)SW is less accurateVerifiy HW and SW synchronization (polling, interrupt, memory-maped I/O)
(Apps+ APIs +HALs) + (System infra.+ DUT)Verify driver and HW/SW synchronizationBoth HW & SW can be timing accurate
Reuse HAL & Verification ModelTwo teams: Design & Verification teams
ARM ISS
Timer
InterupterController
DMA
Memory
JPEG
Profiler
Tracer
Page table
Read(,,,);Write(,,,);IDLE(,,);...
DUV
Transactions
Signals
do Readaddr = 0xE8data= 0x27
do Writeaddr = 0xFFdata = 0x12
CLK
control
WRRDATAWDATA
Transactions to signals
BS-lh CVS: Virtual Prototyping Transaction-based BH-ls CVS
4/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Consistent Models in Different Stages
ISS
ISS +BFM
BS-lh CVS
HW/SWCoverification
Application-specific IP, e.g., JPEG encoder, MPEG4Shape Encoder
Mmeory Organization, e.g., Cache, SRAM, ROM,SDRAM
Basic System Peripherals, e.g., timer, interruptcontroller, DMA
Bus Infrastructure, e.g., Arbiter, Decoder, Bridge
Standard I/O, e.g., UART, GPIO
Os, Driver, API, HAL
Application Programs, e.g., JPEG encoder
Performance Moniter, e.g., Profiler, Stack Tracker
Protocol Moniter, e.g., Bus protocol checker
External Model, e.g., UART external model, off-chipmemory
commanderBH-ls CVS
5/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: Software Development• Familiarize with ARM software development tools, ADS
– Project management– Configuring the settings of build targets for your project
• Writing code for ARM-based platform design• Software cost estimation
– The cost of a program includes Read Only (RO) data, Read Write (RW) data and Zero-Initialized (ZI) data
• Mixed instruction sets, ARM and Thumb interworking, is learned to balance the performance and code density of an application.
• Profiling utility can be used to estimate percentage time of each function in an application
• Memory configuration– E.g., an embedded system might use fast, 32-bit RAM for performance-
critical code, such as interrupt handlers and the stack, slower 16-bit RAM for application RW data, and ROM for normal application code
• Debug skills to be used to debug both software of processor and memory-mapped hardware design running at the target platform
6/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Getting Start with ADS
Creating a new projectfrom ARM project
stationery
Adding source files tothe project
Building the project
Debugging the projectExistingfiles/library
Creating a new (header)file using CodeWarrior's
built-in editor
Configuring the settingsof build targets
7/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Working with AXD
• Set breakpoints and watchpoints• Locate, examine and change the contents of
variables, registers and memory• Using the command line interface to automate
repetitive tasks (advanced user)
8/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Software Quality Measurement
• Memory requirement of the program• Profiling: build up a picture of the percentage of
time spent in each procedure.• Evaluate software performance prior to
implement on hardware• Writing efficient C for ARM cores
– ARM/Thumb interworking– Coding styles
9/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Application Code and Data Size
• armlink offers two options to provide the relevant information:– -info sizes (sizes of all objects)– -info totals (summary only)============================================================Image component sizes
Code RO Data RW Data ZI Data Debug 25840 3444 0 0 104344 Object Totals22680 762 0 300 9104 Library Totals
=============================================================Code RO Data RW Data ZI Data Debug
48520 4206 0 300 113448 Grand Totals=============================================================
Total RO Size(Code + RO Data) 52726 ( 51.49kB)Total RW Size(RW Data + ZI Data) 300 ( 0.29kB)Total ROM Size(Code + RO Data + RW Data) 52726 ( 51.49kB)
=============================================================
• The size of code/data in – an ELF image can be viewed using fromelf –z– a library can be viewed using armar –sizes
10/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
ARM and Thumb Code Size
The equivalent ARM assemblyIabs CMP r0,#0 ;Compare r0 to zero
RSBLT r0,r0,#0 ;If r0<0 (less than=LT) then do r0= 0-r0MOV pc,lr ;Move Link Register to PC (Return)
The equivalent Thumb assemblyCODE16 ;Directive specifying 16-bit (Thumb) instructions
labs CMP r0,#0 ;Compare r0 to zeroBGE return ;Jump to Return if greater or
;equal to zeroNEG r0,r0 ;If not, negate r0
return MOV pc,lr ;Move Link register to PC (Return)
Simple C routineif (x>=0)
return x;else
return -x;
11/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Memory Map and Size Considerations
• The linker calculates the ROM and RAM requirements for code and data as follows:– ROM: Code size + RO data +
RW data– RAM: RW Data + ZI data.
• You may wish to copy codefrom ROM into faster RAM, which will also increase the RAM requirements
• Placing the stacks in zero-wait state, 32-bit memory on-chip will significantly improve over -8 or 16- bit off-chip memory
ROM
RAM
Default memory map
12/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
ARM Profiler
• About Profiling:– Profiler samples the program counter and computes the
percentage time of each function spent.– Flat Profiling:
• If only pc-sampling info. is present. It can only display the time percentage spent in each function excluding the time in its children.
• Flat profiling accumulates limited information without altering the image
– Call graph Profiling: • If function call count info. is present. It can show the approximations
of the time spent in each function including the time in its children.• Extra code is added to the image
• Limitations:– Profiling is NOT available for code in ROM, or for scatter loaded
images.– No data is gathered for programs that are too small.
13/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Profiler Command-line Options
• The command syntax is as follows:armprof [-parent|-noparent] [-child|-nochild] [-sort options] prf_file
• Sample OutputName cum% self% desc% calls
---------------------------------------------------------------------
main 17.69% 60.06% 1
insert_sort 77.76% 17.69% 60.06% 1
strcmp 60.06% 0.00% 243432
---------------------------------------------------------------------
qs_string_compare 3.21% 0.00% 13021
shell_sort 3.46% 0.00% 14059
insert_sort 60.06% 0.00% 243432
strcmp 66.75% 66.75% 0.00% 270512
---------------------------------------------------------------------
cumulativeselfdescendantscalls
14/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
In ARM Macrocell
AMBA Data
AMBAInterface
Inst. & data
VirtualAddress
JTAG and non-AMBA signals
Inst. & data cache
CP15
MMU
ARM Core
EmbeddedICE & JTAG
WriteBuffer
PhysicalAddress AMBA
Address
15/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Cycle Types, Von Neuman Cores N-cycles Non-sequential cycle. The ARM core requests a transfer to
or from an address which is unrelated to the address used in the preceding cycle.
S-cycles Sequential cycle. The ARM core requests a transfer to or from an address which is either the same, or one word or one-half-word greater than the preceding address.
I-cycles Internal cycle. The ARM core does not require a transfer, as it is performing an internal function, and no usefulprefetching can be performed at the same time.
C-cycles Coprocessor register transfer cycle. The ARM core wishes to use the data bus to communicate with a coprocessor, but does not require any action by the memory system.
Total The sum of the S-Cycles, N-Cycles, I-Cycles and C-Cycles.
16/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Map File
• If no map file is specified:– ARMulator will use a 4GB bank of ‘ideal’ memory, i.e., no wait
states.• The map file defines regions of memory, and, for each
region:– The address range to which that region is mapped.– The data bus width (in bytes).– The access times for the memory region (in ns)
• armsd.map typically contains something like:00000000 00020000 ROM 2 R 150/100 150/10010000000 00008000 RAM 4 RW 100/65 100/65– Columns are (left to right):
start address (in hex) access type (read-only or read/write)length (in hex) read timing in ns (NON-Seq / Seq)name writing timing in ns (NON-Seq / Seq)width (1, 2, or 4 bytes)
17/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Configure for Target System
ARMulator startup Message
Cached core additional statistics
18/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Dhrystone AnalysisCached with different clock domains
TCM on ARM966E-S
19/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: ARM Integrator Environment• ARM Integrator: A set of pre-built, well-defined, well-
verified hardware and software components• Standalone/Semishoting• Resource access
– uHAL or memory mapped device– Polling and interrupt
• Memory usage– Performance & constraint of different types of memory (SSRAM,
SDRAM, and Flash, and the cache)– Data alignment and data layout (in which type of memory)– Available data bus and memory bandwidth
• Peripherals– Detail the timers and interrupt controller
20/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: Virtual Prototyping
• Features– Trade-off by modifying
system parameters & checking results
– Develop & test device drivers
– Test the correctness of compiler generated code
– Visualize behavior of system and peripherals
– Test the correctness of application algorithms
CPU ISS
CPU Debugger (GUI)
CPU ICE
UARTModel
Intr cntrlModel
Parallel I/OModel
TimerModel
MemoryModel Rapid
PrototypeUSB
Model
CodecModel
BLCModel
21/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Building Soft Prototype
• Requirement– Instruction Set Simulator
(ISS) with a capability to interface C models of the peripherals
– C models of the peripherals– Processor debugger
• Limitations– Limited capacity– Limited speed– Accuracy of models– synchronization
Check for C ModelInterface Support
Is InterfaceSupport
Yes
Yes
Final Software
Study ISS Features
No No Soft Prototype
Create/ModifyC Models
Write/Modify the Application Software
Compile
Run the Application
Debug
Errors?
No
ISS
22/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
In This Lab
• Procedure to add the hardware models toARMulator
• Verify the hardware models
• Example hardware models in this lab– Memory-mapped IP: timer,
MAC– Coprocessor: MAC
Write your modelin C language
Build the new modelBy creating *.dll file
Write your *.dsc file
Modify peripherals.ami
configure the ARMulator to integrate desired model(s)
Modify default.ami
23/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: RTOS
• Concept of RTOS• Features of µC/OS-II• Model ARMulator as ARM Integrator platform
– Port µC/OS-II to ARMulator with µHAL• Port applications to µC/OS-II
– Partition original program into tasks– Necessary coding changes– Insert system calls into tasks– Create tasks and resources in main function– Driver authoring
24/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: Digital IP Authoring
• HW/SW partition under the analysis of profiling
• Architecture exploration• Attached to a well-defined
on-chip bus• HW/SW synchronization• Robust design
– Coding style: Reuse Methodology Manual, FPGA Reuse Field Guide
– Verification coverage• FPGA proven
Power
AreaFlexibility
Test
Performance Clocking
25/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Coverage-Driven VerificationFunctional Coverage• Determines if all the
functionality was tested• Metrics are user-defined
(explicit)• Strengths
– Complete expressiveness• Cross-correlation, and multi-
cycle scenarios– Objective measure of progress
in regards to test plan– Identifies new holes by
crossing existing items• Weaknesses
– Only as good as the coverage model
– Manual effort is required to implement the metrics
Code Coverage• Determines if all the
implementation was tested • Metrics are defined by the
source code (implicit)– Line/block, events, toggle…
• Strengths– Reveals untested logic– Reveals holes in functional test
plan– No manual effort is required to
implement the metrics• Weakness
– No cross correlations– No multi-cycle scenarios– Manual effort is required to
interpret results
26/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
In the Digital IP Authoring Lab
• IP authoring– Algorithm and architecture exploration in IP core design– RTL coding guidelines for IP authoring– AMBA-compliant IP– Hardware/software coordination
• Verification– Identifies bugs or non-synthesizable constructs before emulation.– Identify the areas of a design that have yet to be fully simulated.– Identifies the smallest set of tests that will meet verification goals– AMBA AHB Property Checking
27/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Lab Module: Rapid Prototyping
• Rapid prototyping– Already ready hardware resource in LM– Procedure to configure the hardware design to LM and
software to CM
28/28
Institute of Electronics,National C
hiao Tung U
niversityBasic ARM
Modules and System
s
Summary
• Assignments are finished– Code development– Debugging and evaluation– Core peripherals– Real-time OS– On-chip bus
• Future work– Seamlessly combine all lab modules (includes NTU
and NCKU)– Organize the material such that the lab modules can
be taught in a great many ways– Toward platform-based SoC design lab