Microprocessor system architectures – ARMv8
Jakub Yaghob
ARM architecture
RISC
Large uniform register file
Load/store architecture
Simple addressing modes
Execution states: AArch64 and AArch32
Architecture profiles
A – application profile
R – real-time profile
M – microcontroller profile
Execution states – AArch64
31 64-bit general-purpose registers; X30 is the procedure link register
64-bit PC, SPs, and ELRs (exception link registers)
32 128-bit SIMD registers
Single instruction set: A64
Exception levels EL0-EL3
64-bit virtual addressing
Each system register name carries a suffix that indicates the lowest EL with access
PSTATE (process state)
Execution states – AArch32
13 32-bit general-purpose registers
32-bit PC, SP, and LR (link register)
Some registers are banked for each execution mode
Single ELR, used for exception return from Hyp mode
32 64-bit SIMD registers
A32 instruction set – fixed-length encoding, compatible with ARMv7
T32 instruction set – variable-length encoding, compatible with ARMv7 Thumb
32-bit virtual addressing
CPSR (Current Program Status Register)
Supported data types, cryptographic extension
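Integer: B (byte, 8-bit), H (halfword, 16-bit), W (word, 32-bit), D (doubleword, 64-bit), Q (quadword, 128-bit)
Floating point: HP, SP, DP (half, single, and double precision), IEEE 754
Cryptographic extension: operates on the vector register file; AES, SHA1, SHA2-256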
Memory model
The ARM memory model supports:
Generating an exception on an unaligned memory access
Restricting access by applications to specified areas of memory
Translating virtual addresses provided by executing instructions into physical addresses
AArch64 – 64-bit addressing; the TCR (Translation Control Register) determines the VA range; execution at EL0 and EL1 uses two independent VA ranges, each with its own translation controls
AArch32 – 32-bit addressing; the TCR determines the VA range; the OS can split the VA range into two subranges for EL0 and EL1, each with its own translation controls
Altering the interpretation of multi-byte data between big-endian and little-endian
Controlling the order of accesses to memory
Controlling caches and address translation structures
Synchronizing access to shared memory by multiple PEs (processing elements)
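The two EL0/EL1 VA ranges can be illustrated with a small calculation. This is a minimal sketch, assuming the range sizes come from the T0SZ and T1SZ size fields of TCR_EL1 (field names taken from the architecture, not from these slides), with the lower range starting at address 0 and the upper range ending at the top of the 64-bit address space. The function name va_ranges is hypothetical.

```python
def va_ranges(t0sz: int, t1sz: int):
    """Return ((low_start, low_end), (high_start, high_end)).

    Assumption: each range covers 2**(64 - TxSZ) bytes. The low range
    starts at 0; the high range ends at 2**64 - 1.
    """
    size0 = 1 << (64 - t0sz)
    size1 = 1 << (64 - t1sz)
    low = (0, size0 - 1)
    high = ((1 << 64) - size1, (1 << 64) - 1)
    return low, high


# Example: 48-bit ranges on both sides (TxSZ = 16).
low, high = va_ranges(16, 16)
print(f"low : {low[0]:#018x} .. {low[1]:#018x}")
print(f"high: {high[0]:#018x} .. {high[1]:#018x}")
```

Addresses that fall between the two ranges belong to neither and would not translate.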
Application architecture – AArch64
31 general-purpose registers R0-R30
64-bit GP registers X0-X30; X30 is the procedure link register
32-bit GP registers W0-W30
Register encoding 1Fh is used for ZR (zero register)
32 vector registers V0-V31
FPCR, FPSR – floating-point control register and floating-point status register
Current SP: SP (64-bit), WSP (32-bit)
PC: 64-bit
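The relationship between the X and W register names can be sketched with a toy register file. This is an illustrative model only; the class and method names are hypothetical. It assumes the architectural behavior that Wn is the low 32 bits of Xn, that a write to Wn clears the upper 32 bits of Xn, and that encoding 31 reads as zero and discards writes when it names ZR.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class RegFile:
    """Toy model of the AArch64 general-purpose registers."""

    def __init__(self):
        self.x = [0] * 31  # X0..X30

    def read_x(self, n: int) -> int:
        # Encoding 31 is modeled here as the zero register.
        return 0 if n == 31 else self.x[n]

    def read_w(self, n: int) -> int:
        # Wn is the low 32 bits of Xn.
        return self.read_x(n) & MASK32

    def write_x(self, n: int, value: int) -> None:
        if n != 31:  # writes to ZR are discarded
            self.x[n] = value & MASK64

    def write_w(self, n: int, value: int) -> None:
        # A 32-bit write zero-extends into the full 64-bit register.
        self.write_x(n, value & MASK32)


regs = RegFile()
regs.write_x(0, 0xFFFF_FFFF_FFFF_FFFF)
regs.write_w(0, 0x1234)
print(hex(regs.read_x(0)))
```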
Application architecture – AArch64 – vector registers
Application architecture – AArch64 – PSTATE
Process state for EL0
Data processing flags: N – negative, Z – zero, C – carry, V – overflow
Exception masking bits: D – debug mask, A – system error mask, I – IRQ mask, F – FIQ mask
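How the N, Z, C, and V flags follow from an arithmetic result can be shown with a short sketch. It models a flag-setting 64-bit addition; the function name adds64 is hypothetical and the code is a simplified illustration, not a description of any particular implementation.

```python
MASK64 = (1 << 64) - 1


def adds64(a: int, b: int):
    """Add two 64-bit values and return (result, (N, Z, C, V))."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    n = result >> 63                 # sign bit of the result
    z = int(result == 0)             # result is zero
    c = int(full > MASK64)           # unsigned carry out of bit 63
    sign_a, sign_b = a >> 63, b >> 63
    # Signed overflow: operands share a sign that the result does not.
    v = int(sign_a == sign_b and n != sign_a)
    return result, (n, z, c, v)


print(adds64(0xFFFF_FFFF_FFFF_FFFF, 1))   # unsigned wrap-around
print(adds64(0x7FFF_FFFF_FFFF_FFFF, 1))   # signed overflow
```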
System registers
Register naming: <register_name>_ELx, x∈{0,1,2,3}
General system control registers
Debug registers
Generic timer registers
Performance monitor registers (optional)
Trace registers (optional)
Generic Interrupt Controller (GIC) CPU interface registers (optional)
Software control and EL0
Exception handling: interrupts, memory system aborts, undefined instructions, system calls, secure monitor or hypervisor traps
System instructions for control flow: WFI – Wait For Interrupt, WFE – Wait For Event, YIELD – hint; WFI and WFE can enter a low-power state
Cache management: use at EL0 must be enabled by EL1
Debug events: BRK – breakpoint (BKPT in A32/T32), DBG – hint to the debug system (A32/T32), HLT – entry to Debug state
Caches and memory hierarchy
Point of Unification: the point at which the instruction cache (IC) and data cache (DC) see the same copy of a memory location
Point of Coherency: the point at which all agents that can access memory are guaranteed to see the same copy of a memory location
Memory types
Normal: bulk memory operations, read/write or read-only
Device: speculative reads forbidden
Additional attributes for Device memory
Gathering: the non-Gathering setting prevents aggregation of reads and writes
Reordering: the non-Reordering setting preserves access order and synchronization requirements
Early write acknowledgement: a write can be acknowledged from a point other than the end point
Shareability: non-shareable, inner shareable, outer shareable
Cacheability: non-cacheable, write-through cacheable, write-back cacheable
Alignment
Instruction alignment: A64 instructions must be word-aligned
Data alignment
Device memory: any unaligned access causes an Alignment fault
Normal memory: SCTLR_ELx.A configures unaligned access behavior, either generate an Alignment fault or perform the unaligned access
Unaligned access: not guaranteed to be atomic; takes a number of additional cycles; can abort on more than one of its memory accesses
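The data alignment rules above can be condensed into a small decision function. This is a simplified sketch: the function and parameter names are hypothetical, and it ignores instruction classes that have their own alignment requirements regardless of the SCTLR_ELx.A setting.

```python
def classify_access(address: int, size: int, device: bool, sctlr_a: bool) -> str:
    """Classify a data access of `size` bytes at `address`.

    device  -- True if the target is Device memory
    sctlr_a -- modeled value of the SCTLR_ELx.A alignment-check bit
    """
    if address % size == 0:
        return "aligned access"
    if device:
        # Unaligned access to Device memory always faults.
        return "Alignment fault"
    if sctlr_a:
        # Normal memory with alignment checking enabled.
        return "Alignment fault"
    # Normal memory with alignment checking disabled.
    return "unaligned access"


print(classify_access(0x1004, 4, device=False, sctlr_a=True))
print(classify_access(0x1002, 4, device=False, sctlr_a=False))
print(classify_access(0x1002, 4, device=True, sctlr_a=False))
```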
Endian support
Instruction endianness: A64 instructions are always little-endian
Data endianness: SCTLR_EL1.E0E, configurable at EL1 or higher, sets the data endianness for EL0
Instructions for reversing bytes in registers: REV16, REV32, REV64
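The effect of the byte-reversal instructions can be modeled in a few lines. The sketch below assumes a 64-bit source register and models REV16, REV32, and REV64 as reversing the bytes within each 16-bit, 32-bit, and 64-bit chunk respectively; the helper name _reverse_chunks is hypothetical.

```python
def _reverse_chunks(value: int, chunk_bits: int, width: int = 64) -> int:
    """Reverse the byte order inside every chunk_bits-wide chunk of value."""
    out = 0
    chunk_mask = (1 << chunk_bits) - 1
    for shift in range(0, width, chunk_bits):
        chunk = (value >> shift) & chunk_mask
        swapped = int.from_bytes(chunk.to_bytes(chunk_bits // 8, "little"), "big")
        out |= swapped << shift
    return out


def rev16(x: int) -> int:
    return _reverse_chunks(x, 16)


def rev32(x: int) -> int:
    return _reverse_chunks(x, 32)


def rev64(x: int) -> int:
    return _reverse_chunks(x, 64)


x = 0x0102030405060708
print(hex(rev16(x)), hex(rev32(x)), hex(rev64(x)))
```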
Synchronization and semaphores
Load-exclusive instructions: LDXP, LDXR, LDXRH, LDXRB
Store-exclusive instructions: STXP, STXR, STXRH, STXRB
Clear-exclusive: CLREX
Designed to scale on multiprocessor systems
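The load-exclusive/store-exclusive pattern can be illustrated with a toy model. This is a heavily simplified sketch with hypothetical class and function names: each PE has one monitor that tracks a single address, and a successful store-exclusive clears every monitor on that address. It assumes the convention that a store-exclusive returns status 0 on success and 1 on failure; real monitors track address ranges and have further implementation-specific behavior.

```python
class ToyMemory:
    """Toy shared memory with one exclusive monitor per PE."""

    def __init__(self):
        self.data = {}
        self.monitors = {}  # pe id -> address marked for exclusive access

    def ldxr(self, pe: int, addr: int) -> int:
        self.monitors[pe] = addr          # mark the address as exclusive
        return self.data.get(addr, 0)

    def stxr(self, pe: int, addr: int, value: int) -> int:
        if self.monitors.get(pe) != addr:
            return 1                      # monitor lost: store fails
        self.data[addr] = value
        # The store clears all monitors (any PE) on this address.
        self.monitors = {p: a for p, a in self.monitors.items() if a != addr}
        return 0

    def clrex(self, pe: int) -> None:
        self.monitors.pop(pe, None)


def atomic_increment(mem: ToyMemory, pe: int, addr: int) -> int:
    """Retry loop: reload and retry until the store-exclusive succeeds."""
    while True:
        old = mem.ldxr(pe, addr)
        if mem.stxr(pe, addr, old + 1) == 0:
            return old + 1


mem = ToyMemory()
atomic_increment(mem, pe=0, addr=0x100)
atomic_increment(mem, pe=1, addr=0x100)
print(mem.data[0x100])
```

In this model, if two PEs both load-exclusive the same address, only the first store-exclusive succeeds; the other PE has to retry.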
Exception levels
Exception levels EL0-EL3
EL0 – unprivileged execution, applications
EL1 – OS kernel
EL2 – hypervisor; supports virtualization of non-secure operation
EL3 – secure monitor; supports switching between the two security states (secure state, non-secure state)
All implementations must include EL0 and EL1
Stack pointer register selection: SP_ELx
Exception levels
Exception mechanism
Saved Program Status Register (SPSR)
Saves the PE state on taking an exception; SPSR_ELx is used for an exception taken to ELx
On return from an exception, the PE state is restored from the state stored in the SPSR
Exception link registers
ELR_ELx holds the preferred exception return address
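The roles of SPSR_ELx and ELR_ELx can be sketched as a toy model of exception entry and return. The names below (ToyPE, take_exception, eret) are hypothetical, and the saved state is reduced to the current exception level and the condition flags; the real PSTATE holds more fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPE:
    pc: int = 0
    el: int = 0
    nzcv: int = 0
    spsr: dict = field(default_factory=dict)  # target EL -> saved (el, nzcv)
    elr: dict = field(default_factory=dict)   # target EL -> return address


def take_exception(pe: ToyPE, target_el: int, return_addr: int, handler: int) -> None:
    """Save state into SPSR/ELR of the target level, then enter the handler."""
    pe.spsr[target_el] = (pe.el, pe.nzcv)
    pe.elr[target_el] = return_addr
    pe.el = target_el
    pe.pc = handler


def eret(pe: ToyPE) -> None:
    """Restore state from SPSR/ELR of the current level."""
    level = pe.el
    pe.el, pe.nzcv = pe.spsr[level]
    pe.pc = pe.elr[level]


pe = ToyPE(pc=0x4000, el=0, nzcv=0b0110)
take_exception(pe, target_el=1, return_addr=0x4004, handler=0x8_0400)
eret(pe)
print(pe.el, hex(pe.pc), bin(pe.nzcv))
```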
Exception vectors
Vector Base Address Register (VBAR)
Each exception level that can take exceptions (EL1-EL3) has its own VBAR_ELx, which defines the base address of the vector table at that ELx
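How a vector address is derived from the base register can be shown with a small lookup. The slides do not give the table layout, so this sketch assumes the AArch64 layout of sixteen 0x80-byte entries, grouped by where the exception came from and by exception type. The dictionary keys and the function name are hypothetical labels.

```python
# Offset of each group of four entries, by exception origin.
SOURCE_OFFSET = {
    "current_el_sp0": 0x000,    # current EL, using SP_EL0
    "current_el_spx": 0x200,    # current EL, using SP_ELx
    "lower_el_aarch64": 0x400,  # lower EL running in AArch64
    "lower_el_aarch32": 0x600,  # lower EL running in AArch32
}

# Offset of each entry within a group, by exception type.
TYPE_OFFSET = {
    "sync": 0x000,
    "irq": 0x080,
    "fiq": 0x100,
    "serror": 0x180,
}


def vector_address(vbar: int, source: str, kind: str) -> int:
    """Return the handler entry address for the given origin and type."""
    return vbar + SOURCE_OFFSET[source] + TYPE_OFFSET[kind]


# Example: an IRQ taken from a lower exception level in AArch64.
print(hex(vector_address(0x8_0000, "lower_el_aarch64", "irq")))
```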
System calls
SVC – Supervisor call exception; EL0 calls the OS at EL1
HVC – Hypervisor call exception; for EL1 and higher
SMC – Secure monitor call exception; for EL1 and higher
Virtual Memory System Architecture
VMSA
Provides an MMU
The MMU translates VAs to PAs independently for each ELx and security state
AArch64 supports VAs and PAs of up to 48 bits
Address translation system
VMSAv8-64
Translation Table Base Register (TTBR)
Translation Control Register (TCR)
Up to four levels of address lookup
Input address (IA) of up to 48 bits
Output address (OA) of up to 48 bits
Translation granule size of 4KB, 16KB, or 64KB
4K translation granule
16K translation granule
64K translation granule
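The way a virtual address is split into per-level table indices for each granule can be sketched as follows. This assumes 8-byte table descriptors, so a table filling one granule resolves log2(granule) - 3 address bits per level, and a 48-bit input address; the function name split_va is hypothetical.

```python
def split_va(va: int, granule_kb: int, va_bits: int = 48):
    """Return ({level: table index}, page offset) for a virtual address."""
    page_bits = {4: 12, 16: 14, 64: 16}[granule_kb]
    index_bits = page_bits - 3          # entries per table = granule / 8 bytes
    offset = va & ((1 << page_bits) - 1)

    indices = {}
    level, low = 3, page_bits           # level 3 sits just above the offset
    while low < va_bits:
        width = min(index_bits, va_bits - low)  # top level may be narrower
        indices[level] = (va >> low) & ((1 << width) - 1)
        low += width
        level -= 1
    return dict(sorted(indices.items())), offset


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_va(va, 4))    # four levels, 9 bits each
print(split_va(va, 16))   # four levels, level 0 resolves a single bit
print(split_va(va, 64))   # three levels (1-3)
```

Under these assumptions the 4KB granule uses levels 0-3, the 16KB granule also uses levels 0-3 with a one-bit level 0 index, and the 64KB granule uses only levels 1-3.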
Translation table entries – levels 0-2
Translation table entries – level 3
Attribute fields
MMU faults
Types of MMU exceptions:
Alignment fault
Permission fault
Translation fault
Address size fault
Synchronous external abort on a translation table walk
Access flag fault
TLB conflict abort
Translation Lookaside Buffers (TLB)
TLB
Caches results from translation table walks
Global pages and process-specific pages
Address Space Identifier (ASID): implementation-defined size of 8 or 16 bits
Virtual Machine Identifier (VMID)
Concept of locked entries: optional for an implementation
Maintenance instructions: TLBI <operation>{, Xt}
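The distinction between global and process-specific TLB entries can be illustrated with a toy lookup structure. This is a minimal sketch with hypothetical names: entries are keyed by ASID and virtual page number, global entries match any ASID, and VMIDs, locked entries, and replacement policy are left out.

```python
class ToyTLB:
    """Toy TLB: maps (ASID, virtual page number) to a physical page number."""

    GLOBAL = None  # key marker for entries that match every ASID

    def __init__(self):
        self.entries = {}

    def insert(self, vpn: int, ppn: int, asid: int, is_global: bool = False) -> None:
        key = (self.GLOBAL if is_global else asid, vpn)
        self.entries[key] = ppn

    def lookup(self, vpn: int, asid: int):
        """Return the cached physical page number, or None on a miss."""
        if (asid, vpn) in self.entries:
            return self.entries[(asid, vpn)]
        return self.entries.get((self.GLOBAL, vpn))

    def invalidate_asid(self, asid: int) -> None:
        # Drop process-specific entries for one ASID; keep global entries.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != asid}


tlb = ToyTLB()
tlb.insert(vpn=0x10, ppn=0xA0, asid=1)
tlb.insert(vpn=0x20, ppn=0xB0, asid=1, is_global=True)
print(tlb.lookup(0x10, asid=2), tlb.lookup(0x20, asid=2))
```

Tagging entries with an ASID lets the OS switch processes without flushing the whole TLB, since entries belonging to another ASID simply do not match.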