CEC 320 and 322 Microprocessor Systems Class and Lab

October 6, 2019 Sam Siewert CEC 320 and 322 Microprocessor Systems Class and Lab Lecture 7 - ARM ISA Overview

Transcript of CEC 320 and 322 Microprocessor Systems Class and Lab

Page 1: CEC 320 and 322 Microprocessor Systems Class and Lab

October 6, 2019 Sam Siewert

CEC 320 and 322 Microprocessor Systems Class and Lab

Lecture 7 - ARM ISA Overview

Page 2: CEC 320 and 322 Microprocessor Systems Class and Lab

Survey Says …
• Below 7.0 is a Problem
• Assignments are a Chore - I will try to simplify
• Come to Office hours for help!

Page 3: CEC 320 and 322 Microprocessor Systems Class and Lab

Lab #5 Demo
• Use Potentiometer to detect analog level threshold crossing
• Analog input > or < Level
  – COMP_REF_1_65V from comp.h
  – ComparatorValueGet()
• Count Crossings
• Update Display on OLED
• Interrupt Service Routines
  – Pre-installed statically in startup_ewarm.c
  – Installed in main dynamically - comp.h, ComparatorIntRegister()
• Check proper wiring of +3.3V, GND, and PC
• Power to GND short on breadboard is the main risk, so check!

Example Menu and Commands

Board Configuration and OLED Display
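The "Count Crossings" step can be sketched in plain C. This is only an illustration: the function name count_crossings is hypothetical, and an array of booleans stands in for successive comparator readings (in the lab those would come from ComparatorValueGet() or from the interrupt handler).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count how many times the comparator output changes
 * state across a sequence of samples. Each sample is true when the analog
 * input is above the reference level and false when it is below. */
unsigned count_crossings(const bool *samples, size_t n)
{
    assert(samples != NULL || n == 0);

    unsigned crossings = 0;
    for (size_t i = 1; i < n; i++) {
        if (samples[i] != samples[i - 1]) {
            crossings++;            /* level moved across the threshold */
        }
    }
    return crossings;
}
```

With an interrupt-driven design the loop disappears: the handler would simply increment a counter each time the comparator interrupt fires.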

Page 4: CEC 320 and 322 Microprocessor Systems Class and Lab

Chapter 2 : Instruction sets
• 2a Preliminaries
  – Video 2.1.1 Computer architecture taxonomy.
  – 2.1.2 Assembly language.
• 2b ARM Processor
• 2c TI C64x & C55x DSP
• 2d Intel x86 / AMD64

9-Oct-19 /erau/cec320/s19/btd

Page 5: CEC 320 and 322 Microprocessor Systems Class and Lab

2B ARM Processor
• ARM versions.
• ARM ISA (Programmer’s Model)
• ARM assembly language.
• ARM machine language
• ARM memory organization.
• ARM flow of control.
• ARM example Hardware

Page 6: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Technology Overview
• ARM: “The Architecture For The Digital World”
• ARM is a physical hardware design and intellectual property company
• ARM licenses its cores out and other companies make processors based on its cores
• ARM also provides toolchain and debugging tools for its cores

Page 7: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM History
• Acorn Computer Group developed the world’s first commercial RISC processor in 1985
• Roger Wilson and Steve Furber were the principal developers
• ARM (Advanced RISC Machines) was spun out from Acorn in 1990 with the goal of defining a new microprocessor standard

Page 8: CEC 320 and 322 Microprocessor Systems Class and Lab

Classic ARM Variations
• ARM7xxx
  – 3 stage pipeline
  – Integer processor
  – MMU support for WinCE, Linux and Symbian
  – Used in entry level mobiles, mp3 players, pagers
• ARM9xxx
  – 5 stage pipeline
  – Separate data and instruction cache
  – Higher end mobile and communication devices
  – Telematic and infotainment systems
  – ARM and Thumb instruction set
• ARM11xxx
  – 7 stage pipeline
  – Trustzone security related extensions
  – Reduced power consumption
  – Speed improvements
  – More DSP and SIMD extensions
  – Used in PDA, smartphones, industrial controllers, mobile gaming

Page 9: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Processor Family
• Differences between cores
  – Processor modes
  – Pipeline
  – Architecture
  – Memory protection unit
  – Memory management unit
  – Cache
  – Hardware accelerated Java
  – … and others

Page 10: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Processor Family
• Family
  – IP processor specifications available from ARM
  – Allows for backwards compatibility and code re-use
  – Another common family is x86

Page 11: CEC 320 and 322 Microprocessor Systems Class and Lab

x86 and ARM SoC
• Key Distinctions between an MCU and an SoC
  – Both are Single Chip Solutions
  – SoC includes more processing, memory, and I/O
• Multi-Core CPU
• Memory Controller (Local Bus)
  – On-chip Memory (E.g. SRAM)
  – Off-chip Memory Expansion (E.g. DRAM)
  – On-chip and Off-chip Persistent Memory (NAND, NOR Flash)
• I/O Bus
  – Expansion I/O Bus (PCIe)
  – On-chip I/O Bus

Figures:
  – x86 PC System Architecture for Memory and I/O Bus Interfaces to Peripherals
  – Intel Altera Cyclone V HPS, Cyclone SoC: Dual Core ARM Cortex-A9
  – NVIDIA Tegra K1: Quad Cortex-A15

Page 12: CEC 320 and 322 Microprocessor Systems Class and Lab

Tegra K1 SoC Detailed Block Diagram


Page 13: CEC 320 and 322 Microprocessor Systems Class and Lab

Comparative Volumes
• Approx 12 Billion shipped in 2013
• Approx 50 Billion shipped prior to 2014
• Cortex-A
  – Most smart phones & tablets
• Cortex-R
  – Real-time & safety critical
  – Expensive
• Cortex-M
  – Embedded & inexpensive

http://www.anandtech.com/show/7909/arm-partners-ship-50-billion-chips-since-1991-where-did-they-go

Page 14: CEC 320 and 322 Microprocessor Systems Class and Lab

Cortex-M Instructions
• Each model adds instructions, but never removes
• M4F adds floating point

Page 15: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM ISA evolution
• More enhancements move further from “RISC”
• Enhancements improve performance on specific tasks
  – Java
  – Security
  – Signal Processing
  – Encryption
• Enhancements made reasonable by increasing transistor counts

Page 16: CEC 320 and 322 Microprocessor Systems Class and Lab

Modern ARM Variations
• Three versions – roughly by time & size
• We will FOCUS on the Cortex series devices

Page 17: CEC 320 and 322 Microprocessor Systems Class and Lab

Cortex-A IP implementations
• The designations above are IP cores available from ARM for license
• The ARM ISA has a set of IP implementations available depending upon design requirements
• This table ONLY lists Cortex-A

Page 18: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Neoverse
• Future ARM plans for server applications
• Not the focus of CEC 320

Page 19: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Design Philosophy
• ARM core uses RISC architecture
  – Reduced instruction set
  – Load store architecture
  – Large number of general purpose registers
  – Parallel executions with pipelines
• But some differences from RISC
  – Enhanced instructions for
     – Thumb mode
     – DSP instructions
     – Conditional execution instruction
     – 32 bit barrel shifter

Page 20: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Viewing
• How to Choose your ARM Cortex-M Processor
  – https://youtu.be/qvrmOXtOpvw
  – Good first examination

Page 21: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 22: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM data types
• Word is 32 bits long.
  – Cortex-M4F specific
• Word can be divided into four 8-bit bytes.
• ARM address space is 32 bits long.
• Addressability is a single byte
  – Instructions are fetched on 32-bit (ARM) or 16-bit (Thumb) boundaries
• ARM has 16 registers that can be specified in instruction operands

Page 23: CEC 320 and 322 Microprocessor Systems Class and Lab

Registers
• Registers R0 through R12 are general purpose registers
• R13 is used as the stack pointer (sp)
• R14 is used as the link register (lr)
• R15 is used as the program counter (pc)
• CPSR – Current Program Status Register
• SPSR – Saved Program Status Register
  – A copy of CPSR for the previous mode
  – When an exception occurs, ARM copies the CPSR of the current mode to the related SPSR
  – All privileged modes but System mode have individual SPSRs

Page 24: CEC 320 and 322 Microprocessor Systems Class and Lab

Cortex-M Programmers’ Model
• R15-R0
  – Sixteen “general-purpose” registers
• Special functions
  – R15 is the Program Counter (PC)
     – If R15 is the destination operand, some instructions will exhibit special behavior for mode changes
  – R14 is the Link Register (LR)
     – For subroutine calls and interrupts/exceptions, the return address is stored in LR. It must be saved before calls are made in the subroutine.
  – R13 is used as the Stack Pointer (SP)

Page 25: CEC 320 and 322 Microprocessor Systems Class and Lab

Programmers’ Model (cont)

  bits 31 30 29 28 | 27 ... 8  | 7 6 5 | 4 ... 0
       N  Z  C  V  | reserved  | I F T | mode

• Current Program Status Register (CPSR)
  – Condition code flags (N, Z, C, V)
  – Interrupt disable bits (I, F)
  – Thumb mode enable (T)
     – Never change directly!
  – Operating mode select
  – Reserved bits
     – Do not alter the state of these bits for compatibility with future ARM products
  – I, F and mode cannot be changed in user mode!
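The CPSR layout can be made concrete with a small decoder. This is a sketch for the classic ARM CPSR only (flags in bits 31-28, I/F/T in bits 7-5, mode in bits 4-0); the type and function names are made up for illustration, and the mode names are the ones listed on the mode slide of this lecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions of the classic ARM CPSR. */
#define CPSR_N    (1u << 31)
#define CPSR_Z    (1u << 30)
#define CPSR_C    (1u << 29)
#define CPSR_V    (1u << 28)
#define CPSR_I    (1u << 7)
#define CPSR_F    (1u << 6)
#define CPSR_T    (1u << 5)
#define CPSR_MODE 0x1Fu

/* Hypothetical container for the decoded fields. */
typedef struct {
    bool n, z, c, v;                         /* condition code flags */
    bool irq_disabled, fiq_disabled, thumb;  /* I, F, T bits         */
    uint8_t mode;                            /* 5-bit mode field     */
} cpsr_fields;

cpsr_fields cpsr_decode(uint32_t cpsr)
{
    cpsr_fields f;
    f.n = (cpsr & CPSR_N) != 0;
    f.z = (cpsr & CPSR_Z) != 0;
    f.c = (cpsr & CPSR_C) != 0;
    f.v = (cpsr & CPSR_V) != 0;
    f.irq_disabled = (cpsr & CPSR_I) != 0;
    f.fiq_disabled = (cpsr & CPSR_F) != 0;
    f.thumb = (cpsr & CPSR_T) != 0;
    f.mode = (uint8_t)(cpsr & CPSR_MODE);
    return f;
}

const char *cpsr_mode_name(uint8_t mode)
{
    assert(mode <= 0x1F);
    switch (mode) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Reserved";
    }
}
```

For example, the value 0x600000D3 decodes as Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.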

Page 26: CEC 320 and 322 Microprocessor Systems Class and Lab

Program Status Register
• Program status register (PSR)
  – CPSR (Current PSR) is used to control and store CPU states
  – CPSR is divided into four 8-bit fields
     – Flags
     – Status
     – Extension
     – Control

Page 27: CEC 320 and 322 Microprocessor Systems Class and Lab

Program Status Register
• Program status register flags
  – N:1 – Negative result
  – Z:1 – Result is zero
  – C:1 – Carry in addition operation
  – C:0 – Borrow in subtraction operation
  – V:1 – Overflow or underflow
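How the four flags come out of a 32-bit add or subtract can be modeled in C. The sketch below is an illustration rather than a description of the hardware; the names nzcv, flags_add and flags_sub are hypothetical. Note the subtraction convention from the slide: C = 1 means no borrow, C = 0 means a borrow occurred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv;

/* Flags as an ADDS-style operation would set them: result = a + b. */
nzcv flags_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    nzcv f;
    f.n = (r >> 31) != 0;                       /* sign bit of the result   */
    f.z = (r == 0);
    f.c = (r < a);                              /* unsigned carry out       */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;    /* same-sign inputs,
                                                   different-sign result    */
    return f;
}

/* Flags as a SUBS/CMP-style operation would set them: result = a - b. */
nzcv flags_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    nzcv f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                             /* C = 1: no borrow         */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;     /* different-sign inputs,
                                                   result sign differs from a */
    return f;
}
```

Adding 1 to 0x7FFFFFFF sets N and V (signed overflow) but not C, while adding 1 to 0xFFFFFFFF sets Z and C but not V.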

Page 28: CEC 320 and 322 Microprocessor Systems Class and Lab

Program Status Register
• Program status register controls
  – I:1 – IRQ interrupts disabled
  – F:1 – FIQ interrupts disabled
  – T:0 – ARM Mode
  – T:1 – Thumb Mode

Page 29: CEC 320 and 322 Microprocessor Systems Class and Lab

Program Status Register
• Program status register control modes
  – 0b10000 – User mode
  – 0b10001 – FIQ mode
  – 0b10010 – IRQ mode
  – 0b10011 – Supervisor mode
  – 0b10111 – Abort mode
  – 0b11011 – Undefined mode
  – 0b11111 – System mode

Page 30: CEC 320 and 322 Microprocessor Systems Class and Lab

Programmers’ Model (cont)
• Saved Program Status Register (SPSR)
  – SPSR is only present when the CPU is operating in one of the exception modes
  – Each exception mode has its own SPSR, since exception handlers may cause other exceptions.
  – SPSR is a copy of the CPSR immediately before the exception mode was entered.
  – When returning from the exception, the value in SPSR is used to restore the CPSR to the proper state for the process that was interrupted.

Page 31: CEC 320 and 322 Microprocessor Systems Class and Lab

Operating Modes
• User
  – Normal program execution mode
• System
  – For running privileged operating system tasks using the User mode registers
• Supervisor
  – Protected mode for operating system
• Abort
  – Used to implement process and/or memory protection
  – Two classes of aborts – data abort, prefetch abort
• Undefined
  – Supports software emulation of unsupported instructions and unimplemented hardware coprocessors
• FIQ
  – Fast interrupt handling
• IRQ
  – General purpose interrupt handling

Page 32: CEC 320 and 322 Microprocessor Systems Class and Lab

Banked Registers
• Cortex-R & Cortex-A specific
• Of the total 37 registers, only 18 are active in a given register mode

Page 33: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Banked Register Viewing
• ARM Architecture Fundamentals
  – https://youtu.be/7LqPJGnBPMM
  – Overall – tiring, but some good content
  – 27:00 – 28:20 <- good discussion of Cortex-M registers
  – 30:15 – 30:45 <- good discussion of Cortex-M CPSR
  – Only these intervals valid for quizzes/exams
  – At least one error identified @ 32:40ish

Page 34: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 35: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Instruction Set
• ARM and Thumb-2 Quick Reference
  – http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/index.html
  – Lots of good information on infocenter
• Writing ARM Assembly Language
  – http://www.keil.com/support/man/docs/armasm/armasm_babcfejg.htm
  – http://infocenter.arm.com/help/topic/com.arm.doc.dui0473f/Babcfejg.html
  – More information on assembler application

Page 36: CEC 320 and 322 Microprocessor Systems Class and Lab

UAL – Unified Assembler Language
• Unified Assembler Language (UAL)
  – Common syntax for ARM and Thumb instructions
  – Supersedes earlier versions of both the ARM and Thumb assembler languages.
  – Code written using UAL can be assembled for ARM or Thumb for any ARM processor.
  – By default, the assembler expects source code to be written in UAL.
  – For many assembler instructions this means there are multiple machine language representations
     – One in 32 bits (ARM)
     – Another in 16 bits (Thumb)

Page 37: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM assembly language
• Not all of the instructions
• Most common
  – Instructions have both ARM and Thumb versions
• All of these take three arguments: the first two must be registers, and the last can be a register, a shifted register, or an immediate

Page 38: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM assembly language
• Comparison instructions create status register flags (a.k.a. condition codes)
  – Instructions can make decisions based upon these values
• Note pseudo-operations, which are not directly translated but have an alternate expression
• Many more exist – independent study

Page 39: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM data instructions
• Fairly standard assembly language:
        LDR r0,[r8] ; a comment
label   ADD r4,r0,r1
• Basic format:
        ADD r0,r1,r2
  – Computes r1+r2, stores in r0.
• Immediate operand:
        ADD r0,r1,#2
  – Computes r1+2, stores in r0.

Page 40: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM data instructions
• ADD, ADC : add (w. carry)
• SUB, SBC : subtract (w. carry)
• RSB, RSC : reverse subtract (w. carry)
• MUL, MLA : multiply (and accumulate)
• AND, ORR, EOR
• BIC : bit clear
• LSL, LSR : logical shift left/right
• ASL, ASR : arithmetic shift left/right
• ROR : rotate right
• RRX : rotate right extended with C

Page 41: CEC 320 and 322 Microprocessor Systems Class and Lab

Data operation varieties
• Logical shift: fills with zeroes.
• Arithmetic shift: a right shift fills with copies of the sign bit (ones only when the value is negative).
• RRX performs 33-bit rotate, including C bit from CPSR above sign bit.
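The difference between the shift varieties is easiest to see on concrete bit patterns. The following C model is a sketch with hypothetical helper names; shift amounts are restricted to 0-31 to stay inside defined C behavior, which is narrower than what the ARM barrel shifter accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical shifts: vacated bits are filled with zeroes. */
uint32_t lsl(uint32_t x, unsigned n) { assert(n < 32); return x << n; }
uint32_t lsr(uint32_t x, unsigned n) { assert(n < 32); return x >> n; }

/* Arithmetic shift right: vacated bits are filled with the sign bit.
 * Written on unsigned values so it does not depend on how the compiler
 * implements >> for negative signed integers. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) != 0 && n > 0) {
        shifted |= ~((uint32_t)0xFFFFFFFFu >> n);   /* set the top n bits */
    }
    return shifted;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t ror(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (n == 0) ? x : ((x >> n) | (x << (32 - n)));
}

/* RRX: one-bit rotate through the carry flag (a 33-bit rotate).
 * The old carry enters at bit 31; the old bit 0 becomes the new carry. */
uint32_t rrx(uint32_t x, bool carry_in, bool *carry_out)
{
    assert(carry_out != 0);
    *carry_out = (x & 1u) != 0;
    return (x >> 1) | ((uint32_t)carry_in << 31);
}
```

Shifting 0x80000000 right by 4 gives 0x08000000 with LSR but 0xF8000000 with ASR, which is why ASR preserves the sign of a two's-complement value.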

Page 42: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM comparison instructions
• CMP : compare
• CMN : negated compare
• TST : bit-wise test (AND)
• TEQ : bit-wise test for equivalence (EOR)
• These instructions set only the NZCV bits of CPSR.

Page 43: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM move instructions
• MOV, MVN : move (negated)
        MOV r0, r1 ; sets r0 to r1

Page 44: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM load/store instructions
• LDR, LDRH, LDRB : load (half-word, byte)
• STR, STRH, STRB : store (half-word, byte)
• Addressing modes:
  – register indirect : LDR r0,[r1]
  – with second register : LDR r0,[r1,-r2]
  – with constant : LDR r0,[r1,#4]

Page 45: CEC 320 and 322 Microprocessor Systems Class and Lab

Additional addressing modes
• Base-plus-offset addressing:
        LDR r0,[r1,#16]
  – Loads from location r1+16; r1 is unchanged.
• Auto-indexing increments base register:
        LDR r0,[r1,#16]!
  – Adds 16 to r1, then loads r0 from the updated r1.
• Post-indexing fetches, then does offset:
        LDR r0,[r1],#16
  – Loads r0 from r1, then adds 16 to r1.
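The three forms differ only in which address is used for the load and whether the base register is written back. A C model makes that explicit. This is a sketch: memory is modeled as an array of words indexed by byte offset, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Memory model: an array of 32-bit words; "addresses" are byte offsets. */
static uint32_t load_word(const uint32_t *mem, uint32_t addr)
{
    assert(addr % 4 == 0);          /* word accesses are word-aligned here */
    return mem[addr / 4];
}

/* LDR rd,[rn,#off] : load from rn+off, rn unchanged. */
uint32_t ldr_offset(const uint32_t *mem, uint32_t rn, uint32_t off)
{
    return load_word(mem, rn + off);
}

/* LDR rd,[rn,#off]! : rn = rn+off first, then load from the new rn. */
uint32_t ldr_pre_indexed(const uint32_t *mem, uint32_t *rn, uint32_t off)
{
    *rn += off;
    return load_word(mem, *rn);
}

/* LDR rd,[rn],#off : load from rn, then rn = rn+off. */
uint32_t ldr_post_indexed(const uint32_t *mem, uint32_t *rn, uint32_t off)
{
    uint32_t value = load_word(mem, *rn);
    *rn += off;
    return value;
}
```

Post-indexing is the natural fit for walking an array: each load picks up the current element and leaves the base register pointing at the next one.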

Page 46: CEC 320 and 322 Microprocessor Systems Class and Lab


From Essentials of Computer Architecture by Douglas E. Comer. ISBN 0131491792. © 2005 Pearson Education, Inc. All rights reserved.

Page 47: CEC 320 and 322 Microprocessor Systems Class and Lab

Addressing Modes
• Illustrates why many (x86 has 19) addressing modes are not necessary
• ARM has very flexible addressing modes, designed for limited hardware costs

Page 48: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM ADR pseudo-op
• Cannot refer to an address directly in an instruction.
• Generate value by performing arithmetic on PC.
• ADR pseudo-op generates instruction required to calculate address:
        ADR r1,FOO

Page 49: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: C assignments
• C:

x = (a + b) - c;

• Assembler:

ADR r4,a ; get address for a

LDR r0,[r4] ; get value of a

ADR r4,b ; get address for b, reusing r4

LDR r1,[r4] ; get value of b

ADD r3,r0,r1 ; compute a+b

ADR r4,c ; get address for c

LDR r2,[r4] ; get value of c

SUB r3,r3,r2 ; complete computation of x

ADR r4,x ; get address for x

STR r3,[r4] ; store value of x

Page 50: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: if statement
• C:

if (a > b)
{
    x = 5;
    y = c + d;
}
else
    x = c - d;

Page 51: CEC 320 and 322 Microprocessor Systems Class and Lab

If statement, cont’d.
• Assembler:

; compute and test condition

ADR r4,a ; get address for a

LDR r0,[r4] ; get value of a

ADR r4,b ; get address for b

LDR r1,[r4] ; get value for b

CMP r0,r1 ; calculate NZCV for (a-b)

BLE fblock ; if (a-b)<=0, branch to false block

; true block

MOV r0,#5 ; generate value for x

ADR r4, x ; get address for x

STR r0,[r4] ; store x

ADR r4,c ; get address for c

LDR r0,[r4] ; get value of c

ADR r4,d ; get address for d

LDR r1,[r4] ; get value of d

: :

Page 52: CEC 320 and 322 Microprocessor Systems Class and Lab

If statement, cont’d.

ADD r0,r0,r1 ; compute y

ADR r4,y ; get address for y

STR r0,[r4] ; store y

B after ; branch around false block

; false block

fblock ADR r4,c ; get address for c

LDR r0,[r4] ; get value of c

ADR r4,d ; get address for d

LDR r1,[r4] ; get value for d

SUB r0,r0,r1 ; compute c-d

ADR r4,x ; get address for x

STR r0,[r4] ; store value of x

after ...

Page 53: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 54: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Instruction Encoding

Bit positions: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
(each row below lists its fields from bit 31 on the left to bit 0 on the right)

Multiply (accumulate) cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm

Multiply (accumulate) long cond 0 0 0 0 1 U A S Rd_MSW Rd_LSW Rn 1 0 0 1 Rm

Branch and exchange cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn

Single data swap cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm

Halfword data transfer, register offset cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 0 1 1 Rm

Halfword data transfer, immediate offset cond 0 0 0 P U 1 W L Rn Rd offset 1 0 1 1 offset

Signed data transfer (byte/halfword) cond 0 0 0 P U B W L Rn Rd addr_mode 1 1 H 1 addr_mode

Data processing and PSR transfer cond 0 0 I opcode S Rn Rd operand2

Load/store register/unsigned byte cond 0 1 I P U B W L Rn Rd addr_mode

Undefined cond 0 1 1 1

Block data transfer cond 1 0 0 P U 0 W L Rn register list

Branch cond 1 0 1 L offset

Coprocessor data transfer cond 1 1 0 P U N W L Rn CRd CP# offset

Coprocessor data operation cond 1 1 1 0 CP opcode CRn CRd CP# CP 0 CRm

Coprocessor register transfer cond 1 1 1 0 CP opc L CRn Rd CP# CP 1 CRm

Software interrupt cond 1 1 1 1 ignored by processor


Page 55: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Instruction Encoding

Page 56: CEC 320 and 322 Microprocessor Systems Class and Lab

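Data processing instruction encodings
• Many ALU operations utilize this format
• Has multiple options for 2nd operand
  – Register: shifted or not
  – Immediate: rotated or not

Instruction word:
  bits 31-28   cond
  bits 27-26   0 0
  bit  25      #       (1 = immediate operand 2, 0 = register operand 2)
  bits 24-21   opcode  (arithmetic/logic function)
  bit  20      S       (set condition codes)
  bits 19-16   Rn      (first operand register)
  bits 15-12   Rd      (destination register)
  bits 11-0    operand 2

Operand 2 when bit 25 = 1 (immediate):
  bits 11-8    #rot    (immediate alignment)
  bits 7-0     8-bit immediate

Operand 2 when bit 25 = 0, immediate shift length:
  bits 11-7    #shift  (immediate shift length)
  bits 6-5     Sh      (shift type)
  bit  4       0
  bits 3-0     Rm      (second operand register)

Operand 2 when bit 25 = 0, register shift length:
  bits 11-8    Rs      (register shift length)
  bit  7       0
  bits 6-5     Sh      (shift type)
  bit  4       1
  bits 3-0     Rm      (second operand register)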

Page 57: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM data processing opcodes

Opcode[24:21]  Mnemonic  Meaning                         Effect
0000           AND       Logical bit-wise AND            Rd := Rn AND Op2
0001           EOR       Logical bit-wise exclusive OR   Rd := Rn EOR Op2
0010           SUB       Subtract                        Rd := Rn - Op2
0011           RSB       Reverse subtract                Rd := Op2 - Rn
0100           ADD       Add                             Rd := Rn + Op2
0101           ADC       Add with carry                  Rd := Rn + Op2 + C
0110           SBC       Subtract with carry             Rd := Rn - Op2 + C - 1
0111           RSC       Reverse subtract with carry     Rd := Op2 - Rn + C - 1
1000           TST       Test                            Scc on Rn AND Op2
1001           TEQ       Test equivalence                Scc on Rn EOR Op2
1010           CMP       Compare                         Scc on Rn - Op2
1011           CMN       Compare negated                 Scc on Rn + Op2
1100           ORR       Logical bit-wise OR             Rd := Rn OR Op2
1101           MOV       Move                            Rd := Op2
1110           BIC       Bit clear                       Rd := Rn AND NOT Op2
1111           MVN       Move negated                    Rd := NOT Op2
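Putting the field layout and the opcode table together, a 32-bit ARM data processing instruction can be assembled by hand. The sketch below covers only the unshifted-register and rotated-immediate forms of operand 2; the function and constant names are hypothetical and not taken from any toolchain.

```c
#include <assert.h>
#include <stdint.h>

#define COND_AL 0xEu                 /* "always" condition, cond = 1110 */

enum dp_opcode {                     /* opcode field, bits 24-21 */
    OP_AND = 0x0, OP_EOR = 0x1, OP_SUB = 0x2, OP_RSB = 0x3,
    OP_ADD = 0x4, OP_ADC = 0x5, OP_SBC = 0x6, OP_RSC = 0x7,
    OP_TST = 0x8, OP_TEQ = 0x9, OP_CMP = 0xA, OP_CMN = 0xB,
    OP_ORR = 0xC, OP_MOV = 0xD, OP_BIC = 0xE, OP_MVN = 0xF
};

/* Register form, no shift: operand 2 is just Rm (bit 25 = 0). */
uint32_t encode_dp_reg(unsigned cond, unsigned opcode, unsigned s,
                       unsigned rn, unsigned rd, unsigned rm)
{
    assert(cond < 16 && opcode < 16 && s < 2);
    assert(rn < 16 && rd < 16 && rm < 16);
    return ((uint32_t)cond << 28) | ((uint32_t)opcode << 21) |
           ((uint32_t)s << 20) | ((uint32_t)rn << 16) |
           ((uint32_t)rd << 12) | (uint32_t)rm;
}

/* Immediate form: bit 25 = 1, operand 2 = 4-bit rotate + 8-bit value. */
uint32_t encode_dp_imm(unsigned cond, unsigned opcode, unsigned s,
                       unsigned rn, unsigned rd, unsigned rot, unsigned imm8)
{
    assert(cond < 16 && opcode < 16 && s < 2);
    assert(rn < 16 && rd < 16 && rot < 16 && imm8 < 256);
    return ((uint32_t)cond << 28) | (1u << 25) | ((uint32_t)opcode << 21) |
           ((uint32_t)s << 20) | ((uint32_t)rn << 16) |
           ((uint32_t)rd << 12) | ((uint32_t)rot << 8) | (uint32_t)imm8;
}

/* Value the processor sees: imm8 rotated right by twice the rotate field. */
uint32_t dp_imm_value(unsigned rot, unsigned imm8)
{
    assert(rot < 16 && imm8 < 256);
    unsigned n = 2 * rot;
    uint32_t v = (uint32_t)imm8;
    return (n == 0) ? v : ((v >> n) | (v << (32 - n)));
}
```

Under these definitions ADD r0,r1,r2 encodes as 0xE0810002 and ADD r0,r1,#2 as 0xE2810002; the only difference is bit 25 and the meaning of the low 12 bits.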

Page 58: CEC 320 and 322 Microprocessor Systems Class and Lab

Conditional Execution
• Most instruction sets only allow branches to be executed conditionally.
• However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
  – All instructions contain a condition field which determines whether the CPU will execute them.
  – Non-executed instructions soak up 1 cycle.
     – Still have to complete cycle so as to allow fetching and decoding of following instructions.
• This removes the need for many branches, which stall the pipeline (3 cycles to refill).
  – Allows very dense in-line code, without branches.
  – The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

Page 59: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Condition Codes

Columns: Opcode[31:28], Mnemonic extension, Meaning, Condition flag state

0000 EQ Equal Z==1

0001 NE Not equal Z==0

0010 CS/HS Carry set / unsigned higher or same C==1

0011 CC/LO Carry clear / unsigned lower C==0

0100 MI Minus / negative N==1

0101 PL Plus / positive or zero N==0

0110 VS Overflow V==1

0111 VC No overflow V==0

1000 HI Unsigned higher (C==1) AND (Z==0)

1001 LS Unsigned lower or same (C==0) OR (Z==1)

1010 GE Signed greater than or equal N == V

1011 LT Signed less than N != V

1100 GT Signed greater than (Z==0) AND (N==V)

1101 LE Signed less than or equal (Z==1) OR (N!=V)

1110 AL Always (unconditional) Not applicable

1111 (NV) Never Obsolete, ARM7TDMI unpredictable
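The table maps directly onto a small predicate: given the 4-bit condition field and the current N, Z, C, V flags, decide whether the instruction executes. The function name condition_passed is made up for this sketch, and the reserved code 1111 is simply treated as "never" here.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when an instruction with this condition field would
 * execute, given the current flag values. */
bool condition_passed(unsigned cond, bool n, bool z, bool c, bool v)
{
    assert(cond < 16);
    switch (cond) {
    case 0x0: return z;                 /* EQ */
    case 0x1: return !z;                /* NE */
    case 0x2: return c;                 /* CS/HS */
    case 0x3: return !c;                /* CC/LO */
    case 0x4: return n;                 /* MI */
    case 0x5: return !n;                /* PL */
    case 0x6: return v;                 /* VS */
    case 0x7: return !v;                /* VC */
    case 0x8: return c && !z;           /* HI */
    case 0x9: return !c || z;           /* LS */
    case 0xA: return n == v;            /* GE */
    case 0xB: return n != v;            /* LT */
    case 0xC: return !z && (n == v);    /* GT */
    case 0xD: return z || (n != v);     /* LE */
    case 0xE: return true;              /* AL */
    default:  return false;             /* 0xF (NV): modeled as never */
    }
}
```

After comparing 3 with 5 (N=1, Z=0, C=0, V=0), both the signed LT and the unsigned LO conditions pass, while GE and HI do not.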


Page 60: CEC 320 and 322 Microprocessor Systems Class and Lab

Thumb Instruction Set

Page 61: CEC 320 and 322 Microprocessor Systems Class and Lab

Instruction Set Advantages
• ARM
  – All instructions are 32 bits long.
  – Most instructions are executed in one single cycle.
  – Every instruction can be conditionally executed.
  – A load/store architecture
     – Data processing instructions act only on registers
     – Three operand format
     – Combined ALU and shifter for high speed bit manipulation
  – Specific memory access instructions with powerful auto-indexing addressing modes
     – 32 bit, 16 bit and 8 bit data types
     – Flexible multiple register load and store instructions
• Thumb
  – All instructions are exactly 16 bits long to improve code density over other 32-bit architectures
  – The Thumb architecture still uses a 32-bit core, with:
     – 32-bit address space
     – 32-bit registers
     – 32-bit shifter and ALU
     – 32-bit memory transfer
  – Gives....
     – Long branch range
     – Powerful arithmetic operations
     – Large address space

Presentation Notes: The ARM7TDMI is a member of the Advanced RISC Machines (ARM) family of general purpose 32-bit microprocessors, which offer high performance for very low power consumption and price.

Page 62: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM vs. Thumb size
• Generally, routines in Thumb code are between 65 and 70% the size of the equivalent ARM code.

[Figure: scale of Thumb code size as % of ARM code size, marked at 60%, 65%, 70%, 75%]

Page 63: CEC 320 and 322 Microprocessor Systems Class and Lab

Code performance vs. memory width

This figure shows performance in Dhrystone 2.1 MIPS of an ARM7TDMI with 8, 16 and 32-bit wide memory systems. From 32-bit wide memory, ARM code is executed at one instruction per cycle. In narrower memory systems more cycles are needed per ARM instruction: 2 cycles from 16-bit memory and 4 cycles from 8-bit memory. The Thumb version, however, can still execute at one instruction per cycle from 16-bit memory, or 2 cycles from 8-bit memory. It therefore has better performance with narrow memory.

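The cycle counts quoted for the different memory widths follow from one piece of arithmetic: how many bus-width accesses it takes to fetch one instruction. The sketch below is a deliberate simplification (it assumes one memory access per cycle and ignores everything except instruction fetch); the function name is hypothetical.

```c
#include <assert.h>

/* Number of memory accesses needed to fetch one instruction:
 * the instruction width divided by the bus width, rounded up. */
unsigned fetch_cycles(unsigned instr_bits, unsigned bus_bits)
{
    assert(bus_bits > 0);
    return (instr_bits + bus_bits - 1) / bus_bits;
}
```

For 32-bit ARM instructions this gives 1, 2 and 4 accesses on 32-, 16- and 8-bit memory; for 16-bit Thumb instructions it gives 1, 1 and 2, matching the figures in the slide text.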
Page 64: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 65: CEC 320 and 322 Microprocessor Systems Class and Lab

Address Space
• The standard ARM C program address space model
  – Others COULD be used – not advised
• The address space is 32 bits

Page 66: CEC 320 and 322 Microprocessor Systems Class and Lab

hw_memmap.h
• Defines the base addresses of all peripherals in the TM4C system
• Each of these peripheral circuits is communicated with through read/write operations to an I/O interface mapped into memory
• Allows use of conventional load/store instructions to configure and/or communicate with peripherals
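Memory-mapped I/O means a peripheral register is accessed like any other memory location, through a pointer to a fixed address. The sketch below uses a volatile-pointer macro of the same shape as the HWREG macro commonly used with TivaWare, but everything else is an assumption made for illustration: the "peripheral" is a small array in RAM so the code can run on a PC, and the FAKE_* names and register offsets are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Treat an address as a 32-bit hardware register. volatile tells the
 * compiler that every read and write must really happen. */
#define REG32(addr) (*((volatile uint32_t *)(addr)))

/* Hypothetical peripheral, simulated in RAM. On a real TM4C part the base
 * address would be a constant taken from hw_memmap.h instead. */
static uint32_t fake_peripheral[4];
#define FAKE_BASE   ((uintptr_t)fake_peripheral)
#define FAKE_O_DATA 0x0u            /* invented data register offset    */
#define FAKE_O_CTRL 0x4u            /* invented control register offset */

void fake_enable(void)
{
    REG32(FAKE_BASE + FAKE_O_CTRL) |= 1u;     /* read-modify-write: set bit 0 */
}

void fake_write(uint32_t value)
{
    REG32(FAKE_BASE + FAKE_O_DATA) = value;   /* compiles to a store (STR) */
}

uint32_t fake_read(void)
{
    return REG32(FAKE_BASE + FAKE_O_DATA);    /* compiles to a load (LDR) */
}
```

The base-plus-offset arithmetic mirrors how driver code combines a peripheral base address with a register offset before each access.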

Page 67: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Reserved Addresses

0x00000000  Reset
0x00000004  Undefined instruction exception
0x00000008  Software interrupt
0x0000000C  Prefetch abort exception
0x00000010  Data abort exception
0x00000014  Reserved
0x00000018  Interrupt request (IRQ)
0x0000001C  Fast interrupt request (FIQ)

Page 68: CEC 320 and 322 Microprocessor Systems Class and Lab

Endianness
• Relationship between byte within word ordering defines endianness:
  – Little : Intel
  – Big : Motorola
  – Bi-Endian : ARM
• Only significant for multi-byte accesses

[Figure: byte layout within a word. Little-endian: byte 3, byte 2, byte 1, byte 0 from bit 31 to bit 0. Big-endian: byte 0, byte 1, byte 2, byte 3 from bit 0 to bit 31.]

• Value being written 0x012345678
  – MSB : 0x01
  – LSB : 0x78
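Byte order only shows up when a multi-byte value is viewed one byte at a time. The sketch below stores a 32-bit word into a byte array in each order. The helper names are hypothetical, and the example value 0x12345678 is chosen here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Little-endian: least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Big-endian: most significant byte at the lowest address. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)((v >> 24) & 0xFFu);
    out[1] = (uint8_t)((v >> 16) & 0xFFu);
    out[2] = (uint8_t)((v >> 8) & 0xFFu);
    out[3] = (uint8_t)(v & 0xFFu);
}

/* Checks the byte order of the machine running this code by looking at
 * the first byte of a known word. */
bool host_is_little_endian(void)
{
    uint32_t probe = 1u;
    return *(const uint8_t *)&probe == 1u;
}
```

For 0x12345678 the little-endian byte sequence is 78 56 34 12 and the big-endian sequence is 12 34 56 78; a single-byte access sees no difference at all.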

Page 69: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 70: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Procedure Call Standard
• Support for high-level languages
• In some areas it is important to adopt software-defined ‘standard’ solutions
  – the ARM Procedure Call Standard (APCS) is an example
  – it provides a regular way for procedures to operate

Page 71: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM Procedure Call Standard
• The APCS defines:
  – particular uses for the ‘general-purpose’ registers
  – the form of stack to be used
  – a stack-based data structure for backtracing
  – an argument and result passing mechanism
  – support for shared (re-entrant) libraries

Page 72: CEC 320 and 322 Microprocessor Systems Class and Lab

APCS Register Use Convention

Page 73: CEC 320 and 322 Microprocessor Systems Class and Lab

APCS Argument and Result Passing
• The arguments are arranged into a list of words
  – the first 4 arguments are passed in a1 - a4
  – the remaining arguments are passed via the stack
• A simple result is returned via a1
  – more complex results are passed via memory, using a1 as the pointer
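The argument rule can be written as a small lookup: given the position of a word-sized argument, report whether it travels in a register or on the stack. This is a sketch under stated assumptions: the type and function names are hypothetical, a1-a4 are numbered 0-3, and the stack offsets assume stacked words are laid out in order, 4 bytes apart, starting at the stack pointer on entry (a detail the slide does not state).

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool in_register;        /* true: passed in a1-a4                     */
    unsigned reg;            /* 0..3 for a1..a4 when in_register          */
    unsigned stack_offset;   /* assumed byte offset from sp on entry,
                                meaningful only when !in_register         */
} arg_location;

/* Where the word-sized argument at position index (0-based) is passed. */
arg_location apcs_word_arg_location(unsigned index)
{
    arg_location loc = { false, 0, 0 };
    if (index < 4) {
        loc.in_register = true;
        loc.reg = index;
    } else {
        loc.stack_offset = (index - 4) * 4;
    }
    return loc;
}
```

One practical consequence: functions with four or fewer word-sized arguments avoid stack traffic for argument passing entirely.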

Page 74: CEC 320 and 322 Microprocessor Systems Class and Lab

Runtime Stack in procedure calls
• Each unit is a stack frame
• Frame specifics differ between architectures & even compilers

Page 75: CEC 320 and 322 Microprocessor Systems Class and Lab

Alternate Procedure Stack
• Not optimal – Would use consistent colors for:
  – Saved State
  – Arguments
  – Local Variables
  – Dynamic Stack Usage
• Call Depth of (4)
• Each of (N) is a unique procedure

Page 76: CEC 320 and 322 Microprocessor Systems Class and Lab

A Typical Frame Organization
• Shows:
  – Fp – frame pointer
     – Constant for procedure lifetime
  – Sp – stack pointer
     – May change as local variables are added or removed
  – Activation record
     – The contiguous block of memory on the stack corresponding to a procedure

Page 77: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM subroutine linkage
• Branch and link instruction:
        BL foo
  – Copies the return address (the address of the instruction after the BL) to r14.
• To return from subroutine:
        MOV r15, r14

Page 78: CEC 320 and 322 Microprocessor Systems Class and Lab

Nested subroutine calls
• Nesting/recursion requires coding convention:

f1      LDR r0,[r13]        ; load arg into r0 from stack
        ; call f2()
        STR r14,[r13,#4]!   ; push f1’s return adrs (stack assumed to grow upward)
        STR r0,[r13,#4]!    ; push arg to f2 on stack
        BL  f2              ; branch and link to f2
        ; return from f1()
        SUB r13,r13,#4      ; pop f2’s arg off stack
        LDR r15,[r13],#-4   ; pop return adrs into pc and return

Page 79: CEC 320 and 322 Microprocessor Systems Class and Lab


Page 80: CEC 320 and 322 Microprocessor Systems Class and Lab

Cortex-A57
• ARM’s 64-bit (ARM v8) IP core
• 15-24 stage pipeline
• 3 wide execution
• http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/5

Page 81: CEC 320 and 322 Microprocessor Systems Class and Lab

big.LITTLE
• Heterogeneous Multi-Core
• Combine high-performance and high-efficiency cores on a single die and choose core based on task characteristics
• Up to 70% energy reduction on common workloads
• Transparent to software / Apps; Managed by O/S

https://www.mobilegeeks.de/samsung-plant-neues-chromebook-mit-arm-big-little-prozessor-octacore/

Page 82: CEC 320 and 322 Microprocessor Systems Class and Lab

8282

Power / Performance Relationship
• Relationship is a function of microarchitecture & fabrication technology
• At high performance levels, sub-linear performance improvement when power is increased

http://cdn2.ubergizmo.com/wp-content/uploads/2013/01/ARM-Big-LITTLE-03.jpg

Page 83: CEC 320 and 322 Microprocessor Systems Class and Lab

big.LITTLE Viewing
• ARM big.LITTLE Technology Explained
  https://youtu.be/KClygZtp8mA
  Starts & ends as marketing
  Technical in middle of the video
  Significant OS discussion beyond microprocessors
• ARM DynamIQ Redefines Multi-Core Computing
  https://youtu.be/qPGTP_ZxDyY
  Good heterogeneous animation
  Technical, but high level

Page 84: CEC 320 and 322 Microprocessor Systems Class and Lab

Chap2B. ARM Processor (Cortex-M)
• Summary
  Assembly language is UAL; atypical in that it has multiple encodings (16/32-bit)
  Very flexible instructions
  Using RISC lessons
    Load/store
    General purpose registers
  With complex functionality
    Rotate, shift, mask on one operand in many instructions
    Conditional execution
    Banked registers
  Handles branches and conditional execution efficiently
  Good for general purpose program encoding
  AMAZING number of hardware microarchitectures available
  Common family allows for code compatibility & ease of development efforts

Page 85: CEC 320 and 322 Microprocessor Systems Class and Lab

Summary (Wolf)
• Load/store architecture
• Most instructions are RISCy, operate in single cycle. Some multi-register operations take longer.
• All instructions can be executed conditionally.

Page 86: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: C assignment
• C:

y = a*(b+c);

• Assembler:

ADR r4,b ; get address for b

LDR r0,[r4] ; get value of b

ADR r4,c ; get address for c

LDR r1,[r4] ; get value of c

ADD r2,r0,r1 ; compute partial result

ADR r4,a ; get address for a

LDR r0,[r4] ; get value of a

Page 87: CEC 320 and 322 Microprocessor Systems Class and Lab

C assignment, cont’d.

MUL r2,r2,r0 ; compute final value for y

ADR r4,y ; get address for y

STR r2,[r4] ; store y
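
The load/store pattern in this listing can be mirrored in C, one statement per instruction. This is a sketch, not compiler output: the function name `assign_y` is hypothetical, the addresses that ADR produces are modeled as pointer parameters, and the local names follow the registers in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the listing for y = a*(b+c): every operand is loaded into a
   register before the ALU touches it, and the result is stored back. */
static void assign_y(const int32_t *addr_a, const int32_t *addr_b,
                     const int32_t *addr_c, int32_t *addr_y)
{
    const int32_t *r4;
    int32_t r0, r1, r2;

    assert(addr_a && addr_b && addr_c && addr_y);

    r4 = addr_b;   /* ADR r4,b     */
    r0 = *r4;      /* LDR r0,[r4]  */
    r4 = addr_c;   /* ADR r4,c     */
    r1 = *r4;      /* LDR r1,[r4]  */
    r2 = r0 + r1;  /* ADD r2,r0,r1 */
    r4 = addr_a;   /* ADR r4,a     */
    r0 = *r4;      /* LDR r0,[r4]  */
    r2 = r2 * r0;  /* MUL r2,r2,r0 */
    *addr_y = r2;  /* ADR r4,y ; STR r2,[r4] */
}
```

The arithmetic takes two instructions; the other eight only move addresses and data, which is typical of a load/store architecture.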

Page 88: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: C assignment
• C:

z = (a << 2) | (b & 15);

• Assembler:

ADR r4,a ; get address for a

LDR r0,[r4] ; get value of a

MOV r0,r0,LSL #2 ; perform shift

ADR r4,b ; get address for b

LDR r1,[r4] ; get value of b

AND r1,r1,#15 ; perform AND

ORR r1,r0,r1 ; perform OR

Page 89: CEC 320 and 322 Microprocessor Systems Class and Lab

C assignment, cont’d.

ADR r4,z ; get address for z

STR r1,[r4] ; store value for z
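
A register-level C mirror of the shift-and-mask listing makes it easy to check by hand. This is a sketch: `assign_z` is a hypothetical name, the variables are assumed to be unsigned 32-bit words, and pointer parameters stand in for the addresses that ADR produces.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the listing for z = (a << 2) | (b & 15). */
static void assign_z(const uint32_t *addr_a, const uint32_t *addr_b,
                     uint32_t *addr_z)
{
    uint32_t r0, r1;

    assert(addr_a && addr_b && addr_z);

    r0 = *addr_a;   /* ADR r4,a ; LDR r0,[r4] */
    r0 = r0 << 2;   /* MOV r0,r0,LSL #2       */
    r1 = *addr_b;   /* ADR r4,b ; LDR r1,[r4] */
    r1 = r1 & 15u;  /* AND r1,r1,#15          */
    r1 = r0 | r1;   /* ORR r1,r0,r1           */
    *addr_z = r1;   /* ADR r4,z ; STR r1,[r4] */
}
```

Because ARM data-processing instructions accept a shifted second operand, the separate MOV could probably be folded into the OR as `ORR r1,r1,r0,LSL #2`, saving one instruction.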

Page 90: CEC 320 and 322 Microprocessor Systems Class and Lab

ARM flow of control
• All operations can be performed conditionally, testing CPSR:
  EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
• Branch operation:

B #100

Can be performed conditionally.
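
Each condition mnemonic is a test on the N, Z, C, and V flags. The C sketch below models the flags a CMP would set and the standard ARM meaning of each code; `Flags`, `Cond`, `cmp_flags`, and `cond_holds` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE } Cond;

/* Flags as set by CMP a,b, which computes a - b and discards the result. */
static Flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.n = ((r >> 31) & 1u) != 0u;                    /* result negative   */
    f.z = (r == 0u);                                 /* result zero       */
    f.c = (a >= b);                                  /* carry = no borrow */
    f.v = ((((a ^ b) & (a ^ r)) >> 31) & 1u) != 0u;  /* signed overflow   */
    return f;
}

/* True when an instruction with the given condition suffix would execute. */
static bool cond_holds(Cond cc, Flags f)
{
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;
    case CC: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;           /* unsigned >  */
    case LS: return !f.c || f.z;           /* unsigned <= */
    case GE: return f.n == f.v;            /* signed >=   */
    case LT: return f.n != f.v;            /* signed <    */
    case GT: return !f.z && (f.n == f.v);  /* signed >    */
    case LE: return f.z || (f.n != f.v);   /* signed <=   */
    }
    assert(!"unknown condition code");
    return false;
}
```

Comparing 0xFFFFFFFF with 1 shows why signed and unsigned codes both exist: LT holds (the value is -1 when read as signed) and so does HI (it is the largest unsigned word).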

Page 91: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: FIR filter
• C:

for (i=0, f=0; i<N; i++)

f = f + c[i]*x[i];

• Assembler:

; loop initiation code

MOV r0,#0 ; use r0 for i

MOV r8,#0 ; use separate index for arrays

ADR r2,N ; get address for N

LDR r1,[r2] ; get value of N

MOV r2,#0 ; use r2 for f

Page 92: CEC 320 and 322 Microprocessor Systems Class and Lab

FIR filter, cont’d.

ADR r3,c ; load r3 with base of c

ADR r5,x ; load r5 with base of x

; loop body

loop LDR r4,[r3,r8] ; get c[i]

LDR r6,[r5,r8] ; get x[i]

MUL r4,r4,r6 ; compute c[i]*x[i]

ADD r2,r2,r4 ; add into running sum

ADD r8,r8,#4 ; add one word offset to array index

ADD r0,r0,#1 ; add 1 to i

CMP r0,r1 ; exit?

BLT loop ; if i < N, continue
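
The FIR loop can be traced in C using the same registers, including the separate byte offset in r8. This is a sketch under stated assumptions: `fir_like_asm` is a hypothetical name and the arrays are assumed to hold 32-bit words. The listing tests the loop condition at the bottom, so the body runs once even when N is 0; the sketch keeps that behavior with a do/while and requires N of at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Register-by-register mirror of the FIR listing. */
static int32_t fir_like_asm(const int32_t *c, const int32_t *x, int32_t n)
{
    int32_t  r0 = 0;  /* i                              */
    uint32_t r8 = 0;  /* byte offset shared by c and x  */
    int32_t  r1 = n;  /* N                              */
    int32_t  r2 = 0;  /* f, the running sum             */
    const unsigned char *r3 = (const unsigned char *)c; /* base of c */
    const unsigned char *r5 = (const unsigned char *)x; /* base of x */

    assert(c && x && n >= 1);

    do {
        int32_t r4 = *(const int32_t *)(r3 + r8); /* LDR r4,[r3,r8] */
        int32_t r6 = *(const int32_t *)(r5 + r8); /* LDR r6,[r5,r8] */
        r4 = r4 * r6;                             /* MUL r4,r4,r6   */
        r2 = r2 + r4;                             /* ADD r2,r2,r4   */
        r8 = r8 + 4;                              /* next word      */
        r0 = r0 + 1;                              /* i++            */
    } while (r0 < r1);                            /* CMP r0,r1 ; BLT loop */

    return r2;
}
```

With c = {1, 2, 3} and x = {4, 5, 6} the function returns 4 + 10 + 18 = 32.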

Page 93: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: Conditional instruction implementation

; true block

MOVLT r0,#5 ; generate value for x

ADRLT r4,x ; get address for x

STRLT r0,[r4] ; store x

ADRLT r4,c ; get address for c

LDRLT r0,[r4] ; get value of c

ADRLT r4,d ; get address for d

LDRLT r1,[r4] ; get value of d

ADDLT r0,r0,r1 ; compute y

ADRLT r4,y ; get address for y

STRLT r0,[r4] ; store y

Page 94: CEC 320 and 322 Microprocessor Systems Class and Lab

Example: switch statement
• C:

switch (test) { case 0: … break; case 1: … }

• Assembler:

ADR r2,test ; get address for test

LDR r0,[r2] ; load value for test

ADR r1,switchtab ; load address for switch table

LDR r1,[r1,r0,LSL #2] ; index switch table

switchtab DCD case0

DCD case1

...
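
The switch table is an array of code addresses indexed by the value of test. In C the same structure is an array of function pointers. The sketch below is illustrative only: the handler bodies and their return values are made up because the slide elides the case bodies, and the bounds check is an addition that the listing does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case bodies; the slide does not show what they do. */
static int case0(void) { return 100; }
static int case1(void) { return 200; }

typedef int (*Handler)(void);

/* Counterpart of: switchtab DCD case0 / DCD case1 */
static const Handler switchtab[] = { case0, case1 };

/* Index the table with test and transfer control to the selected entry.
   Returns -1 for an out-of-range value (check added for this sketch). */
static int dispatch(unsigned test)
{
    size_t entries = sizeof switchtab / sizeof switchtab[0];
    if (test >= entries) {
        return -1;
    }
    assert(switchtab[test] != NULL);
    return switchtab[test](); /* table load followed by the jump */
}
```

In the listing, `LSL #2` scales the index by the 4-byte entry size, which C array indexing does implicitly. As transcribed, the listing loads the entry into r1; for the dispatch to take effect the entry presumably has to end up in the pc.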