New Armlect1

download New Armlect1

of 80

Transcript of New Armlect1

  • 8/3/2019 New Armlect1

    1/80

    1 1

    ARM

    The ARM Instruction SetARMAdvanced RISC Machines

  • 8/3/2019 New Armlect1

    2/80

    2 2

    ARM

    P rocessor Modes

    The ARM has six operating modes: User User (unprivileged mode under which most tasks run) (10000) FIQFIQ (entered when a high priority (fast) interrupt is raised)

    (10001) IRQIRQ (entered when a low priority (normal) interrupt is raised)

    (10010) Supervisor Supervisor (entered on reset and when a Software Interrupt

    instruction is executed) (10011) Abort Abort (used to handle memory access violations) (10111) Undef Undef (used to handle undefined instructions) (11011)

    ARM Architecture Version 4 adds a seventh mode: SystemSystem (privileged mode using the same registers as user

    mode) (11111)

  • 8/3/2019 New Armlect1

    3/80

    3 3

    ARM

    ARM has 37 registers in total, all of which are 32-bitslong.

    1 dedicated program counter 1 dedicated current program status register

    5 dedicated saved program status registers 30 general purpose registers

    H owever these are arranged into several banks, with theaccessible bank being governed by the processor mode.Each mode can access

    a particular set of r0-r12 registers a particular r13 (the stack pointer) and r14 (link register) r15 (the program counter) cpsr (the current program status register)

    and privileged modes can also access a particular spsr (saved program status register)

    The Registers

  • 8/3/2019 New Armlect1

    4/80

  • 8/3/2019 New Armlect1

    5/80

    5 5

    ARM

    Accessing Registers usingARM Instructions

    N o breakdown of currently accessible registers. All instructions can access r0-r14 directly. Most instructions also allow use of the PC.

    Specific instructions to allow access to CPSRand SPSR.

  • 8/3/2019 New Armlect1

    6/80

    6 6

    ARM

    The P rogram Status Registers(C P SR and S P SRs)

    Co pies o f the ALU s tatus f lags (l atched if theinstructio n has the "S" bit set).

    N = N egati ve result fr om ALU f lag.Z = Z er o result fr om ALU f lag.C = ALU o perati o n C arried o utV = ALU o perati o n o Verf low ed

    * Interrupt Disa b le b its.I = 1, disables the IRQ.F = 1, disables the FIQ.

    * T Bit ( A rchitecture v4T only)T = 0, P r o cesso r in ARM s tateT = 1, P r o cesso r in Thum b state

    * Condition Code Flags

    M odeN Z C V

    2831 8 4 0

    I F T

    * M ode BitsM [4:0] define the pr o cesso r mo de.

  • 8/3/2019 New Armlect1

    7/80

    7 7

    ARM

    L ogical Instruction A rithmetic Instruction

    Flag

    Negati ve No m eaning Bit 31 o f the result has been set(N=1) Indicates a negati ve num ber in

    signed o perati o ns

    Zer o R esult is all z er o es R esult o f o perati o n w aszer o(Z=1)

    C arry After Shift o perati o n R esult w as greater than 32 bits(C =1) 1w as left in carry f lag

    o Verf low No m eaning R esult w as greater than 31 bits(V=1) Indicates a po ss ible

    co rru ptio n o f the sign bit in signed

    Condition Flags

  • 8/3/2019 New Armlect1

    8/80

    8 8

    ARM

    W hen the processor is executing in ARM state: All instructions are 32 bits in length All instructions must be word aligned Therefore the PC value is stored in bits [31:2] with bits [1:0]

    equal to zero (as instruction cannot be halfword or bytealigned).

    R14 is used as the subroutine link register (LR) andstores the return address when Branch with Linkoperations are performed,calculated from the PC.Thus to return from a linked branch

    MOV r15,r14

    or

    The P rogram Counter (R15)

  • 8/3/2019 New Armlect1

    9/80

    9 9

    ARM

    Register Example:User to FIQ Mode

    spsr_fiq

    cpsr

    r7

    r4r 5

    r2r 1

    r 0

    r3

    r6

    r 15 (pc)r 14_fiq

    r 13_fiq

    r 12_fiq

    r 10 _fiq

    r 11 _fiq

    r9_fiq

    r8_fiq

    r 14 (lr )

    r 13 (sp)

    r 12

    r 10

    r 11

    r9

    r8

    U ser mode CPS R copied to FIQ mode SPS R

    cpsr

    r 15 (pc)r 14 (lr )

    r 13 (sp)

    r 12

    r 10

    r 11

    r9

    r8

    r7

    r4r 5

    r2r 1

    r 0

    r3

    r6

    r 14_fiq

    r 13_fiq

    r 12_fiq

    r 10 _fiq

    r 11 _fiq

    r9_fiq

    r8_fiq

    R eturn address calculated from U ser modePC value and stored in FIQ mode LR

    R egisters in use R egisters in use

    EXCEPTIO N

    U ser M ode FIQ M ode

    spsr_fiq

    D isable FIQ

  • 8/3/2019 New Armlect1

    10/80

    10 10

    ARM

    Exceptions

    E xceptions generated as the direct effect of executing

    an instruction Software Interrupts, Undefined Instructions & Prefetch AbortsE xceptions generated as a side effect of an Instruction Data Aborts (Caused by Load/Store Instructions)

    E xceptions generated externally, unrelated toInstruction flow Reset, IRQ, FIQ

  • 8/3/2019 New Armlect1

    11/80

    11 11

    ARM

    Exception Entry

    Changes Operating Mode

    Save Address of next Instruction in r14 of thenew modeSaves Old value of CPSR into SPSR of newmode

    Disables either IRQ or FIQ if the exception isIRQ or FIQ respectivelyForces PC to vector to new address

  • 8/3/2019 New Armlect1

    12/80

    12 12

    ARM

    Exception Return

    Any modified user registers should be restored

    from the StackThe CPSR should be restored from appropriateSPSRThe PC must be changed to relevant User Instruction Stream

    P roblem: Last two cannot be carried outindependently.

  • 8/3/2019 New Armlect1

    13/80

    13 13

    ARM

    Solution

    To return from S W I MOVS pc, r14To return from IRQ, FIQ or Prefetch Abort SUBS pc, r14, #4

    To return from Data Aborts SUBS pc, r14, #8

    The S modifier signifies special form of Instruction when the destination is P C

    This way only if S P SR, r14 stored onto the Stack:

    LD MFD r13!, {r0-r3,pc} ^

  • 8/3/2019 New Armlect1

    14/80

    14 14

    ARM

    P riority

    Reset

    Data AbortFIQIRQPrefetch Abort

    S W I, Undefined Instruction

    There should be some doubt

  • 8/3/2019 New Armlect1

    15/80

    15 15

    ARM

    W hen an exception occurs, thecore: Copies CPSR into SPSR_ Sets appropriate CPSR bits

    If core implements ARM Architecture 4Tand is currently in Thumb state, then

    ARM state is entered.Mode field bits

    Interrupt disable flags if appropriate. Maps in appropriate banked registers Stores the return address in

    LR_ Sets PC to vector address

    To return, exception handler

    Exception H andlingand the Vector Table

    0x00000000

    0x0000001 C

    0x0000001 8

    0x0000001 4

    0x00000010

    0x0000000 C

    0x0000000 8

    0x0000000 4

    R eset

    U ndefined Instruction

    FIQ

    IR Q

    R eserved

    Data Ab ort

    Prefetch Ab ort

    Software Interrupt

  • 8/3/2019 New Armlect1

    16/80

    16 16

    ARM

    The Instruction P ipeline

    The ARM uses a pipeline in order to increase thespeed of the flow of instructions to the processor.

    Allows several operations to be undertaken simultaneously,rather than serially.FE TCH

    DE COD E

    EXE CUT E

    Instruction fetched from memory

    Decoding of registers used in instruction

    Register(s) read from Register BankShift and ALU operationW rite register(s) back to Register Bank

    PC

    PC - 4

    PC - 8

    ARM

  • 8/3/2019 New Armlect1

    17/80

    17 17

    ARM

    Quiz #1 - Verbal

    W hat registers are used to store the program counter

    and link register?W hat is r13 often used to store?W hich mode, or modes has the fewest availablenumber of registers available? How many and why?N

    ame the exceptions in the ARM?Mention their priorities.W hat happens on E xception E ntry & E xception E xit?

  • 8/3/2019 New Armlect1

    18/80

    18 18

    ARM

    ARM Instruction Set Format

    Instruction typeD ata processing /

    P SR Transfer

    Multiply

    Long Multiply(v3M / v4 only)

    Swap

    Load/StoreByte/Word

    Load/Store MultipleH alfword transfer :

    Immediate offset (v4only)

    H alfword transfer:Re ister offset v4 onl

    Cond 0 0 I Opc ode S R n Rd Ope ra nd2

    Cond 0 0 0 0 0 0 A S R d Rn Rs 1 0 0 1 Rm

    Cond 0 0 0 1 0 B 0 0 R n Rd 0 0 0 0 1 0 0 1 Rm

    Cond 0 1 I P U B W L R n Rd Offs et

    Cond 1 0 0 P U S W L R n Regi s te r L i s t

    Cond 0 0 0 0 1 U A S R dHi Rd Lo Rs 1 0 0 1 Rm

    Cond 0 0 0 P U 1 W L R n Rd Offs et 1 1 S H 1 Offs et2

    Cond 1 0 1 L Offs et

    Cond 1 1 0 P U N W L R n CRd CPNum Offs et

    Cond 1 1 1 0 Op1 CRn CRd CPNum Op 2 0 CRm

    Cond 1 1 1 0 Op1 L CRn Rd CPNum Op 2 1 CRm

    Cond 1 1 1 1 SWI Numb e r

    Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 R n

    Cond 0 0 0 P U 0 W L R n Rd 0 0 0 0 1 S H 1 Rm

    31 2827 1615 87 0

  • 8/3/2019 New Armlect1

    19/80

    19 19

    ARM

    Conditional Execution

    Most instruction sets only allow branches to beexecuted conditionally.However by reusing the condition evaluation hardware,

    ARM effectively increases number of instructions. All instructions contain a condition field which determines

    whether the CPU will execute them. N on-executed instructions soak up 1 cycle.

    Still have to complete cycle so as to allow fetching and decodingof following instructions.

    This removes the need for many branches, which stallthe pipeline (3 cycles to refill). Allows very dense in-line code, without branches. The Time penalty of not executing several conditional

  • 8/3/2019 New Armlect1

    20/80

    20 20

    ARM

    The Condition Field

    2831 24 20 16 12 8 4 0

    Cond

    0000 = EQ - Z s et (equa l)

    0001 = NE - Z cl ear (no t equa l)

    0010 = HS / C S - C set (unsigned higher o r sam e)

    0011 = LO / CC - C clear (unsigned low er )

    0100 = MI -N s et (negati ve)

    0101 = PL - N cl ear (po sitive o r zer o )

    0110 = VS - V set (o verf low )

    0111 = VC - V clear (no o verf low )

    1000 = HI - C set and Z clear (unsigned higher )

    1001 = LS - C clear or Z (setunsigned lower or same)

    1010 = GE - N set and V set,or N clear and V clear (>or =)

    1011 = LT - N set and Vclear, or N clear and Vset (>)

    1100 = GT - Z clear, andeither N set and V set,or N clear and V set(>)

  • 8/3/2019 New Armlect1

    21/80

    21 21

    ARM

    Using and updating the ConditionField

    To execute an instruction conditionally, simply postfix itwith the appropriate condition: For example an add instruction takes the form:

    ADD r0,r1,r 2 ; r0 = r1 + r 2 (ADDAL) To execute this only if the zero flag is set:

    ADDEQ r0,r1,r 2 ; If z e r o fla g s et then; ... r0 = r1 + r 2

    By default, data processing operations do not affect thecondition flags (apart from the comparisons where thisis the only effect). To cause the condition flags to beupdated, the S bit of the instruction needs to be set bypostfixing the instruction (and any condition code) with

    an S .

  • 8/3/2019 New Armlect1

    22/80

    22 22

    ARM

    Branch : B{} labelBranch with Link : BL{}sub_routine_label

    The offset for branch instructions is calculated bythe assembler:

    By taking the difference between the branch instructionand the target address minus 8 (to allow for thepipeline).

    This gives a 26 bit offset which is right shifted 2 bits (as

    Branch instructions (1)

    2831 24 0

    Cond 1 0 1 L Offset

    Condition field

    L ink b it 0 = Bran ch1 = Bran ch w ith link

    232527

  • 8/3/2019 New Armlect1

    23/80

    23 23

    ARM

    Branch instructions (2)

    W hen executing the instruction, the processor: shifts the offset left two bits, sign extends it to 32 bits, and adds

    it to PC.E xecution then continues from the new PC, once thepipeline has been refilled.The "Branch with link" instruction implements asubroutine call by writing PC-4 into the LR of thecurrent bank.

    i.e. the address of the next instruction following the branch withlink (allowing for the pipeline).

    To return from subroutine, simply need to restore thePC from the LR:

    MOV pc, lr

  • 8/3/2019 New Armlect1

    24/80

    24 24

    ARM

    D ata processing Instructions

    Largest family of ARM instructions, all sharing thesame instruction format.Contains: Arithmetic operations Comparisons (no results - just set condition codes) Logical operations Data movement between registers

    Remember, this is a load / store architecture These instruction only work on registers, NOT NOT memory.

    They each perform a specific operation on one or twooperands. First operand always a register - Rn

  • 8/3/2019 New Armlect1

    25/80

    25 25

    ARM

    Arithmetic Operations

    Operations are: ADD operand1 + operand2 ADC operand1 + operand2 + carry SUB operand1 - operand2 SBC operand1 - operand2 + carry -1 RSB operand2 - operand1 RSC operand2 - operand1 + carry - 1

    Syntax: {}{S} Rd, Rn, Operand2

    E xamples ADD r0, r1, r2

    SUBGT r3, r3, #1 RSBL E S r4, r5, #5

  • 8/3/2019 New Armlect1

    26/80

    26 26

    ARM

    Comparisons

    The only effect of the comparisons is to UPDA TE THE CON DI T I ON FL AGS UPDA TE THE CON DI T I ON FL AGS. Thus no need to set S

    bit.Operations are: CMP operand1 - operand2, but result not written CMN operand1 + operand2, but result not written TST operand1 A N D operand2, but result not written TE Q operand1 E OR operand2, but result not written

    Syntax: {} Rn, Operand2

    E xamples:

    CMP r0, r1 TST E Q r2 #5

  • 8/3/2019 New Armlect1

    27/80

    27 27

    ARM

    Logical Operations

    Operations are: AN D operand1 A N D operand2 E OR operand1 E OR operand2 ORR operand1 OR operand2 BIC operand1 A N D N OT operand2 [ie bit clear]

    Syntax: {}{S} Rd, Rn, Operand2

    E xamples: AN D r0, r1, r2 E ORS r1,r3,r0

  • 8/3/2019 New Armlect1

    28/80

    28 28

    ARM

    D ata Movement

    Operations are: MOV operand2 MVN N OT operand2

    N ote that these make no use of operand1.Syntax: {}{S} Rd, Operand2

    E xamples: MOV r0, r1 MOVS r2, #10 MVNE Q r1,#0

  • 8/3/2019 New Armlect1

    29/80

    29 29

    ARM

    Quiz #2Start

    Stopr0 = r1?

    r0 > r1?

    r0 = r0 - r1 r1 = r1 - r0

    Yes

    No Yes

    No

    Convert the GCDalgorithm given in thisflowchart into

    1) N ormal assembler,where only branchescan be conditional.

    2)ARM assembler,

    where all instructionsare conditional, thusimproving codedensity.

    The only instructionsyou need are CMP, B

  • 8/3/2019 New Armlect1

    30/80

    30 30

    ARM

    Quiz #2 - Sample Solutions

    ARM Conditional Assembler

    g c d cmp r0, r1 ;i f r0 > r1sub gt r0, r0, r1 ; sub t rac t r1

    fr o m r0subl t r1, r1, r0 ;e ls e sub t rac t

    r0 fr o m r1b ne g c d ; r e ac hed the

    end?

  • 8/3/2019 New Armlect1

    31/80

    31 31

    ARM

    The Barrel Shifter

    The ARM doesnt have actual shift instructions.

    Instead it has a barrel shifter which provides amechanism to carry out shifts as part of other instructions.

    So what operations does the barrel shifter support?

  • 8/3/2019 New Armlect1

    32/80

    32 32

    ARM

    Shifts left by the specified amount (multiplies

    by powers of two) e.g.LSL #5 = multiply by 32

    Barrel Shifter - Left Shift

    L ogical Shift L eft ( L SL )

    DestinationCF 0

  • 8/3/2019 New Armlect1

    33/80

    33 33

    ARM

    Logical Shift RightShifts right bythe specifiedamount (dividesby powers of two)e.g.

    LSR #5 = divideby 32

    Arithmetic ShiftRightShifts right

    (divides byowers of two

    Barrel Shifter - Right Shifts

    Destination CF

    Destination CF

    L ogical Shift R ight

    A rithmetic Shift R ight

    ...0

    Sign b it shifted in

  • 8/3/2019 New Armlect1

    34/80

    34 34

    ARM

    Barrel Shifter - RotationsRotate Right (ROR)Similar to an ASR butthe bits wrap aroundas they leave the LSB

    and appear as theMSB.

    e.g. ROR #5N ote the last bitrotated is also usedas the Carry Out.

    Rotate Right E xtended(RR X )

    This operation uses

    Destination CF

    R otate R ight

    Destination CF

    R otate R ight through Carry

  • 8/3/2019 New Armlect1

    35/80

    35 35

    ARM

    Using the Barrel Shifter:The Second Operand

    * Immediate value8 bit num ber C an be r o tated right thr o ugh an even num ber o f

    po sitio ns.Ass em bler w ill calculate r o tate f o r yo u fr om co nstant .

    Register, optionallywith shift operationapplied.Shift value can be

    either be: 5 bit unsigned integer Specified in bottom

    byte of another register.

    Operand1

    Result

    ALU

    BarrelShifter

    Operand2

  • 8/3/2019 New Armlect1

    36/80

    36 36

    ARM

    Second Operand :Shifted Register

    The amount by which the register is to be shiftedis contained in either: the immediate 5-bit field in the instruction

    NO OVE RH E ADNO OVE RH E ADShift is done for free - executes in single cycle.

    the bottom byte of a register (not PC)Then takes extra cycle to execute

    ARM doesnt have enough read ports to read 3 registersat once.Then same as on other processors where shift isseparate instruction.

    If no shift is specified then a default shift isapplied: LSL #0 i.e. barrel shifter has no effect on value in register.

  • 8/3/2019 New Armlect1

    37/80

    37 37

    ARM

    Second Operand :Using a Shifted Register

    Using a multiplication instruction to multiply by aconstant means first loading the constant into aregister and then waiting a number of internal cyclesfor the instruction to complete.

    A more optimum solution can often be found by usingsome combination of MOVs, ADDs, SUBs and RSBswith shifts. Multiplications by a constant equal to a ((power of 2) 1) can

    be done in one cycle.E xample: r0 = r1 * 5E xample: r0 = r1 + (r1 * 4)

    ADD r0, r1, r1, LSL #2E xam le: r2 = r3 * 105

  • 8/3/2019 New Armlect1

    38/80

    38 38

    ARM

    Second Operand :Immediate Value (1)

    There is no single instruction which will load a 32 bitimmediate constant into a register without performing adata load from memory. All ARM instructions are 32 bits long ARM instructions do not use the instruction stream as data.

    The data processing instruction format has 12 bitsavailable for operand2 If used directly this would only give a range of 4096.

    Instead it is used to store 8 bit constants, giving arange of 0 - 255.These 8 bits can then be rotated right through an even

    number of positions (ie RORs by 0, 2, 4,..30). This ives a much lar er ran e of constants that can be

  • 8/3/2019 New Armlect1

    39/80

    39 39

    ARM

    Second Operand :Immediate Value (2)

    This gives us: 0 - 255 [0 - 0xff] 256,260,264,..,1020 [0x100-0x3fc, step 4, 0x40-

    0xff ror 30] 1024,1040,1056,..,4080 [0x400-0xff0, step 16, 0x40-0xff ror

    28] 4096,4160, 4224,..,16320 [0x1000-0x3fc0, step 64, 0x40-0xff

    ror 26]These can be loaded using, for example:

    MOV r0, #0x40, 26 ; => MOV r0, #0x1000 (ie4096)

    To make this easier, the assembler will convert to this

    form for us if simply given the required constant:=

  • 8/3/2019 New Armlect1

    40/80

    40 40

    ARM

    Loading full 32 bit constants

    Although the MOV/MV N mechansim will load a largerange of constants into a register, sometimes thismechansim will not generate the required constant.Therefore, the assembler also provides a methodwhich will load ANY ANY 32 bit constant:

    LDR r d ,= n ume r i c c on s t a nt

    If the constant can be constructed using either a MOVor MV N then this will be the instruction actuallygenerated.Otherwise, the assembler will produce an LDRinstruction with a PC-relative address to read the

    constant from a literal pool. LDR r0,=0x4 2 ; gene ra te s MOV r0,#0x4 2

  • 8/3/2019 New Armlect1

    41/80

    41 41

    ARM

    Multiplication Instructions

    The Basic ARM provides two multiplicationinstructions.Multiply MUL{}{S} Rd, Rm, Rs ; Rd = Rm * Rs

    Multiply Accumulate - does addition for free MLA{}{S} Rd, Rm, Rs,Rn ; Rd = (Rm * Rs) + Rn

    Restrictions on use: Rd and Rm cannot be the same register

    Can be avoid by swapping Rm and Rs around. This worksbecause multiplication is commutative.

    Cannot use PC.

    These will be picked up by the assembler if overlooked.

  • 8/3/2019 New Armlect1

    42/80

    42 42

    ARM

    Multiplication Implementation

    The ARM makes use of Booths Algorithm to performinteger multiplication.On non-M ARMs this operates on 2 bits of Rs at a time.

    For each pair of bits this takes 1 cycle (plus 1 cycle to startwith).

    However when there are no more 1s left in Rs, themultiplication will early-terminate.

    E xample: Multiply 18 and -1 : Rd = Rm * Rs

    0 0 0 0 0 0 1 00 0 0 10 0 0 00 0 0 00 0 0 0 0 0 0 0 0 0 0 0

    1 1 1 1 1 1 1 11 1 1 11 1 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1

    R m

    R s

    17 cycles

    R s

    R m4 cycles

    18

    -1

    18

    -1

  • 8/3/2019 New Armlect1

    43/80

    43 43

    ARM

    Extended Multiply Instructions

    M variants of ARM cores contain extendedmultiplication hardware. This provides threeenhancements:

    An 8 bit Booths Algorithm8 bit Booths Algorithm is usedMultiplication is carried out faster (maximum for standardinstructions is now 5 cycles).

    E arly termination method improved E arly termination method improved so that now completes

    multiplication when all remaining bit sets containall zeroes (as with non-M ARMs), or all ones.

    Thus the previous example would early terminate in 2cycles in both cases.

    64 bit results64 bit results can now be produced from two 32bitoperands

  • 8/3/2019 New Armlect1

    44/80

    44 44

    ARM

    Multiply-Long andMultiply-Accumulate Long

    Instructions are MULL which gives RdHi,RdLo:=Rm*Rs MLAL which gives RdHi,RdLo:=(Rm*Rs)+RdHi,RdLo

    However the full 64 bit of the result now matter (lower precision multiply instructions simply throws top 32bitsaway)

    N eed to specify whether operands are signed or unsigned

    Therefore syntax of new instructions are: UMULL{}{S} RdLo,RdHi,Rm,Rs UMLAL{}{S} RdLo,RdHi,Rm,Rs SMULL{}{S} RdLo, RdHi, Rm, Rs SMLAL{}{S} RdLo, RdHi, Rm, Rs

  • 8/3/2019 New Armlect1

    45/80

    45 45

    ARM

    Quiz #31. Specify instructions which will implement the following:a) r0 = 16 b) r1 = r0 * 4

    c) r0 = r1 / 16 ( r1 signed 2's comp.) d) r1 = r2 * 7

    2. W hat will the following instructions do?a) ADDS r0, r1, r1, LSL #2 b) RSB r2, r1, #0

    3. W hat does the following instruction sequence do? ADD r0, r1, r1, LSL #1SUB r0, r0, r1, LSL #4

    ADD r0, r0, r1, LSL #7

  • 8/3/2019 New Armlect1

    46/80

    46 46

    ARM

    Load / Store Instructions

    The ARM is a Load / Store Architecture: Does not support memory to memory data processing

    operations. Must move data values into registers before using them.

    This might sound inefficient, but in practice isnt: Load data values from memory into registers. Process data in registers using a number of data

    processing instructions which are not slowed down bymemory access.

    Store results from registers out to memory.

    The ARM has three sets of instructions whichinteract with main memory. These are: Single register data transfer (LDR / STR).

  • 8/3/2019 New Armlect1

    47/80

    47 47

    ARM

    Single register data transfer

    The basic load and store instructions are: Load and Store W ord or Byte

    LDR / STR / LDRB / STRB

    ARM Architecture Version 4 also adds support for halfwords and signed data.

    Load and Store HalfwordLDRH / STRH

    Load Signed Byte or Halfword - load value and sign extend it to32 bits.

    LDRSB / LDRSH

    All of these instructions can be conditionally executedby inserting the appropriate condition code after STR /LDR.

  • 8/3/2019 New Armlect1

    48/80

    48 48

    ARM

    Load and Store Word or Byte:Base Register

    The memory location to be accessed is held in

    a base register STR r0, [r1] ; Store contents of r0 to location

    pointed to; by contents of r1.

    LDR r2, [r1] ; Load r2 with contents of memorylocation

    ; pointed to by contents of r1.r1

    0x200Base

    Register

    Memory

    0x50x200

    r0

    0x5Source

    Register for STR

    r2

    0x5D estination

    Register for L D R

  • 8/3/2019 New Armlect1

    49/80

    49 49

    ARM

    Load and Store Word or Byte:Offsets from the Base Register

    As well as accessing the actual location contained inthe base register, these instructions can access alocation offset from the base register pointer.This offset can be

    An unsigned 12bit immediate value (ie 0 - 4095 bytes). A register, optionally shifted by an immediate value

    This can be either added or subtracted from thebase register: Prefix the offset value or register with + (default) or -.

    This offset can be applied: before the transfer is made: PrePre- -indexed a ddressing indexed a ddressing

    optionally autoauto--incrementing incrementing the base register, by postfixingthe instruction with an !.

  • 8/3/2019 New Armlect1

    50/80

    50 50

    ARM

    Load and Store Word or Byte:P re-indexed Addressing

    E xample: STR r0, [r1,#12]

    To store to location 0x1f4 instead use: STR r0, [r1,#-

    12]

    r1

    0x200Base

    Register

    Memory

    0x5

    0x200

    r0

    0x5Source

    Register for STR

    Offset

    12 0x20c

  • 8/3/2019 New Armlect1

    51/80

    51 51

    ARM

    Load and Store Word or Byte:P ost-indexed Addressing

    E xample: STR r0, [r1], #12

    To auto-increment the base register to location 0x1f4instead use:

    STR r0, [r1], #-12

    r1

    0x200Original

    BaseRegister

    Memory

    0x50x200

    r0

    0x5Source

    Register for STR

    Offset

    12 0x20c

    r1

    0x20c Updated

    BaseRegister

  • 8/3/2019 New Armlect1

    52/80

    52 52

    ARM

    Load and Storeswith User Mode P rivilege

    W hen using post-indexed addressing, there is a further form of Load/Store W ord/Byte: {}{B} T Rd,

    W hen used in a privileged mode, this does the

    load/store with user mode privilege. N ormally used by an exception handler that is emulating amemory access instruction that would normally execute in user mode.

  • 8/3/2019 New Armlect1

    53/80

    53 53

    ARM

    Example Usage of Addressing Modes

    Imagine an array, the first element of which is pointedto by the contents of r0.If we want to access a particular element,then we can use pre-indexed addressing:

    r1 is element we want. LDR r2, [r0, r1, LSL #2]

    If we want to step through everyelement of the array, for instanceto produce sum of elements in thearray, then we can use post-indexed addressing withina loop:

    r1 is address of current element (initially equal to r0).

    0

    1

    2

    3

    element

    0

    4

    8

    12

    MemoryOffset

    r0

    Pointer tostart of array

  • 8/3/2019 New Armlect1

    54/80

    54 54

    ARM

    Offsets for H alfword and SignedH alfword / Byte Access

    The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as thebasic load and store instructions.However the actual offset formats are moreconstrained: The immediate value is limited to 8 bits (rather than 12 bits)

    giving an offset of 0-255 bytes. The register form cannot have a shift applied to it.

    ARM

  • 8/3/2019 New Armlect1

    55/80

    55 55

    ARM

    Effect of endianess

    The ARM can be set up to access its data in either little or big endian format.Little endian: Least significant byte of a word is stored in b its 0 b its 0 - -7 7 of an

    addressed word.

    Big endian: Least significant byte of a word is stored in b its 24b its 24 - -31 31 of

    an addressed word.This has no real relevance unless data is stored aswords and then accessed in smaller sizedquantities (halfwords or bytes). W hich byte / halfword is accessed will depend on the

    endianess of the system involved.

    ARM

  • 8/3/2019 New Armlect1

    56/80

    56 56

    ARM

    Endianess Example

    Big-endianL ittle-endian

    r1 = 0x100

    r 0 = 0x11 22334431 24 23 16 15 8 7 0

    11 22 33 44

    31 24 23 16 15 8 7 0

    11 22 33 44

    31 24 23 16 15 8 7 0

    44 33 22 11

    31 24 23 16 15 8 7 0

    00 00 00 44

    31 24 23 16 15 8 7 0

    00 00 00 11

    r2 = 0x 44 r2 = 0x11

    STR r0, [r1]

    LD RB r2, [r1]

    r1 = 0x100Memo ry

    ARM

  • 8/3/2019 New Armlect1

    57/80

    57 57

    ARM

    Quiz #4

    W rite a segment of code that add together elements xto x+(n-1) of an array, where the element x=0 is thefirst element of the array.E ach element of the array is word sized (ie. 32 bits).The segment should use post-indexed addressing.

    At the start of your segments, you should assume that: r0 points to the start of the array. r1 = x r2 = n r0

    xx + 1

    x + (n - 1)

    Elements

    {n elements

    0

    ARM

  • 8/3/2019 New Armlect1

    58/80

    58 58

    ARM

    Quiz #4 - Sample SolutionADD r0, r0, r1, LSL# 2 ; Set r0 to a dd r e ss o fe l e ment xADD r 2 , r0, r 2 , LSL# 2 ; Set r 2 to a dd r e ss o fe l e ment n +1MOV r1, #0 ; I niti al i s e c o u nte r

    l oo pLDR r3, [r0], #4 ; Acc e ss e l e ment a nd move to ne x t

    ADD r1, r1, r3 ; Add c ontent s to c o u nte rCMP r0, r 2 ; Ha ve we r e ac hed e l e ment x+ n?

    BLT l oo p ; If not - r e p e a t f o r

    ; ne x t e l e ment

    ARM

  • 8/3/2019 New Armlect1

    59/80

    59 59

    ARM

    BlockD

    ata Transfer (1)

    Cond 1 0 0 P U S W L R n R egister list

    Condition field Base registerL oad/Store b it0 = S to re to m emo ry1 = L o ad fr om m emo ry

    Write- b ack b it0 = no w rite - back 1 = w rite addre ss into base

    PS R and force user b it0 = do nt lo ad PSR o r f o r ce user mo de1 = l o ad PSR o r f o r ce user mo de

    U p/Down b it0 = D ow n; subtra ct o ff set fr om base1 = Up ; add o ff set to base

    Pre/Post indexing b it0 = P o st; add o ff set after tran sfer ,1 = P re ; add o ff set bef o re tran sfer

    2831 22 16 023 21 1527 20 1924

    Each b it corresponds to a particular

    register. For example:Bit 0 set causes r 0 to be tran sferred .Bit 0 unset causes r 0 no t to be tran sferred .

    A t least one register must b etransferred as the list cannot b e empty.

    The Load and Store Multiple instructions (LDM

    / STM) allow betweeen 1 and 16 registers to betransferred to or from memory.The transferred registers can be either: Any subset of the current bank of registers (default).

    Any subset of the user mode bank of registers whenin a priviledged mode (postfix instruction with a ^ ).

    ARM

  • 8/3/2019 New Armlect1

    60/80

    60 60

    ARM

    BlockD

    ata Transfer (2)

    Base register used to determine where memory accessshould occur. 4 different addressing modes allow increment and decrement

    inclusive or exclusive of the base register location. Base register can be optionally updated following the transfer

    (by appending it with an !!. Lowest register number is always transferred to/from lowest

    memory location accessed.

    These instructions are very efficient for Saving and restoring context

    For this useful to view memory as a stack. Moving large blocks of data around memory

    For this useful to directly represent functionality of theinstructions.

    ARM

  • 8/3/2019 New Armlect1

    61/80

    61 61

    ARM

    Stacks

    A stack is an area of memory which grows asnew data is pushed onto the top of it, andshrinks as data is popped off the top.Two pointers define the current limits of thestack. A base pointer

    used to point to the bottom of the stack (the first location).

    A stack pointer used to point the current top of the stack.

    SP

    BA SE

    PUSH {1,2,3}

    1

    23

    BA SE

    SP

    POP

    1

    2Result of pop = 3

    BA SE

    SP

    ARM

  • 8/3/2019 New Armlect1

    62/80

    62 62

    ARM

    Stack Operation

    Traditionally, a stack grows down in memory, with thelast pushed value at the lowest address. The ARMalso supports ascending stacks, where the stackstructure grows up through memory.The value of the stack pointer can either:

    Point to the last occupied address (Full stack)and so needs pre-decrementing (ie before the push)

    Point to the next occupied address ( E mpty stack)and so needs post-decrementing (ie after the push)

    The stack type to be used is given by the postfix to theinstruction:

    STMFD / LDMFD : Full Descending stack STMFA / LDMFA : Full Ascending stack.

    ARM

  • 8/3/2019 New Armlect1

    63/80

    63 63

    ARM

    Stack ExamplesSTMFD sp!,{r0,r1,r3-r5}

    r5r4r3

    r1r0SP

    Old SP

    STMED sp!,{r0,r1,r3-r5}

    r5r4r3r1

    r0SP

    Old SP

    r5r4r3

    r1r0

    STMF A sp!,{r0,r1,r3-r5}

    SP

    Old SP 0x400

    0x418

    0x3e8

    STME A sp!,{r0,r1,r3-r5}

    r5r4

    r3r1r0

    SP

    Old SP

  • 8/3/2019 New Armlect1

    64/80

    ARM

  • 8/3/2019 New Armlect1

    65/80

    65 65

    ARM

    D irect functionality of Block D ata Transfer

    W hen LDM / STM are not being used to implementstacks, it is clearer to specify exactly what functionalityof the instruction is: i.e. specify whether to increment / decrement the base pointer,

    before or after the memory access.

    In order to do this, LDM / STM support a further syntaxin addition to the stack one: STMIA / LDMIA : Increment After STMIB / LDMIB : Increment Before STMDA / LDMDA : Decrement After STMDB / LDMDB : Decrement Before

    ARM

  • 8/3/2019 New Armlect1

    66/80

    66 66

    ARM

    Example: Block Copy Copy a block of memory, which is an exact multiple

    of 12 words long from the location pointed to by r12to the location pointed to by r13. r14 points to the endof block to be copied.

    ; r12 points to the start of the sourcedata

    ; r14 points to the end of the source data

    ; r13 points to the start of thedestination dataloop L DMI Ar12!, {r0-r11} ; load 48 bytes

    STMI Ar13!, {r0-r11} ; and store them C MP r12, r14 ; check for the end

    r13

    r14

    r12

    IncreasingM emory

    ARM

  • 8/3/2019 New Armlect1

    67/80

    67 67

    ARM

    Quiz #5

    The contents of registers r0 to r6 need to be swappedaround thus: r0 moved into r3 r1 moved into r4 r2 moved into r6 r3 moved into r5 r4 moved into r0 r5 moved into r1 r6 moved into r2

    W rite a segment of code that uses full descendingstack operations to carry this out, and hence requiresno use of any other registers for temporary storage.

    ARM

  • 8/3/2019 New Armlect1

    68/80

    68 68

    ARM

    Quiz #5 - Sample SolutionSTMFD sp!,

    {r0-r6}LDMFD sp!,{r3,r4,r6}

    r3 = r0r4 = r1r6 = r2

    LDMFD sp!,{r5}

    r5 = r3

    LDMFD sp!,{r0-r2}

    r0 = r4r1 = r5r2 = r6

    Old SP

    r5r4r3r2r1

    SP

    r6

    r0

    r5r4

    SP

    r6

    r3

    r5SP

    r6

    r4

    SP

    ARM

  • 8/3/2019 New Armlect1

    69/80

    69 69

    ARM

    Atomic operation of a memory read followed by amemory write which moves byte or wordquantities between registers and memory.Syntax: S W P{}{B} Rd, Rm, [Rn]

    Swap and Swap Byte Instructions

    Rm Rd

    Rn

    32

    1temp

    Memory

    ARM

  • 8/3/2019 New Armlect1

    70/80

    70 70

    ARM

    Software Interrupt (SWI)

    In effect, a S W I is a user-defined instruction.It causes an exception trap to the S W I hardware vector (thus causing a change to supervisor mode, plus theassociated state saving), thus causing the S W Iexception handler to be called.The handler can then examine the comment field of theinstruction to decide what operation has beenrequested.B makin use of the S W I mechansim, an o eratin

    2831 2427 0

    Cond 1 1 1 1 Comment field (ignored b y Processor)

    Condition Field

    23

    ARM

  • 8/3/2019 New Armlect1

    71/80

    71 71

    ARM

    PSR Transfer Instructions

    MRS and MSR allow contents of CPSR/SPSR to betransferred from appropriate status register to a general

    purpose register. All of status register, or just the flags, can be transferred.

    Syntax: MRS{} R d , ; Rd = MSR{} ,Rm ; = Rm MSR{} ,Rm ; = Rm

    where = CPSR, CPSR_all, SPSR o r SPSR_all = CPSR_fl g o r SPSR_fl g

    Also an immediate form < > < >

    ARM

  • 8/3/2019 New Armlect1

    72/80

    72 72

    ARM

    Using MRS and MSR

    Currently reserved bits, may be used in future,therefore:

    they must be preserved when altering PSR the value they return must not be relied upon when testing

    other bits.

    Thus read-modify-write strategy must be followed whenmodifying any PSR:

    Transfer PSR to register using MRS

    Modify relevant bits

    M odeN Z C V

    2831 8 4 0

    I F T

    ARM

  • 8/3/2019 New Armlect1

    73/80

    73 73

    ARM

    Coprocessors

    The ARM architecture supports 16 coprocessorsE ach coprocessor instruction set occupies part of the

    ARM instruction set.There are three types of coprocessor instruction Coprocessor data processing Coprocessor (to/from ARM) register transfers Coprocessor memory transfers (load and store to/from

    memory) Assembler macros can be used to transform customcoprocessor mneumonics into the generic mneumonicsunderstood by the processor.

    A coprocessor may be implemented

    ARM

  • 8/3/2019 New Armlect1

    74/80

    74 74

    ARM

    Coprocessor D

    ataP

    rocessing

    This instruction initiates a coprocessor operationThe operation is performed only on internalcoprocessor state

    For example, a Floating point multiply, which multiplies thecontents of two registers and stores the result in a third register

    Syntax: CDP{}

    ,< o pc_1>, CRd , CRn , CRm,{< o pc_ 2 >}

    31 28 27 26 2 5 2 4 2 3 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0

    Destination Register Source Registers

    Opcode

    Condition Code Specifier Opcode

    Cond 1 1 1 0 opc_1 CRn CRd cp_num opc_2 0 CRm

    ARM

  • 8/3/2019 New Armlect1

    75/80

    75 75

    Coprocessor Register Transfers

    These two instructions move data between ARMregisters and coprocessor registers MRC : Move to Register from Coprocessor MCR : Move to Coprocessor from Register

    An operation may also be performed on the data as itis transferred For example a Floating Point Convert to Integer instruction can

    be implemented as a register transfer to ARM that alsoconverts the data from floating point format to integer format.

    Syntax {}

    ,< o pc_1>,R d , CRn , CRm,< o pc_ 2 >

    31 28 27 26 2 5 2 4 2 3 22 2 1 2 0 19 1 6 15 1 2 11 8 7 5 4 3 0

    ARM Source/Dest Register Coprocesor Source/Dest Registers

    Opcode

    Condition Code Specifier Opcode

    Transfer To/From Coprocessor

    Cond 1 1 1 0 opc_1 L CRn Rd cp_num opc_2 1 CRm

  • 8/3/2019 New Armlect1

    76/80

    ARM

  • 8/3/2019 New Armlect1

    77/80

    77 77

    Coprocessor Memory Transfers (2)

    Syntax of these is similar to word transfers between ARM and memory:

    {}{} , CRd ,PC relative offset generated if possible, else causes an error.

    {}{}

    , CRd ,Pre-indexed form, with optional writeback of the base register

    {}{} , CRd ,Post-indexed form

    where when present causes a long transfer to be performed

    (N =1) else causes a short transfer to be performed ( N =0).

    ARM

  • 8/3/2019 New Armlect1

    78/80

    78 78

    Quiz #6

    W rite a short code segment that performs a modechange by modifying the contents of the CPSR The mode you should change to is user mode which has the

    value 0x10. This assumes that the current mode is a priveleged mode such

    as supervisor mode. This would happen for instance when the processor is reset -

    reset code would be run in supervisor mode which would thenneed to switch to user mode before calling the main routine inyour application.

    You will need to use MSR and MRS, plus 2 logical operations.M odeN Z C V2831 8 4 0

    I F T

    ARM

  • 8/3/2019 New Armlect1

    79/80

    79 79

    Quiz #6 - Sample Solution

    Set up useful constants:

    mmask EQU 0x1f ; mask to clear mode bits

    userm EQU 0x10 ; user mode value

    Start off here in supervisor mode.MRS r0, cpsr ; t ak e a c o py o f the CPSRBI C r0,r0,#mmask ; cl e ar the mode b it sORR r0,r0,#us e rm ; s e l e c t ne w m odeMSR cpsr, r0 ; wr ite back the modi f ied

    ; CPSR

    ARM

  • 8/3/2019 New Armlect1

    80/80

    80 80

    Main features of theARM Instruction Set

    All instructions are 32 bits long.Most instructions execute in a single cycle.E very instruction can be conditionally executed.

    A load/store architecture Data processing instructions act only on registers

    Three operand formatCombined ALU and shifter for high speed bit manipulation

    Specific memory access instructions with powerful auto-indexing addressing modes.

    32 bit and 8 bit data types and also 16 bit data types on ARM Architecture v4.

    Flexible multiple register load and store instructions