ARM ASSEMBLY TUTORIAL

download ARM ASSEMBLY TUTORIAL

of 85

Transcript of ARM ASSEMBLY TUTORIAL

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    1/85

    EE382N-4 Embedded Systems Architecture

    TheARMInstructionSetArchitecture

    MarkMcDermott

    WithhelpfromourgoodfriendsatARM

    Fall2008

    8/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    2/85

    EE382N-4 Embedded Systems Architecture

    MainfeaturesoftheARMInstructionSet

    Allinstructions

    are

    32

    bits

    long.

    Mostinstructionsexecuteinasinglecycle.

    Mostinstructionscanbeconditionallyexecuted.

    Aload/storearchitectureDataprocessinginstructionsactonlyonregisters

    Threeoperandformat

    Combined

    ALU

    and

    shifter

    for

    high

    speed

    bit

    manipulationSpecificmemoryaccessinstructionswithpowerfulautoindexingaddressingmodes. 32bitand8bitdatatypes

    andalso16bitdatatypesonARMArchitecturev4.

    Flexiblemultiple

    register

    load

    and

    store

    instructions

    Instructionsetextensionviacoprocessors

    Verydense16bitcompressedinstructionset(Thumb)

    28/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    3/85

    EE382N-4 Embedded Systems Architecture

    Coprocessors

    3

    Upto16 coprocessorscanbedefined

    ExpandstheARMinstructionset

    Eachcoprocessorcanhaveupto16privateregistersofanyreasonablesize

    Loadstore

    architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    4/85

    EE382N-4 Embedded Systems Architecture

    Thumb

    Thumbis

    a16

    bit

    instruction

    set

    OptimizedforcodedensityfromCcodeImprovedperformanceformnarrowmemorySubsetofthefunctionalityoftheARMinstructionset

    Corehastwoexecutionstates ARMandThumbSwitchbetweenthemusingBXinstruction

    Thumbhascharacteristicfeatures:Most

    Thumb

    instruction

    are

    executed

    unconditionally

    ManyThumbdataprocessinstructionusea2addressformatThumbinstructionformatsarelessregularthanARMinstructionformats,as

    aresultofthedenseencoding.

    4

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    5/85

    EE382N-4 Embedded Systems Architecture

    ProcessorModes

    TheARM

    has

    six

    operating

    modes:

    User(unprivilegedmodeunderwhichmosttasksrun)

    FIQ(enteredwhenahighpriority(fast)interruptisraised)IRQ(enteredwhenalowpriority(normal)interruptisraised)Supervisor(enteredonresetandwhenaSoftwareInterruptinstructionis

    executed)

    Abort(usedtohandlememoryaccessviolations)Undef(usedtohandleundefinedinstructions)

    ARMArchitectureVersion4addsaseventhmode:System(privilegedmodeusingthesameregistersasusermode)

    58/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    6/85

    EE382N-4 Embedded Systems Architecture

    TheRegisters

    ARMhas

    37

    registers

    in

    total,

    all

    of

    which

    are

    32

    bits

    long.

    1dedicatedprogramcounter1dedicatedcurrentprogramstatusregister5dedicatedsavedprogramstatusregisters30generalpurposeregisters

    Howeverthesearearrangedintoseveralbanks,withthe

    accessiblebankbeinggovernedbytheprocessormode.Each

    modecan

    access

    aparticularsetofr0r12registersaparticularr13(thestackpointer)andr14(linkregister)r15(theprogramcounter)cpsr

    (the

    current

    program

    status

    register)

    Andprivilegedmodescanalsoaccessaparticularspsr(savedprogramstatusregister)

    68/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    7/85

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    8/85

    EE382N-4 Embedded Systems Architecture

    RegisterOrganizationSummary

    88/22/2008

    Usermode

    r0-r7,r15,andcpsr

    r8

    r9

    r10

    r11

    r12

    r13 (sp)

    r14 (lr)

    spsr

    FIQ

    r8

    r9

    r10

    r11

    r12

    r13 (sp)

    r14 (lr)r15 (pc)

    cpsr

    r0

    r1

    r2

    r3

    r4

    r5

    r6

    r7

    User

    r13 (sp)

    r14 (lr)

    spsr

    IRQ

    Usermoder0-r12,

    r15,andcpsr

    r13 (sp)

    r14 (lr)

    spsr

    Undef

    Usermoder0-r12,

    r15,andcpsr

    r13 (sp)

    r14 (lr)

    spsr

    SVC

    Usermoder0-r12,

    r15,andcpsr

    r13 (sp)

    r14 (lr)

    spsr

    Abort

    Usermoder0-r12,

    r15,andcpsr

    Thumb stateLow register

    Thumb stateHigh register

    Note: System mode uses the User mode register set

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    9/85

    EE382N-4 Embedded Systems Architecture

    AccessingRegistersusingARMInstructions

    Nobreakdown

    of

    currently

    accessible

    registers.

    Allinstructionscanaccessr0r14directly.MostinstructionsalsoallowuseofthePC.

    SpecificinstructionstoallowaccesstoCPSRandSPSR.

    Note:When

    in

    aprivileged

    mode,

    it

    is

    also

    possible

    to

    load

    store

    the(bankedout)usermoderegisterstoorfrommemory.

    98/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    10/85

    EE382N-4 Embedded Systems Architecture

    TheProgramStatusRegisters (CPSRandSPSRs)

    108/22/2008

    CopiesoftheALUstatusflags(latchediftheinstructionhasthe"S"bitset).

    N=NegativeresultfromALUflag.Z=ZeroresultfromALUflag.C=ALUoperationCarriedoutV=ALUoperationoVerflowed

    * InterruptDisablebits.I =

    1,

    disables

    the

    IRQ.

    F =1,disablestheFIQ.

    * TBit (Architecturev4Tonly)

    T=0,

    Processor

    in

    ARM

    stateT=1,ProcessorinThumbstate

    *

    Condition

    Code

    Flags

    ModeN Z C V

    2831 8 4 0

    I F T

    * ModeBitsM[4:0]definetheprocessormode.

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    11/85

    EE382N-4 Embedded Systems Architecture

    LogicalInstruction ArithmeticInstruction

    Flag

    Negative Nomeaning Bit31oftheresulthasbeenset(N=1) Indicatesanegativenumberin

    signedoperations

    Zero Resultisallzeroes Resultofoperationwaszero(Z=1)

    Carry AfterShiftoperation Resultwasgreaterthan32bits

    (C=1) 1

    was

    left

    in

    carry

    flag

    oVerflow Nomeaning Resultwasgreaterthan31bits(V=1) Indicatesapossiblecorruptionof

    thesignbitinsignednumbers

    ConditionFlags

    118/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    12/85

    EE382N-4 Embedded Systems Architecture

    TheProgramCounter(R15)

    Whenthe

    processor

    is

    executing

    in

    ARM

    state:

    Allinstructionsare32bitsinlengthAllinstructionsmustbewordalignedThereforethePCvalueisstoredinbits[31:2]withbits[1:0]equaltozero(as

    instructioncannot

    be

    halfword

    or

    byte

    aligned).

    R14isusedasthesubroutinelinkregister(LR)andstoresthe

    returnaddresswhenBranchwithLinkoperationsareperformed,

    calculatedfrom

    the

    PC.

    Thustoreturnfromalinkedbranch:MOVr15,r14

    orMOVpc,lr

    128/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    13/85

    EE382N-4 Embedded Systems Architecture

    ExceptionHandlingandtheVectorTable

    Whenan

    exception

    occurs,

    the

    core:

    CopiesCPSRintoSPSR_SetsappropriateCPSRbits

    IfcoreimplementsARMArchitecture4Tandis

    currentlyin

    Thumb

    state,

    then

    ARMstateisentered.

    Modefieldbits

    Interruptdisableflagsifappropriate.

    Mapsin

    appropriate

    banked

    registers

    StoresthereturnaddressinLR_SetsPCtovectoraddress

    Toreturn,exceptionhandlerneedsto:Restore

    CPSR

    from

    SPSR_

    RestorePCfromLR_

    138/22/2008

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    14/85

    EE382N-4 Embedded Systems Architecture

    TheOriginalInstructionPipeline

    TheARM

    uses

    apipeline

    in

    order

    to

    increase

    the

    speed

    of

    the

    flowofinstructionstotheprocessor.Allowsseveraloperationstobeundertakensimultaneously,ratherthan

    serially.

    Ratherthanpointingtotheinstructionbeingexecuted,thePC

    pointsto

    the

    instruction

    being

    fetched.

    148/22/2008

    FETCH

    DECODE

    EXECUTE

    Instruction fetched from memory

    Decoding of registers used in instruction

    Register(s) read from Register Bank

    Shift and ALU operationWrite register(s) back to Register Bank

    PC

    PC - 4

    PC - 8

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    15/85

    EE382N-4 Embedded Systems Architecture

    PipelinechangesforARM9TDMI

    InstructionFetch Shift + ALU MemoryAccess RegWriteRegRead

    RegDecode

    FETCH DECODE EXECUTE MEMORY WRITE

    ARM9TDMI

    ARM or Thumb

    Inst Decode

    Reg Select

    RegRead

    Shift ALURegWrite

    Thumb ARMdecompress

    ARM decodeInstructionFetch

    FETCH DECODE EXECUTE

    ARM7TDMI

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    16/85

    EE382N-4 Embedded Systems Architecture

    Pipelinechangesfor ARM10vs.ARM11Pipelines

    ARM11

    Fetch1

    Fetch2

    Decode Issue

    Shift ALU Saturate

    Writeback

    MAC1

    MAC2

    MAC3

    AddressDataCache

    1

    DataCache

    2

    Shift + ALUMemoryAccess Reg

    Write

    FETCH DECODE EXECUTE MEMORY WRITE

    Reg Read

    Multiply

    BranchPrediction

    InstructionFetch

    ISSUE

    ARM orThumb

    InstructionDecode Multiply

    Add

    ARM10

    EE N E b dd d S A hi

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    17/85

    EE382N-4 Embedded Systems Architecture

    ARMInstructionSetFormat

    178/22/2008

    3

    1

    3

    0

    2

    9

    2

    8

    2

    7

    2

    6

    2

    5

    2

    4

    2

    3

    2

    2

    2

    1

    2

    0

    1

    9

    1

    8

    1

    7

    1

    6

    1

    5

    1

    4

    1

    3

    1

    2

    1

    1

    1

    0 9 8 7 6 5 4 3 2 1 0InstructionType

    Condition 0 0 I OPCODE S Rn Rs OPERAND2 Dataprocessing

    Condition 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm Multiply

    Condition 0 0 0 0 1 U A S RdHIGH Rd LOW Rs 1 0 0 1 Rm Long

    Multiply

    Condition 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm Swap

    Condition 0 1 I P U B W L Rn Rd OFFSET Load/Store Byte/Word

    Condition 1 0 0 P U B W L Rn REGISTERLIST Load/Store Multiple

    Condition 0 0 0 P U 1 W L Rn Rd OFFSET1 1 S H 1 OFFSET2 Halfword TransferImm Off

    Condition 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm Halfword TransferReg Off

    Condition 1 0 1 L BRANCH OFFSET Branch

    Condition 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn Branch Exchange

    Condition 1 1 0 P U N W L Rn CRd CPNum OFFSET COPROCESSOR DATAXFER

    Condition 1 1 1 0 Op1 CRn CRd CPNum OP2 0 CRm COPROCESSOR DATAOP

    Condition OP1 L CRn Rd CPNum OP2 1 CRm COPROCESSOR REGXFER

    Condition 1 1 1 1 SWI NUMBER SoftwareInterrupt

    EE382N 4 E b dd d S A hi

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    18/85

    EE382N-4 Embedded Systems Architecture

    ConditionalExecution

    Mostinstruction

    sets

    only

    allow

    branches

    to

    be

    executed

    conditionally.

    Howeverbyreusingtheconditionevaluationhardware, ARM

    effectivelyincreases

    number

    of

    instructions.

    AllinstructionscontainaconditionfieldwhichdetermineswhethertheCPUwillexecutethem.

    Nonexecutedinstructionsconsume1cycle. Cant

    collapse

    the

    instruction

    like

    aNOP.

    Still

    have

    to

    complete

    cycle

    so

    as

    to

    allow

    fetchinganddecodingofthefollowinginstructions.

    Thisremovestheneedformanybranches,whichstallthe

    pipeline(3

    cycles

    to

    refill).Allowsverydenseinlinecode,withoutbranches.

    TheTimepenaltyofnotexecutingseveralconditionalinstructionsisfrequentlylessthanoverheadofthebranch

    orsubroutine

    call

    that

    would

    otherwise

    be

    needed.

    188/22/2008

    EE382N 4 E b dd d S t A hit t

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    19/85

    EE382N-4 Embedded Systems Architecture

    TheConditionField

    198/22/2008

    1001 = LS - C c lear o r Z ( se t uns igned l ow e ro r same)

    1010 = GE - N se t and V set , o r N c l ea r and Vcl ea r ( > o r = )

    1011 = LT - N se t and V cl ear , o r N cl ear andV se t ( > )

    1100 = GT - Z c lear , and e i t he r N se t and Vset , o r N c lea r and V se t ( > )

    1 1 0 1 = LE - Z se t , o r N se t a n d V cl ea r ,o r Nclear and V set ( < , o r = )

    1 11 0 = AL - a lw ay s

    1 1 1 1 = N V - r ese rv ed .

    0 0 0 0 = EQ - Z se t ( e q u al )

    0 0 0 1 = N E - Z cl ea r ( n o t e q u al )

    0 0 1 0 = H S / CS - C s et ( u n s ig n ed h i gh e r orsame)

    0 0 1 1 = LO / CC - C cl ea r ( u n s i g n ed l o w e r )

    0 1 0 0 = M I - N se t ( n e g a t i v e)

    0 1 0 1 = PL - N cl ea r ( p o s it i v e o r ze r o )

    0 1 1 0 = V S - V set ( o v er f lo w )

    0 1 1 1 = VC - V cl ea r ( n o o v er f l o w )

    1 0 0 0 = H I - C se t a n d Z cl ea r ( u n s i g n ed

    h ighe r )

    3

    1

    3

    0

    2

    9

    2

    8

    2

    7

    2

    6

    2

    5

    2

    4

    2

    3

    2

    2

    2

    1

    2

    0

    1

    9

    1

    8

    1

    7

    1

    6

    1

    5

    1

    4

    1

    3

    1

    2

    1

    1

    1

    0 9 8 7 6 5 4 3 2 1 0InstructionType

    Condition 0 0 I OPCODE S Rn Rs OPERAND2 Dataprocessing

    EE382N 4 E b dd d S t A hit t

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    20/85

    EE382N-4 Embedded Systems Architecture

    UsingandupdatingtheConditionField

    Toexecute

    an

    instruction

    conditionally,

    simply

    postfix

    it

    with

    the

    appropriate

    condition: Forexampleanaddinstructiontakestheform:

    ADDr0,r1,r2 ;r0=r1+r2(ADDAL)

    Toexecute

    this

    only

    if

    the

    zero

    flag

    is

    set:

    ADDEQr0,r1,r2 ;Ifzeroflagsetthen

    ;...r0=r1+r2

    Bydefault,

    data

    processing

    operations

    do

    not

    affect

    the

    condition

    flags

    (apart

    fromthecomparisonswherethisistheonlyeffect).Tocausethecondition

    flagstobeupdated,theSbitoftheinstructionneedstobesetbypostfixing

    theinstruction(andanyconditioncode)withanS.

    Forexample

    to

    add

    two

    numbers

    and

    set

    the

    condition

    flags:

    ADDSr0,r1,r2 ;r0=r1+r2 ;...

    andsetflags

    208/22/2008

    EE382N 4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    21/85

    EE382N-4 Embedded Systems Architecture

    ConditionalExecutionandFlags

    ARMinstructions

    can

    be

    made

    to

    execute

    conditionally

    by

    postfixing

    them

    with

    the

    appropriateconditioncodefield.

    Thisimprovescodedensityandperformancebyreducingthenumberofforwardbranchinstructions.

    CMP r3,#0 CMP r3,#0BEQ skip ADDNE r0,r1,r2

    ADD r0,r1,r2

    skip

    Bydefault,dataprocessinginstructionsdonotaffecttheconditioncodeflagsbuttheflagscanbeoptionallysetbyusingS. CMPdoesnotneedS.

    loop

    SUBS r1,r1,#1BNE loop

    218/22/2008

    if Z flag clear then branch

    decrement r1 and set flags

    EE382N 4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    22/85

    EE382N-4 Embedded Systems Architecture

    Branchinstructions(1)

    228/22/2008

    Branch: B{}label

    BranchwithLink: BL{}sub_routine_label

    Theoffsetforbranchinstructionsiscalculatedbytheassembler: Bytakingthedifferencebetweenthebranchinstructionandthetargetaddress

    minus8(toallowforthepipeline).

    Thisgivesa26bitoffsetwhichisrightshifted2bits(asthebottomtwobitsarealwayszeroasinstructionsareword aligned)andstoredintotheinstructionencoding.

    Thisgivesarangeof 32Mbytes.

    Conditionfield

    Linkbit 0=Branch1=Branchwithlink

    3

    1

    3

    0

    2

    9

    2

    8

    2

    7

    2

    6

    2

    5

    2

    4

    2

    3

    2

    2

    2

    1

    2

    0

    1

    9

    1

    8

    1

    7

    1

    6

    1

    5

    1

    4

    1

    3

    1

    2

    1

    1

    1

    0 9 8 7 6 5 4 3 2 1 0

    Condition 1 0 1 L BRANCH OFFSET

    EE382N 4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    23/85

    EE382N-4 Embedded Systems Architecture

    Branchinstructions(2)

    Whenexecuting

    the

    instruction,

    the

    processor:

    shiftstheoffsetlefttwobits,signextendsitto32bits,andaddsittoPC.

    ExecutionthencontinuesfromthenewPC,oncethepipelinehas

    beenrefilled.

    The"Branchwithlink"instructionimplementsasubroutinecall

    bywritingPC4intotheLRofthecurrentbank.

    i.e.

    the

    address

    of

    the

    next

    instruction

    following

    the

    branch

    with

    link

    (allowingforthepipeline).

    Toreturnfromsubroutine,simplyneedtorestorethePCfrom

    theLR:MOV

    pc,

    lr

    Again,pipelinehastorefillbeforeexecutioncontinues.

    238/22/2008

    EE382N 4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    24/85

    EE382N-4 Embedded Systems Architecture

    Branchinstructions(3)

    The"Branch"

    instruction

    does

    not

    affect

    LR.

    Note:Architecture4ToffersafurtherARMbranchinstruction,BXSeeThumbInstructionSetModulefordetails.

    BL

    StoresreturnaddressinLRReturningimplementedbyrestoringthePCfromLRFornonleaffunctions,LRwillhavetobestacked

    248/22/2008

    STMFDsp!,{regs,lr}:BLfunc2

    :LDMFDsp!,{regs,pc}

    func1 func2

    ::BLfunc1

    ::

    :::

    ::MOVpc,lr

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    25/85

    EE382N-4 Embedded Systems Architecture

    ConditionalBranches

    258/22/2008

    Branch Interpretation NormalusesB

    BAL

    Unconditional

    Always

    Alwaystakethisbranch

    Alwaystakethisbranch

    BEQ Equal Comparisonequalorzeroresult

    BNE Notequal Comparisonnotequalornonzeroresult

    BPL

    Plus

    Result

    positive

    or

    zero

    BMI Minus Resultminusornegative

    BCC

    BLO

    Carryclear

    Lower

    Arithmeticoperationdidnotgivecarryout

    Unsignedcomparisongavelower

    BCS

    BHS

    Carryset

    Higher

    or

    same

    Arithmeticoperationgavecarryout

    Unsigned

    comparison

    gave

    higher

    or

    same

    BVC Overflowclear Signedintegeroperation;nooverflowoccurred

    BVS Overflowset Signedintegeroperation;overflowoccurred

    BGT Greaterthan Signedintegercomparisongavegreaterthan

    BGE Greaterorequal Signedintegercomparisongavegreaterorequal

    BLT

    Less

    than

    Signed

    integer

    comparison

    gave

    less

    than

    BLE Lessorequal Signedintegercomparisongavelessthanorequal

    BHI Higher Unsignedcomparisongavehigher

    BLS Lowerorsame Unsignedcomparisongavelowerorsame

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    26/85

    EE382N-4 Embedded Systems Architecture

    DataprocessingInstructions

    Largestfamily

    of

    ARM

    instructions,

    all

    sharing

    the

    same

    instructionformat.

    Contains:

    Arithmeticoperations

    Comparisons(noresults justsetconditioncodes)LogicaloperationsDatamovementbetweenregisters

    Remember,this

    is

    aload

    /store

    architecture

    Theseinstructiononlyworkonregisters, NOT memory.

    Theyeachperformaspecificoperationononeortwooperands.

    Firstoperand

    always

    aregister

    Rn

    SecondoperandsenttotheALUviabarrelshifter.

    Wewillexaminethebarrelshiftershortly.

    268/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    27/85

    EE382N-4 Embedded Systems Architecture

    ArithmeticOperations

    Operationsare:

    ADD operand1+operand2 ;AddADC operand1+operand2+carry ;AddwithcarrySUB operand1 operand2 ;SubtractSBC operand1

    operand2

    +carry

    1

    ;Subtract

    with

    carry

    RSB operand2 operand1 ;ReversesubtractRSC operand2 operand1+carry 1 ;Reversesubtractwithcarry

    Syntax:{}{S}Rd,Rn,Operand2

    ExamplesADDr0,r1,r2SUBGT

    r3,

    r3,

    #1

    RSBLESr4,r5,#5

    278/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    28/85

    EE382N 4 Embedded Systems Architecture

    Comparisons

    Theonly

    effect

    of

    the

    comparisons

    is

    to

    update

    the

    condition

    flags.ThusnoneedtosetSbit.

    Operationsare:

    CMP operand1

    operand2 ;Compare

    CMN operand1+operand2 ;ComparenegativeTST operand1ANDoperand2 ;TestTEQ operand1EORoperand2 ;Testequivalence

    Syntax:{}Rn,Operand2

    Examples:

    CMP r0,r1

    TSTEQ r2,#5

    288/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    29/85

    EE382N 4 Embedded Systems Architecture

    LogicalOperations

    Operationsare:

    AND operand1ANDoperand2

    EOR operand1EORoperand2

    ORR operand1ORoperand2

    ORN

    operand1NOR

    operand2

    BIC operand1ANDNOToperand2[iebitclear]

    Syntax:

    {}{S}Rd,

    Rn,

    Operand2

    Examples:AND r0,r1,r2

    BICEQ r2,r3,#7

    EORS r1,r3,r0

    298/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    30/85

    EE382N 4 Embedded Systems Architecture

    DataMovement

    Operationsare:

    MOV operand2

    MVN NOToperand2

    Notethat

    these

    make

    no

    use

    of

    operand1.

    Syntax:{}{S}Rd,Operand2

    Examples:MOV r0,r1MOVS r2,#10

    MVNEQ r1,#0

    308/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    31/85

    EE382N 4 Embedded Systems Architecture

    TheBarrelShifter

    TheARM

    doesnt

    have

    actual

    shift

    instructions.

    Insteadithasabarrelshifterwhichprovidesamechanismto

    carryout

    shifts

    as

    part

    of

    other

    instructions.

    Sowhatoperationsdoesthebarrelshiftersupport?

    318/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    32/85

    38 bedded Syste s c tectu e

    BarrelShifter LeftShift

    Shiftsleft

    by

    the

    specified

    amount

    (multiplies

    by

    powers

    of

    two)

    e.g.LSL#5 =>multiplyby32

    328/22/2008

    LogicalShift

    Left

    (LSL)

    DestinationCF 0

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    33/85

    y

    BarrelShifter RightShifts

    338/22/2008

    LogicalShiftRight(LSR)

    Shiftsrightbythespecified

    amount(dividesbypowersof

    two)e.g.

    LSR#5=divideby32

    ArithmeticShiftRight(ASR)

    Shiftsright(dividesbypowersof

    two)andpreservesthesignbit,

    for2'scomplementoperations.e.g.

    ASR#5=divideby32

    Destination CF

    Destination CF

    LogicalShiftRight

    ArithmeticShiftRight

    ...0

    Signbitshiftedin

    zeroshiftedin

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    34/85

    y

    BarrelShifter Rotations

    348/22/2008

    RotateRight(ROR)

    SimilartoanASRbutthebitswraparoundastheyleavethe

    LSB

    and

    appear

    as

    the

    MSB.e.g. ROR#5

    NotethelastbitrotatedisalsousedastheCarryOut.

    RotateRightExtended(RRX)

    ThisoperationusestheCPSRCflagasa33rdbit.

    Rotatesrightby1bit.Encodedas ROR#0

    Destination CF

    RotateRight

    Destination CF

    RotateRightthroughCarry

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    35/85

    y

    UsingtheBarrelShifter:TheSecondOperand

    358/22/2008

    Register,optionallywithshift

    operationapplied.

    Shiftvaluecanbeeitherbe:

    5bitunsignedinteger

    Specifiedinbottombyteofanotherregister.

    * Immediatevalue

    8bitnumber

    Canberotatedrightthroughanevennumber

    ofpositions.

    Assemblerwillcalculaterotateforyoufromconstant.

    Operand1

    Result

    ALU

    BarrelShifter

    Operand2

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    36/85

    y

    SecondOperand:ShiftedRegister

    Theamount

    by

    which

    the

    register

    is

    to

    be

    shifted

    is

    contained

    in

    either:theimmediate5bitfieldintheinstruction

    NOOVERHEAD

    Shiftis

    done

    for

    free

    executes

    in

    single

    cycle.

    thebottombyteofaregister(notPC) Thentakesextracycletoexecute

    ARMdoesnthaveenoughreadportstoread3registersatonce.

    Thensame

    as

    on

    other

    processors

    where

    shift

    is

    separateinstruction.

    Ifnoshiftisspecifiedthenadefaultshiftisapplied:LSL#0i.e.barrelshifterhasnoeffectonvalueinregister.

    368/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    37/85

    SecondOperand:UsingaShiftedRegister

    Usingamultiplication

    instruction

    to

    multiply

    by

    aconstant

    means

    first

    loading

    theconstantintoaregisterandthenwaitinganumberofinternalcyclesfor

    theinstructiontocomplete.

    Amoreoptimumsolutioncanoftenbefoundbyusingsomecombinationof

    MOVs,ADDs,

    SUBs

    and

    RSBs

    with

    shifts.

    Multiplicationsbyaconstantequaltoa((powerof2) 1)canbedoneinonecycle.

    MOVR2,R0,LSL#2 ;ShiftR0leftby2,writetoR2,(R2=R0x4)

    ADDR9,

    R5,

    R5,

    LSL

    #3

    ;R9

    =R5

    +R5

    x8or

    R9

    =R5

    x9

    RSBR9,R5,R5,LSL#3 ;R9=R5x8 R5orR9=R5x7

    SUBR10,R9,R8,LSR#4;R10=R9 R8/16

    MOVR12,R4,RORR3 ;R12=R4rotatedrightbyvalueofR3

    378/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    38/85

    SecondOperand:ImmediateValue(1)

    Thereis

    no

    single

    instruction

    which

    will

    load

    a32

    bit

    immediate

    constant

    into

    aregisterwithoutperformingadataloadfrommemory. AllARMinstructionsare32bitslong ARMinstructionsdonotusetheinstructionstreamasdata.

    Thedata

    processing

    instruction

    format

    has

    12

    bits

    available

    for

    operand2

    Ifuseddirectlythiswouldonlygivearangeof4096.

    Insteaditisusedtostore8bitconstants,givingarangeof0 255.

    These

    8

    bits

    can

    then

    be

    rotated

    right

    through

    an

    even

    number

    of

    positions

    (ie

    RORsby0,2,4,..30). Thisgivesamuchlargerrangeofconstantsthatcanbedirectlyloaded,thoughsome

    constantswillstillneedtobeloadedfrommemory.

    388/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    39/85

    SecondOperand: ImmediateValue(2)

    Thisgives

    us:

    0 255 [0 0xff] 256,260,264,..,1020 [0x1000x3fc,step4,0x400xffror 30] 1024,1040,1056,..,4080 [0x4000xff0,step16,0x400xffror 28]

    4096,4160,

    4224,..,16320 [0x1000

    0x3fc0,

    step

    64,

    0x40

    0xff

    ror 26] Thesecanbeloadedusing,forexample:

    MOVr0,#0x40,26 ;=>MOVr0,#0x1000 (ie4096)

    Tomakethiseasier,theassemblerwillconverttothisformforusifsimply

    giventhe

    required

    constant:

    MOVr0,#4096 ;=>MOVr0,#0x1000 (ie0x40ror 26)

    ThebitwisecomplementscanalsobeformedusingMVN: MOVr0,#0xFFFFFFFF ;assemblestoMVNr0,#0

    Iftherequiredconstantcannotbegenerated,anerrorwillbereported.

    398/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    40/85

    Loadingfull32bitconstants

    Althoughthe

    MOV/MVN

    mechanism

    will

    load

    alarge

    range

    of

    constants

    into

    a

    register,sometimesthismechanismwillnotgeneratetherequiredconstant.

    Therefore,theassembleralsoprovidesamethodwhichwillloadANY32bit

    constant: LDR

    rd,=numeric

    constant

    IftheconstantcanbeconstructedusingeitheraMOVorMVNthenthiswillbe

    theinstructionactuallygenerated.

    Otherwise,the

    assembler

    will

    produce

    an

    LDR

    instruction

    with

    aPC

    relative

    addresstoreadtheconstantfromaliteralpool.LDRr0,=0x42 ;generates MOVr0,#0x42

    LDRr0,=0x55555555 ;generate LDRr0,[pc,offsettolitpool]

    :

    :

    DCD 0x55555555

    Asthismechanismwillalwaysgeneratethebestinstructionforagivencase,it

    is

    the

    recommended

    way

    of

    loading

    constants.

    408/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    41/85

    MultiplicationInstructions

    TheBasic

    ARM

    provides

    two

    multiplication

    instructions.

    MultiplyMUL{}{S}Rd,Rm,Rs ;Rd=Rm*Rs

    MultiplyAccumulate

    does

    addition

    for

    free

    MLA{}{S}Rd,Rm,Rs,Rn ;Rd=(Rm*Rs)+Rn

    Restrictionsonuse:

    Rd

    and

    Rm

    cannot

    be

    the

    same

    register CanbeavoidedbyswappingRmandRsaround.Thisworksbecausemultiplicationiscommutative.

    CannotusePC.

    Thesewill

    be

    picked

    up

    by

    the

    assembler

    if

    overlooked.

    OperandscanbeconsideredsignedorunsignedUptousertointerpretcorrectly.

    418/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    42/85

    MultiplicationImplementation

    TheARM

    makes

    use

    of

    Booths

    Algorithm

    to

    perform

    integer

    multiplication.

    OnnonMARMsthisoperateson2bitsofRsatatime.

    Foreach

    pair

    of

    bits

    this

    takes

    1cycle

    (plus

    1cycle

    to

    start

    with).

    Howeverwhentherearenomore1sleftinRs,themultiplicationwillearlyterminate.

    Example:Multiply18and 1:Rd=Rm*Rs

    Note:Compilerdoesnotuseearlyterminationcriteriato

    decideon

    which

    order

    to

    place

    operands.

    428/22/2008

    0000 00100001000000000000 0000 0000

    1111 1

    1111

    1111

    1111

    1111

    111 1

    111 1

    111

    Rm

    Rs

    17cycles

    Rs

    Rm

    4cycles

    18

    1

    18

    1

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    43/85

    ExtendedMultiplyInstructions

    Mvariants

    of

    ARM

    cores

    contain

    extended

    multiplication

    hardware.Thisprovidesthreeenhancements:An8bitBoothsAlgorithmisused

    Multiplicationiscarriedoutfaster(maximumforstandardinstructionsisnow5

    cycles).

    Earlyterminationmethodimprovedsothatnowcompletesmultiplicationwhenallremainingbitsetscontain allzeroes(aswithnonMARMs),or

    allones.

    Thusthepreviousexamplewouldearlyterminatein2cyclesinbothcases.

    64bitresultscannowbeproducedfromtwo32bitoperands

    Higheraccuracy.

    Pairofregistersusedtostoreresult.

    438/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    44/85

    MultiplyLong&MultiplyAccumulateLong

    Instructionsare

    MULLwhichgivesRdHi,RdLo:=Rm*RsMLALwhichgivesRdHi,RdLo:=(Rm*Rs)+RdHi,RdLo

    However

    the

    full

    64

    bit

    of

    the

    result

    now

    matter

    (lower

    precision

    multiplyinstructionssimplythrowstop32bitsaway)Needtospecifywhetheroperandsaresignedorunsigned

    Thereforesyntaxofnewinstructionsare:UMULL{}{S}

    RdLo,RdHi,Rm,Rs

    UMLAL{}{S}RdLo,RdHi,Rm,RsSMULL{}{S}RdLo,RdHi,Rm,RsSMLAL{}{S}RdLo,RdHi,Rm,Rs

    Notgeneratedbythecompiler.

    Warning:UnpredictableonnonMARMs.

    448/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    45/85

    Load/StoreInstructions

    TheARM

    is

    aLoad

    /Store

    Architecture:

    Doesnotsupportmemorytomemorydataprocessingoperations.Mustmovedatavaluesintoregistersbeforeusingthem.

    This

    might

    sound

    inefficient,

    but

    in

    practice

    it

    isnt:Loaddatavaluesfrommemoryintoregisters.Processdatainregistersusinganumberofdataprocessinginstructions

    whicharenotsloweddownbymemoryaccess.

    Storeresults

    from

    registers

    out

    to

    memory.

    TheARMhasthreesetsofinstructionswhichinteractwithmain

    memory.Theseare:Singleregisterdatatransfer(LDR/STR).Blockdatatransfer(LDM/STM).SingleDataSwap(SWP).

    458/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    46/85

    Singleregisterdatatransfer

    Thebasic

    load

    and

    store

    instructions

    are:

    LoadandStoreWordorByte LDR/STR/LDRB/STRB

    ARM

    Architecture

    Version

    4

    also

    adds

    support

    for

    Halfwords

    and

    signeddata.LoadandStoreHalfword

    LDRH/STRH

    LoadSigned

    Byte

    or

    Halfword

    load

    value

    and

    sign

    extend

    it

    to

    32

    bits. LDRSB/LDRSH

    Alloftheseinstructionscanbeconditionallyexecutedby

    insertingtheappropriateconditioncodeafterSTR/LDR.e.g.

    LDREQB

    Syntax:{}{}Rd,

    468/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    47/85

    LoadandStoreWordorByte: BaseRegister

    Thememory

    location

    to

    be

    accessed

    is

    held

    in

    abase

    register

    STRr0,[r1] ;Storecontentsofr0tolocationpointedto

    ;

    by

    contents

    of

    r1.LDRr2,[r1] ;Loadr2withcontentsofmemorylocation

    ;pointedtobycontentsofr1.

    478/22/2008

    r1

    0x200Base

    Register

    Memory

    0x50x200

    r0

    0x5Source

    RegisterforSTR

    r2

    0x5Destination

    RegisterforLDR

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    48/85

    Load/StoreWordorByte:OffsetsfromtheBaseRegister

    Aswell

    as

    accessing

    the

    actual

    location

    contained

    in

    the

    base

    register,theseinstructionscanaccessalocationoffsetfromthe

    baseregisterpointer.

    Thisoffset

    can

    be

    Anunsigned12bitimmediatevalue(ie0 4095bytes).Aregister,optionallyshiftedbyanimmediatevalue

    Thiscanbeeitheraddedorsubtractedfromthebaseregister:Prefixtheoffsetvalueorregisterwith+(default)or.

    Thisoffsetcanbeapplied:beforethetransferismade:Preindexedaddressing

    optionallyauto

    incrementing

    the

    base

    register,

    by

    postfixing

    the

    instruction

    with

    an!.

    afterthetransferismade:Postindexedaddressing causingthebaseregistertobeautoincremented.

    488/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    49/85

    Load/StoreWordorByte:PreindexedAddressing

    Example:STR

    r0,

    [r1,#12]

    Tostoretolocation0x1f4insteaduse:STRr0,[r1,#12]Toautoincrementbasepointerto0x20cuse:STRr0,[r1,#12]!Ifr2contains3,access0x20cbymultiplyingthisby4:

    STRr0,[r1,r2,LSL#2]

    498/22/2008

    r1

    0x200

    Base

    Register

    Memory

    0x5

    0x200

    r0

    0x5Source

    RegisterforSTR

    Offset

    12 0x20c

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    50/85

    LoadandStoreWordorByte:PostindexedAddressing

    Example:STR

    r0,

    [r1],

    #12

    Toautoincrementthebaseregistertolocation0x1f4insteaduse: STRr0,[r1],#12

    Ifr2contains3,autoincrementbaseregisterto0x20cbymultiplyingthisby4: STRr0,[r1],r2,LSL#2

    508/22/2008

    r1

    0x200Original

    BaseRegister

    Memory

    0x50x200

    r0

    0x5Source

    Registerfor STR

    Offset12 0x20c

    r10x20c

    UpdatedBase

    Register

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    51/85

    LoadandStoreswithUserModePrivilege

    Whenusing

    post

    indexed

    addressing,

    there

    is

    afurther

    form

    of

    Load/StoreWord/Byte:{}{B}TRd,

    Whenusedinaprivilegedmode,thisdoestheload/storewith

    usermodeprivilege.

    Normallyusedbyanexceptionhandlerthatisemulatingamemoryaccess

    instructionthatwouldnormallyexecuteinusermode.

    518/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    52/85

    ExampleUsageofAddressingModes

    Imaginean

    array,

    the

    first

    element

    of

    which

    is

    pointed

    to

    by

    the

    contents

    of

    r0.

    Ifwewanttoaccessaparticularelement,

    thenwecanusepreindexedaddressing: r1iselementwewant.

    LDRr2,

    [r0,

    r1,

    LSL

    #2]

    Ifwewanttostepthroughevery

    element

    of

    the

    array,

    for

    instancetoproducesumofelementsinthe

    array,thenwecanusepostindexedaddressingwithinaloop: r1isaddressofcurrentelement(initiallyequaltor0). LDRr2,[r1],#4

    Useafurtherregistertostoretheaddressoffinalelement,

    sothattheloopcanbecorrectlyterminated.

    528/22/2008

    0

    1

    2

    3

    element

    0

    4

    8

    12

    MemoryOffset

    r0

    Pointer tostart of array

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    53/85

    OffsetsforHalfwordandSignedHalfword/ByteAccess

    TheLoad

    and

    Store

    Halfword

    and

    Load

    Signed

    Byte

    or

    Halfword

    instructionscanmakeuseofpre andpostindexedaddressingin

    muchthesamewayasthebasicloadandstoreinstructions.

    Howeverthe

    actual

    offset

    formats

    are

    more

    constrained:

    Theimmediatevalueislimitedto8bits(ratherthan12bits)givinganoffsetof0255bytes.

    Theregisterformcannothaveashiftappliedtoit.

    538/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    54/85

    Effectofendianess

    TheARM

    can

    be

    set

    up

    to

    access

    its

    data

    in

    either

    little

    or

    big

    endianformat.

    Littleendian:

    Leastsignificant

    byte

    of

    aword

    is

    stored

    in

    bits

    07of

    an

    addressed

    word.

    Bigendian:Leastsignificantbyteofawordisstoredinbits2431ofanaddressedword.

    Thishas

    no

    real

    relevance

    unless

    data

    is

    stored

    as

    words

    and

    then

    accessedinsmallersizedquantities(halfwords orbytes).Whichbyte/halfwordisaccessedwilldependontheendianess ofthe

    systeminvolved.

    548/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    55/85

    YAEndianess Example

    558/22/2008

    Big-endianLittle-endian

    r1 = 0x100

    r0 = 0x1122334431 24 23 16 15 8 7 0

    11 22 33 44

    31 24 23 16 15 8 7 0

    11 22 33 44

    31 24 23 16 15 8 7 0

    44 33 22 11

    31 24 23 16 15 8 7 0

    00 00 00 44

    31 24 23 16 15 8 7 0

    00 00 00 11

    r2 = 0x44 r2 = 0x11

    STR r0, [r1]

    LDRB r2, [r1]

    r1 = 0x100Memory

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    56/85

    BlockDataTransfer(1)

    TheLoad

    and

    Store

    Multiple

    instructions

    (LDM

    /STM)

    allow

    betweeen1and16registerstobetransferredtoorfrom

    memory.

    Thetransferred

    registers

    can

    be

    either:

    Anysubsetofthecurrentbankofregisters(default).Anysubsetoftheusermodebankofregisterswheninapriviledgedmode

    (postfixinstructionwitha^).

    568/22/2008

    Cond 1 0 0 P U S W L Rn Register list

    Condition field Base registerLoad/Store bit0 = Store to memory1 = Load from memory

    Write- back bit0 = no write-back1 = write address into base

    PSR and force user bit0 = dont load PSR or force user mode1 = load PSR or force user mode

    Up/Down bit0 = Down; subtract offset from base1 = Up ; add offset to base

    Pre/Post indexing bit0 = Post; add offset after transfer,1 = Pre ; add offset before transfer

    2831 22 16 023 21 1527 20 1924

    Each bit corresponds to a particularregister. For example: Bit 0 set causes r0 to be transferred. Bit 0 unset causes r0 not to be transferred.

    At least one register must betransferred as the list cannot be empty.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    57/85

    BlockDataTransfer(2)

    Baseregister

    used

    to

    determine

    where

    memory

    access

    should

    occur.4differentaddressingmodesallowincrementanddecrementinclusiveor

    exclusiveofthebaseregisterlocation.

    Baseregister

    can

    be

    optionally

    updated

    following

    the

    transfer

    (by

    appending

    itwithan!.

    Lowestregisternumberisalwaystransferredto/fromlowestmemorylocationaccessed.

    TheseinstructionsareveryefficientforSavingandrestoringcontext

    Forthisusefultoviewmemoryasastack.

    Movinglarge

    blocks

    of

    data

    around

    memory

    Forthisusefultodirectlyrepresentfunctionalityoftheinstructions.

    578/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    58/85

    Stacks

    Astack

    is

    an

    area

    of

    memory

    which

    grows

    as

    new

    data

    is

    pushedontothetopofit,andshrinksasdataispoppedoff

    thetop.

    Twopointers

    define

    the

    current

    limits

    of

    the

    stack.

    Abasepointer usedtopointtothebottomofthestack(thefirstlocation).

    Astackpointer

    usedto

    point

    the

    current

    top

    of

    the

    stack.

    588/22/2008

    SPBASE

    PUSH{1,2,3}

    1

    23

    BASE

    SP

    POP

    1

    2Result ofpop = 3

    BASE

    SP

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    59/85

    StackOperation

    Traditionally,astack

    grows

    down

    in

    memory,

    with

    the

    last

    pushed

    value

    at

    thelowestaddress.TheARMalsosupportsascendingstacks,wherethestack

    structuregrowsupthroughmemory.

    Thevalueofthestackpointercaneither:

    Pointto

    the

    last

    occupied

    address

    (Full

    stack)

    andsoneedspredecrementing(iebeforethepush)

    Pointtothenextoccupiedaddress(Emptystack) andsoneedspostdecrementing(ieafterthepush)

    Thestack

    type

    to

    be

    used

    is

    given

    by

    the

    postfix

    to

    the

    instruction:

    STMFD/LDMFD:FullDescendingstack STMFA/LDMFA:FullAscendingstack. STMED/LDMED:EmptyDescendingstack

    STMEA

    /

    LDMEA

    :

    Empty

    Ascending

    stack Note:ARMCompilerwillalwaysuseaFulldescendingstack.

    598/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    60/85

    StackExamples

    608/22/2008

    STMFD sp!,{r0,r1,r3-r5}

    r5

    r4

    r3

    r1r0SP

    Old SP

    STMED sp!,{r0,r1,r3-r5}

    r5r4

    r3

    r1

    r0SP

    Old SP

    r5

    r4

    r3

    r1

    r0

    STMFA sp!,{r0,r1,r3-r5}

    SP

    Old SP 0x400

    0x418

    0x3e8

    STMEA sp!,{r0,r1,r3-r5}

    r5

    r4

    r3

    r1

    r0

    SP

    Old SP

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    61/85

    StacksandSubroutines

    Oneuse

    of

    stacks

    is

    to

    create

    temporary

    register

    workspace

    for

    subroutines.

    Anyregistersthatareneededcanbepushedontothestackatthestartofthe

    subroutineandpoppedoffagainattheendsoastorestorethembefore

    returntothecaller:

    STMFD sp!,{r0-r12, lr} ; stack all registers

    ........ ; and the return address

    ........

    LDMFD sp!,{r0-r12, pc} ; load all the registers

    ; and return automatically

    SeethechapterontheARMProcedureCallStandardintheSDTReference

    Manualforfurtherdetailsofregisterusagewithinsubroutines.

    IfthepopinstructionalsohadtheSbitset(using^)thenthetransferofthe

    PCwhen

    in

    aprivileged

    mode

    would

    also

    cause

    the

    SPSR

    to

    be

    copied

    into

    the

    CPSR(seeexceptionhandlingmodule).

    618/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    62/85

    DirectfunctionalityofBlockDataTransfer

    WhenLDM

    /STM

    are

    not

    being

    used

    to

    implement

    stacks,

    it

    is

    clearertospecifyexactlywhatfunctionalityoftheinstructionis:i.e.specifywhethertoincrement/decrementthebasepointer,beforeor

    afterthememoryaccess.

    Inordertodothis,LDM/STMsupportafurthersyntaxin

    additiontothestackone:STMIA/LDMIA:IncrementAfter

    STMIB/LDMIB

    :Increment

    Before

    STMDA/LDMDA:DecrementAfterSTMDB/LDMDB:DecrementBefore

    628/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    63/85

    Example:BlockCopy

    Copyablock

    of

    memory,

    which

    is

    an

    exact

    multiple

    of

    12

    words

    long

    from

    the

    locationpointedtobyr12tothelocationpointedtobyr13.r14pointstothe

    endofblocktobecopied.

    ; r12 points to the start of the source data

    ; r14 points to the end of the source data

    ; r13 points to the start of the destination data

    loop LDMIA r12!, {r0-r11} ; load 48 bytes

    STMIA r13!, {r0-r11} ; and store them

    CMP r12, r14 ; check for the end

    BNE loop ; and loop until done

    Thislooptransfers48bytesin31cyclesOver50Mbytes/secat33MHz

    638/22/2008

    r13

    r14

    r12

    IncreasingMemory

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    64/85

    SwapandSwapByteInstructions

    Atomicoperation

    of

    amemory

    read

    followed

    by

    amemory

    write

    whichmovesbyteorwordquantitiesbetweenregistersand

    memory.

    Syntax:SWP{}{B}Rd,Rm,[Rn]

    ToimplementanactualswapofcontentsmakeRd=Rm.

    Thecompilercannotproducethisinstruction.

    648/22/2008

    Rm Rd

    Rn

    32

    1temp

    Memory

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    65/85

    SoftwareInterrupt(SWI)

    Ineffect,aSWIisauserdefinedinstruction.

    Itcauses

    an

    exception

    trap

    to

    the

    SWI

    hardware

    vector

    (thus

    causingachangetosupervisormode,plustheassociatedstate

    saving),thuscausingtheSWIexceptionhandlertobecalled.

    Thehandler

    can

    then

    examine

    the

    comment

    field

    of

    the

    instructiontodecidewhatoperationhasbeenrequested.

    BymakinguseoftheSWImechanism,anoperatingsystemcan

    implementaset

    of

    privileged

    operations

    which

    applications

    runninginusermodecanrequest.

    SeeExceptionHandlingModuleforfurtherdetails.

    658/22/2008

    31

    30

    29

    28

    27

    26

    25

    24

    23

    22

    21

    20

    19

    18

    17

    16

    15

    14

    13

    12

    11

    10 9 8 7 6 5 4 3 2 1 0 InstructionType

    Condition 1 1 1 1 SWI NUMBER SoftwareInterrupt

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    66/85

    Backup

    8/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    67/85

    Assembler:Pseudoops

    AREA >chunksofdata($data)orcode($code)

    ADR >loadaddressintoaregister

    ADRR0,BUFFER

    ALIGN >adjustlocationcountertowordboundaryusuallyaftera

    storagedirective

    END >nomoretoassemble

    678/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    68/85

    Assembler:Pseudoops

    DCD >definedwordvaluestoragearea

    BOW DCD 1024,2055,9051

    DCB >definedbytevaluestoragearea

    BOBDCB10,12,15

    % >zeroedoutbytestoragearea

    BLBYTE%30

    688/22/2008

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    69/85

    Assembler:Pseudoops

    IMPORT >nameofroutinetoimportforuseinthisroutine

    IMPORT_printf;Cprintroutine

    EXPORT >nameofroutinetoexportforuseinotherroutines

    EXPORTadd2;add2routine

    EQU >symbolreplacement

    loopcntEQU5

    698/22/2008

    EE382N-4 Embedded Systems Architecture

    bl

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    70/85

    AssemblyLineFormat

    708/22/2008

    label instruction ; comment

    label: created by programmer, alphanumeric

    whitespace: space(s) or tab character(s)

    instruction: op-code mnemonic or pseudo-op with required fields

    comment: preceded by ;ignored by assembler but usefulto the programmer for documentation

    NOTE: All fields are optional.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    71/85

    Example:Cassignments

    C:

    x = (a + b) - c;

    Assembler:

    ADR r4,a ; get address for a

    LDR r0,[r4] ; get value of a

    ADR r4,b ; get address for b, reusing r4

    LDR r1,[r4] ; get value of b

    ADD r3,r0,r1 ; compute a+b

    ADR r4,c ; get address for c

    LDR r2,[r4] ; get value of cSUB r3,r3,r2 ; complete computation of x

    ADR r4,x ; get address for x

    STR r3,[r4] ; store value of x

    718/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    72/85

    Example:Cassignment

    C:y = a*(b+c);

    Assembler:

    ADR r4,b ; get address for bLDR r0,[r4] ; get value of b

    ADR r4,c ; get address for c

    LDR r1,[r4] ; get value of c

    ADD r2,r0,r1 ; compute partial resultADR r4,a ; get address for a

    LDR r0,[r4] ; get value of a

    MUL r2,r2,r0 ; compute final value for y

    ADR r4,y ; get address for ySTR r2,[r4] ; store y

    728/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    73/85

    Example:Cassignment

    C:z = (a

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    74/85

    Example:ifstatement

    C:

    if (a > b) { x = 5; y = c + d; } else x = c - d;

    Assembler:

    ; compute and test condition

    ADR r4,a ; get address for a

    LDR r0,[r4] ; get value of a

    ADR r4,b ; get address for b

    LDR r1,[r4] ; get value for b

    CMP r0,r1 ; compare a < b

    BLE fblock ; if a >

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    75/85

    ifstatement,contd.

    ; true block

    MOV r0,#5 ; generate value for x

    ADR r4,x ; get address for x

    STR r0,[r4] ; store xADR r4,c ; get address for c

    LDR r0,[r4] ; get value of c

    ADR r4,d ; get address for d

    LDR r1,[r4] ; get value of d

    ADD r0,r0,r1 ; compute y

    ADR r4,y ; get address for y

    STR r0,[r4] ; store yB after ; branch around false block

    758/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    76/85

    ifstatement,contd.

    ; false block

    fblock ADR r4,c ; get address for c

    LDR r0,[r4] ; get value of c

    ADR r4,d ; get address for d

    LDR r1,[r4] ; get value for d

    SUB r0,r0,r1 ; compute a-b

    ADR r4,x ; get address for xSTR r0,[r4] ; store value of x

    after ...

    768/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    77/85

    Example:Conditionalinstructionimplementation

    ; true block

    MOVLT r0,#5 ; generate value for x

    ADRLT r4,x ; get address for x

    STRLT r0,[r4] ; store x

    ADRLT r4,c ; get address for c

    LDRLT r0,[r4] ; get value of c

    ADRLT r4,d ; get address for dLDRLT r1,[r4] ; get value of d

    ADDLT r0,r0,r1 ; compute y

    ADRLT r4,y ; get address for ySTRLT r0,[r4] ; store y

    778/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    78/85

    Conditionalinstructionimplementation,contd.

    ; false block

    ADRGE r4,c ; get address for c

    LDRGE r0,[r4] ; get value of c

    ADRGE r4,d ; get address for d

    LDRGE r1,[r4] ; get value for d

    SUBGE r0,r0,r1 ; compute a-b

    ADRGE r4,x ; get address for xSTRGE r0,[r4] ; store value of x

    788/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

    l h

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    79/85

    Example:switchstatement

    C:

    switch (test) { case 0: break; case 1: }

    Assembler:

    ADR r2,test ; get address for test

    LDR r0,[r2] ; load value for test

    ADR r1,switchtab ; load address for switch table

    LDR r1,[r1,r0,LSL #2] ; index switch table

    switchtab DCD case0

    DCD case1

    ...

    798/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

    l fil

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    80/85

    Example:FIRfilter

    C:for (i=0, f=0; i

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    81/85

    FIRfilter,cont.d

    ADR r3,c ; load r3 with base of c

    ADR r5,x ; load r5 with base of x

    ; loop body

    loop LDR r4,[r3,r8] ; get c[i]

    LDR r6,[r5,r8] ; get x[i]

    MUL r4,r4,r6 ; compute c[i]*x[i]

    ADD r2,r2,r4 ; add into running sumADD r8,r8,#4 ; add one word offset to array index

    ADD r0,r0,#1 ; add 1 to i

    CMP r0,r1 ; exit?BLT loop ; if i < N, continue

    818/22/2008

    2008WayneWolf ComputersasComponents2nded.

    EE382N-4 Embedded Systems Architecture

    ARM Instruction Set Summary (1/4)

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    82/85

    ARMInstructionSetSummary(1/4)

    82

    EE382N-4 Embedded Systems Architecture

    ARM I t ti S t S (2/4)

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    83/85

    83

    ARMInstructionSetSummary(2/4)

    EE382N-4 Embedded Systems Architecture

    ARM I t ti S t S (3/4)

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    84/85

    84

    ARMInstructionSetSummary(3/4)

    EE382N-4 Embedded Systems Architecture

    ARM I t ti S t S (4/4)

  • 7/29/2019 ARM ASSEMBLY TUTORIAL

    85/85

    ARMInstructionSetSummary(4/4)