[PPT]PowerPoint Presentation - University of Calgary...

A comparison of DSP Architectures: BlackFin ADSP-BFXXX Compute Unit. Based on an ENEL619.23 white paper prepared by Darrell Anklovitch


Page 1: [PPT]PowerPoint Presentation - University of Calgary …people.ucalgary.ca/.../04March/04BlackfinCompute.ppt · Web viewA comparison of DSP Architectures BlackFin ADSP-BFXXX Compute

A comparison of DSP Architectures BlackFin ADSP-BFXXX Compute Unit

Based on an ENEL619.23 white paper prepared by Darrell Anklovitch

Page 2

Overview

• Architecture Overview
• Register Map
• ALU features and sample instructions
• Multiplier features and sample instructions
• Shifter features and sample instructions

Page 3

References

• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices. – Section 2

• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices. – Sections 8 ~ 10, 14 & 15

• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.

Page 4

ADSP-2106x Core Architecture

[Figure: ADSP-2106x core block diagram. Blocks shown: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), cache memory (32 x 48), program sequencer, timer, flags, JTAG test & emulation, bus connect, register file (16 x 40), floating-point & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU. Buses shown: PMA bus (24 bit), DMA bus (32 bit), PMD bus (48 bit), DMD bus (40 bit).]

Page 5

Register File and COMPUTE Units

• Key issues
– 5 data paths FROM COMPUTE units
– 5 data paths TO COMPUTE units
– Highly parallel operations UNDER THE RIGHT CONDITIONS

Page 6

BF533 Memory Accesses

Under the right conditions, 4 memory accesses at the same time:
64 bit Instruction Fetch, 2 x 32 bit Data Loads, 32 bit Data Store

PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time

PLUS background DMA activity

Page 7

Compute Unit Architecture

2 Multipliers

2 ALUs

1 set of Video ALUs

Shifter

Register File

Page 8

Register File

8 x 32 bit OR 16 x 16 bit

2 x 40 bit accumulators

DATA REGISTER SYNTAX:
• R0, R1 etc. refer to 32 bit registers
• R0.L refers to the low 16 bits of the 32 bit R0 register
• R0.H refers to the high 16 bits of the R0 register

ACCUMULATOR SYNTAX:
• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => most significant 8 bit extension
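The register naming above can be sketched as plain bit-field extraction. This is a minimal Python model, not Blackfin code; the helper names `low_half`, `high_half` and `acc_fields` are made up for illustration, and the accumulator is assumed to be held as an unsigned 40 bit integer.

```python
def low_half(reg32):
    """R0.L: low 16 bits of a 32 bit data register."""
    return reg32 & 0xFFFF


def high_half(reg32):
    """R0.H: high 16 bits of a 32 bit data register."""
    return (reg32 >> 16) & 0xFFFF


def acc_fields(acc40):
    """Split a 40 bit accumulator value into its named fields."""
    acc40 &= (1 << 40) - 1               # keep only the 40 accumulator bits
    return {
        "L": acc40 & 0xFFFF,             # A0.L: low 16 bits
        "H": (acc40 >> 16) & 0xFFFF,     # A0.H: next 16 bits
        "W": acc40 & 0xFFFFFFFF,         # A0.W: least significant 32 bit word
        "X": (acc40 >> 32) & 0xFF,       # A0.X: most significant 8 bit extension
    }


r0 = 0x12345678
fields = acc_fields(0xAB12345678)
print(hex(low_half(r0)), hex(high_half(r0)))
print({name: hex(value) for name, value in fields.items()})
```

Note that A0.W overlaps A0.H and A0.L: it is the same 32 bits viewed as one word.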

SHARC: 16 32-bit data registers, integer and float
There is a pair of SHARC accumulator registers too

Page 9

ALU Data Flow

2 x 32 bit paths to dual Multiplier/ALU units

2 x 32 bit paths back to register file

Page 10

Sample instructions

Blackfin
R0 = R1 + R2;
R0.L = R1.L + R2.H;
R0 = R1 +|- R2;
which means
R0.L = R1.L - R2.L
in parallel with
R0.H = R1.H + R2.H

SHARC
R0 = R1 + R2;
Closest: R0 = R1 + R2, R4 = R1 - R2;

68K
MOVE.L R2, R0
ADD.L R1, R0

MOVE.W R2, R0
ADD.W R1, R0

MOVE.L R2, R0
ASR.L #16, R0
MOVE.L R1, R3
ASR.L #16, R3
ADD.W R3, R0
ASL.L #16, R0
MOVE.W R2, R0
ADD.W R1, R0
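As a rough illustration of what the Blackfin `R0 = R1 +|- R2;` form computes, here is a small Python model. It is a sketch only: the helper names are hypothetical, and each 16 bit half is assumed to wrap around on overflow (no saturation option is modelled).

```python
MASK16 = 0xFFFF


def halves(reg32):
    """Return (high, low) 16 bit halves of a 32 bit register value."""
    return (reg32 >> 16) & MASK16, reg32 & MASK16


def pack(high, low):
    """Combine two 16 bit halves into one 32 bit register value."""
    return ((high & MASK16) << 16) | (low & MASK16)


def dual_add_sub(r1, r2):
    """Model of R0 = R1 +|- R2: high halves added, low halves subtracted."""
    h1, l1 = halves(r1)
    h2, l2 = halves(r2)
    # Assumption: results wrap modulo 2**16 (pack() masks each half).
    return pack(h1 + h2, l1 - l2)


r0 = dual_add_sub(0x00030007, 0x00010002)
print(hex(r0))  # high half 3 + 1, low half 7 - 2
```

Both halves are produced by one call here, mirroring the point that the hardware does the two 16 bit operations in a single instruction where the 68K needs a sequence.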

Page 11

ALU Features

Can be:
• Single 16 bit ops
• Dual 16 bit ops
• Dual 16 bit cross ops
• Single 32 bit ops

[Figure: register diagrams for each case, showing the 32 bit registers Rm, Rp and Rn]

Page 12

ALU Sample Instructions

[Figure: sample instructions for single 16 bit, dual 16 bit, quad 16 bit, single 32 bit and dual 32 bit ops, with operand registers labelled A, B, C and D]

• A & B registers must stay on the same side of the '|' for both instructions
• For dual and quad 16 bit operations the (CO) option causes the destination registers to cross

Callouts on the figure:
• Operator order is important: + must come before -
• Does not work in parallel
• Must have this option
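The effect of the (CO) cross option on a dual 16 bit operation can be sketched as follows. This is an illustrative Python model under stated assumptions: the function name `dual_op` is hypothetical, the operation modelled is the `+|-` form (high halves added, low halves subtracted), and overflow wraps rather than saturates.

```python
def dual_op(r_a, r_b, cross=False):
    """Model of Rd = Ra +|- Rb, optionally with the (CO) cross option."""
    high = (((r_a >> 16) & 0xFFFF) + ((r_b >> 16) & 0xFFFF)) & 0xFFFF
    low = ((r_a & 0xFFFF) - (r_b & 0xFFFF)) & 0xFFFF
    if cross:
        # Assumption: (CO) swaps which destination half receives each result.
        high, low = low, high
    return (high << 16) | low


plain = dual_op(0x00030007, 0x00010002)
crossed = dual_op(0x00030007, 0x00010002, cross=True)
print(hex(plain), hex(crossed))
```

The operands are read the same way in both cases; only the placement of the two 16 bit results in the destination register changes.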

Page 13

Multiply Data Flow

2 x 32 bit paths to dual Multiplier/ALU units

2 x 32 bit paths back to register file

2 x 40 bit accumulators

The multipliers share the same operand/result buses as the ALUs

Page 14

Multiply Features

[Figure: the four operand half combinations: H x H, H x L, L x H, L x L]

• Multiplies are signed fractional by default
• The signed fractional multiply result is automatically left shifted 1 bit
• Signed fractional multiply != signed integer multiply
• Rounding is available on fractional multiplies and, as a special option, on integer multiplies
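To make the difference between the two multiply modes concrete, here is a small Python sketch. It assumes 16 bit operands in 1.15 fractional format producing a 1.31 result; the function names are hypothetical, and the clamp for the -1.0 x -1.0 case is an assumption of this model, added because that product does not fit in 1.31.

```python
def to_signed16(x):
    """Interpret a 16 bit pattern as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_integer(a16, b16):
    """Signed integer multiply: the plain 32 bit product."""
    return (to_signed16(a16) * to_signed16(b16)) & 0xFFFFFFFF


def mult_fractional(a16, b16):
    """Signed fractional multiply: the product shifted left by 1 bit."""
    product = (to_signed16(a16) * to_signed16(b16)) << 1
    # Assumption: clamp the one unrepresentable case, -1.0 x -1.0.
    if product > 0x7FFFFFFF:
        product = 0x7FFFFFFF
    return product & 0xFFFFFFFF


half = 0x4000  # 0.5 in 1.15 format
print(hex(mult_integer(half, half)))     # integer view: 16384 * 16384
print(hex(mult_fractional(half, half)))  # fractional view: 0.5 * 0.5 = 0.25
```

The same bit patterns give results that differ by the 1 bit shift, which is why the two modes are not interchangeable.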

Page 15

Rounding

2 cases:

[Figure: two rounding cases. In each, 0x8000 is added to a 32 bit value and the top 16 bits go to the destination register Rd; one case shows the 32 bit result formed from Rm and Rp.]

Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register
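The rounding step described above can be sketched in a few lines of Python. This models only the "add 0x8000, keep the top 16 bits" behaviour stated on the slide; the name `round_to_16` is hypothetical, and the clamp on overflow is an assumption of this sketch rather than something the slide specifies.

```python
def round_to_16(value32):
    """Add 0x8000 to a signed 32 bit value and keep the top 16 bits."""
    value32 &= 0xFFFFFFFF
    signed = value32 - (1 << 32) if value32 & 0x80000000 else value32
    rounded = signed + 0x8000
    # Assumption: clamp instead of wrapping if the addition overflows 32 bits.
    rounded = min(rounded, 0x7FFFFFFF)
    return (rounded >> 16) & 0xFFFF


print(hex(round_to_16(0x12347FFF)))  # just below half an LSB: rounds down
print(hex(round_to_16(0x12348000)))  # exactly half an LSB: rounds up
```

Adding 0x8000 is the same as adding half of the least significant bit that survives the extraction, so the truncation that follows becomes a round to nearest.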

Page 16

Fractional Multiply

• When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
• Where in the destination register it goes depends on which accumulator is being extracted from

Fractional Multiply != Integer Multiply

Page 17

Integer Multiply

• When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from

Fractional Multiply != Integer Multiply
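The two extraction rules can be compared side by side in a short Python sketch. The function names are hypothetical, rounding and saturation are deliberately left out, and the placement rule (A1 to the high half, A0 to the low half) is taken from the deck's own sample instruction `R3.H = (A1 += ...), R3.L = (A0 += ...)`.

```python
def extract_16(acc, fractional=True):
    """High 16 bits of the 32 bit word for fractional, low 16 bits for integer."""
    word = acc & 0xFFFFFFFF
    # Assumption: plain truncation, with no rounding or saturation.
    return (word >> 16) & 0xFFFF if fractional else word & 0xFFFF


def write_half(dest32, value16, accumulator):
    """Place a 16 bit result in the destination half tied to the accumulator."""
    if accumulator == 1:
        return (dest32 & 0x0000FFFF) | ((value16 & 0xFFFF) << 16)  # A1 -> Rd.H
    return (dest32 & 0xFFFF0000) | (value16 & 0xFFFF)              # A0 -> Rd.L


acc = 0x12345678
r3 = write_half(0, extract_16(acc, fractional=True), accumulator=1)
r3 = write_half(r3, extract_16(acc, fractional=False), accumulator=0)
print(hex(r3))
```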

Page 18

Multiply Sample Instructions

Multi-issue MAC Instruction Examples

Accumulate only:
A1 += R1.H * R2.L , A0 += R1.L * R2.L;

16 bit extraction from ACC 1 and from ACC 0:
R3.H = (A1 += R1.H * R2.L) , R3.L = (A0 += R1.L * R2.L);
Any combination of .H and .L in the 2 operands is allowed

32 bit extraction:
R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L);
Where destination registers must be paired as follows: R[1,0], R[3,2], R[5,4] and R[7,6]

16 bit extraction from ACC 1 only:
R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L;
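The accumulate-only form `A1 += R1.H * R2.L , A0 += R1.L * R2.L;` can be modelled roughly as below. This is a Python sketch under several assumptions: the default signed fractional mode is used, accumulators are held as signed Python integers clamped to 40 bits, the -1.0 x -1.0 corner case is not treated specially, and all helper names are hypothetical.

```python
ACC_MAX = (1 << 39) - 1
ACC_MIN = -(1 << 39)


def s16(x):
    """Interpret a 16 bit pattern as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_product(a16, b16):
    """Signed fractional product: multiply, then shift left 1 bit."""
    return (s16(a16) * s16(b16)) << 1


def saturate40(value):
    """Assumption: accumulators clamp at the signed 40 bit limits."""
    return max(ACC_MIN, min(ACC_MAX, value))


def dual_mac(a1, a0, r1, r2):
    """Model of A1 += R1.H * R2.L , A0 += R1.L * R2.L."""
    r1_h, r1_l = (r1 >> 16) & 0xFFFF, r1 & 0xFFFF
    r2_l = r2 & 0xFFFF
    a1 = saturate40(a1 + frac_product(r1_h, r2_l))
    a0 = saturate40(a0 + frac_product(r1_l, r2_l))
    return a1, a0


r1 = 0x40002000  # R1.H = 0.5, R1.L = 0.25 in 1.15 format
r2 = 0x00004000  # R2.L = 0.5
a1, a0 = dual_mac(0, 0, r1, r2)
print(hex(a1), hex(a0))
```

Calling `dual_mac` repeatedly with new operands gives the running sums that the extraction forms then copy into a data register.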

Page 19

Shifter Sample Instructions

[Figure: sample shifter instructions, labelled as follows]
• 2 operand register shifts
• 2 operand immediate shifts
• 3 operand register shift
• 3 operand immediate shift
• Arithmetic shift
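Since the slide singles out the arithmetic shift, a short Python sketch of how it differs from a logical right shift on a 32 bit register may help. The function names are hypothetical and the model covers right shifts only; as far as I recall, Blackfin assembly writes the arithmetic form with `>>>` and the logical form with `>>`, but treat that as an assumption and check the instruction set reference.

```python
def logical_shift_right(value32, count):
    """Shift right, filling the vacated high bits with zeros."""
    return (value32 & 0xFFFFFFFF) >> count


def arithmetic_shift_right(value32, count):
    """Shift right, filling the vacated high bits with copies of the sign bit."""
    value32 &= 0xFFFFFFFF
    signed = value32 - (1 << 32) if value32 & 0x80000000 else value32
    return (signed >> count) & 0xFFFFFFFF


x = 0x80000000
print(hex(logical_shift_right(x, 4)))
print(hex(arithmetic_shift_right(x, 4)))
```

For non-negative values the two shifts agree; they differ only when the sign bit is set.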

Page 20

Parallel Instruction Examples

• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
• Examples:
– A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
– R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
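The arithmetic part of the second example, `R2 = R2 +|+ R4, R4 = R2 -|- R4`, is worth a closer look because R2 and R4 are both sources and destinations. The Python sketch below models it on the assumption that both results are computed from the original register values and that each 16 bit half wraps on overflow; the helper names are hypothetical, and the two memory/index operations issued in parallel are not modelled.

```python
def add16x2(a, b):
    """Add high halves and low halves independently, wrapping at 16 bits."""
    high = ((a >> 16) + (b >> 16)) & 0xFFFF
    low = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


def sub16x2(a, b):
    """Subtract high halves and low halves independently, wrapping at 16 bits."""
    high = ((a >> 16) - (b >> 16)) & 0xFFFF
    low = ((a & 0xFFFF) - (b & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


def quad_add_sub(r2, r4):
    """Model of R2 = R2 +|+ R4, R4 = R2 -|- R4 using the original inputs."""
    new_r2 = add16x2(r2, r4)
    new_r4 = sub16x2(r2, r4)  # uses the old R2, not new_r2
    return new_r2, new_r4


r2, r4 = quad_add_sub(0x00050003, 0x00020001)
print(hex(r2), hex(r4))
```

A sequential reading, where the subtraction saw the updated R2, would give a different R4; the sum and difference pair produced here is the usual building block of a butterfly-style computation.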