Ideas for the design of an ASIP for LQCD

Target Compiler Technologies, CASTNESS’11, Rome, Italy. © 2011 Target Compiler Technologies


Page 1: Ideas for the design of an ASIP for LQCD


Ideas for the design of an ASIP for LQCD

Target Compiler Technologies, CASTNESS’11, Rome, Italy

Page 2: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 3: Ideas for the design of an ASIP for LQCD


ASIPs in Multi-Core SoC

ASIP: Application-Specific Processor
• Anything between general-purpose µP and hardwired data-path
• Flexibility through programmability and design-time reconfigurability
• High throughput, low energy through parallelism and specialization

ASIP is foundation of heterogeneous multi-core SoC
• Balanced SoC architecture offers best performance at lowest energy and lowest cost

Page 4: Ideas for the design of an ASIP for LQCD


Why ASIPs?

Maximise performance
• Specialisation
• Parallelism: VLIW, SIMD, multi-core

Minimise power dissipation
• Specialisation
• Parallelism: VLIW, SIMD, multi-core
• Power-optimised RTL generation

Leverage the benefits of programmability
• React to changing requirements
• Ship first for evolving standards
• Remedy defects
• Extend products to new markets without an SoC respin

Page 5: Ideas for the design of an ASIP for LQCD


IP Designer Tool Suite

Page 6: Ideas for the design of an ASIP for LQCD


nML – ASIP description language

Example: architectural specialisation
Absolute-difference instruction in motion estimation

Structural skeleton

    reg V[4]<vector>;
    trn vecr<vector>;
    trn vecs<vector>;
    trn vecd<vector>;
    trn vect<vector>;
    fu vec;
    fu vabs;
    ...

• Registers, busses, functional units
• Application specific data type ‘vector’

Instruction-set grammar

    opn vec_adiff_opn(t:c2u, r:c2u)
    {
        action {
            stage E1:
                vecd = vsub(vecr=V[r],vecs=V[t]) @vec;
                V[t] = vect = vabs(vecd) @vabs;
        }
        syntax : "vadiff v"t ",v"r ",v"t;
        image  : t::r;
    }

• Primitive functions: vsub(), vabs()
• Operation pattern: V = vabs(vsub(V, V))
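The data flow of the vadiff instruction can be mimicked in plain C to make the operation pattern concrete. This is only a behavioural sketch: the lane count VLEN, the int lane type and the name vector_t are assumptions made for illustration, since the slide does not state the width or element type of ‘vector’.

```c
#include <stdlib.h>

#define VLEN 8  /* assumed lane count; the slide does not give the vector width */

/* Stand-in for the application-specific 'vector' type (assumption: int lanes). */
typedef struct { int lane[VLEN]; } vector_t;

/* vsub primitive: lane-wise difference a - b. */
static vector_t vsub(vector_t a, vector_t b)
{
    vector_t d;
    for (int i = 0; i < VLEN; ++i)
        d.lane[i] = a.lane[i] - b.lane[i];
    return d;
}

/* vabs primitive: lane-wise absolute value. */
static vector_t vabs(vector_t a)
{
    vector_t r;
    for (int i = 0; i < VLEN; ++i)
        r.lane[i] = abs(a.lane[i]);
    return r;
}

/* Behaviour of the vadiff instruction as described in the action part:
 * V[t] = vabs(vsub(V[r], V[t])), with both primitives chained in one step. */
static void vadiff(vector_t V[4], unsigned t, unsigned r)
{
    V[t] = vabs(vsub(V[r], V[t]));
}
```

Calling vadiff(V, 0, 1) overwrites V[0] with the lane-wise absolute difference of V[1] and V[0], which is the pattern the instruction fuses into a single operation.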

Page 7: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 8: Ideas for the design of an ASIP for LQCD


EURETILE hardware platform

• Communication: DNP
• Control: RISC
• Computation: DSP
• ASIPs: specialised towards the application
  − Lattice quantum chromodynamics (LQCD)
  − Neural network (Izhikevich)

[Figure: tile block diagram with DNP, RISC, DSP, MEM and ASIP1 blocks]

Page 9: Ideas for the design of an ASIP for LQCD


Agenda

ASIPs and IP Designer

EURETILE platform

An ASIP for LQCD

Page 10: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Goals
• Increase performance
• Decrease gate count or usage of FPGA blocks

Means
• Task level parallelism (multi-tile architecture)
• Data level parallelism
• Instruction level parallelism
• Architecture specialisation

Page 11: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Instruction level parallelism

[Figure: VLIW instruction word with slots VU_1 … VU_n and LS_0 … LS_m]

• Arithmetic operations in parallel with load/store operations
• Appropriate mix of n and m based on feedback from compilation of Qphi() function
• n*m speed improvement over scalar architecture

Data level parallelism

[Figure: 3-way SIMD vector with elements c1, c2, c3]

• 3-way SIMD fits with SU(3) matrix algebra
• 3x speed improvement over scalar architecture
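To illustrate why a 3-wide data path matches SU(3) algebra, the following C sketch multiplies a 3x3 complex matrix by a 3-component vector. The type and function names (colour_vec, su3_matrix, su3_mul_vec) are hypothetical and the code is a scalar emulation, not the ASIP's instruction set: the inner loop marks the work that one 3-way SIMD operation would cover.

```c
#include <complex.h>

#define NC 3  /* three components c1, c2, c3, matching the 3-way SIMD width */

/* Hypothetical types for illustration; not taken from the slides. */
typedef struct { double complex c[NC]; } colour_vec;
typedef struct { double complex u[NC][NC]; } su3_matrix;

/* y = U * x for a 3x3 complex matrix and a 3-component vector. */
static colour_vec su3_mul_vec(const su3_matrix *U, const colour_vec *x)
{
    colour_vec y = { { 0.0, 0.0, 0.0 } };

    for (int col = 0; col < NC; ++col) {
        /* One conceptual SIMD step: x->c[col] is broadcast, multiplied by
         * column 'col' of U, and accumulated into all three lanes of y. */
        for (int lane = 0; lane < NC; ++lane)
            y.c[lane] += U->u[lane][col] * x->c[col];
    }
    return y;
}
```

Written this way, the matrix-vector product takes three multiply-accumulate steps of width three, so no lane is left idle, which is the sense in which the SIMD width fits the problem.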

Page 12: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Architecture specialisation: complex floating point operations
• C + C, C + i*C, C – C, C – i*C → 2x speedup over scalar architecture
• C * R → 4x speedup over scalar architecture
• C * C → 8x speedup over scalar architecture
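The scalar work that each of these complex operations replaces can be written out in C. The sketch below uses a hypothetical cplx struct with explicit real and imaginary parts so that every real addition and multiplication is visible; it does not attempt to reproduce the speedup factors quoted on the slide, which are the author's figures.

```c
/* Hypothetical complex type with explicit parts, for illustration only. */
typedef struct { float re, im; } cplx;

/* C + C: two real additions. */
static cplx c_add(cplx a, cplx b)
{
    cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* C - C: two real subtractions. */
static cplx c_sub(cplx a, cplx b)
{
    cplx r = { a.re - b.re, a.im - b.im };
    return r;
}

/* C + i*C: i*(c + di) = -d + ci, so only a swap and two add/sub, no multiply. */
static cplx c_add_i(cplx a, cplx b)
{
    cplx r = { a.re - b.im, a.im + b.re };
    return r;
}

/* C - i*C: same idea with the signs reversed. */
static cplx c_sub_i(cplx a, cplx b)
{
    cplx r = { a.re + b.im, a.im - b.re };
    return r;
}

/* C * R: two real multiplications. */
static cplx c_mul_r(cplx a, float s)
{
    cplx r = { a.re * s, a.im * s };
    return r;
}

/* C * C: four real multiplications plus one subtraction and one addition. */
static cplx c_mul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im,
               a.re * b.im + a.im * b.re };
    return r;
}
```

A functional unit that performs any one of these functions as a single operation removes the corresponding sequence of scalar instructions from the instruction stream.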

Behaviour of floating point operations
• Defined in a C dialect intended for the modelling of functional units
• Translated into simulation and implementation (RTL) models
• Synthesis on standard cell library, mapping on FPGA primitives

Vector types and operators defined for the C compiler

    vector v1, va[4], vb[4];
    v1 += va[0] * vb[1];

Page 13: Ideas for the design of an ASIP for LQCD


LQCD ASIP

Architecture specialisation: address generation
• Goal: vector units should be used every cycle, so address generation must be done in parallel
• How: to be investigated, after feedback from C compiler!
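The slides leave the address-generation mechanism open. As an illustration of the kind of index arithmetic that would have to run alongside the vector units, the following C sketch computes the index of a neighbouring site on a four-dimensional periodic lattice. Everything in it is an assumption made for the example: the lattice extents, the lexicographic site ordering and the function names are not taken from the presentation.

```c
#include <stddef.h>

#define NDIM 4  /* four space-time dimensions */

/* Hypothetical lattice extents, chosen only for the example. */
static const int dims[NDIM] = { 4, 4, 4, 8 };

/* Lexicographic site index; coordinate x[0] varies fastest (assumed layout). */
static size_t site_index(const int x[NDIM])
{
    size_t idx = 0;
    for (int d = NDIM - 1; d >= 0; --d)
        idx = idx * (size_t)dims[d] + (size_t)x[d];
    return idx;
}

/* Index of the neighbour of site x one step (+1 or -1) along direction mu,
 * with periodic wrap-around at the lattice boundary. */
static size_t neighbour_index(const int x[NDIM], int mu, int step)
{
    int y[NDIM];
    for (int d = 0; d < NDIM; ++d)
        y[d] = x[d];
    y[mu] = (x[mu] + step + dims[mu]) % dims[mu];
    return site_index(y);
}
```

Each neighbour lookup costs several integer additions, multiplications and a modulo. If that work shared issue slots with the floating point operations, the vector units would stall, which is the motivation for doing it in parallel.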

Deliverables
• SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer
• SystemC model
• RTL model + FPGA mapping
