ATI Stream Computing ATI Intermediate Language (IL) Micah Villmow May 30, 2008.
| ATI Stream Computing Update | Confidential2 2 | ATI Stream Computing – ATI Intermediate Language (IL)
ATI IL – What is it?
• Device-agnostic, forward-compatible language
• Called Intermediate Language
• Portable ISA
• Can write for lowest common denominator
• First level to expose new ATI CAL features
• Allows finely-detailed optimizations
• Based on Microsoft® DirectX® 9.0 Shader Language
Outline
• Pipeline – A quick recap
• Instructions
– Setup and teardown
– ALU
– Texture units
– Memory access
– Functions
– Flow Control
• Examples
• Future additions
Pixel Pipeline
• IL instructions modify the state of the various stages of the pipeline
• Declarations instruct the setup engine how to set up the graphics card correctly
• ALU instructions instruct the stream processing units what to calculate
• TEX instructions instruct the texture units what data to fetch
• Global buffer accesses instruct the shader export path to get correct data
• Color buffer instructions send data through the render backends
Compute Pipeline
• ATI Radeon™ HD 4800 Series GPUs introduce the compute shader
• Pipeline now includes the Local Data Share (LDS), Global Data Share (GDS), and shared registers (SR)
• Dedicated L1 per SIMD on ATI Radeon™ HD 4800 Series GPUs
ATI IL Instruction Syntax
The language used to write CAL shaders
A portable intermediate language for AMD GPUs
Resembles DirectX® assembly
ATI IL kernel follows basic pattern of:
1. Setup state
2. Read texture data
3. Compute results
4. Write results
ATI IL Instructions - Setup
Shader Type:
il_ps_2_0 – IL Pixel Shader version 2.0
il_cs_2_0 – IL Compute Shader version 2.0
Inputs:
dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__ - Interpolated X/Y float coordinates
dcl_input vObjIndex0 - Auto-indexed integer value
Outputs:
dcl_output_generic oN - Declare that color output buffer number N will be used, max N is 8 on R6XX based cards and 16 on R7XX
Constants:
dcl_cb cbN[X] – Declare that constant buffer N will be used of size X, N is between 0-14, max X is 4096
Literals:
dcl_literal lN, <NUM>, <NUM>, <NUM>, <NUM> - Declare that literal number N will be used with four values
Resources:
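dcl_resource_id(N)_type([1d|2d],[unnorm|norm])_fmtx(TYPE)_fmty(TYPE)_fmtz(TYPE)_fmtw(TYPE) – Declare that resource N will be used with the given dimension, coordinate mode, and per-component formats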
Scratch Buffer:
dcl_indexed_temp_array N[X] – Declare that scratch buffer N will be used of size X, max size 4096
Compute Shader:
dcl_num_thread_per_group N - Declare that N threads will be working together in one group
Local Data Share:
dcl_lds_size_per_thread N – Declare that each thread will use N dwords of LDS, must be multiple of 4 and <= 64
dcl_lds_sharing_mode _wavefront[Rel|Abs] – Declare that sharing mode of LDS uses relative or absolute addressing
Global Shared Registers:
dcl_shared_temp srN – Declare that the kernel will use N shared registers.
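The numeric limits quoted on this slide can be checked before a kernel is handed to the compiler. The Python sketch below is only an illustration: the function names are hypothetical and are not part of CAL or any IL tool, the limits are simply the ones stated above, and the requirement that sizes be positive is an added assumption.

```python
# Hypothetical helpers; the limits are the ones quoted on the Setup slide.
# Rejecting zero or negative sizes is an assumption, not stated in the slides.

def check_lds_size_per_thread(n_dwords):
    """dcl_lds_size_per_thread N: N must be a multiple of 4 and <= 64."""
    return n_dwords > 0 and n_dwords % 4 == 0 and n_dwords <= 64


def check_constant_buffer(index, size):
    """dcl_cb cbN[X]: N is between 0 and 14, X is at most 4096."""
    return 0 <= index <= 14 and 1 <= size <= 4096


def check_scratch_buffer(size):
    """dcl_indexed_temp_array N[X]: X is at most 4096."""
    return 1 <= size <= 4096
```

For example, `check_lds_size_per_thread(4)` accepts the value used in the LDS example later in this deck, while a size of 6 is rejected because it is not a multiple of 4.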
ATI IL Instructions - Registers
vObjIndex0.x – Integer register that stores the index of the thread within the domain
vWinCoord0.xy – Floating point register that stores the Euclidean coordinates of the thread within the domain
r# - General Purpose Registers that are 128 bits wide
x#[idx] – Scratch buffer register to read/write at offset idx
l# - Literal register that is 128 bits wide
cb#[idx] – constant buffer access to read from offset idx
g[idx] – Global buffer read/write register
o# - Output buffer register
ATI IL Instructions – Syntax/Opcodes
Instruction syntax:
<opcode>[_<ctrl>][_<ctrl(val)>] <= opcode with specifiers
[<dst>[_<mod>][.<write-mask>]] <= dst with modifier/mask
[, <src>[_<mod>][.<swizzle-mask>]] <= src with modifier/mask
Sample Opcodes
ALU:
– mad r0, r1, r2, r3 // r0 = r1 * r2 + r3
– dmul r0.xy, r1.xy, r2.xy // r0.xy = r1.xy * r2.xy, a multiply (not a multiply-add) on doubles
TEX:
– sample_resource(0)_sampler(0) r0, vWinCoord0.xy00
– sample_l_resource(0)_sampler(0) r0, vWinCoord0.xy00, r0.1000 // sample instruction required in loops
MEM:
– lds_read_vec r0, vTid0.x0 // read from the current thread's LDS space at offset 0
– lds_write_vec mem.xy__, vaTid0.xxxx // write the absolute thread id
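To make the syntax template above concrete, here is a small Python sketch that splits one IL instruction into its opcode, its `_<ctrl(val)>` specifiers, and its operands. It is an illustrative assumption, not how the CAL compiler parses IL: `parse_il_instruction` is a hypothetical name, only specifiers written with a parenthesized value are recognized, operand modifiers such as `_neg(xyzw)` are left attached to the register name, and declarations with numeric literals are not handled.

```python
import re

# Matches one "_name(value)" control specifier, e.g. "_resource(0)".
CTRL = re.compile(r"_([a-z]+)\(([^)]*)\)")
# Splits "reg.comps" into the register part and an optional mask/swizzle.
OPERAND = re.compile(r"(.*?)(?:\.([xyzw01_]+))?")


def parse_il_instruction(line):
    """Split an IL instruction into opcode, specifiers and operands."""
    code = line.split("//", 1)[0].strip()    # drop a trailing comment
    head, _, rest = code.partition(" ")      # head = opcode plus specifiers
    first = CTRL.search(head)
    opcode = head[:first.start()] if first else head
    ctrls = CTRL.findall(head)               # list of (name, value) pairs
    operands = []
    for text in rest.split(","):
        text = text.strip()
        if text:
            reg, comps = OPERAND.fullmatch(text).groups()
            operands.append((reg, comps))    # comps is None if no mask given
    return {"opcode": opcode, "ctrl": ctrls, "operands": operands}
```

Applied to `sample_l_resource(0)_sampler(0) r0, vWinCoord0.xy00, r0.1000`, the sketch reports the opcode `sample_l`, the specifiers `resource(0)` and `sampler(0)`, a destination `r0`, and two sources with swizzles `xy00` and `1000`.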
ATI IL Instructions – Write Masks/Read Swizzles
Write Masks: Each destination can have a write mask
There are four possible options for each component
– Component – The original component position, which means write results
– ‘_’ – Do not write the results of this component to the register
– ‘0’ – Write the value 0.0f to the destination component
– ‘1’ – Write the value 1.0f to the destination component
Example: “mov r0.x10w, vWinCoord0.xy” copies the x element into r0.x, writes 1.0f to r0.y and 0.0f to r0.z, and places the y element in the w component of r0 (the source swizzle .xy is extended to .xyyy).
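The write-mask rules above can be modelled in a few lines of Python. This is a sketch of the semantics as described on this slide, not vendor code; `apply_write_mask` is a hypothetical name, and registers are represented as plain 4-element lists.

```python
def apply_write_mask(dst, mask, result):
    """Return the destination register after a masked write.

    dst    -- current 4-component register contents
    mask   -- 4-character write mask, e.g. "x10w" or "xy__"
    result -- 4-component value produced by the instruction
    """
    if len(mask) != 4:
        raise ValueError("write mask must name all four positions")
    out = list(dst)
    for i, m in enumerate(mask):
        if m == "_":
            continue                  # leave this component untouched
        elif m == "0":
            out[i] = 0.0              # force 0.0f
        elif m == "1":
            out[i] = 1.0              # force 1.0f
        elif m == "xyzw"[i]:
            out[i] = result[i]        # component in its original position
        else:
            raise ValueError("invalid write mask character: " + m)
    return out
```

With a source value of (3.5, 7.5) extended to (3.5, 7.5, 7.5, 7.5), the mask `x10w` from the example yields (3.5, 1.0, 0.0, 7.5).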
Read Swizzles: Each source register can have a read swizzle
The read swizzle reorders the way in which data is read
Read swizzles are extended based on the last swizzle used to fill the vector. i.e. r0.xy is equivalent to r0.xyyy
Each component can take one of 6 options (x, y, z, w, 0, or 1)
– Component – Each position in the 4 vector can have a component specified, i.e. {xyzw} and there is no restriction on ordering
– ‘1’ – Use the value 1.0f as the source component
– ‘0’ – Use the value 0.0f as the source component
Example “mov r0, r0.wzyx” – Reverses the data in a register
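The read-swizzle rules, including the extension of a short swizzle by repeating its last entry, can be sketched in Python as follows. The function name `read_swizzle` is hypothetical and registers are modelled as 4-element lists; this illustrates the semantics described above rather than any real implementation.

```python
def read_swizzle(src, swizzle="xyzw"):
    """Return the 4-component value read from src through a swizzle."""
    if not 1 <= len(swizzle) <= 4:
        raise ValueError("swizzle must have between 1 and 4 entries")
    # Extend a short swizzle with its last entry: "xy" becomes "xyyy".
    swizzle = swizzle + swizzle[-1] * (4 - len(swizzle))
    out = []
    for s in swizzle:
        if s == "0":
            out.append(0.0)                      # constant 0.0f
        elif s == "1":
            out.append(1.0)                      # constant 1.0f
        else:
            out.append(src["xyzw".index(s)])     # any component, any order
    return out
```

For a register holding (1, 2, 3, 4), the swizzle `wzyx` returns (4, 3, 2, 1) and `xy` returns (1, 2, 2, 2).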
ATI IL Instructions – Functions
Functions are possible in IL following a few constraints:
1. Must begin with “func <integer>”
2. Must end with “endfunc”
3. Must use “ret” before “endfunc”
4. Only use “ret_dyn” for an early return
5. Must be placed after main function
Main function must use “endmain” if functions are in use
To call a function, use “call <integer>” or the conditional versions.
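The structural constraints listed above lend themselves to a simple lint pass. The Python sketch below checks only those rules on a list of IL source lines; `check_function_layout` is a hypothetical helper, not part of any AMD tool, and it does not validate anything else about the kernel.

```python
def check_function_layout(lines):
    """Return a list of violations of the function rules listed above."""
    toks = [ln.split() for ln in lines if ln.strip()]
    ops = [t[0] for t in toks]
    problems = []
    func_starts = [i for i, op in enumerate(ops) if op == "func"]
    if not func_starts:
        return problems               # no functions, nothing to check
    # Functions come after main, and main must then close with "endmain".
    if "endmain" not in ops[:func_starts[0]]:
        problems.append('main must end with "endmain" before the first func')
    for start in func_starts:
        if len(toks[start]) != 2 or not toks[start][1].isdigit():
            problems.append('function must begin with "func <integer>"')
        try:
            end = ops.index("endfunc", start)
        except ValueError:
            problems.append('function must end with "endfunc"')
            continue
        if ops[end - 1] != "ret":
            problems.append('"ret" must come before "endfunc"')
    return problems
```

A kernel whose main body ends with `ret` and `endmain`, followed by `func 0 ... ret`, `endfunc`, passes with an empty list; dropping the `ret` before `endfunc` produces one reported problem.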
ATI IL Instructions - Example
il_ps_2_0
dcl_literal l0, 0x40800000, 0x3f800000, 0x40000000, 0x40400000
dcl_cb cb0[2]
dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
mul r0, vWinCoord0.xy00, l0.z100
add r17, r0.xyxy, l0.00y0
mov r33, r0.xyxy
div_zeroop(infinity) r33, r0.1111, r33
sample_resource(1)_sampler(1) r101, r17.xy00
sample_resource(1)_sampler(1) r102, r17.zw00
mul r35, r35, r33
add r101, r101, r35_neg(xyzw)
mad r35, r101, r42.xxxx, r35
mul r36, r36, r33
add r102, r102, r36_neg(xyzw)
mad r36, r102, r42.xxxx, r36
mad r19.x, r0.y, cb0[0].z, r0.x
ftoi r21.x, r19.x
mov o0, r35
mov g[r21.x], r36
ret
end
;PS; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(7) KCACHE0(CB0:0-15)
      0  x: MOV*2       R0.x, R0.x
         y: MOV         R0.y, R0.y
         z: MULADD      R0.z, R0.x, (0x40000000, 2.0f).x, 1.0f
      1  z: MULADD      R4.z, PV0.y, KC0[0].z, PV0.x
         w: MOV         R0.w, PV0.y
         t: RCP_e       R1.z, PV0.x
01 TEX: ADDR(64) CNT(2) VALID_PIX
      2  SAMPLE R2, R0.xyxx, t1, s1 UNNORM(XYZW)
      3  SAMPLE R3, R0.zwzz, t1, s1 UNNORM(XYZW)
02 ALU: ADDR(39) CNT(22)
      4  x: MUL         T2.x, 0.0f, R1.z
         t: RCP_e       ____, R0.y
      5  x: ADD         T0.x, R2.z, -PV4.x
         y: ADD         T0.y, R3.x, -PV4.x
         z: ADD         ____, R2.x, -PV4.x VEC_120
         w: MUL         T0.w, 0.0f, PS4
         t: ADD         T1.x, R3.z, -PV4.x
      6  x: ADD         ____, R3.y, -PV5.w
         y: ADD         ____, R2.y, -PV5.w VEC_120
         z: ADD         T0.z, R3.w, -PV5.w
         w: ADD         ____, R2.w, -PV5.w VEC_120
         t: MULADD      R0.x, PV5.z, 0.0f, T2.x VEC_021
      7  x: MULADD      R2.x, T0.y, 0.0f, T2.x
         y: MULADD      R0.y, PV6.y, 0.0f, T0.w
         z: MULADD      R0.z, T0.x, 0.0f, T2.x
         w: MULADD      R0.w, PV6.w, 0.0f, T0.w
         t: MULADD      R2.y, PV6.x, 0.0f, T0.w VEC_021
      8  z: MULADD      R2.z, T1.x, 0.0f, T2.x
         w: MULADD      R2.w, T0.z, 0.0f, T0.w
         t: F_TO_I      ____, R4.z
      9  t: MULLO_INT   R3.x, PS8, (0x00000004, 5.605193857e-45f).x
03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R3.x], R2, ELEM_SIZE(3)
04 EXP_DONE: PIX0, R0
END_OF_PROGRAM
GprPoolSize = 122
CodeLen = 544;Bytes
SQ_PGM_END_CF = 5; words(64 bit)
SQ_PGM_END_ALU = 61; words(64 bit)
SQ_PGM_END_FETCH = 68; words(64 bit)
;SQ_PGM_RESOURCES = 0x00000005
SQ_PGM_RESOURCES:NUM_GPRS = 5
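The hexadecimal values in `dcl_literal l0` are raw 32-bit patterns, and the disassembly pairs such patterns with their float reading, as in `(0x40000000, 2.0f)`. The short Python sketch below decodes them as IEEE 754 single-precision values to make the kernel easier to follow; `bits_to_float` is a helper name chosen here for illustration.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# The four components of l0 from the example kernel (x, y, z, w).
literal_l0 = [0x40800000, 0x3F800000, 0x40000000, 0x40400000]
decoded = [bits_to_float(b) for b in literal_l0]
# decoded is [4.0, 1.0, 2.0, 3.0]
```

With these values, `l0.z100` reads (2.0, 1.0, 0.0, 0.0), so the first `mul` doubles the x coordinate, which is consistent with the `MOV*2 R0.x` in the disassembly. The same decoding applied to the integer 4 gives the denormal value that the disassembler prints next to `0x00000004`.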
ATI IL Instructions – Flow Control
Flow control is based on the result of comparison instructions:
– 4 signed integer comparison instructions + negation
– 2 unsigned integer comparison instructions
– 4 floating point comparison instructions
– 4 double comparison instructions
Flow control consists of:
– if-else-endif or if-endif
– call-return in static and conditional versions
– switch-case1…n-endswitch
– whileloop-continue/break-endloop
Example Slide 1 – Output IL & Input IL
il_ps_2_0
dcl_output_generic o0
dcl_literal l0, 1.0, 0.5, 0.5, 0.5
mov o0, l0
end
il_ps_2_0
dcl_input_position_interp(linear_noperspective)_centered_center vWinCoord0.xy__
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
sample_resource(0)_sampler(0) o0, vWinCoord0.xy
end
Example Slide 2 - Bursting
il_cs_2_0
dcl_cb cb0[1]
dcl_num_thread_per_group 64
itof r0.z, vaTid0.x
div r0.y, r0.z, cb0[0].x
mod r0.x, r0.z, cb0[0].x
flr r0, r0
mul r0.x, r0.x, cb0[0].z
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
imul r0.w, vaTid0.x, cb0[0].w
sample_resource(0)_sampler(0) r1, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r2, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r3, r0.xy
add r0.x, r0.x, r0.1
sample_resource(0)_sampler(0) r4, r0.xy
add r0.x, r0.x, r0.1
mov g[r0.w + 0], r1
mov g[r0.w + 1], r2
mov g[r0.w + 2], r3
mov g[r0.w + 3], r4
end
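The address arithmetic in this kernel is easier to see outside of IL. The Python sketch below mirrors the `div`/`mod`/`flr`/`mul`/`imul` sequence for one thread and lists the four texture coordinates it samples and the four global buffer slots it writes. The meaning given to the constant buffer fields is an assumption made for illustration (`cb_x` as the row width in threads, `cb_z` as texels per thread, `cb_w` as outputs per thread); the slides do not say what the host stores in `cb0[0]`.

```python
import math


def burst_addresses(tid, cb_x, cb_z, cb_w, burst=4):
    """Model the read coordinates and write slots of one bursting thread.

    tid  -- absolute thread id (vaTid0.x)
    cb_x -- cb0[0].x, assumed here to be the row width in threads
    cb_z -- cb0[0].z, assumed here to be the texels read per thread
    cb_w -- cb0[0].w, assumed here to be the outputs written per thread
    """
    y = math.floor(tid / cb_x)                   # div, then flr
    x = math.floor(math.fmod(tid, cb_x)) * cb_z  # mod, flr, then mul
    base = tid * cb_w                            # imul
    reads = [(x + i, y) for i in range(burst)]   # x advances by 1.0 per sample
    writes = [base + i for i in range(burst)]    # g[r0.w + 0] .. g[r0.w + 3]
    return reads, writes
```

For thread 5 with `cb_x = 4`, `cb_z = 4` and `cb_w = 4`, the sketch gives reads at (4, 1) through (7, 1) and writes to slots 20 through 23, that is, four consecutive global buffer entries per thread.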
export_burst_perf.exe -w 2048 -h 2048 -t -e -r -2
Burst 1 Perf: 88.73 GB/s
Burst 2 Perf: 104.98 GB/s
Burst 3 Perf: 111.39 GB/s
Burst 4 Perf: 114.49 GB/s
Export Instruction:
03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(0)
03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(1)
03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(2)
03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(3)
115.2 GB/s Peak
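To put the burst figures in proportion, the Python snippet below expresses each measured rate as a percentage of the quoted 115.2 GB/s peak. It uses only the numbers on this slide; the variable and function names are chosen here for illustration.

```python
PEAK_GBPS = 115.2

# Measured export bandwidth in GB/s, keyed by burst length.
measured = {1: 88.73, 2: 104.98, 3: 111.39, 4: 114.49}


def percent_of_peak(gbps, peak=PEAK_GBPS):
    """Return a bandwidth figure as a percentage of the peak."""
    return 100.0 * gbps / peak


efficiency = {burst: round(percent_of_peak(v), 1) for burst, v in measured.items()}
# efficiency is {1: 77.0, 2: 91.1, 3: 96.7, 4: 99.4}
```

Going from a burst of 1 to a burst of 4 moves the export path from about 77% to about 99% of peak, with most of the gain coming from the first doubling.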
Example 3 – Scratch Buffer
il_ps_2_0
dcl_input_position_interp(linear_noperspective)_centered_center vWinCoord0.xy__
dcl_output_generic o0
dcl_indexed_temp_array x0[2]
dcl_cb cb0[1]
mov r6, r6.0000
flr r5, vWinCoord0.xy
ftoi r0.x, cb0[0].y
ftoi r2.x, cb0[0].z
mad r3, r5.y, cb0[0].x, r5.x
mad r4, r5.y, cb0[0].x, r5.x
mov x0[r0.x], r3
mov x0[r2.x], r4
add r0.x, r0.x, cb0[0].y
add r2.x, r2.x, cb0[0].y
add r5, x0[r0.x], x0[r2.x]
add r6, r5, r6
mov o0, r6
end
Example 4 – LDS & Shared Registers
il_cs_2_0
dcl_cb cb0[1]
dcl_num_thread_per_group 64
dcl_lds_size_per_thread 4
dcl_lds_sharing_mode _wavefrontRel
dcl_literal l0, 64, 64, 64, 4
iadd r0, vTid0.x0, cb0[0].x0
mov r2, r2.0000
iadd r0.x, r0.x, cb0[0].y
iadd r0.y, r0.y, l0.w
and r0.x, r0.x, l0.x
lds_read_vec r1, r0.xy
fence_lds_threads
add r2, r2, r1
lds_write_vec mem, r2
end
il_cs_2_0
dcl_cb cb0[1]
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_num_thread_per_group 64
dcl_shared_temp sr1
dcl_lds_size_per_thread 4
dcl_literal l0, 0, 0, 0, 0
dcl_literal l1, 0, 1, 41, 0x000000FF
mov r0, r0.0000
if_logicalz vTgroupid0.x
mov sr0.x, vaTid0.x
mov r0.x, sr0.x
else
ieq r1.w, vTgroupid0.x, l1.y
cmov_logical r0.x, r1.w, sr0.x, l1.z
endif
mov r0.z, vTgroupid0.x
mov g[vaTid0.x], r0
ret
endmain
end
Disclaimer & Attribution
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI Logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft and DirectX are trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.