
PSL / AFU Interface August 9, 2017 Version 1.0

OpenPOWER Foundation Workgroup Specification

Standard Track

PSL / AFU Interface: CAPI 2.0
Accelerator Work Group <[email protected]>
OpenPOWER Foundation

Version 1.0 (August 9, 2017)
Copyright © 2017 OpenPOWER Foundation

All capitalized terms in the following text have the meanings assigned to them in the OpenPOWER Intellectual Property Rights Policy (the "OpenPOWER IPR Policy"). The full Policy may be found at the OpenPOWER website or are available upon request.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to OpenPOWER, except as needed for the purpose of developing any document or deliverable produced by an OpenPOWER Work Group (in which case the rules applicable to copyrights, as set forth in the OpenPOWER IPR Policy, must be followed) or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OpenPOWER or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, THE OpenPOWER Foundation AS WELL AS THE AUTHORS AND DEVELOPERS OF THIS STANDARDS FINAL DELIVERABLE OR OTHER DOCUMENT HEREBY DISCLAIM ALL OTHER WARRANTIES AND CONDITIONS, EITHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES, DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A PARTICULAR PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF WORKMANLIKE EFFORT, OF LACK OF VIRUSES, OF LACK OF NEGLIGENCE OR NON-INFRINGEMENT.

OpenPOWER, the OpenPOWER logo, and openpowerfoundation.org are trademarks or registered trademarks of OpenPOWER Foundation, Inc., registered in many jurisdictions worldwide. Other company, product, and service names may be trademarks or service marks of others.

Abstract

This document is a Standard Track, Work Group Specification work product owned by the Accelerator Workgroup and handled in compliance with the requirements outlined in the OpenPOWER Foundation Work Group (WG) Process document. It was created using the Master Template Guide version 1.0.0. Comments, questions, etc. can be submitted to the public mailing list for this document at <[email protected]>.

Acknowledgement to members of the workgroup for their contributions


Table of Contents

Preface
  1. Conventions
  2. Document change history
1. PSL AFU Interface
  1.1. AFU Command Interface
  1.2. AFU Buffer Interface
  1.3. PSL Response Interface
  1.4. AFU MMIO Interface
  1.5. AFU Control Interface
  1.6. DMA Interface
2. Timing Diagram Examples
3. Conformance to this Specification
  3.1. AFU Command Interface
  3.2. AFU Buffer Interface
  3.3. PSL Response Interface
  3.4. AFU MMIO Interface
  3.5. AFU Control Interface
Glossary
A. OpenPOWER Foundation overview
  A.1. Foundation documentation
  A.2. Technical resources
  A.3. Contact the foundation


List of Figures

1.1. PSL Command/Response Flow
1.2. PSL AFU Control Interface Flow in Non-Shared Mode
1.3. PSL AFU Control Interface Flow in Non-Shared Mode
1.4. AFU DMA Interface Write Request Example
1.5. AFU DMA Interface Read Request Example
1.6. DMA Write Data Alignment
2.1. Control Interface, Reset
2.2. Control Interface, Start
2.3. Command Interface, Read_cl_na
2.4. Buffer Interface, Write of buffer from Read_cl_na
2.5. Response Interface, Read_cl_na complete


List of Tables

1.1. AFU Command Interface
1.2. PSL Command Opcodes Directed at the PSL Cache
1.3. PSL Command Opcodes That Do Not Allocate in the PSL Cache
1.4. PSL Command Opcodes Reserved for Scratch Pad
1.5. PSL Command Opcodes Reserved for LPC Services
1.6. PSL Command Opcodes for Management
1.7. PSL Command Opcodes for DMA Address Translation Phase (See AFU DMA Interface for additional information)
1.8. ah_cabt Translation Ordering Behavior
1.9. CAS Operand Alignment on AFU Buffer Interface
1.10. AFU Buffer Interface
1.11. PSL Response Interface
1.12. PSL Response Codes
1.13. AFU MMIO Interface
1.14. AFU Control Interface
1.15. PSL Control Commands on ha_jcom
1.16. JEA Format for LLCMD
1.17. Process Element Entry Format
1.18. Legal Operand Alignment and Data Placement for Atomic Operands
1.19. Atomic Opcodes
1.20. AFU DMA Interface ('x' Denotes the DMA Port Number)


Preface

1. Conventions

The OpenPOWER Foundation documentation uses several typesetting conventions.

Notices

Notices take these forms:

Note

A handy tip or reminder.

Important

Something you must be aware of before proceeding.

Warning

Critical information about the risk of data loss or security issues.

Changes

At certain points in the document lifecycle, knowing what changed in a document is important. In these situations, the following conventions will be used.

• New text will appear like this. Text marked in this way is completely new.

• Deleted text will appear like this. Text marked in this way was removed from the previous version and will not appear in the final, published document.

• Changed text will appear like this. Text marked in this way appeared in previous versions but has been modified.

Command prompts

In general, examples use commands from the Linux operating system. Many of these are also common with Mac OS, but may differ greatly from the Windows operating system equivalents.

For the Linux-based commands referenced, the following conventions will be followed:

$ prompt Any user, including the root user, can run commands that are prefixed with the $ prompt.

# prompt The root user must run commands that are prefixed with the # prompt. You can also prefix these commands with the sudo command, if available, to run them.


Document links

Document links frequently appear throughout the documents. Generally, these links include a text for the link, followed by a page number in parentheses. For example, this link, Preface [vi], references the Preface chapter on page vi.

2. Document change history

This version of the guide replaces and obsoletes all earlier versions.

The following table describes the most recent changes:

Revision Date      Summary of Changes

May 18, 2016       • V1.0 Document copied to working directory

August 2, 2016     • Initial Updates for V2.0

January 16, 2017   • Clarifications and typo corrections based on feedback
                   • Fixes to CAS Operand Alignment on Buffer Interface

July 20, 2017      • Workgroup Specification Revision 1.0


1. PSL AFU Interface

The POWER Service Layer (PSL) to Accelerator Functional Unit (AFU) interface communicates to the acceleration logic running on the FPGA. Through this interface, the PSL offers services to the AFU. The services offered are cache-line oriented and allow the AFU to make buffering versus throughput trade-offs. The interface to the AFU is composed of six independent interfaces:

• AFU Command Interface is the interface through which the AFU sends service requests to the PSL.

• AFU Buffer Interface is the interface through which the PSL moves data to and from the AFU.

• PSL Response Interface is the interface through which the PSL reports status about service requests.

• AFU MMIO Interface is the interface through which software reads and writes can access registers within the AFU.

• AFU Control Interface allows the PSL job management functions to control the state of the AFU.

• AFU DMA Interface allows the AFU to send native PCIe Writes and Reads and to receive Read Completion data.

Together these interfaces allow software to control the AFU state and allow the AFU to access data in the system.

1.1. AFU Command Interface

The AFU command interface provides the AFU logic with the ability to send commands to the PSL. The interface is credit-based; the PSL output ha_croom informs the AFU of the number of commands the PSL can accept from the AFU. The number of commands allocated to the AFU might change based on job management policies. This signal is static and is not meant to be a dynamic count from the PSL to the AFU. It is up to the AFU to implement a flow control mechanism (that is, a credit is consumed when the AFU presents a command on the command interface and is considered released when the PSL returns a response on the response interface). The interface is synchronous; ah_cvalid must be valid for only one cycle per command, and the other command descriptor signals must also be valid during that cycle. Each command is assigned a tag by the AFU. This tag is used by the PSL during subsequent phases of the transaction to identify the command. AFU Command Interface (Table 1.1) lists the signals the AFU uses to send commands to the PSL.

Note

There are references to PSL internal register mnemonics within this section. These registers are mentioned to provide additional content clarity. These registers are set by system software during initialization or library calls to the AFU. However, the format of these registers is not information required by an AFU designer.
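The credit rule described above can be sketched in software as a simple counter. This is only an illustrative model, not part of the specification: the class name `CommandCredits` and its methods are hypothetical, and the sketch assumes that ha_croom is captured once when the AFU is enabled and carries the maximum number of commands minus 1, as stated in Table 1.1.

```python
class CommandCredits:
    """Hypothetical AFU-side credit counter for the AFU command interface."""

    def __init__(self, ha_croom: int) -> None:
        # ha_croom is static; it carries (maximum commands - 1),
        # so a captured value of 255 means 256 credits.
        self.max_credits = ha_croom + 1
        self.available = self.max_credits

    def can_issue(self) -> bool:
        """True if at least one command credit is free."""
        return self.available > 0

    def issue(self) -> None:
        """Consume a credit when a command is presented (ah_cvalid pulsed)."""
        if not self.can_issue():
            raise RuntimeError("no command credits available")
        self.available -= 1

    def response(self) -> None:
        """Release a credit when the PSL returns a response."""
        if self.available >= self.max_credits:
            raise RuntimeError("response without an outstanding command")
        self.available += 1
```

An AFU model would call `issue()` in the cycle it asserts ah_cvalid and `response()` for each response it sees on the response interface.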

Table 1.1. AFU Command Interface

Signal Name Bits Source Description

ah_cvalid 1 AFU A valid command is present on the interface. This signal is asserted for a single cycle for each command that is to be accepted.

Design recommendation: make this a registered interface to the PSL.

• This signal can be driven for multiple cycles. That is, different commands can be driven back-to-back, as long as there is an adequate number of credits outstanding.

ah_ctag 8 AFU AFU generated ID for the request. This is used as an array address on the AFU Buffer interface and for status notification.

ah_ctagpar 1 AFU Odd parity for ah_ctag, ah_paren = ‘1’.

ah_com 13 AFU Indicates which command the PSL will execute. Opcodes are defined in PSL Command Opcodes Directed at the PSL Cache.

ah_compar 1 AFU Odd parity for ah_com, ah_paren = ‘1’.

ah_cabt 3 AFU PSL translation ordering behavior. See ah_cabt Translation Ordering Behavior.

ah_cea 64 AFU Effective byte address for the command. Addresses for “cl” commands must be sent as 128-byte aligned addresses. Addresses for write_ commands must be naturally aligned according to the given ah_csize.

Addresses for CAS commands must be aligned to either 8B or 4B (based on command operand size).

A command with a size which is not naturally aligned to the address provided will cause an error to be set in the system and the AFU will be disabled.

ah_ceapar 1 AFU Odd parity for ah_cea, ah_paren = ‘1’.

ah_cch 16 AFU Context handle used to augment ah_cea in AFU-directed context mode.

Drive to ‘0’ in other modes.

ah_csize 12 AFU Number of bytes for partial line commands.

Read/write commands require the size to be a power of 2 (1, 2, 4, 8, 16, 32, 64, 128).

The ah_csize is binary encoded.

A command with an invalid size will cause an error to be set in the system and the AFU will be disabled.

ah_cpagesize 4 AFU Page size hint. Used by the PSL for predicting page size during ERAT lookup. This causes the Effective to Real Address Translation (ERAT) lookup to search this page size first for translation, which can result in a slight performance increase. The response interface provides the page size via ha_pagesize with the completion of a command so that page sizes can be tracked with addresses and sent on future command requests with this signal. The AFU is not required to track page sizes, since the ERAT for each page size will be searched if no hint is provided, but providing a hint determines the search order.

4'b0xxx No hint provided

4'b1000 4k

4'b1010 64k

4'b1011 2M

4'b1100 16M

4'b1101 1G

4'b1111 16G

All other values are reserved

ha_croom 8 PSL Number of commands that the PSL is prepared to accept and that must be captured by the AFU when it is enabled on the AFU Control interface.

This signal is not meant to be a dynamic count from the PSL to the AFU; it represents the maximum number of commands the PSL can accept from the AFU.

The value shown is the maximum number of commands minus 1, so for 256 commands the value shown will be 255.
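Several rules in Table 1.1 are simple arithmetic and can be checked before a command is presented. The helpers below are a minimal sketch with hypothetical names. They assume the usual meaning of odd parity (the parity bit makes the total number of 1 bits, data plus parity, odd) and model only the size, alignment, and ha_croom rules stated in the table.

```python
def odd_parity(value: int, width: int) -> int:
    """Parity bit that makes the total count of 1s (data + parity) odd."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return 0 if ones % 2 == 1 else 1


def is_valid_csize(size: int) -> bool:
    """ah_csize must be a power of 2 between 1 and 128 bytes."""
    return 1 <= size <= 128 and (size & (size - 1)) == 0


def is_naturally_aligned(cea: int, size: int) -> bool:
    """The address must be a multiple of the (valid) transfer size."""
    return is_valid_csize(size) and cea % size == 0


def croom_to_commands(ha_croom: int) -> int:
    """ha_croom carries (maximum commands - 1)."""
    return ha_croom + 1
```

For example, `odd_parity(tag, 8)` would give a value for ah_ctagpar, and `is_naturally_aligned(cea, size)` flags a descriptor that would cause the AFU to be disabled.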

Table 1.2. PSL Command Opcodes Directed at the PSL Cache

Mnemonic Opcode Description

Read_cl_s x‘0A50’ Read a cache line and allocate the cache line in the precise cache in the shared state. This command must be used when there is an expectation of temporal locality. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

Read_cl_m x‘0A60’ Read a cache line and allocate the cache line in the precise cache in the modified state. This command must be used when there is an expectation that data within the line will be written in the near future. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

Read_pe x‘0A52’ Read the process element cache line from the context indicated by ah_cch. This command is supported only when the PSL is configured in AFU-directed mode and the PSL_SPAP_An Register is initialized. The format for the process element is specified in Process Element Entry.

PSL returns only the WED field (bits 928:991 of the PEE) and the Master Process bit from the SR field (Bit 62 of the PEE) on the buffer interface the same way a normal read returns data. The WED field’s location in the cache line does not change (all other PEE information will be zeroed).

Read_pe is implied to have ABORT CABT type regardless of the value presented on ah_cabt.

touch_i x‘0240’ Bring a cache line into the precise cache in the IHPC state without reading data in preparation for a cache line write. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

IHPC - The owner of the line is the highest point of coherency but it is holding the line in an I state.

No data is returned to the AFU.

touch_s x‘0250’ Bring a cache line into the precise cache in the shared state. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

No data is returned to the AFU.

touch_m x‘0260’ Bring a cache line into the precise cache in modified state. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

No data is returned to the AFU.

Write_mi x‘0D60’ Write all or part of a cache line and allocate the cache line in the precise cache in modified state. The line goes invalid if a snoop read hits it. This command must be used when there is an expectation of temporal locality, followed by a use by another processor. Ah_csize must be a power of 2, and ah_cea must be naturally aligned according to size.

Write_ms x‘0D70’ Deprecated. This command behaves exactly the same as Write_mi.

Ah_csize must be a power of 2, and ah_cea must be naturally aligned according to size.

push_i x‘0140’ Attempt to accelerate the subsequent writing of a line, previously written by the AFU or by another processor (push a modified line from the PSL cache). The cache line will be marked invalid in the PSL cache once complete. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

push_s x‘0150’ Attempt to accelerate the subsequent reading of a line, previously written by the AFU or by another processor (push a modified line from the PSL cache). The PSL will keep a shared copy of the cache line once complete. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).


evict_i x‘1140’ Force a line out of the precise cache. Modified lines are cast out to system memory. Ah_csize should be 128 bytes, and ah_cea should be 128-byte line aligned (ah_cea[57:63] are ignored).

zero_m x‘1260’ Zero a cache line and allocate it into the precise cache in modified state. AxH_CSIZE should be 128 bytes and AxH_CEA should be 128-byte line aligned (AxH_CEA[57:63] are ignored by the design).

cas_e_4B 0x0180 Compare cache value and op1; if equal, swap in op2. Operands are 4B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].

cas_ne_4B 0x0181 Compare cache value and op1; if not equal, swap in op2. Operands are 4B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].

cas_u_4B 0x0182 Swap in op2. Operands are 4B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].

cas_e_8B 0x0183 Compare cache value and op1; if equal, swap in op2. Operands are 8B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].

cas_ne_8B 0x0184 Compare cache value and op1; if not equal, swap in op2. Operands are 8B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].

cas_u_8B 0x0185 Swap in op2. Operands are 8B. AxH_CSIZE is ignored by the design for this opcode. See Section 1.1.2, “Compare and Swap Commands” [8].
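The compare-and-swap rows above can be read as three update rules: equal, not equal, and unconditional. The following sketch models only the memory update described in the table, using a plain dictionary as a stand-in for memory. The function name `cas`, its `mode` argument, and its return value are illustrative assumptions; what the PSL actually returns to the AFU is defined in Section 1.1.2 and is not modeled here.

```python
# Opcodes copied from Table 1.2.
CAS_OPCODES = {
    "cas_e_4B": 0x0180, "cas_ne_4B": 0x0181, "cas_u_4B": 0x0182,
    "cas_e_8B": 0x0183, "cas_ne_8B": 0x0184, "cas_u_8B": 0x0185,
}


def cas(memory: dict, addr: int, op1: int, op2: int,
        mode: str, nbytes: int) -> bool:
    """Apply one CAS update to `memory`; return True if op2 was swapped in.

    mode is "e" (swap if equal), "ne" (swap if not equal) or "u" (always).
    """
    if nbytes not in (4, 8):
        raise ValueError("operands are 4B or 8B")
    if addr % nbytes != 0:
        raise ValueError("address must be aligned to the operand size")
    mask = (1 << (8 * nbytes)) - 1
    current = memory.get(addr, 0) & mask
    if mode == "e":
        swap = current == (op1 & mask)
    elif mode == "ne":
        swap = current != (op1 & mask)
    elif mode == "u":
        swap = True
    else:
        raise ValueError("mode must be 'e', 'ne' or 'u'")
    if swap:
        memory[addr] = op2 & mask
    return swap
```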

Table 1.3. PSL Command Opcodes That Do Not Allocate in the PSL Cache

Mnemonic Opcode Description

Read_cl_na 0x0A00 Read a cache line, but do not allocate the cache line into a cache. This command must be used during streaming operations when there is no expectation that the data will be re-used before it is cast out of the cache. Ah_csize must be 128 bytes, and ah_cea must be 128-byte line aligned.

Read_pna 0x0E00 Read all or part of a line without allocation. This command must be used for MMIO. Ah_csize must be a power of 2, and ah_cea must be naturally aligned according to size.

Note

Deprecated feature. Partial read of non-allocated line is no longer supported and the full line will be read. Use Read_cl_na instead.

Write_na 0x0D00 Write all or part of a cache line, but do not allocate the cache line into a cache. This command must be used during streaming operations when there is no expectation that the data will be re-used before it is cast out of the cache. Ah_csize must be a power of 2, and ah_cea must be naturally aligned according to size.

Write_inj 0x0D10 Write all or part of a cache line. Do not allocate the cache line into a cache; attempt to inject the data into the highest point of coherency (HPC). Ah_csize must be a power of 2, and ah_cea must be naturally aligned according to size.

Table 1.4. PSL Command Opcodes Reserved for Scratch Pad

Mnemonic Opcode Description

reserved 0x0A10 Reserved for future scratchpad support.

reserved 0x0D30 Reserved for future scratchpad support.

Table 1.5. PSL Command Opcodes Reserved for LPC Services

Mnemonic Opcode Description

reserved 0x0A02 Reserved for future LPC services support.


reserved 0x0D02 Reserved for future LPC services support.

Table 1.6. PSL Command Opcodes for Management

Mnemonic Opcode Description

reserved 0x0100 Reserved

reserved 0x0102 Reserved for future LPC services support.

intreq 0x0000 Request interrupt service. See Section 1.1.4, “Request for Interrupt Service” [9].

restart 0x0001 Stop flushing commands after error. Ah_cea is ignored.

asbnot 0x0103 Send Accelerator Switch Board Notify. See Section 1.1.3, “Request for ASB Notify Service” [9]. AxH_CEA[56:63] should be set to a unique tag which will identify this ASB_Notify message.

Note

The PSL does not check for duplicate tags.

Table 1.7. PSL Command Opcodes for DMA Address Translation Phase (See AFU DMA Interface [21] for additional information)

Mnemonic Opcode Description

xlat_rd_p0 0x1F00 Request a translation for a DMA read operation to be sent on Port 0.

ah_csize is ignored.

xlat_wr_p0 0x1F01 Request a translation for a DMA write operation to be sent on Port 0.

ah_csize is ignored.

xlat_rd_p1 0x1F08 Request a translation for a DMA read operation to be sent on Port 1.

ah_csize is ignored.

xlat_wr_p1 0x1F09 Request a translation for a DMA write operation to be sent on Port 1.

ah_csize is ignored.

itag_abrt_rd 0x1F02 Abort a previously requested DMA Read operation.

The ITAG being aborted must be provided on ax_cea[55:63].

Aborting an ITAG which has not been previously requested by an xlat_* command or that has already been used by a transaction on the DMA interface will result in a Failed response.

ah_csize is ignored.

Note

If an AFU implements DMA operation aborting, it must reserve at least one command slot (i.e. one croom credit) for itag aborts in order to avoid deadlocks.

itag_abrt_wr 0x1F03 Abort a previously requested DMA Write operation.

The ITAG being aborted must be provided on ax_cea[55:63].

Aborting an ITAG which has not been previously requested by an xlat_* command or that has already been used by a transaction on the DMA interface will result in a Failed response.

ah_csize is ignored.

Note

If an AFU implements DMA operation aborting, it must reserve at least one command slot (i.e. one croom credit) for itag aborts in order to avoid deadlocks.

xlat_rd_touch 0x1F10 Request a translation for an effective address with read access without committing to a DMA transaction. This means that ha_rditag is not valid with the response received for this command. xlat_rd_touch commands have CABT set implicitly to “Abort” regardless of the value of ah_cabt.

ah_csize is ignored.

xlat_rwr_touch 0x1F11 Request a translation for an effective address with write access without committing to a DMA transaction. This means that ha_rditag is not valid with the response received for this command. This command has CABT set implicitly to “Abort” regardless of the value of ah_cabt.

ah_csize is ignored.
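As an illustration of the ITAG abort rows in Table 1.7, the sketch below builds the opcode and address fields for an abort command. The function name and the returned dictionary are hypothetical. The sketch assumes big-endian bit numbering for the 64-bit address (bit 63 is the least-significant bit), which is consistent with bits 57:63 being the ignored offset within a 128-byte line, so bits 55:63 are the nine low-order bits.

```python
# Opcodes copied from Table 1.7.
ITAG_ABRT_RD = 0x1F02
ITAG_ABRT_WR = 0x1F03

ITAG_BITS = 9  # address bits 55:63 inclusive


def itag_abort_command(itag: int, write: bool) -> dict:
    """Return hypothetical command fields for aborting a DMA ITAG."""
    if not 0 <= itag < (1 << ITAG_BITS):
        raise ValueError("ITAG must fit in 9 bits")
    return {
        "ah_com": ITAG_ABRT_WR if write else ITAG_ABRT_RD,
        # The ITAG occupies the 9 least-significant address bits.
        "ah_cea": itag,
        # ah_csize is ignored for these opcodes, so it is not set here.
    }
```

An AFU that uses these commands would also keep one command credit in reserve for them, as the note in the table requires.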

1.1.1. Command Ordering

In general, the PSL processes commands in a high-performance order. If a particular ordering is required between two commands, the AFU must submit the first command and wait for its completion before submitting the second command. For example, the AFU might want to write results and then write a flag, indicating to other threads that the data is ready. It must submit the result write commands, wait for all of the completion responses, and then submit the flag write. This way, when the other threads read the flag value, they can subsequently read the results correctly.

The PSL has multiple stages of execution, each of which can have an impact on the order in which commands are completed.
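The results-then-flag pattern can be sketched with a toy model. `FakePsl` below is a hypothetical stand-in that completes outstanding commands in shuffled order to mimic out-of-order completion; it is not the PSL's real behavior or interface. The point of the sketch is the barrier in `write_results_then_flag`: the flag write is not submitted until every result write has completed.

```python
import random


class FakePsl:
    """Toy stand-in for the PSL: completes commands in shuffled order."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._pending = []
        self.completed = []  # tags, in completion order

    def submit(self, tag: int) -> None:
        self._pending.append(tag)

    def drain(self) -> list:
        """Complete every outstanding command and return the tags."""
        self._rng.shuffle(self._pending)
        done, self._pending = self._pending, []
        self.completed.extend(done)
        return done


def write_results_then_flag(psl: FakePsl, result_tags: list,
                            flag_tag: int) -> None:
    outstanding = set(result_tags)
    for tag in result_tags:
        psl.submit(tag)            # result writes may complete in any order
    while outstanding:             # barrier: wait for every completion
        for tag in psl.drain():
            outstanding.discard(tag)
    psl.submit(flag_tag)           # only now is the flag write submitted
    psl.drain()
```

Without the barrier, the flag write could complete before a result write, and a reader that sees the flag might read stale results.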

1.1.1.1. Translation Ordering

Translation ordering is affected by the state of the ah_cabt input to the PSL. This control is an important way to control the behavior and performance of the PSL.

ah_cabt Translation Ordering Behavior lists the translation ordering behavior.

Table 1.8. ah_cabt Translation Ordering Behavior

ah_cabt Mnemonic Description

000 Strict When a command is sent with Strict ordering the PSL will enter Strict mode. Translation of all commands received while PSL is in Strict mode (regardless of their CABT value) will proceed in the order in which they are provided on the AFU Command Interface. (This does not mean they will be completed in order; only translation order is affected.)

If the translation failed or should be retried at a later time, an error response will be returned and all subsequent commands that are currently in the PSL translation queue will receive the “FLUSHED” response. (See Section 1.1.1.2, “Flushed Queue Restarting” [7])

When all commands with Strict ordering have been translated, PSL will return to non-strict mode and further command processing will proceed based on the provided CABT encodings.

001 Abort If translation for the command results in a protection violation or fault requiring software intervention, the command will receive the “FAULT” response and an interrupt is sent. Only this command will be terminated.

010 Page Translation will be in order only for addresses which fall in the same page; accesses to different pages will exit translation in a high-performance order.

The page size upon which PSL will base the sorting decisions is defined by an internal register.

If the translation failed or should be retried at a later time an error response will be returned and all subsequent commands to the same page that are currently in the PSL translation queue will receive the “FLUSHED” response.

011 Pref If translation for a command results in a protection violation or fault requiring software intervention the command will receive the “FAILED” response. Only this command will be terminated. No interrupt will be generated.

111 Spec If translation for the command results in a protection violation or an ERAT miss, the command will receive the “FAILED” response. Only this command will be terminated. No interrupt will be generated.

1.1.1.2. Flushed Queue Restarting

When a command with a CABT value of Strict or Page receives a translation error, the command will return the related error response and all subsequent commands will return a “FLUSHED” response.

When a command receives a translation fault while PSL is in “strict mode” (See Table 1.8, “ah_cabt Translation Ordering Behavior”), all commands in the queues will be flushed, regardless of whether the CABT value of the command which received the translation fault is “Page” or “Strict”.

If the command which failed translation had a “Page” CABT value, PSL will flush all subsequent commands falling into the same page. The page size upon which PSL will base the sorting decisions is defined by an internal configuration register within the PSL.

PSL will keep flushing commands until a command with the restart opcode (See Table 1.6, “PSL Command Opcodes for Management”) is presented on the Command Interface. Once PSL receives a restart, subsequent commands will no longer be flushed and will proceed to translation as usual.

The AFU should never send a new restart command before receiving the response for a previously sent restart.

Note

Recommended AFU flow in case of receiving an address translation error on a command:

1. Immediately stop sending new commands.

2. Send a restart command.

3. Wait for response for the restart command.

4. Continue sending commands.

The flow described above ensures all commands which were affected by the failed command are flushed from PSL queues.
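The recommended flow can be sketched as a two-state machine. This is a minimal illustration only: the class, the reserved RESTART_TAG value, and the choice to match the restart response by tag are assumptions made for the sketch; the flush-causing response codes are those listed in the PSL Response Codes table (AERROR 0x01, PAGED 0x0A, Context 0x0B).

```python
from enum import Enum, auto

# Response codes that cause flushing, per the PSL Response Codes table.
FLUSH_CAUSING = {0x01, 0x0A, 0x0B}   # AERROR, PAGED, Context

RESTART_TAG = 0xFF  # hypothetical tag reserved by this sketch for the restart command


class AfuState(Enum):
    RUNNING = auto()        # new commands may be sent
    WAIT_RESTART = auto()   # restart sent, waiting for its response


class RestartFlow:
    def __init__(self):
        self.state = AfuState.RUNNING
        self.sent = []  # log of management commands issued by this sketch

    def may_send(self):
        """Steps 1 and 4: commands are only sent while RUNNING."""
        return self.state is AfuState.RUNNING

    def on_response(self, tag, code):
        if self.state is AfuState.RUNNING and code in FLUSH_CAUSING:
            # Step 1: stop sending. Step 2: send exactly one restart.
            self.state = AfuState.WAIT_RESTART
            self.sent.append(("restart", RESTART_TAG))
        elif self.state is AfuState.WAIT_RESTART and tag == RESTART_TAG:
            # Step 3 complete: the restart response arrived. Step 4: resume.
            self.state = AfuState.RUNNING
        # Any other response (for example FLUSHED, 0x06) leaves the state unchanged.
```

Keeping a single WAIT_RESTART state also enforces the rule that a new restart is never sent before the response for the previous one has been received.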

1.1.1.3. Strict Address Ordering Pages

AFU designs might need to delay accesses until prior accesses are completed if they need to interoperate with POWER applications with pages in strict address ordering (SAO) mode.

1.1.1.4. Execution Ordering

After commands have proceeded past address translation, the PSL orders only on a cache-line address basis. Commands to an address are performed after earlier commands to that address and before later commands to that address. Order between commands involving different addresses is unpredictable.

1.1.2. Compare and Swap Commands

There are three general types of compare and swap commands with two different operand sizes (six different opcodes in total).

All CAS commands implement the following flow:

1. Translate EA provided in AHx_CEA

2. Read cacheline either from system memory or from PSL cache. Let A equal the 8B or 4B (depending on command) located at the address indicated by the EA.

3. Read AFU operands via AFU buffer interface (See Section 1.1.2.1, “CAS operand alignment on the AFU buffer interface” for operand alignment on buffer interface).

4. Compare OP1 and A and then perform the following:

• CAS_E_*:

If OP1 is equal to A then A is replaced with OP2

• CAS_NE_*:

If OP1 is not equal to A then A is replaced with OP2

• CAS_U_*:

A is replaced with OP2 without any comparison

5. Return a response indicating the result of the compare operation (either COMP_EQ or COMP_NEQ; see Table 1.12, “PSL Response Codes”). Note that other responses for translation or error conditions can be returned if the original fetch of data fails.
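Steps 4 and 5 of the flow above can be sketched in Python as follows. The function name and the string selectors "E", "NE", and "U" are illustrative assumptions; the swap conditions and the COMP_EQ (0x0C) and COMP_NEQ (0x0D) codes come from the text and the PSL Response Codes table. Translation and error responses are not modeled.

```python
COMP_EQ = 0x0C   # OP1 equal to the value in the cacheline
COMP_NEQ = 0x0D  # OP1 not equal to the value in the cacheline


def compare_and_swap(kind, a, op1, op2):
    """Return (new value of A, response code) for a CAS_E, CAS_NE, or CAS_U command.

    kind is "E", "NE", or "U" (hypothetical selectors for the three CAS types).
    """
    equal = (op1 == a)
    if kind == "E":
        swap = equal          # CAS_E_*: swap only when OP1 equals A
    elif kind == "NE":
        swap = not equal      # CAS_NE_*: swap only when OP1 differs from A
    elif kind == "U":
        swap = True           # CAS_U_*: swap unconditionally
    else:
        raise ValueError(f"unknown CAS kind: {kind!r}")
    new_a = op2 if swap else a
    # The response reports only the comparison result, not whether a swap occurred.
    return new_a, (COMP_EQ if equal else COMP_NEQ)
```

For example, compare_and_swap("NE", 5, 4, 9) replaces A with 9 and reports COMP_NEQ, while compare_and_swap("NE", 5, 5, 9) leaves A at 5 and reports COMP_EQ.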

1.1.2.1. CAS operand alignment on the AFU buffer interface

The AFU must provide two operands for the CAS command. OP1 should be provided in its naturally aligned location on the buffer interface. Addresses for CAS commands must be aligned to 4B or 8B based on the operand size.

OP2 is aligned based on the operation’s operand size as shown in Table 1.9, “CAS Operand Alignment on AFU Buffer Interface”.

Table 1.9. CAS Operand Alignment on AFU Buffer Interface

CAS Operand Size and ah_cea[60:61]   Byte (out of the 16B aligned EA address): 0 1 2 3 4 5 6 7 8 9 A B C D E F

4B, 0b00 OP1 OP2

8B, 0b00 OP1 OP2

4B, 0b01 OP1 OP2

4B, 0b10 OP2 OP1

8B, 0b10 OP2 OP1

4B, 0b11 OP2 OP1

1.1.3. Request for ASB Notify Service

The asbnot command is used to generate an ASB_Notify message to the system’s Accelerator Switch Board. The intent of this command is to notify/wake up the application thread attached to this context. The ASB_Notify will use the LPID:PID:TID tuple found in the Process Element Entry pointed to by the A0H_CCH input, which is provided with the command, and it is sent as a Posted PCIe packet. The PSL will generate a PSL Response Done when the ASB_Notify message has been presented to the upstream logic. The response provides no indication of the notify’s result on the PowerBus. The actual response from the PowerBus will be returned to the AFU over the Control Interface (See Section 1.5.4, “AFU Control Interface for ASB Notify Response”) an arbitrary amount of time later. In order to match a response to a request, the ASB_Notify command must be tagged with a unique 8-bit tag, which is provided on A0H_CEA[56:63] along with the command.

1.1.4. Request for Interrupt Service

The intreq command is used to generate an interrupt request to the system. Address bits [53:63] indicate the source of the interrupt. Only values 1 - 2047 are supported. A second interrupt request using the same source must not be generated to the system until the first request has been serviced. The PSL generates a PSL response DONE when the interrupt request has been presented to the upstream logic. The response provides no indication of interrupt service. The PSL generates a PSL response FAILED if an invalid source number is used.
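As a small illustration of the source field, the sketch below extracts address bits [53:63] and checks the supported range. It assumes big-endian bit numbering of the 64-bit address, where bit 0 is the most significant bit and bit 63 the least significant; this reading is consistent with an 11-bit field holding values up to 2047 but is an assumption of the sketch. The function names are hypothetical, and invalid-context failures are not modeled.

```python
DONE = 0x00
FAILED = 0x08


def intreq_source(ea):
    """Return the interrupt source held in address bits [53:63] (the low 11 bits)."""
    return ea & 0x7FF


def intreq_response(ea):
    """Model the response for an intreq: FAILED for an unsupported source number."""
    source = intreq_source(ea)
    return DONE if 1 <= source <= 2047 else FAILED
```

Under this numbering only source 0 falls outside the supported range, since an 11-bit field cannot exceed 2047.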

1.1.5. Parity Handling for the Command Interface

Parity inputs are provided for important fields in the command interface. The command, tag, and address are protected by odd parity. Bad parity on any of these buses causes the PSL to return error status for the command. All parity signals on the command interface are valid in the same cycle as ah_cvalid.
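Odd parity means that the parity bit is chosen so that the protected field and the parity bit together contain an odd number of 1 bits. A minimal Python sketch, with hypothetical function names:

```python
def odd_parity(value):
    """Return the parity bit that makes the total count of 1 bits (value plus parity) odd."""
    ones = bin(value).count("1")
    return (ones + 1) & 1


def parity_ok(value, parity_bit):
    """Check a received field against its odd-parity bit."""
    return odd_parity(value) == parity_bit
```

An all-zero field therefore carries a parity bit of 1, so a bus stuck at zero on both the field and its parity is detected as bad parity.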

1.2. AFU Buffer Interface

Data is moved between the PSL and the AFU through the buffer interfaces. When a command is given to the PSL, it assumes that it can read data from the AFU or write data to the AFU with the ah_ctag contained in the command. Data is read or written before the command is completed, and it can be read or written more than once before the command is completed. There are two buffer interfaces present, one for reading during a write operation and one for writing during a read operation. Each read/write moves a cacheline of data (128 bytes). Requests can arrive at any time on either interface. Each interface is synchronous, pipelined, and non-blocking. Read requests are serviced, after a small fixed delay given by ah_brlat, in a pipelined fashion in the order that they are received, so that data can be directly sent to the PCIe write stream without PSL buffering.

Note

The write vs. read indication on the buffer interface is from the perspective of the PSL since it is reading or writing buffers within the AFU. It is not to be confused with the direction of data transfer between the AFU and the host for the overall operation within the system.

Note

For unaligned data transfers the data is to be sent or received aligned by the byte address offset of the command address. For example, if you are writing to an address that is 32 Bytes into the line, you should place the data starting at the 32nd Byte of the brdata buffer (where Byte 0 is in bits 0:7).
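The placement rule in the note above can be sketched by modeling the 128-byte line as a Python byte string in which index 0 is Byte 0. The function name is hypothetical, and rejecting data that would run past the end of the line is a choice made for this sketch, not a statement about PSL behavior.

```python
LINE_BYTES = 128  # one cacheline, as moved by each buffer read/write


def place_in_line(address, data):
    """Return a 128-byte line image with data placed at the address's offset in the line."""
    offset = address % LINE_BYTES
    if offset + len(data) > LINE_BYTES:
        raise ValueError("data would cross the end of the cacheline")
    line = bytearray(LINE_BYTES)          # bytes outside the transfer are left as zero
    line[offset:offset + len(data)] = data
    return bytes(line)
```

For an address 32 bytes into the line, such as 0x1020, the data starts at index 32 of the returned image.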

Table 1.10. AFU Buffer Interface

Signal Name Bits Source Description

ha_brvalid 1 PSL This signal is asserted for a single cycle, when a valid read data transfer is present on the interface. The ha_br* signals are valid during the cycle ha_brvalid is asserted.

The buffer read interface is used for AFU write requests.

• This signal can be on for multiple cycles, indicating that data is being returned on back-to-back cycles.

ha_brtag 8 PSL AFU generated ctag that was given for the AFU write request.

ha_brtagpar 1 PSL Odd parity for ha_brtag valid with ha_brvalid.

ha_brad 6 PSL Half-line index of read data within the transaction.

Since the PSL now moves a full cacheline of data this signal will always be 6'h0.

ah_brlat 4 AFU Read buffer latency. This bus is a static indicator of the access latency of the read buffer. It must not change while there are commands that have been submitted on the command interface that have not been acknowledged on the response interface.

It is sampled continuously. However, after a reset, the PSL assumes this is a constant and that it is static for any particular AFU.

0 Data is ready the cycle after ha_brvalid is asserted.

1 Data is ready the second cycle after ha_brvalid is asserted.

2 Data is ready the third cycle after ha_brvalid is asserted.

All other values are illegal

ah_brdata 1024 AFU Read data.

ah_brpar 16 AFU Odd parity for each 64-bit doubleword of read data. ah_brpar must be provided on the same cycle as ah_brdata. A parity check fail results in a DERROR response and SUE data written. Parity is checked across the entire data bus regardless of the transaction length.

ha_bwvalid 1 PSL This signal is asserted for a single cycle when a valid write data transfer is present on the interface. The ha_bw* signals are valid during the cycle that ha_bwvalid is asserted.

The buffer write interface is used for AFU read requests.

• This signal can be on for multiple cycles indicating that data is being driven on back-to-back cycles.

ha_bwtag 8 PSL AFU generated ctag that was given for the AFU read request.

ha_bwtagpar 1 PSL Odd parity for ha_bwtag valid with ha_bwvalid.

ha_bwad 6 PSL Half-line index of write data within the transaction.

Since the PSL now moves a full cacheline of data this signal will always be 6'h0.

ha_bwdata 1024 PSL Data to be written.

ha_bwpar 16 PSL Odd parity for each 64-bit doubleword of ha_bwdata valid the same cycle as ha_bwdata.
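The 16-bit parity buses ah_brpar and ha_bwpar carry one odd-parity bit per 64-bit doubleword of the 1024-bit data bus. The sketch below computes those 16 bits for a 128-byte line image. The function name is hypothetical, and the result is returned as a list indexed by doubleword number because the mapping of doublewords to individual parity bus bits is not stated in this section.

```python
def buffer_parity(line):
    """Return 16 odd-parity bits, one per 64-bit doubleword of a 128-byte line."""
    if len(line) != 128:
        raise ValueError("expected a full 128-byte cacheline")
    parity = []
    for start in range(0, 128, 8):                       # 8 bytes = one doubleword
        ones = sum(bin(byte).count("1") for byte in line[start:start + 8])
        parity.append((ones + 1) & 1)                    # make the total count of 1s odd
    return parity
```

Because parity is checked across the entire data bus regardless of transaction length, every doubleword needs a correct parity bit even when only part of the line carries meaningful data.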

1.3. PSL Response Interface

The PSL uses the response interface to indicate the completion status of each command and to manage the command flow control credits. Each command completion can return credits back to the AFU, so that further commands can be sent.

Table 1.11. PSL Response Interface

Signal Name Bits Source Description

ha_rvalid 1 PSL This signal is asserted for a single cycle when a valid response is present on the interface. The ha_r* signals are valid during the cycle that ha_rvalid is asserted.

• This signal can be on for multiple cycles indicating that the responses are being returned back to back.

ha_rtag 8 PSL AFU generated ID for the request.

ha_rtagpar 1 PSL Odd parity for ha_rtag valid with ha_rvalid.

ha_rditag 9 PSL DMA Translation Tag for xlat_* requests.

ha_rditagpar 1 PSL Odd parity for ha_rditag.

ha_response 8 PSL Response code. See PSL Response Codes.

ha_response_ext 8 PSL Extra response information received from translation logic.

Currently this signal is only valid with the Context response encode. See the Context response code description for more information on possible values.

ha_pagesize 4 PSL Command translated Page size.

Provided by PSL to allow an AFU to implement a page prediction algorithm for future command requests.

Future command requests to addresses in the same page identified by this size can present this value on ah_cpagesize. This causes the Effective to Real address (ERAT) lookup to search this page size first for translation, which can result in a slight performance increase.

4'b1xxx Not valid for this command type. (e.g. interrupt request)

4'b0000 4k

4'b0010 64k

4'b0011 2M

4'b0100 16M

4'b0101 1G

4'b0111 16G

All other values reserved.

ha_rcredits 9 PSL Unused - always set to 9'b000000001 to indicate 1 credit is always returned with every response.

ha_rcachestate 2 PSL Reserved.

ha_rcachepos 13 PSL Reserved.
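The ha_pagesize encodings listed in the table can be decoded with a small lookup. This is a sketch with hypothetical names; it returns None for the 4'b1xxx "not valid" encodings and raises an error for reserved values, which is a choice made for illustration.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

# ha_pagesize encodings from the PSL Response Interface table.
PAGE_SIZES = {
    0b0000: 4 * KIB,
    0b0010: 64 * KIB,
    0b0011: 2 * MIB,
    0b0100: 16 * MIB,
    0b0101: 1 * GIB,
    0b0111: 16 * GIB,
}


def decode_pagesize(code):
    """Return the page size in bytes, or None when the field is not valid (4'b1xxx)."""
    if not 0 <= code <= 0b1111:
        raise ValueError("ha_pagesize is a 4-bit field")
    if code & 0b1000:
        return None                      # not valid for this command type
    if code not in PAGE_SIZES:
        raise ValueError(f"reserved ha_pagesize encoding: {code:#06b}")
    return PAGE_SIZES[code]
```

An AFU implementing page prediction could remember the raw 4-bit code per region and present it on ah_cpagesize for later commands to the same page.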

Table 1.12. PSL Response Codes

Mnemonic Code Description

DONE 0x00 Command is complete. Any and all data requests have been made for the request to/from the buffer interface. Data movement between the AFU and the PSL for these requests is complete.

Translation Errors which cause flushing of commands (if original request used Page or Strict CABT)

AERROR 0x01 Command has resulted in an address translation error.

PAGED 0x0A Command address could not be translated. The operating system has requested that the AFU come back at a later time with the request. The command has been terminated.

Context 0x0B The process element addressed by the command context handle is not valid.

ha_response_ext contains the reason for the failure:

8’h01 - Process Area Invalid. This is a CAPI driver issue.

8’h02 - Context Handle Index invalid. It is outside the range of configured context indexes within the Process Area. This is a CAPI driver issue or the AFU issued a different context handle on ah_cch than the one given to the AFU via an llcmd.

8’h03 - Context Invalid. The index is within range, but software has not marked it valid. This is a CAPI driver issue or the AFU issued a different context handle on ah_cch than the one given to the AFU via an llcmd.

The following error responses do not cause flushing of commands

DERROR 0x03 An unrecoverable data related error has been detected for this command.

PSL will signal an error towards the system and may not be able to process any further commands.

FLUSHED 0x06 Command follows a command that failed and is flushed. See Table 1.8, “ah_cabt Translation Ordering Behavior” for additional information.

FAULT 0x07 ah_cabt was set to ABORT, and the command address could not be quickly translated. An interrupt has been sent to the operating system or hypervisor. The command has been terminated.

FAILED 0x08 Command that required translation could not be completed because:

• ah_cabt stated no interrupt to software should be sent (SPEC or PREF).

• An interrupt service request that receives this response contained an invalid source number or used an invalid context.

COMP_EQ 0x0C CAS command completed successfully, OP1 and the corresponding value in the cacheline were equal.

Note: This result does not indicate whether the swap was performed or not, only the comparison result. If the original command was CAS_E_* or CAS_U_* then the swap was performed.

COMP_NEQ 0x0D CAS command completed successfully, OP1 and the corresponding value in the cacheline were not equal.

Note: This result does not indicate whether the swap was performed or not, only the comparison result. If the original command was CAS_NE_* or CAS_U_* then the swap was performed.

COMP_INV 0x0E CAS command provided with invalid address or cacheline was marked as cache inhibited. (Swap was not performed.)
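For AFU models or testbenches written in Python, the response codes above can be captured in an enumeration. The class and helper names are hypothetical; the values and the grouping into flush-causing translation errors come from the table.

```python
from enum import IntEnum


class PslResponse(IntEnum):
    DONE = 0x00
    AERROR = 0x01
    DERROR = 0x03
    FLUSHED = 0x06
    FAULT = 0x07
    FAILED = 0x08
    PAGED = 0x0A
    CONTEXT = 0x0B
    COMP_EQ = 0x0C
    COMP_NEQ = 0x0D
    COMP_INV = 0x0E


# Translation errors that cause flushing when the request used Page or Strict CABT.
FLUSH_CAUSING = frozenset(
    {PslResponse.AERROR, PslResponse.PAGED, PslResponse.CONTEXT}
)


def causes_flush(code, cabt_is_strict_or_page):
    """Return True when this response means later commands will be flushed."""
    return PslResponse(code) in FLUSH_CAUSING and cabt_is_strict_or_page
```

Constructing PslResponse from a raw code raises ValueError for values not listed in the table, which is convenient for catching decode mistakes in a model.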

1.3.1. Command/Response Flow

Figure 1.1, “PSL Command/Response Flow” illustrates the PSL command and response flow.

Figure 1.1. PSL Command/Response Flow

1.4. AFU MMIO Interface

The MMIO interface can be used to read and write MMIO registers and AFU descriptor space registers inside the AFU. The PSL is the command master. It performs a single read or write and waits for an acknowledgment before beginning another MMIO. MMIO requests that are not acknowledged cause an application hang to be detected and an error condition to be reported.

Note

MMIO interface requests to valid registers in the AFU must complete with no dependencies on the completion of any other command.

An MMIO request is sent to the AFU only when the AFU is enabled. Otherwise, an error condition is reported. Note that the MMIO address contains a word (4-byte) address; therefore, the last 2 bits of the true address are dropped at the interface. For an address of 0x30_0108, ha_mmad equals 0x0C_0042.
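The byte-to-word address conversion in the example above is a right shift by two bits. A minimal sketch with a hypothetical function name; the alignment and 24-bit range checks are added for illustration, based on ha_mmad being a 24-bit word address.

```python
def mmio_word_address(byte_address):
    """Convert an MMIO byte address to the word (4-byte) address carried on ha_mmad."""
    if byte_address & 0x3:
        raise ValueError("MMIO byte address must be 4-byte aligned")
    word_address = byte_address >> 2      # drop the last 2 bits of the true address
    if word_address >> 24:
        raise ValueError("word address does not fit in the 24-bit ha_mmad bus")
    return word_address


def mmio_byte_address(word_address):
    """Recover the byte address inside the AFU from ha_mmad."""
    return word_address << 2
```

With these helpers, mmio_word_address(0x30_0108) yields 0x0C_0042, matching the example in the text.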

1.4.1. AFU Descriptor

An AFU is required to have an AFU descriptor for System Software to recognize it. The descriptor format is described in the CAIA Specification.

Table 1.13. AFU MMIO Interface

Signal Name Bits Source Description

ha_mmval 1 PSL This signal is asserted for a single cycle when an MMIO transfer is present on the interface. The ha_mm* signals are valid during the cycle that ha_mmval is asserted.

ha_mmcfg 1 PSL Asserted with ha_mmval to indicate the MMIO represents an AFU descriptor space access.

ha_mmrnw 1 PSL 0 Write

1 Read

ha_mmdw 1 PSL 0 Word (32 bits)

1 Doubleword (64 bits)

ha_mmad 24 PSL MMIO word address. For doubleword access, the address is even.

ha_mmadpar 1 PSL Odd parity for ha_mmad valid with ha_mmval.

ha_mmdata 64 PSL Write data. For word writes, data is replicated onto both halves of the bus.

ha_mmdatapar 1 PSL Odd parity for ha_mmdata valid with ha_mmval and ha_mmrnw equal to ‘0’.

Not valid during an MMIO read (ha_mmrnw = 1).

ah_mmack 1 AFU This signal must be asserted for a single cycle to acknowledge that the write is complete or the read data is valid.

ah_mmdata 64 AFU Read data. For word reads, data must be supplied on both halves of the bus.

ah_mmdatapar 1 AFU Odd parity for ah_mmdata, valid with ah_mmack.

1.5. AFU Control Interface

The AFU control interface is used to control the state of the AFU and sense change in the state of the AFU as execution ends on the process element. This interface is also used for timebase requests and responses and asb_notify requests. When an AFU-directed context mode is enabled, the interface indicates updates to the process elements in the scheduled process area. The interface is a synchronous interface. ha_jval is valid for only one cycle per command, and the other command descriptor signals are also valid during that cycle. Table 1.14, “AFU Control Interface” shows the signals used for the AFU control interface.

Table 1.14. AFU Control Interface

Signal Name Bits Source Description Interface Signal used with Command (Job, Timebase, ASB Event)

ha_jval 1 PSL This signal is asserted for a single cycle when a valid job control command is present. The ha_j* signals are valid during this cycle. Y Y Y

ha_jcom 8 PSL Job control command opcode. See Table 1.15, “PSL Control Commands on ha_jcom”. Y Y Y

ha_jcompar 1 PSL Odd parity for ha_jcom valid with ha_jval. Y Y Y

ha_jea 64 PSL WED, Timebase info, llcmd info, or asb event info. Y Y Y

ha_jeapar 1 PSL Odd parity for ha_jea valid with ha_jval. Y Y Y

ah_jrunning 1 AFU AFU is running. This signal should transition to a ‘1’ after a start command is recognized. It must be negated when the job is complete, in error, or a reset command is recognized. When ah_jrunning transitions from 0b1 to 0b0, ah_jdone should also be asserted to indicate the AFU is not enabled. Y - -

ah_jdone 1 AFU Assert for a single cycle to acknowledge a reset command or when the AFU is finished. The ah_jerror signal is valid when ah_jdone is asserted. Y - -

ah_jcack 1 AFU Assert for a single cycle to acknowledge completion of processes associated with an LLCMD notification. In dedicated-process mode, drive to ‘0’. Y - -

ah_jerror 64 AFU AFU error code. A ‘0’ means success. If nonzero, the information is captured in an error register and an interrupt is sent. Y - -

ah_tbreq 1 AFU Single cycle pulse to request that the PSL send a timebase control command with the current timebase value. - Y -

ah_paren 1 AFU If asserted, the AFU supports parity generation on various interface buses. The parity is checked by the PSL. Y Y Y

ha_pclock 1 PSL All AFU interfaces are synchronous to the rising edge of this 250 MHz clock. Y Y Y

Table 1.15. PSL Control Commands on ha_jcom

Mnemonic Code Description

Start 0x90 Job execution in all modes. Begin running a new context. ha_jea contains the work element descriptor in dedicated-process mode.

Reset 0x80 Job execution in all modes. Force into a clean state, erasing all of the state from the previous context. This command will be sent prior to a Start command.

Timebase 0x42 Send requested 64-bit timebase value to the AFU on the ha_jea bus.

ASB_Notify Response 0x43 Send the system response to ASB_Notify sent by the AFU. HAx_JEA will contain the ASB_Notify tag and the system’s response. See Section 1.1.3, “Request for ASB Notify Service”.

LLCMD 0x45 Job execution in AFU directed mode. See Section 1.5.5, “AFU Control Interface for LLCMD Operations”.

1.5.1. AFU Control Interface in the Dedicated (Non-Shared) Mode

In a non-shared mode, the hypervisor must always reset then enable the AFU through PSL register controls as shown in Figure 1.2, “PSL AFU Control Interface Flow in Non-Shared Mode”. While the AFU is enabled, the following functions are possible:

• Requests can be submitted to the PSL through the command interface.

• MMIO requests can be passed from the PSL to the AFU and must be acknowledged.

• Timebase values can be passed to the AFU.

When PSL is initialized for dedicated-process mode, the PSL fetches the process element from system memory. The 64-bit ha_jea indicates the value of the work element descriptor.

Figure 1.2. PSL AFU Control Interface Flow in Non-Shared Mode

1.5.2. AFU Control Interface in the AFU Directed (Shared) Mode

Figure 1.3, “PSL AFU Control Interface Flow in Non-Shared Mode” shows the flow of commands on the accelerator control interface to execute a Job in AFU Directed Mode. As in any mode, the hypervisor must first reset the AFU through the AFU_CNTL register. While the accelerator is enabled the following functions are possible:

• Requests can be submitted to the PSL through the command interface.

• MMIO requests can be passed from the PSL to the AFU and must be acknowledged.

• Timebase values can be passed to the AFU.

• Process element update commands (LLCMD) can be passed to the accelerator.

While the AFU is Enabled, the LLCMD updates are used to add and remove Process Elements (contexts) to the AFU. The AFU indicates which context each command it sends is for on the ah_cch signal of the command interface. AFU Enabled mode ends when the accelerator is finished. The accelerator announces it is finished by asserting ah_jdone, clearing ah_jrunning, and driving any error status on ah_jerror.

Figure 1.3. PSL AFU Control Interface Flow in Non-Shared Mode

1.5.3. AFU Control Interface for Timebase

The AFU requests the latest timebase information by asserting ah_tbreq on the AFU control interface for one cycle. Only one request can be issued at a time. The PSL returns the timebase information by asserting ha_jval = ‘1’, ha_jcom = timebase, and ha_jea = timebase value (0:63). If Timebase is not enabled in the PSL, a zero timebase value will be returned.

1.5.4. AFU Control Interface for ASB Notify Response

The ASB Notify Response data indicates whether the ASB Notify command was successful or not. The response data is encoded on ha_jea as follows:

ha_jea[0] = ASB_Notify result. 1'b0 means success, 1'b1 means failure

ha_jea[56:63] = ASB_Notify tag as presented with the command request. See Section 1.1.3, “Request for ASB Notify Service”.
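The encoding above can be unpacked as shown in the following sketch. It assumes big-endian bit numbering of the 64-bit ha_jea bus, where bit 0 is the most significant bit and bits 56:63 are the least significant byte; the function name is hypothetical.

```python
def decode_asb_notify_response(jea):
    """Return (failed, tag) from a 64-bit ASB_Notify Response value on ha_jea."""
    failed = bool((jea >> 63) & 1)   # ha_jea[0]: 0 means success, 1 means failure
    tag = jea & 0xFF                 # ha_jea[56:63]: tag supplied with the asbnot command
    return failed, tag
```

An AFU would typically keep a table of outstanding ASB_Notify tags and use the returned tag to find the request this response belongs to.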

1.5.5. AFU Control Interface for LLCMD Operations

The ha_jcom = LLCMD command notifies the AFU of a change to a process element in the AFU-directed mode. The ha_jea bus carries the notification information. The format for ha_jea can be found in Section 1.5.5.1, “AFU Control Interface for LLCMD Operations”. Only one notification is presented at a time. The AFU must acknowledge this notification by asserting ah_jcack for one cycle when it has completed processing the notification. Only after an acknowledgment will the PSL be able to deliver another notification.

1.5.5.1. AFU Control Interface for LLCMD Operations

The ha_jcom = LLCMD command is used to notify the accelerator of a change to a process element in AFU Directed mode. ha_jea carries the LLCMD information. The format for ha_jea can be found in Table 1.16, “JEA Format for LLCMD” or see the CAIA description of the PSL_LLCMD_An register. Only one notification will be presented at a time. The accelerator must acknowledge this notification by asserting ah_jcack for one cycle when it has completed processing the notification. Only after an acknowledgment will PSL be able to deliver another notification.

Note

For LLCMD operations that remove, suspend or terminate an element the AFU should ensure all outstanding commands associated with that element are complete.

Table 1.16. JEA Format for LLCMD

Bits Field Name Description

0:15 Command Command.

x‘0000’ No command.

x‘0001’ terminate_element: Terminate process element at the link provided.

x‘0002’ remove_element: Remove the process element at the link provided.

x‘0003’ suspend_element: Stop executing the process element at the link provided.

x‘0004’ resume_element: Resume executing the process element at the link provided.

x‘0005’ add_element: Software is adding a process element at the link provided.

x‘0006’ update_element: Software is updating the process element state at the link provided.

All other values are reserved.

16:47 Reserved Reserved.

48:63 PE_Handle Process element handle.

The process element handle is used as an offset into the Schedule Process Area to locate the process context.
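The JEA format for LLCMD can be decoded as in the sketch below. It assumes big-endian bit numbering of ha_jea, so that bits 0:15 are the most significant 16 bits and bits 48:63 the least significant 16 bits. The function and dictionary names are hypothetical, and the name used for the x‘0000’ encoding is a label chosen for this sketch.

```python
# Command encodings from the JEA Format for LLCMD table.
LLCMD_COMMANDS = {
    0x0000: "no_command",        # label chosen for this sketch
    0x0001: "terminate_element",
    0x0002: "remove_element",
    0x0003: "suspend_element",
    0x0004: "resume_element",
    0x0005: "add_element",
    0x0006: "update_element",
}


def decode_llcmd(jea):
    """Return (command name, process element handle) from a 64-bit LLCMD ha_jea value."""
    command = (jea >> 48) & 0xFFFF    # bits 0:15
    pe_handle = jea & 0xFFFF          # bits 48:63
    if command not in LLCMD_COMMANDS:
        raise ValueError(f"reserved LLCMD command: {command:#06x}")
    return LLCMD_COMMANDS[command], pe_handle
```

After handling the decoded notification, the AFU acknowledges it by asserting ah_jcack for one cycle, as described in the text.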

1.5.6. Process Element Entry

Each process element entry is 128 bytes in length. Table 1.17, “Process Element Entry Format” shows the format of each process element that can be read by the AFU. Only the WED and Master Process bit are visible to the AFU.

Table 1.17. Process Element Entry Format

Process Element Entry

Word 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

0 -

1 - M -

2 -

3 -

4 -

5 -

6 -

7 -

8 -

9 -

10 -

11 -

12 -

13 -

14 -

15 -

16 -

17 -

18 -

19 -

20 -

21 -

22 -

23 -

24 -

25 -

26 -

27 -

28 -

29 Work Element Descriptor (WED word 0)

30 Work Element Descriptor (WED word 1)

31 -

1.6. DMA Interface

The AFU DMA interface provides the AFU the ability to send native PCIe transactions without the need to allocate in the cache and register coherency. If PSL is configured to use two PCIe ports, there will be two independent DMA interfaces available to the AFU (See Section 1.6.6, “Dual Port DMA Configuration”). Each DMA port supports up to 8 outstanding read requests and 8 outstanding write/atomic requests. A request is no longer outstanding once its UTAG has been signaled as sent on the hd_sent_* interface. It is the AFU’s responsibility to protect the DMA ports from overflowing; sending more than the maximum number of outstanding requests will result in a PSL error.

Maximum payload supported for DMA is 512B.

Maximum read request size is 512B.

Note

DMA Read or Write requests with a size of 0 are not supported.

Read request completions may be fragmented (see Section 1.6.3, “Sending a DMA Read”).

DMA transactions must comply with PCIe address/size alignment, so the AFU must make sure that a requested transaction does not cross a 4K boundary. PSL will not fragment such requests or check for this violation; sending a violating request will cause an error to be detected in the receiving Root Complex. The PSL DMA logic maintains PCIe ordering rules.
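Because PSL neither checks nor fragments requests, an AFU model or testbench might perform the check itself. The sketch below is one way to express the size and 4K-boundary rules quoted above, plus a helper that cuts a longer transfer into legal pieces. The function names are hypothetical, and PCIe may impose further alignment rules that this sketch does not model.

```python
MAX_DMA_BYTES = 512      # maximum payload / read request size
PAGE_BYTES = 4096        # a request must not cross a 4K boundary


def check_dma_request(ea: int, size: int) -> None:
    """Raise ValueError if the request breaks the size or 4K-boundary rules."""
    if not 0 < size <= MAX_DMA_BYTES:
        raise ValueError(f"size {size} must be in 1..{MAX_DMA_BYTES}")
    if ea // PAGE_BYTES != (ea + size - 1) // PAGE_BYTES:
        raise ValueError(f"request at {ea:#x} of {size}B crosses a 4K boundary")


def split_transfer(ea: int, length: int):
    """Split an arbitrary transfer into (ea, size) pieces that pass the check."""
    pieces = []
    while length > 0:
        to_page_end = PAGE_BYTES - (ea % PAGE_BYTES)
        size = min(length, MAX_DMA_BYTES, to_page_end)
        pieces.append((ea, size))
        ea += size
        length -= size
    return pieces


# A 1024-byte transfer starting 128 bytes below a 4K boundary.
pieces = split_transfer(0x0F80, 1024)
```

Each piece still needs its own translation and ITAG before it can be requested on the DMA interface.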

1.6.1. General DMA request flow (Single DMA port configuration)

In order to send a DMA request, an AFU must acquire an intermediate translation tag (ITAG) for the Effective Address it wishes to use. The following flow describes how to acquire an ITAG.

1. The AFU sends an xlat_rd_p0 or xlat_wr_p0 command on the AFU command interface (see Table 1.7, “PSL Command Opcodes for DMA Address Translation Phase (See AFU DMA Interface for additional information)”).

2. PSL will translate the EA provided and will return an ITAG via the response interface (see Table 1.11, “PSL Response Interface”).

3. The AFU may now request a DMA read or write by asserting a request on the DMA interface along with the ITAG provided by PSL.

4. PSL will send the request and signal to the AFU that it has been sent via the hd_sent_* interface.

5. If the AFU wishes to abort a DMA request after it has received an ITAG but before it has requested a transaction on the DMA interface, it must send an itag_abrt_rd or itag_abrt_wr command on the AFU command interface.

The ITAG the AFU wishes to abort must be provided on ah_cea[55:63]. Failing to do so will result in performance degradation and may result in a PSL error.

Note

An AFU implementing ITAG abort logic must reserve at least one command credit (i.e., croom credit) for sending abort commands. Failing to do so may result in a deadlock between the AFU and PSL.
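The abort rules above can be sketched as two small helpers. This is only an illustration: it assumes ah_cea is 64 bits wide with bit 0 as the most significant bit, so that bits 55:63 are the nine least significant bits, and it leaves every other ah_cea bit at zero. The helper names and the `reserved` parameter are hypothetical.

```python
ITAG_BITS = 9  # ah_cea[55:63] is nine bits wide


def abort_cea(itag: int) -> int:
    """Return an ah_cea value carrying the ITAG to abort in bits 55:63.

    Assumes bit 0 is the most significant bit of a 64-bit ah_cea, which
    makes bits 55:63 the nine least significant bits.
    """
    if not 0 <= itag < (1 << ITAG_BITS):
        raise ValueError("ITAG must fit in 9 bits")
    return itag & 0x1FF  # other ah_cea bits are left at zero in this sketch


def may_issue(croom_credits: int, is_abort: bool, reserved: int = 1) -> bool:
    """Hold back `reserved` command credits so an abort can always be sent."""
    needed = 1 if is_abort else 1 + reserved
    return croom_credits >= needed
```

With one credit left, `may_issue(1, False)` refuses an ordinary command while `may_issue(1, True)` still lets an abort through, which is the behavior the note asks for.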

1.6.2. Sending a DMA Write

The following flow describes the steps required to send a DMA write (assuming address translation has been performed and the AFU has acquired an ITAG):

1. AFU asserts dh_dvalid along with request user tag, size, dh_dtype == 3’d1 and data. AFU also provides the ITAG for the DMA write.

2. If the size of the write is larger than 128B, the AFU asserts dh_dvalid along with dh_dtype == 3’d2 and the rest of the data (128B per cycle) for as many cycles as required.

3. After submitting eight write requests to a DMA port, the AFU may submit a ninth only if it has seen hd_sent_utag_valid asserted for one of the outstanding write requests.

Figure 1.4. AFU DMA Interface Write Request Example
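Steps 1 and 2 of the write flow amount to cutting the payload into 128B beats and tagging the first beat differently from the rest. The sketch below shows that split; `write_beats` is a hypothetical helper, and the UTAG, ITAG and size signals that accompany the first beat are left out for brevity.

```python
BEAT_BYTES = 128        # the data bus carries 128B per cycle
DTYPE_WRITE_REQ = 1     # 3'd1: write request + up to 128B of data
DTYPE_WRITE_DATA = 2    # 3'd2: remaining write data


def write_beats(data: bytes):
    """Return the (dh_dtype, payload) pairs driven for one DMA write."""
    if not 0 < len(data) <= 512:
        raise ValueError("write size must be 1..512 bytes")
    beats = []
    for offset in range(0, len(data), BEAT_BYTES):
        dtype = DTYPE_WRITE_REQ if offset == 0 else DTYPE_WRITE_DATA
        beats.append((dtype, data[offset:offset + BEAT_BYTES]))
    return beats


# A 300-byte write takes three valid cycles.
beats = write_beats(bytes(300))
```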

1.6.3. Sending a DMA Read

The following flow describes the steps required to send a DMA read (assuming address translation has been performed and the AFU has acquired an ITAG):

1. AFU asserts dh_dvalid along with request user tag, size, dh_dtype == 3’d0. AFU also provides the ITAG for the DMA read.

2. After submitting eight read requests to a DMA port, the AFU may submit a ninth only if it has seen hd_sent_utag_valid asserted for one of the outstanding read requests.

3. When the PSL receives a completion for a previously requested read, it will present the completion information and data on the hd_cpl_* interface (see Section 1.6.4, “Receiving a Read Completion”).

Figure 1.5. AFU DMA Interface Read Request Example
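The eight-request limit that appears in both the read and the write flow can be modeled with a small tracker, one instance per port and per request kind. This is a minimal sketch; the class name and methods are hypothetical, and the duplicate-UTAG check is an extra sanity check of this sketch rather than a rule quoted from the text.

```python
MAX_OUTSTANDING = 8  # per port, separately for reads and for writes/atomics


class PortCredits:
    """Tracks outstanding requests of one kind on one DMA port."""

    def __init__(self, limit: int = MAX_OUTSTANDING):
        self.limit = limit
        self.outstanding = set()   # UTAGs submitted but not yet reported as sent

    def can_submit(self) -> bool:
        return len(self.outstanding) < self.limit

    def submit(self, utag: int) -> None:
        if not self.can_submit():
            raise RuntimeError("would exceed the outstanding-request limit")
        if utag in self.outstanding:   # sanity check added by this sketch
            raise RuntimeError("UTAG already outstanding")
        self.outstanding.add(utag)

    def on_sent(self, utag: int) -> None:
        """Call when hd_sent_utag_valid is seen with this UTAG."""
        self.outstanding.discard(utag)


reads = PortCredits()
for utag in range(8):
    reads.submit(utag)
blocked = not reads.can_submit()   # a ninth read has to wait
reads.on_sent(3)                   # PSL reports UTAG 3 as sent
reads.submit(8)                    # the ninth read may now go out
```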

1.6.4. Receiving a Read Completion

If a read request is for 128B or less and does not cross a 128B address boundary, then the request will be answered by a single completion. In this case all completion data will be returned in a single cycle over the hd_cpl_* interface.

However, if the read requests more than 128B of data or if a 128B address boundary is crossed, the completion may be fragmented by the host PCIe port and therefore there may be several completion transactions over the hd_cpl_* interface.

If the read requests more than 256B of data, it is guaranteed that there will be multiple completion transactions.
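The three cases above depend only on the request address and size, so they can be written as a small classifier. The function below is an illustrative sketch with a hypothetical name; it restates the rules in this section and does not predict how a particular host will fragment a request.

```python
def expected_completion_shape(ea: int, size: int) -> str:
    """Classify a DMA read by the completion behavior described above."""
    crosses_128 = (ea % 128) + size > 128
    if size <= 128 and not crosses_128:
        return "single completion, single cycle"
    if size > 256:
        return "multiple completions guaranteed"
    return "may be fragmented"
```

For instance, a 128B read at address 0x140 starts 64 bytes into a 128B block, so it crosses a 128B boundary and falls into the "may be fragmented" case even though it is not larger than 128B.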

1.6.4.1. Single Cycle, Single Completion Flow (128B or less completions)

1. AFU requests a DMA read with dh_dsize less than or equal to 128B that does not cross a 128B boundary

2. Host returns a single completion

3. hd_cpl_valid is asserted along with:

• hd_cpl_utag (same UTAG of the original read request)

• hd_cpl_type == 3’b0

• hd_cpl_size is the requested read size

• hd_cpl_laddr indicates the starting address of the first byte returned

• hd_cpl_byte_count will equal hd_cpl_size

1.6.4.2. Multi-Cycle, Single Completion Flow (Completion not fragmented by the host)

1. AFU requests a DMA read and the completion is not fragmented by the host

2. Host returns a single completion

3. hd_cpl_valid is asserted along with:

• hd_cpl_utag (same UTAG of the original read request)

• hd_cpl_type == 3’b0

• hd_cpl_size is the requested read size

• hd_cpl_laddr indicates the starting address of the first byte returned

• hd_cpl_byte_count will equal hd_cpl_size

4. On the following valid cycle (depending on clock frequency, there may be gaps between step 3 and step 4), hd_cpl_valid is asserted along with:

• hd_cpl_utag (same UTAG of the original read request)

• hd_cpl_type == 3’b1

• hd_cpl_size is not valid

• hd_cpl_laddr is not valid

• hd_cpl_byte_count is not valid

1.6.4.3. Multi Completion Flow (Completion fragmented by the host)

1. AFU requests a DMA read and the completion is fragmented by the host

2. Host returns multiple completions for the request

3. For each returned completion separately, hd_cpl_valid is asserted along with:

• hd_cpl_utag (same UTAG of the original read request)

• hd_cpl_type == 3’b0

• hd_cpl_size is the size of the current completion data being transferred

• hd_cpl_laddr indicates the starting address of the first byte returned

• hd_cpl_byte_count equals the number of bytes left to return, including the bytes provided in the current completion. For the last completion fragment, the byte count equals hd_cpl_size

4. On the following clock cycle (if valid), hd_cpl_valid is asserted along with:

• hd_cpl_utag (same UTAG of the original read request)

• hd_cpl_type == 3’b1

• hd_cpl_size is not valid

• hd_cpl_laddr is not valid

• hd_cpl_byte_count is not valid
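The three completion flows can be handled by one piece of bookkeeping that watches hd_cpl_size and hd_cpl_byte_count. The sketch below collects the cycles for a single UTAG and reports when the read is complete. It is an assumption-laden illustration: the class and method names are hypothetical, fragments are appended in arrival order, and error completion types are simply rejected.

```python
CPL_HEADER = 0   # 3'd0: read completion + up to 128B of data
CPL_DATA = 1     # 3'd1: further data cycles of the same completion


class ReadAssembler:
    """Collects completion cycles for one UTAG until the whole read has arrived."""

    def __init__(self):
        self.data = bytearray()
        self.fragment_left = 0      # bytes still expected in the current completion
        self.last_fragment = False

    def on_cycle(self, cpl_type: int, payload: bytes,
                 size: int = 0, byte_count: int = 0) -> bool:
        """Feed one hd_cpl_valid cycle; return True once the read is complete."""
        if cpl_type == CPL_HEADER:
            self.fragment_left = size
            # The final fragment is the one whose byte count equals its size.
            self.last_fragment = (byte_count == size)
        elif cpl_type != CPL_DATA:
            raise ValueError("error or unsupported completion type")
        take = min(len(payload), self.fragment_left, 128)
        self.data += payload[:take]
        self.fragment_left -= take
        return self.last_fragment and self.fragment_left == 0


# A 320B read returned as a 192B completion followed by a 128B completion.
asm = ReadAssembler()
done1 = asm.on_cycle(CPL_HEADER, bytes(128), size=192, byte_count=320)
done2 = asm.on_cycle(CPL_DATA, bytes(128))      # only 64 of these bytes are valid
done3 = asm.on_cycle(CPL_HEADER, bytes(128), size=128, byte_count=128)
```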

1.6.5. Data Alignment

The DMA interface expects data to be aligned according to a First Byte First scheme. This means that the first valid data byte should be located at dh_ddata[0:7], even if the address is not 128B aligned.

All PCIe data alignment rules apply to the DMA interface:

• Data may not cross a 4KB boundary

For example, if the AFU wishes to write 5 bytes to address x105, the data presented on the dh_ddata interface should be presented as shown in the following figure:

Figure 1.6. DMA Write Data Alignment
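The First Byte First rule can be shown with the 5-byte example from the text. The sketch below builds one 128-byte beat, taking the 1024-bit width of dh_ddata from Table 1.20. The helper name is hypothetical and the zero padding after the valid bytes is an arbitrary choice of this sketch.

```python
BUS_BYTES = 128  # dh_ddata is 1024 bits wide


def first_byte_first(payload: bytes) -> bytes:
    """Lay out one beat of write data with the first valid byte at bus byte 0.

    The target address plays no part in the placement.
    """
    if not 0 < len(payload) <= BUS_BYTES:
        raise ValueError("one beat carries 1..128 bytes")
    return payload + bytes(BUS_BYTES - len(payload))  # padding value is arbitrary


# Writing 5 bytes to address x105: the data still starts at byte 0 of the bus,
# not at byte 5.
beat = first_byte_first(b"\x11\x22\x33\x44\x55")
```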

Note

1. An ITAG is “single use”. Once a DMA transaction has been requested using an ITAG, no other transaction may use the same ITAG unless it was received as a result of a following xlat_* request. Re-using an ITAG without performing a translation request on the command interface will result in an error.

2. An AFU may request any number of address translations before sending actual DMA requests. However, the PSL expects any address translated to be used in a deterministic time on the DMA I/F.

The AFU should not maintain dependencies between DMA requests. For example, if an AFU requests translation for EA1 and EA2 and receives the ITAG for EA2 first, the AFU is not allowed to postpone the transmission of the DMA request for EA2 until the ITAG for EA1 is received.
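The single-use rule in item 1 can be enforced with a set of currently usable ITAGs. This is a minimal sketch with hypothetical names; it treats both a DMA request and an abort as consuming the ITAG, which is an assumption of the sketch.

```python
class ItagTracker:
    """Enforces the single-use rule for ITAGs returned by xlat_* requests."""

    def __init__(self):
        self.usable = set()

    def on_xlat_response(self, itag: int) -> None:
        self.usable.add(itag)        # a fresh translation makes the ITAG usable

    def consume(self, itag: int) -> None:
        """Call when a DMA request (or an abort) is issued with this ITAG."""
        if itag not in self.usable:
            raise RuntimeError(f"ITAG {itag} has no unused translation")
        self.usable.remove(itag)


tracker = ItagTracker()
tracker.on_xlat_response(7)
tracker.consume(7)              # first use is fine
try:
    tracker.consume(7)          # second use without a new translation
    reused = True
except RuntimeError:
    reused = False
```

To respect the second half of the note, a scheduler built on this tracker would issue each DMA request as soon as its ITAG arrives rather than waiting for ITAGs of earlier translation requests.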

1.6.6. Dual Port DMA Configuration

If the configuration of the card supports a Dual Port DMA Configuration, a second DMA-only port will be available to the AFU (all coherent and general management traffic goes only through port 0).

The AFU is in charge of splitting its data between the two ports, and therefore must request translations indicating the port on which the transaction will be sent. If 2 ports are supported, the DMA interface described in Table 1.20, “AFU DMA Interface ('x' Denotes the DMA Port Number)” is duplicated.

The general transaction flow for using 2 ports is as follows:

1. The AFU requests a translation using xlat_*_p0 or xlat_*_p1 based on the port it wishes to send the transaction on.

2. When the AFU receives the ITAG for the requested translation, it sends it over the corresponding DMA interface.

3. The two DMA interfaces are independent and can process requests or return completions in parallel.

Note

• When aborting a DMA transaction (using itag_abrt_*) there is no need to indicate the port.

• Requesting a DMA transaction on one port with an ITAG that was translated for the other port will result in an error.
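The port binding described above can be captured by remembering, for each ITAG, the port named in its translation request. The sketch below is illustrative only: the helper and class names are hypothetical, and the opcode strings for port 1 are expanded from the xlat_*_p1 wildcard used in the text.

```python
def xlat_opcode(port: int, is_write: bool) -> str:
    """Pick the translation command for the port the DMA will be sent on."""
    if port not in (0, 1):
        raise ValueError("only DMA ports 0 and 1 exist")
    return f"xlat_{'wr' if is_write else 'rd'}_p{port}"


class PortBoundItags:
    """Remembers which port each ITAG was translated for."""

    def __init__(self):
        self.port_of = {}

    def on_xlat_response(self, itag: int, port: int) -> None:
        self.port_of[itag] = port

    def check(self, itag: int, port: int) -> None:
        """Raise if the ITAG is about to be used on the wrong port."""
        if self.port_of.get(itag) != port:
            raise RuntimeError("ITAG was not translated for this port")
```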

1.6.7. Atomic Operations

PSL supports the sending of special atomic operations by encapsulating the commands inside regular PCIe DMA posted writes.

Atomic operation data is composed of a 16B operand payload (see Section 1.6.7.2, “Atomic Operand Alignment”). Atomic operations may use either 4B or 8B operands and may require a response.

PSL does not keep track of outstanding Atomic requests; therefore it is up to the AFU to make sure that any operation requiring a completion did indeed receive one. The AFU should also identify unexpected responses. How the AFU reports such errors is implementation specific.

1.6.7.1. Atomic Operations General Flow

The flow for sending an atomic operation is similar to the regular DMA flow:

1. The AFU must first acquire an ITAG for the EA on which the atomic operation will be performed. This is done in the same way an ITAG is acquired for a regular DMA write operation, by sending an xlat_wr_p* on the AFU command interface. An atomic operation’s EA must be aligned to the size of the operand it is using.

2. Once the AFU has an ITAG for the command it can proceed to request the operation on the DMA interface. The AFU asserts dh_dvalid along with the following:

• dh_req_utag - 8 bit unique tag to identify the request

• dh_req_itag - The ITAG received by the corresponding xlat request. If the address used by the atomic operation is not aligned to the operand size, PSL will return an error.

• dh_dtype == 3’d3 - Indicating this is an atomic op.

• dh_dsize - The size of the operands used by the atomic op. Only 4B or 8B operands are supported. Any other values will result in an error.

• dh_ddata - This contains the operands to be used by the operation. See Section 1.6.7.2, “Atomic Operand Alignment” for data alignment.

• dh_datomic_op - This contains the opcode for the atomic operation. See Section 1.6.7.3, “Atomic Operations Opcodes” for all available atomic operation opcodes.

• dh_datomic_le - If set, the atomic command will use little endian. PSL itself does not act on this indication; it only passes it on to the host.

3. Once the request has been sent on the PCIe link, PSL will assert hd_sent_utag_valid along with the relevant UTAG and the status. Atomic operations are treated as DMA writes by PSL; therefore an atomic request takes up one of the 8 available buffers for DMA writes.

4. If a completion is received for the atomic operation, PSL will assert hd_cpl_valid along with the following:

• hd_cpl_utag - The unique tag identifying the request (matching the UTAG presented by the AFU with the original request).

• hd_cpl_type == 3’d4 (Atomic completion)

• hd_cpl_size is either 4B or 8B in length in correlation with the operands in the original request.

• hd_cpl_laddr - Only bits [6:7] are valid for an atomic completion. These are bits [6:7] of the address used for the original atomic operation.

• hd_cpl_data - The returned atomic completion data starting at byte 0.

Note

Only an 8 bit tag can be used for atomics versus 10 bits for DMA operations.

For atomic operations which do not require responses, a tag may be considered retired as soon as PSL indicates that the command has been sent.

For atomic operations that require a response, a tag is retired only once the response is received. It is the AFU’s responsibility to track atomic tag usage and make sure no duplicates are used.
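Since PSL does not track outstanding atomics, the AFU needs its own bookkeeping for the retirement rules in the note. The sketch below is one possible shape for it. The class and method names are hypothetical, and it assumes that the sent indication for a tag always arrives before its completion.

```python
class AtomicTags:
    """Tracks 8-bit atomic tags until they are retired."""

    def __init__(self):
        self.awaiting_sent = {}      # tag -> True if a response is expected
        self.awaiting_cpl = set()    # tags sent on the link, response still due

    def in_use(self, tag: int) -> bool:
        return tag in self.awaiting_sent or tag in self.awaiting_cpl

    def issue(self, tag: int, expects_completion: bool) -> None:
        if not 0 <= tag <= 0xFF:
            raise ValueError("atomic tags are 8 bits")
        if self.in_use(tag):
            raise RuntimeError("tag is still in use")
        self.awaiting_sent[tag] = expects_completion

    def on_sent(self, tag: int) -> None:
        """hd_sent_utag_valid seen: retire now unless a response is still due."""
        if self.awaiting_sent.pop(tag):
            self.awaiting_cpl.add(tag)

    def on_completion(self, tag: int) -> None:
        """Atomic completion seen: retire the tag or flag an unexpected response."""
        if tag not in self.awaiting_cpl:
            raise RuntimeError("unexpected atomic response")
        self.awaiting_cpl.remove(tag)
```

How an unexpected response is reported is left to the implementation; raising an exception here simply marks the spot where that reporting would happen.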

1.6.7.2. Atomic Operand Alignment

Atomic operation operand data should begin on dh_ddata byte 0. The operand alignment within the 16B of valid data is decided based on bits [60:61] of the EA used by the operation and the size of the operands.

Table 1.18. Legal Operand Alignment and Data Placement for Atomic Operands

Address    Operand    Bytes    Bytes    Bytes    Bytes
60:61      size       0-3      4-7      8-B      C-F
00         4B         OP1      -        OP2      -
00         8B         OP1 (bytes 0-7)   OP2 (bytes 8-F)
01         4B         -        OP1      -        OP2
10         4B         OP2      -        OP1      -
10         8B         OP2 (bytes 0-7)   OP1 (bytes 8-F)
11         4B         -        OP2      -        OP1
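The placement rule can be expressed compactly under one reading of Table 1.18: OP1 sits at the byte offset given by EA bits 60:61 (0, 4, 8 or 12) and OP2 sits at the same position in the other 8-byte half. That reading assumes each table cell spans 4 bytes for 4B operands and 8 bytes for 8B operands, which the flattened table does not state outright. The function name is hypothetical.

```python
def place_operands(ea: int, size: int, op1: bytes, op2: bytes) -> bytes:
    """Build the 16B atomic payload following one reading of Table 1.18."""
    if size not in (4, 8) or len(op1) != size or len(op2) != size:
        raise ValueError("operands must both be 4B or both be 8B")
    if ea % size:
        raise ValueError("EA must be aligned to the operand size")
    offset = ea & 0xC            # EA bits 60:61 as a byte offset: 0, 4, 8 or 12
    payload = bytearray(16)
    payload[offset:offset + size] = op1
    op2_offset = offset ^ 0x8    # OP2 goes in the other 8-byte half
    payload[op2_offset:op2_offset + size] = op2
    return bytes(payload)


# 4B operands with EA bits 60:61 == 01: OP1 in bytes 4-7, OP2 in bytes C-F.
payload = place_operands(0x1004, 4, b"AAAA", b"BBBB")
```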

1.6.7.3. Atomic Operations Opcodes

The following lists the available atomic opcodes. Using an unsupported opcode will result in an error. For more information on how atomic commands are processed by the P9 processor, please see the Power ISA.

Table 1.19. Atomic Opcodes

Value     Command                            Expect Completion?
000000    Fetch and ADD                      Y
000001    Fetch and XOR                      Y
000010    Fetch and OR                       Y
000011    Fetch and AND                      Y
000100    Fetch and Max Unsigned             Y
000101    Fetch and Max Signed               Y
000110    Fetch and Min Unsigned             Y
000111    Fetch and Min Signed               Y
010000    Compare and Swap Not Equal         Y
010001    Compare and Swap Equal             Y
001000    Compare and Swap Unconditional     Y
011000    Fetch and Increment Bounded        Y
011001    Fetch and Increment Equal          Y
011100    Fetch and Decrement Bounded        Y
100000    Store ADD                          N
100001    Store XOR                          N
100010    Store OR                           N
100011    Store AND                          N
100100    Store Max Unsigned                 N
100101    Store Max Signed                   N
100110    Store Min Unsigned                 N
100111    Store Min Signed                   N
111000    Store Twin                         N
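For software that models or checks an AFU, Table 1.19 can be transcribed into a lookup. The sketch below copies the opcode values and names from the table and derives the "Expect Completion?" column from the fact that, in this table, only the Store commands are marked N. The names `ATOMIC_OPCODES` and `expects_completion` are hypothetical.

```python
ATOMIC_OPCODES = {
    0b000000: "Fetch and ADD",
    0b000001: "Fetch and XOR",
    0b000010: "Fetch and OR",
    0b000011: "Fetch and AND",
    0b000100: "Fetch and Max Unsigned",
    0b000101: "Fetch and Max Signed",
    0b000110: "Fetch and Min Unsigned",
    0b000111: "Fetch and Min Signed",
    0b010000: "Compare and Swap Not Equal",
    0b010001: "Compare and Swap Equal",
    0b001000: "Compare and Swap Unconditional",
    0b011000: "Fetch and Increment Bounded",
    0b011001: "Fetch and Increment Equal",
    0b011100: "Fetch and Decrement Bounded",
    0b100000: "Store ADD",
    0b100001: "Store XOR",
    0b100010: "Store OR",
    0b100011: "Store AND",
    0b100100: "Store Max Unsigned",
    0b100101: "Store Max Signed",
    0b100110: "Store Min Unsigned",
    0b100111: "Store Min Signed",
    0b111000: "Store Twin",
}


def expects_completion(opcode: int) -> bool:
    """Return the 'Expect Completion?' column for a supported opcode."""
    if opcode not in ATOMIC_OPCODES:
        raise ValueError(f"unsupported atomic opcode {opcode:06b}")
    # In Table 1.19 every Store command is marked N and all others Y.
    return not ATOMIC_OPCODES[opcode].startswith("Store")
```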

1.6.8. DMA Hardware Interface

Table 1.20. AFU DMA Interface ('x' Denotes the DMA Port Number)

Signal Name            Bits   Source   Description

dxh_dvalid             1      AFU      Qualifies a DMA request by the AFU.

dxh_req_utag           10     AFU      Transaction request attribute:
                                       User transaction tag
                                       If dxh_dtype == 3'd3 (Atomic), only bits [2:9] are valid (bits [0:1] are ignored by PSL)
                                       Qualified by dxh_dvalid

dxh_req_itag           9      AFU      Transaction request attribute:
                                       User transaction identifier
                                       Qualified by dxh_dvalid

dxh_dtype              3      AFU      Transaction request attribute:
                                       3'd0 - DMA Read Request
                                       3'd1 - DMA Write Request + up to 128B of data
                                       3'd2 - DMA Write Data (if writing more than 128B of data)
                                       3'd3 - Atomic Request
                                       All other values are reserved
                                       Qualified by dxh_dvalid

dxh_dsize              10     AFU      Transaction request attribute:
                                       Transaction size in bytes
                                       Maximum size of 512B is supported, and size must not be equal to 0
                                       If dxh_dtype == 3'd3 (Atomic), only the values of 10'd4 and 10'd8 are valid
                                       Qualified by dxh_dvalid

dxh_ddata              1024   AFU      Transaction request attribute:
                                       Data alignment is first byte first
                                       Qualified by dxh_dvalid when dxh_dtype is 3'd1, 3'd2 or 3'd3

dxh_datomic_op         6      AFU      Transaction request attribute:
                                       Atomic operation requested
                                       See Table 1.19, “Atomic Opcodes” for possible opcodes
                                       Qualified by dxh_dvalid when dxh_dtype is 3'd3

dxh_datomic_le         1      AFU      Transaction request attribute:
                                       Little endian used
                                       Qualified by dxh_dvalid when dxh_dtype is 3'd3

hdx_sent_utag_valid    1      PSL      Asserted for a single cycle to indicate that a request has been sent on the PCIe interface

hdx_sent_utag          10     PSL      Sent request attribute:
                                       Indicates the UTAG of the request sent by the PSL
                                       Qualified by hdx_sent_utag_valid

hdx_sent_utag_sts      3      PSL      Sent request attribute:
                                       3'd0 - DMA Read sent on the link
                                       3'd1 - DMA Write or Atomic sent on the link
                                       3'd2 - Transaction Failed
                                       3'd3 - Transaction Flushed and was not sent on the link
                                       All other values are reserved
                                       Qualified by hdx_sent_utag_valid

hdx_cpl_valid          1      PSL      DMA/Atomic Completion Received
                                       When asserted, indicates that completion data is valid on the hdx_cpl_* interface

hdx_cpl_utag           10     PSL      DMA/Atomic Completion attribute:
                                       Indicates the UTAG associated with the received completion data
                                       Qualified by hdx_cpl_valid

hdx_cpl_type           3      PSL      DMA/Atomic Completion attribute:
                                       Indicates the type of response received with the current completion
                                       3'd0 - Read completion + up to 128B of data
                                       3'd1 - Completion data (if completion larger than 128B)
                                       3'd2 - Completion Error (PCIe indicated completion error)
                                       3'd3 - Completion data corrupted (PCIe poisoned bit set)
                                       3'd4 - Atomic response data
                                       All other values are reserved
                                       Qualified by hdx_cpl_valid

hdx_cpl_size           10     PSL      DMA Completion attribute:
                                       Indicates the size of the current completion data being transferred over the hdx_cpl_* interface
                                       See Section 1.6.4, “Receiving a Read Completion”
                                       Qualified by hdx_cpl_valid when hdx_cpl_type is 3'd0

hdx_cpl_laddr          10     PSL      DMA/Atomic Completion attribute:
                                       Lower address bits of the received completion
                                       See Section 1.6.4, “Receiving a Read Completion”
                                       Qualified by hdx_cpl_valid when hdx_cpl_type is 3'd0 or 3'd4

hdx_cpl_byte_count     10     PSL      DMA Completion attribute:
                                       Indicates the number of bytes remaining to complete the originating read request (including the bytes being transferred in the current transaction)
                                       See Section 1.6.4, “Receiving a Read Completion”
                                       Qualified by hdx_cpl_valid when hdx_cpl_type is 3'd0

hdx_cpl_data           1024   PSL      DMA Completion attribute:
                                       Completion data received
                                       Data alignment is First Byte first (in bits 0:7)
                                       Qualified by hdx_cpl_valid when hdx_cpl_type is 3'd0 or 3'd4 for an initial data cycle, and when hdx_cpl_type is 3'd1 for the following data cycles of a multiple cycle completion

2. Timing Diagram Examples

Figure 2.1. Control Interface, Reset

Figure 2.2. Control Interface, Start

Figure 2.3. Command Interface, Read_cl_na

Figure 2.4. Buffer Interface, Write of buffer from Read_cl_na

Figure 2.5. Response Interface, Read_cl_na complete

3. Conformance to this Specification

The following lists a set of numbered conformance clauses to which any implementation of this specification must adhere in order to claim conformance to this specification (or any optional portion thereof).

All interface signals between the PSL and AFU are required to be implemented even if they are driven to a constant value. This document is a first attempt at identifying which items are required to be supported by the PSL and which items are required by the AFU.

3.1. AFU Command Interface

3.1.1. Interface Signals

3.1.1.1. PSL

PSL supports all signals driven functionally as defined.

3.1.1.2. AFU

For the AFU, driving the parity signals to correct parity is optional. AFU support of parity is signaled to the PSL via the control signal ah_paren.

ah_cabt - translation modes - it is optional which translation modes the AFU supports.

ah_cch - Only required in AFU-directed context mode; drive to 0's in other modes.

ah_cpagesize - it is optional for the AFU to provide a page size hint. If no hint is going to be provided, a value of 4'b0xxx should be driven.

3.1.2. Command Opcodes

3.1.2.1. PSL

PSL supports all command opcodes.

3.1.2.2. AFU

All opcodes are optional for the AFU.

3.2. AFU Buffer Interface

3.2.1. Interface Signals

3.2.1.1. PSL

PSL supports all signals driven functionally as defined.

ah_brlat: PSL required to support values of 0, 1 and 2, all other values optional.

3.2.1.2. AFU

For the AFU, driving the parity signals to correct parity is optional. AFU support of parity is signaled to the PSL via the control signal ah_paren.

ah_brlat: PSL only required to support values of 0, 1 and 2. AFU required to support one of these values.

3.3. PSL Response Interface

3.3.1. Interface Signals

3.3.1.1. PSL

PSL supports all signals driven functionally except ha_rcachestate and ha_rcachepos (which are marked reserved).

3.3.1.2. AFU

ha_rcachestate and ha_rcachepos (marked reserved from the PSL) can be terminated.

ha_rditag will only contain valid response information if the DMA interface is used and xlat_* requests are sent on the command interface.

ha_pagesize information is not required to be tracked and presented on ah_cpagesize on future commands to the same page. This is a performance optimization only.

3.3.2. Response Codes

3.3.2.1. PSL

The PSL is required to support sending all response codes.

3.3.2.2. AFU

The AFU is not required to support COMP_* responses unless it sends CAS_* commands on the Command Interface.

3.4. AFU MMIO Interface

3.4.1. Interface Signals

3.4.1.1. PSL

The PSL is required to support all signals.

3.4.1.2. AFU

The AFU is required to have an AFU descriptor space accessed via ha_mmval along with ha_mmcfg. It is optional for an AFU to have MMIO space (which is reported in the AFU descriptor), but it is suggested so that errors and status can be logged for the application. For the AFU, driving the parity signals to correct parity is optional. AFU support of parity is signaled to the PSL via the control signal ah_paren.

3.5. AFU Control Interface

3.5.1. Interface Signals

3.5.1.1. PSL

The PSL is required to support all signals.

3.5.1.2. AFU

The optional signals are:

ah_tbreq: Can be driven to '0' if timebase is not required.

ah_jcack: Only used if the AFU supports AFU-directed mode. Driven to '0' in dedicated process mode.

3.5.2. Control Commands

3.5.2.1. PSL

The PSL is required to support all commands.

3.5.2.2. AFU

The AFU will not see a Timebase command if it doesn't support sending a timebase request by asserting ah_tbreq.

The AFU will not see an LLCMD command code unless it supports AFU-directed mode.

The AFU will not see an ASB_Notify Response if it doesn't send any ASB_Notify commands.

3.5.3. DMA Interface

3.5.3.1. PSL

The PSL is required to support all signals.

3.5.3.2. AFU

The AFU is not required to use this interface if it does not support DMA commands.

Glossary

ACK
    Acknowledgment. A transmission that is sent as an affirmative response to a data transmission.

AFU
    Accelerator functional unit.

ALUT
    Adaptive lookup table.

AMOR
    Authority Mask Override Register.

AMR
    Authority Mask Register.

architecture
    A detailed specification of requirements for a processor or computer system. It does not specify details of how the processor or computer system must be implemented; instead it provides a template for a family of compatible implementations.

AURP
    Accelerator Utilization Record Pointer.

Big endian
    A byte-ordering method in memory where the address n of a word corresponds to the most-significant byte. In an addressed memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 being the most-significant byte. See little endian.

Cache
    High-speed memory close to a processor. A cache usually contains recently accessed data or instructions, but certain cache-control instructions can lock, evict, or otherwise modify the caching of data or instructions.

Caching inhibited
    A memory update policy in which the cache is bypassed, and the load or store is performed to or from system memory. A page of storage is considered caching inhibited when the "I" bit has a value of "1" in the page table. Data located in caching inhibited pages cannot be cached at any memory hierarchy that is not visible to all processors and devices in the system. Stores must update the memory hierarchy to a level that is visible to all processors and devices in the system.

CAIA
    Coherent Accelerator Interface Architecture. Defines an architecture for loosely coupled coherent accelerators. The Coherent Accelerator Interface Architecture provides a basis for the development of accelerators coherently connected to a POWER processor.

CAPI
    Coherent Accelerator Processor Interface.

CAPP
    Coherent Attached Processor Proxy.

Coherence
    Refers to memory and cache coherence. The correct ordering of stores to a memory address, and the enforcement of any required cache write-backs during accesses to that memory address. Cache coherence is implemented by a hardware snoop (or inquire) method, which compares the memory addresses of a load request with all cached copies of the data at that address. If a cache contains a modified copy of the requested data, the modified data is written back to memory before the pending load request is serviced.

CSRP
    Context Save/Restore Area Pointer.

DLL
    Delay locked loop.

DMA
    Direct memory access. A technique for using a special-purpose controller to generate the source and destination addresses for a memory or I/O transfer.

DSISR
    Data Storage Interrupt Status Register.

DSP
    Digital signal processor.

EAH
    PSL effective address high.

EAL
    PSL effective address low.

EA
    Effective address. An address generated or used by a program to reference memory. A memory-management unit translates an effective address to a virtual address, which it then translates to a real address (RA) that accesses real (physical) memory. The maximum size of the effective-address space is 2^64 bytes.

ELF
    Executable and linkable format.

ERAT
    Effective-to-real-address translation, or a buffer or table that contains such translations, or a table entry that contains such a translation.

Exception
    An error, unusual condition, or external signal that can alter a status bit and causes a corresponding interrupt, if the interrupt is enabled. See interrupt.

Fetch
    Retrieving instructions from either the cache or system memory and placing them into the instruction queue.

FPGA
    Field-programmable gate array.

HAURP
    Hypervisor Accelerator Utilization Record Pointer.

hcall
    Hypervisor call.

HPC
    Highest point of coherency.

Hypervisor
    A control (or virtualization) layer between hardware and the operating system. It allocates resources, reserves resources, and protects resources among (for example) sets of AFUs that may be running under different operating systems.

IHPC
    The owner of the line is the highest point of coherency but it is holding the line in an "I" state.

Implementation
    A particular processor that conforms to the architecture but might differ from other architecture-compliant implementations. For example, in design this could be the feature set and implementation of optional features.

INT
    Interrupt. A change in machine state in response to an exception. See exception.

Interrupt packet
    Used to signal an interrupt, typically to a processor or to another interruptible device.

ISA
    Instruction set architecture.

JEA
    Job effective address.

KB
    Kilobyte.

LA
    A local storage (LS) address of a PSL list. It is used as a parameter in a PSL command.

Least-significant bit
    The bit of least value in an address, register, data element, or instruction encoding.

Little endian
    A byte-ordering method in memory where the address n of a word corresponds to the least-significant byte. In an addressed memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 being the most-significant byte. See big endian.

LISN
    Logical interrupt service number.

Logical partitioning
    A function of an operating system that enables the creation of logical partitions.

LPAR
    Logical partitioning.

LPC
    Lowest point of coherency.

LPID
    Logical-partition identity.

LSb
    Least-significant bit.

LSB
    Least-significant byte.

Main storage
    The effective-address space. It consists physically of real memory (whatever is external to the memory-interface controller), Local Storage, memory-mapped registers and arrays, memory-mapped I/O devices, and pages of virtual memory that reside on disk. It does not include caches or execution-unit register files.

Mask
A pattern of bits used to accept or reject bit patterns in another set of data. Hardware interrupts are enabled and disabled by setting or clearing a string of bits, with each interrupt assigned a bit position in a mask register.
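
As a rough illustration of the "Mask" entry, the following Python sketch models a mask register as an integer. It assumes that a set bit means "enabled" and that bit 0 is the least-significant bit; both conventions, and all function names, are hypothetical and are not defined by this specification.

```python
# Illustrative sketch only: polarity (1 = enabled), bit numbering (bit 0 = LSb),
# and all names are assumptions made for this example.


def enable(mask: int, bit: int) -> int:
    """Set the bit assigned to one interrupt source."""
    return mask | (1 << bit)


def disable(mask: int, bit: int) -> int:
    """Clear the bit assigned to one interrupt source."""
    return mask & ~(1 << bit)


def accepted(pending: int, mask: int) -> int:
    """Keep only the pending interrupts whose mask bit is set."""
    return pending & mask


mask = enable(enable(0, 1), 4)   # sources 1 and 4 enabled
pending = 0b10110                # sources 1, 2, and 4 are raised
visible = accepted(pending, mask)
```

Here source 2 is raised but rejected because its mask bit is clear.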

MB
Megabyte.

Memory coherency
An aspect of caching in which it is ensured that an accurate view of memory is provided to all devices that share system memory.

Memory mapped
Mapped into the Coherent Attached Accelerator's addressable-memory space. Registers, local storage (LS), I/O devices, and other readable or writable storage can be memory-mapped. Privileged software does the mapping.

MMIO
Memory-mapped I/O.

PID
Process ID.

PSL
POWER service layer. It is the interface logic for a coherently attached accelerator and provides two main functions: it moves data between accelerator function units (AFUs) and main storage, and it synchronizes the transfers with the rest of the processing units in the system.

MMIO
Memory-mapped input/output. See memory mapped.

MMU
Memory management unit. A functional unit that translates between effective addresses (EAs) used by programs and real addresses (RAs) used by physical memory. The MMU also provides protection mechanisms and other functions.

Most-significant bit
The highest-order bit in an address, register, data element, or instruction encoding.

MRU
See most recently used.

MSb
Most-significant bit.

Page
A region in memory. The Power ISA defines a page as a 4 KB area of memory, aligned on a 4 KB boundary, or a large-page size which is implementation dependent.

Page table
A table that maps virtual addresses (VAs) to real addresses (RAs) and contains related protection parameters and other information about memory locations.
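
The 4 KB size and alignment given in the "Page" entry imply simple address arithmetic, sketched below in Python. Only the base 4 KB page size is modeled; the implementation-dependent large-page sizes are not. The function names and the sample address are hypothetical.

```python
# Illustrative sketch only: models the base 4 KB page size; names and the
# sample address are assumptions made for this example.

PAGE_SIZE = 4 * 1024            # 4 KB
OFFSET_MASK = PAGE_SIZE - 1     # low 12 bits select a byte within the page


def page_base(addr: int) -> int:
    """Round an address down to the 4 KB boundary of its page."""
    return addr & ~OFFSET_MASK


def page_offset(addr: int) -> int:
    """Return the byte offset of an address within its page."""
    return addr & OFFSET_MASK


def is_page_aligned(addr: int) -> bool:
    """True if the address lies exactly on a 4 KB boundary."""
    return page_offset(addr) == 0


addr = 0x12345
base, offset = page_base(addr), page_offset(addr)
```

A page table lookup would use the page base (or page number) as its key and carry the offset through unchanged.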

PCIe
Peripheral Component Interconnect Express.


PLL
Phase locked loop.

POWER
Of or relating to the Power ISA or the microprocessors that implement this architecture.

Power ISA
A computer architecture that is based on the third generation of reduced instruction set computer (RISC) processors. The Power ISA was developed by IBM.

Privileged mode
Also known as supervisor mode. The permission level of operating system instructions. The instructions are described in PowerPC Architecture, Book III and are required of software that accesses system-critical resources.

Privileged software
Software that has access to the privileged modes of the architecture.

Problem state
The permission level of user instructions. The instructions are described in Power ISA, Books I and II and are required of software that implements application programs.

PSL
POWER service layer.

PTE
Page table entry. See page table.

RAM
Random access memory.

RA
Real address. An address for physical storage, which includes physical memory, local storage (LS), and memory mapped I/O registers. The maximum size of the real-address space is 2^50 bytes.

SAO
Strict address ordering.

SLB
Segment lookaside buffer. It is used to map an effective address to a virtual address.

SPA
Scheduled processes area.

SSTP
Storage segment table pointer.

Storage model
A CAPI User's Manual-compliant accelerator implements a storage model consistent with the Power ISA. For more information about storage models, see the Coherent Accelerator Interface Architecture document.

SUE
Special uncorrectable error.

TAG
PSL command tag.


Tag group
A group of PSL commands. Each PSL command is tagged with an n-bit tag group identifier. An AFU can use this identifier to check or wait on the completion of all queued commands in one or more tag groups.

TG
Tag parameter.

TID
Thread ID.

TLB
Translation lookaside buffer. An on-chip cache that translates virtual addresses (VAs) to real addresses (RAs). A TLB caches page-table entries for the most recently accessed pages, thereby eliminating the necessity to access the page table from memory during load-store operations.

UAMOR
User Authority Mask Override.

VA
Virtual address. An address to the virtual-memory space, which is typically much larger than the real-address space and includes pages stored on disk. It is translated from an effective address by a segmentation mechanism and used by the paging mechanism to obtain the real address (RA). The maximum size of the virtual-address space is 2^68 bytes.

VHDL
VHSIC Hardware Description Language.

WIMG
Memory/cache attributes for PowerPC Power Architecture. Each letter of WIMG represents a one-bit access attribute, specifically: Write-Through Access (W), Cache-Inhibited Access (I), Memory Coherence (M), and Guarded (G).
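
The four one-bit WIMG attributes can be modeled as flags, as in the Python sketch below. The packing shown (W as the most-significant bit of a 4-bit field, G as the least-significant) is an assumption made for illustration; this glossary does not state where the bits sit in any register or page table entry. The class and function names are likewise hypothetical.

```python
# Illustrative sketch only: the 4-bit packing order (W, I, M, G from most to
# least significant) and all names are assumptions, not taken from this
# specification.

from enum import IntFlag


class Wimg(IntFlag):
    WRITE_THROUGH = 0b1000    # W
    CACHE_INHIBITED = 0b0100  # I
    COHERENT = 0b0010         # M
    GUARDED = 0b0001          # G


def decode(bits: int) -> Wimg:
    """Interpret the low four bits of a value as WIMG attributes."""
    return Wimg(bits & 0xF)


# Example: cache-inhibited and guarded storage, as is common for device registers.
attrs = decode(0b0101)
```

Membership tests such as `Wimg.GUARDED in attrs` then read naturally in code that inspects the attributes.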

WED
Work element descriptor.


Appendix A. OpenPOWER Foundation overview

The OpenPOWER Foundation was founded in 2013 as an open technical membership organization that will enable data centers to rethink their approach to technology. Member companies are enabled to customize POWER CPU processors and system platforms for optimization and innovation for their business needs. These innovations include custom systems for large or warehouse scale data centers, workload acceleration through GPU, FPGA or advanced I/O, platform optimization for SW appliances, or advanced hardware technology exploitation. OpenPOWER members are actively pursuing all of these innovations and more and welcome all parties to join in moving the state of the art of OpenPOWER systems design forward.

To learn more about the OpenPOWER Foundation, visit the organization website at openpowerfoundation.org.

A.1. Foundation documentation

Key foundation documents include:

• Bylaws of OpenPOWER Foundation

• OpenPOWER Foundation Intellectual Property Rights (IPR) Policy

• OpenPOWER Foundation Membership Agreement

• OpenPOWER Anti-Trust Guidelines

More information about the foundation governance can be found at openpowerfoundation.org/about-us/governance.

A.2. Technical resources

Development resources fall into the following general categories:

• Foundation work groups

• Remote development environments (VMs)

• Development systems

• Technical specifications

• Software

• Developer tools

The complete list of technical resources is maintained on the foundation Technical Resources web page.


A.3. Contact the foundation

To learn more about the OpenPOWER Foundation, please use the following contact points:

• General information -- <[email protected]>

• Membership -- <[email protected]>

• Technical Work Groups and projects -- <[email protected]>

• Events and other activities -- <[email protected]>

• Press/Analysts -- <[email protected]>

More contact information can be found at openpowerfoundation.org/get-involved/contact-us.