Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET....

35
Persistent Memory over Fabrics An Application-centric view Paul Grun Cray, Inc OpenFabrics Alliance Vice Chair

Transcript of Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET....

Page 1: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

Persistent Memory over Fabrics An Application-centric view

Paul Grun Cray, Inc

OpenFabrics Alliance Vice Chair

Page 2: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Agenda OpenFabrics Alliance Intro OpenFabrics Software Introducing OFI - the OpenFabrics Interfaces Project OFI Framework Overview – Framework, Providers Delving into Data Storage / Data Access Three Use Cases

2

Page 3: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Alliance

The OpenFabrics Alliance (OFA) is an open source-based organization that develops, tests, licenses, supports and distributes OpenFabrics Software (OFS). The Alliance’s mission is to develop and promote software that enables maximum application efficiency by delivering wire-speed messaging, ultra-low latencies and maximum bandwidth directly to applications with minimal CPU overhead.

https://openfabrics.org/index.php/organization.html

3

Page 4: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Alliance – selected statistics - Founded in 2004 - Leadership

- Susan Coulter, LANL – Chair - Paul Grun, Cray Inc. – Vice Chair - Bill Lee, Mellanox Inc. – Treasurer - Chris Beggio, Sandia – Secretary (acting) - 14 active Directors/Promoters (Intel, IBM, HPE, NetApp, Oracle, Unisys, Nat’l Labs…)

- Major Activities

1. Develop and support open source network stacks for high performance networking - OpenFabrics Software - OFS

2. Interoperability program (in concert with the University of New Hampshire InterOperability Lab) 3. Annual Workshop – March 27-31, Austin TX

- Technical Working Groups

- OFIWG, DS/DA, EWG, OFVWG, IWG - https://openfabrics.org/index.php/working-groups.html (archives, presentations, all publicly available)

4

Page 5: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Alliance – selected statistics - Founded in 2004 - Leadership

- Susan Coulter, LANL – Chair - Paul Grun, Cray Inc. – Vice Chair - Bill Lee, Mellanox Inc. – Treasurer - Chris Beggio, Sandia – Secretary (acting) - 14 active Directors/Promoters (Intel, IBM, HPE, NetApp, Oracle, Unisys, Nat’l Labs…)

- Major Activities

1. Develop and support open source network stacks for high performance networking - OpenFabrics Software - OFS

2. Interoperability program (in concert with the University of New Hampshire InterOperability Lab) 3. Annual Workshop – March 27-31, Austin TX

- Technical Working Groups

- OFIWG, DS/DA, EWG, OFVWG, IWG - https://openfabrics.org/index.php/working-groups.html (archives, presentations, all publicly available)

today’s focus

5

Page 6: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Software (OFS)

uverbs kverbs

IB

ENET

IB

iWAR

P

verbs api

phy wire

RoCE

IB

ENET

OpenFabrics Software Open Source APIs and software for advanced networks Emerged along with the nascent InfiniBand industry in 2004 Soon thereafter expanded to include other IB-based networks such as RoCE and iWARP Wildly successful, to the point that RDMA technology is now being integrated upstream. Clearly, people like RDMA. OFED distribution and support: managed by the Enterprise Working Group – EWG Verbs development: managed by the OpenFabrics Verbs Working Group - OFVWG

6

Page 7: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved. 7

Historically, network APIs have been developed ad hoc as part of the development of a new network. To wit: today’s Verbs API is the implementation of the verbs semantics specified in the InfiniBand Architecture. But what if a network API was developed that catered specifically to the needs of its consumers, and the network beneath it was allowed to develop organically? What would such a resulting API look like?

Page 8: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Introducing the OpenFabrics Interfaces Project

- OpenFabric Interfaces Project (OFI) – Proposed jointly by Cray and Intel, August 2013 – Chartered by the OpenFabrics Alliance, w/ Cray and Intel as co-chairs

- Objectives - ‘Transport neutral’ network APIs - ‘Application centric’, driven by application requirements

8

“Transport Neutral, Application-Centric”

OFI Charter Develop, test, and distribute:

1. An extensible, open source framework that provides access to high-performance fabric interfaces and services 2. Extensible, open source interfaces aligned with ULP and application needs for high-performance fabric services

OFIWG will not create specifications, but will work with standards bodies to create interoperability as needed

8

Page 9: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Software (OFS)

uverbs kverbs

IB

ENET

IB

iWAR

P

verbs

OpenFabrics Software

api

phy wire

RoCE

IB

ENET

libfabric kfabric

ofi

prov

ider

prov

ider

prov

ider

fabr

ic

fabr

ic

fabr

ic

. . .

. . .

Result: 1. An extensible interface driven by

application requirements 2. Support for multiple fabrics 3. Exposes an interface written in the

language of the application

9

Page 10: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OFI Framework

Application

hardware

MPI

OFI

Service I/F

Service I/F

Service I/F

Provider B Provider A Provider C …

API

Providers

NIC NIC NIC

OFI consists of two major components: - a set of defined APIs (and some functions) – libfabric, kfabric - a set of wire-specific ‘providers’ - ‘OFI providers’

Think of the ‘providers’ as an implementation of the API on a given wire

a series of interfaces to access fabric services

Provider exports the services defined by the API

10

Service I/F

Page 11: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OpenFabrics Interfaces - Providers

libfabric kfabric

ofi

prov

ider

prov

ider

prov

ider

. . .

fabr

ic

fabr

ic

fabr

ic

. . .

libfabric / kfabric

ENET

sockets provider

IB

verbs provider

ENET

verbs provider (RoCEv2, iWARP)

ARIE

S

GNI provider

All providers expose the same interfaces, None of them expose details of the underlying fabric.

Current providers: sockets, verbs, usnic, gni, mlx, psm, psm2, udp, bgq

ENET

usnic provider

11

Page 12: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

OFI Framework – a bit more detail

Communication Services Completion Services Data Transfer Services

Framework

Provider

Event & Completion

Queues

Counters

Connection Management

Control Services

Discovery

Discovery

Address Vectors

Connection Management

Address Vectors

NIC

Network consumers

Triggered

Operations

Msg queues RMA

TagMatching Atomics

Msg queues RMA

TagMatching Atomics

Triggered

Operations

Event & Completion

Queues

Counters

12

Page 13: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Legacy Apps

Data Analysis Data Storage, Data Access Distributed and Parallel Computing

Msg Passing - MPI applications

- Structured data - Unstructured

data

- Sockets apps

- IP apps

Shared memory - PGAS languages

DS/DA WG

OFI Project Overview – work group structure

13

OFI Project

Application-centric design means that the working groups are driven by use cases:

Data Storage / Data Access, Distributed and Parallel Computing…

Data Storage - object, file, block

- storage class memory

Data Access - remote persistent memory

OFI WG

Page 14: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Legacy Apps

Data Analysis Data Storage, Data Access Distributed and Parallel Computing

Msg Passing - MPI applications

- Structured data - Unstructured

data

- Sockets apps

- IP apps

Shared memory - PGAS languages

OFI Project Overview – work group structure

14

OFI Project

Data Storage - object, file, block

- storage class memory

Data Access - remote persistent memory

kernel mode ‘kfabric’ user mode

‘libfabric’

Page 15: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Legacy Apps

Data Analysis Data Storage, Data Access Distributed and Parallel Computing

Msg Passing - MPI applications

- Structured data - Unstructured

data

- Sockets apps

- IP apps

Shared memory - PGAS languages

OFI Project Overview – work group structure

15

OFI Project

Data Storage - object, file, block

- storage class memory

Data Access - remote persistent memory

kernel mode ‘kfabric’ user mode

‘libfabric’

Since the topic today is Persistent Memory…

Page 16: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage, Data Access?

storage client

NIC

file system D

IMM

DIMM

user or kernel app

virtual switch

POSIX I/F load/store, memcopy…

provider

NVDIMM

NVDIMM

NVM devices (SSDs…)

Non-volatile memory (storage class memory, persistent memory)

Reminder: libfabric: User mode library kfabric : Kernel modules

Data Storage / Data Access Working Group DS

- object, file, block - storage class mem

DA - persistent memory

Key Use Cases: 1. Lustre 2. NVMe 3. Persistent Memory

16

Page 17: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Lustre example

RPC RPC

LNET LNET

LND (xxxlnd)

LND (xxxlnd)

?? ??

RDMA OSSs MDSs

Lustre protocol

LNET

LND (xxxlnd)

?? ??

LND (xxxlnd)

RDMA

??

C C … S

??

adding a new network type may require creating a new LND

17

Page 18: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Lustre LND Architecture

drawing courtesy of Intel

18

Page 19: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – NVMe over fabrics

PSM-based

s/w stack

uverbs

kverbs PSM2

IB

ENET

IB

iWAR P

OPA

MXM-based

s/w stack

MXM

IB

verbs psm2 mxm

OpenFabrics Software

RoC E

IB

ENET

kverbs

IB

ENET

IB

iWAR

P verbs

RoCE

IB

ENET

NVMe is a block storage protocol designed for operation with storage class memory devices

19

Page 20: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – NVMe/F today

NVMe/F is an extension, allowing access to an NVMe device over a fabric NVMe/F leverages the characteristics of verbs to good effect for verbs-based fabrics kverbs

HCA NIC, RNIC

IB

NVMe/F

VFS / Block I/O / Network FS / LNET

NIC

IP

iSCSI

sockets

kernel application

NVMe SCSI

RoCE, iWarp

20

Page 21: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – NVMe over future fabrics

kfabric

ofi

prov

ider

prov

ider

VERB

S pr

ovid

er

fabr

ic

fabr

ic

fabr

ic

. . .

. . .

kverbs

IB

ENET

IB

iWAR

P verbs

RoCE

IB

ENET

21

Page 22: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Future Lustre LND Architecture

22

drawing courtesy of Intel

Page 23: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – enhanced APIs

kverbs

HCA NIC, RNIC

IB

NVMe/F

VFS / Block I/O / Network FS / LNET

NIC

IP

iSCSI

sockets

kernel application

NVMe SCSI

fabric

kfabric

provider

fabric-specific device

- kfabric as a second native API for NVMe - kfabric as a possible ‘LND’ for LNET A kfabric verbs provider exists today (not upstream), meaning that kfabric can be run over a conventional verbs-based fabric. No modifications are necessary Waiting for a kfabric provider to emerge

RoCE, iWarp

23

Page 24: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

A look at Persistent Memory

24

Applications tolerate long delays for storage, but assume very low latency for memory - Storage systems are generally asynchronous, target driven, optimized for cost - Memory systems are synchronous, and highly optimized to deliver the lowest imaginable latency with no

CPU stalls Persistent Memory over fabrics is somewhere in between: - Much faster than storage, but not as fast as local memory (usually)

How to treat PM over fabrics? - Build tremendously fast remote memory networks, or - Find ways to hide the latency, making it look for the most part like memory, but cheaper and remote

remote storage remote PM

local mem

tolerance for latency pS uS mS

Page 25: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Access – PM operations

NIC

user app

PM client

NIC

PM server

NVM device(s)

I/O bus mem bus

provider

For ‘normal’ fabrics, the responder returns an ACK when the data has been received by the end point. - It may not be globally visible, and/or - It may not yet be persistent Need an efficient mechanism for indicating when the data is globally visible, and is persistent

25

Page 26: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Access – key fabric requirements

app memory

EP

persistent memory

EP requester (consumer)

fi_send, fi_rma

completion

data is persistent ack-c

Responder Requester

I/O pipeline

data is visible ack-b data is received ack-a

Caution: OFA does not define wire protocols, it defines the semantics seen by the consumer

Objectives: 1. Give the client app control over commits to persistent

memory 2. Return to the client distinct indications of

• global visibility, persistence 3. Optimize completion protocols to avoid round trips

26

Page 27: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Possible converged I/O stack

remote block/file I/O

kernel application

SSD

VFS / Block Layer

HBA

local I/O

NVMe SCSI

local byte

addressable

ulp

SSD NVDIMM

byte access

PCIe mem bus PCIe

user app

remote byte

addressable

byte access

kfabric

kverbs

HCA NIC, RNIC

SRP, iSER, NVMe/F, NFSoRDMA, SMB Direct, LNET/LND,…

VFS / Block I/O / Network FS / LNET

NIC

iSCSI

remote I/O

sockets

provider

fabric-specific device

libfabric

kernel application user app

provider

* Verbs exists as a kfabric provider today

*

27

Page 28: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Discussion - Between Two Ferns

Doug Voigt - SNIA NVM PM TWG Chair Paul Grun – OFA vice Chair, co-chair OFIWG, DS/DA Are we going in the right direction? Are we aligned? Perfectly aligned? Almost aligned? Are we focusing on the right use cases? How to go forward from here?

28

Page 29: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

Thank You

29

Page 30: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Backup

30

Page 31: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

kfabric API – very similar to existing libfabric Consumer APIs • kfi_getinfo() kfi_fabric() kfi_domain() kfi_endpoint() kfi_cq_open() kfi_ep_bind() • kfi_listen() kfi_accept() kfi_connect() kfi_send() kfi_recv() kfi_read() kfi_write() • kfi_cq_read() kfi_cq_sread() kfi_eq_read() kfi_eq_sread() kfi_close() …

Provider APIs • kfi_provider_register()

During kfi provider module load a call to kfi_provider_register() supplies the kfi api with a dispatch vector for kfi_* calls.

• kfi_provider_deregister() During kfi provider module unload/cleanup kfi_provider_deregister() destroys the kfi_* runtime linkage for the specific provider (ref counted).

31

kfabric API

http://downloads.openfabrics.org/WorkGroups/ofiwg/dsda_kfabric_architecture/

Page 32: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Current Lustre LND Architecture

ptlrpc

LNET

socklnd gnilnd socklnd

Ethernet NIC Gemini RoCEv2 NIC IB HCA

TCP/IP/Enet Aries IB/Enet InfiniBand

Ethernet NIC Gemini RoCEv2 NIC IB HCA

???

???

Adding a new network type requires introducing a new LND

OFED sockets/TPC/IP

32

Page 33: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Current Lustre LND Architecture

ptlrpc

LNET

socklnd gnilnd o2iblnd

Ethernet NIC Gemini RoCEv2 NIC IB HCA new NIC

OFED

newlnd

TCP/IP/Enet Aries IB/Enet InfiniBand new fabric

Ethernet NIC Gemini RoCEv2 NIC IB HCA new NIC

One option is to define a new LND for each new network type

sockets/TPC/IP

33

Page 34: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Storage – Current Lustre LND Architecture

ptlrpc

LNET

socklnd gnilnd kfabric verbs provider

Ethernet NIC Gemini RoCEv2 NIC IB HCA

kfabric new net provider

Ethernet NIC Gemini RoCEv2 NIC IB HCA new NIC

Or, backport LNET to kfabric, allowing use of any kfabric provider

new NIC

sockets/TPC/IP

o2iblnd

OFED

TCP/IP/Enet Aries IB/Enet InfiniBand new fabric

34

Page 35: Persistent Memory over Fabrics An Application-centric view · 2019. 12. 21. · PSM2 MXM. IB. ENET. IB based . iWAR P iWARP OPA MXM-s/w stack IB. verbs . psm2 . mxm . OpenFabrics

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.

Data Access – PM completion semantics

By analogy, storage protocols include an ‘ending status’ phase, signaling to the client

that the data has been safely stored

No such client/server model exists for a remote persistent memory device

Hence, there is no convenient way to know if

data has been safely stored within the persistence domain

I/O TransactionSynchronization is achieved by

the underlying I/O protocol

Persistent Memory OperationSynchronization occurs in

hardware

I/O request

Memory request

acknowledge

I/O response

Data transfer

Sync point

35