Intel® MIC x100 Coprocessor Driver - on the Frontiers of Linux & HPC
Nikhil Rao ([email protected])
LinuxCon 2013
Intel® Xeon Phi* (MIC) x100 Coprocessors
Highly Parallel Processing for Unparalleled Discovery
Groundbreaking differences
• Up to 61 IA cores / 1.1 GHz / 244 threads
• Up to 16 GB memory with up to 352 GB/s bandwidth
• 512-bit SIMD instructions
• Linux* operating system, IP addressable
• Standard programming languages and tools
Leading to groundbreaking results
• Up to 1 TeraFlop/s double-precision peak performance (note 1)
• Up to 2.2x higher memory bandwidth than on an Intel® Xeon® processor E5 family-based server (note 2)
• Up to 4x more performance per watt than with an Intel® Xeon® processor E5 family-based server (note 3)
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Notes 1, 2 & 3, see backup for system configuration details.
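The headline figures can be roughly checked against each other. This check rests on two assumptions that the slide does not state:
• 4 hardware threads per core.
• 16 double-precision FLOPs per cycle per core. That is eight 64-bit lanes in a 512-bit vector, with a fused multiply-add counted as two FLOPs.

$$
\begin{aligned}
61\ \text{cores} \times 4\ \text{threads/core} &= 244\ \text{threads}\\
61\ \text{cores} \times (1.1\times 10^{9}\ \text{cycles/s}) \times 16\ \text{FLOP/cycle} &\approx 1.07\times 10^{12}\ \text{FLOP/s}
\end{aligned}
$$

Both results line up with the "244 threads" and "up to 1 TeraFlop/s" numbers above.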
Programming Models
• Offload: main() runs on the CPU, and selected functions such as foo() run on the coprocessor
• Native: main() runs on the coprocessor
• Symmetric: main() runs on both the CPU and the coprocessor
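Compiler Assisted Offload
Computes ans = a[0] + a[1] + ... + a[n-1]
Host Only Code
```c
/* Loop from the slide wrapped in a function (name illustrative) so it
 * compiles; build with OpenMP enabled, e.g. -fopenmp. */
float sum_host(const float *data, int size)
{
    float ret = 0;
    #pragma omp parallel for reduction(+:ret)
    for (int i = 0; i < size; i++)
        ret += data[i];
    return ret;
}
```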
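Loop Offloaded to Coprocessor
```c
/* Same reduction offloaded with Intel's offload pragma (requires the Intel
 * compiler with Xeon Phi support); the function name is illustrative. */
float sum_offload(const float *data, int size)
{
    float ret = 0;
    /* Copy size and data[0..size-1] to the card; copy ret back to the host. */
    #pragma offload target(mic) in(size) in(data:length(size)) inout(ret)
    {
        #pragma omp parallel for reduction(+:ret)
        for (int i = 0; i < size; i++)
            ret += data[i];
    }
    return ret;
}
```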
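The same offloaded reduction can also be written with the standard OpenMP 4.0 target construct instead of Intel's #pragma offload. The sketch below does that. It is an alternative technique, not what the slide shows, and the function name sum_target is hypothetical. If the compiler or runtime has no offload device configured, the target region simply runs on the host.

```c
/* Sketch: the slide's reduction expressed with the OpenMP 4.0 target
 * construct. sum_target is a hypothetical name. */
float sum_target(const float *data, int size)
{
    float ret = 0.0f;
    /* map(to:) copies the input array to the device; map(tofrom:) copies
     * the scalar result back to the host when the region ends. */
    #pragma omp target map(to: data[0:size]) map(tofrom: ret)
    #pragma omp parallel for reduction(+:ret)
    for (int i = 0; i < size; i++)
        ret += data[i];
    return ret;
}
```

The explicit map(tofrom: ret) matters for the following reason. Under OpenMP 4.5 and later, scalars referenced in a target region are firstprivate by default. Without the map clause, the host would never see the reduced value.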
Intel® Manycore Platform Software (MPSS) Stack
Host-side tools
• Coprocessor FS, network configuration
• Status monitoring (e.g. temperature, power, RAS: reliability, availability, serviceability)
• Coprocessor OS state management (micctrl, mpssd)
• VirtIO devices (mpssd)
Diagram:
• The host platform (tools, driver, Linux* OS) connects over PCIe* to the coprocessors, each running Linux* OS.
• The programming models that span host and coprocessors are offload apps, MPI* and TCP/IP.
Intel® MPSS Coprocessor Environment
• Linux* OS, K1OM ABI
• BusyBox filesystem
Diagram: the same host/coprocessor stack as the previous slide.
Intel® Xeon Phi™ Coprocessor Driver
• Coprocessor OS Management
• Virtual (VirtIO based) Device Support
• PCIe* Messaging & RDMA APIs (SCIF), connecting processes such as P0 and P1 across PCIe*
Coprocessor OS Boot
• The host driver exposes a sysfs interface, and the coprocessor firmware signals that it is ready ("FW ready").
• In user space, micctrl -b writes the bzImage file name, the RAMdisk file name and "boot" to sysfs.
• In the kernel, the host driver copies the bzImage and ramdisk to the coprocessor and signals it with an interrupt.
• The coprocessor boots Linux*.
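The user-space half of this sequence can be sketched as a few sysfs writes. This is a minimal sketch, not micctrl's source, and it rests on these assumptions:
• The per-device attribute files are named firmware, ramdisk and state, under a directory such as /sys/class/mic/mic0. These names are believed to match the upstream mic host driver's sysfs ABI; verify them against your kernel.
• The helper names are hypothetical.
The directory is a parameter, so the sketch can also be exercised against an ordinary directory.

```c
#include <stdio.h>
#include <string.h>

/* Write value to the attribute file <dir>/<attr>, like
 * "echo -n value > dir/attr". Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration. */
static int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                          /* path would be truncated */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t len = strlen(value);
    int ok = fwrite(value, 1, len, f) == len;
    if (fclose(f) != 0)                     /* buffered data is written at close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical sketch of what "micctrl -b" asks the host driver to do:
 * name the kernel and ramdisk images, then request a boot. On a real
 * system dir would be the device's sysfs directory (assumed here to be
 * something like /sys/class/mic/mic0). */
int mic_request_boot(const char *dir, const char *bzimage, const char *ramdisk)
{
    if (write_attr(dir, "firmware", bzimage) != 0)
        return -1;
    if (write_attr(dir, "ramdisk", ramdisk) != 0)
        return -1;
    return write_attr(dir, "state", "boot");
}
```

The order follows the slide: both file names are in place before "boot" is written, so the driver has everything it needs when the boot request arrives.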
Virtio Drivers
• Virtio: a framework that enables the use of common guest drivers across hypervisors
– KVM: the guest's virtio_net.ko talks to QEMU through a virtqueue
– lguest: the same virtio_net.ko talks to lguest through a virtqueue
– Coprocessor: the coprocessor (guest) runs virtio_net.ko and talks through a virtqueue to mpssd on the host
• Key benefits
– Reuse of well-designed, maintained code
– A standard that enables a simple backend
– New devices possible in the future
Virtio Data Path
• The virtio driver in the guest/coprocessor OS and the device emulation in the hypervisor/host OS (running on top of the HW) share a virtqueue made of two rings, avail and used.
• The driver places a buffer on the avail ring and notifies the device emulation with an interrupt.
• The device emulation processes the buffer, places it on the used ring and interrupts the driver.
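The avail/used hand-off can be sketched as a toy, single-threaded split virtqueue. This is an illustration under stated assumptions, not the real struct vring layout from the virtio specification:
• There are no memory barriers and no descriptor chaining.
• The device completes buffers in order, which is what lets the driver recycle descriptor ids in order.
• All names are hypothetical.

```c
#include <stdint.h>

#define QSIZE 4  /* ring entries; a power of two so 16-bit index wrap works */

struct toy_desc { void *addr; uint32_t len; };
struct toy_used { uint16_t id; uint32_t len; };

struct toy_vq {
    struct toy_desc desc[QSIZE];       /* buffer descriptors */
    uint16_t avail[QSIZE];             /* avail ring: driver -> device */
    struct toy_used used[QSIZE];       /* used ring: device -> driver */
    uint16_t avail_idx, used_idx;      /* free-running producer indices */
    uint16_t last_avail, last_used;    /* consumer cursors (device, driver) */
};

/* Driver side: post a buffer on the avail ring. Returns its id, or -1 if
 * every descriptor is still outstanding. */
int driver_add(struct toy_vq *vq, void *addr, uint32_t len)
{
    if ((uint16_t)(vq->avail_idx - vq->last_used) == QSIZE)
        return -1;
    uint16_t id = vq->avail_idx % QSIZE;
    vq->desc[id].addr = addr;
    vq->desc[id].len = len;
    vq->avail[vq->avail_idx % QSIZE] = id;
    vq->avail_idx++;          /* a real driver would now notify the device */
    return id;
}

/* Device side: consume every newly available buffer, let handler fill it,
 * and return it on the used ring. Returns the number of buffers completed
 * (a real device would then interrupt the driver). */
int device_process(struct toy_vq *vq,
                   uint32_t (*handler)(void *buf, uint32_t len))
{
    int n = 0;
    while (vq->last_avail != vq->avail_idx) {
        uint16_t id = vq->avail[vq->last_avail % QSIZE];
        uint32_t written = handler(vq->desc[id].addr, vq->desc[id].len);
        vq->used[vq->used_idx % QSIZE] = (struct toy_used){ id, written };
        vq->used_idx++;
        vq->last_avail++;
        n++;
    }
    return n;
}

/* Driver side: reclaim one completed buffer. Returns its id and stores the
 * number of bytes the device wrote in *len, or -1 if none is ready. */
int driver_get_used(struct toy_vq *vq, uint32_t *len)
{
    if (vq->last_used == vq->used_idx)
        return -1;
    struct toy_used e = vq->used[vq->last_used % QSIZE];
    vq->last_used++;
    *len = e.len;
    return e.id;
}
```

Each ring has exactly one writer: only the driver advances avail_idx, and only the device advances used_idx. This ownership rule is what allows the two sides to share the rings without locks.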
Virtio Data Path Setup
• The device emulation (mpss daemon) in the host OS issues a device create IOCTL to the coprocessor host driver.
• The host driver adds a device entry to the device page, which is shared with the coprocessor OS. A device page entry holds:
– vring addresses and interrupt information
– status notifications (e.g., driver unloaded)
• In the coprocessor OS, the virtio-mic driver finds the entry and registers a virtio device on the virtio bus.
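The device page can be pictured as a small array of entries. The host driver writes the entries, and the coprocessor scans them. This is a hypothetical sketch: the field names, sizes and functions loosely follow the slide and are not the real driver's structures. The only external fact used is that virtio device ID 1 denotes a network device.

```c
#include <stdint.h>

#define PAGE_ENTRIES 8
#define MAX_VQS      2

struct toy_vq_info {
    uint64_t ring_addr;          /* where the vring lives */
    uint16_t num;                /* ring size */
};

struct toy_dev_entry {
    uint8_t  type;               /* virtio device ID (1 = net); 0 = free slot */
    uint8_t  num_vq;             /* number of valid vq[] entries */
    uint8_t  status;             /* e.g. set when a driver is unloaded */
    uint8_t  irq;                /* interrupt used for notifications */
    struct toy_vq_info vq[MAX_VQS];
};

struct toy_dev_page { struct toy_dev_entry entry[PAGE_ENTRIES]; };

/* Host side, on the device create ioctl: copy the entry into the first
 * free slot. Returns the slot index, or -1 if invalid or the page is full. */
int dev_page_add(struct toy_dev_page *pg, const struct toy_dev_entry *e)
{
    if (e->type == 0 || e->num_vq > MAX_VQS)
        return -1;
    for (int i = 0; i < PAGE_ENTRIES; i++) {
        if (pg->entry[i].type == 0) {
            pg->entry[i] = *e;
            return i;
        }
    }
    return -1;
}

/* Coprocessor side: call probe() for every populated entry, as a
 * virtio-mic-like driver would before registering each device on the
 * virtio bus. Returns the number of entries found. */
int dev_page_scan(const struct toy_dev_page *pg,
                  void (*probe)(const struct toy_dev_entry *))
{
    int found = 0;
    for (int i = 0; i < PAGE_ENTRIES; i++) {
        if (pg->entry[i].type != 0) {
            probe(&pg->entry[i]);
            found++;
        }
    }
    return found;
}
```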
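What's different?
Virtio networking in a KVM guest versus on the coprocessor:
• KVM:
– In the guest, TCP/IP runs over virtio-net on virtio-pci.
– In the host OS, the QEMU process (with kvm.ko) provides the QEMU network backend, which reaches the network through a TAP device and a bridge.
• Coprocessor:
– In the coprocessor OS, TCP/IP runs over virtio-net on virtio-mic.
– In the host OS, mpssd is the network backend, again using a TAP device and a bridge.
– The coprocessor driver connects the host OS and the coprocessor OS.
• The diagram distinguishes the data path from the control path.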
SCIF: Symmetric Communications Interface
• Goals
– Performance: the available PCIe* bandwidth is 7 GB/s, while TCP/IP host-to-card bandwidth is around 400 MB/s
– Abstract the PCIe* network
• Provides send/recv, RMA and mapped memory APIs
Diagram: the host, coprocessors and an IB* HCA connected over PCIe*.
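The gap in the goals above can be put in concrete terms using the slide's own figures. This back-of-envelope calculation assumes both rates are sustained for a large transfer, with GB taken as 10^9 bytes:

$$
\begin{aligned}
\frac{7\ \text{GB/s}}{0.4\ \text{GB/s}} &= 17.5\\
t_{\text{TCP}}(1\ \text{GB}) = \frac{1\ \text{GB}}{0.4\ \text{GB/s}} &= 2.5\ \text{s}\\
t_{\text{PCIe}}(1\ \text{GB}) = \frac{1\ \text{GB}}{7\ \text{GB/s}} &\approx 0.14\ \text{s}
\end{aligned}
$$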
SCIF Endpoints & Connections
• SCIF endpoint: a pipe to a PCIe* node or loopback, bound to a port ID
• Exactly two endpoints can form a connection; the SCIF data transfer and mapping APIs accept only a connected endpoint
Diagram: process P0 on node 0 (port X) and process P1 on node 1 (port Y), each using SCIF, connected over PCIe*.
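The connection rules above can be modelled in a few lines. This toy is not the SCIF API, whose calls include scif_open, scif_bind, scif_listen, scif_connect and scif_accept. The types and functions here are hypothetical and encode only the rules stated on the slide:
• An endpoint binds to a port.
• Exactly two endpoints form a connection.
• Data calls are rejected on an unconnected endpoint.

```c
#include <errno.h>
#include <stdint.h>

enum toy_state { EP_UNBOUND, EP_BOUND, EP_LISTENING, EP_CONNECTED };

struct toy_ep {
    uint16_t node, port;         /* where the endpoint is bound */
    enum toy_state state;
    struct toy_ep *peer;         /* the one other endpoint of the connection */
};

int toy_bind(struct toy_ep *ep, uint16_t node, uint16_t port)
{
    if (ep->state != EP_UNBOUND)
        return -EINVAL;
    ep->node = node;
    ep->port = port;
    ep->state = EP_BOUND;
    return 0;
}

int toy_listen(struct toy_ep *ep)
{
    if (ep->state != EP_BOUND)
        return -EINVAL;
    ep->state = EP_LISTENING;
    return 0;
}

/* Pair a bound client with a listening endpoint. In this toy the listener
 * itself becomes connected, so it cannot accept a second peer (real SCIF
 * instead hands back a new endpoint from scif_accept). */
int toy_connect(struct toy_ep *client, struct toy_ep *listener)
{
    if (client->state != EP_BOUND)
        return -EINVAL;
    if (listener->state != EP_LISTENING)
        return -ECONNREFUSED;
    client->peer = listener;
    listener->peer = client;
    client->state = EP_CONNECTED;
    listener->state = EP_CONNECTED;
    return 0;
}

/* Check made at the start of every data transfer or mapping call. */
int toy_check_connected(const struct toy_ep *ep)
{
    return ep->state == EP_CONNECTED ? 0 : -ENOTCONN;
}
```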
SCIF API Functional Grouping
• Connection
• Messaging
• Memory Registration
• Remote Memory Access (RMA)
• RMA Fencing
• Remote memory mapping (mmap)
Connection & send/recv
Send/recv Implementation
• P0 (node 0, port X): scif_send(epd, msg0, len, flags);
• P1 (node 1, port Y): scif_recv(epd, msg1, len, flags);
• Each endpoint has a receive queue. scif_send copies msg0 across PCIe* into the receive queue of the peer endpoint, and scif_recv copies the queued message out into msg1.
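The receive-queue mechanism can be sketched as a toy byte-stream ring buffer. A send-like call appends to the peer's queue, and a recv-like call drains the caller's own queue. This is a hypothetical model, not libscif. It leaves out PCIe*, blocking, and the flags argument; the queue size and names are illustrative.

```c
#include <stddef.h>

#define RQ_SIZE 16

struct toy_rq {
    unsigned char buf[RQ_SIZE];
    size_t head, tail;          /* free-running read and write counters */
};

/* Append up to len bytes to the PEER's receive queue. Returns the number
 * of bytes accepted, which is short when the queue fills up (like a
 * non-blocking send). */
size_t toy_send(struct toy_rq *peer_rq, const void *msg, size_t len)
{
    const unsigned char *p = msg;
    size_t n = 0;
    while (n < len && peer_rq->tail - peer_rq->head < RQ_SIZE) {
        peer_rq->buf[peer_rq->tail % RQ_SIZE] = p[n++];
        peer_rq->tail++;
    }
    return n;
}

/* Copy up to len queued bytes from the caller's OWN receive queue into
 * msg. Returns the number of bytes received. */
size_t toy_recv(struct toy_rq *own_rq, void *msg, size_t len)
{
    unsigned char *p = msg;
    size_t n = 0;
    while (n < len && own_rq->head != own_rq->tail) {
        p[n++] = own_rq->buf[own_rq->head % RQ_SIZE];
        own_rq->head++;
    }
    return n;
}
```

The model involves two copies, sender to queue and queue to receiver. This is why the deck moves on to registration and RMA for zero-copy transfers.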
Memory Registration
• SCIF RMA provides zero-copy inter-process data transfer
• Registration exposes local memory for remote access
• Registration pins pages, enabling
– local DMA engine access
– remote access
Diagram: process P0 (node 0, port X) with buf0 and process P1 (node 1, port Y) with buf1, connected over PCIe*.
Registered Address Space (RAS)
• Offsets reference registered memory in the RMA APIs
• RAS is per connection
• A connection has two registered address spaces, local and remote
– Local RAS offset = peer's remote RAS offset
off_t scif_register(epd, addr, len, …, prot, ..);
Example from the diagram:
• When P0 (node0:X) registers buf0, buf0 appears at offset0 in P0's local RAS and at the same offset0 in the remote RAS of P1 (node1:Y).
• Likewise, when P1 registers buf1, buf1 appears at offset1 in P1's local RAS and at offset1 in P0's remote RAS.
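The offset bookkeeping can be modelled as a toy. Each side of a connection keeps two address spaces:
• a local RAS for its own registrations;
• a remote RAS that mirrors the peer's registrations.
Registering a buffer adds the same window, at the same offset, to both the local RAS and the peer's remote RAS. The names, sizes and the simple offset allocator are hypothetical. Real registrations are page-aligned, and the peer cannot dereference the registering process's pointers; this single-process toy simply stores them.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_WIN 8

struct toy_window { off_t offset; size_t len; void *addr; };
struct toy_ras    { struct toy_window win[MAX_WIN]; int n; off_t next; };

/* Register [addr, addr + len): pick the next free offset in the local RAS
 * and mirror the window into the peer's remote RAS. Returns the offset,
 * or -1 on failure. */
off_t toy_register(struct toy_ras *local, struct toy_ras *peer_remote,
                   void *addr, size_t len)
{
    if (len == 0 || local->n == MAX_WIN || peer_remote->n == MAX_WIN)
        return -1;
    struct toy_window w = { local->next, len, addr };
    local->win[local->n++] = w;
    peer_remote->win[peer_remote->n++] = w;   /* same offset on the peer */
    local->next += (off_t)len;                /* next free local offset */
    return w.offset;
}

/* Translate [off, off + len) to an address, or NULL unless the whole range
 * lies inside one registered window. */
void *toy_ras_lookup(const struct toy_ras *ras, off_t off, size_t len)
{
    for (int i = 0; i < ras->n; i++) {
        const struct toy_window *w = &ras->win[i];
        if (off < w->offset)
            continue;
        size_t skip = (size_t)(off - w->offset);
        if (skip <= w->len && len <= w->len - skip)
            return (char *)w->addr + skip;
    }
    return NULL;
}
```

Each side allocates offsets in its local RAS independently, so offset0 and offset1 may even be equal. They never clash, because they live in different address spaces.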
RMA
• int scif_writeto(epd, offset0, len, offset1, flags); copies len bytes from offset0 in P0's local RAS (buf0) to offset1 in its remote RAS (the peer's buf1)
• int scif_vwriteto(epd, buf0, len, offset1, flags); does the same, but takes the source as an address in the process virtual address space, so buf0 does not have to be registered
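The difference between the two calls can be sketched with a toy connection that has one registered window per side. Here memcpy stands in for the DMA engine. The names are hypothetical and only mirror the roles of scif_writeto and scif_vwriteto; they are not the SCIF API.

```c
#include <string.h>
#include <sys/types.h>

struct toy_win  { off_t off; size_t len; void *addr; };
struct toy_conn { struct toy_win local, remote; };

/* Map [off, off + len) inside window w to an address, or NULL. */
static void *win_addr(const struct toy_win *w, off_t off, size_t len)
{
    if (off < w->off || (size_t)(off - w->off) > w->len ||
        len > w->len - (size_t)(off - w->off))
        return NULL;
    return (char *)w->addr + (off - w->off);
}

/* Like scif_writeto: source and destination are both RAS offsets, so
 * both must fall inside registered windows. */
int toy_writeto(struct toy_conn *c, off_t loff, size_t len, off_t roff)
{
    void *src = win_addr(&c->local, loff, len);
    void *dst = win_addr(&c->remote, roff, len);
    if (src == NULL || dst == NULL)
        return -1;
    memcpy(dst, src, len);
    return 0;
}

/* Like scif_vwriteto: the source is a plain virtual address, so only the
 * destination offset is checked against a registered window. */
int toy_vwriteto(struct toy_conn *c, const void *addr, size_t len, off_t roff)
{
    void *dst = win_addr(&c->remote, roff, len);
    if (dst == NULL)
        return -1;
    memcpy(dst, addr, len);
    return 0;
}
```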
RMA Fence APIs
• Asynchronous RMAs allow overlap of compute and communication
• Fence APIs allow synchronization with RMA completion
– Non-blocking (polling) synchronization: scif_fence_signal(ep, off, v) arranges for value v to be written at offset off in the RAS once the RMAs issued before the call have completed; the application polls that location
– Blocking synchronization: m = scif_fence_mark(ep) marks the RMAs issued so far, and scif_fence_wait(ep, m) blocks until they have completed
The slides illustrate both with a timeline of three RMAs (RMA1, RMA2, RMA3) over times t1 to t7. The calls are shown with simplified argument lists.
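The two fence styles can be modelled with sequence numbers. In this toy, RMAs complete asynchronously, only when toy_progress() runs.
• A mark records the sequence number of the last RMA issued.
• Waiting on a mark means "all RMAs up to here are done".
• A signal writes a value to a flag once every earlier RMA is done.
All names are hypothetical, and the model covers only the ordering idea, not the full SCIF semantics.

```c
#include <stdint.h>

#define MAX_SIGNALS 4

struct toy_signal { uint64_t after_seq; volatile uint64_t *flag; uint64_t val; };

struct toy_rma_queue {
    uint64_t issued;      /* sequence number of the last RMA issued */
    uint64_t completed;   /* all RMAs with seq <= completed are done */
    struct toy_signal sig[MAX_SIGNALS];
    int nsig;
};

/* Start an asynchronous RMA; it completes later in toy_progress(). */
uint64_t toy_rma_issue(struct toy_rma_queue *q) { return ++q->issued; }

/* Like scif_fence_mark: remember the current position in the RMA stream. */
uint64_t toy_fence_mark(const struct toy_rma_queue *q) { return q->issued; }

/* Like scif_fence_signal: arrange for *flag = val once all RMAs issued so
 * far have completed. Returns immediately (non-blocking). */
int toy_fence_signal(struct toy_rma_queue *q, volatile uint64_t *flag,
                     uint64_t val)
{
    if (q->nsig == MAX_SIGNALS)
        return -1;
    q->sig[q->nsig++] = (struct toy_signal){ q->issued, flag, val };
    return 0;
}

/* Simulated DMA engine: complete up to n more RMAs in order, then fire
 * every signal whose preceding RMAs are now all complete. */
void toy_progress(struct toy_rma_queue *q, uint64_t n)
{
    while (n-- > 0 && q->completed < q->issued)
        q->completed++;
    for (int i = 0; i < q->nsig; ) {
        if (q->sig[i].after_seq <= q->completed) {
            *q->sig[i].flag = q->sig[i].val;
            q->sig[i] = q->sig[--q->nsig];   /* remove: swap in the last */
        } else {
            i++;
        }
    }
}

/* Like scif_fence_wait: block until the marked RMAs are complete. In this
 * single-threaded toy, "blocking" means driving progress until then. */
void toy_fence_wait(struct toy_rma_queue *q, uint64_t mark)
{
    while (q->completed < mark)
        toy_progress(q, 1);
}
```

The two styles match the slides as follows:
• Polling on the flag lets the application keep computing while RMA1 and RMA2 drain.
• The mark/wait pair suits code that has nothing else to do until the data has landed.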
Remote Memory Mapping
• va = mmap(addr, len, prot, flags, epd, offset1);
– Maps Buf1, which the peer registered at offset1 (visible in P0's remote RAS), into the virtual address space of process P0 (node0:X)
– P0 then accesses Buf1 directly through va
• Lowest-latency path for messaging
Diagram: P0's local RAS holds its own Buf0 at offset0, and its remote RAS holds the peer's Buf1 at offset1.
OFED* over SCIF
• OpenFabrics Enterprise Distribution (OFED*): open-source software stack for InfiniBand* and iWARP*
• IB-SCIF driver
– Software-emulated HCA
– Used within the box
– Uses kernel SCIF send/recv and RMA operations
Stack on the host or coprocessor, top to bottom:
• User mode: MPI Application, uDAPL, IB Verbs Library, IB-SCIF Library
• Kernel mode: IB uverbs, IB core, IB-SCIF driver, SCIF
SCIF RMA Performance
[Figure: Comparison of TCP and SCIF based BW. Throughput (GB/sec, 0 to 8) vs. transfer size (KBytes). Series: Available PCIe BW; SCIF Write DMA (Host initiated); SCIF Write DMA (Coprocessor initiated); TCP (Host->Coprocessor); TCP (Coprocessor->Host).]
Code Status & Plans
• Patches for the features below have been submitted; inclusion is expected in Linux 3.13
– Coprocessor OS state management
– Virtio device support
• Future patches
– DMA engine & usage in Virtio device support
– SCIF
Summary
• The MIC x100 Coprocessor driver is a key element of an all-Linux* HPC platform
– Enables choice of programming models
• New driver features
– Virtio for PCIe* endpoints
– SCIF communication
Possibilities for reuse in your HW? Suggestions?
Let us know!
Acknowledgements!
• Team
– Dasa Chandramouli, Bruce Chang, Bill Clifford, Ashutosh Dixit, Sudeep Dutt, Harsha Kharche, Sanath Kumar, Ravi Murty, Johnnie Peters, Evan Powers, John Wiegert, Siva Yerramreddy, Caz Yokoyama, Jianxin Xiong
• Reviewers
– PJ Waskiewicz, Eddie Dong
• Presentation – James Reinders
Links
• Patches
– https://lkml.org/lkml/2013/9/5/561
• MPSS
• Software Developer's Guide
Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Intel, Cilk, VTune, Xeon, Xeon Phi, Look Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others. Copyright ©2013 Intel Corporation.