Heterogeneous Compute Platforms: Data management

Dan Tsafrir
May 2013, ICRI-CI Retreat

Data sharing – the problem

• Sharing data between heterogeneous devices

– Oftentimes cumbersome & device-specific
– In the OS, apps, or both

• Programmers need to address questions like:
– Can the device work directly on app memory?
  Or must it have its own copy of the data?
– Can the device deal with app virtual addresses?
  Or must the memory be mapped in some other way?
– Should the memory be pinned before passing it to the device?
  Or can the device withstand I/O page faults, thereby allowing memory overcommitment? (sketched below)
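As a concrete illustration of the last question, here is a minimal C sketch of the "pin before handing to the device" pattern. The device ioctl and request struct are hypothetical stand-ins for a device-specific interface – which is precisely the problem described above:

    /* Hypothetical device-specific DMA interface. */
    #include <stddef.h>
    #include <sys/ioctl.h>

    struct dev_dma_req {            /* hypothetical request descriptor */
        void   *buf;                /* app virtual address             */
        size_t  len;
    };
    #define DEV_DMA_READ _IOW('d', 1, struct dev_dma_req)  /* hypothetical */

    int dma_read(int devfd, void *buf, size_t len)
    {
        struct dev_dma_req req = { .buf = buf, .len = len };

        /* Inside this call, the driver would pin buf's pages (so they
         * cannot be paged out mid-DMA) and map them for the device.
         * A device that tolerates I/O page faults could skip the
         * pinning – and the kernel could then overcommit memory. */
        return ioctl(devfd, DEV_DMA_READ, &req);
    }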

Data sharing – goal

• Big goal
– Data sharing between heterogeneous PEs should “just work”
– HW/SW interfaces should keep app programmers mostly ignorant of the details
– Need to develop interfaces & a runtime layer that
  • Abstract away the details of each device
  • Present to apps a simplified, efficient programming model

• Concrete goal
– Focusing on the MMU and IOMMU

Unifying MMU and IOMMU spaces

Ilya Lesokhin, Muli Ben-Yehuda, Assaf Schuster, Dan Tsafrir

IOMMU in a nutshell

• IOMMU vs. MMU

– IOMMU serves I/O devices that perform DMAs

– Like MMU serves processes that access virtual memory

• But
– No I/O page faults (IOPFs)
– If the memory isn’t there => crash
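For concreteness, here is how a Linux driver obtains such a device-side address through the kernel’s DMA API (a sketch: the dma_* calls are the real Linux API, the surrounding driver context is assumed):

    #include <linux/dma-mapping.h>

    static int send_buffer(struct device *dev, void *buf, size_t len)
    {
        /* With an IOMMU present, this installs a translation and
         * returns an I/O virtual address; the device then uses it
         * much like a process uses a virtual address. */
        dma_addr_t iova = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, iova))
            return -ENOMEM;

        /* ... program the device to DMA from 'iova' ... */

        /* The key difference from the MMU: if the device touches an
         * unmapped address, there is no page fault – the DMA fails. */
        dma_unmap_single(dev, iova, len, DMA_TO_DEVICE);
        return 0;
    }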

No IOPFs – consequences

• IOMMU management is crippled compared to MMU management
– Virtual memory must be pre-allocated & pinned to physical memory

• Can’t do memory overcommitment
– Consider a set of uncooperative VMs with assigned NICs (SR-IOV)
– Must pin their entire memory images! (see the VFIO sketch below)

• The kernel’s MMU & IOMMU management subsystems
– Developed separately & used differently

• Causes numerous headaches and performance penalties
– E.g., can’t use an app’s virtual memory space to do I/O

• Thus, to be able to unify (and get rid of the above drawbacks)
– Must have IOPFs
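The pinning cost is visible in how device assignment works today. A sketch, assuming a hypervisor that uses Linux’s VFIO interface (the VFIO calls are real; the surrounding variables are hypothetical) – mapping the whole guest image up front forces the kernel to pin every page, because the NIC cannot tolerate an IOPF:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    int map_guest_ram(int container_fd, void *guest_mem, uint64_t size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)guest_mem,  /* hypervisor VA of guest RAM */
            .iova  = 0,                     /* guest-physical address 0   */
            .size  = size,                  /* the ENTIRE memory image    */
        };
        /* The kernel pins all 'size' bytes here, and keeps them pinned
         * for as long as the mapping exists – no overcommitment. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }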

IOPF support – the current state of affairs

• Recently defined industry spec for supporting IOPFs:
– The “PRI” (Page Request Interface)
– Part of the PCI-SIG ATS (Address Translation Services) specification

• Bleeding-edge I/O devices do (experimentally) support IOPFs
– We are working on such experimental NICs

Research

• Status
– Have a working environment
– Handling send-IOPFs (currently the NIC drops receive-IOPFs)
– Measured IOPF handling (breakdown into HW and SW components)

• Next steps
– Attempt to reduce the overhead
– Develop a strategy to handle receive-IOPFs (10 Gb/sec => 1.25 MB/ms; see the check below)
– Characterize IOPFs
  • How often? Performance penalty? Dropped packets?
– Show that I/O memory space overcommitment is possible & advantageous

• Longer term
– Unify the process & I/O address spaces
  • Processes use their VA buffers; the I/O subsystem works directly on them
– Does the PRI spec make sense? Is it optimal? Could it be improved? How?
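A back-of-envelope check of the 10 Gb/sec figure above – the data that accumulates while a single receive-IOPF is being resolved:

    #include <stdio.h>

    int main(void)
    {
        double gbps = 10.0;                          /* link rate     */
        double bytes_per_ms = gbps * 1e9 / 8 / 1e3;  /* bits -> bytes,
                                                        sec -> ms     */
        printf("%.2f MB arrive per ms of IOPF latency\n",
               bytes_per_ms / 1e6);                  /* prints 1.25   */
        return 0;
    }

So every millisecond of receive-IOPF handling latency is ~1.25 MB of traffic that must be buffered or dropped.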

Rethink the IOMMU

Moshe Malka, Nadav Amit, Dan Tsafrir

IOMMU architected similarly to MMU

[Figure: a virtual address split into per-level indices driving a multi-level page-table walk, rooted at a CR3-like register]

• Has an IOTLB
• Upon an IOTLB miss => HW walks the page table
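For reference, the hardware walk on an IOTLB miss amounts to the following loop (a simplified software model of an x86-style 4-level format, not actual IOMMU code; phys_to_virt() stands in for however the model reads memory):

    #include <stdint.h>
    #include <stdbool.h>

    #define PRESENT    0x1ULL
    #define ADDR_MASK  0x000ffffffffff000ULL  /* PTE bits 51:12 */
    #define LEVELS     4

    extern uint64_t *phys_to_virt(uint64_t paddr);  /* model's memory access */

    /* Walk the hierarchy rooted at 'root' (the CR3 analogue); return
     * false if a level is not present – for the IOMMU, that means the
     * DMA fails rather than faulting. */
    bool walk(uint64_t root, uint64_t vaddr, uint64_t *paddr)
    {
        uint64_t table = root;
        for (int level = LEVELS - 1; level >= 0; level--) {
            /* Each level consumes 9 bits of the virtual address,
             * costing one (for the IOMMU: un-cached) memory access. */
            unsigned idx = (vaddr >> (12 + 9 * level)) & 0x1ff;
            uint64_t pte = phys_to_virt(table)[idx];
            if (!(pte & PRESENT))
                return false;
            table = pte & ADDR_MASK;
        }
        *paddr = table | (vaddr & 0xfff);  /* add the page offset */
        return true;
    }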

Does this make sense?

• We submit that it does not…
• Specifically, it seems that
– Since NICs work with rings, IOTLB accesses are completely predictable (this matters even more than for the CPU’s TLB, because IOMMU page tables are un-cached)
– Since NICs map each DMA descriptor just before using it, and unmap it just after, there is no need for a page-table hierarchy (sketched below)
– Performance can be greatly improved by redesigning the IOMMU to take advantage of the above
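The per-descriptor pattern referred to above looks roughly like this in a NIC driver’s transmit path (a simplified sketch: the ring and descriptor structures are hypothetical, the dma_* calls are the real Linux DMA API):

    #include <linux/types.h>
    #include <linux/dma-mapping.h>

    struct tx_desc { u64 addr; u32 len; u32 flags; };  /* hypothetical */
    struct tx_ring { struct tx_desc *desc; unsigned head, tail, size; };

    static int xmit(struct device *dev, struct tx_ring *ring,
                    void *pkt, size_t len)
    {
        unsigned i = ring->tail;

        /* Mapped just before use... */
        dma_addr_t iova = dma_map_single(dev, pkt, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, iova))
            return -ENOMEM;

        ring->desc[i].addr = iova;  /* NIC will DMA from this address */
        ring->desc[i].len  = len;
        ring->tail = (i + 1) % ring->size;
        /* The NIC consumes descriptors in ring order, so the IOMMU
         * sees these mappings in a fully predictable sequence. */
        return 0;
    }

    static void tx_complete(struct device *dev, struct tx_ring *ring)
    {
        unsigned i = ring->head;
        /* ...and unmapped just after the DMA completes; mappings are
         * so short-lived that a page-table hierarchy buys little. */
        dma_unmap_single(dev, ring->desc[i].addr,
                         ring->desc[i].len, DMA_TO_DEVICE);
        ring->head = (i + 1) % ring->size;
    }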

Research

• Status
– Working hard towards proving all the claims from the previous slide
– Environment: a KVM/QEMU setup (10 Gb/s NICs) that logs all IOMMU accesses

• Future
– Not just NICs (we have reason to believe other I/O devices behave similarly)
– Reducing overheads for virtualization (vIOMMU)
– What would be the impact of unifying the I/O and process address spaces? (the previous project)