May 2008 Screencast: OpenFabrics Concepts 1
Screencast: OpenFabrics Concepts
Jeff Squyres May 2008
“Verbs” API (VAPI)
• IB/iWARP actions are known as “verbs”
  - Send verb, receive verb, etc.
• First IB VAPI was Mellanox VAPI (mVAPI)
  - Now deprecated
• OpenFabrics has a different VAPI
  - Similar concepts, but different API
No Unexpected Receives
• All messages must be “expected”
• Receiver must pre-allocate resources
  - Pool of buffers to receive messages
  - Pool of buffers as target for RDMA
• Unexpected message triggers an error
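
The rule above can be sketched as a toy model in Python. This is not the verbs API: `ToyReceiver`, `post_receive`, `deliver`, and `UnexpectedMessageError` are hypothetical names chosen for illustration, and the "network" is just a method call.

```python
from collections import deque


class UnexpectedMessageError(Exception):
    """Raised when a message arrives and no receive buffer is posted."""


class ToyReceiver:
    """Toy receiver that only accepts messages into pre-posted buffers."""

    def __init__(self, num_buffers, buffer_size):
        # Pre-allocate the pool before any message can arrive.
        self.posted = deque(bytearray(buffer_size) for _ in range(num_buffers))
        self.completed = []

    def post_receive(self, buffer_size):
        # Post one more buffer so that another message can be accepted.
        self.posted.append(bytearray(buffer_size))

    def deliver(self, payload):
        # Called by the toy "network" when a message arrives.
        if not self.posted:
            raise UnexpectedMessageError("no receive buffer posted")
        buf = self.posted.popleft()
        if len(payload) > len(buf):
            raise ValueError("message larger than the posted buffer")
        buf[:len(payload)] = payload
        self.completed.append(bytes(buf[:len(payload)]))


receiver = ToyReceiver(num_buffers=2, buffer_size=64)
receiver.deliver(b"first")
receiver.deliver(b"second")
try:
    receiver.deliver(b"third")   # pool is empty: this message is unexpected
    third_accepted = True
except UnexpectedMessageError:
    third_accepted = False
```

In this sketch the third message fails only because the pool was sized for two; posting another buffer first would let it through.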
Virtual Lanes / Service Levels
• OpenFabrics traffic is divided into virtual “lanes”
  - Virtual separation of traffic
  - Analogous to MPI communicators (!)
  - Can be assigned QoS-like attributes
    - Weighting, etc.
• Service levels map to lanes
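
As a rough illustration of service levels mapping to lanes, and of lanes carrying weights, here is a small Python sketch. The table contents, the weights, and the simple weighted round-robin are all assumptions made for the example; arbitration on real hardware is more involved.

```python
# Hypothetical SL -> VL table: several service levels may share one lane.
SL_TO_VL = {0: 0, 1: 0, 2: 1, 3: 2}

# Hypothetical per-lane weights: packets a lane may send per arbitration round.
VL_WEIGHT = {0: 1, 1: 2, 2: 4}


def lane_for(service_level):
    """Return the virtual lane that carries the given service level."""
    return SL_TO_VL[service_level]


def arbitrate(queued, rounds):
    """Weighted round-robin: in each round a lane sends up to its weight."""
    remaining = dict(queued)
    sent = {vl: 0 for vl in queued}
    for _ in range(rounds):
        for vl in sorted(remaining):
            n = min(VL_WEIGHT[vl], remaining[vl])
            remaining[vl] -= n
            sent[vl] += n
    return sent


# Ten packets queued on each lane; after two rounds the heavier lanes are ahead.
sent = arbitrate({0: 10, 1: 10, 2: 10}, rounds=2)
```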
Some OpenFabrics Queues
• Queue Pair (QP)
  - Unit of connection in OpenFabrics
  - Think of as “sockets” for OpenFabrics
  - Send queue + receive queue
• Completion queue
  - Most OF verbs are non-blocking
  - OF driver puts events on this queue to signal when a verb has completed
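
The relationship between a queue pair and a completion queue can be sketched with plain Python queues. The class and method names below are hypothetical and are not the verbs API; `progress()` is a stand-in for the work that the driver and hardware do asynchronously.

```python
from collections import deque


class CompletionQueue:
    def __init__(self):
        self._events = deque()

    def push(self, event):
        self._events.append(event)

    def poll(self, max_events=1):
        """Return up to max_events completions without blocking."""
        out = []
        while self._events and len(out) < max_events:
            out.append(self._events.popleft())
        return out


class QueuePair:
    """A send queue plus a receive queue, reporting to one completion queue."""

    def __init__(self, cq):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.cq = cq

    def post_recv(self, work_id, size):
        # Non-blocking: queues a buffer for a future incoming message.
        self.recv_queue.append((work_id, bytearray(size)))

    def post_send(self, work_id, payload):
        # Non-blocking: only enqueues the request and returns.
        self.send_queue.append((work_id, payload))

    def progress(self):
        # Stand-in for the driver/hardware: finish one queued send.
        if self.send_queue:
            work_id, payload = self.send_queue.popleft()
            self.cq.push({"work_id": work_id, "status": "success",
                          "bytes": len(payload)})


cq = CompletionQueue()
qp = QueuePair(cq)
qp.post_send(work_id=1, payload=b"hello")
before = cq.poll()   # the send was posted, but nothing has completed yet
qp.progress()
after = cq.poll()    # now the completion event is available
```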
Registered Memory
• InfiniBand/iWARP are RDMA-based networks
  - Directly send / receive from RAM
  - Without involvement from main CPU
• But…
  - Operating system can change the virtual-to-physical RAM mapping at any time
Race Condition
1. MPI says “IB: send this buffer”
2. HCA obtains physical address
3. HCA starts sending
4. OS changes physical mapping
5. HCA now sending garbage!
[Figure: the HCA sends the message from RAM while the OS moves the buffer]
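
The five steps can be replayed in a toy memory model. This is only a simulation under simplifying assumptions: `ToyMemory` and its methods are hypothetical, physical RAM is a Python list of frames, and the "HCA" is a variable holding a stale frame number.

```python
class ToyMemory:
    """Physical RAM as a list of frames, plus a virtual -> physical page table."""

    def __init__(self, num_frames):
        self.frames = [b"garbage"] * num_frames
        self.page_table = {}

    def write(self, vpage, data):
        self.frames[self.page_table[vpage]] = data

    def read(self, vpage):
        return self.frames[self.page_table[vpage]]

    def os_remap(self, vpage, new_frame):
        # The OS moves the page to a new frame and reuses the old one.
        old_frame = self.page_table[vpage]
        self.frames[new_frame] = self.frames[old_frame]
        self.frames[old_frame] = b"garbage"
        self.page_table[vpage] = new_frame


mem = ToyMemory(num_frames=4)
mem.page_table[0] = 1                 # virtual page 0 lives in physical frame 1
mem.write(0, b"message")              # 1. MPI says "send this buffer"

hca_frame = mem.page_table[0]         # 2. HCA obtains the physical address
first_read = mem.frames[hca_frame]    # 3. HCA starts sending
mem.os_remap(0, new_frame=3)          # 4. OS changes the physical mapping
second_read = mem.frames[hca_frame]   # 5. HCA is now sending garbage

cpu_view = mem.read(0)                # the application still sees its message
```

The application never notices the move because it always goes through the page table; only the device, which cached the physical location, reads the wrong data.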
“Registering” Memory
• Solution: tell OS not to change mapping
  - “Pinning” (“locking”) memory
  - Guarantees that the message will stay in the same physical location until HCA is done
• “Registering” memory does two things:
  1. Pinning the virtual-to-physical mapping
  2. Notifying the HCA of the mapping
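
The two things that registration does can be modeled in a few lines of Python. `ToyHost`, `register`, `deregister`, and `PinnedError` are hypothetical names for this sketch; a real registration goes through the OpenFabrics driver rather than a dictionary.

```python
class PinnedError(Exception):
    """Raised when the toy OS is asked to move a pinned page."""


class ToyHost:
    def __init__(self):
        self.page_table = {}   # virtual page -> physical frame
        self.pinned = set()    # virtual pages the OS must not move
        self.hca_table = {}    # the HCA's copy: virtual page -> physical frame

    def register(self, vpage):
        self.pinned.add(vpage)                           # 1. pin the mapping
        self.hca_table[vpage] = self.page_table[vpage]   # 2. notify the HCA

    def deregister(self, vpage):
        self.hca_table.pop(vpage, None)
        self.pinned.discard(vpage)

    def os_remap(self, vpage, new_frame):
        if vpage in self.pinned:
            raise PinnedError(f"virtual page {vpage} is pinned")
        self.page_table[vpage] = new_frame


host = ToyHost()
host.page_table[0] = 1
host.register(0)
try:
    host.os_remap(0, new_frame=3)   # refused while the page is registered
    moved = True
except PinnedError:
    moved = False
```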
Registered Memory Problems
• Registering and unregistering is slow
• OS can only support so much registered memory at a time
  - Pinned pages are unswappable
• Must be careful to set ulimits properly (OFED)
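
One way to inspect the relevant limit from inside a process is Python's standard `resource` module, which exposes the locked-memory limit as `RLIMIT_MEMLOCK` on Linux and other Unix-like systems. The sketch below only reads the limit; what counts as a proper value depends on the installation, and `describe` is a helper written for this example.

```python
import resource


def describe(limit):
    """Render an rlimit value, which is a byte count or RLIM_INFINITY."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


# RLIMIT_MEMLOCK is the per-process cap on locked memory; it is the same
# limit that `ulimit -l` reports in the shell (there in units of 1024 bytes).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("locked-memory soft limit:", describe(soft))
print("locked-memory hard limit:", describe(hard))
```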
Registered Memory Footprint
• How much registered memory does Open MPI use?
  - A complicated answer
  - Requires some background information first…
• For reference:
  - Complete answer (for v1.2 and beyond): http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage
Common MPI Trick
• MPI_SEND(buffer, …)
  - Register the buffer
  - Do the send
  - Return (leaving the buffer registered)
• Rationale: next time you send from that buffer, do not pay registration cost again
  - Great for benchmarks!
  - Usually not great for real applications
• OMPI does not do this (…by default)
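
A minimal sketch of the trick, assuming a cache keyed by a hypothetical buffer identifier (a real implementation would key on address and length). It counts registrations to show why a loop over one buffer benefits, while a workload that keeps sending from new buffers gets no reuse and leaves more and more memory pinned.

```python
class ToyRegistrationCache:
    """Keeps buffers registered after a send so repeat sends skip the cost."""

    def __init__(self):
        self.registered = set()
        self.registrations = 0   # counts the expensive operation

    def _register(self, buffer_id):
        self.registrations += 1
        self.registered.add(buffer_id)

    def send(self, buffer_id):
        if buffer_id not in self.registered:
            self._register(buffer_id)   # slow path: first use of this buffer
        # ... the actual send would happen here ...
        # Return without unregistering: the buffer stays pinned.


cache = ToyRegistrationCache()
for _ in range(1000):
    cache.send("buf_a")           # benchmark-like: the same buffer every time
benchmark_cost = cache.registrations

for i in range(1000):
    cache.send(f"buf_{i}")        # application-like: a new buffer every time
application_cost = cache.registrations - benchmark_cost
```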
Problems of User Registration
• Can run out of registered memory
  - MPI must implement eviction policies
• Application can free buffer
  - MPI must intercept free() or sbrk() to unregister memory before it is given back to OS
  - Extremely problematic
• So just say “No!”
  - …except for benchmarks
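
Both problems can be seen in a small sketch of a capped registration cache. Least-recently-used eviction is an assumption picked for the example, since the slide does not say which policy an MPI uses, and `on_free` stands in for whatever hook intercepts `free()` or `sbrk()`. All names are hypothetical.

```python
from collections import OrderedDict


class ToyPinnedCache:
    """Registration cache with a size cap, LRU eviction, and a free() hook."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()   # buffer address -> length, oldest first
        self.evicted = []

    def use(self, addr, length):
        if addr in self.entries:
            self.entries.move_to_end(addr)   # mark as most recently used
            return
        if len(self.entries) >= self.max_entries:
            # Out of room: unregister the least recently used buffer.
            old_addr, _ = self.entries.popitem(last=False)
            self.evicted.append(old_addr)
        self.entries[addr] = length

    def on_free(self, addr):
        # Must run before the memory goes back to the OS; otherwise the
        # cache would keep a registration for memory it no longer owns.
        self.entries.pop(addr, None)


cache = ToyPinnedCache(max_entries=2)
cache.use(0x1000, 4096)
cache.use(0x2000, 4096)
cache.use(0x1000, 4096)   # touch 0x1000 so that 0x2000 becomes the LRU entry
cache.use(0x3000, 4096)   # cache is full: evicts 0x2000
cache.on_free(0x1000)     # the application freed this buffer
```

The fragile part is the hook: if any path returns memory to the OS without calling it, the cache is left with a stale entry.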
More Information
• Open MPI FAQ
  - General tuning: http://www.open-mpi.org/faq/?category=tuning
  - InfiniBand / OpenFabrics tuning: http://www.open-mpi.org/faq/?category=openfabrics