XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & Development Ltd.

Live Migration of vGPU

Aug 2016

Xiao Zheng [email protected]

Kevin Tian [email protected]

mailto:[email protected]

mailto:[email protected]

2

Agenda

• GPU Virtualization and vGPU Live Migration

• vGPU Resources

• Design and Solution

• Current Status

• Summary

3

GPU Virtualization Usage Cases

IT

2D/3D

Office

Productivity

Desktop

CADs

Media Could

Media

Process

3D/Media Acceleration

…

Network

Remote Framebuffer

Streaming

VDI

Desktop#N

4

XENGT Architecture – Mediated Pass-through

XEN

i915

GPU

Qemu

VGA

Host LinuxVM1

VM2

GFX Driver

MPT Services

vGT

vGPUvGPU

vPCIlayout

I/O Hooks

Driver Hooks

Address space

balloon

Address space balloon

• pass-through for

performance critical

resource

• Trap and emulate for

privileged resource

• Time-shared among VMs

5

vGPU Live Migration

Live Migration: Load balance, Maintenance, Fault recovery

Unfortunately most of vGPU solutions do not support migration except GVT-g

HW

GPUGPU

GPUGPU

Hypervisor

Guest vGPU

Typically a GPU

pass-through

solution

SRIOV HW

GPU PF VF … VF

Hypervisor

Guest vGPU

Typically a GPU

SRIOV solution

GVT-g with mediated

pass-through

XEN

i915

GPU

QemuV

G

A

Host LinuxVM1VM2

GFX Driver

MPT Services

vGT

vGP

UvGP

U

vPCIlayout

I/O Hook

s

Driver

Hooks

Address

space balloo

n

Address space balloon

GVT-g architecture (Mediation) make

it possible for seamless live migration

6

Live Migration of vGPU in GVT-g

Highlight feature:

• GVT-g is Open Source project, upstream ongoing

• vGPU Live Migration follows existing hypervisor migration flow

• 3D/2D/Media graphics workload seamless migrated between Servers or Local machine

• Support Linux/Windows Guest

• Live Migration Service downtime latency < 0.3 sec (Guest RAM 2GB, assigned 512MB vGPU memory, 10Gpbs

adapter)

VM1

Office

uage

VM2

CADs

VM3*

Media

Process… VDI

VM4

Media

Process

VM3

vGPU Server1 vGPU Server2

How?

7

Demo: vGPU Live Migration with 3D workload

https://www.youtube.com/watch?v=y2SkU5JODIY

8

Agenda


• vGPU Resources


• Current Status

• Summary

9

Inside of vGPU instance

Graphics Memory

Render

Engine

Re

gis

ters

GPU

Inte

rrup

t

pass-through for performance critical resource

Trap and emulate for privileged resource

Time-shared among VMs

Graphics Memory

Render Engine

Regis

ters

vGPU

Inte

rrupt

XenGT

Graphics Memory

Render Engine

Regis

ters

vGPU

Inte

rrupt

GTT page table

10

Challenge of Migrating vGPU Instance

• When and how to migrate Graphics Memory

• When and how to migrate Guest Graphics Page Table

• When and how to migrate Render Engine State

Pre-copyGPU dirty page logging

Stop and Copy• Remove vGPU from scheduler

• Save and migrate all vGPU state

Resume/Post-Copy

Restore vGPU state

Add vGPU into scheduler

Total migration time

Service downtime

11

Migration Policies for Different vGPU Resources

Context: Render Engine

GTT page table

Graphics Memory

Registers Copy and Restore

Recreate Shadowing

Track Dirty and Copy

Recreate Shadowing

12

Agenda


• vGPU Resources


• Current Status

• Summary

13

Guest GTT Page Table Migration

VM0

VM1*VM0

HW

GTT view

Target

HW GGTT view

GTT

0 ~ 512MB

Guest VM

GTT viewGTT

0 ~ 512MB

VM2

VM2VM1

GTT = GGTT or PPGTT

GMA = Graphics Mem Address

GMA rebasing

GM address = 0xA

GM address = 0xB

• Both GGTT and PPGTT are shadowed for Guest

• GGTT required rebasing due to GGTT partition

among VMs

• Migration process actually:

A. Copy entire Guest GTT page table

B. Re-create the shadow page table for Guest

on Target side

C. Rebasing GGTT for GPU commands

Migration

GMA rebasing

VM1

Graphics Memory Address rebasing:

All vGPU cmds from Guest need to be rebased on

new address in GVT-g before send to real GPU HW

GMA rebasing

14

Guest Graphics Memory Migration

• Pre-copy: Logging dirty graphics memory pages

• Stop-and-Copy: Migrate contents to target

• Resume/Post-copy: Recreate GTT page table

based on target mfn

Problem: Intel® GPU page table entities has no Dirty or

Accessed flags to track dirty pages

Solution:Copy all used graphics memory to target.

GTT

0 ~ 512MB

Guest VM

GTT view

Source host

Physical Mem

Target host

Physical Mem

Source mfn

Target mfn

VM1

Migrate page content

15

Render Engine State Migration

Server1

Render Engine

Render Context4

Render Context3

Render Context2

HW in idle

Context1 completed

Context0 completed

GGTT address

vGPU Guest submitted CTX

Context

in queue

Migration happens at this point

CTX N Render Context required to be migrated

o Intel® GPU HW is context based

o CTX locates in GGTT memory

o Render engine state is contained

within CTX

Server2

Render Engine

HW in idle

16

Agenda


• vGPU Resources


• Current Status

• Summary

17

Current Status

• Experimental support both KVMGT and XENGT

• Platforms: Intel® 5th /6th Generation Intel® Core™ Processors

• Benchmarks covered:

Windows guest: Heaven, 3Dmark06, Trophic, Media encoding/decoding, Linux guest: lightsmark, 2D

• Quality: 12hours overnight testing, migrating every 30sec

• Timing: (Guest RAM 2GB including 512MB Graphics memory, 10Gbps adapter)

Service downtime ~0.3sec

Total migration time: ~1.7sec

Pre-copyGPU dirty page logging

Stop and Copy• Remove vGPU scheduler

• Save and migrate all vGPU state

Resume/Post-Copy

Restore vGPU state

Add vGPU into scheduler

Total migration time ~1.7sec

Service downtime ~0.3sec

18

Summary

• Need 3D/2D/Media workload in virtualization?

GVT-g is the choice

• Need GPU virtualization with migration support?

GVT-g is the choice

19

Resource Links

• Project webpage and release: https://01.org/igvt-g

• Project public papers and document: https://01.org/group/2230/documentation-list

• Intel® IDF: GVT-g in Media Cloud: https://01.org/sites/default/files/documentation/sz15_sfts002_100_engf.pdf

• XenGT introduction in summit in 2015: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-

Xen%20Summit-REWRITE%203RD%20v4.pdf

• XenGT introduction in summit in 2014: http://events.linuxfoundation.org/sites/events/files/slides/XenGT-

LinuxCollaborationSummit-final_1.pdf

https://01.org/igvt-g

https://01.org/group/2230/documentation-list

https://01.org/sites/default/files/documentation/sz15_sfts002_100_engf.pdf

http://events.linuxfoundation.org/sites/events/files/slides/XenGT-Xen Summit-REWRITE 3RD v4.pdf

http://events.linuxfoundation.org/sites/events/files/slides/XenGT-LinuxCollaborationSummit-final_1.pdf

20

Notices and Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE,

EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS

GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR

SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR

IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR

WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT

OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT

INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product

to deviate from published specifications. Current characterized errata are available on request.

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a

computer with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an

Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more

information, visit http://www.intel.com/technology/security

Intel, Intel logo, Xeon, and Xeon Inside are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United

States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2016 Intel Corporation. All rights reserved.

http://www.intel.com/technology/security

XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & Development Ltd.

Technology

Transcript of XPDS16: Live Migration of vGPU - Xiao Zheng, Intel Asia-Pacific Research & Development Ltd.