Post on 04-Dec-2014
All Rights Reserved, Copyright (C) FUJITSU 2007 - 2009
Jun Kamada <kama@jp.fujitsu.com>
Akio Takebe <takebe_akio@jp.fujitsu.com>
FUJITSU LIMITED
Improvement of the PCI pass-through
Agenda
• Background
  • Why SCSI?
  • pvSCSI and PCI pass-through
• Part 1: Current status of pvSCSI enhancements
• Part 2: The booting guest with PCI pass-through
Background: Why SCSI? (1/2)
[Figure: backup and restore between storage and a tape drive; tape cartridges are loaded and unloaded, then moved to a safety box and preserved.]
Backup to tape is fundamental to reliability and availability.
• Free to move (to a safe place)
• Long-term preservation
A tape drive is usually controlled through SCSI commands.
SCSI support on guest VMs (issuing SCSI commands from a guest VM) is highly desirable in a virtualized environment.
Background: Why SCSI? (2/2)
• In a data center, reliability and availability features (e.g. hardware snapshot, tape backup) are provided through SCSI.
• These servers are consolidated onto a single server in a virtualized environment.
SCSI support on guest VMs is mandatory (issuing SCSI commands from a guest VM).
[Figure: a data center in which a DB server (DBMS, data file) and a backup server (snapshot, data file) are connected by LAN and, over a SAN, to RAID storage and a tape drive. SCSI commands shown: load, unload, reset, hardware snapshot.]
Background: pvSCSI and PCI pass-through
• We have developed the pvSCSI driver and will continue to enhance it.
  • Report on the current status of the enhancements. (Part 1)
• On the other hand, we also need to provide …
  • Reliability with hardware assist (e.g. PCIe AER, …)
  • Seamless moves between physical (P) and virtual (V) environments.
We are focusing on SAN/PXE boot using VT-d/IOMMU.
  • Report on enhancements to the guest BIOS that enable SAN/PXE boot. (Part 2)
Part 1: Current status of pvSCSI enhancements
Current implementation (Xen 3.3.0)
The pvSCSI driver for Xen 3.3.0 provides:
• LUN (Logical Unit Number) pass-through
• LUN hot-plug
[Figure: In Dom0, physical LUNs 0:0:0:1 and 0:1:2:3 on the physical SCSI host (host=0) are arbitrarily mapped to virtual LUNs 2:0:0:0 and 2:0:0:1 on the virtual SCSI host (host=2); the guest domain sees the same virtual SCSI tree. Hot-plug of a further physical LUN (1:0:1:3) is labeled (1) Add, (2) Attach, (3) Add, (4) Appear immediately.]
Issue of current implementation
The current implementation provides a completely virtualized (arbitrarily mapped) SCSI tree to the guest domain. It can provide flexibility, but …
• Some SCSI commands (REPORT_LUN, EXTENDED_COPY, …) must be emulated in the backend, because they depend on the physical topology of the SCSI tree.
• Implementing emulation logic for all such commands takes a lot of work, so the current implementation supports only the mandatory commands.
It does not support full SCSI functionality. :-(
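To illustrate why a topology-dependent command has to be emulated, here is a minimal sketch in Python, not the pvSCSI backend code. It assumes a hypothetical mapping table `VIRT_TO_PHYS` and a hypothetical function `emulate_report_luns`. The backend has to answer REPORT LUNS from the virtual tree rather than forward it to a physical device. The response layout (4-byte big-endian list length, 4 reserved bytes, 8 bytes per LUN) follows the SCSI REPORT LUNS parameter data format as commonly described, and only single-byte LUNs are handled.

```python
import struct

# Hypothetical mapping: virtual (host, channel, target, lun) -> physical address.
VIRT_TO_PHYS = {
    (2, 0, 0, 0): (0, 0, 0, 1),
    (2, 0, 0, 1): (0, 1, 2, 3),
}

def emulate_report_luns(virt_host, channel, target):
    """Build REPORT LUNS parameter data for one virtual target."""
    luns = sorted(
        v[3] for v in VIRT_TO_PHYS
        if v[:3] == (virt_host, channel, target)
    )
    body = b""
    for lun in luns:
        if lun > 255:
            raise ValueError("sketch handles single-byte LUNs only")
        # Peripheral device addressing: byte 0 = 0, byte 1 = LUN, rest zero.
        body += bytes([0, lun]) + bytes(6)
    # Header: 4-byte big-endian LUN list length, then 4 reserved bytes.
    return struct.pack(">I4x", len(body)) + body
```

The guest therefore sees the virtual LUN numbers, whatever the physical addresses behind them are. Every command of this kind needs comparable backend logic, which is the cost the slide refers to.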
How to solve the issue
1. Implement all emulation logic step by step.
  • Hard work.
  • Possibly cannot support some vendor-specific commands.
2. Add a new mode that attaches a whole HBA to the guest domain. (This allows bypassing the SCSI command emulation in the backend driver.)
  • Easy to implement. (Details are shown on the following slides.)
  • Can support all vendor-specific commands.
We took the second approach.
Posted implementation (1/2)
The additional implementation provides:
• Host (HBA: Host Bus Adapter) pass-through
[Figure: In Dom0, physical LUNs 0:0:0:1 and 0:1:2:3 on the physical SCSI host (host=0) appear as virtual LUNs 2:0:0:1 and 2:1:2:3 on the virtual SCSI host (host=2). The underlined part of the ID (everything after the host number) stays the same. The guest domain sees the same virtual SCSI tree. Steps: (1) Create, (2) Attach.]
Posted implementation (2/2)
The following modifications were actually needed:
• Backend driver
  • LUN/Host mode identification flag for each virtual SCSI tree
  • Emulation bypass logic (used if the flag indicates "Host mode")
• Frontend driver
  • No modification needed
• xend
  • User interface (to specify "Host mode")
  • LUN scan logic (shorter processing time by using the "lsscsi" command, if it exists; requested by the community)
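The per-tree mode flag and the bypass logic can be pictured with the following sketch. It is a simplified Python model under stated assumptions, not the backend driver itself: `Mode`, `dispatch`, and the `EMULATED` and `UNSUPPORTED` sets are hypothetical names, and which commands fall into each set is chosen only for illustration.

```python
from enum import Enum

class Mode(Enum):
    LUN = "lun"    # arbitrarily mapped virtual tree; some commands emulated
    HOST = "host"  # whole HBA attached; commands forwarded unchanged

# SCSI opcodes that depend on the physical topology.
REPORT_LUNS = 0xA0
EXTENDED_COPY = 0x83
EMULATED = {REPORT_LUNS}        # hypothetical: emulated in LUN mode
UNSUPPORTED = {EXTENDED_COPY}   # hypothetical: no emulation written yet

def dispatch(mode, cdb):
    """Decide what the backend does with one command descriptor block."""
    opcode = cdb[0]
    if mode is Mode.HOST:
        return "forward"        # bypass: the physical HBA answers directly
    if opcode in EMULATED:
        return "emulate"
    if opcode in UNSUPPORTED:
        return "reject"
    return "forward"
```

In host mode the check happens once, before any opcode inspection, which matches the slide's point that no per-command emulation is required.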
Conclusion (Part 1)
• We posted a series of patches last week, and they have already been merged into the unstable tree. (Thanks!)
• Please try and evaluate them. Comments are appreciated.
Thanks
Part 2: The booting guest with PCI pass-through
Introduction: What is the problem of booting from PCI pass-through?
[Figure: Before: the guest boots from an emulated disk (boot disk) provided by qemu in dom0, and the pass-through disk is only a data disk. After: the pass-through disk is both boot disk and data disk.]
Contents of Part 2
• What is required for SAN/SAS boot
• Details of the requirements
• Status of the requirements
• Sample
• Other challenge (PXE boot)
• Some concerns
What is required for SAN/SAS boot?
• int 0x13 handler of the pass-through device (supporting the BCV-style calling convention)
• BIOS function
• PMM (POST Memory Manager) service
• PnP runtime function
• IPL/BCV table
BCV: Boot Connection Vector. It is typically used by SCSI controllers.
Details of the requirements
Calling PCI expansion ROM (1/3)
What is BCV style?
A BCV is a pointer to code inside the expansion ROM. Using this code, a PCI card that supports the BCV style of the BIOS Boot Specification can hook INT 0x13 during device initialization. The BIOS can then access the hard disk connected to the PCI card through that INT 0x13 handler.
The BIOS needs to read the MBR of a boot disk.
[Figure: the BIOS issues int 0x13, which is served by the IDE disk handler, CD-ROM handler, FDD handler, or PnP device handler.]
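The hooking idea can be modeled in a few lines. This is a toy Python model rather than real-mode BIOS code: the functions `bios_int13` and `bcv_hook`, the drive numbers, and the returned strings are all hypothetical and only show how a handler installed by BCV code sits in front of the existing one and chains to it.

```python
def bios_int13(drive, lba):
    """Stand-in for the BIOS's original INT 0x13 disk service."""
    if drive == 0x80:
        return f"ide:sector{lba}"
    raise LookupError(f"no such drive 0x{drive:02x}")

def bcv_hook(previous, claimed_drive, name):
    """Model of BCV code: install a new handler that chains to the old one."""
    def handler(drive, lba):
        if drive == claimed_drive:
            return f"{name}:sector{lba}"   # served by the card's own code
        return previous(drive, lba)        # everything else falls through
    return handler

# The interrupt vector initially points at the BIOS handler.
int13 = bios_int13
# Running the card's BCV replaces the vector with the card's handler.
int13 = bcv_hook(int13, 0x81, "scsi")
```

After the hook, a read from the MBR of the claimed drive (sector 0) goes to the card's code, while requests for other drives still reach the original handler.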
Calling PCI expansion ROM (2/3): How to initialize the expansion ROM
1. hvmloader maps the expansion ROM at 0xc0000.
2. hvmloader and rombios check some data in the ROM header.
3. rombios jumps to the entry point for the INIT function after loading the ax register with the bus:dev:function number.
[Figure: the ROM image placed in the range 0xc0000 to 0xea000, consisting of the ROM header, the PCI data structure, and the PnP Expansion Header.]
ROM header layout:
0h   signature (0xaa55)
2h   image size
3h   entry point for INIT function (jmp <address>)
6h   reserved
18h  pointer to PCI data structure
1Ah  pointer to PnP Expansion Header
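The checks in step 2 can be sketched as a small parser over the header fields listed on this slide. This is an illustrative Python sketch, not the hvmloader or rombios code: `parse_rom_header` and `build_demo_rom` are hypothetical names, and the 512-byte unit for the image size is an assumption taken from the usual option ROM convention.

```python
import struct

def parse_rom_header(rom):
    """Decode the expansion ROM header fields listed on the slide."""
    (signature,) = struct.unpack_from("<H", rom, 0x00)
    if signature != 0xAA55:
        raise ValueError("missing 0xaa55 signature")
    size_units = rom[0x02]
    pcir_ptr, pnp_ptr = struct.unpack_from("<HH", rom, 0x18)
    return {
        "image_size": size_units * 512,   # assumes 512-byte units
        "init_entry": 0x03,               # offset the BIOS jumps to
        "pci_data_offset": pcir_ptr,
        "pnp_header_offset": pnp_ptr,     # 0 means no PnP Expansion Header
    }

def build_demo_rom():
    """Create a fake one-block ROM image so the parser can be exercised."""
    rom = bytearray(512)
    rom[0:2] = b"\x55\xaa"                # 0xaa55, stored little-endian
    rom[2] = 1                            # one 512-byte block
    rom[3:6] = b"\xe9\x00\x00"            # placeholder jmp at the INIT entry
    struct.pack_into("<HH", rom, 0x18, 0x0040, 0x0060)
    return bytes(rom)
```

Calling `parse_rom_header(build_demo_rom())` returns the decoded offsets for the fake image. A real loader would additionally validate the PCI data structure that the 18h pointer refers to.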
Calling PCI expansion ROM (3/3): How to initialize the expansion ROM
[Figure: the ROM image in the range 0xc0000 to 0xea000 with its ROM header (0h signature 0xaa55, 2h image size, 3h entry point for INIT function, 6h reserved, 18h pointer to PCI data structure, 1Ah pointer to PnP Expansion Header). The pointer at 1Ah leads to the first PnP Expansion Header, which links to the next PnP Expansion Header.]
PnP Expansion Header layout:
0h   signature ($PnP)
…
06h  offset of next header (0000h if none)
09h  checksum
…
16h  BCV
…
4. Jump to the BCV, which points to the code that hooks INT 0x13.
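Finding the BCV means following the chain of PnP Expansion Headers. The sketch below is a hedged Python illustration, not the rombios implementation: `find_bcvs` and `make_header` are hypothetical names. The offsets 06h, 09h, and 16h come from the slide. The length byte at 05h (counted in 16-byte units) and the rule that the header bytes sum to zero are assumptions based on the Plug and Play BIOS specification as commonly described.

```python
import struct

def find_bcvs(rom, first_header_offset):
    """Walk the chain of PnP Expansion Headers and collect non-zero BCVs."""
    bcvs = []
    seen = set()
    offset = first_header_offset
    while offset != 0 and offset not in seen:
        seen.add(offset)                        # guard against looping chains
        if rom[offset:offset + 4] != b"$PnP" or offset + 6 > len(rom):
            break
        length = rom[offset + 0x05] * 16        # assumed: 16-byte units
        header = rom[offset:offset + length]
        if length < 0x18 or len(header) != length or sum(header) % 256 != 0:
            break                               # truncated or bad checksum
        (next_offset,) = struct.unpack_from("<H", rom, offset + 0x06)
        (bcv,) = struct.unpack_from("<H", rom, offset + 0x16)
        if bcv:
            bcvs.append(bcv)
        offset = next_offset                    # 0000h ends the chain
    return bcvs

def make_header(next_offset, bcv):
    """Build a fake 32-byte PnP Expansion Header for experiments."""
    hdr = bytearray(0x20)
    hdr[0:4] = b"$PnP"
    hdr[0x04] = 0x01
    hdr[0x05] = 0x02                            # 2 * 16 = 32 bytes
    struct.pack_into("<H", hdr, 0x06, next_offset)
    struct.pack_into("<H", hdr, 0x16, bcv)
    hdr[0x09] = (-sum(hdr)) % 256               # make the byte sum zero
    return bytes(hdr)
```

Each value returned is an offset into the ROM image. The BIOS would make a far call to that offset within the ROM's segment so the card can install its INT 0x13 handler.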
PMM service?
• The PMM provides memory allocation only during POST.
• PCI expansion ROMs use the PMM service. For example, an expansion ROM needs a memory block to decompress its code into, and a data area that is used only during initialization.
PnP runtime function?
• PnP runtime functions are used by operating systems and application programs. They give access to BIOS features (get the version, the number of devices, …).
• A PCI expansion ROM may check only the PnP Installation Check structure to determine whether the system has a Plug and Play BIOS.
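The installation check an expansion ROM performs can be sketched as a scan of the BIOS segment. This Python sketch rests on stated assumptions: the function name `find_pnp_installation_check` is hypothetical, and the details it relies on (a "$PnP" signature on a 16-byte boundary in the 0xF0000 to 0xFFFFF range, a length byte at offset 05h of at least 21h, and a zero byte sum) are taken from the Plug and Play BIOS specification as commonly described, not from the slides.

```python
def find_pnp_installation_check(bios_segment):
    """Return the offset of a valid "$PnP" structure, or None.

    bios_segment is a bytes image of the 64 KiB at 0xF0000-0xFFFFF.
    """
    min_len = 0x21
    for offset in range(0, len(bios_segment) - min_len + 1, 16):
        if bios_segment[offset:offset + 4] != b"$PnP":
            continue
        length = bios_segment[offset + 0x05]
        block = bios_segment[offset:offset + length]
        # Accept only a complete structure whose bytes sum to zero.
        if length >= min_len and len(block) == length and sum(block) % 256 == 0:
            return offset
    return None
```

This is why a dummy structure can be enough for such ROMs: the ROM only needs the scan to succeed, even if it never calls the runtime entry points the structure advertises.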
IPL/BCV table?
• IPL table / IPL priority
  • The IPL table and IPL priority decide the order in which devices are selected for booting.
  • In Xen, this is configured with a setting such as boot="cda" in the guest configuration file.
• BCV table / BCV priority
  • The BCV table and BCV priority decide the order in which devices install their INT 0x13 handlers.
  • This order affects the boot order.
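The interaction between the two tables can be shown with a simplified model. This Python sketch is an illustration under assumptions, not BIOS code: `boot_sequence` is a hypothetical function, the letter meanings (a = floppy, c = hard disk, d = CD-ROM, n = network) are the usual Xen HVM boot letters, and the model assumes that the hard disk IPL entry boots the first disk (0x80), which belongs to whichever controller comes first in BCV priority.

```python
# Xen HVM boot letters: a = floppy, c = hard disk, d = CD-ROM, n = network.
IPL_DEVICES = {"a": "floppy", "c": "hard disk", "d": "CD-ROM", "n": "network"}

def boot_sequence(boot, bcv_order):
    """Expand a boot string into the devices the BIOS would try, in order.

    bcv_order lists disk controllers in BCV priority order; the first one
    installs its INT 0x13 handler first and so owns drive 0x80.
    """
    sequence = []
    for letter in boot:
        device = IPL_DEVICES[letter]
        if device == "hard disk" and bcv_order:
            sequence.append(f"hard disk 0x80 via {bcv_order[0]}")
        else:
            sequence.append(device)
    return sequence
```

With the same boot string, swapping the two controllers in `bcv_order` changes which physical disk the "hard disk" entry refers to, which is how BCV priority affects the boot order.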
Example of Boot Order
[Figure: an IPL table (HDD disk, Network, CD-ROM, Floppy) with IPL priorities and a BCV table (IDE disk, additional PCI card, e.g. SCSI card) with BCV priorities, combined into the resulting boot order.]
Status of the requirements
Status (1/3)
Calling convention
• We support the BCV-style calling convention of the BIOS Boot Specification.
BCV covers not only PCI devices but also ISA devices.
However, the IOMMU does not support ISA devices.
So we support the BCV-style calling convention only for PCI devices.
Status (2/3)
BIOS function
• PMM is needed by some PCI cards.
• PnP runtime functions are not actually called (in my experience). But we need to provide a dummy PnP runtime function, because some expansion ROMs may check only whether PnP runtime functions are supported.
PMM support has already been implemented by Kouya Shimura.
The dummy PnP runtime function is easy to support.
In the Bochs community, Sebastian Herbszt has already posted a patch.
Status (3/3)
BCV table/BCV priority
How to support BCV priority for pass-through devices on Xen?
A) If there are no emulated disks, boot from a pass-through device.
B) If a pass-through device is specified as bootable, only that device's expansion ROM is loaded.
   For example, pci = [ "bb:dd.ff,boot=1" ]
C) Enhance the IPL table. If a pass-through device is specified in the boot order, the pass-through device with the boot=1 option is selected as the boot device.
   For example, boot="p".
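Options B and C can be combined into one selection rule, sketched below in Python. The `boot=1` device option and the `p` boot letter are the proposals from this slide and are not assumed to exist in any released xend. The function names `parse_pci_entry` and `select_boot_device` are hypothetical.

```python
def parse_pci_entry(entry):
    """Split an entry such as '02:00.1,boot=1' into a BDF tuple and options."""
    bdf, *opts = entry.split(",")
    parts = bdf.split(":")               # tolerates an optional domain prefix
    bus, rest = parts[-2], parts[-1]
    dev, func = rest.split(".")
    options = dict(opt.split("=", 1) for opt in opts)
    return (int(bus, 16), int(dev, 16), int(func, 16)), options

def select_boot_device(pci_entries, boot):
    """Return the BDF chosen by the proposed 'p' boot letter, else None."""
    if "p" not in boot:
        return None
    for entry in pci_entries:
        bdf, options = parse_pci_entry(entry)
        if options.get("boot") == "1":
            return bdf                   # only this device's ROM is loaded
    return None
```

Under this rule a guest configured with several pass-through devices still loads a single expansion ROM, the one belonging to the entry marked bootable.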
Sample
Other challenge (PXE boot with PCI pass-through)
Issue of PXE boot
• Expansion ROM of Ethernet devices
  • Most Ethernet PnP devices do not carry an expansion ROM image themselves.
  • So we tried to use gPXE to boot from a pass-through device.
Try PXE boot with gPXE
Configuration & Hack
• Comment out the device number check in hvmloader.
• Do not specify an emulated NIC.
• Specify only the pass-through NIC.
• Recompile gPXE with the driver for the device and regenerate eb-roms.h.
Result…
• gPXE may not support the NIC.
• gPXE may check the device ID, vendor ID, and so on internally.
• More debugging is needed…
Some concerns
• Lack of I/O port space
  • Boot devices use I/O ports, but the I/O port space is only 64K.
• MMIO problem
  • See docs/misc/vtd.txt (Assigning devices to HVM domains)
• Dependency of multifunction devices
  • Some multifunction devices do not work when only a single function is passed to the guest.
• pci.hide option
  • If we use many pass-through devices, the pci.hide option will be very long…
Q&A
Any questions?
This work was partly funded by the Ministry of Economy, Trade and Industry (METI) of Japan as the Secure Platform project of the Association of Super-Advanced Electronics Technologies (ASET).
Thank you