Virtual Machine Scheduling for Parallel Soft Real-Time Applications
Like Zhou, Song Wu, Huahua Sun, Hai Jin, Xuanhua Shi
Services Computing Technology and System Lab
Cluster and Grid Computing Lab
School of Computer Science and Technology
Huazhong University of Science and Technology
Outline
• Introduction
• Motivation
• Design
• Implementation
• Evaluation
• Discussion and Future Work
• Conclusion
Introduction
• Many soft real-time applications use parallel programming models to make better use of hardware resources and possibly shorten response time
• A growing number of cloud services, including such parallel soft real-time applications (PSRT applications), run in virtualized environments
• Examples: cloud-based live transcoding, computer vision, distributed real-time stream computing
Introduction
• When running in a virtualized environment, PSRT applications behave poorly and achieve inadequate performance
Motivation
[Figure: three timelines (time 0 to 10) of blocks numbered 1, 2, and 3 placed on CPU0 to CPU3, one timeline each for soft real-time scheduling, co-scheduling, and parallel soft real-time scheduling, with markers labeled 1st, 2nd, and 3rd]
How to design and implement the parallel soft real-time scheduling algorithm which addresses soft real-time constraints and synchronization problems simultaneously?
Overall Design
Address Soft Real-Time Constraints
• How to handle soft real-time constraints of event-driven soft real-time applications?
[Figure: three snapshots of the PCPU0 run queue. Priority legend: real-time, boost, under, over. Initial queue: VCPU0, RT-VCPU0, VCPU1, VCPU2, VCPU3. After RT-VCPU0 receives external events: RT-VCPU0, VCPU0, VCPU1, VCPU2, VCPU3. After RT-VCPU0 is descheduled: VCPU0, VCPU1, RT-VCPU0, VCPU2, VCPU3]
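The temporary promotion of an RT-VCPU on an external event can be sketched as a priority-ordered run queue. This is a minimal illustration in Python, not code from Poris: the `Prio` values, the `RunQueue` class, and its method names are assumptions made for the example. Only the four priority classes (real-time, boost, under, over) and FCFS ordering within a class come from the slides.

```python
from enum import IntEnum
from itertools import count


class Prio(IntEnum):
    # Smaller value = scheduled earlier. The numeric values are illustrative.
    REAL_TIME = 0
    BOOST = 1
    UNDER = 2
    OVER = 3


class RunQueue:
    """Hypothetical per-PCPU run queue: priority order, FCFS within a priority."""

    def __init__(self):
        self._seq = count()   # arrival counter used for FCFS tie-breaking
        self._items = {}      # VCPU name -> (priority, arrival sequence)

    def insert(self, name, prio):
        # (Re)insert a VCPU at the tail of its priority class.
        self._items[name] = (prio, next(self._seq))

    def on_external_event(self, name, is_rt_vm):
        # Promote the VCPU temporarily, but only if it belongs to an RT-VM.
        if is_rt_vm:
            self.insert(name, Prio.REAL_TIME)

    def on_descheduled(self, name, normal_prio):
        # The promotion ends once the VCPU has run; it rejoins its normal class.
        self.insert(name, normal_prio)

    def order(self):
        # Names sorted by (priority, arrival sequence).
        return sorted(self._items, key=self._items.get)


rq = RunQueue()
for vcpu in ["VCPU0", "RT-VCPU0", "VCPU1"]:
    rq.insert(vcpu, Prio.UNDER)
rq.on_external_event("RT-VCPU0", is_rt_vm=True)
after_event = rq.order()
rq.on_descheduled("RT-VCPU0", Prio.UNDER)
after_run = rq.order()
```

In this sketch the promoted VCPU jumps to the head of the queue and falls back to the tail of its normal class after it has run, which is why the promotion only affects other VCPUs briefly.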
Address Soft Real-Time Constraints
• How to handle soft real-time constraints of time-driven soft real-time applications?
[Figure: run queue holding VCPU0, VCPU1, VCPU2, and RT-VCPU0, shown for two cases labeled "has RT-VMs" and "no RT-VMs"]
Address Soft Real-Time Constraints
• How to calculate time slice?
S: the length of the time slice used by the scheduler
LTS: long time slice
NR: the total number of RT-VCPUs in the system
NV: the number of VCPUs per PCPU
L: the expected latency of soft real-time applications
WCSL: worst-case scheduling latency
Calculate WCSL:
L and WCSL must meet:
Calculate S:
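The formulas on this slide are not preserved in the transcript. The sketch below shows one plausible way the symbols could relate. It is an assumption, not the authors' derivation: it supposes a round-robin model in which an RT-VCPU waits at most for every other VCPU on its PCPU to consume one full slice, so WCSL = (NV - 1) × S, and it picks the largest S that keeps WCSL ≤ L, capped at LTS. The function names and the 30 ms default for LTS (the Credit scheduler's default time slice) are also assumptions.

```python
def worst_case_latency_ms(slice_ms, vcpus_per_pcpu):
    # ASSUMED model: every other VCPU on the PCPU runs one full slice first.
    return (vcpus_per_pcpu - 1) * slice_ms


def time_slice_ms(expected_latency_ms, vcpus_per_pcpu, rt_vcpus, long_slice_ms=30.0):
    # Without RT-VCPUs (NR == 0) there is no latency constraint: use LTS.
    if rt_vcpus == 0 or vcpus_per_pcpu <= 1:
        return long_slice_ms
    # Largest S with (NV - 1) * S <= L, never longer than LTS.
    return min(long_slice_ms, expected_latency_ms / (vcpus_per_pcpu - 1))


# Hypothetical inputs: L = 15 ms, NV = 4 VCPUs per PCPU, one RT-VCPU present.
s_with_rt = time_slice_ms(15.0, 4, rt_vcpus=1)
s_without_rt = time_slice_ms(15.0, 4, rt_vcpus=0)
```

Under these assumed inputs the slice shrinks only while RT-VCPUs exist, which matches the idea of a dynamic time slice that limits the cost to non-real-time workloads.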
Address Soft Real-Time Constraints
• How to determine the approximate value of the expected latency?
– We use the VoIP test of MyConnection Server (MCS) to conduct an experiment
– A time slice of 5ms is good enough to guarantee the quality of VoIP while minimizing the impact on other applications
– The value of L is calculated as 15ms

Time slice   Upstream jitter   Upstream packet loss   Packet discards   MOS
30ms         7.8ms             9.5%                   3.0%              1.0
15ms         6.7ms             7.2%                   0.9%              1.0
5ms          5.0ms             0.1%                   0%                4.0
3ms          4.8ms             0%                     0%                4.0
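The choice of 5ms can be read off the table as "the longest time slice that still gives acceptable VoIP quality". The following sketch encodes that reading. The MOS threshold of 4.0 and all names are assumptions for illustration; the measurements are copied from the table.

```python
from typing import NamedTuple


class VoipResult(NamedTuple):
    slice_ms: int
    upstream_jitter_ms: float
    upstream_loss_pct: float
    discards_pct: float
    mos: float


# Values from the MCS VoIP experiment table.
RESULTS = [
    VoipResult(30, 7.8, 9.5, 3.0, 1.0),
    VoipResult(15, 6.7, 7.2, 0.9, 1.0),
    VoipResult(5, 5.0, 0.1, 0.0, 4.0),
    VoipResult(3, 4.8, 0.0, 0.0, 4.0),
]


def largest_acceptable_slice(results, min_mos=4.0):
    # ASSUMPTION: "good enough" means MOS >= min_mos. A longer slice means
    # fewer context switches, so prefer the longest slice that qualifies.
    ok = [r.slice_ms for r in results if r.mos >= min_mos]
    return max(ok) if ok else None


chosen_slice = largest_acceptable_slice(RESULTS)
```

Both 5ms and 3ms reach a MOS of 4.0 here, and the sketch keeps the longer of the two.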
Solve Synchronization Problems
• How to handle synchronization problems?
[Figure: run queues of PCPU0 to PCPU3 holding RT-VCPU0 to RT-VCPU3 alongside ordinary VCPUs. Priority legend: real-time, boost, under, over. Annotation: soft interrupt]
Solve Synchronization Problems
• How to address the VCPU migration problem?
[Figure: run queues of PCPU0 (VCPU0, RT-VCPU0, VCPU1, VCPU2, VCPU3) and PCPU1 (RT-VCPU1, VCPU0, VCPU1, VCPU2, VCPU3). Priority legend: real-time, boost, under, over. Annotations: steal, RT-VM, VCPU migration problem, affinity exchange. CPU affinity of RT-VCPU0: 0 1; CPU affinity of RT-VCPU1: 1 0]
Parallel Soft Real-Time Scheduling
Calculate time slice
Schedule all runnable VCPUs of an RT-VM
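The co-scheduling step can be sketched as follows: when the VCPU picked on one PCPU belongs to an RT-VM, the scheduler works out which other PCPUs hold runnable sibling VCPUs so that they can be notified (for example by a soft interrupt) to run those siblings at the same time. This is a simplified model and not the Poris implementation; the `Vcpu` class and `coschedule_targets` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    vm: str
    pcpu: int              # PCPU whose run queue currently holds this VCPU
    runnable: bool = True


def coschedule_targets(chosen, all_vcpus, rt_vms):
    """Return {pcpu: vcpu name} for siblings to start alongside `chosen`."""
    if chosen.vm not in rt_vms:
        return {}          # ordinary VMs are scheduled independently
    targets = {}
    for v in all_vcpus:
        if v is chosen or v.vm != chosen.vm or not v.runnable:
            continue
        if v.pcpu != chosen.pcpu:
            # One notification per remote PCPU is enough in this model.
            targets.setdefault(v.pcpu, v.name)
    return targets


vcpus = [
    Vcpu("RT-VCPU0", "rt-vm", 0),
    Vcpu("RT-VCPU1", "rt-vm", 1),
    Vcpu("RT-VCPU2", "rt-vm", 2, runnable=False),  # blocked: not co-scheduled
    Vcpu("VCPU0", "other-vm", 1),
]
targets = coschedule_targets(vcpus[0], vcpus, rt_vms={"rt-vm"})
```

Only runnable siblings are included, which reflects the wording "all runnable VCPUs": a blocked VCPU would waste its PCPU if it were scheduled.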
Implementation
• Poris: parallel soft real-time scheduler
– User interface:
  • add a field named type to csched_dom
  • add a field named latency to csched_dom
  • add a new command xm sched-rt
– Modification to the Credit scheduler (sched_poris):
  • add a new priority (CSCHED_PRI_TS_RT) as the real-time priority
  • modify event processing
  • modify the VCPU and PCPU operating functions
  • manage CPU affinity and modify csched_schedule() to co-schedule all runnable VCPUs of an RT-VM
Experiment Setup
• Hardware and VM configuration
Machine I
– Hardware configuration: a dual-core 2.6GHz Intel CPU, 2GB memory, 500GB SATA disk and 100Mbps Ethernet card
– VM configuration: 2 VCPUs, 256MB memory and 10GB virtual disk
Machine II
– Hardware configuration: two quad-core 2.4GHz Intel Xeon CPUs, 24GB memory, 1TB SCSI disk and 1Gbps Ethernet card
– VM configuration: 8 VCPUs, 1GB memory and 10GB virtual disk
• Software
– Hypervisor: Xen-4.0.1
– OS: CentOS 5.5 distribution with the Linux-2.6.31.8 kernel
• Interfering configuration
– CPU-intensive interfering configuration: all interfering VMs run CPU-intensive workloads
– Mixed interfering configuration: some interfering VMs run CPU-intensive workloads, and some run I/O-intensive workloads
Experiments
• Does Poris guarantee the QoS of VoIP applications?
– Experiments with MyConnection Server
• Is Poris suitable for client-side virtualization?
– Experiments with Media Player
• Does Poris surpass other schedulers?
– Experiments with PARSEC Benchmark
• What is the impact of Poris on non-real-time workloads?
– Experiments with non-real-time workloads (kernel compilation, Postmark, Stream benchmark)
MyConnection Server
• Upstream jitter
[Figure: upstream jitter under (a) the CPU-intensive interfering configuration and (b) the mixed interfering configuration]
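MyConnection Server
• VoIP test results

Interfering configuration   Scheduler   Upstream jitter   Downstream jitter   Packet discards   MOS
CPU-intensive               Credit      11.9ms            9.6ms               4.4%              1.0
CPU-intensive               Poris       4.6ms             1.2ms               0.0%              4.0
Mixed                       Credit      10.4ms            6.5ms               2.0%              1.0
Mixed                       Poris       4.5ms             0.6ms               0.0%              4.0

Poris vs. Credit, CPU-intensive: upstream jitter 61.34% ↓, downstream jitter 87.5% ↓
Poris vs. Credit, mixed: upstream jitter 56.73% ↓, downstream jitter 90.77% ↓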
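The four reduction percentages follow directly from the jitter values in the table, computed as (Credit - Poris) / Credit. A short check, with `reduction_pct` as a helper name chosen for this example:

```python
def reduction_pct(baseline, value):
    # Relative reduction of `value` against `baseline`, in percent.
    return round((baseline - value) / baseline * 100, 2)


# (Credit, Poris) jitter pairs in ms, taken from the VoIP results table.
cpu_up = reduction_pct(11.9, 4.6)     # CPU-intensive, upstream
cpu_down = reduction_pct(9.6, 1.2)    # CPU-intensive, downstream
mixed_up = reduction_pct(10.4, 4.5)   # mixed, upstream
mixed_down = reduction_pct(6.5, 0.6)  # mixed, downstream
```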
Media Player
• Play low resolution video
[Figure: results under (a) the CPU-intensive interfering configuration and (b) the mixed interfering configuration. Annotated improvements: 71.19% ↑ and 40.68% ↑]
Media Player
• Play high resolution video
[Figure: results under (a) the CPU-intensive interfering configuration and (b) the mixed interfering configuration. Annotated improvements: 135.94% ↑ and 95.31% ↑]
PARSEC Benchmark
[Figure: normalized execution time (%) of the PARSEC benchmarks blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, and x264 under the Credit, RS, PS, and Poris schedulers]
The performance of Poris is up to 44.12% better than Credit, 41.28% better than RS, and 28.02% better than PS.
Non-real-time workloads
[Figure: results for kernel compilation, Postmark, and the Stream benchmark]
Because Poris promotes the priorities of RT-VCPUs only temporarily and uses dynamic time slices, its interference with non-real-time workloads is slight and acceptable.
Poris even improves the performance of some types of non-real-time workloads, such as I/O-intensive workloads.
Discussion and Future Work
• Determining VM type and expected latency
– provide APIs to programmers
– analyze runtime characteristics of applications
• Many applications running in a VM
– use previous techniques to identify real-time applications
• Supporting multiple VMs running the same PSRT applications
– co-schedule multiple VMs by analyzing the communication patterns of VMs running the same PSRT applications
Conclusion
• We identify the scheduling problems in virtualized environments, and find that existing CPU scheduling mechanisms do not suit PSRT applications
• We propose a novel parallel soft real-time scheduling algorithm
• We implement a prototype of the algorithm, named Poris, in the Xen hypervisor
• We verify the effectiveness of Poris with various applications. The experimental results show that Poris can improve the performance of PSRT applications significantly
Thank you!
System Virtualization
[Figure: system virtualization stack with hardware at the bottom, the hypervisor above it, and two VMs on top, each running a guest OS and applications]
Credit Scheduler
• CPU resources (or credits) are distributed to VCPUs of VMs in proportion to their weight
• Three kinds of VCPU priorities: boost, under, and over
• VCPUs with the same priority are scheduled in FCFS manner
• Supports SMP platforms well
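Weight-proportional distribution can be illustrated with a few lines of Python. This is a simplified model of the idea and not Xen's accounting code: it splits a credit pool per VM rather than per VCPU, and the function name, pool size, and weights are illustrative assumptions.

```python
def distribute_credits(total_credits, weights):
    # Each VM receives a share of the pool proportional to its weight.
    total_weight = sum(weights.values())
    return {vm: total_credits * w / total_weight for vm, w in weights.items()}


# A VM with twice the weight receives twice the credits.
shares = distribute_credits(300, {"vm1": 256, "vm2": 512})
```

Here vm2 ends up with two thirds of the pool, so over time it is entitled to about twice as much CPU time as vm1 when both are busy.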