XPDS13: Performance Evaluation of Live Migration based on Xen ARM PVH - Jaeyong Yoo, Samsung

Software Center Performance Evaluation of Live Migration based on Xen ARM PVH for Energy-efficient ARM Server 2013-10-24 Jaeyong Yoo, Sangdok Mo, Sung-Min Lee, ChanJu Park, Ivan Bludov, Nikolay Martyanov Software R&D Center Samsung Electronics

description

Electricity accounts for approximately 27% of the total operating cost of a data center. For this reason, ARM servers are attracting growing attention for future energy-efficient data centers, and the performance of ARM processors keeps increasing (e.g., clock speeds approaching 3 GHz). To utilize ARM cores efficiently, ARM PVH was introduced in Xen 4.3; building on it, we implemented a live migration feature and evaluated it on a dual-core ARM board. More specifically, we chose a multimedia streaming workload, measured the maximum number of concurrent clients, and calculated clients per watt (CPW) as the performance metric. We found that even a dual-core ARM processor with virtualization gives a higher CPW (about 7 CPW) than the x86 case (about 6 CPW). In addition, server consolidation reduced energy consumption by around 70% (4-to-1 consolidation of lightly loaded servers).

Transcript of XPDS13: Performance Evaluation of Live Migration based on Xen ARM PVH - Jaeyong Yoo, Samsung

Page 1: XPDS13: Performance Evaluation of Live Migration based on Xen ARM PVH - Jaeyong Yoo, Samsung

Software Center

Performance Evaluation of Live Migration based on Xen ARM PVH for Energy-efficient ARM Server

2013-10-24

Jaeyong Yoo, Sangdok Mo, Sung-Min Lee, ChanJu Park, Ivan Bludov, Nikolay Martyanov

Software R&D Center

Samsung Electronics

Page 2

Contents

• Motivation

• Live Migration in Xen ARM PVH – Design and Implementation

• Performance Evaluation
1. Streaming service with ARM vs. x86
2. Streaming server consolidation with live migration
3. Streaming service with quad-core ARM board

• Concluding Remark

Page 3

Motivation

Page 4

Energy Problem in Datacenters

• Datacenters consume an enormous amount of electricity

Datacenter operation cost (pie chart):

– Electricity (27%)
– Service (13%)
– Engineering & Installation (19%)
– Power Equipment (17%)
– Cooling Equipment (6%)
– Space (17%)
– Racks (3%)

Ref: Jaroslav Rajić, "Evolving Toward the Green Data Center," http://stack.nil.si/ipcorner/GreenDC/#chapter2

Page 5

ARM Servers for Future Green Data Center

• Economical choice
– Significant advantage in compute/watt

• Vendors of ARM Server SoC
– AMD: Seattle (64-bit ARM server processor, 2H 2014)
– Calxeda: ECX-1000
– Applied Micro: X-Gene

• OS for ARM Servers
– Linaro LEG
– Red Hat deploys ARM-based servers for Fedora Project

[Images: Calxeda Energy Core ECX-1000; AMD Seattle: 64-bit ARM server; Applied Micro X-Gene]

Page 6

ARM Servers for Future Green Data Center

Further energy efficiency maximization: server consolidation by virtualization

Page 7

Design and Implementation of Live Migration in Xen ARM PVH

Page 8

Overall Architecture

• Components for Live Migration in Xen ARM PVH

[Figure: architecture diagram of Dom0 and DomU (each with its own kernel) running on the hypervisor. Recoverable labels:]

– Toolstack boxes: xl, libxl, libxc, libvirt perform-migrate, ARM-migrate
– Application boxes: streaming server, apache, mysql
– Function boxes: VCPU save/restore, dirty-page detecting, get dirty-bitmap, HVM context save/restore, memory data save/restore, suspend/resume, memory map get/set
– Hardware (Arndale): Cortex-A15 dual-core 1.7 GHz, 2GB memory, SATA3, USB3.0
– Legend: newly implemented module, modified module, existing module

Page 9

Sequence of Live Migration

[Figure: sequence diagram between the migration source (xl, xc) and the migration destination (xl, xc)]

Source-side operations: memory get map, memory save, dirty detection, dirty bitmap, HVM save, VCPU save, DomU suspend

Destination-side operations: memory set map, memory restore, HVM restore, VCPU restore, DomU resume

Message sequence:
1. migrate-receive, domain-save, domain-restore
2. get/set memory map
3. start dirty-paging
4. store dirty-pages
5. get dirty bitmap, save/restore memory contents (loop until stop-condition)
6. suspend DomU
7. last dirty pages
8. save/restore HVM
9. save/restore VCPU
10. resume DomU
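The sequence on this slide follows the usual pre-copy pattern: copy all memory while the guest runs, re-copy what was dirtied in the meantime, then suspend and transfer the remainder. A minimal sketch of that ordering is below. The function name, its parameters, and the per-round dirty counts are hypothetical and only illustrate the flow; they are not taken from the Xen toolstack.

```python
def precopy_migration(total_pages, dirty_per_round, min_dirty=50):
    """Return the ordered phases of a simulated pre-copy live migration.

    total_pages     -- number of guest pages, all sent in the first pass
    dirty_per_round -- hypothetical counts of pages dirtied during each pass
    min_dirty       -- stop iterating once a pass leaves fewer dirty pages
    """
    log = ["get/set memory map", "start dirty-page tracking"]
    pending = total_pages
    for dirtied in dirty_per_round:
        # The guest keeps running while these pages are copied.
        log.append(f"copy {pending} pages while DomU runs")
        # After the pass, the dirty bitmap tells us what must be re-sent.
        pending = dirtied
        if pending < min_dirty:
            break
    log.append("suspend DomU")
    log.append(f"copy last {pending} dirty pages")
    log += ["save/restore HVM context", "save/restore VCPU context", "resume DomU"]
    return log


if __name__ == "__main__":
    for step in precopy_migration(1000, [300, 120, 40, 10]):
        print(step)
```

Only the final copy, plus the HVM and VCPU state transfer, happens while the guest is suspended, which is why the number of pages left after the loop matters for downtime.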

Page 10

Major Hypercalls for Live Migration

Functions | Hypercalls | Description
Memory Migration | XENMEM_get/set_memory_map | Save/restore physical memory map of DomU
Memory Migration | XEN_DOMCTL_shadow_op | Enable dirty-page detection; get dirty-page bitmap
Memory Migration | XENMEM_add_to_physmap_range | Access the DomU's memory from Dom0
VCPU Migration | XEN_DOMCTL_get/setvcpucontext | Save/restore the VCPU registers
HVM Migration | XEN_DOMCTL_get/sethvmcontext | Save/restore the HVM contexts (e.g., timer, interrupt controller)

Implemented Hypercalls for Enabling Live Migration Feature in Xen ARM PVH

Page 11

Dirty-page Tracing: Get-dirty Bitmap

[Figure: the toolstack (libxc, ARM-migrate) issues XEN_DOMCTL_shadow_op (peek dirty pages), passing a dirty-page bitmap as the hypercall parameter. Dirty-page detecting feeds a temporary dirty-page store, and get dirty-page bitmap fills up the bitmap from the stored dirty pages.]

Candidates for temporary dirty-page storing:

1. Embedded in page table (use unused bits in PTE)
2. Linked list of PFNs
3. Bitmap of PFNs
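To make the third candidate concrete, here is a small sketch of a bitmap of PFNs with one bit per guest page. The class and method names are hypothetical and do not correspond to Xen code; the point is only that marking is O(1) and the storage cost is fixed at one bit per page.

```python
class DirtyBitmap:
    """One bit per guest page frame number (PFN); a set bit means 'dirty'."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def mark_dirty(self, pfn):
        # Setting the same bit twice is harmless, so repeated writes cost nothing extra.
        self.bits[pfn // 8] |= 1 << (pfn % 8)

    def peek(self):
        """Return a copy of the bitmap without clearing it."""
        return bytes(self.bits)

    def dirty_pfns(self):
        """List the PFNs whose bit is set, in ascending order."""
        return [p for p in range(self.num_pages)
                if (self.bits[p // 8] >> (p % 8)) & 1]


if __name__ == "__main__":
    bm = DirtyBitmap(20)
    for pfn in (3, 9, 3):
        bm.mark_dirty(pfn)
    print(bm.dirty_pfns())
```

A linked list of PFNs would use less memory when few pages are dirty but needs duplicate handling; the bitmap trades a fixed footprint for simpler bookkeeping.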

Page 12

Dirty-page Tracing: Dirty-page Detection

[Figure: address translation through three page tables, each with Level 1, Level 2 and Level 3]

– Guest page table (domU kernel): guest VA → IPA
– p2m: physical to machine page table (Xen-side for domU): IPA → MA
– Xen page table (Xen-side for Xen itself)

Page 13

Dirty-page Tracing: Dirty-page Detection

[Figure: the same three page tables as on Page 12 (guest page table in the domU kernel: guest VA → IPA; Xen-side table for domU: IPA → MA; Xen page table for Xen itself), now highlighting a PTE whose write bit (0/1) is cleared: w=0]

Page 14

Dirty-page Tracing: Dirty-page Detection

[Figure: the same three page tables as on Page 12 (guest page table in the domU kernel: guest VA → IPA; Xen-side table for domU: IPA → MA; Xen page table for Xen itself). A write request reaches a PTE whose write bit (0/1) is cleared (w=0), and the resulting fault is trapped by Xen.]
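The detection idea on these slides (clear the write bit, let the first write fault, record the page) can be modeled in a few lines. This is a toy model under stated assumptions: the class name and methods are hypothetical, and restoring the write bit inside the fault handler is the conventional approach rather than something the slides spell out.

```python
class WriteProtectTracker:
    """Toy model of dirty-page detection by write-protecting guest pages."""

    def __init__(self, num_pages):
        self.writable = [True] * num_pages   # models the w bit of each PTE
        self.dirty = set()
        self.faults = 0

    def enable(self):
        # Start tracking: set w=0 for every page so the next write faults.
        self.writable = [False] * len(self.writable)
        self.dirty.clear()

    def guest_write(self, pfn):
        if not self.writable[pfn]:
            # Permission fault trapped by the hypervisor.
            self.faults += 1
            self.dirty.add(pfn)
            # Assumption: re-enable writes so later writes run at full speed.
            self.writable[pfn] = True

    def collect(self):
        """Return the dirty PFNs and write-protect them again for the next round."""
        found = sorted(self.dirty)
        for pfn in found:
            self.writable[pfn] = False
        self.dirty.clear()
        return found


if __name__ == "__main__":
    t = WriteProtectTracker(8)
    t.enable()
    for pfn in (2, 2, 5):
        t.guest_write(pfn)
    print(t.faults, t.collect())
```

Each page faults at most once per round, so the overhead scales with the number of distinct dirty pages rather than the number of writes.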

Page 15

Implementation Choice

• Manual walking of p2m table

• Virtual-linear page table

Page 16

Manual Walking of p2m Table

[Figure: Xen walks the domU's p2m table (IPA → MA, Level 1 to Level 3), which lives in physical memory (a.k.a. machine memory). Reaching the PTE requires creating a mapping to Xen for each level (3 times).]

– Superpage checking
– w bit modification
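A manual walk splits the IPA into one index per level and follows the tables one at a time. The sketch below assumes the ARM long-descriptor layout with a 4 KB granule (9 index bits per level at bit positions 30, 21 and 12) and models each table as a Python dict; the function names and the dict representation are hypothetical.

```python
def table_indices(ipa):
    """Split an IPA into (first, second, third) level table indices."""
    first = ipa >> 30                # each first-level entry covers 1 GB
    second = (ipa >> 21) & 0x1FF     # each second-level entry covers 2 MB
    third = (ipa >> 12) & 0x1FF      # each third-level entry covers 4 KB
    return first, second, third


def walk(p2m_root, ipa):
    """Walk nested dicts standing in for page tables.

    Returns (level, entry). A non-dict entry found before level 3 models a
    superpage (block) mapping, which is why the walk must check for it.
    """
    entry = p2m_root
    for level, index in enumerate(table_indices(ipa), start=1):
        entry = entry[index]         # in Xen this step needs a temporary mapping
        if not isinstance(entry, dict):
            return level, entry
    raise ValueError("third-level entry is not a leaf")


if __name__ == "__main__":
    root = {1: {1: {3: "4KB page"}}, 2: "1GB block"}
    print(walk(root, 0x40203000))
    print(walk(root, 0x80000000))
```

The three dictionary lookups correspond to the three mappings in the figure, which is the per-access cost that the virtual-linear page table avoids.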

Page 17

Virtual-linear Page Table

• Consider third-level page table as a continuous memory block in virtual address space

[Figure: the guest's third-level page tables (3lvl PT #1, 3lvl PT #2, 3lvl PT #5, ...) are scattered in physical memory (a.k.a. machine memory) and are mapped through the Xen page table (Level 1, Level 2, Level 3) into one region of virtual memory]

※ Virtually continuous third-level page table (8GB DomU requires 16MB third-level page table)

ref: http://www.technovelty.org/linux/virtual-linear-page-table.html

Page 18

Virtual-linear Page Table

• Consider third-level page table as a continuous memory block in virtual address space

• For a given IPA, calculate the Xen VA of its third-level PTE with some arithmetic and just read it!

[Figure: same diagram as on Page 17: the guest's third-level page tables in physical memory (a.k.a. machine memory), mapped through the Xen page table into a virtually continuous third-level page table (8GB DomU requires 16MB third-level page table)]

ref: http://www.technovelty.org/linux/virtual-linear-page-table.html
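The "some arithmetic" can be sketched as follows, assuming 4 KB pages and 8-byte page-table entries, which is consistent with the slide's figure of 16MB for an 8GB DomU. `VLPT_BASE` is a hypothetical Xen virtual address chosen for illustration, and `ram_base` is a hypothetical parameter for guests whose RAM does not start at IPA 0.

```python
PAGE_SHIFT = 12            # 4 KB pages
PTE_SIZE = 8               # bytes per page-table entry (assumed 64-bit descriptors)
VLPT_BASE = 0x1000_0000    # hypothetical Xen VA where the linear table is mapped


def vlpt_size_bytes(guest_mem_bytes):
    """Size of the third-level table needed to cover the guest's memory."""
    return (guest_mem_bytes >> PAGE_SHIFT) * PTE_SIZE


def pte_address(ipa, ram_base=0):
    """Xen virtual address of the third-level PTE that maps the given IPA."""
    return VLPT_BASE + ((ipa - ram_base) >> PAGE_SHIFT) * PTE_SIZE


if __name__ == "__main__":
    print(vlpt_size_bytes(8 << 30) // (1 << 20), "MB")
    print(hex(pte_address(0x3000)))
```

With the table laid out linearly, finding a PTE is one shift, one multiply and one add, instead of three dependent lookups.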

Page 19

Evaluation

Page 20

Experiment Environment (Hardware/Software)

[Figure: test bed. A 220 V power source feeds the servers through a power meter (Yokogawa WT3000); clients connect through a 1G switch. Three streaming-server stacks are shown (Linux, Linux, and Linux on Xen) on x86 HW and the Arndale board, labeled Exp. Platform 1, Exp. Platform 2, Exp. Platform 2.]

• x86 hardware
– 8 cores (i7-2600 3.4GHz)
– Intel 1Gbps NIC
– 4GB memory

• ARM
– Arndale board
– 2 cores
– 1Gbps network card (USB 3.0)
– SSD mSATA
– 2GB memory

• Xen source: Xen 4.4 staging

• Domain kernels:
– Dom0: Linaro kernel 3.11
– DomU: Linaro kernel 3.9

• Streaming server:
– ffserver (RTSP streaming)

Page 21

Experiment Environment (Hardware/Software)

Note: Major evaluations are performed on a mobile-class ARM board. Performance evaluation of a server-class ARM board is presented at the end of the slides.

Page 22

Experiment Environment (Scenarios)

Test case 1: Streaming service with ARM vs. x86
Saturate the streaming server to get the maximum number of streaming clients
– Measurement 1: Maximum number of streaming clients for each test platform
– Measurement 2: Energy-efficiency comparison for each test platform

Test case 2: Streaming server consolidation with live migration
10% of the maximum number of streaming clients
– Measurement 1: Energy-efficiency comparison for each test platform
– Measurement 2: Streaming server consolidation within Xen-virtualized servers
– Measurement 3: Total live migration time, service downtime
– Measurement 4: Dirty-page detection time, dirty-page get-bitmap time, total dirty-page counts

Test case 3: Streaming with quad-core ARM board (in progress)
Maximum clients with varying number of ARM cores

Page 23

Case 1: Streaming Service ARM vs. x86 (Maximum capacity of ARM virtualized server)

• Max streaming clients with varying number of VMs
– Dual-core ARM board
– Single VCPU for each VM

Number of VMs | Per VM Memory | Max Streaming Clients | Watt
1 | 512MB | ~110 | 14.8
2 | 512MB | ~80 | 12.6
3 | 256MB | ~90 | 14.5
4 | 256MB | ~80 | 11.8

Finding: ARM cores are the major bottleneck

Page 24

Case 1: Streaming Service ARM vs. x86 (Energy-efficiency comparison to x86 hardware)

• Compare with the best case of ARM* virtualization

OS | Total memory in server | Max Streaming Clients | Watt | Client/Watt | Required memory
x86 with Linux | 4GB | ~750 | 121.5 W | 6.17 CPW | ~2.4GB
ARM with native Linux | 2GB | ~200 | 11.7 W | 17.09 CPW | ~707MB
ARM with virtualization | 512MB | ~110 | 14.8 W | 7.43 CPW | ~340MB

* Dual-core ARM CPU

Finding: Even dual-core ARM with virtualization shows higher CPW than x86
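The clients-per-watt (CPW) column is the maximum client count divided by the measured power. The short sketch below recomputes it from the table's client and watt values; the function and dictionary names are illustrative only.

```python
def clients_per_watt(max_clients, watts):
    """Clients per watt: maximum concurrent streaming clients / measured power."""
    return max_clients / watts


# (max streaming clients, watts) as reported on the slide
PLATFORMS = {
    "x86 with Linux": (750, 121.5),
    "ARM with native Linux": (200, 11.7),
    "ARM with virtualization": (110, 14.8),
}

if __name__ == "__main__":
    for name, (clients, watts) in PLATFORMS.items():
        print(f"{name}: {clients_per_watt(clients, watts):.2f} CPW")
```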

Page 25

Case 2: Streaming Server Consolidation of ARM virtualized server

• Scenario:
– 4 ARM boards, each running a 256MB VM
– Each VM has 10 clients
– Consolidate all VMs to one ARM board, and turn off the other 3 ARM boards

Consolidation | Watts before consolidation | Watts after consolidation | Energy saving percentage
2 to 1 | 2 x 8 W = 16 W | 8.6 W | 46% saving
3 to 1 [extrapolated] | 3 x 8 W = 24 W | 8.9 W | 63% saving
4 to 1 [extrapolated] | 4 x 8 W = 32 W | 9.4 W | 71% saving

Finding: Server consolidation can significantly reduce energy consumption
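The saving percentages follow from comparing the power of N lightly loaded boards with the power of the single board that hosts all VMs after consolidation. A minimal sketch of that arithmetic, using the slide's 8 W per board; the function name is illustrative.

```python
def saving_percent(n_boards, watts_per_board, watts_after):
    """Energy saving from consolidating n_boards onto a single board."""
    before = n_boards * watts_per_board
    return 100.0 * (before - watts_after) / before


if __name__ == "__main__":
    # (boards before consolidation, watts after consolidation) from the slide
    for n, after in ((2, 8.6), (3, 8.9), (4, 9.4)):
        print(f"{n} to 1: {saving_percent(n, 8, after):.1f}% saving")
```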

Page 26

Case 2: Live Migration Performance

• Migrate one VM at a time
– With different domU memory sizes (128MB, 256MB, 512MB)

• Measurements:
– Live migration time
  • Whole time for live migration
– Total dirty pages
  • Number of pages dirtied during live migration

Page 27

Case 2: Live Migration Performance

• Number of dirty pages in iterations

Configuration for stop-condition:
– max iter: 29
– max_mem_factor: 3
– min_dirty_per_iter: 50
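One plausible reading of these three parameters is sketched below: stop iterating when the iteration limit is reached, when an iteration finds fewer dirty pages than the minimum, or when the total pages sent exceed the memory factor times the guest size. The slide gives only the values, so this interpretation and the function signature are assumptions.

```python
MAX_ITER = 29              # upper bound on pre-copy iterations
MAX_MEM_FACTOR = 3         # assumed: cap on total pages sent, as a multiple of guest size
MIN_DIRTY_PER_ITER = 50    # assumed: stop once an iteration dirties fewer pages than this


def should_stop(iteration, dirty_this_iter, pages_sent_total, guest_pages):
    """Decide whether to leave the pre-copy loop and suspend the guest."""
    if iteration >= MAX_ITER:
        return True
    if dirty_this_iter < MIN_DIRTY_PER_ITER:
        return True
    if pages_sent_total > MAX_MEM_FACTOR * guest_pages:
        return True
    return False


if __name__ == "__main__":
    guest_pages = 256 * 1024 * 1024 // 4096   # a 256MB guest with 4 KB pages
    print(should_stop(3, 40, 70000, guest_pages))
```

The first and third conditions bound migration time for guests that dirty memory faster than it can be sent; the second is the normal exit for workloads that converge.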

Page 28

Case 2: Service downtime due to live migration

• Service downtime
– The time during which the VM does not respond to outside interaction
– Measurement method:
  • flood-ping to the migrating domain
  • time difference between packets sent from the migrating domain
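Given the arrival times of the replies to the flood-ping, the downtime can be estimated as the largest gap between consecutive replies. The sketch below assumes the timestamps have already been captured as seconds; the function name and the sample values are hypothetical.

```python
def service_downtime(reply_times):
    """Largest gap, in seconds, between consecutive reply timestamps."""
    gaps = [later - earlier
            for earlier, later in zip(reply_times, reply_times[1:])]
    return max(gaps, default=0.0)


if __name__ == "__main__":
    # Hypothetical capture: replies every 0.5 s, with a pause during the final copy.
    print(service_downtime([0.0, 0.5, 1.0, 3.0, 3.5]))
```

With a flood-ping the normal inter-packet gap is tiny, so the largest gap is dominated by the period in which the domain was suspended.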

Page 29

Case 2: Performance of dirty-page detection

• Measure the elapsed time of two major functions
– dirty-page detection
– dirty-page collection

Page 30

Case 3: Quad-core ARM board (In-progress)

• ARM board: 4 ARM cores with 8GB memory

Number of VMs | Per VM Memory | Max Streaming Clients | Watt | CPW
1 | 1GB | ~120 | 17.0 W | 7.06 CPW
2 | 1GB | ~250 | 18.5 W | 13.51 CPW
3 | 1GB | ~300 | 18.9 W | 15.87 CPW

• x86 case: (see slide 24)

OS | Total memory | Max Streaming Clients | Watt | Client/Watt
x86 with Linux | 4GB | ~750 | 121.5 W | 6.17 CPW

Page 31

Concluding Remark

• ARM server is a good candidate for green data centers
– Even ARM mobile processors with virtualization result in better CPW compared to x86
– Virtualization in ARM servers can improve energy efficiency through server consolidation

• Pass-through to DomU could significantly increase the performance