
Cluster Virtualization

VirtuaLinux separates three environments, targeted to different classes of administrators:

• The physical cluster, including the physical devices, which are insulated and made transparent (hardware technician).
• The privileged cluster, i.e. the environment that provides services and interfaces to the virtual clusters (skilled OS administrator).
• The virtual clusters, i.e. sets of virtual machines. They can run any OS and configuration, in such a way that Grid-style and Beowulf-style virtual clusters can coexist simultaneously (standard OS administrator).

Diskless Cluster + External SAN

Using EVMS and iSCSI, the architecture provides a flexible, high-level description of the underlying hardware that frees the administrator from the traditional, rigid allocation of resources. Performance and scalability are achieved through direct access of the cluster blades to the external SAN via a switched Infiniband network (10-20 Gb/s). Storage reliability is implemented within the external SAN, which can exploit a redundant array of disks (disks that can be mounted on the blades are usually quite slow and fragile).
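To make the storage path concrete, here is a minimal sketch of how a diskless blade could attach the external SAN at boot using the standard open-iscsi CLI and then bring up its EVMS volumes. The portal address, target IQN, mount points and node index are hypothetical placeholders; VirtuaLinux's actual boot scripts are not reproduced here.

    #!/usr/bin/env python3
    """Sketch: attach the external SAN and bring up this blade's volumes.
    Assumptions (hypothetical): SAN portal 192.168.0.254, target
    iqn.2007-06.example:san0, and this blade is node 1."""
    import os
    import subprocess

    PORTAL = "192.168.0.254"             # hypothetical portal, reached over IP-over-Infiniband
    TARGET = "iqn.2007-06.example:san0"  # hypothetical iSCSI target name
    NODE = 1                             # this blade's index

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Discover the SAN's targets and log in (standard open-iscsi CLI).
    run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
    run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])

    # 2. Activate the EVMS volumes so /dev/evms/* appears.
    run(["evms_activate"])

    # 3. Mount the blade's private ext3 volume and the cluster-wide OCFS2
    #    shared volume, then enable the per-blade swap (names as in the figure).
    for d in ("/mnt/private", "/mnt/shared"):
        os.makedirs(d, exist_ok=True)
    run(["mount", "-t", "ext3", f"/dev/evms/node_{NODE}", "/mnt/private"])
    run(["mount", "-t", "ocfs2", "/dev/evms/shared", "/mnt/shared"])
    run(["swapon", f"/dev/evms/swap_{NODE}"])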

VirtuaLinux Architecture

Masterless Cluster Configuration

VirtuaLinux has no master node: all nodes have a symmetric configuration. Critical OS services are categorized and made redundant by either active or passive replication, in such a way that they are, at any point in time, cooperatively implemented by the running nodes. Any blade of the cluster can be hot-swapped with no impact on cluster operation.

VirtuaLinux: Virtual Clustering with no Single Point of Failure

http://sourceforge.net/projects/virtualinux

VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of virtualized clusters with no single point of failure. The VirtuaLinux architecture supports diskless configurations and provides an efficient, iSCSI-based abstraction of the SAN. Clusters running VirtuaLinux exhibit no master node, thus boosting resilience and flexibility.

Designers, Developers, and Credits

University of Pisa: Marco Aldinucci, Marco Danelutto, Francesco Polzella, Gianmarco Spinatelli, Massimo Torquati, Marco Vanneschi. Eurotech HPC: Manuel Cacitti, Alessandro Gervaso, Pierfrancesco Zuccato. VirtuaLinux has been developed at the HPC lab of the Computer Science Dept., University of Pisa, and at Eurotech HPC, a division of the Eurotech Group. The VirtuaLinux project has been supported by the initiatives of the LITBIO Consortium, funded within the FIRB 2003 grant by MIUR, Italy.

References

M. Aldinucci, M. Torquati, M. Vanneschi, A. Gervaso, P. Zuccato, M. Cacitti. VirtuaLinux Design Principles. Technical Report TR-07-13, Computer Science Department, University of Pisa, Italy, June 2007. http://compass2.di.unipi.it/TR/Files/TR-07-13.pdf.gz

M. Aldinucci, M. Danelutto, M. Torquati, F. Polzella, G. Spinatelli, M. Vanneschi, A. Gervaso, M. Cacitti, P. Zuccato. VirtuaLinux: virtualized high-density clusters with no single point of failure. Intl. PARCO 2007: Parallel Computing, September 2007, to appear.

The Eurotech Group: http://www.eurotech.com
HPC lab, Computer Science Dept., University of Pisa: http://www.di.unipi.it/groups/architetture/
LITBIO: http://www.litbio.org

VirtuaLinux Features and Tools

VirtuaLinux provides:

• A bootable DVD.
• Distribution independence: virtual clusters can run any distribution (currently included virtual clusters: Ubuntu Edgy 6.10 and CentOS 4.4 for x86_64).
• An install facility to set up and configure the included virtual machine images in a diskless, masterless fashion.
• A recovery facility able to reset a misconfigured (physical) node to factory status.
• A toolkit to manage virtual clusters, which can be dynamically created, destroyed and moved (see the sketch below).
• User and developer documentation.
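The VC management toolkit itself is not shown on the poster; as a rough illustration of the lifecycle it exposes, the sketch below drives a VC's Xen domains with the stock xm commands (create, save, restore, migrate, destroy). VM names, file paths and the destination host are hypothetical assumptions.

    #!/usr/bin/env python3
    """Sketch: drive the lifecycle of a virtual cluster's Xen domains with
    the stock xm CLI. VM names, image paths and the destination host are
    hypothetical; VirtuaLinux's real toolkit is not reproduced here."""
    import subprocess

    VC = [f"tan_{i}" for i in range(1, 5)]   # hypothetical VC of 4 VMs

    def xm(*args):
        print("+ xm", " ".join(args))
        subprocess.run(["xm", *args], check=True)

    def create(vc):                 # boot every VM of the VC
        for vm in vc:
            xm("create", f"/etc/xen/{vm}.cfg")

    def suspend(vc):                # suspend the whole VC on (shared) disk
        for vm in vc:
            xm("save", vm, f"/mnt/shared/suspended/{vm}.img")

    def resume(vc):                 # restore the VC from disk
        for vm in vc:
            xm("restore", f"/mnt/shared/suspended/{vm}.img")

    def move(vm, host):             # move one VM to another blade
        xm("migrate", "--live", vm, host)

    def destroy(vc):                # tear the VC down
        for vm in vc:
            xm("destroy", vm)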


Marco Aldinucci, University of Pisa, Italy ([email protected])
Pierfrancesco Zuccato, Eurotech HPC, Italy ([email protected])

26-29 June 2007, Dresden, Germany

Summary

VirtuaLinux allows the coexistence of many Linux distributions and configurations in the same cluster; moreover, virtual clusters can be saved, restored and moved. Upgrading and testing the cluster is greatly simplified and takes less time. In addition, fault tolerance is obtained by a combination of architectural, software and hardware strategies that implement a design with no single point of failure.

Disk Abstraction Layer Architecture

[Figure: the EVMS stack built on the SAN. Two SAN disks (sda, sdb) are divided into segments (sda1, sda2, sda3; segm1, segm2, segm3) and collected into container_0, which exposes the regions R_def, Ra_1 ... Ra_m, Rb_1 ... Rb_m and R_shared. These regions back the volumes /dev/evms/default (ext3), the per-blade private volumes /dev/evms/node_1 ... /dev/evms/node_m (ext3, created as snapshots snap_1 ... snap_n of the default volume), the per-blade swap volumes /dev/evms/swap_1 ... /dev/evms/swap_m, and the cluster-wide /dev/evms/shared (OCFS2) mounted by blades 1 ... m.]

Eurotech Clusters Equipped with VirtuaLinux

[Figure: m diskless SMP blades N1 ... Nm connected by an Infiniband 10x switch (e.g. 192.168.0.0/24) and a Gigabit Ethernet switch (e.g. 192.168.1.0/24) to the iSCSI SAN, a set of RAID disks (e.g. Linux openfiler). Each blade runs the Xen hypervisor (VMM) and a Linux kernel, reaches its private EVMS volume and the shared volume via iSCSI over IP-over-Infiniband, and hosts the replicated OS services S1 ... S6 (e.g. TFTP, DHCP, NTP, LDAP, DNS): S1-S4 active on every blade, S5 and S6 primary on one blade with backups on the others. On top run two virtual clusters: "tan" (e.g. 1 VM per blade, 4 CPUs per VM, VMs tan_1 ... tan_m, network e.g. 10.0.1.0/24) and "red" (e.g. 2 VMs per blade, 1 CPU per VM, VMs red_1 ... red_k, network e.g. 10.0.2.0/24).]

Linux OS Services

All standard Linux services are made fault-tolerant via either active or passive replication:

• Active: services are started on all nodes; a suitable configuration enforces load balancing of client requests. E.g. NTP, DNS, TFTP, DHCP.
• Passive (primary-backup): Linux-HA with Heartbeat is used as the fault detector (see the sketch below). E.g. LDAP, IP gateway.
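In VirtuaLinux the passive scheme is delegated to Linux-HA/Heartbeat; the toy sketch below only illustrates the primary-backup pattern itself: the primary broadcasts periodic heartbeats, and a backup promotes itself once a deadline passes with no heartbeat. The address, port and takeover action are hypothetical.

    #!/usr/bin/env python3
    """Toy primary-backup fault detector in the spirit of heartbeat-based
    Linux-HA (not VirtuaLinux's actual implementation). Broadcast address,
    port and the takeover action are hypothetical."""
    import socket, sys, time

    GROUP_PORT = ("255.255.255.255", 6940)   # hypothetical broadcast address/port
    HEARTBEAT, DEADLINE = 1.0, 3.0           # seconds

    def primary():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        while True:                           # announce liveness forever
            s.sendto(b"alive", GROUP_PORT)
            time.sleep(HEARTBEAT)

    def backup():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", GROUP_PORT[1]))
        s.settimeout(DEADLINE)
        while True:
            try:
                s.recv(16)                    # heartbeat received: primary is up
            except socket.timeout:
                print("primary presumed dead: promoting this node")
                # a real setup would now claim the service IP and start LDAP etc.
                break

    if __name__ == "__main__":
        primary() if sys.argv[1:] == ["primary"] else backup()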

Virtual Clusters (VC)

• Each virtual node (VM) of a VC is a virtual machine that can be configured at creation time. It exploits a cluster-wide shared storage.
• Each VC exploits a private network and can access the cluster's external gateway.
• VMs of a VC can be flexibly mapped onto the cluster nodes (see the sketch below).
• VCs can be dynamically created, destroyed, and suspended on disk.
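As an illustration of mapping a VC's virtual nodes onto blades, the sketch below emits Xen 3-style domU configuration files for the "tan" cluster of the figure (one VM per blade, four VCPUs per VM). The bridge name, kernel path, per-VM disk volumes and cluster size are assumptions, not VirtuaLinux's actual toolkit.

    #!/usr/bin/env python3
    """Sketch: emit Xen domU config files for virtual cluster "tan", one VM
    per blade with 4 VCPUs each (as in the figure). Bridge name, kernel path
    and per-VM EVMS volumes are hypothetical assumptions."""

    N_BLADES = 4   # hypothetical cluster size

    CFG = """name    = "tan_{i}"
    memory  = 1024
    vcpus   = 4
    kernel  = "/boot/vmlinuz-2.6-xen"                 # assumed paravirt kernel
    disk    = ['phy:/dev/evms/tan_{i},xvda,w']        # assumed per-VM EVMS volume
    vif     = ['bridge=xenbr0']                       # VC private net, e.g. 10.0.1.0/24
    """

    for i in range(1, N_BLADES + 1):
        path = f"/etc/xen/tan_{i}.cfg"                # assumed Xen config directory
        with open(path, "w") as f:
            f.write(CFG.format(i=i))
        # VM tan_i is meant to run on blade N_i, started via: xm create tan_i.cfg
        print(f"wrote {path} (start on blade N{i}: xm create tan_{i}.cfg)")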

Kernel Basic Features

• All standard Linux modules.
• Xen hypervisor, supporting Linux paravirtualization, and Microsoft Windows via QEMU binary translation (experimental).
• Network connectivity, including Infiniband userspace verbs and IP over Infiniband.
• iSCSI remote storage access.
• OCFS2 and GFS shared file systems.


Disk Abstraction Layer

A set of private and shared EVMS volumes is mounted via iSCSI on each node of the cluster:

• A private disk (/root) and an OCFS2/GFS cluster-wide shared SAN volume are mounted on each node.
• The EVMS snapshot technique is used for time- and space-efficient creation of the private remote disk (see the sketch below).
• A novel EVMS plug-in has been designed to implement this feature.
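The scheme can be read off the figure above: each private volume /dev/evms/node_i is a copy-on-write snapshot of the golden image /dev/evms/default, so provisioning m private disks requires neither a full copy nor m times the space. The sketch below shows that flow with a hypothetical create_snapshot helper; the real EVMS plug-in interface is not reproduced here.

    #!/usr/bin/env python3
    """Sketch of the snapshot-per-node flow. create_snapshot is a
    hypothetical stand-in for the EVMS snapshot plug-in, not its real API:
    each blade's private volume is a copy-on-write snapshot of the golden
    image, so creation is fast and space grows only with what each node
    writes."""

    GOLDEN = "/dev/evms/default"

    def create_snapshot(origin: str, name: str) -> str:
        """Hypothetical helper: register a copy-on-write snapshot of
        `origin` and expose it as /dev/evms/<name>."""
        print(f"snapshot {origin} -> /dev/evms/{name}")
        return f"/dev/evms/{name}"

    def provision(n_blades: int):
        # One private root volume per blade (ext3) ...
        volumes = [create_snapshot(GOLDEN, f"node_{i}")
                   for i in range(1, n_blades + 1)]
        # ... while /dev/evms/shared (OCFS2) is mounted by all blades as-is.
        return volumes

    if __name__ == "__main__":
        print(provision(4))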