VIRTUALISATION

VIRTUALISATION OF HADOOP CLUSTERS Dr G Sudha Sadasivam Assistant Professor Department of CSE PSGCT


Page 1: VIRTUALISATION

VIRTUALISATION OF HADOOP CLUSTERS

Dr G Sudha Sadasivam, Assistant Professor, Department of CSE, PSGCT

Page 2: VIRTUALISATION

Introduction
• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.
• Challenges
– Partitioning of a machine
– Concurrent execution of multiple operating systems
– Isolation of virtual machines from one another
– Support for heterogeneous applications
– Low performance overhead
• Xen is a virtual machine monitor (hypervisor) for x86 that supports execution of multiple guest operating systems, each with its own kernel and user-space applications.

Page 3: VIRTUALISATION

Objective
• Automate the creation and deletion of a virtual cluster for hosting Hadoop using Xen.
• A large physical cluster can be simulated on a few physical machines.

Steps
• The user supplies the configuration by editing configuration files.
• The system generates the user-specified number of VMs running Hadoop.
• Users can manage the Hadoop file system.
• Users can submit jobs for each physical machine.
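The generation step can be pictured as emitting one Xen guest configuration file per requested VM. The sketch below is only an illustration under assumptions: the function names, the `hadoop-vm` prefix, the image directory and the `xenbr0` bridge are hypothetical and are not taken from the slides. Boot settings (kernel or bootloader lines) depend on the installation and are left out.

```python
def render_guest_config(name, image_path, memory_mb=512, vcpus=1, bridge="xenbr0"):
    """Return the text of a minimal Xen guest (domU) configuration file.

    Only common keys are written; kernel/bootloader settings are
    deliberately omitted because they depend on the installation.
    """
    lines = [
        f'name = "{name}"',
        f"memory = {memory_mb}",
        f"vcpus = {vcpus}",
        f"disk = ['file:{image_path},xvda1,w']",
        f"vif = ['bridge={bridge}']",
    ]
    return "\n".join(lines) + "\n"


def render_cluster_configs(vm_count, image_dir="/var/xen/images", prefix="hadoop-vm"):
    """Map each guest name to its configuration text for vm_count guests.

    The naming scheme and image directory are assumptions for illustration.
    """
    configs = {}
    for index in range(1, vm_count + 1):
        name = f"{prefix}{index}"
        configs[name] = render_guest_config(name, f"{image_dir}/{name}.img")
    return configs
```

Calling `render_cluster_configs(3)` returns three configuration texts keyed by guest name; writing them to disk and starting the guests is left to the surrounding tooling.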

Page 4: VIRTUALISATION

Need for virtualisation
• Quick recovery from software problems by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests between server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Use of the abundant computational power of the physical machine, minimising cost.
• Switching between applications on different operating systems using hypervisors.

Page 5: VIRTUALISATION

HADOOP CLUSTER CONFIGURATION

The host node is configured as master (NN) and also acts as a slave (DN). The guest node (DN) is configured as a slave.

Page 6: VIRTUALISATION

The master is the host OS, which acts as JobTracker/NameNode. The slave is the guest OS, which acts as TaskTracker/DataNode.
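The master/slave layout above maps onto a small amount of Hadoop configuration. The following is a minimal sketch, assuming the classic (Hadoop 0.20/1.x) layout in which `conf/slaves` lists the DataNode/TaskTracker hosts and the properties `fs.default.name` and `mapred.job.tracker` point at the master. The host names and the ports 9000 and 9001 are assumptions for illustration, not values from the slides.

```python
def hadoop_cluster_settings(master_host, guest_hosts, hdfs_port=9000, jobtracker_port=9001):
    """Return (slaves, properties) for a host-as-master cluster.

    The host runs the NameNode/JobTracker and also appears in the slaves
    list, so it doubles as a DataNode, as described on the slide.
    Ports are illustrative defaults, not values from the source.
    """
    slaves = [master_host] + list(guest_hosts)
    properties = {
        "fs.default.name": f"hdfs://{master_host}:{hdfs_port}",
        "mapred.job.tracker": f"{master_host}:{jobtracker_port}",
    }
    return slaves, properties


def slaves_file_text(slaves):
    """Render the contents of conf/slaves: one host name per line."""
    return "\n".join(slaves) + "\n"
```

For example, `hadoop_cluster_settings("host0", ["guest1", "guest2"])` lists `host0` first among the slaves, so the host stores blocks alongside the guests.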

Page 7: VIRTUALISATION

Steps in implementing
• Installation of the Xen kernel
• Creation of the guest OS
• Configuration of the guest OS
• Installation of the Java Development Kit
• Extraction and configuration of the Hadoop cluster
• Creation of an OS image for new guest machines
• Creation and removal of other virtual machines by copying the OS images
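The last two steps, creating further guests by copying a prepared OS image and removing them again, could be sketched as below. This is an illustration only: the function names and the `<name>.img` naming scheme are hypothetical, and a real setup would also need to adjust each copy's host name and network settings.

```python
import shutil
from pathlib import Path


def clone_guest_images(base_image, target_dir, guest_names):
    """Copy the prepared base OS image once per new guest.

    Returns the list of created image paths. Existing images are never
    overwritten, so a name clash raises FileExistsError.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name in guest_names:
        destination = target / f"{name}.img"
        if destination.exists():
            raise FileExistsError(f"image already exists: {destination}")
        shutil.copyfile(base_image, destination)
        created.append(destination)
    return created


def remove_guest_images(image_paths):
    """Delete the image files of guests that are being removed."""
    for path in image_paths:
        Path(path).unlink()
```

Copying whole images is simple but slow for large disks; it is used here only because the slide names image copying as the mechanism.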

Page 8: VIRTUALISATION

Automated Creation of a Hadoop Virtual cluster

XML file has configuration details of new VM
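The slide does not show the XML file itself, so the layout below is a hypothetical schema (a `cluster` root with one `vm` element per guest, carrying `name`, `memory` and `image` attributes). Under that assumption, reading the configuration details of new VMs might look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical example file; the real schema is not given in the slides.
SAMPLE_CONFIG = """\
<cluster>
  <vm name="hadoop-vm1" memory="512" image="/var/xen/images/hadoop-vm1.img"/>
  <vm name="hadoop-vm2" memory="1024" image="/var/xen/images/hadoop-vm2.img"/>
</cluster>
"""


def parse_vm_configs(xml_text):
    """Return one dict per <vm> element, with memory converted to an int."""
    root = ET.fromstring(xml_text)
    guests = []
    for node in root.findall("vm"):
        guests.append(
            {
                "name": node.attrib["name"],
                "memory_mb": int(node.attrib["memory"]),
                "image": node.attrib["image"],
            }
        )
    return guests
```

A missing attribute raises `KeyError`, which is a deliberate choice in this sketch: a guest with an incomplete description should not be created.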

Page 9: VIRTUALISATION

Automated Shut down of Hadoop Virtual cluster
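The slide gives no detail on the shutdown sequence, so the following is an assumed ordering rather than the authors' procedure: stop the Hadoop daemons first, then shut down each guest. It relies on `bin/stop-all.sh` from classic Hadoop releases and on the `shutdown` subcommand of the Xen `xm` toolstack. The Hadoop install path and guest names are hypothetical. The commands are only built, not executed.

```python
def shutdown_plan(guest_names, hadoop_home="/opt/hadoop", toolstack="xm"):
    """Return the ordered list of commands for shutting the cluster down.

    Hadoop is stopped before the guests so that the DataNodes and
    TaskTrackers exit cleanly. Each command is an argument list that
    could be handed to subprocess.run by the caller.
    """
    plan = [[f"{hadoop_home}/bin/stop-all.sh"]]
    for name in guest_names:
        plan.append([toolstack, "shutdown", name])
    return plan
```

Returning the plan instead of running it keeps the sketch safe to try on a machine without Xen and lets the caller log or review the commands first.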

Page 10: VIRTUALISATION

Advantages of automated virtualization in Hadoop

1. Effective isolation of the datanode from the load caused by other processes on the machine makes the datanode more responsive and reliable.

2. Multiple virtual machines on each physical machine make the scheduling units finer-grained, so multiple task trackers can be scheduled on the same machine and the overall utilization of the whole cluster improves.

3. A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).

Page 11: VIRTUALISATION

Enhancements

1. Providing a graphical console for monitoring and managing the virtual cluster.

2. Creation and migration of virtual machines for load balancing.

3. Enabling snapshots of virtual machines for checkpointing.

4. Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts it, increasing reliability.
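The monitoring enhancement is proposed future work, so no design is given. One possible starting point, sketched here under assumptions, is to compare the guests the cluster expects with the domain names reported by `xm list` and to restart whatever is missing. The sample output below is illustrative text in the usual column layout, not a captured log, and the guest names are hypothetical.

```python
# Illustrative listing in the column layout printed by `xm list`.
SAMPLE_LIST_OUTPUT = """\
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1024     2     r-----    310.5
hadoop-vm1                                   1   512     1     -b----     42.1
hadoop-vm3                                   3   512     1     -b----     40.7
"""


def listed_guests(list_output):
    """Return the set of domain names found in the listing (header skipped)."""
    names = set()
    for line in list_output.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def guests_to_restart(expected, list_output):
    """Return the expected guests that are absent from the listing, in order."""
    present = listed_guests(list_output)
    return [name for name in expected if name not in present]
```

A guest that is listed but hung would not be caught by this check; detecting that case would need a health probe inside the guest, which is outside this sketch.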

Page 12: VIRTUALISATION

Performance of Physical vs Virtual clusters

[Figure: time in nsec against number of nodes, comparing physical clusters and virtual clusters]

Page 13: VIRTUALISATION

Master as a Physical Node

7 nodes: data nodes – 6 virtual nodes; name node – 1 physical node

[Figure: time in nsec against number of nodes]

Page 14: VIRTUALISATION

Master as a Virtual Node

7 nodes: data nodes – 1 physical node + 5 virtual nodes; name node – 1 virtual node

[Figure: time in nsec against file size in MB]

Page 15: VIRTUALISATION

Performance with varying number of Virtual nodes

[Figure: time in nanoseconds against file size in MB, comparing six virtual nodes and four virtual nodes]