Virtualization in the Trenches With VMware


Virtualization in the trenches with VMware, Part 1: Basics and benefits
By Sam Webb | Last updated 25 days ago

IT in the enterprise is as much about technology as it is about people, processes, and business needs. In a five-part series, we will cover some of the challenges faced when trying to design and deploy a virtualization platform for a sizable enterprise and migrate its infrastructure into the cloud. This usually ends up being a far larger undertaking than imagined, partly due to technical challenges, but mostly due to having to make careful selections at every step of the way.

For reasons I'll cover in a moment, this series focuses on VMware. There are some fantastic alternatives to VMware out there, but VMware was the package that best suited my own company's needs. However, much of the discussion in this series can easily be applied to other virtualization platforms.

Product considerations

When talking about enterprise environments, it is important to remember that one rarely deals with logical situations. That is to say, while you may both logically and intuitively know the right answer to the technical challenge you face, the reality of the situation usually ends up being that those in management—and often the accounting department—get the final say in what route you take. You are often left with either older or inadequate technology as your project resource.

And then there are the typical enterprise hardware refresh cycles and the associated headaches with trying to both introduce, and transition to, a new platform. For example, a core switching platform upgrade and customer port migration can end up taking up to two years, depending on the scale and uptime requirements. This starts to be a serious problem if the enterprise works on a three-year lifecycle refresh for hardware. Nor does this take into account business requirements such as waiting 6 to 12 months before adopting a technology or hardware platform, or choosing to run one full software version behind the current stable release, as some financial services companies opt for. That easily ends up causing various headaches, especially with something fast-moving like virtualization software.

For my own company's enterprise virtualization rollout, the biggest question when choosing the virtualization platform isn't about performance or features—it's about support. Who do you call at 4am when everything is down and not coming back up on its own, upset customers are calling in, overnight jobs are failing, and you have about four hours to get every service back up and running before the office denizens pour in to start their day? The choice then becomes about being able to hold a third party accountable in front of your customers and managers for the unexpected downtime. So having that third party contractually obligated to fix your issue as soon as possible is the driving force here. This factor is a significant driver for open-source projects offering a commercial variant with paid-for support. The simple truth is that most enterprises will not touch a lot of technology without there being a strong support contingent backing it up, especially when it comes to Tier 1 applications like Exchange Server, SAP, SQL Server, etc. Obviously, there are exceptions here (e.g. DNS and Web servers). However, it's an entirely different story trying to justify running software with very expensive support contracts on top of a free platform such as CentOS.

On that note, the best supported virtualization platform for commodity x86 hardware has been VMware's ESX/vSphere platform, in use by 100 percent of the Fortune 100, 98 percent of the Fortune 500, and 96 percent of the Fortune 1000. Add in some aggressive marketing about not only fully supporting Tier 1 applications for virtualization, but also full support and certification of the VMware platform by the actual manufacturers of said applications, and we had a winner in management's eyes.

However, a reputation like that comes at a price for both the software and the support. This price can lead companies to try to cut corners, and not purchase hardware and software for a 1:1 disaster recovery environment, a development/testing/staging/user acceptance environment, and a lab environment for training. It's scarily common for companies to skip out on some of the aforementioned, and in some cases, skip out on all of them, meaning that everything needs to be done to the live, in-production cluster, with a dash of hope and a pinch of prayer. I have seen one company take this to the extreme: they bought dozens upon dozens of blades, yet a few months into the project realized that they had spent their entire budget on the hardware and production software licenses and could not afford even a single lab license to ensure that software platform updates were successful. They then realized that, due to business process demands, most blades would stay powered off for a number of months until customer demand would ramp up to require the additional capacity, thereby justifying the expenditure.

Virtualization benefits: live migration, high availability, and fault tolerance

There are two core benefits of virtualization that both improve the computing experience and also add another layer of complexity to it: consolidation and resiliency. Consolidation is easy to understand—it is literally multiple operating system instances being tenants on one physical server platform, storage platform, and the networking environment. Provided that there are adequate hardware resources to meet the requirements of the guest OSes, the benefits of virtualization here are fantastic and direct, especially in the multi-core CPU era. Resiliency, on the other hand, is about having enhanced survivability added to the running guest OS, typically through features like vMotion (a tool for migrating VMs between hosts), high availability (HA), and now fault tolerance (FT).

vMotion enables the live migration of a virtual machine across physical hosts, with no downtime or interruption in service. While not that exciting for a small computing environment, it's a fantastic advancement for environments that, for example, need to guarantee up to 99.99 percent uptime of either a service or server. (99.99 percent, or "four nines" uptime means approximately 52.56 minutes of downtime per calendar year, or 4.32 minutes per 30-day month.) When you consider that a single reboot can threaten your uptime rating for a month, it comes as no surprise that most enterprise apps end up being clustered, either at the application level, the server level, at the application tier level, or all of the above.
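If you want to sanity-check those numbers, the arithmetic is trivial to script. The following PowerShell snippet is purely illustrative (the availability targets are arbitrary examples) and simply re-derives the downtime budgets quoted above:

# Downtime budget per availability target; pure arithmetic, no VMware cmdlets required
$targets = 0.999, 0.9999, 0.99999          # three, four, and five nines
foreach ($a in $targets) {
    $perYear  = (1 - $a) * 365 * 24 * 60   # minutes of allowed downtime per 365-day year
    $perMonth = (1 - $a) * 30 * 24 * 60    # minutes of allowed downtime per 30-day month
    "{0:P3} uptime -> {1:N2} min/year, {2:N2} min/month" -f $a, $perYear, $perMonth
}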

Even with clustering and redundancy, any kind of planned maintenance work usually needs to be done in the middle of the night, often on the weekends, or whenever application usage is at its lowest. A simple software update becomes a skilfully timed display of coordination between the application administrators, system administrators, networking administrators (to ensure that the load balancers are properly removing the unavailable system(s) from the live server pools), and so forth. Any kind of hardware maintenance typically comes at a steep cost in downtime, unless you are using very high-end hardware—the kind that can accept hot-plugging of additional CPUs, RAM modules, and add-in cards. In this kind of environment, the ability to migrate a running VM from one host to another without any downtime in order to perform hardware maintenance is fantastic, and leaves everyone happy, from customers to managers. In short, live migration has been the defining feature of virtualization, and you can look at other features as either an extension or a complement to moving live VMs between physical hosts. Being able to perform a live migration is the first benchmark of the maturity of a virtualization platform today.

The high availability feature is not quite as it sounds, in the sense that it isn't clustering per se; it merely grants the ability to configure automated VM restarts should either the guest OS unexpectedly halt or a physical host in a cluster of servers fail. Should the latter be the case, the VMs are reassigned among the surviving hosts and powered on in order to minimize downtime before personnel reach the incident and begin manual recovery. This can be tricky to fully utilize, because uneven VM resource allocations can wreak havoc on what can be powered back on and where, once the lost overall compute resources are taken into account.
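This is also one of the easier things to script. Here's a minimal PowerCLI sketch, assuming an existing Connect-VIServer session and a cluster named "Prod01" (a placeholder), that enables HA along with admission control so the cluster always holds back enough spare capacity to restart the VMs of one failed host:

# Hypothetical sketch: enable HA on a cluster and reserve capacity for one host failure
$haSettings = @{
    HAEnabled                 = $true
    HAAdmissionControlEnabled = $true    # refuse power-ons that would break the failover guarantee
    HAFailoverLevel           = 1        # tolerate the loss of one host
    Confirm                   = $false
}
Get-Cluster -Name "Prod01" | Set-Cluster @haSettings

Admission control is what guards against the allocation problem described above: once the reserved failover capacity would be violated, additional power-ons are simply refused.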

On the other hand, a feature called "fault tolerance" provides something more intuitively useful for maintaining uptime: a copy of a running VM is kept running on another physical host, and everything about the two VMs is synchronized. That is to say, CPU states, memory states, and disk reads are streamed to the backup VM to ensure that execution is kept identical, down to the individual CPU instruction. If the primary physical host suddenly fails, the backup VM immediately becomes the active VM, with no interruption in service or availability. Having this level of availability on tap is very important for Tier 1 (mission critical) applications, where waiting for a reboot onto another physical host takes too long. While twice the computing resources are necessary for this to work, sometimes that's a very small price to pay compared to the losses incurred should a critical VM be down even for just a few minutes. Mind you, any type of failure within the guest OS or application won't be prevented by FT; it will instead be instantaneously experienced by the backup VM.

Storage and backups

The fact that a virtual machine's disks typically show up as files on a storage array instantly opens up several important backup options. First, simply copying the right files from your storage array will get you duplicates of the guest's disks and all of their data. Copying the virtual machine configuration files will get you a full clone of the VM, provided you take a few moments to update configuration settings to ensure that you have no conflicts with the original.

Then there are VM snapshots, which involve taking a complete picture of a guest OS along with its memory contents at a specific point in time. This snapshot lets you restore the VM to its state at that checkpoint in a matter of minutes or seconds. This is very important in many common scenarios, such as applying critical OS patches. If there's any incompatibility between the patch and your applications, or if your application no longer starts up, you simply click a button and roll back to the last known working state of the VM. This is available at both the virtualization platform level and the storage platform level. Today's storage array vendors tend to offer full integration with the virtualization layer. For example, if you take a snapshot of the whole disk array, the guest OSes will be notified to momentarily quiesce disk writes so that the snapshot is coherent.
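In VMware's case this is scriptable as well. A minimal PowerCLI sketch of the patch-day routine, with the VM and snapshot names being placeholders, might look like this:

# Hypothetical sketch: snapshot a VM before patching, then roll back if the patch misbehaves
# Assumes an existing Connect-VIServer session and a VM named "app01" (placeholder)
$vm = Get-VM -Name "app01"

# Take a snapshot that includes the guest's memory state, so a rollback resumes the running OS
New-Snapshot -VM $vm -Name "pre-patch" -Description "Before monthly patch run" -Memory

# ...apply patches, test the application...

# If things go wrong, revert to the snapshot; once everything checks out, delete it
$snap = Get-Snapshot -VM $vm -Name "pre-patch"
Set-VM -VM $vm -Snapshot $snap -Confirm:$false
# Remove-Snapshot -Snapshot $snap -Confirm:$false

Keep in mind that snapshots are not backups: they should be removed once the change is confirmed good, since long-lived snapshots keep growing and eventually hurt performance.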

The live migration for VMs discussed above also applies to storage (Storage vMotion on VMware)—files that make up the contents of a virtual machine can be moved from one storage array to another storage array while the virtual machine is powered on, with no pauses or downtime. The implications of this are staggering when you consider large environments with stringent uptime requirements. Do you need to perform risky maintenance on the storage array but happen to have another storage system available? Not a problem—simply migrate all of the VMs off of that particular storage array, and you're good to go. Did you just buy a new storage system and you'd like to migrate your VMs to it? No problem—start migrating each VM to the new storage once it's properly configured and available to the host. The list goes on. When you combine live migration with Storage vMotion, it becomes possible to take a live, running guest OS instance and transparently migrate both its storage, and the physical host it is running on, without any service impact. (Mind you, there are a few limitations, especially with needing to migrate to compatible CPU feature sets, but that's becoming more relaxed with each software release and CPU generation.)
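In PowerCLI terms, a Storage vMotion is simply a Move-VM with a datastore as the target. The sketch below is illustrative only, with the VM and datastore names being placeholders:

# Hypothetical sketch: relocate a running VM's files to a new array with Storage vMotion
# Assumes an existing Connect-VIServer session; "app01" and "NewArray-LUN01" are placeholder names
$vm          = Get-VM -Name "app01"
$destination = Get-Datastore -Name "NewArray-LUN01"
Move-VM -VM $vm -Datastore $destination   # the VM keeps running while its disks are migrated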

Finally, there is a relatively new feature of storage arrays called deduplication. As the name implies, it means removing duplicate blocks of data by way of referencing just one block multiple times. What this means in practice is that if you have, say, a hundred nearly identical files, all of the content that's identical is only physically stored once, leading to fantastic storage space savings. Applying this to a virtualization environment, where you tend to have a lot of redundant data in the form of the actual base OS, you can save up to about 90% in disk usage. In practice, that number will be lower, but even a space gain of 40% is quite significant when you take into account that enterprise storage is many, many times more expensive than commodity hardware.

While the above may have read a bit like a marketing brochure, virtualization does deliver the promised goods, hence its widespread and ever-growing adoption. That said, properly implementing and integrating a full-fledged virtualization solution is fraught with pitfalls and potential mistakes at every level: choosing the hardware, setting up networking, setting up the platform itself, and day-to-day management. Then there are larger-scale endeavors like converting an entire datacenter from physical to virtual, also known as a P2V conversion. We'll cover each of those areas specifically in upcoming articles, with Part Two covering selection of the storage, networking, and hardware platforms.

Virtualization in the trenches with VMware, Part 2: Storage, networking, and blades

In part one of this series, we looked at selecting an enterprise virtualization platform, and at some of the benefits gained. Now we're going to look at some of the challenges involved in selecting hardware to run it on, and in the process we'll discuss storage, networking, and servers/blades.

The real challenge here is not so much using and managing the hardware that you already have, but picking new technologies to ensure that you get the appropriate price/performance ratio, the necessary support options, and the needed availability and recoverability. You must also ensure that your choices will be sustainable for at least two years, if not three or more. Finally, there's the very real consideration of power usage and heat dissipation, as the hosting industry has been moving toward charging based on power and heat instead of physical space usage for a number of years now. But first, a quick primer on storage.

Part 1: Virtualization basics and benefits

When it comes to choosing the storage type, or platform, for a virtualization environment, there are five basic options, each with their own strengths, weaknesses, target usage scenarios, and price points. First, we'll cover the available types in order to introduce the major technologies, then we'll talk about several things to keep in mind with respect to availability, reliability, and recovery.

The first type of storage to look at is local storage, either in the form of higher-performing SAS or SCSI disks, or cheaper SATA disks. To make this short, local storage should only be considered as a last resort, or for very specific deployments. Although the price/performance ratio of using local storage is excellent, employing it removes most of the benefits of virtualization, because all of the high-end features are automatically disabled when using local storage. A further detriment to using local storage is the introduction of an additional single point of failure for each node.

The second type of storage to consider is Network File System, or NFS, which is a venerable networked storage system that started out in the UNIX world. It runs on top of your existing networking infrastructure, delivering solid performance for a reasonable price, depending on what hardware and software is being used to provide it. VMware ESX/vSphere is able to fully utilize NFS-based storage to enable its high-end functions, such as Dynamic Resource Scheduling [DRS], HA, and vMotion. Using multiple NFS mounts and enabling IP-based load-balancing can greatly increase throughput, as long as the underlying networking infrastructure can deliver. However, because NFS runs over a network and provides file-level access, as opposed to block-level access like all other kinds of storage, availability becomes an issue. This is especially important due to VMware only using NFS over TCP, as opposed to UDP. To be more specific, if the NFS end-point suffers any kind of an outage spanning more than about 30 seconds, some or all of the virtual machines residing on that mount point can crash and potentially suffer data loss. The same applies if your NFS appliance suffers an outage and fails over—the sessions need to be restarted, which can take a bit of time depending on how smooth the failover is.
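To make the NFS case concrete, mounting the same export as a datastore on every host in a cluster can be scripted. The sketch below is hypothetical; the cluster name, filer address, export path, and datastore name are all placeholders:

# Hypothetical sketch: mount one NFS export on every host in a cluster
# Assumes an existing Connect-VIServer session
foreach ($esx in Get-Cluster -Name "Prod01" | Get-VMHost) {
    New-Datastore -VMHost $esx -Nfs -Name "nfs-vmstore01" -NfsHost "10.0.50.10" -Path "/vol/vmstore01"
}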

Fibre Channel [FC] Storage Area Networks [SANs] are the de facto enterprise storage platforms, and are the third type of storage to consider. These types of storage can scale from one or two servers with a Host Bus Adapter [HBA] card in each, to hundreds or thousands of servers concurrently accessing the same SAN through redundant HBAs, each with redundant ports and redundant fabric paths. Think of a SAN much like a regular IP-based network, except that in this case it carries just storage data through a dedicated protocol. That is to say, you can segment and firewall a SAN in very similar ways to a regular network, and you can get multiple concurrently active traffic paths.

Fibre Channel offers dedicated high-throughput storage, with excellent performance, reliability, and redundancy. However, all of that comes at a steep price, once you consider that each server must have at least one, though typically two, HBA cards to connect it to the storage switches; each port is expensive at the source and destination, and that does not even take into account the overhead of running the fiber optic cables throughout the datacenter. Running the virtualization platform on a SAN has been the de-facto way of doing that for quite a number of years now, though that is quickly changing as newer technologies offer nearly all of the performance and features of Fibre Channel for a fraction of the price. That being said, if the highest possible performance is required, and price is no object, then Fibre Channel is the way to go, especially when you consider that you can purchase storage arrays with 256GB (yes, gigabytes) or more of cache.

iSCSI is a relatively new technology, the fourth on our list, offering most of the benefits and features of Fibre Channel, but at a reduced price point. Instead of using dedicated FC networking and cabling, iSCSI merely piggy-backs on top of your existing gigabit and 10-gigabit networking equipment. iSCSI works by making a point-to-point connection from the client (the Initiator in iSCSI parlance) to the storage array (the Target) over IP, which means that it can be switched and routed just like any other IP traffic. That being said, properly using iSCSI as the storage back-end for your virtualization project requires quality networking equipment and skilled network administrators who can properly configure and tune it to deliver the best bang for your buck. Rigorous quality of service [QoS] configurations are needed to deliver good performance; otherwise, heavy storage traffic can choke out data traffic, and vice versa. Another important feature of iSCSI is that you can connect either through a software initiator, suffering a small performance penalty for the overhead, or through dedicated Network Interface Cards [NICs] that offer iSCSI off-loading in order to do hardware acceleration for the protocol.
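As a hypothetical sketch of the software-initiator route in PowerCLI (the cluster name and target address are placeholders), enabling the initiator on each host and pointing it at the array looks roughly like this:

# Hypothetical sketch: enable the software iSCSI initiator on each host and add a send target
# Assumes an existing Connect-VIServer session
foreach ($esx in Get-Cluster -Name "Prod01" | Get-VMHost) {
    # Turn on the software initiator (skip this step if you are using iSCSI-offload NICs)
    Get-VMHostStorage -VMHost $esx | Set-VMHostStorage -SoftwareIScsiEnabled $true

    # Point the initiator at the array's discovery address
    $hba = Get-VMHostHba -VMHost $esx -Type IScsi   # assumes the software adapter is the only iSCSI HBA
    New-IScsiHbaTarget -IScsiHba $hba -Address "10.0.60.10" -Type Send
}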

The final technology to consider is Fibre Channel over Ethernet [FCoE]. This technology is currently taking the storage world by storm because it promises performance potentially higher than regular Fibre Channel, but for a reduced per-port cost. It also boasts a number of other benefits, such as less cabling, less over-provisioning, and the ability to carry both Ethernet frames and Fibre Channel frames over the same cabling and network infrastructure. FCoE uses a special type of add-in card called a Converged Network Adapter [CNA], which transports both Fibre Channel storage traffic as well as regular Ethernet network traffic, at speeds of up to 10 Gigabits per second. Although specific FCoE-friendly network gear must be acquired, the costs of it are easily offset by the performance provided, as well as the built-in forklift upgrade to 10 Gigabit Ethernet, which ensures that it's an investment that will last for years to come.

Key considerations

The initial key considerations for selecting any storage system are that it be highly available, redundant, and fault tolerant. It is unacceptable to have a storage-level fault grind hundreds or thousands of virtual machines to a halt. High-end storage systems support active-active availability, ensuring both load distribution as well as instantaneous failover should one of the storage nodes suffer a catastrophic failure; this ensures that no downtime is experienced by the systems accessing the shared storage. Having some kind of high availability in your storage platform is a must for any kind of production use.

The second key consideration is whether the storage platform supports a feature called deduplication. Deduplication is a fairly recent development that's rapidly growing in popularity, especially in virtualized environments, where it brings massive cost and storage usage savings in most typical scenarios. It works by analyzing the disk blocks making up the stored data, and by employing advanced algorithms to efficiently remove duplicated content at the block level, leading to space savings of up to 90% in some cases. The reason this works so well is because the majority of the guest OS data in a VM will be identical to the data in every other VM of that type. Not having to store the same file hundreds of times for every VM leads to huge space savings, and it also yields a performance boost due to array-level caching.
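The arithmetic behind those savings is easy to sketch. The numbers below are made up purely for illustration:

# Hypothetical illustration of block-level deduplication savings for a pool of similar VMs
$vmCount       = 100
$osPerVmGB     = 20     # near-identical guest OS and application binaries per VM
$uniquePerVmGB = 15     # data that is genuinely unique to each VM

$rawGB   = $vmCount * ($osPerVmGB + $uniquePerVmGB)   # what you would store without dedup
$dedupGB = $osPerVmGB + ($vmCount * $uniquePerVmGB)   # the common blocks are stored roughly once
$savings = 1 - ($dedupGB / $rawGB)
"Raw: $rawGB GB, deduplicated: ~$dedupGB GB, savings: ~{0:P0}" -f $savings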

The third key consideration in selecting storage hardware is the price/performance ratio. After satisfying the first two points, you must ensure that your solution meets your basic Input/Output Operations Per Second [IOPS] requirements, as well as your data transfer throughput and latency needs. These needs are easy to meet in most cases, but it is a good idea to calculate the general IOPS throughput of the disk spindles and shelves a system offers; you can then calculate/estimate the IOPS needs of the whole virtual environment, multiply the latter by an easy 4-5x for real-world values once you consider the RAID write penalties, and finally compare the resulting estimate of your needs to your estimate of the hardware's IOPS throughput. An important thing to keep in mind is that even though the price of enterprise-grade Solid State Drives [SSDs] is rather steep, the performance of your average SSD is many, many times that of a regular rotational hard disk when looking at IOPS (~5,000 to ~20,000 for an SSD, vs. about 175 for a 15k RPM SAS disk).
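To make that estimate concrete, here is a hypothetical back-of-the-envelope sizing in PowerShell; every workload figure below is invented for illustration, and the RAID write penalties are the usual rules of thumb:

# Hypothetical IOPS sizing sketch; all workload figures are invented for illustration
$vmCount       = 200
$iopsPerVm     = 40       # average front-end IOPS each VM generates
$writeFraction = 0.4      # 40 percent of that IO is writes
$raidPenalty   = 4        # rule of thumb: RAID 10 = 2, RAID 5 = 4, RAID 6 = 6 back-end writes per front-end write

$frontEndIops = $vmCount * $iopsPerVm
$backEndIops  = ($frontEndIops * (1 - $writeFraction)) + ($frontEndIops * $writeFraction * $raidPenalty)

$iopsPer15kDisk = 175
$spindlesNeeded = [math]::Ceiling($backEndIops / $iopsPer15kDisk)
"Front-end: $frontEndIops IOPS, back-end after RAID penalty: $backEndIops IOPS, roughly $spindlesNeeded 15k spindles"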

Networking gear, servers, and processors

Realistically, most enterprises already have quite a bit of networking gear that's happily humming along, not offering them much reason to go on an upgrading spree. It is here that virtualization can cause headaches, because it really benefits from 10GigE, and if you stick with just GigE, you're looking at quite a few ports to allocate. If you choose to head in the direction of blades, you're looking at an easy 8 to 16 ports for uplinks from each enclosure, and that's only on the active path—allocate an identical amount for redundancy. Then include another two to four ports for management purposes. If you go the route of individual rackmount servers, then you're looking at something like the following:

• Two to four ports for the production uplink bond, VM-side

• Two to four ports for the internal uplink bond, VM-side

• Two to four ports for the management uplink bond, VM-side

• Two to four ports for the virtualization platform's management and services

Depending on how many dozens of host servers there are, entire switches will be quickly saturated. You'll either have to plug straight into large multi-blade chassis-based switches in order to simplify management and get good port density, or you're looking at a potential management mess of sprawling smaller 1U and 2U switches. With that in mind, the migration to 10GigE starts to make sense, even if only because you can now handle an entire server or blade enclosure's uplinks with just two to six or so ports. If you go the route of FCoE to combine your storage and networking, then 10GigE is a given, and the price premium becomes a great investment over time. Without migrating to 10GigE, blades make the most sense in terms of port usage; however, as we'll see next, blades have their own tradeoffs when it comes to power and heat.

When deciding on server hardware, you'll first have to choose whether to go with blade servers or regular rackmount servers. It is safe to say that blades now equal rackmount servers in computing power and memory, especially since you can get four-socket blades with eight to 12 core CPUs, and up to 1TB (yes, terabyte) of RAM per blade, just like a nice big 4U rackmount. Aside from a price penalty, the only thing you lose out on with blades is built-in storage and expansion slots, which are really not that important when it comes to virtualization.

The real tradeoff with blades comes in the areas of power usage and heat dissipation, because of blades' very high density per rack unit. You'll typically need three-phase power for blades, and you'll only be able to fit about two, maybe three, enclosures per cabinet, simply because they produce so much heat. What you gain in savings with respect to physical space and number of uplink ports, you lose when it comes to increased cooling and bigger power circuits per cabinet.

Another choice, though less important over the last couple of years, is processor architecture; specifically, whether to go with Intel or AMD-based CPUs. This used to be a very important choice due to sizeable price/performance disparities between the two, but the gaps have now closed substantially, especially since the industry moved to many-core CPUs. The choice now involves getting a better understanding of your virtualization workload, and whether it'll benefit from slightly fewer physical cores per die with each core offering Hyper-Threading (as is the case with Intel Nehalem-based Xeon CPUs), or whether it'll benefit from the higher physical core count available with AMD Opteron CPUs. Provided that you're selecting the latest generation of either brand, you're looking at an amazing price/performance ratio, reduced power dissipation, and also a host of virtualization-assisting instructions and CPU features, which all help offload virtualization tasks into dedicated hardware, thereby bringing the overhead performance penalty easily down to the 2-3% mark, if not less. That is the key reason to avoid older generations of processors—older CPUs lack progressively more virtualization-specific instructions and capabilities, forcing more of the virtualization work to be done in software, with an increasing performance penalty.

The final important consideration is memory configuration. With the latest AMD server processors supporting DDR3 RAM just like their Intel equivalents, the decision is no longer about choosing memory latency versus throughput as it was for quite a few years. Now you must consider the number of memory channels per CPU, and the number of DIMM slots per channel, and most importantly whether there is a memory down-clocking if you overpopulate DIMM slots trying to get the most memory possible into a system. For example, some Xeon-based servers ship with four DIMM slots per memory channel instead of the typical three; if you populate only the first three, you get the full memory speed, but if you populate the fourth for added capacity, the next slower memory clocking is used.

Part three in this series will cover various networking-related issues that arise when you deploy and integrate a virtualization platform into an already-existing heterogeneous networking environment.

Virtualization in the trenches with VMware, Part 3: Networking in the enterprise
By Sam Webb | Last updated 14 days ago

Some of the biggest challenges faced in large enterprise environments are bandwidth usage, the allocation and use of IP addresses, and security. In this installment of our series on virtualization, we'll look at how virtualization intersects with each of these three issues in turn.

The bandwidth problem is simple: consolidation means increasing bandwidth requirements beyond what a single or even several GigE links can provide. On the IP address sprawl issue, the sheer size of most computing environments means that many addresses are needed, and the acquisition of other businesses creates either address overlap or address space waste. A small network can easily get away with two or three small address ranges, and that's being generous. Managing a network spanning the globe, with tens of thousands of addresses in use in various subnets, is no small feat. Finally, there are varying security concerns, from the very real concerns about segregating traffic in multi-tenant environments to handling local host-level authentication and password management. With that in mind, it can be a real challenge attempting to deploy and integrate a virtualization environment into an already well-developed network. In this part of a five-part series we continue to look at the challenges and some of the steps of deploying a virtualization solution in the enterprise.

Part I: Introducing virtualization to the enterprise

Part II: Storage, networking, and blades: virtualization hardware choices

Most likely the first challenge posed by virtualization is increased bandwidth usage, due to consolidation on a physical server with a limited set of uplinks. This issue is exacerbated further by the even higher density of blades. It is common to see a blade enclosure using an active, eight-port bond to provide upstream bandwidth to all of the VMs running on it, and either come close to or actually saturate the 8Gbps of bandwidth available. Likewise, it is common to see a four-port, 4Gb or 8Gb Fibre Channel (FC) uplink module fully utilized as well on heavier workloads. This rapidly creates a need to migrate to 10GigE, and potentially to Converged Network Adapters (CNAs), such that both Ethernet-based traffic and FC-based traffic can be carried through the same ports, in the form of Fibre Channel over Ethernet (FCoE). When your virtualization install starts to saturate your network, you'll have to invest in some new networking infrastructure. The first option is to buy dedicated 10GigE equipment so that all of the VM hosts or blade enclosures are able to get ample bandwidth amongst themselves. The second option, in the case of FCoE, involves acquiring either FCoE-aware equipment that also has dedicated FC ports to act as a bridge to your legacy FC network, or also investing in an FCoE-aware storage array system so that you can have an end-to-end converged network platform, running on new equipment at a nice, speedy minimum of 10Gbps. Aside from the obvious cost versus raw speed tradeoffs, there are several additional gains to be considered from moving to CNAs or 10GbE NICs:

• CNAs are, by design, virtualizable themselves, in the sense that a single physical CNA can be made to show up as, for example, 4x 2.5Gbps NICs; some can do even more partitions, depending on the underlying technology.

• Some CNAs support overprovisioning: for example, you can allocate 6Gbps to Ethernet traffic and 6Gbps to FC traffic, and when one is idle the other can consume additional bandwidth up to its specified limit.

• Most CNAs possess hardware offloading for additional protocols, such as iSCSI.

• CNAs can greatly reduce cabling needs, and they also let you use copper-based cabling instead of fiber optic cabling.

• Additional virtualization-friendly and assistive capabilities, much like the latest generations of CPUs, enabling greater offloading from the hypervisor down to the physical NIC.

While these benefits are easy to see on paper and in case studies, actually making the business case is trickier. There are tight budgets to consider, network infrastructure upgrade cycles, new equipment and technologies requiring staff training in order to provide configuration and support from the operations level up to the engineer level, and sometimes more tangible challenges like empty port availability, empty rack space for new equipment, and even free cable conduit room to run a plethora of new patch cables. Sometimes the networking department is looking forward to moving up to 10GigE and to acquiring the latest and greatest in networking. Other times the networking department is in such a state that anything not essential is still being assigned a 100Mbps port and it takes a day to find an empty patch panel to have the cable run implemented. These scenarios ring true regardless of whether it's a small or large business, or even whether it's an IT business, such as a managed hosting company.

Host and guest challenges

You might think that actual guest OS IP usage would be a challenge, but it's really just equivalent to deploying regular servers. No, the real challenge that arises from all of those new VMs is actually posed by the additional management network segments and address blocks needed by both VMware and by the actual hardware platform underneath.

Backtracking a little, a typical network setup for the server segment in an enterprise would be to have:

• A public network segment. It doesn't need to use world-routable IP addresses, just a segment that can access the outside world.

• Sometimes a private network segment for inter-server communication on a dedicated interface.

• A backup/management port and network segment that will carry both backup traffic, such as from NetBackup, and management traffic like SSH, SNMP, VNC, and so forth.

• Any other special segments, either for special needs or connections to external businesses and services.

So far, so good. However, the challenge comes from the separation between the host and guest platforms. You really don't want your guest VMs to be able to access any of the networks that VMware is using for its purposes and management.Looking at a typical VMware deployment, you will end up with needing a set of network segments much like the list below:

• One network segment dedicated to the vCenter Server (the central management brain) and the Service Console of each ESX/ESXi host. You'll also run any special services through this port, such as host-based backups, monitoring software, etc. This segment is ideally pretty large, at least a /24, because of server sprawl.

• For good redundancy, you'll want a second, separate, network segment dedicated to a backup Service Console on each ESX/ESXi host, as well as an additional vCenter Server interface. Again, identical to the above in scope.

• A client-facing management port on the vCenter Server(s) that your management clients connect to. You can typically slot this into an already in-use address range.

• If you've planned for scalability down the road, a separate network segment for talking to the dedicated database(s) for vCenter Server. Thankfully, this can be completely private, but then you need to be really mindful of your switching environment to avoid conflicts and collisions.

• Other dedicated networks for heartbeat and HA services for vCenter Server and SQL Server, providing that you've planned ahead for scalability and redundancy, so that you are not left with address allocation issues some months or years down the road.

If you decide to go with blade servers, then you'll end up in one of two camps. If you're in the first camp, then you end up needing just two or three IP addresses for the hardware management and administration module pair. An example of this would be if you have IBM blades. Or, you could end up needing:

• A set of IP addresses for the hardware management module pair.

• A set of IP addresses for the various network modules for management.

• An IP address for the iLO (remote console management) port of every single blade.

You'd see a scenario like this if you used HP blades and networking. Neither is right nor wrong—they're merely different design decisions, leading to vastly different networking requirements. Once you consider that you'll easily need half a dozen or more network segments, it starts to become clear that good IP address management, as well as subnetting, firewalling, and so forth, are critical in order to be able to deploy and integrate something as sizeable as a virtualization platform.

Staying secure

The final networking-related issue to be mindful of is a pretty complicated and interconnected one: security. In the single-OS-instance-per-physical-server world, network security is easier to maintain: simply assign the access-layer switch ports into their respective VLANs (think separate, segregated network segments carried on the same physical wires and through the same physical switch ports) and trunk those into the nearest firewall to do the rest. Anything trying to fool that process would have to get past the access switch, as well as the firewall, in order to access any system not in the same assigned network segment. Once you factor in virtualization, and the fact that you're looking at dozens to hundreds of VMs per physical server, you're guaranteed that there will be some VMs that are not supposed to talk to other VMs on that host. In a multitenant virtualization environment, such as that of a cloud provider or managed hosting provider, you're guaranteed to have VMs that aren't supposed to talk to one another. This means that security is now a top priority, and ensuring proper traffic segregation and firewalling is an absolute must. Thankfully, the same mechanism employed with stand-alone servers works with virtualization, with one change: VLAN tags are now applied by the hypervisor to all of the VM traffic at the network demarcation point of each VM, namely its NIC(s).

Virtualized switches and host-level firewalls

The demarcation point move described above was once a challenge for both the networking administrators and the virtualization administrators: the work of assigning ports into VLANs moved to the server side instead of staying on the network side. So you were left with either having the virtualization admins learn that aspect of networking, or having the networking admins learn enough about the virtualization platform to do the required work on an ongoing, daily basis. (The same thing used to happen with the management of storage systems as well, because the host servers were simply handed whole carved-out disk array slices and the virtualization administrators had to manage the storage pools themselves. Being in charge of the virtualization platform meant having a good understanding not just of the hypervisor and various heterogeneous guest OSes, but also learning and performing network and storage administration.)

However, in recent times the landscape has evolved back to its more natural, compartmentalized state, with the server admins looking after the servers, the network admins looking after the networks, the storage admins looking after the storage, and so forth. Greatly helping with this process have been two advances, both on the networking side. First, Cisco Systems introduced a full-featured virtualized switch (Cisco Nexus 1000V) that runs embedded inside the hypervisor, thereby allowing the network administrators to once again manage every network and security aspect themselves, right from the individual VM ports and upstream. Second, further developments enabled a full firewall to be deployed at the hypervisor level, in tandem with the Nexus 1000V. Similarly, several vendors introduced their own dedicated firewalling appliances, delivered as VMs that sit in front of the others and filter all of the traffic as needed.

Authentication

A final point to touch on the networking side is authentication at the host level.
While it is common practice to either create a dedicated Active Directory [AD] database and dedicated user accounts for vCenter Server centralized management and authentication, or to use the pre-existing corporate AD infrastructure and simply create a new group that has access to the virtual infrastructure, it is a far rarer practice for the actual VM host servers. This creates a very typical scenario where the actual ESX hosts are managed at the CLI level by either logging in directly as root, or logging in with your own user account and then su-ing to root. All of the switches and firewalls sitting in front of the VMs won't do much good if the host server itself is accessible through a (typically) flat management network segment, directly as root, with no central authentication employed. That makes it very easy to overlook securing that access point. Performing regular password updates and user account maintenance becomes a headache as the virtualization environment grows, and often gets forgotten about. It's shockingly common to find active user accounts of previous employees left on the Service Consoles of various ESX servers, even months or years after they are no longer with the company.
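One low-effort mitigation is to at least audit those local accounts on a schedule. A hedged PowerCLI sketch follows; it assumes you connect to each ESX host directly (local Service Console accounts aren't visible through vCenter), and the host names are placeholders:

# Hypothetical sketch: list local user accounts on each ESX host so stale ones can be spotted
$hostNames = "esx01.example.com", "esx02.example.com"   # placeholder host names
$cred = Get-Credential root

foreach ($name in $hostNames) {
    Connect-VIServer -Server $name -Credential $cred | Out-Null
    Get-VMHostAccount -User | Select-Object @{ n = "Host"; e = { $name } }, Id, Description
    Disconnect-VIServer -Server $name -Confirm:$false
}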

Virtualization in the trenches with VMware, Part 4: Performance tuning your virtualized enterprise setup
By Sam Webb | Last updated 5 days ago

In part one of this series, we looked at selecting an enterprise virtualization platform, and at some of the benefits gained. In part two we looked at some of the challenges involved in selecting hardware to run the platform on, and we also discussed storage, networking, and servers/blades. Part three took a closer look at networking issues, and in this current installment, we'll give some practical, nuts-and-bolts advice for how to tune your VMware enterprise setup.

Normally, this would be the part in the series where we'd go through a painstaking, step-by-step explanation of how to install the virtualization platform of choice, complete with screenshots and other aids. However, we'll skip most of that, for two reasons. First, this series is focused on VMware, and VMware provides many thousands of pages of documentation on how to do installs. Second, actual real-life use cases tend to be more relevant than simple tables and guidelines in a large PDF. The latter is especially true because in large IT environments you've got to deal with legacy issues, and with the problem of multiple people concurrently working on or supporting a platform. But once you have a virtualization environment in use, your next task is to address scalability, because platform scalability ends up being very important once virtualization catches on and VM sprawl begins. First, we're going to talk about the heart of a VMware vSphere-based virtualization infrastructure, which is vCenter Server.

Part I: Basics and benefits

Part II: Storage, networking, and blades

Part III: Networking in the enterprise

Our general advice goes like this: take a look at the resources that the vCenter server has, and give it 50 percent to 100 percent more, depending on what can be spared. If it is possible to assign 4 vCPUs and 8GB of RAM to the server, or to physically install those resources in an actual stand-alone server, then do so in order to ensure that future growth won't be artificially limited by a lack of vCenter resources. This is especially key if you end up trying to integrate additional products later, or if you decide to start automating things via scripting, such as through the VMware PowerCLI or the Perl API. Likewise, consider giving the SQL Server instance host additional resources, especially storage, so that for the first month or two of the deployment process you are able to set the logging to the highest level for all VMs and servers, to aid in troubleshooting. This is important, especially when you realize later on that your database is growing by up to 100GB a month due to the extra metrics now being logged. In the same vein, if there is a realistic chance of the environment growing to span thousands of VMs, then we recommend ensuring high availability for both the vCenter Server and the SQL Server instances, typically through clustering. Thanks to vMotion and Storage vMotion you are not so limited by the initial physical hardware resources available on either the server or storage side, yet limits to vCenter are tricky to fix, because, for example, you cannot hot-add resources to it like you can with another running VM. Suffice it to say, vCenter is the heart of the infrastructure at this point.

As a side note, it is important to be mindful of the additional licensing costs that come from enabling clustering and HA at the application level on both the vCenter and SQL Server sides. That is, beyond the additional SQL Server license and additional Windows Server licenses, only the Enterprise versions of Windows Server 2003 and 2008 support Microsoft Cluster Service [MSCS], which drives up the software cost of the equation. That being said, however, when you're looking at an environment containing thousands of guests, that cost becomes negligible compared to the availability and resiliency benefits gained by employing clustering for high availability.

Practical tuning

From now on, we're going to assume that you have a functional install deployed and in use, perhaps even in production. This is where both careful performance monitoring, and subsequent tuning, come in. The additional vCenter logging details mentioned above can really help here, because they will provide useful trending data over any period of time. If the vCenter performance counters do not provide sufficient detail, then the console-based esxtop tool is available to capture thousands of metrics every few seconds for as long as necessary. Reading the man page for esxtop is a must, as the tool provides far more detail than the regular Unix top program. If the performance problems encountered are either sporadic or are happening overnight, when live monitoring is not really feasible, it is possible to run esxtop in a batched mode. When run in batch mode, esxtop collects metrics to be analyzed later by either the esxplot utility available from VMware, or through the Windows Performance Monitor, perfmon.
In order to collect some metrics overnight, you would run something like the following on the Service Console:

# nohup esxtop -b -a -d # -n ### | gzip -9c > esxtop.csv.gz

Here, -d is the frequency: lowest is 2 seconds, with 5 seconds being a good value to ensure that you are not spiking the Service Console's CPU too much. Next, -n is the optional parameter to specify how many times to collect metrics. If you omit it, esxtop will collect metrics until its process is forcibly killed. For example, if you wanted to collect exactly 24 hours' worth of metrics at a 5-second interval, specifying "-d 5 -n 17280" would get you exactly that. Piping the output into gzip with maximum compression is highly recommended, as the uncompressed CSV file will easily be one to two GB in 24 hours. Between the vCenter performance metrics and esxtop, you are set to get a very detailed, graphable view of both your VMs and the host servers.

Now we will look at the four major categories where performance troubleshooting and tuning come into play, namely CPU, memory, network, and storage, with storage being the most common culprit, often due to misconfiguration.

The first, and probably the easiest to troubleshoot, are CPU-related issues. When you look at CPU usage metrics, it is usually quite obvious when there are temporary spikes or periods of sustained high load. Setting a lower limit, or reservation, for a CPU can be useful, but is rarely necessary, and can cause issues later on as insufficient resources are available to power on additional VMs. Setting an upper limit, on the other hand, can be far more useful, if you know your application(s) well, as it enables you to present a slower CPU to the VM, such as a 1.5GHz CPU instead of a 3GHz CPU. With the advent of vSphere, any number of vCPUs between one and eight are now allowed, including odd numbers, like three or seven. Unless your workload's performance scales linearly with the addition of CPU cores, it's a bad idea to overprovision on the CPU side because there is an additional memory overhead for each vCPU that is assigned to a VM. With hot-adding CPUs now supported by a number of guest OSes, it is best to start at 1 vCPU per VM and simply monitor performance to figure out where to head from there.

The second, and much trickier, issue to troubleshoot is memory. A memory-starved VM is easy to spot, because the guest will begin swapping heavily and its performance will plummet. A VM with too much memory is likewise easy to spot, because the vSphere Client will show you a lot of unused memory in the Performance tab. The tricky part comes in when there is a memory limit assigned to a VM, and the ballooning driver has just kicked in, forcing the guest OS to swap. The following PowerCLI script will generate a report on the ballooned and swapped memory of each VM that that particular vCenter Server has access to. Because of the Where {} block, it will only display VMs that have either the ballooning driver engaged, or have been swapped by ESX:

Get-VM | Sort Name | % { $vm = $_ | Get-View; $vm | Where { $vm.Summary.QuickStats.BalloonedMemory -gt "0" `
    -OR $vm.Summary.QuickStats.SwappedMemory -gt "0" } | select `
    @{ Name = "VM Name"; Expression = { $vm.Name }}, `
    @{ Name = "Ballooned Memory"; `
        Expression = { $vm.Summary.QuickStats.BalloonedMemory }}, `
    @{ Name = "Swapped Memory"; `
        Expression = { $vm.Summary.QuickStats.SwappedMemory }}} | Format-Table -AutoSize
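If the culprit turns out to be an overly aggressive memory limit, the resource settings themselves can also be reviewed (and, where appropriate, adjusted) from PowerCLI. The following is a hedged sketch; the VM name is a placeholder, and reserving all of a VM's memory is shown only as an illustration, not a blanket recommendation:

# Hypothetical sketch: review per-VM memory limits and reservations across the environment
Get-VM | Get-VMResourceConfiguration |
    Select-Object VM, MemReservationMB, MemLimitMB |
    Sort-Object MemLimitMB | Format-Table -AutoSize

# For a VM that must never be ballooned or swapped, a full memory reservation is one option
$vm = Get-VM -Name "app01"
Get-VMResourceConfiguration -VM $vm | Set-VMResourceConfiguration -MemReservationMB $vm.MemoryMB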

Moving on to the networking aspect, there are two main areas where things can go awry: from the ESX server to the upstream switches, and internally on the ESX server to and between VMs. As far as things going wrong on the route to the upstream switches is concerned, checking the following list of things will give you a good start:

• Uplink bandwidth saturation

• Uplink bonding problems (you are using bonding on your upstream links, right?)

• Uplink VLAN configuration issues

• Any regular Layer 3 issues, especially if there are firewalls involved

On the VM side of things, the same list above applies, with the addition of checking that the right port groups are assigned, and that they have active uplinks. Further, the standard fare applies, like checking that ARPs are visible, attempting to do packet dumps to see who is trying to talk to whom, collecting metrics both on the guest and from the Service Console, and so forth. It is also quite important to upgrade the guests' NIC(s) to the VMXNET3 driver, to enable speeds up to 10GbE, as well as many virtualization-centric performance upgrades.

Finally, the storage end of things tends to be where the most problems live, but these issues are often ignored and the blame is simply assigned to vSphere as being the slow culprit. The important thing to keep in mind here is that there is no single answer that holds true for every environment, or, really, even more than one environment when you get right down to it. Every environment is unique, running a unique collection of VMs, with a unique overall collection and combination of software running on those VMs. The first thing to look for when troubleshooting storage problems is link or path saturation in the connection between the hosts and the storage infrastructure. This is straightforward to observe, and VMware will happily provide metrics and statistics for each HBA card or network link in use, in the case of iSCSI or NFS. The next most visible culprit is IO latency to the storage system. This can be easily checked through the use of esxtop, by pressing either "d" to switch to disk/HBA view or "u" to switch to LUN view, then pressing "s" and "2" in order to change the output frequency to refresh every 2 seconds. The field you are looking for is DAVG/cmd, which is the average response time in ms per command sent to the device in question. Generally speaking, a latency of 15-30 ms is pretty high and starts causing performance problems, and anything higher than 30 ms is causing serious performance issues for all VMs stored there. A lesser-known cause of latency between the guest OS and the storage platform is actually filesystem fragmentation, especially on NTFS-based filesystems, because higher fragmentation means more IO is needed for every single read and write. Not only is the VM hosting the guest with a highly fragmented drive suffering from a performance penalty, but so is any other VM whose disks reside on the same volume, because IO tends to be a limited resource.

As a very important side note, Solid State Drives [SSDs] are truly game-changing for any market that they are available in, especially enterprise storage. As mentioned before, this is due to the incredibly high IOPS values (about 6,000 vs. 175), and also due to the greatly improved access time, which has gone from being about 100,000 times slower than main memory to just about 100 times slower. With some careful planning, it is possible to place all of the VM swap files on SSD storage, thereby allowing huge memory consolidation ratios because the main penalties of access speed and IOPS are now acceptable.

In the final part of this series, we will take a look at and discuss some of the ramifications and caveats of trying to do a large-scale physical-to-virtual conversion.

Virtualization in the trenches with VMware, Part 5: Physical-to-virtual conversion in the enterprise

By Sam Webb | Last updated a day ago

In part one of this series, we looked at selecting an enterprise virtualization platform and at some of the benefits gained. In part two, we looked at some of the challenges involved in selecting hardware to run the platform on, and we also discussed storage, networking, and servers/blades. Part three took a closer look at networking issues, and in part four we gave some practical, nuts-and-bolts advice for how to tune your VMware enterprise setup. In this final installment, we look at the issue of physical-to-virtual conversion and give tips on best practices.

The biggest challenge facing a physical-to-virtual (P2V) migration in an enterprise setting is not actually technical, although there is a technical challenge as well. Rather, the real challenge is the timing, paperwork, and ample red tape that you'll have to face on a system-by-system basis, as well as a general cultural clash with the status quo.

In any sufficiently large environment, there are multiple tiers of service, ranging from mission-critical to development to lab systems, each with different uptime expectations and different levels of expendability. So a good idea is to begin at the bottom, with the least important systems, and work your way up. The benefit of this approach is that any early failure in the process won't matter too much, and that you'll have more experience with the P2V process by the time you migrate the mission-critical systems. But before we get to talking about various P2V migration strategies, we have to look at a very large reality in most enterprise environments: legacy systems and red tape.

Legacy and bureaucracy

Unless there is a very pressing regulatory or business need, there will always be plenty of legacy systems running not just on out-of-date and unsupported hardware, but also on old and unsupported operating systems. It's shockingly common to find systems as old as Windows 2000 Server or Red Hat Linux 7.3 still in use as late as 2010; the same goes for old and unsupported middleware, databases, and most other applications. About the only things you can expect to see regularly and consistently patched are the front-end, web-facing systems, as well as the software running on them, because a firewall is simply not enough to protect them when there are remotely exploitable software bugs.

With that in mind, taking a long, hard look at rebuilding those systems from scratch in the new virtualized environment starts to sound like a great idea, even if it involves extra time and resources. The gains from having a new, up-to-date OS, especially for the most neglected (infrastructure) systems, are well worth the time invested in the upgrade. Thankfully, virtualization aids this process greatly through system image templates and cloning. Once you've created your ideal, patched OS install, you can simply duplicate it as many times as necessary, and just update the hostname, networking settings, etc. for every new system deployed from it. Choosing this road instead of simply importing the old systems into virtual containers will provide many benefits along the way, including stability, security, potential software vendor support, and so on.
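
For example, re-personalizing a Linux clone deployed from a template usually comes down to a handful of files. This is a hypothetical sketch for a RHEL-style guest; the hostname, IP details, and file paths are placeholders, and other distributions keep these settings elsewhere.

    # Give the clone its own identity (values are examples only).
    hostname web01.example.com
    sed -i 's/^HOSTNAME=.*/HOSTNAME=web01.example.com/' /etc/sysconfig/network

    # Update the IP configuration and drop the template's hard-coded MAC
    # address, which no longer matches the new virtual NIC.
    vi /etc/sysconfig/network-scripts/ifcfg-eth0   # set IPADDR, remove HWADDR
    vi /etc/hosts                                  # fix any self-references
    service network restart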

As for red tape, it manifests in many different ways, ranging from more sensible ITIL-based Change Management implementations that ensure sanity checks are performed before any work is conducted on a production system, to less reasonable requirements, such as having to spend 30 minutes filling out excruciatingly detailed forms in order to simply reboot a server (much less actually perform a change on it), and then needing the approval of multiple department heads before finally proceeding two weeks down the road.

In situations such as the latter, trying to arrange downtime for dozens or hundreds (if not more) of systems can be an insurmountable job unless there is a task force dedicated solely to handling the process. Thankfully, that is where Project Managers and the like come in, and any P2V attempt that touches multiple systems should have a Project Manager ensuring that all of the paperwork and resource scheduling is properly taken care of.

Start at the bottom

As mentioned before, it is best to start the P2V process with test systems, in order to iron out any issues with both environments and to build familiarity with the software used. To P2V a system, you have three basic choices. First, if it is a Windows-based system, you have the option to do a "hot clone" of the system, which means that a software driver is installed into the running OS and, through the use of the Windows VSS driver, snapshots are taken of all of the data on the system's disks. Once the snapshots are taken, you can simply power on the virtualized instance for a running, one-to-one copy. This creates the least amount of downtime, and you end up with an identical duplicate, except that the proper drivers for the virtualized environment have been inserted into the guest OS.

The second option, more commonly available, is to do a "cold clone" of the system, which means that the system is powered down, then booted with a special CD that contains software provided either by the virtualization platform or by a third party specializing in P2V cloning; this software creates and imports a system image for you. Using specialized software like this also opens up greater system compatibility, because it can be used for various Linux distributions, including Red Hat Enterprise Linux, SuSE Enterprise Linux, Ubuntu, and so forth. Finally, the last option, more commonly employed than you'd think, is using either straight disk imaging software or even the venerable Unix "dd" command to create a direct one-to-one copy of the disk sectors and restore that onto the virtual disk(s).
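
If you end up going the "dd" route, the mechanics are simple, if slow. Here is a rough sketch, assuming the source server is booted from a rescue CD and can reach a staging host over SSH; device names, hostnames, and paths are placeholders.

    # On the source system: image the whole disk across the network.
    dd if=/dev/sda bs=1M | gzip -c | ssh backup@staging 'cat > /images/server01-sda.img.gz'

    # Later, from a rescue CD booted inside the receiving VM, restore the
    # image onto an equal-sized or larger virtual disk.
    ssh backup@staging 'cat /images/server01-sda.img.gz' | gunzip -c | dd of=/dev/sda bs=1M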

The first two options are obviously the easiest, because they handle the job of inserting new drivers into the existing OS images before booting them up for the first time as VMs. Provided that this goes well, the total effort for a successful P2V is greatly reduced, and you're off to a great start. In practice, especially in a heavily non-Windows environment, the third option becomes increasingly necessary, or is the only game in town. This leads to a process outline something like the following, for most Unix/Linux-based OSes:

1. Boot up the target system with a rescue CD.

2. Create a system image and store it either on a network share or a portable drive.

3. If necessary, transfer the system image off of a portable drive onto a networked server.

4. Boot up the "receiving" VM with a rescue CD.

Page 23: Virtualization in the Trenches With VMware

5. Restore the system image into a running VM.

6. "chroot" into the restored image.

7. Re-install the boot loader because the disk size will now be slightly off.

8. Update key configuration files (networking can depend on MAC addresses).

9. If necessary, acquire new drivers, rebuild initrd disk, and so forth.

10. Turn off unnecessary system services, such as printing, Bluetooth, etc.

11. Prevent key applications, such as databases, from running on startup.

12. Reboot the VM without the rescue CD image connected and cross your fingers.

13. Troubleshoot as necessary until the base OS is able to boot and get to a login prompt.

14. Install the virtualization platform's guest additions, for better performance.

15. Re-enable the previously disabled key applications.

16. Reboot one final time into what will be the new virtualized instance.

17. Once everything is verified to be working fine, create a backup or clone of the VM.
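
Here is a minimal sketch of steps 6 through 9 for a RHEL 5-era guest, run from a rescue CD booted inside the receiving VM once the image has been restored onto the virtual disk. Device names, mount points, and the kernel version are placeholders; adjust them to match the restored system.

    mount /dev/sda1 /mnt/sysimage            # the restored root filesystem

    # Step 6: make the restored image usable as a working root.
    mount --bind /dev  /mnt/sysimage/dev
    mount --bind /proc /mnt/sysimage/proc
    mount --bind /sys  /mnt/sysimage/sys
    chroot /mnt/sysimage /bin/bash

    grub-install /dev/sda                    # step 7: re-install the boot loader
    vi /etc/fstab                            # step 8: fix device names or labels
    vi /etc/sysconfig/network-scripts/ifcfg-eth0   # step 8: remove the stale HWADDR line

    # Step 9: rebuild the initrd so it picks up the virtual SCSI drivers;
    # the kernel version shown is only an example.
    mkinitrd -f /boot/initrd-2.6.18-194.el5.img 2.6.18-194.el5

    exit                                     # leave the chroot, then reboot without the CD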

While not ideal, this process can import nearly any OS that is in some way supported by the virtualization platform, and it is especially useful for Linux-based OSes and any other non-Windows systems.

Resource allocation

It is important to touch on the topic of what hardware resources are allocated during this process, because new hardware (2009 and onward) tends to be quite overpowered for a lot of simple server uses, which means that you should not do a one-to-one resource mapping for the target VM container. For example, dual-core is typically the smallest configuration available for a server now, which can already be overkill depending on the target use.
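
Before settling on numbers, it is worth sampling what the physical box actually uses, rather than copying its hardware spec into the VM. Below is a quick sketch using standard Linux/sysstat tools; the intervals are arbitrary and the thresholds are judgment calls, not hard rules.

    sar -u 60 60      # CPU utilization, one sample per minute for an hour
    free -m           # how much RAM is actually in use versus cache
    vmstat 5 12       # run queue depth and swapping behavior over a minute

If the box idles along at a few percent CPU with well under a gigabyte of active memory, a single vCPU and a modest RAM allocation is usually the right starting point.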

With that in mind, there is always a resource penalty when assigning multiple vCPUs to a virtual machine, because of the scheduling overhead incurred. Therefore, it is a good idea to revisit the number of vCPUs and the amount of RAM needed by the OS and application(s) that will be running in the new VM, to ensure that resources aren't wasted. This process will also help drive up the consolidation ratio per host, with some virtualization software being able to handle as many as 25 running VMs per CPU, and up to a couple hundred running VMs per single physical host. If, however, you provision your new VMs with, say, four vCPUs and copious amounts of RAM, you will be left with not just a much, much smaller consolidation ratio, but also inefficient resource utilization leading to performance problems across the board. Note that there is one very specific issue with paring down the hardware for an existing Windows-based install, which is outlined below.

Some P2V caveats

Speaking of caveats about embarking on a P2V project, there are a number of them. While not an exhaustive list, this is a good start to keep in mind:

• It is extremely difficult to convert a multi-CPU (SMP) Windows-based VM to a single-processor (UP) configuration, because of the way the Windows HAL driver works.

• The boot loader will typically need to be re-installed on Unix/Linux-based systems.

• Some default OS networking configurations, such as in Red Hat Enterprise Linux, are dependent on each NIC's MAC address and will fail to work if the MAC changes.

• Depending on the difficulty of the P2V conversion, it can take eight hours or more of downtime per system to complete a transition, which can be simply unacceptable.

• The clock requirements of software running inside the VM may be too strict to properly handle the jitter and time skew that happen because of host load spikes.

• There is still software that is not supported by its makers if it is running in a virtualized environment, which leads to support contracts being broken by a P2V move.

• Some configurations, especially those involving clustering and the use of a Quorum disk, can be very difficult to P2V properly, due to the additional shared hardware that needs to be available between the different VMs.

• Some software licensing is tied down to a unique hardware ID, be it a MAC address, or some other kind of burned-in information, which is typically not portable into the VM.

• Some software licensing requires the use of a USB dongle, which is not available inside of a VM.

• Some software can take advantage of, or even require, many more CPUs than current virtualization affords.

• Software requiring 3D hardware acceleration won't work on server-grade hypervisors, though desktop virtualization solutions will support both DirectX and OpenGL 3D acceleration.

• Any type of less common hardware add-in card or port, such as FireWire, E1/T1, ATM, etc., will not work, because neither the host nor the guest OS will support them.

The benefits of performing a large-scale P2V conversion are pretty clear: server consolidation leading to reduced space and power usage and decreased heat output; easier centralized management; new redundancy and high-availability options (including the ability to restore or clone entire VMs at a time); and the opportunity to rebuild the OS of legacy systems, to name just a few. It is very possible to consolidate even a dozen racks' worth of servers into a single rack, or just two blade chassis. With gains like that, it is no wonder that virtualization is quickly becoming entrenched in the enterprise.