PerfMetricResults 08-16-2015 15h08m29s

Transcript of PerfMetricResults 08-16-2015 15h08m29s

Page 1: PerfMetricResults 08-16-2015 15h08m29s

Performance Report Summary

This report provides a summary of the collected performance data.

Oldest sample time: Aug 16 2015 2:12PM
Final sample time: Aug 16 2015 3:07PM
Number of machines specified: 1
Number of machines that had a 100% success rate for data sampling: 1
Number of machines that had a partial success rate for data sampling: 0
Machines that could not be reached: 0

Page 3: PerfMetricResults 08-16-2015 15h08m29s

Performance Data Normalization and the Time Series Placement Algorithm

Introduction

In MAP, performance data is used and aggregated in the Server Consolidation and the Private Cloud Fast Track scenarios. The following sections give more details on how this works.

95th Percentile of a Performance Metric

When you collect performance data with MAP, a variety of performance metrics are sampled every 5 minutes for the included machines. Consider the metric %CPU utilization for a hypothetical machine Guest1. The sequence of %CPU utilization samples taken from Guest1 over time might look like the following where each pair is the elapsed time expressed as Hours:Minutes:Seconds since data collection began followed by the %CPU utilization:

(00:00:00, 25.5), (00:05:00, 36.2), (00:10:00, 24.4), (00:15:00, 41.33), (00:20:00, 57.41), ..., (47:55:00, 29.6), (48:00:00, 33.7)

When you have a sequence of %CPU utilization samples over time like this, one natural question to ask is "What was the %CPU utilization of Guest1 over the entire time span?" This raises the question of how to aggregate this sequence of numbers into a single number representing the %CPU utilization for the entire time span; this is where aggregates like average, max and 95th percentile come into the picture.

Prior to MAP 6.0, the average or max aggregates were used when reporting performance metrics or when using performance metrics to place guests in the Server Consolidation scenario. However, for capacity planning exercises like the Server Consolidation scenario, a better aggregation method is a percentile aggregation, with the 95th percentile being the typical choice. The 95th percentile of a sequence of %CPU utilization samples like the above is defined as the minimum sample S for which 95% of the samples in the sequence are less than or equal to S. Typically this means that 5% of the samples are greater than S.

Why is this a good aggregation choice for capacity planning? If you plan enough capacity for the 95th percentile of a sequence of %CPU utilization samples over time, then 95% of the time you will have enough CPU capacity to service the observed load. Correspondingly, 5% of the time your systems may be overutilized, but this is a reasonable tradeoff between hardware costs and the fraction of time when responsiveness is degraded. A similar observation can be made for other resources whose utilization varies over time, such as disk IO, memory, and network.
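To make this definition concrete, here is a minimal sketch in Python (illustrative only, not MAP's implementation; the name percentile_95 and the sample values are made up for this example):

    import math

    def percentile_95(samples):
        # The minimum sample S for which 95% of the samples in the
        # sequence are less than or equal to S.
        ordered = sorted(samples)
        # The smallest rank k with k/n >= 0.95 is ceil(0.95 * n).
        k = math.ceil(0.95 * len(ordered))
        return ordered[k - 1]

    cpu_samples = [25.5, 36.2, 24.4, 41.33, 57.41, 29.6, 33.7]
    print(percentile_95(cpu_samples))  # 57.41; with this few samples, the value is just the max

Note that with only a handful of samples the computed value degenerates to (or near) the maximum; the Data Quality Considerations section below returns to this point.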

Performance Data Normalization

For reasons that will become clear in the later section on the Time Series Placement Algorithm, we have to normalize the performance data collected from different machines so that sequences of performance metrics collected from different machines can be added together. Suppose we have two machines, Guest1 and Guest2, with the following sequences of network utilization values in Mbps (megabits/sec):

Guest1: (00:00:00, 3.5), (00:05:00, 11.1), (00:10:00, 5.4), (00:15:00, ?.??), (00:20:00, 3.71), ..., (47:55:00, 1.19), (48:00:00, 15.0)
Guest2: (00:00:15, ?.?), (00:05:35, ??.?), (00:10:47, 2.7), (00:16:03, 7.12), (00:21:13, 1.04), ..., (47:58:22, 1.19), (48:03:35, ??.?)

The two sequences are lined up such that the samples that are closest in time from Guest1 and Guest2 are stacked one right on top of the other. Notice, however, that the samples are not taken at exactly the same time. Due to all sorts of variables in the environment and the resource limits of the machine running MAP, performance metrics cannot be sampled from all machines simultaneously on a rigid 5 minute schedule. Moreover, notice that some of the samples are marked with question marks like "?.??" to indicate that these samples were unavailable because the target machine was offline, or MAP was not collecting performance data for that machine at the time. Clearly, the raw performance data has some rough edges. So what if we want to add these two sequences of network utilization metrics together to get the combined utilization of Guest1 and Guest2? This is where performance data normalization comes in. Without going into exhaustive detail on how the normalization works, here is the basic idea:

1. The total time span over which normalized data exists is taken to be the minimum time for which raw performance data exists for some machine up to the maximum time for which raw performance data exists for some machine (call this time span Tmin to Tmax).

2. Tmin to Tmax is chopped up into 10 minute intervals starting at Tmin to give the sequence of times Tmin, Tmin+10, Tmin+20, ..., Tmin+NNN. The NNN in the last value in the sequence is the minimum multiple of 10 such that Tmin+NNN is greater than or equal to Tmax.

3. For each of these 10 minute intervals, if 1 or more samples of a performance metric exist for a machine within the bounds of that 10 minute interval, then those samples are aggregated into a single normalized sample for the 10 minute interval. If no samples exist for a machine in the 10 minute interval, then an aggregate of samples nearby in time is used to generate a normalized sample to fill in the hole in the data.

So after normalization, the sequences of network utilization data for Guest1 and Guest2 might look like this:

Guest1: (Tmin, 9.05), (Tmin+10, 5.4), (Tmin+20, 4.1), (Tmin+30, 13.3), (Tmin+40, 7.50), ..., (Tmin+NNN-10, 11.7), (Tmin+NNN, 8.69)
Guest2: (Tmin, 2.70), (Tmin+10, 6.3), (Tmin+20, 1.8), (Tmin+30, 11.2), (Tmin+40, 10.1), ..., (Tmin+NNN-10, 2.31), (Tmin+NNN, 1.19)

Since Guest1 and Guest2 now have normalized samples for the same set of times and there is a sample for each time, it becomes obvious how to add the two sequences of resource utilization values together; namely, you just add together each pair of numbers at the same normalized times.
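As a rough illustration of steps 1 through 3 above, here is a sketch of the bucketing idea in Python. MAP does not spell out the exact aggregates it uses, so averaging the samples within a bucket and filling holes from the nearest non-empty bucket are assumptions made for this sketch:

    from collections import defaultdict

    def normalize(samples, t_min, t_max, interval=600):
        # samples: list of (elapsed_seconds, value) pairs; buckets are
        # interval-second slots starting at t_min (600 s = 10 minutes).
        buckets = defaultdict(list)
        for t, v in samples:
            buckets[(t - t_min) // interval].append(v)

        n_buckets = (t_max - t_min) // interval + 1  # covers Tmin .. Tmin+NNN
        averaged = [sum(buckets[i]) / len(buckets[i]) if buckets[i] else None
                    for i in range(n_buckets)]

        # Fill each hole from the nearest bucket that has data (assumption).
        filled = []
        for i, v in enumerate(averaged):
            if v is None:
                v = min((abs(i - j), w) for j, w in enumerate(averaged)
                        if w is not None)[1]
            filled.append(v)
        return filled

    guest1 = normalize([(0, 3.5), (300, 11.1), (1500, 3.71)], t_min=0, t_max=1500)
    guest2 = normalize([(15, 2.7), (335, 6.3), (1513, 1.04)], t_min=0, t_max=1500)
    combined = [a + b for a, b in zip(guest1, guest2)]  # addition is now elementwise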

Page 4: PerfMetricResults 08-16-2015 15h08m29s

Data Quality Considerations

Given the explanations in the previous two sections on how the 95th percentile aggregate and performance data normalization work, a natural question to ask is how you know that the 95th percentile aggregates of the normalized performance data accurately represent the real world behavior of your systems; this is where data quality considerations come into play.

If you think about the definition of the 95th percentile aggregate (the minimum sample S for which 95% of the samples in the sequence are less than or equal to S), it becomes clear that this aggregate is not very useful for small data sets. For example, what does it mean for 95% of 8 samples to be less than or equal to one of these 8 samples? MAP can still compute the 95th percentile aggregate in this case because it uses a deterministic algorithm to compute the value (the computed value will be, or be close to, the maximum value), but the statistically interesting properties of the 95th percentile only show up for much larger data sets. This means you should plan to collect performance data for at least 2 days before you can expect to get good values from the 95th percentile aggregate. That said, you should not hesitate to collect performance data for shorter periods of time when doing test runs or familiarizing yourself with the MAP tool.
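As a quick worked example of the 8-sample case, using the percentile_95 sketch from earlier (the values here are made up):

    eight_samples = [12.0, 15.5, 20.1, 25.0, 33.3, 41.0, 48.8, 52.9]
    # ceil(0.95 * 8) = 8, so the computed "95th percentile" is simply
    # the largest of the 8 samples.
    print(percentile_95(eight_samples))  # 52.9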

Another data quality issue to consider is what happens when MAP normalizes the performance data and fills in values for times at which machines are missing values by using aggregates of other samples nearby in time. Filling in these "holes" is necessary so that we can add the sequences of performance metrics from different machines together as described in the previous section, but if there is a large percentage of missing values (say more than 5%), this may significantly distort the statistical properties of the normalized performance data compared to the real world behavior. What does this mean in terms of how you use MAP? Here are some rules of thumb:

1. When you collect performance data in MAP, collect performance data for the same period of time for all machines.

2. After collecting the performance data, generate the Performance Metrics report and look at the CollectionStatistics worksheet. For good results, you should only use machines in the consolidation scenarios (Server Consolidation and Microsoft Private Cloud Fast Track) that have a high success percentage when collecting performance data.

One consequence of the first rule of thumb is that the results will not be as accurate if you use the functionality in the Performance Metrics wizard that allows you to append the collection of new performance data to existing data. The primary purpose behind this feature is to let you continue collecting performance data from a set of machines if it was interrupted unexpectedly for some reason (for example, the MAP machine collecting the performance data rebooted after applying an update). If, for example, you use this feature to collect performance data from one group of machines on Monday and Tuesday and then another group of machines on Thursday and Friday, then the normalized time period over which data was collected is Monday through Friday. This means that 3 of the 5 days will be missing data for all of the machines and the "holes" in the data that the normalization process fills in will be at least 60% of the normalized data. This obviously is not a desirable state of affairs, so using the append functionality of the Performance Metrics wizard in this fashion is not a recommended practice.

Page 5: PerfMetricResults 08-16-2015 15h08m29s

The Time Series Placement Algorithm

In versions of MAP prior to 6.0, a single aggregate (max or average depending on the resource) was computed for the entire time span over which performance data was collected for a machine for each resource type (CPU, memory, etc.). These numbers were used to determine if there was room for the machine on a Hyper-V host in the Server Consolidation scenario. This approach is not ideal because it does not deal well with machines whose resource utilization is uneven over time. For example, if the machine Guest1 is used by people in North America and Guest2 is used by people in Asia, then these machines are likely to have inverted resource usage profiles over time and would fit well together on the same Hyper-V host. Analogously, if two machines served the same geographic region, then their resource usage profiles over time would likely be similar and they may not be a good fit together on the same Hyper-V host. However, using an average metric over the entire time span for which performance data was collected misses these subtleties. In addition to serving different geographic regions, there are myriad other reasons that machines could have usage profiles over time that fit well together or not. These observations led to the method introduced in MAP 6.0 to determine if there is room for a machine on a Hyper-V host while taking time into account.

The previous section described how MAP normalizes the raw performance data, and how this enables adding together the sequences of normalized resource utilization metrics for two or more machines over time. This ability to add sequences of normalized resource utilization metrics is at the heart of the Time Series Placement Algorithm introduced in MAP 6.0. While running the algorithm, suppose that MAP has determined that machines Guest1, Guest2, ..., Guest5 will fit on Hyper-V host Host1. How do we know if Guest6 will fit on Host1 as well? The algorithm adds together the sequences of normalized resource utilization metrics for Guest1, ..., Guest6 for each resource we care about (CPU, memory, etc.) and determines if Host1 has enough capacity in each resource dimension. What is enough capacity? Host1 has enough capacity if the 95th percentile aggregate of the sum of the normalized sequences for Guest1, ..., Guest6 is less than or equal to the total capacity of Host1 for that resource dimension.

There are numerous other subtleties to the Time Series Placement Algorithm, not the least of which is determining which potential host Host1, ..., HostN provides the "best" fit for the next candidate machine GuestM. That said, the ability to sum up sequences of normalized resource utilization metrics and take the 95th percentile of the summed sequence is the fundamental insight behind how MAP takes time into account when making consolidation suggestions in the Server Consolidation and Microsoft Private Cloud Fast Track scenarios.
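Here is a sketch of that capacity test, building on the percentile_95 sketch from earlier (the function names and the dictionary-of-sequences representation are illustrative assumptions, not MAP's actual implementation):

    def p95_of_sum(sequences):
        # Elementwise sum of already-normalized sequences, followed by
        # the 95th percentile of the summed sequence.
        combined = [sum(values) for values in zip(*sequences)]
        return percentile_95(combined)

    def fits_on_host(candidate, placed, host_capacity):
        # candidate and each guest in placed: dict mapping resource name
        # to its normalized utilization sequence; host_capacity: dict
        # mapping resource name to total capacity in the same units.
        for resource, capacity in host_capacity.items():
            sequences = [guest[resource] for guest in placed] + [candidate[resource]]
            if p95_of_sum(sequences) > capacity:
                return False
        return True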

Infrastructures

MAP 6.0 introduces the notion of an "infrastructure" used in the Microsoft Private Cloud Fast Track scenario, and optionally in the Server Consolidation scenario. Basically, an infrastructure is the resource enclosure in which a group of Hyper-V hosts is provisioned; i.e., a server rack with associated disk (SAN) and network resources along with the Hyper-V hosts. This allows you to run consolidation scenarios that are more reflective of how an organization may be buying server resources; namely, in units of pre-provisioned server racks with everything necessary to run a "private cloud" rather than buying the individual components themselves with further assembly required.

So how is the infrastructure level taken into account during the Time Series Placement Algorithm when determining if a candidate machine will fit on a Hyper-V host residing in a particular infrastructure? Basically, MAP does the same thing at the infrastructure level that it does at the host level: it sums up the normalized resource utilization sequences for all the guests in the infrastructure across all hosts and determines whether the 95th percentile aggregate of this summed sequence exceeds the SAN or network capacity of the infrastructure.
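In sketch form, the infrastructure-level test reuses the same summed-sequence check over every guest placed on any host in the infrastructure (again an illustrative assumption, with SAN and network as the resource dimensions):

    def fits_in_infrastructure(guests_in_infra, infra_capacity):
        # infra_capacity might look like {"san_iops": ..., "network_mbps": ...};
        # both resource names here are hypothetical.
        for resource, capacity in infra_capacity.items():
            if p95_of_sum([g[resource] for g in guests_in_infra]) > capacity:
                return False
        return True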

Page 6: PerfMetricResults 08-16-2015 15h08m29s

What Do the Numbers in the Reports Represent?

The previous sections provide the basic information concerning how the raw performance data is normalized and aggregated and how this data is used in the placement algorithm underlying the Server Consolidation and the Microsoft Private Cloud Fast Track scenarios. This, however, does not explain which numbers show up where in the various reports; this section provides these remaining details.

PerfMetricsResults-<date>.xlsx

PlacementMetricsSummary - The numbers in this worksheet are derived from the normalized performance data.

ProcessorUtilization, MemoryUtilization, NetworkUtilization, PhysicalDiskUtilization, LogicalDiskUtilization - The numbers in these worksheets are the averages of the raw performance data, not the normalized performance data.

ServerVirtRecommendation-<date>.xlsx

ConsolidationRecommendations - The numbers in this worksheet are derived from the normalized performance data, with the details for placed guests, hosts and infrastructures as follows:

Placed Guests - All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization, which uses the maximum value of the normalized performance data. The CPU utilization values for the placed guests have also been rescaled to the CPU configuration of the host machine.

Hosts - All metrics are the 95th percentile of the summed sequences of the normalized performance data of the guests placed on that host, except for Disk Space Utilization, which is just a sum of the maximum values for the guests. The utilization values for memory and CPU for the hosts also include reserves of 1 GB of memory and 5% CPU for the host itself.

Infrastructures - All metrics are the 95th percentile of the summed sequences of the normalized performance data of the guests placed on all the hosts in the infrastructure, except for Disk Space Utilization, which is just a sum of the maximum values for the guests.

IMPORTANT NOTE - Because the 95th percentile of the sum of sequences of performance metrics from the placed guests is not the same as the sum of the 95th percentiles of each of those sequences, adding up the guest utilization values on this worksheet will not give you the value of the host utilization, except in the case of disk space usage, which does not use the 95th percentile aggregate. A similar observation can be made about the utilization values for the infrastructures; see the small worked example after this list.

UtilizationBeforeVirtualization - All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization, which uses the maximum value of the normalized performance data.
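A tiny made-up example shows why the guest values do not add up to the host value (using the percentile_95 sketch from earlier; with sequences this short the 95th percentile degenerates to the max, which is enough to show the effect):

    guest_a = [1.0, 1.0, 1.0, 10.0]
    guest_b = [10.0, 1.0, 1.0, 1.0]  # peaks at a different time than guest_a
    host = [a + b for a, b in zip(guest_a, guest_b)]  # [11.0, 2.0, 2.0, 11.0]

    print(percentile_95(guest_a) + percentile_95(guest_b))  # 20.0: sum of the guests' aggregates
    print(percentile_95(host))                              # 11.0: aggregate of the summed sequence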

Microsoft Private Cloud Fast Track Consolidation Report-<date>.xlsx

ConsolidationOnBaseConfig - Analogous to the ConsolidationRecommendations worksheet in the ServerVirtRecommendation-<date>.xlsx workbook described above, where the host configuration is the base configuration of the Microsoft Private Cloud Fast Track infrastructure selected in the Microsoft Private Cloud Fast Track Consolidation Wizard. Note that there are no memory and disk space overhead values specified for the placed guests in this wizard.

ConsolidationOnMaximumConfig - Just like ConsolidationOnBaseConfig except that the infrastructure configuration is the maximum configuration of the infrastructure selected in the Microsoft Private Cloud Fast Track Consolidation Wizard.

UtilizationBeforeVirtualization - All metrics are the 95th percentile of the normalized performance data except for Disk Space Utilization, which uses the maximum value of the normalized performance data.

Page 11: PerfMetricResults 08-16-2015 15h08m29s

Placement Metrics Summary

This report provides aggregates of normalized performance metrics used in several consolidation scenarios. These aggregates include the average, maximum and 95th percentile values for CPU utilization, memory, disk IOPS, network utilization and storage utilization. Refer to the "How it works" worksheet for details on how the data is normalized.

Machine Name: WIN-5RMIAUPUG85.hello365.com
Operating System: Microsoft Windows Server 2008 R2 Enterprise
CPU: AMD A10 PRO-7350B R6, 10 Compute Cores 4C+6G, 64 bit

Page 12: PerfMetricResults 08-16-2015 15h08m29s

CPU Speed (GHz): 2.096
Cores: 1
Average CPU Utilization (%): 61.08


Page 13: PerfMetricResults 08-16-2015 15h08m29s

Maximum CPU Utilization (%): 99.27
95th Percentile CPU Utilization (%): 99.27
Memory (MB): 2048

Page 14: PerfMetricResults 08-16-2015 15h08m29s

Average Memory Utilization (GB): 1.72
Maximum Memory Utilization (GB): 1.77
95th Percentile Memory Utilization (GB): 1.77

Page 15: PerfMetricResults 08-16-2015 15h08m29s

Average Disk IOPS: 17.6
Maximum Disk IOPS: 44.93
95th Percentile Disk IOPS: 44.93

Page 16: PerfMetricResults 08-16-2015 15h08m29s

Avg Disk Writes/sec: 13.47
Max Disk Writes/sec: 35.35
95th Percentile Disk Writes/sec: 35.35

Page 17: PerfMetricResults 08-16-2015 15h08m29s

Avg Disk Reads/sec: 4.13
Max Disk Reads/sec: 9.58
95th Percentile Disk Reads/sec: 9.58

Page 18: PerfMetricResults 08-16-2015 15h08m29s

Average Network Utilization (MB/s): 0
Maximum Network Utilization (MB/s): 0
95th Percentile Network Utilization (MB/s): 0

Page 19: PerfMetricResults 08-16-2015 15h08m29s

Avg Network Bytes Sent (MB/s): 0
Max Network Bytes Sent (MB/s): 0
95th Percentile Network Bytes Sent (MB/s): 0

Page 20: PerfMetricResults 08-16-2015 15h08m29s

Avg Network Bytes Received (MB/s): 0
Max Network Bytes Received (MB/s): 0
95th Percentile Network Bytes Received (MB/s): 0

Page 21: PerfMetricResults 08-16-2015 15h08m29s

Disk Drive: VMware, VMware Virtual S SCSI Disk Device
Disk Drive Size (GB): 39
Average Disk Space Utilization (GB): 20.29

Page 22: PerfMetricResults 08-16-2015 15h08m29s

Maximum Disk Space Utilization (GB): 20.5
95th Percentile Disk Space Utilization (GB): 20.5
Avg Disk Queue Length: 0.08

Page 23: PerfMetricResults 08-16-2015 15h08m29s

Max Disk Queue Length: 0.18
95th Percentile Disk Queue Length: 0.18
Avg Disk Read Queue Length: 0.07

Page 24: PerfMetricResults 08-16-2015 15h08m29s

Max Disk Read Queue Length: 0.14
95th Percentile Disk Read Queue Length: 0.14
Avg Disk Write Queue Length: 0.01

Page 25: PerfMetricResults 08-16-2015 15h08m29s

Max Disk Write Queue Length: 0.04
95th Percentile Disk Write Queue Length: 0.04
Avg Disk Bytes/sec: 388516.59

Page 26: PerfMetricResults 08-16-2015 15h08m29s

Max Disk Bytes/sec: 869981.53
95th Percentile Disk Bytes/sec: 869981.53

Page 27: PerfMetricResults 08-16-2015 15h08m29s

Collection Statistics

Host Name: WIN-5RMIAUPUG85
Oldest Sample Date: Aug 16 2015 2:12PM
Final Sample Date: Aug 16 2015 3:07PM

This report provides statistics for each machine for which data was collected, including the success ratio as well as reasons for failure (if any) during the collection period.

Page 28: PerfMetricResults 08-16-2015 15h08m29s

Number of Successful Attempts: 12
Success Percentage: 100
Last Collection Status: Success
