Evaluating CPU Performance


Evaluating CPU Performance: An Exploration of the Estimation of Relative Performance

Qi Zhou, Fiona Xue, Luyun Zhao

ABSTRACT

CPU performance is incredibly important to evaluate when choosing a computer, as well as in system configuration and design. But how does one compare and evaluate CPU performance? Currently, some manufacturers provide a number called "published relative performance" (PRP). However, there are several problems with relying on published relative performance to evaluate CPU performance. Phillip Ein-Dor and Jacob Feldmesser collected a data set of several attributes of computers along with their published relative performance. In this paper, we perform statistical analyses on this data. We present a final model, along with the statistical testing behind it and how it estimates PRP.

In our modern world, computers are used so prevalently that evaluating the performance of a computer system – or more specifically, of its central processing unit (CPU) – is incredibly important in a variety of decisions. The CPU of a computer performs several functions: calculations, logical decisions, moving data from one place in the computer's memory to another, and multitasking such as switching between the apps running on a desktop. Thus, CPU performance is incredibly important not only for selecting computers, but also for computer system configuration and system design.

Published Relative Performance

One of the ways CPU performance is evaluated is by a dimensionless, manufacturer-provided metric called published relative performance (PRP). However, there are three problems with relying on published relative performance to evaluate CPU performance. First, not all manufacturers provide the metric, so we do not have the published relative performance for all brands and models on the market. Second, manufacturers are not transparent about the methods they use to come up with this metric. Because we have neither the PRP for all current models on the market nor the methodologies the manufacturers used to produce it, we are unable to evaluate the CPU performance of all computers. Third, there is no way for us to come up with the PRP for computers not yet on the market. Thus, we must find a way to estimate PRP based on the data we currently have.

The Dataset

Phillip Ein-Dor and Jacob Feldmesser sought to tackle this very problem in their 1987 paper "Attributes of the Performance of Central Processing Units: A Relative Performance Prediction Model". They collected data for 209 computers on the market from 1981 to 1984, spanning various manufacturers and performance capabilities. Seven attributes were collected for each computer: published relative performance, cache memory, minimum I/O channels, maximum I/O channels, machine cycle time, minimum main memory, and maximum main memory. Published relative performance was discussed above, but we shall briefly give layman descriptions of the other six attributes. (1) Cache memory, also known as CPU memory, is measured in kilobytes and is an integer value. A computer's processor can access cache memory more rapidly than regular random access memory (RAM), so the cache provides information to the CPU in the shortest time, without any lag. Thus, cache memory can significantly impact CPU performance. (2) and (3) I/O channels are the equipment that form the input/output system of a computer and transfer data between main storage and peripherals. Our data has maximum I/O channel and minimum I/O channel counts, for which the unit is channels and the data points are


all integer values. (4) Machine cycle time is the time it takes the machine to complete a single processing cycle, so we would expect an inverse relationship between machine cycle time and CPU performance. This data is in nanoseconds. (5) and (6) Lastly, main memory is where programs and data are kept while the processor is actively using them. The industry generally agrees that the performance of the memory system can significantly impact CPU performance. In this dataset, we have minimum main memory and maximum main memory, in kilobytes.

Using these attributes, the authors created a model that attempts to estimate the published relative performance of a computer, with PRP as the dependent variable and the six defining attributes of the computers as the independent variables. We will not discuss their paper further except to compare our model to theirs. Based on the definitions and our research, we expect the cache memory and main memory variables to be the most important factors in the regression, and channel minimum to have a very small effect on PRP. We are not sure of the degree of impact of machine cycle time, but as previously mentioned, the longer a machine cycle is (i.e., the larger the data point), the lower the CPU performance must be. The primary questions we sought to answer are: what model best estimates PRP? How accurate and reliable is that model? Are there any significant interactions that must be included? We discovered that using just five of the six specifications, we were able to predict PRP, and thus CPU performance, relatively accurately. The following sections show our statistical analyses and justify why our final model is a good predictor of PRP.

Methodology – Exploratory Data Analysis (I)

Our first step in our exploration is to find the outliers of Ein-Dor and Feldmesser's dataset. Outliers in our data set are specific computer models with a value for one of the seven attributes that is extremely different from the other values in the data set. Thus, the inclusion or exclusion of outliers can greatly affect our analysis. Here is an exploratory data analysis of our dependent and six independent variables. First, basic boxplots of each of the dependent and independent variables show potential univariate outliers. Based on the boxplot of PRP, we see 10 potential outliers:

Figure 1: Boxplot of PRP
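The rule a boxplot uses to flag these potential outliers is the 1.5 × IQR fence. It can be sketched in Python with made-up values (not the actual PRP column):

```python
import statistics

# Hypothetical PRP-like values; the real dataset has 209 rows.
prp = [17, 23, 32, 45, 50, 64, 76, 110, 480, 1150]

q1, _, q3 = statistics.quantiles(prp, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are the points a boxplot draws individually.
flagged = [v for v in prp if v < lo or v > hi]
```

Any point outside [lo, hi] is only a *potential* outlier; as discussed below, we do not delete such points from this dataset.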


Next, for our six independent variables, we see that each variable has a varying number of potential univariate outliers.

Figure 2: Boxplots of the independent variables

Further exploratory data analysis is included in the Appendix, Part I. Though our EDA might indicate the potential presence of univariate outliers, there are two reasons why we have concluded that there are no univariate outliers. First, the authors, the creators of the data set, stated explicitly that they deliberately selected 209 computers representing a wide range of values across all six attributes, so that their final model could predict PRP for varying ranges of specifications. Thus, if we deleted outliers based solely on our EDA, we would be limiting the predictive power of our final model. Second, manufacturers would not spend the money and time to create a computer with specifications that are illogical or wildly abnormal, so extreme values likely reflect genuine designs.

Methodology – Transformations (II)

We plotted the histograms of our variables as part of our exploratory data analysis.

Figure 3: Histogram of PRP


Figure 4: Histograms of the six independent variables

Based on the heavy right-skewness of our data, we chose to employ a log transformation on published relative performance (PRP), machine cycle time (MYCT), main memory minimum (MMIN), and main memory maximum (MMAX). We did not take the log transform of cache memory (CACHE), channel I/O minimum (CHMIN), or channel I/O maximum (CHMAX) because of the presence of zeros in those data subsets. Performing a linear regression of log(PRP) on these six independent variables, three transformed and three un-transformed [log(PRP) ~ log(MYCT) + log(MMIN) + log(MMAX) + CACHE + CHMIN + CHMAX], produced a good preliminary result.

Figure 5: Paired plots of PRP vs. the six independent variables
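The motivation for the log transform, and the reason zeros rule it out for CACHE, CHMIN, and CHMAX, can be sketched in Python (the paper's analysis is in R; the values here are made up):

```python
import math

mmax = [64, 2000, 32000, 64000]   # hypothetical, strictly positive memory sizes (KB)
cache = [0, 8, 32, 128]           # CACHE-style values: zeros do occur

# The log compresses the long right tail of a skewed positive variable.
log_mmax = [math.log(v) for v in mmax]

# log(0) is undefined, so the same transform fails on a column with zeros.
try:
    [math.log(v) for v in cache]
    has_zero_problem = False
except ValueError:
    has_zero_problem = True
```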

To justify that leaving cache memory, channel I/O minimum, and channel I/O maximum un-transformed was the best route, we took the log transform of each and repeated the linear


regression of log(PRP) on all six independent variables [log(PRP) ~ log(MYCT) + log(MMIN) + log(MMAX) + log(CHMIN) + log(CHMAX) + log(CACHE)]. This was achieved by replacing all the 0's in those data columns with a small number (0.1), a valid method because cache memory, channel I/O minimum, and channel I/O maximum are all integers in our data set. This second model, with the zero-free data and six log transforms, was not as good a fit as our first model with three log-transformed and three un-transformed independent variables. Based on the shape of the paired plots of CACHE, CHMIN, and CHMAX, we also attempted square root and cube root transforms on those three variables. These further attempts only produced models worse than [log(PRP) ~ log(MYCT) + log(MMIN) + log(MMAX) + CACHE + CHMIN + CHMAX]. Thus, we decided to leave those three variables untransformed. From our new paired plots below, we can see the result the log transforms of PRP, MYCT, MMAX, and MMIN produced.

Figure 6: Testing of the transformed variables
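The zero-replacement used for the all-log comparison model can be sketched the same way (hypothetical values):

```python
import math

chmin = [0, 1, 5, 12, 52]                        # integer channel counts with zeros
shifted = [v if v > 0 else 0.1 for v in chmin]   # replace 0 with 0.1, as described above
log_chmin = [math.log(v) for v in shifted]
```

Because the genuine values are integers (at least 1), replacing only the zeros leaves every real observation untouched.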

Model – Preliminary (III)

Here are the results of our preliminary model, where a (T) preceding a variable indicates a log transformation: log(PRP) ~ TMYCT + TMMIN + TMMAX + CACHE + CHMIN + CHMAX.

R^2     RSS      Average Deviation   BIC
0.833   0.4348   0.3548              -342.76

For reference, the average deviation of the authors' paper was 34.10%. The method of calculating average deviation is discussed in Average Deviation, section (VI).

Methodology – Outliers, cont. (IV)

Based on our preliminary model, it is important to look at whether there are any regression outliers that may be skewing our model. We performed a robust regression and plotted the qqPlot to detect any possible outliers being masked in our preliminary model, which is an OLS fit, and then plotted the Cook's distances.
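Cook's distance itself combines each observation's residual with its leverage. For the single-predictor case it can be sketched in Python on toy numbers (our actual analysis used R's cooks.distance on the full model):

```python
# Cook's distance for simple linear regression: how much the fit would change
# if observation i were deleted. Toy data; the last point is deliberately extreme.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.0, 2.0, 3.0, 4.0, 20.0]
n, p = len(x), 2                      # p = number of coefficients (intercept + slope)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar

resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)
lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]     # leverages h_ii

# D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
cooks = [e ** 2 / (p * mse) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]
most_influential = max(range(n), key=lambda i: cooks[i])  # index of largest D_i
```

The thresholds we plot in the appendix (0.25/(p+1) and 1/(p+1)) would then be compared against these D_i values.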


Figure 7: qqPlot of the residuals of our robust regression

Figure 8: Cook's distance

The qqPlot shows that our robust regression result was a decent linear fit, with six points just outside the confidence interval. The coefficients of the robust regression did not differ greatly from our OLS preliminary model, so we concluded there were no regression outliers. We noted that our Cook's distance plot shows three influential cases (1, 15, 157).

Methodology – Variable Selection (V)

As further testing of our preliminary model, we ran a series of variable selection methods: Bayesian, stepwise, and an MG screen. The table below marks with a (*) the variables in our preliminary model that each method found significant. Note that only the Bayesian method produced a result different from our preliminary model.

Method      CACHE   CHMIN   CHMAX   tMMIN   tMMAX   tMYCT
Bayesian    *               *       *       *       *
Stepwise    *       *       *       *       *       *
MG-screen   *       *       *       *       *       *

Figure 9: Variable selection

We chose to alter our preliminary model based on the Bayesian result, dropping CHMIN. This new regression produced a slightly lower R^2, but a better BIC and a lower average deviation than our preliminary model. Looking at the correlation plot below:

Figure 10: Correlation plot

We can see that channel I/O minimum and cache memory show very similar behavior. Thus, based on the Bayesian model's exclusion of CHMIN, the arguably improved model results, and the redundant behavior of CHMIN and CACHE, we removed CHMIN from our model.

Average Deviation (VI)

Average deviation was a metric the original authors used to gauge the accuracy of their model. We employed the same metric as a further test of our model. Average deviation is calculated by taking the average, over all 209 computers, of the absolute value of the relative difference between our estimated PRP (ERP) and the actual PRP:

Average(|ERP − PRP| / PRP) × 100%, where ERP represents the fitted values.
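Under this definition the computation is direct; here is a Python sketch with hypothetical values (the appendix computes the same quantity in R as mean(abs(exp(lm$fitted) - PRP)/PRP)):

```python
# Average deviation: mean of |ERP - PRP| / PRP, expressed as a percentage.
prp = [100.0, 200.0, 50.0]   # hypothetical actual PRP values
erp = [110.0, 180.0, 55.0]   # hypothetical fitted (estimated) values

avg_dev = sum(abs(e - a) / a for a, e in zip(prp, erp)) / len(prp) * 100
```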

Final Model & Testing (VII)

Because our variables are all hardware components, it is reasonable to expect interactions between variables that are not yet represented in our model. After exploring the intuitive interactions based on the definitions of the variables, we ultimately found a significant interaction between machine cycle time and channel I/O maximum. This brings us to our final model: log(PRP) ~ CACHE + CHMAX + TMYCT + TMMAX + TMMIN + TMYCT*CHMAX. Here is a table of the results, with our preliminary models for comparison.


Model                 R^2      RSS      Average Deviation   BIC
Preliminary           0.833    0.4348   0.3548              -342.76
Preliminary - CHMIN   0.8261   0.4372   0.3540              -333.54
Final                 0.8343   0.4331   0.3534              -343.63

Figure 11: Regression results
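As a sanity check, the final model's BIC in the table can be reproduced from its R^2 using the formula in the appendix code, BIC = n·log(1 − R^2) + p·log(n), with n = 209 and p = 6:

```python
import math

n, p = 209, 6          # sample size and p, as set in the appendix code
r2_final = 0.8343      # R^2 of the final model, from the results table

bic_final = n * math.log(1 - r2_final) + p * math.log(n)
# bic_final is approximately -343.6, matching the table's -343.63
```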

         Intercept   CACHE      CHMAX       TMYCT       TMMIN      TMMAX      Interaction
Coeff.   -0.4318     0.007604   0.0180180   -0.113569   0.192822   0.370235   -0.002965
Sig.     --          ***        **          *           ***        ***        *

Figure 12: Regression coefficients
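The interaction term in the final model is simply the elementwise product of the TMYCT and CHMAX columns; a Python sketch with made-up rows (in an R formula, t.MYCT*CHMAX expands to both main effects plus this product column):

```python
import math

myct = [125, 29, 480]    # hypothetical machine cycle times (ns)
chmax = [16, 32, 1]      # hypothetical channel I/O maxima

tmyct = [math.log(v) for v in myct]                    # the (T) log transform
interaction = [t * c for t, c in zip(tmyct, chmax)]    # TMYCT * CHMAX column
```

The negative coefficient on this column (-0.002965) suggests that, at high CHMAX, the penalty for a long machine cycle is slightly amplified.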

Figure 13: qqPlot and residuals plot

Conclusion

The estimation of PRP – and ultimately, CPU performance – will continue to be a difficult task. However, our paper has shown that using just five attributes of a computer, we can predict the dimensionless metric PRP quite well, explaining roughly 83% of its variance (R^2 = 0.8343). We have produced a model comparable to the best model of Ein-Dor and Feldmesser using just five of the six attributes and different transformations. Furthermore, our final model confirms our prior knowledge about the variables, as well as our guesses about which attributes would be influential in estimating PRP. Our final model does not regress on channel I/O minimum, which we guessed from the beginning would not be important in predicting CPU performance, and it corroborates our guess that machine cycle time negatively impacts PRP. Our model also indicates that cache memory and the main memory variables are the most significant in the regression and, based on the coefficients, that the main memory variables influence PRP the most. Thus, we have created a model that is both statistically and intuitively valid, and a helpful tool for estimating CPU performance.


References

Ein-Dor, Phillip, and Jacob Feldmesser. "Attributes of the Performance of Central Processing Units: A Relative Performance Prediction Model." Communications of the ACM (1987): 308-17. Print.

Faye-Wolfe, Vic. "How The Computer Works: The CPU and Memory." Web. 6 Mar. 2015.

Nicholls, Fred. "Telecommunications: Channel Capacity." EEE482. University of Cape Town, Cape Town, South Africa. 1 May 2003. Lecture.

Venables, W. N., and Brian D. Ripley. Modern Applied Statistics with S. 4th ed. New York: Springer, 2002. Print.

Appendix - R code

#Import of data
data <- read.table(file.choose(), sep = ",",
  col.names = c("VendorName", "ModelName", "MYCT", "MMIN", "MMAX",
                "CACHE", "CHMIN", "CHMAX", "PRP", "ERP"))
MYCT  <- data$MYCT
MMIN  <- data$MMIN
MMAX  <- data$MMAX
CACHE <- data$CACHE
CHMIN <- data$CHMIN
CHMAX <- data$CHMAX
PRP   <- data$PRP
ERP   <- data$ERP

#Transformations
t.MYCT <- log(MYCT)
t.MMIN <- log(MMIN)
t.MMAX <- log(MMAX)
t.PRP  <- log(PRP)
# CACHE, CHMIN, and CHMAX contain zeros; for the all-log comparison model,
# zeros were first replaced with 0.1 (see Section II).
t.CACHE <- log(CACHE)
t.CHMAX <- log(CHMAX)
t.CHMIN <- log(CHMIN)

I. Methodology – Exploratory Data Analysis (I)
#Boxplots
summary(data)
par(mfrow=c(1,1))
boxplot(PRP)
par(mfrow=c(2,3))


boxplot(MYCT)
boxplot(MMAX)
boxplot(MMIN)
boxplot(CACHE)
boxplot(CHMIN)
boxplot(CHMAX)
View(data)

#Histograms
par(mfrow=c(1,1))
hist(PRP)
par(mfrow=c(2,3))
hist(MYCT)
hist(MMAX)
hist(MMIN)
hist(CACHE)
hist(CHMIN)
hist(CHMAX)

II. Methodology – Transformations
#Pairwise plots
par(mfrow=c(2,3))
plot(MYCT, PRP)
plot(MMAX, PRP)
plot(MMIN, PRP)
plot(CACHE, PRP)
plot(CHMIN, PRP)
plot(CHMAX, PRP)
par(mfrow=c(1,1))

#Pairwise plots after transformation
par(mfrow=c(2,3))
plot(t.MYCT, t.PRP)
abline(coef=coef(lm(t.PRP~t.MYCT)), col=2)
plot(t.MMAX, t.PRP)
abline(coef=coef(lm(t.PRP~t.MMAX)), col=2)
plot(t.MMIN, t.PRP)
abline(coef=coef(lm(t.PRP~t.MMIN)), col=2)
plot(CACHE, t.PRP)
abline(coef=coef(lm(t.PRP~CACHE)), col=2)
plot(CHMIN, t.PRP)
abline(coef=coef(lm(t.PRP~CHMIN)), col=2)
plot(CHMAX, t.PRP)
abline(coef=coef(lm(t.PRP~CHMAX)), col=2)

III. Model – Preliminary
lm.1 <- lm(t.PRP~t.MYCT+t.MMAX+t.MMIN+CACHE+CHMIN+CHMAX)


summary(lm.1)
avg.dev1 <- mean(abs(exp(lm.1$fitted)-PRP)/PRP)
n <- length(t.PRP)
p <- 6
BIC1 <- n*log(1-summary(lm.1)$r.sq) + p*log(n)  # same form as the final-model BIC in Section VI

IV. Methodology – Outliers (cont.)
## Regression Outliers
#Plots
library(car)
par(mfrow=c(1,1))
qqPlot(lm.1$residuals)
hist(lm.1$residuals)
boxplot(lm.1$residuals)
plot(lm.1$fitted, lm.1$residuals)
par(mfrow=c(2,2))
plot(lm.1)

#Robust
library(robustbase)
library(hett)
lmrob <- lmrob(t.PRP~t.MYCT+t.MMAX+t.MMIN+CHMAX+CHMIN+CACHE)
summary(lmrob)
lmrob$rweights[lmrob$rweights<0.1]
tlm <- tlm(t.PRP~t.MYCT+t.MMAX+t.MMIN+CHMAX+CACHE+CHMIN)
summary(tlm)
tlm$random[tlm$random<0.1]
lts <- ltsReg(t.PRP~t.MYCT+t.MMAX+t.MMIN+CHMAX+CACHE+CHMIN)
summary(lts)
lts$lts.wt[lts$lts.wt<0.1]
n <- length(t.PRP)
par(mfrow=c(2,2))
plot(1:n, lts$lts.wt, xlab="Index", ylab="LTS weights")
plot(1:n, lmrob$rweights, xlab="Index", ylab="lmrob weights")
plot(1:n, tlm$random, xlab="Index", ylab="t-regression weights")

#Compare coefficients
summary(lm.1)
summary(lmrob)
(coef(lmrob)-coef(lm.1))/coef(lm.1)

# Cook's distance
par(mfrow=c(1,1))
p <- 6
cooks0 <- cooks.distance(lm.1)
plot(cooks0, type="h", main="Cook's distances, Original data")
abline(h=.25/(p+1), col="red")
abline(h=1/(p+1), col="red")
cooks0[cooks0>1/(p+1)]
### Comment: No regression outliers, but three significant influential cases: 1, 15, 157

V. Methodology – Variable Selection


##Variable Selection
#Freedman
summary(lm.1)
#Stepwise
library(MASS)  # for stepAIC
step <- stepAIC(lm.1, direction="both")
step$anova
#Bayesian Selection
library(BMA)
data.bic <- data.frame(t.PRP, t.MYCT, t.MMAX, t.MMIN, CHMAX, CHMIN, CACHE)
x <- data.bic[,-1]
y <- data.bic[,1]
bicreg <- bicreg(x, y)
summary(bicreg)

#Correlation Map
library(corrplot)
data2 <- cbind(t.PRP, t.MYCT, t.MMAX, t.MMIN, CACHE, CHMIN, CHMAX)
par(mfrow=c(1,1))
corrplot(cor(data2))

VI. Final Model & Testing
##Final Model & Testing
lm.2 <- lm(t.PRP~t.MYCT+t.MMIN+t.MMAX+CACHE+CHMAX+t.MYCT*CHMAX)
summary(lm.2)
par(mfrow=c(1,2))
qqPlot(lm.2$residuals)
plot(lm.2$fitted, lm.2$residuals)
abline(h=0, col=2)
avg.dev <- mean(abs(exp(lm.2$fitted)-PRP)/PRP)
avg.dev
R2 <- summary(lm.2)$r.sq
R2
BIC <- n*log(1-R2) + p*log(n)
BIC
plot(lm.2)
hist(lm.2$residuals)
boxplot(lm.2$residuals)