Matrix calculations in Excel.pdf

Matrix calculations in Excel

http://rcs.chph.ras.ru/Tutorials/excel_en.htm[2011/09/29 01:06:34 ب.ظ]


© Alexey Pomerantsev, Oxana RodionovaRussian Chemometrics Society http://rcs.chph.ras.ru/english.htm

Contents

Introduction1. Basic information

1.1. Regional and Language settings1.2. Workbook, worksheet, cell1.3. Addressing1.4. Range1.5. Simple calculations1.6. Functions1.7. Some important functions1.8. Errors in formulas1.9. Dragging formulas1.10. Create a chart

2. Matrix operations2.1. Array formulas2.2. Create and edit array formulas2.3. Simplest matrix operations2.4. Access to the part of a matrix2.5. Unary operations2.6. Binary operations2.7. Regression2.8. Critical bug in Excel 20032.9. Virtual array

3. Extension of Excel facilities3.1. VBA programming3.2. Example3.3. Macro example3.4. User-defined function example3.5. Add-Ins3.6. Add-Ins installation

Conclusion

Introduction

This tutorial is devoted to Excel which is a popular environment for calculations, analysis, and graphicrepresentation of data. Our goal is to present main Excel features used for data processing. The mainattention is concentrated on operation with multivariate data and matrix calculations. A specials section isdevoted to the Add-ins programs which extend Excel possibilities. This text is not a comprehensive Excelmanual but only a short introduction. The detailed Excel features are explained elsewhere in numerousbooks and articles.

Information presented in the tutorial refers to the basic properties of Excel 2003 and also Excel 2007. Thistutorial is written mainly for Excel 2003 users. Features essential for Excel 2007 are specially underlined inthe text.

Text is accompanied by the examples presented in workbook Excel.xls.

http://rcs.chph.ras.ru/english.htm

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls



Important information on Excel.xls application

Links to the examples are embedded as Excel objects in the tutorial.

Chemometrics tutorials

Contents

1. Basic information

1.1. Regional and Language settings

Excel software may be used in different countries and languages. Regional and Language settings changethe software appearance, names and syntax of standard worksheet functions. For example, recently, inRussian version, the comma (,) was used as a delimiter for the decimal part of a number, therefore thesemicolon (;) played the role of list separator (particularly for the function arguments). Luckily, this isbecoming the thing of the past and in scientific calculations the point (.) is widely used as the fractionseparator.

http://rcs.chph.ras.ru/Tutorials/moffice_en.htm

http://rcs.chph.ras.ru/Tutorials/english.htm



Fig. 1 Customizing regional options

One can use Control Panel for changing Regional and Language Options and Customize button for changing dataappearance (see. Fig. 1)

Names of standard Excel functions depend on the Language settings chosen for the computer. For example,in English version the summation function is

=SUM(A1:A9).

The same function in the Russian version appears as follows

=СУММ(A1:A9).

The list of the English and local function names is presented in file FUNCS.XLS that is ordinary located infolder C:\Program Files\Microsoft Office\OfficeVer\Local. OfficeVer is the name of the Excel version, e.g. Office 11, and Local is



the number of the local version, e.g. 1049 for Russian, and 1033 for English.. Details may be read here. FileExcel_Functions.xls contains the names of all Excel function in 16 languages.

In this tutorial we use the English version of Excel 2003 with the point as the fraction separator and thecomma for the list delimiter.

Contents

1.2. Workbook, worksheet, and cell

An Excel file with extension XLS (XLSX in Excel 2007) is called a workbook. When starting an Excel program,

.e.g. by clicking an icon , on a desktop, a new workbook will be opened.

Fig. 2 New Excel workbook

If a workbook exists, you can open it using Explorer by clicking on the file icon.

http://office.microsoft.com/en-us/excel/HP052427021033.aspx

http://rcs.chph.ras.ru/Tutorials/office/Excel_Functions.xls



Fig. 3 Opening an Excel workbook with the help of Explorer

A workbook consists of several worksheets. Worksheet names are shown in the bottom of the open window.One can delete, add, and rename worksheets. For this purpose right-click on the Sheet name and select anoperation in the pop-up menu.

Fig. 4 Manipulations with worksheets

A standard worksheet name is Sheet1, but you can give it any name you like, e.g. Data. Various objects canbe inserted in the sheet, e.g. charts, pictures, etc

Each worksheet consists of cells. These cells form a table with 256 columns and 65536 rows (In Excel 2007there are 16384 columns and 1048576 rows). Rows are labeled numerically as 1, 2, 3…, and columns arelabeled alphabetically as A, B, …,Z, AA, AB .., etc till column IV (In Excel 2007 till XFD). This referencestyle is called A1. Another reference style is called R1C1, in which the columns are also labeled numerically.Latter style is used rarely and we will not employ it below. Details can be read here.

Columns and rows can be deleted, added, and hided. It is also possible to change columns and rows heightand width.

A cell can contain different content such as number, text, or formula. For visualization purposes a cell can beformatted by changing font, color, border, and etc.

All Excel operations are executed with the help of the Menu items located at the top of the window. ( Fig. 2).Excel 2007 Menu essentially differs from previous versions. A special ribbon substitute habitual icons. We willnot explore these differences in details and redirect an interested reader to these instructions.

Contents

1.3. Addressing

In Excel, each cell has it own address comprised of the corresponding row and columns headings. Forexample, the address of the first cell in a worksheet is A1. The cell located at the intersection of the thirdcolumn and fifth row has address C5. Active cell is indicated by the bold frame and its address is displayed inthe Name Box window (see Fig. 2). To copy the content a cell (e.g. cell A1) into another cell (e.g. cell F1), theformula = address should be employed in the destination cell (e.g. = A1). Addressing (referencing) can beabsolute, relative, or mixed.


http://office.microsoft.com/en-us/excel/HA101662851033.aspx



For example, the first cell has absolute address $A$1, relative address A1, and two mixed addresses $A1and A$1. Differences in the addressing types are manifested when the cell formula is copied or moved intoanother cell. The following example illustrates this issue.

Fig. 5 Absolute and relative addressing

Fragment of the worksheet with data highlighted in yellow is presented in the top panel of Fig. 5. Differenttypes of references to cell A1 (orange cell) are presented in the green regions (column F and row 6).Reference type is indicated near each cell. Let's copy (one by one) the cells from the green region and past



them in columns G and H and rows 7 and 8 (Fig. 5, middle panel). One can see that the result depends onthe reference method. The reference to cell A1 is not changed for the absolute reference. For relativeaddressing, the reference shifts right and down keeping the relative distance between the cells. For mixedaddressing, the results depend on the location of the cell with reference and on the fixed (invariant) part ofthe reference indicated by symbol $. Bottom panel shows the results after formula replications.

For addressing a cell in another worksheet of the same workbook, the reference should include theworksheet name, e.g. Data!B2. Exclamation point (!) separates a worksheet name from a cell address. If aworksheet name includes blank, the name should be enclosed in the single quotes, e.g. 'Raw Spectra'!C6.For referencing the cell in another workbook, the workbook name is enclosed in the brackets, e.g.[Other.xls]Results!P24.

More about the methods of addressing is written here.

Contents

1.4. Range

A matrix occupies a range (i.e. a set of cells) in a worksheet. Fig. 6 illustrates the matrix with 9 rows (fromrow 2 till row 10) and 3 columns (from B till D).

Fig. 6 A range on a worksheet

The addresses of the upper left and lower right cells jointed by the semicolon are used for a matrixreferencing. For example B2:D10 or $B$2:$D$10.

Often it is useful to name a matrix range while operating on the matrices. There are two ways to do this.The most straightforward way is to select the area in a worksheet, then click into the Name Box window (seeFig. 2), delete the current address and input a name, e.g. Data (Fig. 6). The other way is to use the Insert-

Name-Define menu.

A range name may be global, i.e. accessible from any worksheet in a book, or local. A local name isaccessible only for the given worksheet. In the latter case the name should be defined in the following formSheetName!RangeName.

See details here

Contents





1.5. Simple calculations

For performing calculations Excel uses different formulas. A formula is started with the equal sign ( =) andmay include references, operators, constants, and functions.

Operators provided the simple arithmetic calculations are presented in Fig. 7

h

Fig. 7 Simple arithmetic operations

More about this topic.

Contents

1.6. Functions

A function is a standard formula, which operates using given values, called arguments .Examples of someuseful functions are presented in Fig. 8.

Fig. 8 Simple functions

A function consists of a name and a list of arguments embraced in parentheses and separated by commas(or by another list separator).

For example, function in Fig. 9 calculates the value of cumulative (cumulative=TRUE), standard (mean=0,standard_dev =1) normal distribution for the argument value given in cell A1.

Fig. 9 Entering the function by means of Formula Bar



http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Function!C1



There are different ways for entering a formula. The most straightforward way is to type a formula in theFormula Bar window (see. Fig. 2). Beforehand it is necessary to make the formula window visiblewith the help of the View-Formula Bar menu item. This method is suitable in case you remember the functionsyntax. A formula may be entered very quickly as arguments are inputted by clicking the cells with thecorresponding values.

The second method helps when we do not remember exactly a function name and/or the list of arguments.In this case one can use button Insert Function (Fig. 2). This button opens dialog window (Fig. 10) with thelist of accessible functions.

Fig. 10 Entering function by means of Insert Function, first step

As soon as a specific function is selected, the second dialog box appears. This box displays the list of thefunction arguments.



Fig. 11 Entering specific function by means of Insert Function, second step

Details may be studied here.

Contents

1.7. Some important functions

Excel includes numerous standard worksheet functions, which cannot be outlined here. Only functions thatare repeatedly used in chemometric applications will be considered.

SUM

Summarizes all values in the argument list, or in the range, and returns the sum.

Syntax:

SUM(number1 [,number2] [,...])

Example

http://office.microsoft.com/training/training.aspx?AssetID=RC061914861033



Fig. 12 Function SUM

Description

SUMSQ

Returns the sum of the squares of the arguments, or the cells in the range.

Syntax:

SUMSQ(number1 [,number2] [,...])

Example

Fig. 13 Function SUMSQ

Description

SUMPRODUCT

Pair-wise multiplies the corresponding components in the given ranges, and returns the sum of thoseproducts.

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!A13


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!C13




Syntax:

SUMPRODUCT (array1, array2, ...)

Example

Fig. 14 Function SUMPRODUCT

Description

AVERAGE

Returns the arithmetic mean of the arguments, or the cell values in the range.

Syntax:

AVERAGE(number1 [,number2] [,...])

Example

Fig. 15 Function AVERAGE

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!L13


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!E13



Description

VAR

Estimates variance based on a sample .

Syntax:

VAR(number1 ,number2, ...)

Example

Fig. 16 Function VAR

Description

STDEV

Estimates standard deviation based on a sample .

Syntax:

STDEV(number1 ,number2, ...)

Example


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!G13




Fig. 17 Function STDEV

Description

CORREL

Returns the correlation coefficient between the cell ranges array1 and array2.

Syntax:

CORREL(array1, array2)

Example

Fig. 18 Function CORREL

Description

Functions may be combined in one formula, see example in Fig. 19.

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!I13


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!O13




Fig. 19 Composite formula

Contents

1.8. Errors in formulas

Errors may occur on typing a formula and further in the course of a worksheet modification. In such a casethe cell does not contain an expected result but the special symbols indicating an error of different type.Description of error types is presented in Fig. 20.

Fig. 20 Errors in formulas

Menu item Formula Auditing in the Tool menu helps to reveal the source of an error.

Details regarding different errors may be read here.

Contents

1.9. Dragging formulas

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#SUM!Q13

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Errors!A1

http://office.microsoft.com/training/Training.aspx?AssetID=RP010919371033



Often, while working with matrices, there is a necessity to input not one but a range of formulae. Forexample, performing SNV pre-processing of spectral data it is necessary to calculate the mean and standarddeviation values for each row. It would be tiresome to input the same formula repeatedly with changes onlyin one argument even for a rather small example as in Fig. 22. In a real dataset the number of rows may beabout several thousands. Fortunately Excel has a special dragging mechanism for such operations.

Let us explain the dragging technique with the following example. We start with typing formula in cell J3.

Fig. 21 Fill handle

Formulas to the adjacent cells may be entered with the help of the fill handle, a small black point in thebottom right-hand corner of the selection. This handle changes to the black cross on cursor hit. Afterwardsthe active cell may be dragged to the adjustment region. Dragging direction may be vertical (down), as inFig. 22, and horizontal (right).

Fig. 22 Formula dragging

A formula may be duplicated in the other way. On the first step we copy the cell with a formula. Then weselect a range of the cells where this formula should be entered. Afterwards we use Paste Special operation withFormulas option selected.

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Expand!J3



Fig. 23 Duplicating a set of similar formulas

It may be seen that regardless of the duplicate method we yield a proper references in the functionarguments. This is due to utilizing the relative references, B3:I3, in the formula arguments.

Details may be read here.

Contents

1.10. Create a chart

The charts of different types can be created in Excel. Two types are of the most importance for use. Theyare the scatter and line plots. Fig. 24 presents an example of the scatter.

Fig. 24 Scatter chart

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Expand!B17


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Bug!N4



Such charts are used for presenting of the scores plots, predicted versus measured dependences, and etc.There is an essential difference between the Scatter and Line charts. The Scatter chart has two peer valueaxes. In the Line chart x-axis is used only as the category axes in which just the order of categories isdisplayed but not the scaled numeric values. Therefore the line charts are suitable for presentingdependences on the number of principal components, e.g. diagrams of RMSEC and RMSEP versus modelcomplexity.

Chart creating is different in versions Excel 2003 and 2007. We will not stop on this issue and leave it for theself-instruction study.

Contents

2. Matrix operations

2.1. Array formulas

Many matrix operations are performed with the help of special, so-called array functions. The result ofcalculation of such a function is not a single value (number) but a set of numbers, e.g. array, even it has asingle value. Array formulas are created in the same way as any other Excel formula with one difference. Toenter an array formula it is necessary to press CTRL+SHIFT+ENTER for its completion.

Let us explain the application of array formulas with a simple example. Suppose we need to performautoscaling (column-wise centering and scaling) for the data in matrix X. For this purpose the mean valuesmj and the standard deviation values sj should be calculated for each column j of matrix X. Afterwards, weshould subtract value mj from each column and divide the result by sj

The autoscaling procedure may be performed by ordinary (not array) Excel functions as it is shown in Fig. 25.

Fig. 25 Ordinary formulas

In this case it is important to mind symbol $ before the rows with number 9 (m) and 11 (s), to fix theposition of the corresponding values in the formulas,

For large X matrix it is convenient to apply the array formula. (Fig. 26). Let us name by X, m and s thecorresponding ranges in the worksheet. At first we select the empty rangy N3:R7 that has the size ofanticipated result. Then we enter formula =(X-m)/s in the Formula Bar. Finally we terminate operation bypressing CTRL+SHIFT+ENTER. If all these actions were performed properly, we yield { =(X-m)/s} in the Formula Bar.Braces, {}, indicate the array formula.


http://rcs.chph.ras.ru/Tutorials/projection_en.htm#Ch1.4

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Standard!I4



Fig. 26 Array formula

Additional information

Contents

2.2. Create and edit array formulas

To enter an array formula properly it is necessary to select a region, which size corresponds to the size ofthe expected result.

In case the selected area is larger, the redundant cells are filled with error symbol #N/A after calculation. Ifthe selected region is smaller, the part of the result values is missed. After selection of the resulting region,a formula is placed in the Formula Bar and the whole operation is terminated by pressing CTRL+SHIFT+ENTER.

Alternatively, you may input a formula in one cell, then select the whole region with the cell containing theformula as the first in the area (right and down), switch to the Formula Bar, and press CTRL+SHIFT+ENTER.

For changing an array formula, it is necessary to select the whole range with the formula result and switchto the Formula Bar. Braces that frame the array formula will disappear. After that one can correct the formulaand press CTRL+SHIFT+ENTER to confirm the result.

For extending the output region of an array formula, it is sufficient to select a new region, then go to theFormula Bar, and press CTRL+SHIFT+ENTER. It is a little bit more complicated to reduce the resulting region, i.e. todelete extra cells that contain the #N/A symbols. First of all it is recommended to select one cell from theresulting region, switch to the Formula Bar, and copy the formula pressing Ctrl+Ins. Then you should clear the oldresulting region and select a smaller one. Afterwards you should switch to the Formula Bar, paste the formulapressing Shift+Ins, and terminate with CTRL+SHIFT+ENTER.

Changes in the cells inside the resulting region are forbidden. On attempting to change the content of such acell the following warning box is displayed .

Fig. 27 Warning regarding the forbidden operations with array formula

Contents

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Standard!N3




2.3. Simplest matrix operations

It is easy to sum up matrices or multiply matrix by a number with the help of array functions in Excel.

Fig. 28 Matrices addition and multiplication by a number

For matrices multiplication array function MMULT is used.

Contents

2.4. Access to the part of a matrix

Two standard worksheet functions are useful for access and manipulation with a part of a matrix.

OFFSET

Returns a reference to a region that is specified by number of rows and columns from a cell or a range ofcells.

Syntax

OFFSET(reference, rows, cols [, height] [, width])

reference is a reference to the first cell in the region from which offset starts;

rows is the number of rows for which the shift should be done (if positive, the shift direction isbelow the starting reference, if negative, the shift direction is above the starting cell);

cols is the same but for columns (if positive, the shift is to the right from the starting reference,if negative, the shift is to the left from the starting cell);

height is an optional argument, must be a positive number. This is a number of rows in thereturned reference;

width is an optional argument, must be a positive number. This is a number of columns in thereturned reference

Remarks

If arguments height and/or width are omitted, it is assumed to be the same height or width asreference;

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Binary!B10



Argument reference is a reference to a real (located in a worksheet) range, but not to a virtualarray.

Example

Fig. 29 Function OFFSET

OFFSET is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

INDEX

Returns the values in an array selected by indices of rows and columns.

Syntax

INDEX (reference [, row_num] [, column_num])

reference is a numerical array (a matrix);

row_num is an optional argument, a row number from which values are selected;

col_num is an optional argument, a column number from which values are selected;

Remarks

if argument row_num is omitted then the whole column is selected;

if argument col_num is omitted then the whole row is selected;

if both arguments are used, the function return a cell located in the intercept of the correspondingrow and column;

argument reference can be both a reference to a region located in a worksheet and to a virtualarray.

Example

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Lookup!I4




Fig. 30 Function INDEX

INDEX is an array function which must be terminated by CTRL+SHIFT+ENTER.

Description

Contents

2.5. Unary operations

The following unary operations can be performed on matrices.

MINVERSE

Returns the inverse matrix.

Syntax

MINVERSE (array)

array is a numerical array (a matrix).

Remarks

Matrix array must be a square one;

For a singular matrix symbol #VALUE is returned.

Example

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Lookup!I9




Fig. 31 Function MINVERSE

MINVERSE is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

TRANSPOSE

Returns the transposed matrix.

Syntax

TRANSPOSE (array)


Example

Fig. 32 Function TRANSPOSE

TRANSPOSE is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

MDETERM

Returns the matrix determinant.

Syntax

MDETERM (array)

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Unitary!B10


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Unitary!B3





Remarks

Matrix array must be a square one.

MDETERM is not an array function and must be entered by pressing ENTER.

Description

Contents

2.6. Binary operations

The following binary operations can be performed on matrices.

MMULT

Return the results of multiplication of two matrices.

Syntax

MMULT (array1, array2)

array1, array2 are numerical arrays (matrices).

Remarks

The number of columns in array1 must be equal to the number of rows in array2, otherwisesymbol #VALUE! is returned;

The number of elements in the resulting matrix must be less than or equal to 5461 (Excel 2003).

Example

Fig. 33 Function MMULT

MMULT is an array function and must be terminated by CTRL+SHIFT+ENTER.


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#MMULT!B11



Description

Contents

2.7. Regression

Several standard worksheet functions involve linear regression .

TREND

Builds the linear regression

y=b+m1 x1+…+mJ xJ+e

Approximates known response values of vector known_y's for given predictor matrix known_x's and returnsresponse values for given matrix new_x's.

Syntax

TREND(known_y's [,known_x's] [,new_x's] [,const])

known_y's is a vector of known response values y (calibration set);

known_x's is an optional argument. Matrix X of known values of predictors (calibration set);

new_x's is an optional argument. Matrix Xnew of new predictor values (test set or new matrix) forwhich the results are calculated and returned;

const is an optional argument. Boolean value which indicates whether parameter b must be equalto zero or not. If const equals TRUE or omitted then b is calculated in an ordinary way.Otherwise b=0.

Remarks

Vector known_y's must be located in a single column. In this case each column of array known_x's istreated as a separate variable;

If argument known_x's is omitted, it is supposed to be the vector of numbers {1;2;3;...} having thesame size as known_y's;

Matrix of new values new_x's must have the same number of columns (variables) as matrixknown_x's;

If argument new_x's is omitted, it is assumed to be the same as known_x's. The result is a vectorwith the same number of rows as in array new_x's.

Example




Fig. 34 Function TREND

TREND is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

LINEST

Calculates statistics for the linear regression

y=b+m1 x1+…+mJ xJ+e

Syntax

LINEST(known_y's [,known_x's] [,new_x's] [,const] [,stats])

known_y's is a vector of known response values y (calibration set);

known_x's is an optional argument. Matrix X of known values of predictors (calibration set);

new_x's is an optional argument. Matrix Xnew of new predictor values (test set or new matrix) forwhich the results are calculated and returned;

const is an optional argument. Boolean value which indicates whether parameter b must be equalto zero or not. If const equals TRUE or omitted then b is calculated in an ordinary way.Otherwise b=0.

stats is an optional argument. Boolean value which indicates the necessity to output additionalstatistics information. If stats is equal to FALSE or omitted, the estimates of regressionparameters mJ, …, m2, m1 and b are returned. Otherwise the following table is output

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Regression!I16




Fig. 35 Table provided by LINEST function

mJ, …, m2, m1 and b are the estimates of the regression parameters;

sJ, …, s2, s1 and sb are the standard errors for the regression parameters estimates;

R2 is the coefficient of determination;

sy is the standard error for the y estimate;

F is the F-statistics;

DoF is the number of the degrees of freedom;

SSreg is the regression sum of squares;

SSres is the residual sum of squares.

Remarks

LINEST is badly designed function and very inconvenient in application;

Remarks included in TREND description are applicable for LINEST function.

Example

Fig. 36 Function LINEST

LINEST is an array function and must be terminated by CTRL+SHIFT+ENTER.

Description

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Regression!M3




Contents

2.8. Critical bug in Excel 2003

Functions TERND and LINEST in Excel 2003 provide wrong results in special circumstances.

Wrong results are yielded when simultaneously

mean value for each predictor variable in X is equal to zero;

mean value of the response vector Y does not equal zero.

Fig. 37 demonstrates such a case. Mean values for all columns of Xc equal zero but mean value for vectorYc does not equal zero.

Example

Fig. 37 Bug in regression functions in Excel 2003

The situation may be corrected by a special trick. You can apply TREND function to the centered responsevalues and afterwards correct the result. For this purpose the following formula

=TREND(Yc-ym, Xc)+ym,

is used (see the same figure).

Strange that this bug was not mentioned by users; however in version Excel 2007 this error is corrected.

Contents

2.9. Virtual array

Often in the course of data processing there is a problem of storing the intermediate computations, whichare not important by themselves, but have to be calculated for yielding the final result. For example,residuals in the PCA decomposition are rarely analyzed per se, but used for calculation of the explainedvariance, orthogonal distances, and etc. At the same time, such intermediate arrays may be very large andmust be calculated for the various numbers of principal components. They cause flooding the worksheets bythe unnecessary, intermediate information. One can avoid this situation applying virtual arrays. Let usexplain this with a simple example.

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Bug!I4



Fig. 38 Example of application of a virtual array

Suppose that we have matrix A and we want to calculate determinate of matrix AtA. Fig. 38 demonstratedtwo methods of calculations. The first method uses a sequence of intermediate calculations indicated by thered arrows. The second method uses only one formula indicated by green arrow. Both methods provide thesame final result, but the first one occupies a lot of space in the worksheet, and the latter one uses severalvirtual intermediate arrays. All these virtual arrays are the same which are calculated by the first method,but they are not output explicitly.

The first virtual array is the transposed matrix At calculated by function TRANSPOSE(A).

The second virtual array is the result of multiplication of the first virtual array and matrix A using functionMMULT(TRANSPOSE(A), A).

Function MDETERM applied to the second virtual array gives the final result.

Virtual arrays are very useful for calculation of various auxiliary characteristics in multivariate data analysis,such as residuals, eigenvalues, leverages, etc. This is explained in details in tutorial Useful Formulae forChemometrics Add-In

Contents

3. Extension of Excel facilities

3.1. VBA programming

Sometimes, the Excel facilities are not enough and we have to add some user-defined features to Excel. Aspecially designed programming tool, Microsoft Visual Basic for Applications (VBA) is used for thesepurposes. VBA helps to create macros (a special set of instruction for performing a sequence of operation)

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#Virtual!B3

http://rcs.chph.ras.ru/Tutorials/tricks_en.htm

http://rcs.chph.ras.ru/Tutorials/tricks_en.htm



and user-defined functions (a special set of instructions for performing calculations in a worksheet). Macrosare used for automation of standard procedures. Once a macro is created it can be repeatedly used for aroutine operation. To run a macro from menu, select the Tools-Macro-Macros item. Sometimes it is moreconvenient to assign a macro to a special new button, placed in the Tool item, or in a worksheet.

User-defined functions are run in the same way as the standard Excel worksheet functions via Formula Bar.

To make the macros and user-defined functions accessible it is necessary to set up a special level of securityvia menu item Tools-Macro-Security (Excel 2003)

Fig. 39 Selection of security level in Excel 2003

For Excel 2007 the selection of the security level is performed via Office Button-Excel Options-Trust Center.



Fig. 40 Selection of the security level in Excel 2007

On each opening of an Excel file, the system will ask the permission for running macros, if level Medium (Excel2003) or Disable all macros with notification (Excel 2007) has been selected. We recommend selection of a securitylevel as it is shown in Fig. 39 and Fig. 40, but mind the reliable antivirus software for testing the alien Excelfiles.

The capabilities of the VBA application are essentially limited on the initial installation of Excel 2007. Torestore these properties you should follow the sequence Office Button– Excel Options–Popular and switch on optionShow Developer Tab in the Ribbon.

Details can be read here.

Contents

3.2. Example

Let us consider the application of VBA by an example. For modeling the nonisothermic kinetics (DSC, TGA,etc.) it is necessary to calculate the integral exponential function E1(x). By definition,

An infinite series can be used for E1(x) calculation


http://en.wikipedia.org/wiki/Exponential_integral

http://www.math.sfu.ca/~cbm/aands/page_229.htm



Each series term can be placed in its own cell in a worksheet. Afterwards one can sum up all these cells.This approach is presented in Fig. 41.

Fig. 41 Calculation of the E1(x) function in a worksheet

We cannot consider this approach as a wise one. First of all, these calculations occupy rather large area inthe worksheet. But the main drawback is that we do not know in advance how many terms should beincluded in the summation. Sometimes 10 terms are sufficient, but in other cases 50 terms are not enough.

Contents

3.3. Macro example

The second way of calculation is application of a recurrent formula that links two neighboring terms of theseries

For application of this formula it is necessary to manage the recurrence calculations in a worksheet. The

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#VBA!D5



example is shown in Fig. 42.

Fig. 42 Calculation of the E1(x) function by recurrence method

One iteration step transfers the values from range J2:J4 into range L2:L4. To start the next iteration step itis necessary to copy the values from range L2:L4 and paste them into range J2:J4. This copy-pasteoperation should insert only values without formulas, i.e. Copy-Paste Special must be used. Range H2:H4 containsinitial values for starting the iteration process. Numerously repeating the Copy-Paste Special operation, one canyield the target value in cell L4, though this is boring.

It is better to make a macro for the procedure automation. The simplest way for designing a macro isrecording the operations performed in a worksheet. Go to the menu item Tools–Macro–Record New Macro. In thedialog window (Fig. 43) one can name a macro and point the place where this macro will be stored.

Fig. 43 Dialog window for the macro recording

After the OK button is pressed, all operations performed in a worksheet are recorded in the macro procedure.When recording is over, it must be terminated by the Tools–Macro–Stop Recording command. The result may beseen in the Visual Basic editor.

http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#VBA!L3



Fig. 44 Visual Basic editor

Fig. 44 illustrates the macro after the minor correction. We added a cycle for repeating the Copy-Paste

operation for nIter times. Value nIter is determined in cell J6 in the worksheet. Cell J6 has a local name n.For finalizing of the automation, we use button Repeat with macro Iteration assigned to this button.

More about VBA macros can be read here.

Contents

3.4. User-defined function example

At last, the wisest decision for calculation of the integral exponential function is the designing of the user-defined function with the help of VBA

Function IntExp(dX) GammaConst = 0.5772156 Eps = 0.00000000001 IntExp = -GammaConst - Log(dX) nFact = 1 For n = 1 To 1000 nFact = nFact * n dR = (-dX) ^ n / n / nFact IntExp = IntExp - dR If (Abs(dR) < Eps) Then Exit Function Next n End Function

Fig. 45 IntExp function

Fig. 45 illustrates the VBA code and the function reference for this example.

We will not concern the VBA programming as this is a very large and complex issue. For self-study one canread a numerous literature elsewhere in the internet, for example here.

VBA macros are rather slow and they badly fit for the large calculations. For example, we do not recommendprogramming the PCA decomposition procedure in VBA. For the large arrays such a procedure will work veryslowly. Better to consider Excel and VBA as a front end tool for input and output of data, which are furtherpassed to the dynamic link library (DLL), coded in a fast programming language, such as C++ (back end).


http://rcs.chph.ras.ru/Tutorials/excel_en/Excel.xls#VBA!P2

http://www.excel-vba.com/excel-vba-contents.htm



This specific approach was used in Fitter and Chemometrics Add-In.

More about the user-defined functions can be read here.

Contents

3.5. Add-in

The VBA programs created by a user are stored in the same Excel workbook in which they were created. Tomake the macros available for other workbooks one can either copy the macros into these workbooks, orcreate an Add-In application.

Add-In is a special Excel file containing several VBA modules linked to the required DLL libraries. Add-In maybe added to Excel to extend its capabilities.

Standard Excel configuration includes several Add-Ins. The most interesting among them are: Solver andAnalysis Toolpak Add-Ins.

Solver is designed to optimize a value in the objective cell. The result is calculated with respect to other cellsfunctionally linked to the objective one.

Analysis Toolpak contains a set of statistical functions for data analysis

There are various Excel Add-Ins in the Internet. Some of them may be downloaded for free and others arecommercial. Bellow are some of these programs.

The J-Walk Chart Tools Add-In is a free utility for the chart management (add the legend, names, etc. to thechart elements); all the features that are absent in the standard Excel package

XLStat is a large and expensive package for statistical analysis, including PLS regression

The Multivariate Analysis Add-In is a shareware package for the multivariate data analysis from BristolUniversity

Fitter is an Add-In for nonlinear regression analysis

Chemometrics is a set of worksheet functions for multivariate data analysis

More about the Add-Ins can be read here.

Contents

3.6. Add-in installation

Before application of an Add-In program one must perform an installation consisting of two steps.

At the first step all files comprised the Add-In package must be placed in the computer. Some Add-Insprovide a special Setup program for automatic installation. In other cases, the files should be located by auser. We will explain how this should be performed.

A package always includes a file with extension XLA and several additional files with extension DLL, HLP, andetc. All these additional files should be placed in folder C:\Windows, or C:\Windows\System, or C:\Windows\System32.

The main XLA file may be located in any folder. But two following folders are preferable. Microsoftrecommends that XLA file should be placed in folder C:\Documents and Settings\User\Application Data\Microsoft\AddIns (hereUser is the login name under which the Windows is currently running on). Placing XLA file in this directoryprovide an easy access to the file at the second step of installation. At the same time when the workbooksare run on several computers under the different User names, the links to the XLA files are lost. Therefore, it

http://polycert.chph.ras.ru/fitter.htm

http://rcs.chph.ras.ru/Tutorials/projection_en.htm


.http://office.microsoft.com/en-us/excel/HP051983681033.aspx


http://spreadsheetpage.com/index.php/file/j_walk_chart_tools_add_in/

http://www.xlstat.com/en/home/

http://www.chm.bris.ac.uk/org/chemometrics/addins/index.html

http://polycert.chph.ras.ru/fitter.htm





is necessary to update the links continuously.

We recommend to place file Chemometrics.xla in a folder that has the same name at different computers,e.g. C:\Program Files\Chemometrics. Automated installation of Chemometrics Add-In is given here.

The second step of installation is performed from an open Excel workbook. For Excel 2003 it is necessary togo to Tools-Add-Ins., and in Excel 2007 to Office Button– Excel Options– Add-Ins–Go . In the open window (see Fig. 46)you must press Browse and find a pertinent XLA file.

Fig. 46 Add-in installation

After an Add-In has been installed, it may be loaded and unloaded by selecting the corresponding option infront of the Add-In name.

To uninstall an Add-In program, it is necessary to delete a check near its in name, close Excel, and deletefrom the computer all files related to this Add-In.

Contents

Conclusion

We consider only the main features of application of Excel for matrix manipulations. A lot of details left outof this text-book. Tutorial called Projection methods in Excel may partly fill this gap



http://rcs.chph.ras.ru/Tutorials/installation_en.htm




Contents

Matrix calculations in Excel.pdf

Documents

Transcript of Matrix calculations in Excel.pdf