2014 How to Run PAM Using R in Combination With Coffalyser for P376 Customer Support Material

How to Run PAM analysis using R in combination with Coffalyser.NET program

To classify a sample as BRCA1-like or non-BRCA1-like, a classifier in the statistical programming language R can be used. This classifier was developed using Prediction Analysis for Microarrays (PAM) (Tibshirani R et al. 2002, PNAS, 99:6567-72).

PAM is an approach to cancer class prediction from gene expression profiling, based on an enhancement of the simple nearest prototype (centroid) classifier. The class prototypes are shrunk towards the overall centroid, and the resulting classifier is often more accurate than competing methods. The method of “nearest shrunken centroids” identifies the subsets of genes that best characterize each class. The technique is general and can be used in many other classification problems. More information about this method can be found at:

http://statweb.stanford.edu/~tibs/SAM/
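The nearest shrunken centroid idea can be illustrated with a minimal base-R sketch. Each class centroid is standardized against the overall centroid, shrunk towards it by soft thresholding with a threshold delta, and new samples receive posterior class probabilities from their standardized distances to the shrunken centroids. The helper names nsc_train and nsc_posterior are hypothetical and the scaling constants are a simplified assumption; this illustrates the technique and is not the pamr implementation used for the actual calls.

```r
# Minimal nearest-shrunken-centroid sketch in base R (hypothetical helpers,
# simplified from the PAM idea; use pamr for real classifications).
# x: numeric matrix, rows = probes, columns = samples; y: class per column.
nsc_train <- function(x, y, delta) {
  y <- factor(y)
  n <- ncol(x)
  classes <- levels(y)
  K <- length(classes)
  overall <- rowMeans(x)
  # Class centroids, one column per class
  cent <- matrix(sapply(classes, function(k) rowMeans(x[, y == k, drop = FALSE])),
                 nrow = nrow(x))
  # Pooled within-class standard deviation per probe, plus offset s0
  resid <- x - cent[, as.integer(y), drop = FALSE]
  s <- sqrt(rowSums(resid^2) / (n - K))
  s0 <- median(s)
  nk <- as.vector(table(y))
  mk <- sqrt(1 / nk - 1 / n)
  shrunk <- cent
  for (k in seq_len(K)) {
    # Standardized distance of the class centroid from the overall centroid
    d <- (cent[, k] - overall) / (mk[k] * (s + s0))
    # Soft thresholding: move d towards zero by delta, stopping at zero
    d <- sign(d) * pmax(abs(d) - delta, 0)
    shrunk[, k] <- overall + mk[k] * (s + s0) * d
  }
  list(centroids = shrunk, scale = s + s0, prior = nk / n, classes = classes)
}

# Posterior class probabilities for new samples (columns of xnew)
nsc_posterior <- function(fit, xnew) {
  xnew <- as.matrix(xnew)
  K <- length(fit$classes)
  score <- sapply(seq_len(K), function(k)
    colSums(((xnew - fit$centroids[, k]) / fit$scale)^2) - 2 * log(fit$prior[k]))
  score <- matrix(score, ncol = K)  # one row per new sample
  # Subtract each row's minimum before exponentiating, for numerical stability
  e <- exp(-(score - apply(score, 1, min)) / 2)
  post <- e / rowSums(e)
  colnames(post) <- fit$classes
  post
}
```

With delta = 0 the centroids are not shrunk at all (the generated pamr code in Step 9 likewise uses threshold = 0). With a large delta every probe centroid is pulled onto the overall centroid, and the posterior probabilities fall back to the class proportions.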

Contents

How to Run PAM analysis using R in combination with Coffalyser.NET program
Coffalyser.Net - Support
Contact us
Using R scripts for calling your BRCA1ness classification with P376
    Step 1: Collect all relevant data / programs
    Step 2: Windows regional settings
    Step 3: Install Coffalyser.Net and install R
    Step 4: Install pamr package in R
Create your training data file
    Formats of export files
    Step 5. Analyze your training data set
    Step 6. Open the experiment explorer and export data in R format
    Step 7. Add the classification to the txt training data
Calling your unknown data
    Step 8. Analyze your test data
    Step 9. Call your data in R
    Output file

Coffalyser.Net - Support

• Coffalyser.Net home (WordPress)
  http://coffalyser.wordpress.com/

• YouTube (flash instruction videos)
  http://www.youtube.com/user/Coffalyser

• Registration page (click on login on the left side)
  http://www.mlpa.com

• Wiki (our old home for support material)
  http://wiki.coffalyser.net

• Publication on the analysis methods (open-access book chapter)
  http://www.intechopen.com/books/modern-approaches-to-quality-control/analysis-of-mlpa-data-using-novel-software-coffalyser-net-by-mrc-holland

• MRC-Holland main website
  http://www.mlpa.com

• Download R for your operating system at:
  http://cran.r-project.org/bin/windows/base/

NOTE: R version 2.15.1 has been tested; see Step 1 for all R versions tested at MRC-Holland.

Contact us

• MRC-Holland provides free support to all Coffalyser.Net users.

• For general MLPA related questions you can send an email to [email protected]

• For Coffalyser.Net related questions you can send an email to [email protected]

Using R scripts for calling your BRCA1ness classification with P376

Step 1: Collect all relevant data / programs

You need to have:

• The latest version of Coffalyser.Net, v.140425.1321 (www.mlpa.com)

• The R program (versions 3.1.1, 3.0.3 and 2.15.1 have been tested at MRC-Holland)
  (http://cran.r-project.org/bin/windows/base/old/)

• The training data in ABIF format (if not provided with this manual, send an email to [email protected])

• The unknown data you plan to classify as BRCA1-like or non-BRCA1-like, in ABIF format

Step 2: Windows regional settings

Please note that, in our experience, the method only works if your Windows regional settings use a dot (.) as the decimal separator and a comma (,) as the thousands separator. You can adjust these settings under:

Start Menu → Control Panel → Clock, Language, and Region → Region and Language → Formats tab → Additional settings → Customize Format
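A likely reason for this requirement is that R converts text to numbers with a dot as the decimal separator, regardless of the Windows settings, so a value exported as "0,95" is not read as a number. The sketch below shows this effect and flags values that look like comma decimals. It assumes the exported file is tab-delimited text; the helper names looks_comma_decimal and check_export_file are hypothetical.

```r
# R's as.numeric() only accepts "." as the decimal separator,
# so comma-decimal text becomes NA (with a warning, suppressed here).
parsed <- suppressWarnings(as.numeric(c("0.95", "0,95")))
print(parsed)  # 0.95 NA

# Hypothetical helper: flag entries that look like comma-decimal numbers
# (e.g. "0,95"). A value such as "1,234" is ambiguous: it could also be a
# number with a thousands separator.
looks_comma_decimal <- function(v) grepl("^-?[0-9]+,[0-9]+$", v)

# Hypothetical helper: check an exported tab-delimited file.
# Returns TRUE if no comma-decimal values were found.
check_export_file <- function(file) {
  fields <- unlist(strsplit(readLines(file, warn = FALSE), "\t", fixed = TRUE))
  bad <- fields[looks_comma_decimal(fields)]
  if (length(bad) > 0) {
    warning(length(bad), " value(s) use a comma as decimal separator, e.g. ", bad[1])
  }
  length(bad) == 0
}
```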

Step 3: Install Coffalyser.Net and install R

Install Coffalyser.Net following the installation manual provided with the setup files. Also install one of the R versions for Windows mentioned in Step 1, following the instructions on the screen.

Step 4: Install pamr package in R

You will need to install the R package for PAM training and calling (pamr). In the R program, click on “Packages” in the menu bar and then select “Install package(s)”.

Next, select a mirror (download site) close to your location. Then scroll through the list of packages, select “pamr” and click on “OK”.
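The same installation can also be done from the R Console instead of the menus. The sketch below uses only base R functions (require, install.packages and library); the helper name ensure_package is hypothetical. It installs a package from CRAN only if it cannot already be loaded, so it can safely be run more than once.

```r
# Hypothetical helper: load a package, installing it from CRAN first if needed.
ensure_package <- function(pkg) {
  if (!suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))) {
    # install.packages() may ask you to choose a CRAN mirror first
    install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
  invisible(TRUE)
}
```

Typing ensure_package("pamr") in the R Console then loads pamr, installing it first if necessary.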

Create your training data file

Before you can classify your unknown data, you first need to create a training data set from samples whose type is already known. MRC-Holland can provide a set of samples in ABIF format that includes reference samples and test samples. The test samples include both sporadic tumors and BRCA1ness tumors. These samples were analyzed using the P376 lot B2-0911 MLPA mix, and the fragment products were separated on an ABI-3130 XL genetic analyzer with a GS-500 LIZ size marker.

This training set can also be used when test samples are analysed with P376 lot B3-0414.

Formats of export files

Coffalyser.Net has a special export function that writes files in a format that R can read directly. If you do not wish to use the export function, please consult the R and PAM manuals to create input files in the correct format.

Please note: both data sets (training and unknown data) need to be normalised in the same way. We recommend using Coffalyser.Net to normalise your data. However, if you use the global normalisation method described by the NKI (see the detailed instructions in Lips E. et al. 2011 Breast Cancer Res. 13(5):R107), you need to normalise your unknown data set in exactly the same way.

Mosaicism: if your tumor samples contain normal cells, better results may be obtained by changing the arbitrary borders to 0.85-1.2.
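A short worked example may clarify why narrower borders help. Assuming a diploid genome and a tumour fraction f (the rest being normal cells), a one-copy loss in the tumour gives an expected ratio of 1 - f/2 and a one-copy gain 1 + f/2. With 40% normal cells (f = 0.6) these are 0.7 and 1.3: outside 0.85-1.2, but called normal by wider borders such as the hypothetical 0.65-1.35 used below. The simple ratio model and the helper names expected_ratio and call_ratio are illustrative assumptions, not how Coffalyser.Net computes its calls.

```r
# Illustrative model (assumption): diploid genome, tumour fraction f,
# copy_change = -1 for a one-copy loss, +1 for a one-copy gain.
expected_ratio <- function(f, copy_change) 1 + f * copy_change / 2

# Hypothetical call rule: ratios below `lower` are losses, above `upper` gains.
call_ratio <- function(ratio, lower = 0.85, upper = 1.2) {
  ifelse(ratio < lower, "loss", ifelse(ratio > upper, "gain", "normal"))
}

r <- expected_ratio(f = 0.6, copy_change = c(-1, 0, 1))  # 0.7, 1.0, 1.3
call_ratio(r)                              # "loss" "normal" "gain"
call_ratio(r, lower = 0.65, upper = 1.35)  # "normal" "normal" "normal"
```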

Step 5. Analyze your training data set

Open Coffalyser.Net and analyze the training data set provided by MRC-Holland according to the analysis manual on the Coffalyser.Net home page. Use the reference samples and No DNA samples listed below.

Reference samples:

1. P376-B2-0911-NEW MB-NKI-REF1-CHE-1

2. P376-B2-0911-NEW MB-NKI-REF2-CHE-1

3. P376-B2-0911-NEW MB-NKI-REF3-CHE-2

4. P376-B2-0911-NEW MB-NKI-REF4-CHE-3

5. P376-B2-0911-NEW MB-NKI-REF5-CHE-3

6. P376-B2-0911-NEW MB-NKI-REF6-CHE-4

7. P376-B2-0911-NEW MB-NKI-REF7-CHE-5

8. P376-B2-0911-NEW MB-NKI-REF8-CHE-6

9. P376-B2-0911-NEW MB-NKI-REF9-CHE-7

10. P376-B2-0911-NEW MB-NKI-REF10-CHE-7

11. P376-B2-0911-NEW MB-NKI-REF11-CHE-8

12. P376-B2-0911-NEW MB-NKI-REF12-CHE-9

13. P376-B2-0911-NEW MB-NKI-REF13-CHE-10

14. P376-B2-0911-NEW MB-NKI-REF14-CHE-10

15. P376-B2-0911-NEW MB-NKI-REF15-CHE-11

16. P376-B2-0911-NEW MB-NKI-REF15-CHE-11-2

No DNA control samples:

1. P376-B2-0911-NEW MB-NKI-NODNA-CHE-5

2. P376-B2-0911-NEW MB-NKI-NODNA-CHE-10

Note: during the analysis we recommend including only samples with a 100% FMRS score, i.e. only samples that show 4 bars for the FMRS score, in the comparative analysis. Also note that if you change any analysis settings, you must apply the same changes consistently to both your test samples and the training data set!

Step 6. Open the experiment explorer and export data in R format

Open the experiment results from the experiment analysis form.

In the ‘Comparative Analysis Experiment Explorer’, press the key combination Ctrl + Shift + Alt + R. This lets you save the grid data to a txt file in a format that can be used by R.

Do not create the R-script file yet; you will use this option for your test data later.

Step 7. Add the classification to the txt training data

Open the training set data in Excel. The first row contains the sample names. You can make the names easier to recognize by replacing “P376-B2-0911-NEW MB-NKI-” with nothing in the entire row: select the entire row and use Find and Replace (Ctrl+H).

Now select all the columns that contain the reference samples (marked with “REF”) and delete these columns from the worksheet. Also delete the columns with the sample names N20120-CHE-10, N16986-CHE-10, B1022-CHE-6 and C020-CHE-7.
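If you prefer to make these edits in R rather than in Excel, they can be sketched as follows. This assumes that the exported txt file is tab-delimited, that the first row holds the sample names, and that the first two columns hold probe annotation (the layout read by pamr.from.excel). clean_training_grid and read_grid are hypothetical helpers, so check the result against the steps above.

```r
# Hypothetical helper: apply the Step 7 edits to the exported training grid.
# Assumes row 1 = sample names and columns 1-2 = probe annotation.
clean_training_grid <- function(grid,
                                prefix = "P376-B2-0911-NEW MB-NKI-",
                                drop = c("N20120-CHE-10", "N16986-CHE-10",
                                         "B1022-CHE-6", "C020-CHE-7")) {
  # Remove the common prefix from the sample names
  sample_names <- sub(prefix, "", grid[1, ], fixed = TRUE)
  is_sample <- seq_len(ncol(grid)) > 2
  # Reference samples (names containing "REF") and the four listed samples
  remove <- is_sample & (grepl("REF", sample_names, fixed = TRUE) |
                           sample_names %in% drop)
  grid[1, ] <- sample_names
  grid[, !remove, drop = FALSE]
}

# Hypothetical helper: read the exported file as text so no values are altered
read_grid <- function(file) {
  as.matrix(read.delim(file, header = FALSE, colClasses = "character",
                       quote = "\""))
}
```

For example, clean_training_grid(read_grid('C:/PAM/p376 0911 trainingset.txt')) returns the cleaned grid as a character matrix.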

Now add the classification of each sample in row 2, directly underneath the sample name. The classification of all samples in the training set is given in the table below. Be sure to use exactly these classification names for all samples: if you accidentally add a single extra character, that sample will be treated as a new group.

Table 1: Classification of all samples in the training data

Sporadic_Like    BRCA1_Like
2058             2131
2124             2165
2134             2224
2151             2254
2169             2312
2175             2355
2182             B1007
2188             B1035
2195             B1045
2204             B1049
2216             B1058
2227             B1061
2232             B1064
2234             B1065
2276             C119
2278             C121
2295             T4147
2298             T6701
2350
C035
C036
C044
C048
C065
C068
C127
C128
C129
C130

Now save the changes you made to the grid. Be sure that you KEEP the file format Text (Tab delimited)!
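The labelling and saving can also be sketched in R. The hypothetical helper add_class_row assumes sample names of the form ID-CHE-n, as in the names listed in Step 7, and that the first two columns hold probe annotation. It takes the sample IDs of each class from Table 1, inserts the class row as row 2, and stops with an error if any sample has no classification, which guards against the mistyped-label problem described above. write_grid (also hypothetical) saves the result as tab-delimited text.

```r
# Hypothetical helper: insert the class row (row 2) using the IDs in Table 1.
# Assumes row 1 = sample names like "2058-CHE-1" and columns 1-2 = annotation.
add_class_row <- function(grid, sporadic_ids, brca1_ids) {
  samples <- grid[1, -(1:2)]
  ids <- sub("-CHE-.*$", "", samples)
  labels <- ifelse(ids %in% brca1_ids, "BRCA1_Like",
                   ifelse(ids %in% sporadic_ids, "Sporadic_Like", NA))
  if (any(is.na(labels))) {
    stop("No classification found for: ",
         paste(samples[is.na(labels)], collapse = ", "))
  }
  rbind(grid[1, ], c("", "", labels), grid[-1, , drop = FALSE])
}

# Hypothetical helper: save as tab-delimited text without quotes or row/column names
write_grid <- function(grid, file) {
  write.table(grid, file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
```

The two ID vectors passed to add_class_row should contain all entries of the corresponding Table 1 columns, for example sporadic_ids = c("2058", "2124", ...) and brca1_ids = c("2131", "2165", ...).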

Calling your unknown data

Step 8. Analyze your test data

Now analyze your unknown test data and open the “Comparative Analysis Experiment Explorer”. On the overview tab, press the key combination Ctrl + Alt + Shift + R. This generates a txt file suitable for importing into R. Save this file in the same folder as your test data!

When asked whether to create an R-script, answer “Yes”.

You will now be asked to select the file that you want to use as training data. Select the training data txt file to which you just added the classification and click on “Open”.

Please note: the R code needed to train the classifier and make the calls is now copied to your clipboard, so that you do not need to type in the code that points R to all the relevant file locations. To use this option, open the R program directly afterwards and paste the content of your clipboard into the “R Console”, as explained in the next step. The R code is also saved as a txt file in the same location.

Step 9. Call your data in R

Open RGui and paste the content of the clipboard into the R Console. Depending on the locations of your files, your R code will look something like this:

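# Install pamr via the Bioconductor installer (only needed once, requires internet access;
# pamr can also be installed from CRAN as described in Step 4)
source("http://bioconductor.org/biocLite.R")
biocLite("pamr")
library(pamr)
# Read the training data; 52 is the total number of columns in the training file
pamrB1excel <- pamr.from.excel('C:/PAM/p376 0911 trainingset.txt', 52, sample.labels = TRUE,
    batch.labels = FALSE)
# Train the nearest shrunken centroid classifier
pamr_b1_vs_spor.train <- pamr.train(pamrB1excel)
# Read the test data; 27 is the total number of columns in the test file
pamrB1exceltest <- pamr.from.excel('C:/PAM/p376 0911 GM.txt', 27, sample.labels = TRUE,
    batch.labels = FALSE)
# Predicted classes (threshold = 0 means no shrinkage)
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold = 0)
table(pamrB1exceltest$y, test_predict)
# Posterior class probabilities for each test sample
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold = 0,
    type = "posterior")
test_predict
data.frame(SampleID = pamrB1exceltest$samplelabels, test_predict)
# Write the posterior probabilities to a tab-delimited output file
write.table(data.frame(sample = pamrB1exceltest$samplelabels, test_predict), sep = "\t",
    row.names = FALSE, file = 'C:/PAM/OUTPUT FILE.txt')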

----------------------------------------------------------------------------------------------

If you want to type in the R code yourself, you will need to replace the file locations with the correct ones. Please note: depending on the R version that you installed, the PAM code may work directly, or you may receive an error message indicating a missing package.
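When typing the code yourself, the second argument of pamr.from.excel (52 and 27 in the example above) also has to match your own files. Assuming this argument is the total number of columns in the tab-delimited file, it can be counted rather than typed by hand. count_columns is a hypothetical helper built on the base R function count.fields; it also stops if the rows do not all have the same number of columns.

```r
# Hypothetical helper: count the tab-separated columns in a file and check
# that every row has the same number of columns.
count_columns <- function(file) {
  n <- count.fields(file, sep = "\t", quote = "\"", comment.char = "")
  if (length(unique(n)) != 1) {
    stop("Rows have different numbers of columns: ",
         paste(sort(unique(n)), collapse = ", "))
  }
  n[1]
}
```

For example: pamrB1excel <- pamr.from.excel('C:/PAM/p376 0911 trainingset.txt', count_columns('C:/PAM/p376 0911 trainingset.txt'), sample.labels = TRUE, batch.labels = FALSE).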

If you receive error messages such as:

Error: could not find function "pamr.from.excel"
Error: could not find function "pamr.train"

then the pamr package is probably not installed in R (or not loaded with library(pamr)). Please see Step 4 on how to install the pamr package.

Output file

Your output will look something like this. Please first run an experiment with samples of known classification to validate that the method works! The output is also saved as a file in the same folder as all the other data. This file is called OUTPUT FILE.txt, and the calls in this file are given as posterior class probabilities. The cut-off for classifying a sample as ‘BRCA1-like’ should be set at 0.5 on the BRCA1-like probability; below this value, a sample should be classified as ‘non-BRCA1-like’ (Lips E. et al. 2011 Breast Cancer Res. 13(5):R107).
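The 0.5 cut-off can also be applied in R. This sketch assumes that OUTPUT FILE.txt has a column named sample followed by one posterior-probability column per class, named after the class labels used in the training data (here BRCA1_Like and Sporadic_Like), as written by the write.table line in the generated code. call_brca1ness is a hypothetical helper.

```r
# Hypothetical helper: classify samples from the pamr output file using the
# cut-off on the BRCA1_Like posterior probability (0.5 by default).
call_brca1ness <- function(file, cutoff = 0.5, class_col = "BRCA1_Like") {
  out <- read.delim(file, check.names = FALSE, stringsAsFactors = FALSE)
  if (!class_col %in% names(out)) stop("Column not found: ", class_col)
  p <- out[[class_col]]
  data.frame(sample = out$sample,
             p_brca1 = p,
             call = ifelse(p >= cutoff, "BRCA1-like", "non-BRCA1-like"),
             stringsAsFactors = FALSE)
}
```

For example, call_brca1ness('C:/PAM/OUTPUT FILE.txt') returns one row per sample with its BRCA1-like probability and call.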