2014 How to Run PAM Using R in Combination With Coffalyser for P376 Customer Support Material
How to Run PAM analysis using R in combination
with Coffalyser.NET program
To classify a sample as BRCA1-like or non-BRCA1-like, a classifier in the statistical programming language R
can be used. This classifier was developed using Prediction Analysis for Microarrays (PAM) (Tibshirani R et
al. 2002, PNAS, 99:6567-72).
PAM is an approach to cancer class prediction from gene expression profiling, based on an enhancement of
the simple nearest prototype (centroid) classifier. The class prototypes are shrunk toward the overall
centroid, which yields a classifier that is often more accurate than competing methods. The method of
“nearest shrunken centroids” identifies
subsets of genes that best characterize each class. The technique is general and can be used in many other
classification problems. More information about this method can be found at:
http://statweb.stanford.edu/~tibs/SAM/
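The shrinkage step at the heart of the method can be illustrated in a few lines of R. The sketch below is not
the pamr implementation. It only shows the soft-thresholding rule described by Tibshirani et al., in which each
standardized difference between a class centroid and the overall centroid is moved toward zero by an amount
delta and set to zero if it would cross zero. The function name soft_threshold and the example values are
made up for illustration.

```r
# Soft thresholding: shrink each value toward zero by delta,
# and set it to exactly zero when its absolute value is below delta.
soft_threshold <- function(d, delta) {
  sign(d) * pmax(abs(d) - delta, 0)
}

# Hypothetical standardized centroid differences for three probes
d <- c(-2.0, 0.5, 3.0)

shrunken <- soft_threshold(d, delta = 1)
print(shrunken)

# With delta = 0 nothing is shrunk
unshrunk <- soft_threshold(d, delta = 0)
print(unshrunk)
```

Probes whose difference is shrunk to zero for every class no longer influence the classification, which is how
the method selects a subset of probes.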
Contents
How to Run PAM analysis using R in combination with Coffalyser.NET program
Coffalyser.Net - Support
Contact us
Using R scripts for calling your BRCA1ness classification with P376
Step 1: Collect all relevant data / programs
Step 2: Windows regional settings
Step 3: Install Coffalyser.Net and install R
Step 4: Install pamr package in R
Create your training data file
Formats of export files
Step 5. Analyze your training data set
Step 6. Open the experiment explorer and export data in R format
Step 7. Add the classification to the txt training data
Calling your unknown data
Step 8. Analyze your test data
Step 9. Call your data in R
Output file
Coffalyser.Net - Support
• Coffalyser.net Home Wordpress
http://coffalyser.wordpress.com/
• YouTube (flash instruction videos)
http://www.youtube.com/user/Coffalyser
• Registration page, click on login on the left side:
http://www.mlpa.com
• Wiki (our old home for support material)
http://wiki.coffalyser.net
• Publication with regard to analysis methods (open book)
http://www.intechopen.com/books/modern-approaches-to-quality-control/analysis-of-mlpa-data-using-novel-
software-coffalyser-net-by-mrc-holland
• MRC-Holland Main
http://www.mlpa.com
• Download R for your operating system at:
http://cran.r-project.org/bin/windows/base/
NOTE: we only tested version 2.15.1
Contact us
• MRC-Holland provides free support to all Coffalyser.Net users.
• For general MLPA related questions you can send an email to [email protected]
• For Coffalyser.Net related questions you can send an email to [email protected]
Using R scripts for calling your BRCA1ness classification with P376
Step 1: Collect all relevant data / programs
You need to have:
• The latest version of Coffalyser.Net, v.140425.1321 (www.mlpa.com)
• The R program (versions 3.1.1, 3.0.3 and 2.15.1 have been tested at MRC-Holland)
(http://cran.r-project.org/bin/windows/base/old/)
• The training data in ABIF format (if not yet provided with this manual, send an email to [email protected])
• The unknown data that you are planning to classify as BRCA1-like or non-BRCA1-like, in ABIF format
Step 2: Windows regional settings
Please note that in our experience the method only works if your regional settings use a dot as the
decimal separator and a comma as the thousands separator. You can adjust these settings under:
Start Menu > Control Panel > Clock, Language and Region > Region and Language > On the tab
Formats > Additional Settings > Customize Format
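Whether the regional settings took effect can be checked on the exported values themselves. The following is a
small sketch, assuming the exported ratios are available in R as text. The helper name has_comma_decimal
and the example values are hypothetical and not part of Coffalyser.Net or pamr.

```r
# Flag values that are written with a comma as the decimal separator, e.g. "0,98".
has_comma_decimal <- function(values) {
  grepl("^-?[0-9]+,[0-9]+$", values)
}

# Hypothetical exported values: the second one uses a decimal comma
exported <- c("1.02", "0,98", "1.15")

flagged <- has_comma_decimal(exported)
print(exported[flagged])
```

If any value is flagged, the regional settings were probably not applied before the export was made.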
Step 3: Install Coffalyser.Net and install R
Install Coffalyser.Net using the installation manual provided with the setup files. Also install one of the
R versions for Windows mentioned in step 1, following the instructions on the screen.
Step 4: Install pamr package in R
You will need to install the R-package for PAM training and calling. From the menu bar click on “Packages”
and then select “Install package(s)”.
Next you will need to select a mirror. Select the closest mirror to your location. Now scroll through the list of
packages and select “pamr” and click on “OK”.
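The same installation can also be done from the R console instead of the menu. The sketch below assumes that
a CRAN mirror is reachable from your computer. The helper name ensure_package is hypothetical. It uses only
the base R functions requireNamespace and install.packages.

```r
# Install a package only when it is not yet available, then report whether it can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Demonstrated on "stats", which ships with R, so nothing is downloaded here.
stats_ok <- ensure_package("stats")
print(stats_ok)

# For this manual you would run:
# ensure_package("pamr")
```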
Create your training data file
Before you can classify your unknown data you will first need to make a training data set using a set of
samples of which the type is already known. MRC-Holland can provide a set of samples in ABIF format that
includes reference samples and test samples. Within the selection of test samples, there are sporadic tumors
and BRCA1ness tumors. These samples were analyzed using the P376 lot B2-0911 MLPA mix and the
fragment products were separated on an ABI-3130 XL genetic analyzer with a GS-500 LIZ size marker.
This training set can also be used when test samples are analysed with P376 lot B3-0414.
Formats of export files
Coffalyser.Net has a special export function that will export files to a format that can directly be accepted by
R. If you do not wish to use the export function, then please consult the manual of R and PAM in order to
create input files in the correct format.
Please note: both data types (training data and unknown data) need to be normalised in the same way. Our
recommendation is to use Coffalyser.Net to normalise your data. However, if you are using the global
normalisation method described by the NKI (see detailed instructions in Lips E. et al. 2011 Breast Cancer Res.
13(5):R107), you need to normalise your unknown data set in exactly the same way.
Mosaicism: if your tumor samples contain normal cells, better results may be obtained by
changing the arbitrary borders to 0.85-1.2.
Step 5. Analyze your training data set
Open Coffalyser.Net and analyze the training data set provided by MRC-Holland according to the analysis
manual that can be found on the Coffalyser.Net home page. Use the reference samples and No DNA samples
as provided below.
Reference samples:
1. P376-B2-0911-NEW MB-NKI-REF1-CHE-1
2. P376-B2-0911-NEW MB-NKI-REF2-CHE-1
3. P376-B2-0911-NEW MB-NKI-REF3-CHE-2
4. P376-B2-0911-NEW MB-NKI-REF4-CHE-3
5. P376-B2-0911-NEW MB-NKI-REF5-CHE-3
6. P376-B2-0911-NEW MB-NKI-REF6-CHE-4
7. P376-B2-0911-NEW MB-NKI-REF7-CHE-5
8. P376-B2-0911-NEW MB-NKI-REF8-CHE-6
9. P376-B2-0911-NEW MB-NKI-REF9-CHE-7
10. P376-B2-0911-NEW MB-NKI-REF10-CHE-7
11. P376-B2-0911-NEW MB-NKI-REF11-CHE-8
12. P376-B2-0911-NEW MB-NKI-REF12-CHE-9
13. P376-B2-0911-NEW MB-NKI-REF13-CHE-10
14. P376-B2-0911-NEW MB-NKI-REF14-CHE-10
15. P376-B2-0911-NEW MB-NKI-REF15-CHE-11
16. P376-B2-0911-NEW MB-NKI-REF15-CHE-11-2
No DNA control samples:
1. P376-B2-0911-NEW MB-NKI-NODNA-CHE-5
2. P376-B2-0911-NEW MB-NKI-NODNA-CHE-10
Note: during the analysis we recommend including only samples that score 100% on the FMRS, i.e. samples
that show 4 bars for the FMRS score in the comparative analysis. Also note that if you change any analysis
settings, you must apply the same changes consistently to both your test samples and the training data
set!
Step 6. Open the experiment explorer and export data in R format
Open the experiment results from the experiment analysis form.
In the ‘Comparative Analysis Experiment Explorer’ use the key combination Ctrl + Shift + Alt + R. This will
allow you to save the grid data in a specific txt file format that can be used in R.
Do not create the R-script file yet; you will need this option later for your test data.
Step 7. Add the classification to the txt training data
Open the training set data in Excel. In the first row you will see the sample names. You can make the names
easier to recognize by replacing the prefix P376-B2-0911-NEW MB-NKI- with nothing (an empty string) in the
entire row. You can do this by selecting the entire row and using the key combination “Ctrl-F” to open Find
and Replace.
Now select all the columns that contain the reference samples (noted with “REF”) and remove these
columns from the worksheet. Also remove the columns with the sample names N20120-CHE-10,
N16986-CHE-10, B1022-CHE-6 and C020-CHE-7.
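For readers who prefer to prepare the names in R rather than Excel, the same clean-up can be sketched as
follows. This only illustrates the logic on a vector of sample names. It does not read or write the exported
file, whose exact layout is not reproduced here. The two names containing 2058 and B1007 are hypothetical
full names made up for the example.

```r
# Example sample names as they might appear in the first row of the export
sample_names <- c(
  "P376-B2-0911-NEW MB-NKI-REF1-CHE-1",
  "P376-B2-0911-NEW MB-NKI-2058-CHE-2",   # hypothetical full name
  "P376-B2-0911-NEW MB-NKI-B1022-CHE-6",
  "P376-B2-0911-NEW MB-NKI-B1007-CHE-4"   # hypothetical full name
)

# Remove the common prefix (fixed = TRUE treats the pattern as plain text)
short_names <- sub("P376-B2-0911-NEW MB-NKI-", "", sample_names, fixed = TRUE)

# Reference samples are recognised by "REF" in the name
is_reference <- grepl("REF", short_names, fixed = TRUE)

# Samples that this manual says to remove from the training data
excluded <- c("N20120-CHE-10", "N16986-CHE-10", "B1022-CHE-6", "C020-CHE-7")

keep <- !is_reference & !(short_names %in% excluded)
print(short_names[keep])
```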
Now we need to add the classification for each sample in row 2, right underneath the sample name. You can
find the classification of all the samples of the training set in the table below. Please be sure to use the exact
classification names for all samples. If you accidentally add even a single extra character, this sample will be
seen as belonging to a new group.
Table 1: Classification of all samples in the training data
Sporadic_Like    BRCA1_Like
2058             2131
2124             2165
2134             2224
2151             2254
2169             2312
2175             2355
2182             B1007
2188             B1035
2195             B1045
2204             B1049
2216             B1058
2227             B1061
2232             B1064
2234             B1065
2276             C119
2278             C121
2295             T4147
2298             T6701
2350
C035
C036
C044
C048
C065
C068
C127
C128
C129
C130
Now save the changes you made to the grid. Be sure to KEEP the format: Text (Tab delimited)!
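Because a single stray character creates a new group, it can be worth checking the class labels before
training. The sketch below assumes the labels from row 2 have been read into a character vector. The
example vector deliberately contains two faulty labels and is made up for illustration.

```r
# The only two class names used in Table 1
allowed <- c("Sporadic_Like", "BRCA1_Like")

# Hypothetical labels: the third has a trailing space, the fourth a lower-case "s"
labels <- c("Sporadic_Like", "BRCA1_Like", "BRCA1_Like ", "sporadic_Like")

# Positions of labels that do not match an allowed name exactly
bad <- !(labels %in% allowed)
print(which(bad))

# A table of the labels shows at a glance whether more than two groups exist
print(table(labels))
```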
Calling your unknown data
Step 8. Analyze your test data
Now analyze your unknown test data and open the “Comparative Analysis Experiment Explorer”. While on
the tab with the overview, use the keyboard combination "Ctrl + Alt + Shift + R". This will generate a txt file
suitable for importing into R. Save the data in the same folder as your test data!
When asked to create an R-script, answer “Yes”.
Now you will be asked to select the file that you want to use as training data. Select the training data txt file
where you have just added the classification and click on “Open”.
Please note: the R code needed to train the classifier and make the calls is now copied to your clipboard, so
that you do not need to type in all the code that points the program to the relevant file locations. If
you want to use this option, open the R program directly afterwards and paste the content of your
clipboard into the “R console” as explained in the next step. The R code is also saved in a txt file at the
same location.
Step 9. Call your data in R
Open RGui and paste the content of the clipboard in the R Console. Depending on the locations of the files
your R code will look something like this:
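# Install and load the pamr package
source("http://bioconductor.org/biocLite.R")
biocLite("pamr")
library(pamr)
# Read the training data (52 is the number of columns in the file) and train the classifier
pamrB1excel <- pamr.from.excel('C:/PAM/p376 0911 trainingset.txt', 52, sample.labels=TRUE,
batch.labels=FALSE)
pamr_b1_vs_spor.train <- pamr.train(pamrB1excel)
# Read the test data (27 is the number of columns in the file)
pamrB1exceltest <- pamr.from.excel('C:/PAM/p376 0911 GM.txt', 27, sample.labels=TRUE,
batch.labels=FALSE)
# Predict the class of each test sample and tabulate the calls
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold=0)
table(pamrB1exceltest$y, test_predict)
# Predict again, this time returning the posterior probability per class
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold=0,
type="posterior")
test_predict
data.frame(SampleID=pamrB1exceltest$samplelabels, test_predict)
# Write the probabilities to a tab-delimited output file
write.table(data.frame(sample=pamrB1exceltest$samplelabels, test_predict), sep="\t",
row.names=FALSE, file='C:/PAM/OUTPUT FILE.txt')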
----------------------------------------------------------------------------------------------
If you want to type in the R code yourself, you will need to replace the file locations with the correct
information. Please note: depending on the version that you installed, the PAM code may work
directly. It is also possible that you receive an error message indicating a missing package.
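When editing the code by hand, the second argument of pamr.from.excel (52 and 27 in the example) is the
number of columns in the respective file, so it has to match your own files as well. One way to obtain it is
the base R function count.fields. The sketch below writes a small stand-in file to a temporary location. The
file content and all names in it are made up and do not represent the real export.

```r
# Create a small tab-delimited stand-in file (hypothetical content)
example_file <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tname\tS1\tS2\tS3",
  "class\tclass\tSporadic_Like\tBRCA1_Like\tSporadic_Like",
  "probe1\tprobe 1\t1.02\t0.71\t0.98"
), example_file)

# Count the tab-separated fields on every line; quoting and comment handling are switched off
fields_per_line <- count.fields(example_file, sep = "\t", quote = "", comment.char = "")
n_cols <- max(fields_per_line)
print(n_cols)
```

If the counts differ between lines, the file was probably not saved as Text (Tab delimited).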
If you receive an error message such as:
Error: could not find function "pamr.from.excel"
Error: could not find function "pamr.train"
then the pamr package is probably not installed. Please see step 4 on how to install the pamr package.
Output file
Your output will look something like this. Please first run an experiment with samples that are known and
classified, to validate that the method works! The output will also be available as a file in the same folder as
all the other data. This file will be called OUTPUT FILE.txt. Calls in this file are shown as P-values (the
posterior class probabilities returned by pamr.predict). The cut-off value to classify a sample as ‘BRCA1-like’
should be set at 0.5. Below this score, a sample should be classified as ‘non-BRCA1-like’ (Lips E. et al. 2011
Breast Cancer Res. 13(5):R107).
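Applying the cut-off can also be done in R. The sketch below uses a made-up data frame in place of the real
output file. It assumes that the probability columns are named after the class labels (BRCA1_Like and
Sporadic_Like) and that a value of exactly 0.5 counts as BRCA1-like, since only scores below 0.5 are
described as non-BRCA1-like.

```r
# Hypothetical output: one row per sample, one probability column per class
output <- data.frame(
  sample = c("S1", "S2", "S3"),
  BRCA1_Like = c(0.91, 0.50, 0.12),
  Sporadic_Like = c(0.09, 0.50, 0.88)
)

# Apply the 0.5 cut-off to the BRCA1-like probability
output$call <- ifelse(output$BRCA1_Like >= 0.5, "BRCA1-like", "non-BRCA1-like")
print(output)
```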