SELDI-TOF Mass Spectrometry Protein Data

By Huong Thi Dieu La

References

Alejandro Cruz-Marcelo, Rudy Guerra, Marina Vannucci, Yiting Li, Ching C. Lau, and Tsz-Kwong Man. Comparison of algorithms for pre-processing of SELDI-TOF mass spectrometry data. Bioinformatics, 24(19):2129–2136, 2008.

Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit. Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health). Springer Science and Business Media, Inc, New York, first edition, 2005.

Haleem J. Issaq, Timothy D. Veenstra, Thomas P. Conrads, and Donna Felschow. The SELDI-TOF MS approach to proteomics: Protein profiling and biomarker identification. Biochemical and Biophysical Research Communications, 292:587–592, 2002.

SELDI-TOF-MS

• Surface Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry

• Used to profile protein markers from tissue or bodily fluids and thus identify biomarkers that can aid in diagnosis, prognosis, or treatment

• Applications: psychiatric disease, renal function, cancer (pancreatic, prostate, ovarian, and breast)

SELDI-TOF-MS Components

• ProteinChip array
– Retains specific proteins from the sample

• Reader
– Measures the molecular weights of the retained proteins and generates a trace showing relative abundance vs. molecular weight

• Software
– Identifies differences in protein abundance between two samples

Source:http://www.rci.rutgers.edu/~layla/AnalMedChem511/pdf_files/RB_pdf/403feature_issaq.pdf

Preparation

• Biological samples are processed via fractionation.

• Fractionation: the process of splitting the original sample into subsamples which contain proteins that are more homogeneous

Source:http://urology.jhu.edu/research/img1/proteomics13.jpg

EAM: Energy Absorbing Molecule

Preprocessing of MS data

• Alignment of the spectra
• Filtering (denoising)
• Baseline subtraction
• Normalization
• Peak detection
• Clustering of peaks
• Peak quantification

SELDI-TOF-MS software

• ProteinChip Software 3.1
• SpecAlign
• Cromwell
• PROcess
• MassSpecWavelet

PROcess package

• Process a single spectrum
• Process a set of spectra

Process a single spectrum

• Baseline subtraction
• Peak detection

Baseline subtraction

• Purpose: to level off the elevated, non-constant baseline caused by chemical noise in the EAM and by ion overload, making different spectra comparable.
• Solution: use local regression to estimate the bottom of a spectrum, then subtract that estimate from the spectrum.
• Two approaches, fitting the local regression to:
– the points below a certain quantile
– the local minima (yields better results when estimating the baseline)

Baseline subtraction: algorithm

1. For each spectrum, find local minima by segmenting the m/z range.
2. Fit a local regression to the local minima of each spectrum.
3. Subtract the estimated baseline from each spectrum.
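The three steps above can be sketched in base R without PROcess. This is only an illustration under stated assumptions: `estimate_baseline`, its arguments `n_segments` and `span`, and the synthetic spectrum are all made up for this example and are not the package's implementation.

```r
# Hypothetical helper (not part of PROcess): baseline estimation from
# segment-wise local minima followed by a loess fit.
estimate_baseline <- function(mz, intensity, n_segments = 50, span = 0.3) {
  # Step 1: split the m/z range into equal-width segments and take the
  # index of the lowest intensity inside each segment.
  seg <- cut(mz, breaks = n_segments, labels = FALSE)
  idx <- as.integer(tapply(seq_along(mz), seg,
                           function(i) i[which.min(intensity[i])]))
  minima <- data.frame(mz = mz[idx], y = intensity[idx])
  # Step 2: fit a local regression through the local minima.
  # surface = "direct" lets predict() extrapolate to the spectrum edges.
  fit <- loess(y ~ mz, data = minima, span = span,
               control = loess.control(surface = "direct"))
  baseline <- predict(fit, newdata = data.frame(mz = mz))
  # Step 3: subtract the estimated baseline from the spectrum.
  list(baseline = baseline, corrected = intensity - baseline)
}

# Synthetic spectrum (assumption): decaying baseline + one peak + noise.
set.seed(1)
mz   <- seq(1000, 10000, length.out = 2000)
spec <- 20 * exp(-mz / 3000) +
        50 * exp(-0.5 * ((mz - 5000) / 30)^2) +
        abs(rnorm(length(mz), sd = 0.2))
res  <- estimate_baseline(mz, spec)
```

After the subtraction, `res$corrected` should sit close to zero away from the peak, which is what makes spectra with different baselines comparable.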

### Load libraries
library(survival)
library(Icens)
library(PROcess)

### Read in the raw spectrum
fdat <- system.file("Test", package="PROcess")
fs <- list.files(fdat, pattern = "\\.*csv\\.*", full.names=TRUE)
f1 <- read.files(fs[1])

### Plot the raw spectrum
jpeg("f1.jpeg", width=480, height=480)
plot(f1, type="l", xlab="m/z")
title(basename(fs[1]))
dev.off()

### Remove the baseline
jpeg("f2.jpeg", width=480, height=480)
bseoff <- bslnoff(f1, method="loess", bw=0.1, xlab="m/z", plot=TRUE)
title(basename(fs[1]))
dev.off()

Peak detection

• Purpose: To detect peaks that represent the set of proteins that are differentially expressed between different samples.

Peak Detection: algorithm

1. Smooth the spectrum using moving averages of the ks nearest neighbors.
2. Compute local variability as the median of the absolute deviations of the kv nearest neighbors.
3. Identify local maxima of the smoothed spectrum using three thresholds:
– The signal-to-noise ratio: local smooth / local variability
– The detection threshold for the whole spectrum
– The shape ratio: the area under the curve within a small distance of a peak candidate / the maximum of all such peak areas in the spectrum
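A rough base-R sketch of this procedure follows. It is not the code behind `isPeak`; the function `detect_peaks`, its argument names (`ks`, `kv`, `son`, `thresh`, `aw`, `ratio`), and the synthetic spectrum are assumptions chosen to mirror the three thresholds listed above.

```r
# Hypothetical helper (not the PROcess implementation).
detect_peaks <- function(intensity, ks = 11, kv = 81,
                         son = 2, thresh = 2, aw = 3, ratio = 0.2) {
  n <- length(intensity)
  # Moving average over ks points (centred window; NA at the edges).
  smooth <- as.numeric(stats::filter(intensity, rep(1 / ks, ks), sides = 2))
  # Local variability: median absolute deviation in a kv-point window.
  half <- kv %/% 2
  local_mad <- vapply(seq_len(n), function(i) {
    w <- intensity[max(1, i - half):min(n, i + half)]
    median(abs(w - median(w)))
  }, numeric(1))
  # Candidates: strict local maxima of the smoothed spectrum.
  cand <- which(diff(sign(diff(smooth))) == -2) + 1
  if (length(cand) == 0) return(integer(0))
  # Area under the smoothed curve within aw points of each candidate.
  area <- vapply(cand, function(i) {
    sum(smooth[max(1, i - aw):min(n, i + aw)], na.rm = TRUE)
  }, numeric(1))
  keep <- smooth[cand] / local_mad[cand] > son &  # signal-to-noise ratio
          smooth[cand] > thresh &                 # detection threshold
          area / max(area) > ratio                # shape ratio
  cand[keep]
}

# Synthetic baseline-subtracted spectrum (assumption): two peaks plus noise.
set.seed(2)
x    <- seq_len(1000)
spec <- abs(rnorm(1000)) +
        40 * exp(-0.5 * ((x - 300) / 5)^2) +
        40 * exp(-0.5 * ((x - 700) / 5)^2)
peaks <- detect_peaks(spec)
```

Here `peaks` holds the indices of the retained local maxima; raising `son`, `thresh`, or `ratio` makes the detection stricter.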

### Peak detection
jpeg("f3.jpeg", width=480, height=480)
pkgobj <- isPeak(bseoff, span=81, sm.span=11, plot=TRUE, zerothrsh=2, area.w=0.003, ratio=0.2)
dev.off()

### Inspect peaks in a particular range of m/z values
jpeg("f4.jpeg", width=480, height=480)
specZoom(pkgobj, xlim=c(5000,10000))
dev.off()

Processing a set of calibration spectra

• Apply baseline subtraction
• Normalize spectra
• Cutoff selection
• Identify peaks
• Quality assessment
• Get proto-biomarkers

Example Data Set

• A set of 8 spectra from a calibration data set
– The same 5 proteins are present in the sample: 1084, 1638, 3496, 5807, 7034 amu

### Known masses (amu) of the 5 calibration proteins
amu.cali <- c(1084,1638,3496,5807,7034)

### Read and plot the 8 spectra, marking the protein positions with salmon vertical lines
jpeg("f5.jpeg", width=1080, height=560)
par(mfrow=c(2,4))
plotCali <- function(f, main, lab.cali){
  x <- read.files(f)                                  # read one spectrum
  plot(x, main=main, ylim=c(0,max(x[,2])), type="n")  # empty frame
  abline(h=0, col="gray")
  abline(v=amu.cali, col="salmon")                    # known protein masses
  if(lab.cali)
    axis(3, at=amu.cali, labels=amu.cali,
         las=3, tick=FALSE, col="salmon", cex.axis=0.94)
  lines(x)                                            # draw the spectrum
  return(invisible(x))
}
dir.cali <- system.file("calibration", package="PROcess")
files <- dir(dir.cali, full.names=TRUE)
i <- seq(along=files)
mapply(plotCali, files, LETTERS[i], i <=2)            # label masses on the first two panels only
dev.off()

Baseline subtraction

• Similar to baseline subtraction for a single spectrum

• R code:

Mcal <- rmBaseline(dir.cali, plot=TRUE)
head(Mcal)

        060503peptidecalib_1_128.csv 060503peptidecalib_1_16.csv
3.6385                     0.7253853                   0.7485778
3.6458                     0.6859291                   0.6960419
3.65287                    0.6856960                   0.7088729
3.65972                    0.6985420                   0.7249795
3.66635                    0.6885195                   0.6953421
3.67276                    0.6752363                   0.6885879

Normalize Spectra

• Purpose: reduce variation due to experimental noise
• Total ion normalization:
– Calculate each spectrum's area under the curve (AUC) for m/z values greater than the selected cutoff
– Scale all spectra to the median AUC
• Assumptions:
– The number of proteins being over-expressed is approximately equal to the number of proteins being under-expressed.
– The number of proteins whose expression levels change is small relative to the total number of proteins bound to the protein array surface.
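Total ion normalization can be sketched in a few lines of base R. The function name `normalize_tic` and the toy matrix are assumptions for illustration, and the AUC is approximated here by the plain sum of intensities above the cutoff rather than by whatever `renorm` computes internally.

```r
# Hypothetical helper: rows of M are m/z points, columns are spectra.
normalize_tic <- function(M, mz, cutoff) {
  X   <- M[mz > cutoff, , drop = FALSE]   # keep m/z values above the cutoff
  auc <- colSums(X)                       # AUC approximated by summed intensity
  sweep(X, 2, median(auc) / auc, `*`)     # rescale each spectrum to the median AUC
}

# Toy data (assumption): three spectra that differ only by a scale factor.
mz   <- seq(100, 1000, by = 100)
base <- c(5, 4, 3, 8, 2, 6, 1, 7, 3, 2)
M    <- cbind(a = base, b = 2 * base, c = 3 * base)
N    <- normalize_tic(M, mz, cutoff = 300)
```

Because the three toy spectra differ only in overall scale, they become identical after normalization, which is the behavior the two assumptions above are meant to justify.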

Cutoff selection

• Choose a cutoff point such that the magnitude of the noise is relatively stable above that point.
• Algorithm for a single cutoff point:
– Baseline-subtracted spectra within the group are normalized to the median of the sums of intensities of the spectra.
– The standard deviation of intensities at each m/z value is calculated.
– The mean of those standard deviations is computed.
• Repeat for different cutoff points and plot the average standard deviations vs. the cutoff points.
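As a hedged illustration of the cutoff algorithm, the following base-R sketch computes the average standard deviation for several cutoffs. `average_sd` is a hypothetical stand-in for what `avesd` is described as doing, and the simulated replicate spectra (noisier below m/z 500) are an assumption.

```r
# Hypothetical helper: rows of M are m/z points, columns are replicate spectra.
average_sd <- function(M, mz, cutoff) {
  X   <- M[mz > cutoff, , drop = FALSE]
  auc <- colSums(X)
  X   <- sweep(X, 2, median(auc) / auc, `*`)  # normalize to the median sum
  mean(apply(X, 1, sd))                       # mean of per-m/z standard deviations
}

# Simulated replicates (assumption): noise is larger at low m/z.
set.seed(4)
mz       <- seq(10, 5000, by = 10)
noise_sd <- ifelse(mz < 500, 5, 0.5)
M <- sapply(1:6, function(j) 10 + rnorm(length(mz), sd = noise_sd))

cutoffs <- c(0, 500, 1000, 2000)
sds <- sapply(cutoffs, function(ct) average_sd(M, mz, ct))
```

Plotting `sds` against `cutoffs` should show the average standard deviation dropping once the noisy low-mass region is excluded and then flattening, which is the pattern used to pick the cutoff.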

### Cutoff selection
cts <- round(10^(seq(2,4,length=14)))
sdsFirst <- sapply(cts, avesd, Ma=Mcal)
jpeg("f6.jpeg", width=480, height=480)
par(mfrow=c(1,1))
plot(cts, sdsFirst, xlab="cutpoint", pch=21, bg="red", log="x", ylab="average sd")
dev.off()

### Normalize spectra (cutoff point m/z=400)
M.r <- renorm(Mcal, cutoff=400)

Identify Peaks

• Similar to peak detection for a single baseline-adjusted spectrum

• R code:

### Identify peaks
peakfile <- "calipeak.csv"
getPeaks(M.r, peakfile, ratio=0.1)

Quality Assessment

• Purpose: identify and eliminate spectra of poor quality
• Based on 3 parameters:
– Quality: a measure of the separation of signal from noise
– Retain: the number of high peaks in a single spectrum
– Peak: the number of peaks in a spectrum relative to the average number of peaks in the whole set of spectra being considered
• Poor-quality spectra: Quality < 0.4, Retain < 0.1, Peak < 0.5.

Quality assessment: algorithm

1. Estimate the noise by subtracting from each spectrum its moving average with a window size of 5 points.
2. Calculate the noise envelope as 3 times the standard deviation of the noise in a 250-point window.
3. Calculate the area under each spectrum, A0.
4. Calculate the area after subtracting the noise envelope from the spectrum, A1.
5. Obtain Quality, Retain, and Peak.

• Quality: A1/A0
• Retain: the number of points with height greater than 5 times the noise envelope / the total number of points in the spectrum
• Peak: the number of peaks detected in each spectrum / the average number of peaks for all spectra in a run
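The three quantities can be sketched in base R as follows. This is an approximation under stated assumptions, not the code behind `quality`: the helper `quality_metrics` is hypothetical, the 250-point window is implemented as non-overlapping blocks rather than a sliding window, and negative values after subtracting the envelope are clipped to zero when computing A1.

```r
# Hypothetical helper (not the PROcess implementation).
quality_metrics <- function(intensity, n_peaks, mean_n_peaks,
                            ma_window = 5, env_window = 250) {
  n  <- length(intensity)
  # Noise = spectrum minus its 5-point moving average.
  ma <- as.numeric(stats::filter(intensity, rep(1 / ma_window, ma_window),
                                 sides = 2))
  ma[is.na(ma)] <- intensity[is.na(ma)]   # edge points: treat noise as 0
  noise <- intensity - ma
  # Noise envelope = 3 * sd of the noise per 250-point block
  # (assumption: blocks instead of a sliding window; a short tail
  # is merged into the last full block).
  block <- pmin(ceiling(seq_len(n) / env_window), max(1, n %/% env_window))
  envelope <- 3 * ave(noise, block, FUN = sd)
  a0 <- sum(intensity)                     # area under the spectrum
  a1 <- sum(pmax(intensity - envelope, 0)) # area above the envelope (clipped)
  c(Quality = a1 / a0,
    Retain  = mean(intensity > 5 * envelope),
    Peak    = n_peaks / mean_n_peaks)
}

# Synthetic spectrum (assumption): one peak plus noise; 4 peaks detected
# in this spectrum against an average of 5 across the run.
set.seed(3)
x    <- seq_len(1000)
spec <- abs(rnorm(1000, sd = 0.5)) + 50 * exp(-0.5 * ((x - 400) / 6)^2)
q    <- quality_metrics(spec, n_peaks = 4, mean_n_peaks = 5)
```

A spectrum would then be flagged as poor when all three entries of `q` fall below the thresholds quoted earlier (0.4, 0.1, and 0.5).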

qualRes <- quality(M.r, peakfile, cutoff=400)
qualRes

                               Quality    Retain      peak
060503peptidecalib_1_128.csv 0.4144087 0.1710994 0.9696970
060503peptidecalib_1_16.csv  0.4558286 0.1406047 0.9696970
060503peptidecalib_1_2.csv   0.4971926 0.1178203 0.9696970
060503peptidecalib_1_256.csv 0.4095177 0.1778567 0.7272727
060503peptidecalib_1_32.csv  0.3556932 0.1297756 0.9696970
060503peptidecalib_1_4.csv   0.5220848 0.1432037 1.2121212
060503peptidecalib_1_64.csv  0.4790304 0.1430304 1.2121212
060503peptidecalib_1_8.csv   0.4174718 0.1201594 0.9696970

Get Proto-biomarkers

• Peak alignment: identify peaks across spectra that are likely to represent the same protein.
• Proto-biomarkers: peaks aligned across spectra
• To obtain a proto-biomarker:
– Generate an interval around each peak, centered at the peak's m/z value (0.3%)
– Determine which actual peaks are represented by a proto-biomarker
– Use the maximum value as the height of that proto-biomarker
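One simple way to picture the alignment is the base-R sketch below. It is a simplification, not the algorithm in `pk2bmkr`: peaks are grouped whenever consecutive sorted m/z values lie within a relative tolerance of 0.3%, and the helper `align_peaks`, the `min_frac` filter, and the toy peak list are all assumptions.

```r
# Hypothetical helper: peaks has columns spectrum, mz, intensity.
align_peaks <- function(peaks, tol = 0.003, min_frac = 0.5) {
  peaks <- peaks[order(peaks$mz), ]
  n_spectra <- length(unique(peaks$spectrum))
  # Start a new group when the relative gap to the previous peak exceeds tol.
  gap <- c(Inf, diff(peaks$mz) / head(peaks$mz, -1))
  peaks$group <- cumsum(gap > tol)
  out <- do.call(rbind, lapply(split(peaks, peaks$group), function(g) {
    data.frame(mz     = mean(g$mz),          # centre of the proto-biomarker
               height = max(g$intensity),    # maximum value as its height
               frac   = length(unique(g$spectrum)) / n_spectra)
  }))
  # Keep groups seen in at least min_frac of the spectra (assumed filter).
  out[out$frac >= min_frac, ]
}

# Toy peak list (assumption): three spectra, two shared peaks, one stray peak.
peaks <- data.frame(
  spectrum  = c("A", "B", "C", "A", "B", "C", "A"),
  mz        = c(3496, 3498, 3499, 5810, 5812, 5815, 9000),
  intensity = c(10, 12, 11, 20, 25, 22, 3)
)
bmk <- align_peaks(peaks)
```

In this toy example the two groups near 3498 and 5812 survive, while the stray peak at 9000 is dropped because it appears in only one of the three spectra.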

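### Get proto-biomarkers
bmkfile <- "calibmk.csv"
bmk1 <- pk2bmkr(peakfile, M.r, bmkfile, p.fltr=0.5)
mk1 <- round(as.numeric(gsub("M", "", names(bmk1))))
mk1 ### [1] 2906 3498 5812 7036

### Replot the spectra with a second trace at doubled m/z
jpeg("f7.jpeg", width=1080, height=560)
par(mfrow=c(2,4))
plotCali2 <- function(...){
  x <- plotCali(...)
  # Blue trace: same spectrum with m/z doubled and intensity shifted up by 25,
  # so peaks from doubly charged ions line up with the singly charged masses
  lines(x[,1]*2, x[,2]+25, col="blue")
}
mapply(plotCali2, files, LETTERS[i], i <=2)
dev.off()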

Analyze the result

• 5 known proteins: 1084, 1638, 3496, 5807, 7034

• Obtained 4 proto-biomarkers: 2906, 3498, 5812, and 7036

• Within 0.3% of m/z values of known proteins: 3498, 5812, and 7036

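• Result of larger proteins carrying two charges: the peak at 2906 matches protein 5807 with two charges (5807/2 ≈ 2904), and protein 7034 with two charges (7034/2 = 3517) falls near the peak at 3496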

• Failed to detect peaks at m/z=1084 and 1638
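The comparison above is simple arithmetic and can be checked in base R. The snippet only uses the masses listed on this slide; the helper `within_tol` and the variable names are illustrative.

```r
known <- c(1084, 1638, 3496, 5807, 7034)   # known protein masses (amu)
found <- c(2906, 3498, 5812, 7036)         # proto-biomarkers obtained
tol   <- 0.003                             # 0.3% relative tolerance

# TRUE if x lies within tol (relative) of any reference value.
within_tol <- function(x, ref) any(abs(x - ref) / ref <= tol)

# Proto-biomarkers matching a known mass with one charge or with two charges.
single_charge <- sapply(found, within_tol, ref = known)
double_charge <- sapply(found, within_tol, ref = known / 2)

# Known masses with no proto-biomarker within tolerance.
missed <- known[!sapply(known, within_tol, ref = found)]

data.frame(found, single_charge, double_charge)
```

With these numbers, 3498, 5812, and 7036 match known masses directly, 2906 matches only half of 5807, and `missed` contains 1084 and 1638. Half of 7034 (3517) is about 0.5% away from 3498, so it falls outside the 0.3% window.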

Summary

• PROcess package:
– Processes SELDI-TOF-MS data
– Advantage: produces more reproducible results regarding peak quantification
– Limitation: the results were not homogeneous across laser intensities