A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences...
-
Upload
joshua-reed -
Category
Documents
-
view
220 -
download
0
Transcript of A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences...
![Page 1: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/1.jpg)
A statistical framework for genomic data fusion
William Stafford NobleDepartment of Genome Sciences
Department of Computer Science and Engineering
University of Washington
![Page 2: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/2.jpg)
Outline
• Recognizing correctly identified peptides
• The support vector machine algorithm
• Experimental results
• Yeast protein classification
• SVM learning from heterogeneous data
• Results
![Page 3: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/3.jpg)
Recognizing correctly identified peptides
![Page 4: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/4.jpg)
Database searchProtein sample
Tandem mass spectrometer
Search algorithm
Observed spectra
Predicted peptides
Sequence database
![Page 5: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/5.jpg)
The learning task
• We are given paired observed and theoretical spectra.
• Question: Is the pairing correct?
Observed Theoretical
![Page 6: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/6.jpg)
Properties of the observed spectrum
1. Total peptide mass. Too small yields little information; too large (>25 amino acids) yields uneven fragmentation.
2. Charge (+1, +2 or +3). Provides some evidence about amino acid composition.
3. Total ion current. Proportional to the amount of peptide present.
4. Peak count. Small indicates poor fragmentation; large indicates noise.
![Page 7: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/7.jpg)
Observed vs. theoretical spectra
5. Mass difference.
6. Percent of ions matched. Number of matched ions / total number of ions.
7. Percent of peaks matched. Number of matched peaks / total number of peaks.
8. Percent of peptide fragment ion current matched. Total intensity of matched peaks / total intensity of all peaks.
9. Preliminary SEQUEST score (Sp).
10. Preliminary score rank.
11. SEQUEST cross-correlation (XCorr).
![Page 8: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/8.jpg)
Top-ranked vs. second-ranked peptides
12.Change in cross-correlation. Compute the difference in XCorr for the top-ranked and second-ranked peptide.
13.Percent sequence identity. Usually anti-correlated with change in cross-correlation.
![Page 9: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/9.jpg)
Pos
itiv
e ex
ampl
es
Neg
ativ
e ex
ampl
es
![Page 10: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/10.jpg)
The support vector machine algorithm
![Page 11: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/11.jpg)
SVMs in computational biology
• Splice site recognition• Protein sequence
similarity detection• Protein functional
classification• Regulatory module
search
• Protein-protein interaction prediction
• Gene functional classification from microarray data
• Cancer classification from microarray data
![Page 12: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/12.jpg)
Support vector machine
![Page 13: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/13.jpg)
Support vector machine
++
+
+ +
+
+
+ +
+ +
+-- -
-
-
-
-
--
-
--
-+
+
-
--
Locate a plane that separates
positive from negative
examples.
Focus on the examples closest to the boundary.
![Page 14: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/14.jpg)
Kernel matrix
![Page 15: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/15.jpg)
31, YXYXK
![Page 16: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/16.jpg)
2
2
2exp,
YX
YXK
![Page 17: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/17.jpg)
Kernel functions
• Let X be a finite input space.• A kernel is a function K, such that for all x, z
X, K(x, z) = (x) · (y), where is a mapping from X to an (inner product) feature space F.
• Let K(x,z) be a symmetric function on X. Then K(x,z) is a kernel function if and only if the matrix
is positive semi-definite.
njijiK
1,,
xxK
![Page 18: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/18.jpg)
Peptide ID kernel function
• Let p(x,y) be the function that computes a 13-element vector of parameters for a pair of spectra, x and y.
• The kernel function K operates on pairs of observed and theoretical spectra:
21,,
,,,:,:
Bt
Bo
At
Ao
Bt
Bo
At
Ao
Bt
Bo
At
Ao
SSpSSp
SSpSSpKSSSSK
![Page 19: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/19.jpg)
Experimental results
![Page 20: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/20.jpg)
Experimental design
• Data consists of one 13-element vector per predicted peptide.
• Each feature is normalized to sum to 1.0 across all examples.
• The SVM is tested using leave-one-out cross-validation.
• The SVM uses a second-degree polynomial, normalized kernel with a 2-norm asymmetric soft margin.
![Page 21: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/21.jpg)
Three data sets
• Set 1: Ion trap mass spectrometer. Sequest search on the full non-redundant database.
• Set 2: Ion trap mass spectrometer. Sequest search on human NRDB.
• Set 3: Quadrupole time-of-flight mass spectrometer. Sequence search on human NRDB.
![Page 22: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/22.jpg)
Data set sizes
15405231017QTOF HNRDB
1161465696Ion-trap HNRDB
976479497Ion-trap NRDB
TotalNegativePositive
![Page 23: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/23.jpg)
![Page 24: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/24.jpg)
![Page 25: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/25.jpg)
0.940.95
0.99
![Page 26: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/26.jpg)
(18,966) (126,931)
(57,732)
(108,936)(27,936)
![Page 27: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/27.jpg)
![Page 28: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/28.jpg)
![Page 29: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/29.jpg)
Conversion to probabilities
• Hold out a subset of the training examples.
• Use the hold-out set to fit a sigmoid.
• This is equivalent to assuming that the SVM output is proportional to the log-odds of a positive example.
BAfefy
1
11Pr
y = labelf = discriminant
![Page 30: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/30.jpg)
114.015.111
Pr xe
x
![Page 31: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/31.jpg)
Yeast protein classification
![Page 32: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/32.jpg)
Membrane proteins
• Membrane proteins anchor in a cellular membrane (plasma, ER, golgi, mitochondrial).
• Communicate across membrane.
• Pass through the membrane several times.
![Page 33: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/33.jpg)
mRNA expression
dataprotein-protein interaction data
sequence data
Heterogeneous data
![Page 34: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/34.jpg)
Vector representation
• Each matrix entry is an mRNA expression measurement.
• Each column is an experiment.
• Each row corresponds to a gene.
![Page 35: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/35.jpg)
2
2
2exp,
YX
YXK
![Page 36: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/36.jpg)
>ICYA_MANSEGDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVNLVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVINMKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI
Sequence kernels
• We cannot compute a scalar product on a pair of variable-length, discrete strings.
![Page 37: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/37.jpg)
Pairwise comparison kernel
![Page 38: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/38.jpg)
Pairwise comparison kernel
![Page 39: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/39.jpg)
Pairwise kernel variants
• Smith-Waterman all-vs-all
• BLAST all-vs-all• Smith-Waterman
w.r.t. SCOP database• E-values from Pfam
database
![Page 40: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/40.jpg)
Protein-protein interactions
• Pairwise interactions can be represented as a graph or a matrix.1 0 0 1 0 1 0
11 0 1 0 1 1 0 10 0 0 0 1 1 0 00 0 1 0 1 1 0 10 0 1 0 1 0 0 11 0 0 0 0 0 0 10 0 1 0 1 0 0 0
protein
protein
![Page 41: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/41.jpg)
Linear interaction kernel
• The simplest kernel counts the number of interactions between each pair.
1 0 0 1 0 1 0 11 0 1 0 1 1 0 10 0 0 0 1 1 0 00 0 1 0 1 1 0 10 0 1 0 1 0 0 11 0 0 0 0 0 0 10 0 1 0 1 0 0 0
3
![Page 42: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/42.jpg)
Diffusion kernel
• A general method for establishing similarities between nodes of a graph.
• Based upon a random walk.• Efficiently accounts for all paths connecting two
nodes, weighted by path lengths.
![Page 43: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/43.jpg)
Hydrophobicity profile
• Transmembrane regions are typically hydrophobic, and vice versa.
• The hydrophobicity profile of a membrane protein is evolutionarily conserved.
Membrane Membrane proteinprotein
Non-membrane Non-membrane proteinprotein
![Page 44: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/44.jpg)
Hydrophobicity kernel
• Generate hydropathy profile from amino acid sequence using Kyte-Doolittle index.
• Prefilter the profiles.• Compare two profiles by
– Computing fast Fourier transform (FFT), and– Applying Gaussian kernel function.
• This kernel detects periodicities in the hydrophobicity profile.
![Page 45: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/45.jpg)
SVM learning from heterogeneous data
![Page 46: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/46.jpg)
Combining kernels
Identical
A B
K(A) K(B)
K(A)+K(B)
A B
A:B
K(A:B)
![Page 47: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/47.jpg)
Semidefinite programming
• Define a convex cost function to assess the quality of a kernel matrix.
• Semidefinite programming (SDP) optimizes convex cost functions over the convex cone of positive semidefinite matrices.
![Page 48: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/48.jpg)
Learn K from the convex cone Learn K from the convex cone of positive-semidefinite of positive-semidefinite matrices or a convex subset of matrices or a convex subset of it :it :
According to a convex According to a convex quality measure:quality measure:
Integrate constructed Integrate constructed kernelskernels
Learn a linear mix
Large margin classifier Large margin classifier (SVM)(SVM)
Maximize the margin
SDPSDP
Semidefinite programming
i
iiKK
![Page 49: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/49.jpg)
i
iiKK
Integrate constructed Integrate constructed kernelskernels
Learn a linear mix
Large margin classifier Large margin classifier (SVM)(SVM)
Maximize the margin
![Page 50: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/50.jpg)
Experimental results
![Page 51: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/51.jpg)
Seven yeast kernels
Kernel Data Similarity measure
KSW protein sequence Smith-Waterman
KB protein sequence BLAST
KHMM protein sequence Pfam HMM
KFFT hydropathy profile FFT
KLI protein interactions
linear kernel
KD protein interactions
diffusion kernel
KE gene expression radial basis kernel
![Page 52: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/52.jpg)
Membrane proteins
![Page 53: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/53.jpg)
Comparison of performance
Simple rules fromhydrophobicity profile
TMHMM
![Page 54: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/54.jpg)
Cytoplasmic ribosomal proteins
![Page 55: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/55.jpg)
False negative predictions
![Page 56: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/56.jpg)
False negative expression profiles
![Page 57: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/57.jpg)
![Page 58: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/58.jpg)
Markov Random Field
• General Bayesian method, applied by Deng et al. to yeast functional classification.
• Used five different types of data.
• For their model, the input data must be binary.
• Reported improved accuracy compared to using any single data type.
![Page 59: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/59.jpg)
Yeast functional classesCategory SizeMetabolism 1048
Energy 242
Cell cycle & DNA processing 600
Transcription 753
Protein synthesis 335
Protein fate 578
Cellular transport 479
Cell rescue, defense 264
Interaction w/ evironment 193
Cell fate 411
Cellular organization 192
Transport facilitation 306
Other classes 81
![Page 60: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/60.jpg)
Six types of data
• Presence of Pfam domains.
• Genetic interactions from CYGD.
• Physical interactions from CYGD.
• Protein-protein interaction by TAP.
• mRNA expression profiles.
• (Smith-Waterman scores).
![Page 61: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/61.jpg)
Results
MRF
SDP/SVM(binary)
SDP/SVM(enriched)
![Page 62: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/62.jpg)
Many yeast kernels
• protein sequence• phylogenetic profiles• separate gene
expression kernels• time series
expression kernel• promoter regions
using seven aligned species
• protein localization• ChIP• protein-protein
interactions• yeast knockout
growth data• more ...
![Page 63: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/63.jpg)
Future work
• New kernel functions that incorporate domain knowledge.
• Better understanding of the semantics of kernel weights.
• Further investigation of yeast biology.
• Improved scalability of the algorithm.
• Prediction of protein-protein interactions.
![Page 64: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/64.jpg)
Acknowledgments
• Dave Anderson, University of Oregon
• Wei Wu, Genome Sciences, UW & FHCRC
• Michael Jordan, Statistics & EECS, UC Berkeley
• Laurent El Ghaoui, EECS, UC Berkeley
• Gert Lanckriet, EECS, UC Berkeley
• Nello Cristianini, Statistics, UC Davis
![Page 65: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/65.jpg)
Fisher criterion score
22
21
221
High scoreLow score
![Page 66: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/66.jpg)
Feature ranking
delta Cn 2.861% match total ion current 2.804Cn 2.444% match peaks 2.314Sp 1.158mass 0.704charge 0.488rank Sp 0.313peak count 0.209sequence similarity 0.115% ion match 0.079total ion current 0.026delta mass 0.024
![Page 67: A statistical framework for genomic data fusion William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University.](https://reader035.fdocuments.in/reader035/viewer/2022062801/56649e3c5503460f94b2e4c9/html5/thumbnails/67.jpg)
Pairwise feature ranking
% match TIC-delta Cn 4.741% match peaks-delta Cn 4.233% match TIC-Cn 3.819delta Cn-Cn 3.597delta Cn-charge 3.563% match peaks-Cn 3.377delta Cn-mass 3.119% match TIC-% match peaks 2.823% ion match-delta Cn 2.812Sp-delta Cn 2.799% match TIC-Sp 2.579% match peaks-Sp 2.383
% match TIC-mass 2.097% ion match-mass 2.091Cn-charge 1.943Sp-mass 1.922% match TIC-charge 1.898Cn-mass 1.884Sp-Cn 1.881% ion match-Cn 1.827Sp-charge 1.770% match peaks-mass 1.668% match peaks-charge 1.528% match TIC-% ion match 1.473