Plaid Models 1
Plaid Models
and Microarrays
Laura Lazzeroni, Stanford University
Art Owen, Stanford University
May 30, 2000
Sequoia Hall 200

Expression data
For genes $i = 1, \ldots, n$
and samples $j = 1, \ldots, p$,
$Y_{ij}$ measures expression level
Samples
From different organs/individuals/times, etc.
Genes
Many or all of the organism’s genes
Expression
Activity level of gene $i$ in sample $j$
Starting points
1. Eisen, Spellman, Brown, Botstein: PNAS (1998)
2. Hastie, Tibshirani, Eisen, Brown, Ross, Scherf,
Weinstein, Alizadeh, Staudt, Botstein: S.U. Tech.
Report (2000)

Transposable Data
Observations   Variables   Data    Dimension
Genes          Samples     $Y$     $n \times p$
Samples        Genes       $Y'$    $p \times n$
Movies         Viewers     $Y$     $n \times p$
Viewers        Movies      $Y'$    $p \times n$
Words          Documents   $Y$     $n \times p$
Documents      Words       $Y'$    $p \times n$
Good statistics problems:
1. (When) should model be symmetric?
2. Can't have both $n \gg p$ and $p \gg n$
3. How best to bootstrap?
4. How to use row/col specific covariates?

Managing the data
$n$ is usually large
$p$ can be large
... graphical methods
Eisen et al. (1998) data
2467 yeast genes
79 samples
multiple experiments

Modeling Approach
Try:
$$Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$$
$$\rho_{ik} \in \{0, 1\}$$
$$\kappa_{jk} \in \{0, 1\}$$
Interpretation:
$\mu_0$ is a background level
There are $K$ "layers", with levels $\mu_k$
$\rho_{ik} = 1$ for gene membership
$\kappa_{jk} = 1$ for sample membership
$\mu_k > 0$, for upregulation
$\mu_k < 0$, for downregulation
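A minimal sketch of how the layered model builds a fitted matrix, assuming binary memberships stored as lists of lists. The function name `plaid_mean` and the toy numbers are illustrative assumptions, not part of the talk; the code only evaluates the right-hand side of the model for given parameters.

```python
def plaid_mean(mu0, mu, rho, kappa):
    """Return the n x p matrix mu0 + sum_k mu[k] * rho[i][k] * kappa[j][k]."""
    n, p, K = len(rho), len(kappa), len(mu)
    return [
        [mu0 + sum(mu[k] * rho[i][k] * kappa[j][k] for k in range(K))
         for j in range(p)]
        for i in range(n)
    ]

# Hypothetical toy example: 3 genes, 4 samples, 2 overlapping layers.
rho = [[1, 0], [1, 1], [0, 1]]              # rho[i][k] = 1 if gene i is in layer k
kappa = [[1, 0], [1, 1], [0, 1], [0, 0]]    # kappa[j][k] = 1 if sample j is in layer k
fitted = plaid_mean(mu0=1.0, mu=[2.0, -0.5], rho=rho, kappa=kappa)
```

A cell that belongs to both layers receives the background plus both layer levels, which is what lets genes sit in more than one cluster.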
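
Bigger models
$$Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$$
$$Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik}) \rho_{ik} \kappa_{jk}$$
$$Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} (\mu_k + \beta_{jk}) \rho_{ik} \kappa_{jk}$$
$$Y_{ij} \approx \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik} + \beta_{jk}) \rho_{ik} \kappa_{jk}$$
Subject to
$$\sum_{i=1}^{n} \rho_{ik} \alpha_{ik} = 0 \quad \forall k$$
$$\sum_{j=1}^{p} \kappa_{jk} \beta_{jk} = 0 \quad \forall k$$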
Anova-lets, but without the orthogonality
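The identifying constraints can be illustrated with a short sketch for a single layer. It assumes binary memberships with at least one member gene and one member sample; the helper names `normalize_layer` and `layer_theta` are hypothetical. Shifting the row and column effects by their member means and moving those means into the layer level leaves every $\mu + \alpha_i + \beta_j$ unchanged while making the constraints hold.

```python
def normalize_layer(mu, alpha, beta, rho, kappa):
    """Re-express one layer so that sum_i rho_i*alpha_i = 0 and
    sum_j kappa_j*beta_j = 0, without changing mu + alpha_i + beta_j.

    Assumes rho and kappa are 0/1 lists with at least one 1 each.
    """
    a_bar = sum(r * a for r, a in zip(rho, alpha)) / sum(rho)
    b_bar = sum(k * b for k, b in zip(kappa, beta)) / sum(kappa)
    new_alpha = [a - a_bar for a in alpha]
    new_beta = [b - b_bar for b in beta]
    return mu + a_bar + b_bar, new_alpha, new_beta

def layer_theta(mu, alpha, beta):
    """Return the matrix of layer effects theta_ij = mu + alpha_i + beta_j."""
    return [[mu + a + b for b in beta] for a in alpha]

# Hypothetical values: genes 0 and 1 and sample 0 are in the layer.
rho, kappa = [1, 1, 0], [1, 0]
mu, alpha, beta = normalize_layer(1.0, [1.0, 2.0, 3.0], [0.5, 1.5], rho, kappa)
```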

Geometry of a layer
Include $\alpha_{ik}$ but not $\beta_{jk}$
Drop subscript $k$
Let
� ������
�
� ���
���
��
� � � �� � ��
Like a cluster of � genes around � � ��
$\rho_i = 1$ for some, maybe not all, genes
$\kappa_j = 1$ for some, maybe not all, samples
� �� gives an “expression pattern”
Importance of sample � given by � ��
Adding layers: lets genes be in multiple clusters
Converse: get cluster of samples wrt some genes

More geometry
Consider $\mu + \alpha_i + \beta_j$
Genes $i$ cluster around a line through
�� � � �� � ��
Samples $j$ cluster around a line through
� � � �� � ��
Most important/typical genes: large � ��

Even Bigger models
1. Write
� ������
������� � ��� � ��� � �������
�
2. Incorporates Tukey’s 1 df for non-additivity
3. Clusters genes around a more general line in $\mathbb{R}^p$
4. We can mix/match layer types.
5. We can replace the background $\mu_0$ by a model layer.

SVD and others
$$Y_{ij} \approx \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$$
Method   $\mu_k$   $\rho_{ik}$   $\kappa_{jk}$   Also:
SVD � � � �������� � ���
SDD � ���� ����
NND � ��� ���
VQ � �� �� ��
� ��� � �
VQ � � �� ���
� ��� � �
Shave � ���� �
ADDCL � �� �� �� �� � � �
Plaid �� �� �� ��
Plaid replaces $\mu_k$ by a model

Algorithm
Seek small value of
$$\sum_{i=1}^{n} \sum_{j=1}^{p} \Big( Y_{ij} - \sum_{k=0}^{K} \theta_{ijk} \rho_{ik} \kappa_{jk} \Big)^2$$
Where
$\rho_{ik}, \kappa_{jk} \in \{0, 1\}$ and,
$\theta_{ijk} = \mu_k$
or $\mu_k + \alpha_{ik}$
or $\mu_k + \beta_{jk}$
or $\mu_k + \alpha_{ik} + \beta_{jk}$
1. Likely to be NP-hard ... even clustering is
2. We pick one layer at a time ...
3. ... using an interior point algorithm
4. Larger clusters are more attractive
5. Clusters near background not attractive
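The criterion can be written down directly as a sketch. It assumes each layer is supplied as a hypothetical `(theta, rho, kappa)` triple, with the background treated as a layer whose memberships are all 1; the function name `plaid_sse` is an illustrative choice.

```python
def plaid_sse(Y, layers):
    """Sum over i, j of (Y_ij - sum_k theta_ijk * rho_ik * kappa_jk)^2.

    `layers` is a list of (theta, rho, kappa) triples: theta is an n x p
    list of lists, rho has length n, kappa has length p.
    """
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            fit = sum(theta[i][j] * rho[i] * kappa[j]
                      for theta, rho, kappa in layers)
            total += (y - fit) ** 2
    return total

# Hypothetical 2 x 2 example: one elevated cell on a flat background.
Y = [[3.0, 1.0], [1.0, 1.0]]
background = ([[1.0, 1.0], [1.0, 1.0]], [1, 1], [1, 1])
layer1 = ([[2.0, 2.0], [2.0, 2.0]], [1, 0], [1, 0])
sse_background_only = plaid_sse(Y, [background])
sse_with_layer = plaid_sse(Y, [background, layer1])
```

Adding layers one at a time, as item 2 describes, amounts to choosing each new triple to reduce this sum given the layers already found.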
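
Finding one layer
Residual:
$$Z_{ij} = Y_{ij} - \sum_{k=0}^{K-1} \theta_{ijk} \rho_{ik} \kappa_{jk}$$
Drop $K$ and write:
$$Q = \frac{1}{2} \sum_{i} \sum_{j} (Z_{ij} - \theta_{ij} \rho_i \kappa_j)^2$$
We want to min $Q$ over $\theta$, $\rho$, $\kappa$
1. Start with arbitrary $\rho_i, \kappa_j \in (0, 1)$
2. Update $\theta_{ij}$ given $\rho_i$ and $\kappa_j$
3. Update $\rho_i$ and $\kappa_j$ given $\theta_{ij}$
Alternate 2, 3 above, but:
1. Keep $\rho_i$, $\kappa_j$ away from 0 and 1 early on
2. Force $\rho_i$, $\kappa_j$ to 0 or 1 later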
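The alternation above can be sketched for the simplest case, a layer with a single level $\mu$ and no row or column effects. Several details here are assumptions made only for illustration: the starting value 0.5, the `steps` parameter, the widening band that keeps memberships away from 0 and 1 early on, and the final 0.5 threshold. The name `fit_mu_layer` is hypothetical, and this is plain alternating least squares, not the interior point algorithm mentioned in the talk.

```python
def fit_mu_layer(Z, steps=4):
    """Alternating least squares for one mu-only layer on residuals Z."""
    n, p = len(Z), len(Z[0])
    rho, kappa = [0.5] * n, [0.5] * p

    def clamp(x, s):
        # Band around 0.5 that widens to [0, 1] by the last step.
        half_width = 0.5 * s / steps
        return min(max(x, 0.5 - half_width), 0.5 + half_width)

    def update_mu():
        num = sum(rho[i] * kappa[j] * Z[i][j]
                  for i in range(n) for j in range(p))
        den = sum(r * r for r in rho) * sum(k * k for k in kappa)
        return num / den if den > 0 else 0.0

    for s in range(1, steps + 1):
        mu = update_mu()                       # layer level given memberships
        den = mu * sum(k * k for k in kappa)
        if den == 0:
            break
        rho = [clamp(sum(kappa[j] * Z[i][j] for j in range(p)) / den, s)
               for i in range(n)]              # gene memberships given mu, kappa
        den = mu * sum(r * r for r in rho)
        if den == 0:
            break
        kappa = [clamp(sum(rho[i] * Z[i][j] for i in range(n)) / den, s)
                 for j in range(p)]            # sample memberships given mu, rho

    rho = [1 if r > 0.5 else 0 for r in rho]   # force to 0 or 1 at the end
    kappa = [1 if k > 0.5 else 0 for k in kappa]
    return update_mu(), rho, kappa

# Hypothetical residuals: a 2 x 2 block of 2s in a 4 x 4 matrix of zeros.
Z = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mu, rho, kappa = fit_mu_layer(Z)
```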
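
Fuzzy anova
Minimize:
$$\frac{1}{2} \sum_{i} \sum_{j} (Z_{ij} - \theta_{ij} \rho_i \kappa_j)^2$$
Subject to:
$$\sum_{i} \rho_i^2 \alpha_i = 0, \qquad \sum_{j} \kappa_j^2 \beta_j = 0$$
By taking:
$$\mu = \frac{\sum_i \sum_j \rho_i \kappa_j Z_{ij}}{\big(\sum_i \rho_i^2\big)\big(\sum_j \kappa_j^2\big)}$$
$$\alpha_i = \frac{\sum_j (Z_{ij} - \mu \rho_i \kappa_j) \kappa_j}{\rho_i \sum_j \kappa_j^2}$$
$$\beta_j = \frac{\sum_i (Z_{ij} - \mu \rho_i \kappa_j) \rho_i}{\kappa_j \sum_i \rho_i^2}$$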
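A sketch of this update, assuming the criterion is $\sum_i \sum_j (Z_{ij} - (\mu + \alpha_i + \beta_j)\rho_i\kappa_j)^2$ with constraints $\sum_i \rho_i^2 \alpha_i = 0$ and $\sum_j \kappa_j^2 \beta_j = 0$, for which the closed-form least squares solution is the one coded below. The function name `update_theta` is hypothetical, and setting an effect to 0 where the membership is 0 is a convention chosen here, since that effect does not enter the fit.

```python
def update_theta(Z, rho, kappa):
    """Least squares mu, alpha, beta for fixed fractional memberships."""
    n, p = len(Z), len(Z[0])
    sr2 = sum(r * r for r in rho)
    sk2 = sum(k * k for k in kappa)
    mu = sum(rho[i] * kappa[j] * Z[i][j]
             for i in range(n) for j in range(p)) / (sr2 * sk2)
    alpha = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * kappa[j]
            for j in range(p)) / (rho[i] * sk2)
        if rho[i] > 0 else 0.0
        for i in range(n)
    ]
    beta = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * rho[i]
            for i in range(n)) / (kappa[j] * sr2)
        if kappa[j] > 0 else 0.0
        for j in range(p)
    ]
    return mu, alpha, beta

# Hypothetical check: build Z exactly from known parameters, then recover them.
rho, kappa = [1.0, 0.5], [1.0, 1.0, 0.5]
true_mu, true_alpha, true_beta = 2.0, [1.0, -4.0], [0.5, -1.0, 2.0]
Z = [[(true_mu + a + b) * r * k for b, k in zip(true_beta, kappa)]
     for a, r in zip(true_alpha, rho)]
mu, alpha, beta = update_theta(Z, rho, kappa)
```

With memberships restricted to 0 and 1 this reduces to an ordinary two-way additive fit on the member rows and columns, which is the sense in which the step is a "fuzzy" anova.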
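
Updating $\rho_i$ and $\kappa_j$
Minimize:
$$\frac{1}{2} \sum_{i} \sum_{j} (Z_{ij} - \theta_{ij} \rho_i \kappa_j)^2$$
Let:
$$\theta_{ij} = \mu + \alpha_i + \beta_j$$
$$\rho_i = \frac{\sum_j \theta_{ij} \kappa_j Z_{ij}}{\sum_j \theta_{ij}^2 \kappa_j^2}$$
$$\kappa_j = \frac{\sum_i \theta_{ij} \rho_i Z_{ij}}{\sum_i \theta_{ij}^2 \rho_i^2}$$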
Notes
1. The $\rho_i$ update only uses gene $i$'s data
2. The � update only uses gene �’s data
3. Avoids ���� costs
4. Similarly for �� , �� and ����
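A sketch of the membership updates, assuming the least squares form $\rho_i = \sum_j \theta_{ij}\kappa_j Z_{ij} / \sum_j \theta_{ij}^2 \kappa_j^2$ and its mirror image for $\kappa_j$. The function names are hypothetical. Each gene's membership is computed from that gene's row alone, which is the locality the notes point to.

```python
def update_rho(Z, theta, kappa):
    """Least squares gene memberships for fixed theta and kappa."""
    rho = []
    for z_row, t_row in zip(Z, theta):        # one gene at a time
        num = sum(t * k * z for t, k, z in zip(t_row, kappa, z_row))
        den = sum((t * k) ** 2 for t, k in zip(t_row, kappa))
        rho.append(num / den if den > 0 else 0.0)
    return rho

def update_kappa(Z, theta, rho):
    """Same update with the roles of rows and columns exchanged."""
    Z_t = [list(col) for col in zip(*Z)]
    theta_t = [list(col) for col in zip(*theta)]
    return update_rho(Z_t, theta_t, rho)

# Hypothetical check: Z generated exactly from known memberships.
theta = [[2.0, 3.0], [1.0, 4.0]]
true_rho, true_kappa = [1.0, 0.5], [1.0, 0.5]
Z = [[theta[i][j] * true_rho[i] * true_kappa[j] for j in range(2)]
     for i in range(2)]
rho = update_rho(Z, theta, true_kappa)
kappa = update_kappa(Z, theta, true_rho)
```

These unconstrained values can fall outside the unit interval on real data, so in practice they would be combined with a rule that steers them toward 0 or 1.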

Some details
Starting values: SVD finds a $\mu$-only plaid layer. Rescale singular vectors to start $\rho$ and $\kappa$.
Backfitting: Given $\rho_{ik}$, $\kappa_{jk} \in \{0, 1\}$ for $k = 1, \ldots, K$ it is cheap to re-estimate all the $\theta_{ijk}$.
Choosing $K$: Permute row contents, then columns. Stop if the algorithm finds more structure in the permuted data. Negative binomial regularization.
Stepping Use� � steps to get �, �� into �� ��.
Unisign We may want a common sign for � ��.
Robustness Inspect each new found layer: release
any rows or columns not well explained.
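The permutation step used for choosing $K$ can be sketched as follows. This reads "permute row contents, then columns" as shuffling the entries within each row independently and then the entries within each column, which is an interpretation rather than something the slide spells out. The function name and the `seed` parameter are hypothetical, and the statistic used to compare layers found in real and permuted data is not shown.

```python
import random

def permute_rows_then_columns(Z, seed=0):
    """Return a copy of Z with entries shuffled within rows, then within columns."""
    rng = random.Random(seed)
    # Shuffle the contents of every row independently.
    rows = [rng.sample(list(row), len(row)) for row in Z]
    # Then shuffle the contents of every column independently.
    cols = [rng.sample(list(col), len(col)) for col in zip(*rows)]
    # Transpose back to row-major layout.
    return [list(row) for row in zip(*cols)]

# Hypothetical residual matrix.
Z = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
Z_perm = permute_rows_then_columns(Z, seed=1)
```

The shuffled matrix keeps the same set of values but has no row-by-column structure left, so a layer found in it gives a reference for how much apparent structure noise alone produces.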

Food data
$n = 961$ foods
$p = 6$ measures:
1. Fat proportion
2. Saturated fat proportion
3. Calories per gram
4. Cholesterol proportion
5. Protein proportion
6. Carbohydrate proportion
For each column: subtract the mean, divide by the standard deviation.
Source:
http://www.ntwrks.com/~mikev/chart1.html
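The column scaling applied to the food data can be sketched with the standard library. Whether the talk used the sample ($n - 1$) or population standard deviation is not stated; the sample version is assumed here, and the function name and toy values are hypothetical.

```python
from statistics import mean, stdev

def standardize_columns(X):
    """Subtract each column's mean and divide by its standard deviation."""
    cols = list(zip(*X))
    centers = [mean(c) for c in cols]
    scales = [stdev(c) for c in cols]   # sample st. dev. (n - 1): an assumption
    return [[(x - m) / s for x, m, s in zip(row, centers, scales)]
            for row in X]

# Hypothetical 3 x 2 table, not the food data itself.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
X_std = standardize_columns(X)
```

A constant column has zero standard deviation and would need to be dropped or handled separately before calling this.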

Yeast data
Name            Samples
Alpha           1–18
Elutriation     19–32
CDC             33–47
Sporulation     48–53
Sporulation-5   54–56
Sporulation-    57–58
Heat Shock      59–64
DTT             65–68
Cold            69–72
Diauxic Shift   73–79
Eisen, Spellman, Brown, Botstein: PNAS (1998)

Data Analysis
• Analyze log expression
• Few missing values: imputed by additive model
• Background Layer
– Full model $\mu + \alpha_i + \beta_j$
– All genes � � ����
– All samples �� � ����
� Mine the interaction, with up to � layers
– unisign
– 50% threshold
– � permutations per round
� Search stopped at �� layers: ��th had zero genes

Future directions
• Refine existing algorithm
• Explore information retrieval applications
• Explore recommender system applications
• Find less greedy version
• Incorporate predictors
• Extend to higher way tables of data
• Larger data sets (99% missing)
• Use covariates
• Are there "plaid-lets"?
Code
Available for academic research
www-stat.stanford.edu/~owen/clickwrap/plaid.html

Refinements
Replace
minimize�
�
��
��
���� � ���
�� �� � ��
���
s.t. ���
���� ���
�����
�� �� � �
By:
minimize�
�
��
��
���� �
�� �� � ��
���
��
�
��
��
������� ���
s.t. ���
��� ���
����
�� �� � �� ��

Updates become:
Model parts
�
��
�� �������
�
�� ���
�� �
�� ������ � �
� ��
�� �
�� ����� � �
� �
Memberships
� � � iff��
������� � ���
� � ����
��
�� � � iff��
������ � ���
� � ����
��
So far:
1. Seems to find slightly better layers
2. Harder to frame multi-layer model