Yuval Hart, Weizmann 2010© 1
Introduction to Matlab & Data Analysis
Final Project: That’s all, Folks!
2
Outline
Parsing files
Efficient programming - vectorization
Correlation coefficients
Passing extra parameters
Image plotting
Curve Fitting & Optimization
Figure handling
3
“Rotation in 60 minutes”
4
Rotation in 60 minutes:
During the past month you’ve measured promoter activity of 20 genes.
Your PI wants you to present your results at the next group meeting.
5
To Do List
Get the sequences of the genes from the GenBank and FASTA files and calculate their GC content
Display all correlation coefficients of the measured promoter activity (PA) and its relation to GC content
For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway
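The GC content in the first item is the fraction of G and C bases in a sequence. A minimal vectorized sketch, assuming each sequence is already available as a character vector (the example sequences below are made up for illustration and are not taken from the project files):

```matlab
% Hypothetical example sequence (not from the project's GenBank/FASTA files)
seq = 'ATGCGCTTAAGC';

% Logical mask of the G/C positions; upper() makes the test case-insensitive
isGC = (upper(seq) == 'G') | (upper(seq) == 'C');

% GC content = fraction of bases that are G or C
GCcontent = sum(isGC) / numel(seq);

% The same idea for several sequences held in a cell array
seqs  = {'ATGC', 'GGCC', 'ATAT'};
gcAll = cellfun(@(s) mean(upper(s) == 'G' | upper(s) == 'C'), seqs);
```

Taking the mean of a logical mask gives the fraction directly, so no explicit loop over the bases is needed.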
7
GenBank file format
8
Step 3: Attach every gene name to its DNA sequence
Build the structure with all needed fields:
% Build the structure Genes with the desired genes and their data:
% name, StartPosition, EndPosition, sequence, complement (1/0), GCcontent
% This is also the way to preallocate for structures:
% Genes(1,sum(indGeneList)) = struct('name', [], 'complement', [], 'sequence', [], ...
%     'StartPosition', [], 'EndPosition', [], 'GCcontent', 1);
Genes = struct('name', geneNames(indGeneList), ...
    'complement', num2cell(indComplement(indGeneList)'), ...
    'StartPosition', CDSpositionStartEndCelled(indGeneList,1)', ...
    'EndPosition', CDSpositionStartEndCelled(indGeneList,2)', ...
    'sequence', seq, 'GCcontent', GCcontent);
a = Genes;
Note: struct assigns the values one by one, one per element of the structure array, only when the field values are passed as cell arrays.
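A small sketch of how struct treats cell-array inputs. The gene names and values here are hypothetical and stand in for geneNames, GCcontent and the other variables on the slide:

```matlab
% Hypothetical stand-in data
names = {'geneA', 'geneB', 'geneC'};
gc    = [0.41 0.52 0.47];

% Cell-array values are spread over the elements: S is a 1x3 struct array.
% num2cell turns the numeric vector into a 1x3 cell array first.
S = struct('name', names, 'GCcontent', num2cell(gc));

% Wrapping the cell array in another cell keeps it whole:
% T is a 1x1 struct whose field 'name' holds the entire cell array.
T = struct('name', {names});

disp(size(S));      % 1 3
disp(S(2).name);    % geneB
```

This is why numeric vectors such as the complement flags are passed through num2cell before being handed to struct.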
10
Calculate and plot Correlation Matrix
Load the list of genes and measurements
% Input:
% the measurements mat file contains:
%   geneList - a cell array of the gene names
%   measurements - a matrix of 20 genes' measurements at 1001 time points
%   GenesGCcontent - a vector of the genes' GC content values
% measurements has a row for each gene, containing its measurements through
% 1001 time points; geneList holds the gene names
load measurements
11
Plot the relation between GC content and mean PA
Plot the fit results on top of the previous graph:
Note: Smoothing the data can reduce the effect of outliers
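A minimal sketch of overlaying a fit on the data plot with hold on. The slide does not say which fit was used, so the straight-line fit with polyfit is an assumption, and the data are synthetic stand-ins. In the project, the mean PA would presumably be something like mean(measurements, 2).

```matlab
% Synthetic stand-in data (assumption): 20 genes, one mean PA value each
GC     = linspace(0.35, 0.65, 20);
meanPA = 2*GC + 0.5 + 0.05*sin(1:20);   % linear trend plus a small wiggle

% Straight-line least-squares fit (assumed fit type); p = [slope intercept]
p = polyfit(GC, meanPA, 1);

figure;
plot(GC, meanPA, 'o');                  % the data points
hold on;                                % keep the data when adding the fit
plot(GC, polyval(p, GC), '-');          % the fit, drawn on the same axes
hold off;
xlabel('GC content');
ylabel('Mean PA');
legend('data', 'linear fit');
```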
12
Calculate and plot Correlation Matrix
Calculate and display the correlation matrix
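One possible way to do this, sketched with corrcoef and imagesc on synthetic data. The four signals below are made up; in the project the input would be the 20-by-1001 measurements matrix. corrcoef treats columns as variables, so a matrix with one gene per row has to be transposed first.

```matlab
% Synthetic stand-in for measurements: 4 "genes" (rows) x 1001 time points
t = linspace(0, 1, 1001);
measurements = [ sin(2*pi*t);
                 sin(2*pi*t) + 0.1*cos(7*pi*t);
                -sin(2*pi*t);
                 cos(2*pi*t)];

% Transpose so that each gene becomes a column (a variable) for corrcoef
R = corrcoef(measurements');

figure;
imagesc(R);            % show the matrix as a color image
caxis([-1 1]);         % fix the color scale to the full correlation range
colorbar;
axis square;
title('Correlation matrix');
```

The result is symmetric with ones on the diagonal, and row 3, the negated copy of row 1, has a correlation of -1 with it.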
14
Step 2: Fit correlations to the desired function
Use an anonymous function to pass extra parameters, then fit using lsqcurvefit:
function y_hat = FittingCurveExpGuess(c, x, init)
% This assumes an exponentially decreasing curve
y_hat = init + c(1)*exp(c(2).*x);

initDis = -0.1;
c0 = [.7 0.1]; % assigning the initial values for the fit search
paramfunc = @(c,x) FittingCurveExpGuess(c, x, initDis); % def. of the anonymous function
ExpParam = lsqcurvefit(paramfunc, c0, XdataPoints, correl, [0 -1], [1 1], options);
The arguments of lsqcurvefit, in order: function, initial guess, X data, Y data, lower bound, upper bound, options.
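A self-contained sketch of the same parameter-passing idea. To avoid depending on the Optimization Toolbox it swaps lsqcurvefit for fminsearch minimizing a sum of squared errors, so it is unconstrained and not the fit used on the slide. The data are synthetic and the names model, sse, cTrue and cFit are made up for illustration.

```matlab
% Same model as FittingCurveExpGuess, written inline for self-containment
model = @(c, x, init) init + c(1)*exp(c(2).*x);

% The anonymous function captures the value of initDis when it is created,
% leaving only (c, x) as inputs, which is the signature a fitter expects
initDis   = -0.1;
paramfunc = @(c, x) model(c, x, initDis);

% Synthetic, noise-free data generated from known parameters (assumption)
x     = 0:0.5:10;
cTrue = [0.7 -0.3];
y     = paramfunc(cTrue, x);

% Swapped-in technique: unconstrained least squares with fminsearch
sse  = @(c) sum((paramfunc(c, x) - y).^2);
cFit = fminsearch(sse, [0.5 -0.1]);

% Changing initDis afterwards does not affect paramfunc
initDis = 5;
yAtZero = paramfunc(cTrue, 0);   % still uses the captured -0.1
```

Because the extra parameter is frozen at creation time, paramfunc has to be redefined whenever a different value of initDis should be used.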
15
Step 3: Plot the correlation data and fit
16
Best of Luck in the Group Meeting!
18
This is the end, my friend, the end
"Louis, I think this is the beginning of a beautiful friendship."