DNA Microarray analysis using Hierarchical clustering
Uploaded by rawatpooran05
7/21/2019 DNA Microarray analysis using Hierarchical clustering
Experiment 4
1. Experiment: DNA Microarray analysis using Hierarchical clustering
Equipment Required: Computer with internet connection.
Material Required: MATLAB
2. Learning Objectives:
To acquaint students with the clustering of large data.
To acquaint students with MATLAB.
3. Theory:
Clustering methods
One of the goals of microarray data analysis is to cluster genes or samples with similar expression profiles together, to make meaningful biological inferences about the set of genes or samples. Clustering, also known as class discovery, allows easier management of the data set.
The idea is that coregulated and functionally related genes are probably going to be expressed (go up or down) simultaneously, so they can be grouped into clusters.
Clustering techniques can help:
* to identify groups of coregulated genes,
* to identify spatial or temporal expression patterns,
* to reduce redundancy in prediction models,
* to identify new biological classes (e.g. new tumor classes),
* to detect experimental artifacts,
* or for display purposes.
Clustering is one of the unsupervised approaches to classifying data.
What is MATLAB?
* MATLAB is short for Matrix Laboratory.
* MATLAB is a tool for doing numerical computations with matrices and vectors.
* It is very powerful and easy to use.
* It integrates computation, visualization and programming.
Uses:
1. Teaching: Bioinformatics graduate and undergraduate courses
* MIT, Harvard, Stanford, Cornell, Carnegie Mellon, ...
2. Research: recent papers use MATLAB for:
a. Sequencing
* Base calling algorithm design
b. Microarray analysis
* Statistical modeling of microarrays, image analysis
c. Proteomics
* Mass spectrometry data classification
d. Systems Biology
* Flux Analysis, Simulation of Metabolic Pathways, Interaction Network Identification
Clustering algorithms
The traditional algorithms for clustering are:
1. Hierarchical clustering.
2. K-means clustering.
3. Self-organizing feature maps (a variant of self-organizing maps).
4. Binning.
Hierarchical clustering
Hierarchical clustering typically uses a progressive combination of the elements that are most similar. The result is plotted as a dendrogram that represents the clusters and the relations between the clusters. In the bottom-up algorithm, the final step is to repeat the merging steps for the most high-level clusters.
The top-down algorithm works as follows:
1. All the genes or experiments are considered to be in one super-cluster.
2. Divide each cluster into 2 clusters by using k-means clustering with k=2.
3. Repeat step 2 until all clusters contain a single gene or experiment.
This algorithm tends to be faster than the bottom-up approach.
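The top-down steps above can be sketched in MATLAB roughly as follows. This is only an illustration under stated assumptions, not code from the experiment: it assumes the Statistics Toolbox function kmeans is available, and the data matrix X and the variables queue and splits are hypothetical names introduced here.

```matlab
% Top-down (divisive) clustering sketch: split every cluster in two with
% k-means (k = 2) until each cluster holds a single object.
% Assumes the Statistics Toolbox function kmeans; X is made-up example data.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];   % rows = objects (genes or experiments)

queue  = {1:size(X,1)};   % step 1: one super-cluster holding all objects
splits = {};              % record of every split, from the top down

while ~isempty(queue)
    members  = queue{1};          % take the next cluster to divide
    queue(1) = [];
    if numel(members) < 2
        continue                  % a single object cannot be split further
    end
    % step 2: divide the cluster into 2 clusters with k-means, k = 2
    idx   = kmeans(X(members,:), 2, 'EmptyAction', 'singleton');
    left  = members(idx == 1);
    right = members(idx == 2);
    splits{end+1} = {left, right};   %#ok<SAGROW>
    % step 3: repeat the division on both halves
    queue{end+1} = left;             %#ok<SAGROW>
    queue{end+1} = right;            %#ok<SAGROW>
end

fprintf('Number of splits performed: %d\n', numel(splits));
```

Because kmeans starts from random centers, the individual splits can differ between runs, but with distinct objects the loop always performs one split fewer than the number of objects.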
4. Procedure:
Install MATLAB on the system.
Use the cluster analysis tool:
Hierarchical Clustering in MATLAB
To perform hierarchical cluster analysis on a data set using the Statistics Toolbox functions, follow this procedure:
* Step 1: Find the similarity or dissimilarity between every pair of objects in the data set:
* In this step, you calculate the distance between objects using the pdist function. The pdist function supports many different ways to compute this measurement.
* Step 2: Group the objects into a binary, hierarchical cluster tree:
* In this step, you link pairs of objects that are in close proximity using the linkage function, which is the main function for implementing the hierarchical clustering method. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed.
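As a hedged illustration of steps 1 and 2, the sketch below computes pairwise distances with a few of the metrics pdist supports and builds trees with different linkage methods. The expression matrix expr is made-up data (5 hypothetical genes measured in 4 samples), and the choice of metrics and methods is only an example.

```matlab
% Steps 1 and 2 on a small, made-up expression matrix.
% Rows = genes (objects), columns = samples. All values are hypothetical.
expr = [1 2 3 4;
        2 4 6 8;
        4 3 2 1;
        1 1 2 2;
        5 5 5 6];

% Step 1: pairwise distances between rows, returned as a row vector of
% length m*(m-1)/2 (here 5*4/2 = 10).
Y_euc  = pdist(expr);                  % default metric: Euclidean
Y_city = pdist(expr, 'cityblock');     % city-block (Manhattan) distance
Y_corr = pdist(expr, 'correlation');   % one minus the sample correlation

% Step 2: build the binary cluster tree from the distance vector.
% Each row of Z is [cluster_a, cluster_b, merge_distance].
Z_single   = linkage(Y_euc);               % default method: 'single'
Z_average  = linkage(Y_corr, 'average');   % average linkage
Z_complete = linkage(Y_corr, 'complete');  % complete (furthest-neighbor) linkage

disp(size(Z_single));   % (m-1)-by-3
```

With the correlation metric, genes 1 and 2 of this made-up matrix have a distance of (numerically) zero because their profiles rise together, which is the kind of co-regulation the theory section describes.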
* Step 3: Determine where to cut the hierarchical tree into clusters:
* In this step, you use the cluster function to prune branches off the bottom of the hierarchical tree, and assign all the objects below each cut to a single cluster. This creates a partition of the data. The cluster function can create these clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point.
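The two ways of cutting the tree described in step 3 might look like this in practice. This is a minimal sketch using the five-point data set from Example 1. The cutoff values 1.5 and 1.2 are arbitrary choices for illustration.

```matlab
% Step 3: turn the tree into a partition with the cluster function.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];
Z = linkage(pdist(X));        % Euclidean distance, single linkage

% Cut at an arbitrary point, variant A: ask for at most 2 clusters.
T_max = cluster(Z, 'maxclust', 2);

% Cut at an arbitrary point, variant B: cut the tree at height 1.5, so
% only merges made at a distance below 1.5 stay together.
T_cut = cluster(Z, 'cutoff', 1.5, 'criterion', 'distance');

% Natural groupings: the default criterion compares each link with the
% links below it (inconsistency coefficient) instead of using raw height.
T_inc = cluster(Z, 'cutoff', 1.2);

disp([T_max T_cut T_inc]);    % one cluster label per object (row of X)
```

For this data, the height-1.5 cut keeps objects 4 and 5 together and objects 1 and 3 together, and leaves object 2 on its own. The label numbers themselves carry no meaning beyond identifying the groups.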
The MATLAB Statistics Toolbox includes a convenience function, clusterdata, which performs all these steps. There is no need to execute the pdist, linkage, or cluster functions separately.
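A short sketch of the shortcut next to the three separate calls is given below. It relies on clusterdata using Euclidean distance and single linkage by default, and the 'maxclust' value of 2 is an arbitrary choice for illustration.

```matlab
% clusterdata versus the explicit pdist -> linkage -> cluster pipeline.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

% One call: distances, tree and cut in a single step.
T1 = clusterdata(X, 'maxclust', 2);

% The same three steps written out.
Y  = pdist(X);                      % step 1: Euclidean distances
Z  = linkage(Y);                    % step 2: single-linkage tree
T2 = cluster(Z, 'maxclust', 2);     % step 3: cut into at most 2 clusters

% Compare the partitions through co-membership matrices, so the check
% does not depend on how the cluster labels happen to be numbered.
samePartition = isequal(bsxfun(@eq, T1, T1'), bsxfun(@eq, T2, T2'));
disp(samePartition);
```

Running the steps separately is still useful when the intermediate results are needed, for example to plot the dendrogram from Z.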
Commands used:
1. pdist function: D = pdist(X) computes the Euclidean distance between pairs of objects in the m-by-n data matrix X. Rows of X correspond to objects and columns to variables.
2. linkage function: Z = linkage(Y) creates an agglomerative hierarchical cluster tree from the distances in Y.
3. squareform function: Z = squareform(y), where y is a vector as created by the pdist function, converts y into a square, symmetric format Z, in which Z(i,j) denotes the distance between the ith and jth objects in the original data.
4. Dendrogram: H = dendrogram(Z) generates a dendrogram plot of the hierarchical, binary cluster tree represented by Z.
Example 1:
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5]
X =
    1.0000    2.0000
    2.5000    4.5000
    2.0000    2.0000
    4.0000    1.5000
    4.0000    2.5000
Y = pdist(X)
Y =
    2.9155    1.0000    3.0414    3.0414    2.5495    3.3541    2.5000    2.0616    2.0616    1.0000
squareform(Y)
ans =
         0    2.9155    1.0000    3.0414    3.0414
    2.9155         0    2.5495    3.3541    2.5000
    1.0000    2.5495         0    2.0616    2.0616
    3.0414    3.3541    2.0616         0    1.0000
    3.0414    2.5000    2.0616    1.0000         0
Z = linkage(Y)
Z =
    4.0000    5.0000    1.0000
    1.0000    3.0000    1.0000
    6.0000    7.0000    2.0616
    2.0000    8.0000    2.5000
dendrogram(Z)
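To help read the linkage output of Example 1, the following sketch prints each merge in words. It also computes the cophenetic correlation with cophenet, a Statistics Toolbox function not used in the original procedure, as one possible check of how faithfully the tree reflects the original distances.

```matlab
% Reading the linkage matrix from Example 1 row by row.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];
Y = pdist(X);
Z = linkage(Y);
n = size(X, 1);              % number of original objects

% Row k of Z joins two clusters; the new cluster gets index n + k.
% Indices 1..n are the original objects, larger indices are earlier merges.
for k = 1:size(Z, 1)
    fprintf('Step %d: merge %d and %d at distance %.4f -> new cluster %d\n', ...
            k, Z(k,1), Z(k,2), Z(k,3), n + k);
end

% Cophenetic correlation: values closer to 1 mean the tree heights
% track the original pairwise distances more closely.
c = cophenet(Z, Y);
fprintf('Cophenetic correlation: %.4f\n', c);
```

This explains the indices 6, 7 and 8 in Z: they are not objects in X but the clusters created by the first, second and third merges.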
Example 2:
load mn.dat
mnvector = squareform(mn)
mnvector =
Columns 1 through 9
I.II I.;II I.LII I.KLII I.K;II I.L>II I.KII I.>II
I.6II
Columns 10 through 18
I.6JII I.;>II I.>II I.>LII I.L>II I.LKII I.66II I.>III.>3II
Columns 19 through 27
I.>LII I.L6II I.L;II I.;>II I.>III I.>KII I.KJII I.L3II
I.II
Columns 28 through 36
I.6JII I.;KII I.;KII I.6II I.;6II I.;3II I.;II I.6III.63II
mnclustering = linkage(mnvector,‘complete’)
??? mnclustering = linkage(mnvector,‘complete’)
|
Error: The input character is not valid in MATLAB statements or expressions.
The method name must be enclosed in straight single quotes:
mnclustering = linkage(mnvector,'complete')
mnclustering =
.IIII J.IIII I.63II
.IIII ;.IIII I.66II K.IIII L.IIII I.6II
6.IIII 33.IIII I.6JII
>.IIII 36.IIII I.II
3.IIII 3.IIII I.II 3I.IIII 3;.IIII I.;KII
3>.IIII 3K.IIII I.KII
dendrogram(mnclustering)
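Since mn.dat is not reproduced here, the workflow of Example 2 can be tried on a small stand-in. In the sketch below, the 4-by-4 distance matrix D is made-up data, and the symmetry check is an optional precaution rather than part of the original procedure.

```matlab
% Example 2 workflow on a hypothetical distance matrix (not mn.dat).
D = [0   0.3 0.7 0.6;
     0.3 0   0.5 0.8;
     0.7 0.5 0   0.2;
     0.6 0.8 0.2 0  ];

% squareform expects a symmetric matrix with zeros on the diagonal.
assert(isequal(D, D') && all(diag(D) == 0), 'D is not a valid distance matrix');

v  = squareform(D);            % 1-by-6 vector in the same order pdist uses
Zc = linkage(v, 'complete');   % method name in straight single quotes
Zs = linkage(v, 'single');     % single linkage, for comparison

% The last row holds the final merge. Complete linkage reports the largest
% distance between the two remaining clusters, single linkage the smallest.
fprintf('Final merge height: complete = %.2f, single = %.2f\n', ...
        Zc(end,3), Zs(end,3));
```

For this stand-in matrix the two methods end at different heights (0.80 for complete and 0.50 for single linkage), which is why the choice of method can change the dendrogram drawn from the same distances.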
5. Required Results:
The student should be able to identify clusters.
Parameters: None
Relationships to be determined: None
6. Cautions: None
7. Learning outcomes: