Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization...
![Page 1: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/1.jpg)
Data Mining
ANALYSIS OF A SINGLE DATASET
• Visualization Techniques
• Combined Graph
• Charts and Pies
• Search for specific functions
![Page 2: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/2.jpg)
Data Mining on the DAG
✓ When working with large datasets, annotation results need to be summarized
✓ The DAG provides visualization of annotation data within its biological context
✓ In Blast2GO → Combined Graph function
![Page 3: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/3.jpg)
Combined Graph
Each term has a number of sequences associated
Nodes can be coloured to indicate relevance
Each term is displayed around its biological context
Node shape to differentiate between direct and indirect annotation
![Page 4: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/4.jpg)
Combined Graph
Different GO branches
Reduce nodes by number of annotated sequences
Criterion for highlighting and filtering nodes
Node data to be displayed
![Page 5: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/5.jpg)
Let's draw the DAG of the dataset analyzed yesterday (1000 sequences)
Too many nodes!!!
Combined Graph
We need a way to find the relevant information
![Page 6: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/6.jpg)
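Node Information Content
Accumulated by node (Sequence Count)
Incoming information (Node Score)
[Figure: the same example DAG shown twice. Sequence counts accumulate towards the root (3 and 1, then 4, then 5); node scores for the same nodes are 3 and 1, then 2.4, then 2.5.]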
![Page 7: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/7.jpg)
Node score
We compute a node score that reflects the amount of direct information at the node.
[Figure: example DAG labelled with node scores 3, 1, 2.4 and 2.5]
![Page 8: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/8.jpg)
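Node score
Figure labels: GO1 = 1, GO2 = 3, GO3 = 2.4, GO4 = 2.5
Direct annotations: 1 sequence at GO1, 3 sequences at GO2, 1 sequence at GO4
$\alpha = 0.6$
$\mathrm{NodeScore}(GO1) = 1 \cdot 0.6^{0} = 1$ (dist = 0)
$\mathrm{NodeScore}(GO2) = 3 \cdot 0.6^{0} = 3$ (dist = 0)
$\mathrm{NodeScore}(GO3) = 1 \cdot 0.6^{1} + 3 \cdot 0.6^{1} = 0.6 + 1.8 = 2.4$ (dist = 1 from GO1 and from GO2)
$\mathrm{NodeScore}(GO4) = 1 \cdot 0.6^{2} + 3 \cdot 0.6^{2} + 1 \cdot 0.6^{0} = 0.36 + 1.08 + 1 = 2.44$ (dist = 2 from GO1 and from GO2, dist = 0 for its own sequence; the figure labels this node 2.5)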
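The arithmetic on this slide can be reproduced with a short script. This is a minimal sketch, not Blast2GO's code: the toy DAG layout (GO1 and GO2 below GO3, GO3 below GO4), the function names, and the use of the shortest path when a term has several routes to an ancestor are all assumptions made for illustration.

```python
# Toy DAG assumed from the slide: each term maps to its list of parents.
PARENTS = {"GO1": ["GO3"], "GO2": ["GO3"], "GO3": ["GO4"], "GO4": []}
# Number of sequences annotated directly to each term.
DIRECT = {"GO1": 1, "GO2": 3, "GO3": 0, "GO4": 1}
ALPHA = 0.6


def distances_to_ancestors(term, parents):
    """Shortest edge distance from `term` to itself and to each ancestor (BFS)."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents[node]:
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def node_scores(parents, direct, alpha):
    """Sum direct sequence counts over each node and its descendants,
    weighting every contribution by alpha ** distance."""
    scores = {term: 0.0 for term in parents}
    for term, n_seqs in direct.items():
        for ancestor, d in distances_to_ancestors(term, parents).items():
            scores[ancestor] += n_seqs * alpha ** d
    return scores


scores = node_scores(PARENTS, DIRECT, ALPHA)
print({term: round(value, 2) for term, value in scores.items()})
# {'GO1': 1.0, 'GO2': 3.0, 'GO3': 2.4, 'GO4': 2.44}
```

Setting `alpha` to 1.0 removes the distance penalty; in this toy example, where no sequence is annotated to more than one term, that gives the accumulated sequence counts (4 for GO3, 5 for GO4).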
![Page 9: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/9.jpg)
Node score vs Annotation score
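[Figure: two diagrams side by side. One is the node-score example DAG (values 3, 1, 2.4, 2.5). The other is an annotation-score example for a single sequence: BLAST hits hit1, hit2 and hit3, and candidate GO terms under ROOT labelled with scores (GO1 60, GO2 52, GO3 50, GO4 55).]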
Annotation Score:
- In annotation context
- Relates to Blast results of ONE sequence
Node Score:
- In data-mining context
- Relates to analysis of a GROUP of sequences
DO NOT MIX THEM UP!
$AS = \max\{\%sim \times EC_w\} + (\#TPR\_GOs - 1) \times GO_w$
![Page 10: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/10.jpg)
Filtered Graph
Direct annotations
Transition nodes
Number of filtered nodes
![Page 11: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/11.jpg)
Compacting Graphs by GOSlim
![Page 12: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/12.jpg)
Show node content
![Page 13: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/13.jpg)
Saving Options
Save as picture and as txt
![Page 14: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/14.jpg)
Graph Charts
![Page 15: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/15.jpg)
Graph Charts
• Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)
• Sequence Distribution/GO as Bar-Chart
• Sequence Distribution/GO as Level-Pie (level selection)
![Page 16: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/16.jpg)
Multilevel vs. GO-Slim Chart
GO-Slim: Handy to summarize functional content
Multi-level Pie with a sequence filter of 20
![Page 17: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/17.jpg)
Use DAG to analyze a function
How many sequences are annotated to the function “photosynthesis”?
Option 1: Find in the GO graph → direct & indirect annotation
Option 2: Find through the Select function. Two sub-options:
Option 2.1: Direct annotation (use GO ID or description)
Option 2.2: Direct & indirect annotation (use GO ID and “include GO parents”)
DAG can be used to make queries on general concepts without direct annotations
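The difference between a direct query and a direct-plus-indirect query can be sketched as follows. The term names, sequence IDs and helper functions are hypothetical toy data, not Blast2GO's API; the sketch only assumes that a sequence counts as indirectly annotated to a term when one of its own terms is a descendant of that term.

```python
# Hypothetical toy DAG: each term maps to its list of parents.
PARENTS = {
    "metabolic process": [],
    "photosynthesis": ["metabolic process"],
    "light reaction": ["photosynthesis"],
    "dark reaction": ["photosynthesis"],
}
# Hypothetical annotations: sequence ID -> directly annotated terms.
ANNOTATIONS = {
    "seq1": {"photosynthesis"},
    "seq2": {"light reaction"},
    "seq3": {"dark reaction", "metabolic process"},
    "seq4": {"metabolic process"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sequences_for(term, annotations, parents, include_indirect):
    """Sequences annotated to `term` directly, or also via a more specific
    descendant term when `include_indirect` is True."""
    hits = set()
    for seq, terms in annotations.items():
        if term in terms:
            hits.add(seq)
        elif include_indirect and any(term in ancestors(t, parents) for t in terms):
            hits.add(seq)
    return hits


direct = sequences_for("photosynthesis", ANNOTATIONS, PARENTS, False)
both = sequences_for("photosynthesis", ANNOTATIONS, PARENTS, True)
print(sorted(direct), sorted(both))
# ['seq1'] ['seq1', 'seq2', 'seq3']
```

A general term such as "metabolic process" may have few direct annotations of its own, yet the indirect query still returns every sequence annotated anywhere beneath it.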
![Page 18: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/18.jpg)
Example: analyze a specific function
Find a function on the graph
search export
![Page 19: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/19.jpg)
Example: analyze a specific function
Select all sequences annotated to this function and its descendants
![Page 20: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/20.jpg)
Example: analyze a specific function
Locate these sequences
![Page 21: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/21.jpg)
Example: analyze a specific function
Explore the annotation diversity of a given function within the graph
By exporting the sequence table, you can see all sequences annotated to a given function (GO term)
![Page 22: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/22.jpg)
Conclusions
✓ DAGs are interesting for browsing functional annotation but can be too large
✓ With filtering and pruning options you can create more navigable DAGs
✓ Pies are good to compact information: try out levels
✓ GO-Slim compacts to more equivalent terms than filtering the GO
✓ You can use the DAG to query on general terms
![Page 23: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/23.jpg)
Data Mining
ANALYSIS OF MULTIPLE DATASETS
• Functional Enrichment
• Enriched Graphs
• Meta-analysis
![Page 24: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/24.jpg)

| | One gene list (A) | The other list (B) |
|---|---|---|
| Biosynthesis | 54% | 18% |
| Sporulation | 18% | 27% |

Are these two groups of genes carrying out different biological roles?
Enrichment Analysis
Are these differences statistically significant?
???
???
Interpretation of a large list of genes: which are relevant functions?
![Page 25: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/25.jpg)
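
| | One gene list (A) | The other list (B) |
|---|---|---|
| Biosynthesis | 54% | 18% |
| Sporulation | 18% | 27% |

Fisher's Exact Test
Contingency table for biosynthesis:

| | A | B |
|---|---|---|
| Biosynthesis | 6 | 2 |
| No biosynthesis | 5 | 9 |

p-value for biosynthesis < 0.05

Contingency table for sporulation:

| | A | B |
|---|---|---|
| Sporulation | 2 | 3 |
| No sporulation | 9 | 8 |

p-value for sporulation > 0.05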
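To check the two contingency tables by hand, a two-sided Fisher's exact test can be written with the standard library alone. This is a minimal sketch, not Blast2GO's implementation; the function name and argument order are assumptions. For these small toy counts it returns roughly 0.18 for biosynthesis and 1.0 for sporulation, so the "< 0.05" on the slide is best read as illustrating the idea rather than as the exact result for an 11-versus-11 gene table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two gene lists; columns are "has the term" / "lacks the term".
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k term-carrying genes in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    total = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


# List A: 6 with biosynthesis, 5 without; list B: 2 with, 9 without.
p_biosynthesis = fisher_exact_two_sided(6, 5, 2, 9)
# List A: 2 with sporulation, 9 without; list B: 3 with, 8 without.
p_sporulation = fisher_exact_two_sided(2, 9, 3, 8)
print(round(p_biosynthesis, 4), round(p_sporulation, 4))
```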
![Page 26: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/26.jpg)
Multiple testing correction
We do this for every GO term in our dataset!
Many tests => many false positives => we need a correction!
FDR control: a statistical method used in multiple hypothesis testing to correct for multiple comparisons. Among the rejected hypotheses, FDR control bounds the expected proportion of incorrectly rejected null hypotheses.
FWER control: the familywise error rate is the probability of making one or more false discoveries among all the hypotheses tested. Controlling it is more conservative than controlling the FDR.
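The two kinds of correction can be compared on a handful of p-values. The slides do not say which procedures are applied, so the sketch below swaps in two widely used ones as an assumption: Bonferroni for FWER control and Benjamini-Hochberg for FDR control. The input p-values are made up.

```python
def bonferroni(pvalues):
    """FWER control: multiply every p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: scale each p-value by m / rank, then enforce
    monotonicity from the largest p-value downwards."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # made-up p-values, one per GO term
print([round(p, 4) for p in bonferroni(raw)])
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

Every Bonferroni-adjusted value is at least as large as its Benjamini-Hochberg counterpart, which is what "more conservative" means in practice: fewer terms survive a given threshold.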
![Page 27: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/27.jpg)
Fisher’s Exact Test in Blast2GO

| | B (Test-set) | A (Ref-set) |
|---|---|---|
| GO | 3 | 2 |
| No GO | 8 | 9 |

Three files:
• Blast2GO project with annotations (.dat/.annot)
• One txt file with IDs: Test-set (.txt)
• Another txt file with IDs: Ref-set (.txt)
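How the three inputs combine into one 2x2 table per GO term can be sketched like this. It is an illustration under stated assumptions, not Blast2GO's internal logic: the annotations are modelled as a dictionary from sequence ID to a set of GO terms, and the IDs and term names are placeholders.

```python
def contingency_tables(annotations, test_ids, ref_ids):
    """Return {term: (test_with, test_without, ref_with, ref_without)}."""
    test, ref = set(test_ids), set(ref_ids)
    # Every term seen on at least one sequence of either set.
    terms = set()
    for seq in test | ref:
        terms |= annotations.get(seq, set())
    tables = {}
    for term in sorted(terms):
        test_with = sum(1 for s in test if term in annotations.get(s, set()))
        ref_with = sum(1 for s in ref if term in annotations.get(s, set()))
        tables[term] = (
            test_with,
            len(test) - test_with,
            ref_with,
            len(ref) - ref_with,
        )
    return tables


# Placeholder data standing in for the project annotations and the two ID files.
annotations = {"s1": {"GO:A"}, "s2": {"GO:A", "GO:B"}, "s3": {"GO:B"}, "s4": set()}
tables = contingency_tables(annotations, ["s1", "s2"], ["s3", "s4"])
print(tables)
# {'GO:A': (2, 0, 0, 2), 'GO:B': (1, 1, 1, 1)}
```

Each tuple can then be passed to a Fisher's exact test, one test per GO term.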
![Page 28: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/28.jpg)
Different types of comparisons
Compare one condition against another:
● Remove common IDs
● Test-Set and Ref-Set are interchangeable
[Diagram: Set 1 and Set 2 overlapping in their common IDs]
Compare a subset against the total:
● Gossip default setting
● Test-Set and Ref-Set are NOT interchangeable
[Diagrams: Test-Set and Ref-Set with their common IDs]
![Page 29: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/29.jpg)
FET in Blast2GO
● The two-tailed test identifies not only over-represented but also under-represented functions.
● If no Ref-Set is chosen, all annotations are used as reference.
![Page 30: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/30.jpg)
Enrichment Results
● Result table with link outs to sequence lists
![Page 31: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/31.jpg)
Most specific terms
Retains only the lowest, most specific enriched term per GO branch
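One plausible reading of this filter is: drop every enriched term that still has an enriched descendant. The sketch below implements that reading on a hypothetical four-term DAG; it is an assumption about the behaviour, not a description of how Blast2GO implements it.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(enriched, parents):
    """Keep only enriched terms that have no enriched descendant."""
    enriched = set(enriched)
    has_enriched_descendant = set()
    for term in enriched:
        # Every enriched ancestor of `term` is, by definition, less specific.
        has_enriched_descendant |= ancestors(term, parents) & enriched
    return enriched - has_enriched_descendant


# Hypothetical DAG: C is below B, B and D are below A.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
print(sorted(most_specific({"A", "B", "C", "D"}, parents)))
# ['C', 'D']
```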
![Page 32: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/32.jpg)
Enriched Graph
View enriched-term data as DAGs!
reduce
=> To draw all nodes, set filter to 1
![Page 33: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/33.jpg)
Bar-Chart
● Export enriched terms as chart!
=> Filter results
% of sequences in Ref group
% of sequences in Test group
If Test > Ref: over-represented
If Ref > Test: under-represented
![Page 34: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/34.jpg)
Meta-analysis in Blast2GO
Annotation Result (.annot):
Sequence_1 GO:0005792
Sequence_1 GO:0006412
Sequence_1 GO:0003735
Sequence_2 GO:0016705
Sequence_2 GO:0005840
Sequence_2 GO:0005506
⇔ Equivalent formats
Enrichment Result:
Treatment_1 GO:0005792
Treatment_1 GO:0006412
Treatment_1 GO:0003735
By joining different functional enrichment results, we can create an annotation file of conditions that captures their functional profile.
Enrichment Result (.annot):
Treatment_1 GO:0005792
Treatment_1 GO:0006412
Treatment_1 GO:0003735
Treatment_2 GO:0016705
Treatment_2 GO:0005840
Treatment_2 GO:0005506
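Joining several enrichment results into one annotation-style file can be sketched in a few lines. The function name is hypothetical, and the tab-separated two-column layout (name, then GO ID) is an assumption about the .annot format based on the listing above.

```python
def enrichment_to_annot(results):
    """Turn {treatment_name: [enriched GO IDs]} into .annot-style text,
    one "name<TAB>GO ID" line per enriched term."""
    lines = []
    for treatment, go_ids in results.items():
        for go_id in go_ids:
            lines.append(f"{treatment}\t{go_id}")
    return "\n".join(lines) + "\n"


# Enriched terms per treatment, using the GO IDs shown on the slide.
results = {
    "Treatment_1": ["GO:0005792", "GO:0006412", "GO:0003735"],
    "Treatment_2": ["GO:0016705", "GO:0005840", "GO:0005506"],
}
print(enrichment_to_annot(results), end="")
```

Because each treatment now plays the role of a sequence name, the resulting file can be explored with the same graph and chart tools used for ordinary sequence annotations.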
![Page 35: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/35.jpg)
Meta-analysis in Blast2GO
Use seq names to see treatments
Use color by SeqCount
FIND SIMILARITIES BETWEEN TREATMENTS
![Page 36: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/36.jpg)
Meta-analysis in Blast2GO
DISPLAY FUNCTIONAL DISSIMILARITIES ON DAG
Use second column number for color
![Page 37: Minería de Datos - WordPress.com · Minería de Datos ANALISIS DE UN SET DE DATOS ! Visualization Techniques ! Combined Graph ! Charts and Pies ! Search for specific functions .](https://reader030.fdocuments.in/reader030/viewer/2022040512/5e5e7a688e79573904483fb3/html5/thumbnails/37.jpg)
Exercises: Data Mining