Eagle Bioinformatics Symposium: 8. Steve Gardner, The Importance of Data Representation: New Tools...
select aminoid, seq1[0:6], xss[0:6] from amino a where seq1='R[2,4]+polar++hydroxyl+'
GO:0003673 : Gene_Ontology (28348)
GO:0008150 : biological_process (21805)
GO:0005575 : cellular_component (13866)
GO:0003674 : molecular_function (20801)
GO:0008369 : obsolete (289)
GO:0004432 : 1-phosphatidylinositol-4-phosphate kinase, class IA (0)
GO:0003824 : enzyme (7162)
GO:0016301 : kinase (1027)
GO:0004428 : inositol/phosphatidylinositol kinase (37)
GO:0016307 : phosphatidylinositol phosphate kinase (9)
GO:0000285 : 1-phosphatidylinositol-3-phosphate 5-kinase (1)
GO:0016740 : transferase (2130)
GO:0016772 : transferase, transferring phosphorus-containing groups (1239)
GO:0016773 : phosphotransferase, alcohol group as acceptor (969)
GO:0004428 : inositol/phosphatidylinositol kinase (37)
GO:0016307 : phosphatidylinositol phosphate kinase (9)
GO:0000285 : 1-phosphatidylinositol-3-phosphate 5-kinase (1)
[Diagram labels: Ontology; Structured Data Sources; Unstructured Data Sources]
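[Diagram: context vectors and term vectors, each with components 1, 2, 3, ... n. The term vectors for ‘Zinc’ and ‘Finger’ are combined by OR addition into the query vector for ‘Zinc finger’.]
Dot product comparisons of the query vector against the term/context vectors give the semantic distance.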
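The dot product comparison on this slide can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the four-component vectors, the vocabulary, the helper names (`dot`, `cosine`, `add`), and the reading of "OR addition" as plain element-wise addition of the two term vectors. The slides do not specify how the real vectors are built or normalised, so cosine similarity (a length-normalised dot product) stands in for the comparison.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalised by vector lengths; 0.0 if either vector is zero."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def add(u, v):
    """Element-wise addition, used here as a stand-in for 'OR addition'."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical term vectors; real ones would be learned from the corpus.
term_vectors = {
    "zinc":        [1.0, 0.0, 1.0, 0.0],
    "finger":      [0.0, 1.0, 1.0, 0.0],
    "protein":     [1.0, 1.0, 1.0, 0.0],
    "tachycardia": [0.0, 0.0, 0.0, 1.0],
}

# Query vector for 'zinc finger': combine the two single-term vectors.
query = add(term_vectors["zinc"], term_vectors["finger"])

# Rank every term by similarity to the query, most similar first.
ranked = sorted(
    ((cosine(query, vec), term) for term, vec in term_vectors.items()),
    reverse=True,
)

for score, term in ranked:
    print(f"{term:12s} {score:.3f}")
```

In this toy data the term sharing the most components with the combined query ranks first, and a term with no shared components scores zero.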
‘Tachycardia’ search (untrained, no starting vocabulary provided)
400K clinical trials (500 MB of XML), unfiltered result set
Approx. 1.2M ‘terms’ in corpus
Vector length = semantic distance (in corpus)
Colour = term density in corpus
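\[
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\phi & \sin\phi \\
0 & -\sin\phi & \cos\phi
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]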
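The matrix above appears to be the standard rotation about the x-axis, with the sine term in the upper right (the convention for rotating the coordinate axes by the angle). As a small worked sketch under that assumption, the function below applies it to a point; the name `rotate_about_x` and the sample point are illustrative and do not come from the slides.

```python
import math

def rotate_about_x(point, phi):
    """Apply [[1, 0, 0], [0, cos, sin], [0, -sin, cos]] to (x, y, z)."""
    x, y, z = point
    c, s = math.cos(phi), math.sin(phi)
    # x is unchanged; y and z mix according to the lower-right 2x2 block.
    return (x, c * y + s * z, -s * y + c * z)

# A quarter turn takes the unit y vector to (approximately) the negative z direction.
p = rotate_about_x((0.0, 1.0, 0.0), math.pi / 2)
print(tuple(round(v, 6) for v in p))
```

With this sign convention the transformation preserves vector length, so only the representation of the point changes, not its distance from the origin.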
A B C D
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 1
1 1 1 0
1 1 1 1
A B C D
0 0 0 0
0 1 0 1 0 1 1
0 1 0 1 1 0
1 0 1 0 1 0 1 1