On our way to
Information Overload?
Or to prevent it by Appropriate use of Technology?
C19881 0.99, C92992 0.67, C02002 0.66, C99229 0.44, C00392 0.33, C93939 0.21
consolidated knowledge
Collexis Fingerprints (CFP’s)
English
French
Spanish
People: medical researchers around the world
Activities in electronic text, like projects, publications, Medline abstracts...
Disease: #12674
Multilingual Thesaurus Indexer: matches keywords, translates them to identical numbers and ranks them by their relevance
Maladie: #12674
Enfermedad: #12674
Malaria: #24530
Hospital: #19994
Paludisme: #24530
Paludismo: #24530
Hôpital : #19994
Hospital: #19994
...
...
...
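The indexing step described above can be sketched as follows. This is a minimal illustration, not the Collexis implementation: the `THESAURUS` dictionary holds only the terms and numbers shown on the slide, the function name `index_text` is hypothetical, and ranking by relative term frequency is an assumption made for the example.

```python
from collections import Counter

# Toy thesaurus built from the terms on the slide; a real thesaurus is far larger.
THESAURUS = {
    "disease": 12674, "maladie": 12674, "enfermedad": 12674,
    "malaria": 24530, "paludisme": 24530, "paludismo": 24530,
    "hospital": 19994, "hôpital": 19994,
}

def index_text(text):
    """Return (concept number, relevance) pairs, most relevant first."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    counts = Counter(THESAURUS[w] for w in words if w in THESAURUS)
    if not counts:
        return []
    top = max(counts.values())
    # Assumption: relevance = frequency relative to the most frequent concept.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(concept, round(n / top, 2)) for concept, n in ranked]

english = index_text("Malaria is a disease. Malaria cases reach the hospital.")
french = index_text("Le paludisme est une maladie. Cas de paludisme dans cet hôpital.")
print(english)
print(english == french)
```

Because both sentences reduce to the same concept numbers, the English and French texts end up with identical indexes, which is the point of the language-independent numbering.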
The Common Language: each activity is represented as a set of keyword numbers ranked by their relevance
#4256 : 1.0, #3627 : 0.8, #19994 : 0.5, #28746 : 0.3, #32874 : 0.1, #32874 : 0.1, #32874 : 0.1
#14325 : 1.0, #3627 : 0.8, #19994 : 0.5, #28746 : 0.3, #32874 : 0.1, #32874 : 0.1, #32874 : 0.1
#85643 : 1.0, #3627 : 0.8, #19994 : 0.5, #28746 : 0.3, #32874 : 0.1, #32874 : 0.1, #32874 : 0.1
#17345 : 1.0, #3627 : 0.8, #19994 : 0.5, #28746 : 0.3, #32874 : 0.1, #1c8456 : 0.1, #00356 : 0.1
„Collexion“ of activities
You:
#17345 : 1.0, #3627 : 0.8, #19994 : 0.5, #28746 : 0.3, #32874 : 0.1
Your activity as text
Submitted and indexed to keyword numbers
Find similar activities and the people behind them
Cross-language networking
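Matching "your" fingerprint against the collection could look like the sketch below. The slides do not say which similarity measure is used; cosine similarity is an assumption chosen here because it is a common way to compare weighted keyword vectors. The function names and the activity labels are hypothetical, and the fingerprints reuse numbers from the slide.

```python
import math

def cosine(fp_a, fp_b):
    """Cosine similarity between two fingerprints (concept number -> weight)."""
    shared = set(fp_a) & set(fp_b)
    dot = sum(fp_a[c] * fp_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_similar(query, collection, top_n=3):
    """Return the top_n (name, score) pairs, best match first."""
    scored = [(name, cosine(query, fp)) for name, fp in collection.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

you = {17345: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1}
collection = {
    "activity_a": {4256: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1},
    "activity_b": {17345: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1},
    "activity_c": {85643: 1.0, 14325: 0.4},  # shares no concept with `you`
}
matches = find_similar(you, collection)
for name, score in matches:
    print(name, round(score, 3))
```

Since the comparison runs on concept numbers only, the language in which each activity was originally written plays no role in the ranking.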
The Early evolution of Fingerprint Manipulation
contents fingerprints
add
people fingerprints
add
organization fingerprint
Jobs, CV’s, Skills
Articles, books
Emails, Word RFP’s
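The "add" steps above (content fingerprints into a person fingerprint, person fingerprints into an organization fingerprint) can be sketched as a weighted sum. The slides only say "add"; summing weights per concept and rescaling so the strongest concept is 1.0 is an assumption, and `add_fingerprints` is a hypothetical helper name.

```python
from collections import defaultdict

def add_fingerprints(fingerprints):
    """Sum fingerprints concept by concept, then rescale the top weight to 1.0."""
    total = defaultdict(float)
    for fp in fingerprints:
        for concept, weight in fp.items():
            total[concept] += weight
    if not total or max(total.values()) == 0:
        return {}
    top = max(total.values())
    return {concept: weight / top for concept, weight in total.items()}

# Content fingerprints (e.g. an article and an email) of one person.
article = {3627: 1.0, 19994: 0.5}
email = {3627: 0.5, 28746: 0.5}
person_1 = add_fingerprints([article, email])

# A second person, then the organization as the sum of its people.
person_2 = {19994: 1.0}
organization = add_fingerprints([person_1, person_2])
print(person_1)
print(organization)
```

The same function serves both levels, which mirrors the slide: each higher-level fingerprint is built only from the fingerprints one level below it.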
BIOSEMANTICS
• “Cellese”: the language that cells use to communicate internally and externally.
• The Molecular Language and its biological MEANING
• The Group
– Jan Kors PhD
– Erik van Mulligen PhD
– Bob Schijvenaars PhD
– Marc Weeber PhD
– Christiaan v.d. Eyck MSc
– Rob Jelier PhD
– Barend Mons PhD
– Johan van der Lei PhD
SERENDIP
Beyond Publication
Semantic meta-analysis of massive data and information sources for discovery
Bsik 2003
A consortium to combine State-of-the-art Information and Knowledge Mining Technologies
To support:
• Thesaurus and ontology enrichment
• Disambiguation of concepts
• Semantic meta-analysis of massive information
To enable:
• Information-based discovery
• Evidence-based policy making
Thesaurus and Ontology Enrichment
• New concepts
• Synonyms
• Homonyms
• Genes, Proteins
• Pictures
[Diagram]
Free text
(1) Fingerprints (known concepts)
(2) NLP
(3) Validation
(4) FUA
Unexplained text (XML)
Potential concepts
Thesauri: MeSH, HUGO, SwissProt, SAGE, others
Partners: E-BioSci, EMBO, Elsevier, TNO, LUMC, HUGONC, Genebio, AMC, EUR, UVA
SERENDIP
Too much to read: major trends foreseen:
• From Reading to Consulting
• From Reading to Meta-analysis
• From Text to Knowledge Representations
C19881 0.99, C92992 0.67, C02002 0.66, C99229 0.44, C00392 0.33, C93939 0.21
Semantic types
Co-occurrence data
The first step: to the Conceptual Semantic Network
Calcium deposition; Pleocytosis; Basal Ganglia; Encephalopathy; Cerebrospinal Fluid; Tomography, X-Ray Computed; Parents; Family; Aicardi Goutieres syndrome; Ferrocalcinotic deposition; Spastic quadriplegia; Fahr disease; Microcephaly; AGS1
G-protein coupled receptors; G-substrate; Lipoid dermatoarthritis; Receptors; Complement Factor B; RNA, Complementary; Xenopus oocyte; AGS1
SwissProt: Activator of G-protein signaling 1 (AGS1)
*225750
AICARDI-GOUTIERES SYNDROME 1; (AGS1) : OMIM
Aicardi Goutieres syndrome 1 Heterogeneity Linkage (Genetics) Clinical diagnosis Family 2 AGS1 **Lod Score Genetic Heterogeneity analysis Toxoplasmosis Calcium deposition 3 Encephalopathy 4 Cadmium Genus: Human cytomegalovir... Cerebrospinal fluid abnorm. 5.. Interferon-alpha Chromosomes Viral Child Head Tricuspid Valve Stenosis
Fingerprinting
disambiguation
ACS
META-ANALYSIS
Applications
• Cross-language, jargon and cross-system matching (implemented): www.sharingpoint.shared-global.org
• Information-based discovery (Research)
• Community building (Experts, Policy Making)
• Trendwatching and Indicators (Policy Making)
Seed-Term based Conceptual Semantic Networks
Clustering of genes on-the-fly
Predicting new knowledge ?
III = Distribution over distance categories of concept pairs without co-occurrence in the learning set.
IV = Distance categories of concept pairs related to the probability that there is no explicit relationship or co-occurrence in Medline (zero ratio). A ratio of 0 means that an automatic query in Medline with the concept pair joined by “AND” returns 0 hits.
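The co-occurrence check behind these captions can be illustrated with a small sketch. It stands in for the Medline "AND" query by counting, over a toy set of documents, how many contain both concepts; a count of 0 corresponds to a query with 0 hits. The document set, function names and the idea of representing each abstract as a set of concepts are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for every concept pair, the number of documents containing both."""
    counts = Counter()
    for concepts in documents:
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts

def and_hits(counts, concept_a, concept_b):
    """Number of documents that would match `concept_a AND concept_b`."""
    return counts[tuple(sorted((concept_a, concept_b)))]

# Toy stand-in for indexed abstracts: each document is a set of concepts.
documents = [
    {"Malaria", "Hospital"},
    {"Malaria", "Disease"},
    {"Malaria", "Hospital", "Disease"},
    {"Calcium deposition"},
]
counts = cooccurrence_counts(documents)
print(and_hits(counts, "Malaria", "Hospital"))
print(and_hits(counts, "Malaria", "Calcium deposition"))
```

Pairs that never co-occur, like the last one, are the candidates for which a short distance in the semantic network might hint at a relationship not yet stated explicitly in the literature.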
New Drug discovery ?
Semantic Filtering
Knowledge Maps, Nature Biotechnology Map
Knowledge Maps: Medline Bioterrorism Map 1997
Knowledge Maps: Medline Bioterrorism Map 2001
Private Research
DC
Public
E-BioSci, Pharma etc.
ORIEL, SERENDIP, FP6 etc.
I-Research, Ministries, WHO, FAO etc.
SHARED, BIREME/VHL, EDCTP, Oxford initiative etc.