Chemical name interpretations & Molecular time lines -


Page 1: Chemical name interpretations &   Molecular time lines -

Chemical name interpretations & Molecular time lines -

Page 2

This shows the detailed record view, with molecular links.

Page 3

This shows the chemicals report, with a molecular timeline and mouse-over of chemical names.

Page 4

Exploring co-table analysis of molecules with Gene IDs

For example: show me all of the co-occurrences of these (x) molecules with these (any / all) genes.
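
The co-table idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the `cotable` function, the `docs` records, and the `mol_A`/`mol_B`/`mol_C` labels are hypothetical, the gene IDs are placeholder values, and each document is assumed to have been annotated already with the molecules and NCBI Gene IDs it mentions.

```python
from collections import Counter
from itertools import product


def cotable(documents, molecules, genes=None):
    """Count documents in which each molecule co-occurs with each gene.

    documents: iterable of dicts with "molecules" and "genes" sets (assumed shape).
    molecules: the molecules of interest.
    genes: gene IDs to restrict to, or None to accept any gene.
    """
    counts = Counter()
    wanted = set(molecules)
    for doc in documents:
        mols = wanted & set(doc["molecules"])
        doc_genes = set(doc["genes"])
        if genes is not None:
            doc_genes &= set(genes)
        # Every (molecule, gene) pair found in the same document counts once.
        for pair in product(mols, doc_genes):
            counts[pair] += 1
    return counts


# Hypothetical annotated documents (illustrative values only).
docs = [
    {"id": "doc1", "molecules": {"mol_A", "mol_B"}, "genes": {"1956"}},
    {"id": "doc2", "molecules": {"mol_A"}, "genes": {"1956", "7157"}},
    {"id": "doc3", "molecules": {"mol_C"}, "genes": {"7157"}},
]

table = cotable(docs, molecules=["mol_A", "mol_B"])
for (mol, gene), n in sorted(table.items()):
    print(mol, gene, n)
```

Passing `genes=None` corresponds to the "any" case in the example question; passing an explicit list restricts the table to those genes.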

Page 5

1. From the main menu, select the Analyze tab.

Page 6

2. From the Analyze menu, select the Cotable tab.

Page 7

3. Now enter the InChIKeys for the molecules of interest.

Click here to enter a sample (test) set of molecules
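
Before pasting a list of keys into the form, it can help to check that each entry has the standard InChIKey layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter). The sketch below is an assumption about how such a pre-check might look, not part of the application; `parse_inchikeys` is a hypothetical helper, and the two sample keys are the published InChIKeys for aspirin and caffeine.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def parse_inchikeys(text):
    """Split pasted text into (valid, rejected) lists of InChIKey candidates."""
    valid, rejected = [], []
    # Accept whitespace, commas, or semicolons as separators.
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        key = token.upper()
        (valid if INCHIKEY_RE.match(key) else rejected).append(key)
    return valid, rejected


# Aspirin and caffeine, plus one malformed entry.
pasted = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N, RYYVLZVUVIJVGH-UHFFFAOYSA-N\nnot-a-key"
valid, rejected = parse_inchikeys(pasted)
print(valid)
print(rejected)
```

The pattern only checks the layout of each key; it does not confirm that a key corresponds to a structure in the database.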

Page 8

4. Now select the patent field to explore patents.

These are the molecules of interest (InChIKeys to explore)

Select Patent field here

Page 9

5. Now select facet = patent field + Gene, then click Analyze.

Molecules

Facet = Patents + Genes

Page 10

These are the NCBI Gene IDs

To transpose the charts or export the data, click here

This shows the “cotable” results: co-occurrences of molecules and NCBI Gene IDs

Page 11

This shows the transposed chart of co-occurrences of molecules and NCBI Gene IDs

Click here to see the patents containing this molecule and this particular gene
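
For readers who export the data, the transpose step is easy to reproduce offline. The following is a small sketch only; the table layout (a header row plus one row per molecule), the column names, and the helper names `transpose` and `to_csv` are assumptions made for illustration, not the application's export format.

```python
import csv
import io


def transpose(header, rows):
    """Swap rows and columns of a table given as a header plus a list of rows."""
    return [list(col) for col in zip(header, *rows)]


def to_csv(table):
    """Serialize a list of rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


# Hypothetical co-table: one row per molecule, one column per gene ID.
header = ["molecule", "gene_1956", "gene_7157"]
rows = [["mol_A", 2, 1], ["mol_B", 1, 0]]

flipped = transpose(header, rows)  # now one row per gene
print(to_csv(flipped))
```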

Page 12

Co-table Analysis

For example: show me all documents where Imitrex was mentioned with “any” sign and/or symptom

(Note: these are terms such as headache, vomiting, nausea, etc.; there are more than 680 of them.)

Page 13

1. Draw a compound of interest

2. Click: view compound in co-table

Page 14

1. Draw a compound of interest

2. Click: view compound in co-table

Page 15

3. Select a MeSH category for co-occurrence analysis

4. Click Analyze

Page 16

This shows the number of documents that contained the source molecule and ANY of the MeSH C23 terms

Click on the numbers to link to the documents
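
The count on this screen can be read as "documents that mention the molecule and at least one term under a given MeSH tree branch". A minimal sketch of that logic follows, assuming each document already carries the MeSH tree numbers of its indexed terms; the function name, the document records, and the tree numbers shown are placeholders, not values checked against MeSH or taken from the application.

```python
def count_docs_with_category(documents, molecule, category="C23"):
    """Return (count, ids) of documents that mention `molecule` together with
    at least one MeSH term whose tree number falls under `category`."""
    hits = []
    for doc in documents:
        if molecule not in doc["molecules"]:
            continue
        # A tree number belongs to the branch if it equals the category code
        # or extends it with a dot (so "C230.1" does not match "C23").
        if any(tn == category or tn.startswith(category + ".")
               for tn in doc["mesh_tree_numbers"]):
            hits.append(doc["id"])
    return len(hits), hits


# Hypothetical documents with placeholder tree numbers.
docs = [
    {"id": "d1", "molecules": {"drug_X"}, "mesh_tree_numbers": ["C23.888.592"]},
    {"id": "d2", "molecules": {"drug_X"}, "mesh_tree_numbers": ["C01.100", "D02.455"]},
    {"id": "d3", "molecules": {"drug_Y"}, "mesh_tree_numbers": ["C23.550"]},
    {"id": "d4", "molecules": {"drug_X"}, "mesh_tree_numbers": ["C230.1"]},
]

print(count_docs_with_category(docs, "drug_X"))         # C23 branch
print(count_docs_with_category(docs, "drug_X", "C01"))  # C01 branch
```

Returning the document IDs alongside the count mirrors the "click on the numbers" behavior: the number is just the length of a list that can be followed back to the documents.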

Page 17

Type in a new MeSH code to change the analysis from ‘signs & symptoms’ (C23) to diseases (C01)

Page 18

This shows the number of documents that contained the source molecule and ANY of the MeSH disease (C01) terms

Page 19

This shows the comparison of 2 drugs and the co-occurrence of MeSH Symptoms (C23) terms

Page 20

Medline co-occurrence of statin structures vs. MeSH: Chemical Structures vs. Signs and Symptoms

This shows the comparison of different statins and the co-occurrence of MeSH terms

Page 21

Screenshots from our SIMPLE / SIIP Web application

Page 22

Search

Chemical Search using ChemAxon w/ DB2

Proximal Search

Nearest Neighbor Search

Page 23

BioTerm Analysis

Clustering Claims Originality

Discovery

Page 24

Landscape Analysis

Visualization

Networks

Page 25

IBM’s Massively Parallel Probabilistic Architecture

[Diagram: question-answering pipeline. Stage labels: Question; Question/Topic Analysis; Query Decomposition; Primary Search; Candidate Answer Generation; Hypothesis Generation; Soft Filtering; Hypothesis & Evidence Scoring (Supporting Evidence Retrieval, Deep Evidence Scoring, Answer Scoring); Synthesis; Final Merging & Ranking; Answer, Confidence. Supporting resources: A. Sources; E. Sources; Trained Models.]

Watson generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.

Source: J Kreulen
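
The generate, score, and rank flow described on this slide can be illustrated with a toy loop. This is a deliberately simplified sketch and not Watson's or DeepQA's code: the generators, scorers, weights, and sample question are all hypothetical, fixed weights stand in for the trained models, and soft filtering and query decomposition are omitted.

```python
def answer(question, generators, scorers, weights):
    """Toy generate-score-rank loop; returns [(candidate, confidence), ...]."""
    # Hypothesis generation: pool candidates from every generator.
    candidates = set()
    for generate in generators:
        candidates.update(generate(question))
    # Evidence scoring: each scorer returns a value in [0, 1]; combine by weight.
    merged = {
        cand: sum(w * score(question, cand) for score, w in zip(scorers, weights))
        for cand in candidates
    }
    # Final merging and ranking: normalize so confidences sum to 1.
    total = sum(merged.values()) or 1.0
    return sorted(((c, s / total) for c, s in merged.items()),
                  key=lambda cs: cs[1], reverse=True)


# Hypothetical candidate generators.
def lookup_generator(question):
    return {"aspirin", "ibuprofen"}


def keyword_generator(question):
    return {"aspirin", "caffeine"}


# Hypothetical evidence scorers.
def mention_scorer(question, cand):
    return 1.0 if cand in question.lower() else 0.0


def evidence_scorer(question, cand):
    return {"aspirin": 0.8, "ibuprofen": 0.2, "caffeine": 0.0}[cand]


ranked = answer(
    "Which compound is acetylsalicylic acid, also sold as aspirin?",
    generators=[lookup_generator, keyword_generator],
    scorers=[mention_scorer, evidence_scorer],
    weights=[0.5, 0.5],
)
for cand, conf in ranked:
    print(f"{cand}: {conf:.2f}")
```

The point of the sketch is the shape of the computation: many candidates, several independent evidence scores per candidate, and a single ranked list with a confidence attached to each answer.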

Page 26

DeepQA Application (Java/C++)

Watson Infrastructure
• 90 Power 750 Servers
• Each Server: 3.5GHz POWER7 8 Core Processor with 4 threads/core
• Total: 2880 POWER7 Cores with 16TB RAM
• Processing speed: 500GB/sec; 80 TeraFLOPS
• 94th on Top 500 Supercomputers
• Note: This hardware is for Jeopardy. Any other application of Watson will require appropriate sizing and optimization for purpose.

SUSE Linux Enterprise Server 11

Apache Hadoop + Apache UIMA

Technical Issues to consider when applying QA systems like Watson

Nature of Domain: Open vs. Closed
A closed domain implies that all knowledge is contained within a specific domain characterized by ontologies, and there is no need to go outside the domain. Jeopardy is an open-domain example, where the questions draw on general knowledge.

Knowledge/Data Sources: Availability
QA systems are natural language search engines. Watson goes beyond NL search. If knowledge sources are incomplete, unavailable, insufficient or inadequate, then it is not possible for the system to provide an answer. In some cases one would need to envisage Interactive QA that requires human interaction to guide the search. Another very important consideration is the availability of sufficient sample data for training (i.e. a training corpus).

Need for multi-modality
Is there a need for transcription from speech to text before a question is answered? This would require integration of speech-to-text capabilities that are not really ready for real-time applications.

Latency
Watson is capable of processing 500GB of information per second with a 3 sec response to questions, and kept most of its knowledge sources in memory (as opposed to on disk) for speed. What is the latency requirement for the application?

Multi-Lingual or Cross-Lingual Support
Watson can support only English at this time; with language-specific parsers, other languages can be supported. If knowledge sources or QA are required in multiple languages, then the application would not be a good candidate. Additionally, if cultural context has to be accommodated in the answer, then it would not be prudent to deploy QA systems that interact directly with users.

Question Type
Decomposition and classification of the question is critical to how QA systems work. The bulk of the question types in Jeopardy were factoid questions. Watson did not include 2 question categories: one is audio/video questions that require looking at a video to answer, and the other is questions that require special instructions (e.g. verbal instructions to explain a question).

Answer Types
Watson is not designed to curate a task-oriented system. It can handle temporal and geo-spatial reasoning in its answers. As it stands, it cannot handle business-process types of reasoning (to do task B, tasks A and C must be completed, etc.).

Page 27

I would like to acknowledge the IBM Almaden Research team:

Jeff Kreulen, Ying Chen, Scott Spangler, Alfredo Alba, Tom Griffin, Eric Louie, Su Yan, Issic Cheng, Prasad Ramachandran, Bin He, Ana Lelescu, Qi He, Linda Kato, Brad Wade, John Colino, Meenakshi Nagarajan, Timothy J Bethea, German Attanasio, Laura Anderson, Robert Prill

+ a host of folks from IBM China Labs

Page 28

Back-up slides

Page 29

Challenges ahead:

• Access to full text
• Language issues
  • Chinese
  • Japanese
  • Korean
  • Other
• Legal issues
• Web data
• Integration with medical content

Page 30

Chemicals from Chinese Patents

Attempts to process Chinese Patent Documents

Extracting chemical structures from Chinese patents…

Page 31

Computer Curation Process Overview & integration with our collaborators

[Diagram labels]

Data Sources: U.S. Patents (1976-2009); U.S. Pre-Grants (All); PCT & EPO Apps; Medline Abstracts (>18 M); Selected Internet Content; In-House Content

Components: Parse & Extract data; Annotator 1; Annotator 2; Annotation Factory; Classifier & Other Data Associations; Computational Analytics (Semantic Associations); Database + computed Meta Data; IP Database (e.g. DB2); ADU*

User Applications: Knime or Pipeline Pilot; BIW; SIMPLE; ChemAxon Search; Cognos/DDQB/Other Apps; View selected Documents & Reports

ChemVerse db; ChemVerse

* ADU = Automated Data Update

Services Hosted at IBM Almaden