HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining...

23
> < HOW CAN BIOTM HELP THE NOITFY LIBRARY HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY ? SEMI-AUTOMATED CURATION WORKFLOW INTERACTIVE AND SEMANTIC INDEXING OF CONTENTS 1 Anália Lourenço Augusto Ramoa Florentino Fdez-Riverola Gael Pérez Rodríguez Jorge Condeço Martín Pérez Pérez

Transcript of HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining...

Page 1: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><HOW CAN BIOTM HELP THE NOITFY LIBRARY

HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY ?

SEMI-AUTOMATED CURATION WORKFLOWINTERACTIVE AND SEMANTIC INDEXING OF CONTENTS

1

Anália Lourenço Augusto Ramoa Florentino Fdez-Riverola

Gael Pérez Rodríguez Jorge Condeço Martín Pérez Pérez

Page 2: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,
Page 3: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><HOW CAN BIOTM HELP THE NOITFY LIBRARY

Introduction to Biomedical Text Mining

Main tasks/methods and common applications Means to present results/information to users

01 02 03

Contents

3

Live examples of the workflow

Initial ideas Some experiments Discussing utility and available resources

Final brainstorming

common interests collaboration possibilities

Page 4: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

01Introduction to Biomedical Text

Mining

Text mining refers to the process of deriving high-quality information from text. Typically, natural language processing (NLP), machine learning (data mining) and statistical analysis approaches are used to devise patterns and trends.

Two main areas: Information Retrieval (IR) supports large-scale processes for many general and domian-focused search engines. Information Extraction (IE) has a long experience in processing common English texts.

Biomedical text mining (BioTM) emerged as a “new” domain specialised in the processing of biomedical documents.

Mainly scientific literature, but also patents and other documents of biomedical relevance. IE is focused on entities important to biomedical domains (e.g. genes, proteins and diseases) and is supported by the numerous biomedical vocabularies (e.g. MESH terms, Gene ontology).

Historical perspective

4

Page 5: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

Providing domain-tailored tools for experts: automatic screening (retrieval + ranking) relevant documents; semi-automatic annotation of domain-relevant concepts; integration and validation (e.g. database contents).

Enhanced analysis abilities: creation of domain-specific search filters; intuitive and visual attractive presentation of document main contents; interactive and user customised navigation of document contents.

HOW CAN BIOTM HELP THE NOITFY LIBRARY

Two major goals of Biomedical Text Mining

5

Page 6: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><GENERAL VIEW OF OUR BIOTM WORKFLOW 6

1 2

Automatic retrieval of

public literature

3

5

4

Learning from expert’s experience

Automatic document

ranking

Interactive and semantic

searches

Manual expert

curation

Page 7: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

Any library that enables automatic literature retrieval can be integrated in our workflow.

PubMed is perhaps the most accessed public database of biomedical bibliography. The NCBI (National Center for Biotechnology Information) Entrez Utilities Web services are used to automatically access, search for potentially relevant articles, and download publications.

PubMed searches can be made as specific or relaxed as desired. That is, one may choose to maximise the chance of getting relevant document (true positives) even if it means also retrieving more irrelevant documents (false positives). Document relevance will be further evaluated.

GENERAL VIEW OF OUR BIOTM WORKFLOW

Document retrieval from public sources

7

Page 8: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><GENERAL VIEW OF OUR BIOTM WORKFLOW

Document retrieval from public sources

8

PubMed vs Ovid (2012-2015) PubMed 2016 results

05

101520253035404550

2012 2013 2014 2015 2016

#Docum

ents

Year

PubMed Ovid

Page 9: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

The documents retrieved from PubMed are automatically processed and the predictive ability of the contents is evaluated. Relevance mining model aims to score the retrieved documents and minimise the time that experts spend discarding non relevant documents.

We look for unigrams, bigrams, and co-occurring words features as well as meaningful mentions of domain-specific concepts. The various textual and entity features are integrated in a variable trigonometric threshold linear classifier, which defines a decision surface, where those feature terms with better predictive ability appear differentiated.

Experts help training the relevance mining model by providing a set of good examples of relevant and irrelevant documents from where the model can be inferred (“learn by example”).

GENERAL VIEW OF OUR BIOTM WORKFLOW

Document triage

9

Page 10: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

Semantic annotations are important to both triage and document search/indexing. Key domain concepts may help to quickly determine document relevance. The annotation of meaningful domain concepts can be used to:

provide contextualised document summary, implement customised search filters, enable systematic and interactive document navigation.

Named entity recognition methods are used to identify mentions of important domain concepts. Automatic recognition can be accomplished by using dictionaries (and other domain lexica) as well as natural language processing and machine learning entity taggers. We explored public NER tools and the terminology available at Notify site to implement a first example of the NER module. We covered species, organs, blood, cells, and tissues.

GENERAL VIEW OF OUR BIOTM WORKFLOW

Semantic annotation

10

Page 11: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

Experts establish the key guidelines for automating (any!) text processing. Experts validate the relevance mining model. Experts label a meaningful set of examples of relevant and irrelevant documents from where the mining model can be inferred (“learn by example”). Experts validate the outputs of the model until considered acceptable. Experts revise/make semantic annotations. Define the main contents to be extracted from the documents (“information that is really important”). Ensure that semantic annotations are correct, i.e. the concept is properly marked and classified (“domain classification”).

GENERAL VIEW OF OUR BIOTM WORKFLOW

Expert curation

11

Page 12: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><GENERAL VIEW OF OUR BIOTM WORKFLOW

MARKYT

12

Domain experts define/revise concepts and establish document relevance

Experts make sense of the information scattered in text

Good quality annotations have

many uses

http://sing.ei.uvigo.es/privateMarky/

Page 13: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

Today, Web-based search interfaces are very popular. Yet, presenting large, growing volumes of information in a systematic, intuitive and interactive way is challenging. We propose a search and visualisation interface that is customised for the domain. This interface allows users to formulate queries at different levels of specificity and using domain concepts. Also, network representation offers an intuitive and visually appealing means to observe and navigate a potentially large number of relationships (contents of documents that meet the search). Network topology provides descriptive statistics about the documents and concepts matching the user query and enables the inference of indirect associations.

GENERAL VIEW OF OUR BIOTM WORKFLOW

Search and visualisation

13

Page 14: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><GENERAL VIEW OF OUR BIOTM WORKFLOW 14

Documents association:

looking for similar evidences

Search and visualisation

Page 15: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><GENERAL VIEW OF OUR BIOTM WORKFLOW 15

Concept-driven visualisation and

navigation

Search and visualisation

Page 16: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

02Notify Library

Where can BioTM help?

Examples of public online applications supported by our BioTM approach. Domain-specific layout and search functionalities, but common core document processing workflow.

Live examples of the workflow

16

Page 17: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

HOW CAN BIOTM HELP THE NOITFY LIBRARY

Live Applications http://sing.ei.uvigo.es/antimicrobialCombination/

Page 18: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

HOW CAN BIOTM HELP THE NOITFY LIBRARY

Graph visualisation

Page 19: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

HOW CAN BIOTM HELP THE NOITFY LIBRARY

Live Applications http://pcquorum.org/

Page 20: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

HOW CAN BIOTM HELP THE NOITFY LIBRARY

Interrelating different info

Page 21: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><

03Final Brainstorming

How can this cooperation evolve?

21

These are only preliminary experiments on what can be achieved with BioTM. Expert input about needs and useful outputs is critical. Also, it is critical to identify high quality vocabulary suitable for automatic concept annotation.

Page 22: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,

><HOW CAN BIOTM HELP THE NOITFY LIBRARY

A future in common

22

For the Notify Library: integrate the contents of other public libraries; reduce the workload of experts at triage stage; provide additional search tools and new visualisation abilities (better user experience); Expand the coverage of Notify Library at a reduced (human) cost.

For the SING Research Group: be able to interact with domain experts and understand critical needs of these libraries; implement a BioTM workflow for a real-world and routinely used Medical application; co-publishing with Notify group the lessons learned throughout the process.

Page 23: HOW CAN BIOMEDICAL TEXT MINING HELP THE NOTIFY LIBRARY€¦ · Biomedical Text Mining Text mining refers to the process of deriving high-quality information from text. Typically,