BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical...

Hybrid methodology for information extraction from tables in biomedical literature

Nikola Milošević, Cassie Gregson, Robert Hernandez, Goran Nenadić

Contact: nikola.milosevic@manchester.ac.uk

Literature growth

• MEDLINE contains more than 26 million citations
• The number of citations is growing exponentially
• 2,100 new articles are published daily in biomedicine
• Professionals are no longer able to keep up with the state of the art

Text mining

Source: https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining

Table mining

• Current text mining efforts focus on the main text of the article
• They usually ignore tables and figures
• Tables contain:
  • Settings of the experiment (patient characteristics, arms, dosages, etc.)
  • Results of the experiment
  • Definitions of terms and quantitative scales
  • Examples (e.g. questionnaires)
  • …
• Article information is incomplete without tables (and figures)

Table complexity

One-dimensional (list) table
Two-dimensional (matrix) table

Table complexity (2)

Multi-dimensional (super-row) table

Multi-dimensional (multi-table) table

Challenges

• Dense content
• Variety of layouts
• Variety of value representation formats
• Misleading visualization markup
• Lack of resources (labelled datasets)

Aim and objectives

• Create a multi-layered approach to mining information from tables, to facilitate large-scale semi-automated extraction and curation of data stored in tables

Table mining methodology overview

Functional processing

• Classifies cells into functional classes: header, super-row, stub, data
• Uses heuristics based on content and position
• Described in: Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
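A minimal sketch of position- and content-based cell classification. It assumes the table has already been parsed into a list of rows of cell strings. The rules (a row with only its first cell filled is a super-row, the first body column is the stub, a header row is inferred from digits when no header markup is available) and the names `classify_cells` and `header_rows` are hypothetical illustrations, not the heuristics of the published method.

```python
import re
from typing import List

# Hypothetical label names for the four functional classes.
HEADER, SUPER_ROW, STUB, DATA = "header", "super-row", "stub", "data"


def has_digit(cell: str) -> bool:
    """Crude content cue: does the cell contain any digit?"""
    return re.search(r"\d", cell) is not None


def classify_cells(rows: List[List[str]], header_rows: int = 0) -> List[List[str]]:
    """Label every cell as header, super-row, stub or data (hypothetical rules).

    `header_rows` stands in for header markup (e.g. <thead>). When it is 0,
    the first row is taken as a header if it has no digits while the
    data part of the second row does.
    """
    if header_rows == 0 and len(rows) > 1:
        first_has_digits = any(has_digit(c) for c in rows[0])
        second_has_digits = any(has_digit(c) for c in rows[1][1:])
        if not first_has_digits and second_has_digits:
            header_rows = 1
    labels: List[List[str]] = []
    for i, row in enumerate(rows):
        if not row:
            labels.append([])
        elif i < header_rows:
            labels.append([HEADER] * len(row))
        elif len(row) > 1 and [j for j, c in enumerate(row) if c.strip()] == [0]:
            # Only the leftmost cell is filled: a super-row grouping the rows below it.
            labels.append([SUPER_ROW] * len(row))
        else:
            # Position cue: the first body column is the stub, the rest is data.
            labels.append([STUB] + [DATA] * (len(row) - 1))
    return labels


table = [
    ["Characteristic", "Drug", "Placebo"],
    ["Age, years", "54.2 ± 9.8", "53.7 ± 10.1"],
    ["Sex", "", ""],
    ["Male", "64 (53.3%)", "60 (50.8%)"],
]
for row, row_labels in zip(table, classify_cells(table)):
    print(row[0], row_labels)
```

Real tables need more cues (spanning cells, font markup, repeated headers), which is why the slides stress that visualization markup can be misleading.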

Structural processing

• Determines relationships between cells
• Using cell functions and table structure, classifies the table into one of the structural table types:
  • List
  • Matrix
  • Super-row
  • Multi-table
• Based on the type, a set of rules resolves the relationships
• Described in: Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
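The structural step can be sketched as follows, assuming each row of cells comes with a parallel row of functional labels (header, super-row, stub, data). The type-selection rules (a header row after body rows means a multi-table, any super-row means a super-row table, at most one data column means a list) and the function names are hypothetical simplifications. `resolve` links each data cell to its column header, stub and enclosing super-row; it uses only the most recent header row, so multi-row headers are out of scope here.

```python
from typing import Dict, List, Optional


def is_header_row(row_labels: List[str]) -> bool:
    return bool(row_labels) and all(lab == "header" for lab in row_labels)


def table_type(labels: List[List[str]]) -> str:
    """Pick one of the four structural types (hypothetical rules)."""
    body_seen = False
    for row_labels in labels:
        if not row_labels:
            continue
        if is_header_row(row_labels):
            if body_seen:
                return "multi-table"  # a new header after body rows starts a sub-table
        else:
            body_seen = True
    if any(row_labels and row_labels[0] == "super-row" for row_labels in labels):
        return "super-row"
    data_columns = max((row_labels.count("data") for row_labels in labels), default=0)
    return "list" if data_columns <= 1 else "matrix"


def resolve(rows: List[List[str]], labels: List[List[str]]) -> List[Dict[str, Optional[str]]]:
    """Link each data cell to its column header, stub and enclosing super-row."""
    facts: List[Dict[str, Optional[str]]] = []
    header: List[str] = []
    super_row: Optional[str] = None
    for cells, row_labels in zip(rows, labels):
        if not row_labels:
            continue
        if is_header_row(row_labels):
            header, super_row = list(cells), None  # context resets for each (sub-)table
        elif row_labels[0] == "super-row":
            super_row = cells[0]
        else:
            for j in range(1, len(cells)):
                if row_labels[j] == "data":
                    facts.append({
                        "value": cells[j],
                        "stub": cells[0],
                        "header": header[j] if j < len(header) else None,
                        "super_row": super_row,
                    })
    return facts


rows = [
    ["Characteristic", "Drug", "Placebo"],
    ["Age, years", "54.2 ± 9.8", "53.7 ± 10.1"],
    ["Sex", "", ""],
    ["Male", "64 (53.3%)", "60 (50.8%)"],
]
labels = [["header"] * 3, ["stub", "data", "data"], ["super-row"] * 3, ["stub", "data", "data"]]
print(table_type(labels))
for fact in resolve(rows, labels):
    print(fact)
```

Keeping the header and super-row as running state lets one loop serve matrix, super-row and multi-table layouts alike; only the reset point differs.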

Semantic tagging

• Semantically tags terms, phrases or words
• Knowledge sources: UMLS, DBpedia, WordNet
• Used MetaMap for tagging with UMLS
• Helps with pragmatic classification and information extraction
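MetaMap is a separate tool with its own interface, so the sketch below swaps in a plain dictionary lookup to illustrate the idea of tagging phrases with semantic types: a greedy longest-match over lowercase tokens. The toy `LEXICON` and its entries are hypothetical; the codes are UMLS semantic type abbreviations (sosy = Sign or Symptom, dsyn = Disease or Syndrome, phsu = Pharmacologic Substance). This is not how MetaMap works internally.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for MetaMap + UMLS: phrase -> semantic type.
LEXICON: Dict[str, str] = {
    "headache": "sosy",           # Sign or Symptom
    "nausea": "sosy",
    "diabetes mellitus": "dsyn",  # Disease or Syndrome
    "aspirin": "phsu",            # Pharmacologic Substance
}


def tag(text: str, lexicon: Dict[str, str] = LEXICON, max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-match tagging of token n-grams (up to max_len tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tags: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                tags.append((phrase, lexicon[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no known phrase starts at this token
    return tags


print(tag("Headache and nausea in diabetes mellitus patients"))
```

Trying longer n-grams first keeps "diabetes mellitus" as one concept instead of tagging its words separately.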

Pragmatic processing

• Determines the purpose of the table
• Machine learning approach: Naïve Bayes, Bayes nets, SVM, decision trees, random forests
• More specific classes lead to better results
• Evidence from 2 trials:
  • Settings, findings, support tables: ~80% F-score
  • Baseline characteristics, adverse events, inclusion/exclusion, other: ~95% F-score
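Naïve Bayes is one of the classifiers listed above. A minimal multinomial Naïve Bayes with add-one smoothing, using only the standard library, illustrates how a table's text (e.g. caption and headers) could be mapped to a pragmatic class. The class `NaiveBayes`, the tokenizer and the training examples are invented for illustration and far too small for real use.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, doc: str, label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over known words."""
        n_docs = sum(self.class_counts.values())
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / n_docs)
        for w in tokenize(doc):
            if w in self.vocab:  # words unseen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(doc, label))


# Invented training examples (caption/header text -> pragmatic class).
docs = [
    "Baseline characteristics of patients age sex weight",
    "Demographic and baseline characteristics age gender",
    "Adverse events reported headache nausea",
    "Treatment emergent adverse events serious events",
    "Inclusion and exclusion criteria",
    "Key inclusion criteria exclusion criteria",
    "Pharmacokinetic parameters",
    "Study schedule of visits",
]
labels = (["baseline characteristics"] * 2 + ["adverse events"] * 2
          + ["inclusion/exclusion"] * 2 + ["other"] * 2)
model = NaiveBayes().fit(docs, labels)
print(model.predict("Adverse events by treatment arm"))
```

Working in log space avoids underflow when many word probabilities are multiplied.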

Value identification and syntactic processing

• Identifying the cell of interest:
  • Looks at the navigational cells for lexical cues or for semantic types in tags
  • Lexical cues are kept in white and black lists
• Syntactic processing:
  • Uses a set of patterns to determine the semantics of the value
  • Extracts the selected value
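A minimal sketch of both sub-steps for one target, the patients' age. It assumes each data cell arrives together with its navigational cells (column header, super-row, stub), e.g. as produced by structural processing. The cue lists, the four value patterns and all function names are hypothetical; cues are matched as whole words so that "age" does not fire on "dosage" or "percentage".

```python
import re
from typing import Dict, List, Optional

# Hypothetical cue lists for locating the patients' age cells.
WHITELIST = {"age"}
BLACKLIST = {"onset", "diagnosis"}

NUM = r"(\d+(?:\.\d+)?)"
# Hypothetical value-presentation patterns, tried in order.
PATTERNS = [
    ("mean ± sd", re.compile(rf"{NUM}\s*(?:±|\+/-)\s*{NUM}")),
    ("median (range)", re.compile(rf"{NUM}\s*\(\s*{NUM}\s*[-–]\s*{NUM}\s*\)")),
    ("n (%)", re.compile(rf"{NUM}\s*\(\s*{NUM}\s*%\s*\)")),
    ("single value", re.compile(NUM)),
]


def is_cell_of_interest(navigational: List[str]) -> bool:
    """Whole-word cue match against the white and black lists."""
    words = set(re.findall(r"[a-z]+", " ".join(navigational).lower()))
    return bool(words & WHITELIST) and not words & BLACKLIST


def parse_value(cell: str) -> Optional[Dict[str, object]]:
    """Return the first pattern that matches the whole cell, with its numbers."""
    for name, pattern in PATTERNS:
        match = pattern.fullmatch(cell.strip())
        if match:
            return {"pattern": name, "numbers": [float(g) for g in match.groups()]}
    return None


def extract(facts: List[Dict[str, Optional[str]]]) -> List[Dict[str, object]]:
    """Keep parsed values of data cells whose navigational cells match the cues."""
    results = []
    for fact in facts:
        navigational = [fact.get(k) or "" for k in ("header", "super_row", "stub")]
        if is_cell_of_interest(navigational):
            parsed = parse_value(fact["value"] or "")
            if parsed:
                results.append({**parsed, "header": fact.get("header")})
    return results


facts = [
    {"value": "54.2 ± 9.8", "stub": "Age, years", "header": "Drug", "super_row": None},
    {"value": "64 (53.3%)", "stub": "Male", "header": "Drug", "super_row": "Sex"},
]
print(extract(facts))
```

Recording which pattern matched matters because "54 (10)" and "54 (10–80)" carry different statistics even though both start with the same number.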

Pragmatic classification results

• Pragmatic classification performs well with specific classes
• 4 classes: baseline characteristics, adverse events, inclusion/exclusion, other
• Best performance: SVM
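The F-scores reported for pragmatic classification combine per-class precision P and recall R as F1 = 2PR / (P + R). A small helper for computing them from gold and predicted table classes is sketched below. The slides do not state how per-class scores were averaged, so the unweighted (macro) average here is an assumption, and the example labels are invented.

```python
from typing import Dict, List


def f_scores(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """Per-class F1 plus their unweighted (macro) average under the key 'macro'."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return {"macro": 0.0}
    scores: Dict[str, float] = {}
    pairs = list(zip(gold, predicted))
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    scores["macro"] = sum(scores[label] for label in labels) / len(labels)
    return scores


gold = ["adverse events", "adverse events", "baseline characteristics", "baseline characteristics"]
predicted = ["adverse events", "baseline characteristics", "baseline characteristics", "baseline characteristics"]
print(f_scores(gold, predicted))
```

Macro averaging weights every class equally, so a rare class with poor results pulls the average down more than it would under a frequency-weighted average.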

Information extraction results

• Extracted the number of patients
• New tests on extracting patient age and adverse events (using UMLS)

Patients’ age

Adverse reactions

Lessons learned

• Table mining requires multi-layered analysis
• Functional and structural analysis are crucial
• Semantics of value presentation patterns
• Semantic tagging helps
• Machine learning helps in certain steps (e.g. pragmatic analysis)
• Combination of heuristic-based and machine-learning-based steps
• Availability:
  • https://github.com/nikolamilosevic86/TableAnnotator
  • https://github.com/nikolamilosevic86/TableInformationExtractionScripts

Future plans

• Develop an easy-to-use methodology
• Develop a UI tool (wizard) for information extraction from tables
• Improve the methodology
• Compare heuristic-based vs machine-learning-based IE
• Examine methods for unbalanced datasets

Acknowledgements

Dr Michele Filannino

Dr Azad Dehghan

Nikola Milošević

Ruth Stoney

Maksim Belousov

Dr Goran Nenadić

Robert Hernandez

Cassie Gregson

Richard Boyce

Jodi Schneider

Steven DeMarco

nikola.milosevic@manchester.ac.uk
