BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical...
![Page 1: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/1.jpg)
Hybrid methodology for information extraction from tables in biomedical literature
Nikola Milošević, Cassie Gregson, Robert Hernandez, Goran Nenadić
Contact: [email protected]
![Page 2: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/2.jpg)
Literature growth
• MEDLINE contains more than 26 million citations
• The number of citations is growing exponentially
• 2,100 new articles are published daily in biomedicine
• Professionals can no longer keep up with the state of the art
![Page 3: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/3.jpg)
Text mining
Source: https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
![Page 4: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/4.jpg)
Table mining
• Current text mining efforts focus on the main text of the article
• They usually ignore tables and figures
• Tables contain
  • Settings of the experiment (patient characteristics, arms, dosages, etc.)
  • Results of the experiment
  • Definitions of terms and quantitative scales
  • Examples (e.g. questionnaires)
  • …
• Article information is incomplete without tables (and figures)
![Page 5: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/5.jpg)
Table complexity
One-dimensional (list) table
Two-dimensional (matrix) table
![Page 6: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/6.jpg)
Table complexity (2)
Multi-dimensional (super-row) table
Multi-dimensional (multi-table) table
![Page 7: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/7.jpg)
Challenges
• Dense content
• Variety of layouts
• Variety of value representation formats
• Misleading visualization markup
• Lack of resources (labelled datasets)
![Page 8: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/8.jpg)
Aim and objectives
• Create a multi-layered approach to mining information from tables
  • to facilitate large-scale semi-automated extraction
  • and curation of data stored in tables
![Page 9: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/9.jpg)
Table mining methodology overview
![Page 10: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/10.jpg)
Functional processing
• Classifies cells into functional classes
  • Header
  • Super-row
  • Stub
  • Data
• Uses heuristics based on content and position
• Described in: Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
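As a rough illustration of position- and content-based cell classification, here is a minimal Python sketch. The three rules in it (first row is the header, a row with only its first cell filled is a super-row, the first column is the stub) are simplifying assumptions made for this example; they are not the rules published in the NLDB 2016 paper, and the function name `classify_cells` and the sample table are hypothetical.

```python
from typing import List


def classify_cells(table: List[List[str]]) -> List[List[str]]:
    """Label each cell as 'header', 'super-row', 'stub', 'data' or 'empty'.

    Illustrative heuristics only (assumptions, not the published rules):
      - every non-empty cell in the first row is a header;
      - a later row whose only non-empty cell is the first one is a super-row;
      - otherwise the first column is the stub and the remaining cells are data.
    """
    labels = []
    for r, row in enumerate(table):
        non_empty = [cell for cell in row if cell.strip()]
        row_labels = []
        for c, cell in enumerate(row):
            if not cell.strip():
                row_labels.append("empty")
            elif r == 0:
                row_labels.append("header")
            elif len(non_empty) == 1 and c == 0:
                row_labels.append("super-row")
            elif c == 0:
                row_labels.append("stub")
            else:
                row_labels.append("data")
        labels.append(row_labels)
    return labels


# Invented toy table in the style of a baseline-characteristics table.
table = [
    ["Characteristic", "Drug (n=50)", "Placebo (n=48)"],
    ["Demographics", "", ""],
    ["Age, mean (SD)", "54.2 (9.1)", "53.8 (8.7)"],
    ["Male, n (%)", "28 (56)", "25 (52)"],
]
labels = classify_cells(table)
for row_labels in labels:
    print(row_labels)
```

Real tables need richer cues (spanning cells, indentation, typography), which is why the slide speaks of heuristics on both content and position.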
![Page 11: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/11.jpg)
Structural processing
• Determines relationships between cells
• Using cell functions and table structure, classifies the table into one of the structural table types:
  • List
  • Matrix
  • Super-row
  • Multi-table
• Based on the type, a set of rules resolves the relationships
• Described in: Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. Disentangling structure of tables in scientific literature. In Proceedings of the 21st International Conference on Applications of Natural Language to Information Systems (NLDB 2016) (2016), Springer.
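To make "resolving relationships" concrete, the following Python sketch links every data cell to its column header, its row stub and the nearest preceding super-row. It assumes a super-row table whose header is the first row and whose stub is the first column; the function name `resolve_relationships`, the label strings and the sample table are hypothetical choices for this example, not the authors' implementation.

```python
from typing import Dict, List, Optional


def resolve_relationships(
    table: List[List[str]], labels: List[List[str]]
) -> List[Dict[str, Optional[str]]]:
    """Attach header, stub and super-row context to each data cell.

    Assumption for illustration: the header is row 0 and the stub is column 0.
    """
    records = []
    current_super_row: Optional[str] = None
    for r, row in enumerate(table):
        if "super-row" in labels[r]:
            # Remember the super-row text; it governs the rows that follow.
            current_super_row = row[labels[r].index("super-row")]
            continue
        for c, cell in enumerate(row):
            if labels[r][c] == "data":
                records.append(
                    {
                        "super_row": current_super_row,
                        "stub": row[0],
                        "header": table[0][c],
                        "value": cell,
                    }
                )
    return records


# Invented toy table with hand-assigned functional labels.
table = [
    ["Characteristic", "Drug (n=50)", "Placebo (n=48)"],
    ["Demographics", "", ""],
    ["Age, mean (SD)", "54.2 (9.1)", "53.8 (8.7)"],
    ["Male, n (%)", "28 (56)", "25 (52)"],
]
labels = [
    ["header", "header", "header"],
    ["super-row", "empty", "empty"],
    ["stub", "data", "data"],
    ["stub", "data", "data"],
]
records = resolve_relationships(table, labels)
for record in records:
    print(record)
```

Each resulting record carries the full navigational path of a value, which is what later extraction steps need in order to interpret it.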
![Page 12: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/12.jpg)
Semantic tagging
• Semantically tags terms, phrases or words
• Knowledge sources (UMLS, DBPedia, WordNet)
• Uses MetaMap for tagging with UMLS
• Helps with pragmatic classification and information extraction
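The idea of semantic tagging can be sketched with a plain dictionary lookup. The Python example below does not call MetaMap or any UMLS service: `TOY_LEXICON` is a hand-made stand-in with four entries, and `tag_cell` is a hypothetical helper written only for this illustration.

```python
import re
from typing import List, Tuple

# Hand-made stand-in for a knowledge source (an assumption, not MetaMap output).
# Values are written in the style of UMLS semantic type names.
TOY_LEXICON = {
    "headache": "Sign or Symptom",
    "nausea": "Sign or Symptom",
    "aspirin": "Pharmacologic Substance",
    "diabetes": "Disease or Syndrome",
}


def tag_cell(text: str) -> List[Tuple[str, str]]:
    """Return (token, semantic type) pairs for tokens found in the toy lexicon."""
    tags = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in TOY_LEXICON:
            tags.append((token, TOY_LEXICON[token]))
    return tags


print(tag_cell("Headache, n (%)"))
print(tag_cell("Nausea and headache"))
```

A real tagger also handles multi-word terms, spelling variants and ambiguity, which a single-token lookup like this ignores.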
![Page 13: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/13.jpg)
Pragmatic processing
• Determines the purpose of the table
• Machine learning approach
• Naïve Bayes, Bayes Nets, SVM, decision trees, random forests
• More specific classes -> better results
• Evidence based on 2 trials
  • Settings, findings, support tables: ~80% F-score
  • Baseline characteristics, adverse events, inclusion/exclusion, other: ~95% F-score
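Of the classifiers listed, Naïve Bayes is the simplest to show end to end. The Python sketch below is a small multinomial Naïve Bayes with add-one smoothing over caption words. The captions, the class names and the choice of caption words as the only features are assumptions made for this example; the slides do not state which features the authors used.

```python
import math
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy captions, for illustration only.
captions = [
    "baseline characteristics of patients",
    "demographic characteristics at baseline",
    "adverse events reported during treatment",
    "treatment emergent adverse events",
    "inclusion and exclusion criteria",
]
classes = ["baseline", "baseline", "adverse", "adverse", "inclusion_exclusion"]

model = NaiveBayes().fit(captions, classes)
print(model.predict("adverse events by treatment arm"))
print(model.predict("baseline patient characteristics"))
```

With this toy data the two test captions are assigned to the adverse-events and baseline classes respectively; a real experiment would need a labelled corpus and held-out evaluation.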
![Page 14: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/14.jpg)
Value identification and syntactic processing
• Identifying the cell of interest:
  • Looks at the navigational cells for lexical cues or for semantic types in tags
  • Lexical cues are kept in white and black lists
• Syntactic processing
  • Uses a set of patterns to determine the semantics of the value
  • Extracts the selected value
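The two steps above can be sketched in Python as a cue check on the navigational text followed by regular-expression patterns over the cell value. The cue lists, pattern names and field names below are assumptions chosen for an age-extraction example; they are not taken from the authors' scripts.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical cue lists for finding cells that describe patient age.
WHITE_LIST = {"age"}
BLACK_LIST = {"weight", "height"}


def is_cell_of_interest(navigational_text: str) -> bool:
    """Check header/stub text against the black list first, then the white list."""
    words = set(re.findall(r"[a-z]+", navigational_text.lower()))
    if words & BLACK_LIST:
        return False
    return bool(words & WHITE_LIST)


NUM = r"(\d+(?:\.\d+)?)"
# (pattern name, compiled regex, names for the captured numbers)
PATTERNS = [
    ("mean_pm_sd", re.compile(rf"^{NUM}\s*±\s*{NUM}$"), ("mean", "sd")),
    # "x (y)" is read as mean (SD) here; in practice the stub decides
    # whether it is mean (SD) or n (%).
    ("mean_paren_sd", re.compile(rf"^{NUM}\s*\(\s*{NUM}\s*\)$"), ("mean", "sd")),
    ("range", re.compile(rf"^{NUM}\s*[-–]\s*{NUM}$"), ("min", "max")),
    ("single", re.compile(rf"^{NUM}$"), ("value",)),
]


def parse_value(cell: str) -> Optional[Dict[str, Union[str, float]]]:
    """Return the first matching pattern and its numbers, or None."""
    text = cell.strip()
    for name, pattern, fields in PATTERNS:
        match = pattern.match(text)
        if match:
            parsed: Dict[str, Union[str, float]] = {"pattern": name}
            for field, group in zip(fields, match.groups()):
                parsed[field] = float(group)
            return parsed
    return None


if is_cell_of_interest("Age, mean (SD)"):
    print(parse_value("54.2 (9.1)"))
print(parse_value("54.2 ± 9.1"))
print(parse_value("18-65"))
```

Ordering the patterns from most to least specific keeps a bare number from shadowing the richer formats.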
![Page 15: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/15.jpg)
Pragmatic classification results
• Pragmatic classification performs well with specific classes
• 4 classes: baseline characteristics, adverse events, inclusion/exclusion, other
• Best performance: SVM
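For readers unfamiliar with the F-score used to report these results, it is the harmonic mean of precision and recall, computed per class. The short Python sketch below shows the calculation; the gold and predicted labels are invented toy data and do not reproduce the reported figures.

```python
from typing import Dict, List


def f1_per_class(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """Compute the F1 score of every class from paired gold/predicted labels."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Invented toy labels, for illustration only.
gold = ["baseline", "baseline", "adverse", "adverse", "other"]
predicted = ["baseline", "adverse", "adverse", "adverse", "other"]

scores = f1_per_class(gold, predicted)
macro_f1 = sum(scores.values()) / len(scores)
print(scores)
print(round(macro_f1, 3))
```

Averaging the per-class scores gives the macro F1, which weights rare classes as heavily as frequent ones.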
![Page 16: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/16.jpg)
Information extraction results
• Extracted number of patients
• New tests on extracting patient age, adverse events (using UMLS)
Patients’ age
Adverse reactions
![Page 17: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/17.jpg)
Lessons learned
• Table mining requires multi-layered analysis
• Functional and structural analysis are crucial
• Semantics of value presentation patterns
• Semantic tagging helps
• Machine learning helps in certain steps (e.g. pragmatic analysis)
• Combination of heuristic-based and machine-learning-based steps
• Availability:
  • https://github.com/nikolamilosevic86/TableAnnotator
  • https://github.com/nikolamilosevic86/TableInformationExtractionScripts
![Page 18: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/18.jpg)
Future plans
• Develop an easy-to-use methodology
• Develop a UI tool (wizard) for information extraction from tables
• Improve the methodology
• Compare heuristic-based vs machine-learning-based IE
• Examine methods for unbalanced datasets
![Page 19: BelBi2016 presentation: Hybrid methodology for information extraction from tables in biomedical literature](https://reader035.fdocuments.in/reader035/viewer/2022062904/587283771a28abc7068b6ab9/html5/thumbnails/19.jpg)
Acknowledgements
Dr Michele Filannino
Dr Azad Dehghan
Nikola Milošević
Ruth Stoney
Maksim Belousov
Dr Goran Nenadić
Robert Hernandez
Cassie Gregson
Richard Boyce
Jodi Schneider
Steven DeMarco