Biomedical Text Mining: A survey

Harsh Thakkar

Ph.D. –I

201321008

harshbionlp@gmail.com

À la carte

• What?

• Why?

• How?

• Resources & Tools

• Bibliography

• Definition/(s)!!

>10 km

Source: Lars Jensen

Analysis

Interpretation

Source: Prof. Prasenjit Majumder

• And it is growing at a pace we cannot cope with! Every minute, day, year

• It's exponential!!

Source: Prof. Prasenjit Majumder

Let's see!

Major Data Mining Tasks

• In BDM

1. Discovery of new facts

2. Document summarization

3. Question Answering

• I.R. techniques are dominant

– Biomedical domain specific techniques (W. Hersh, 2005)

#W. Hersh. Information Retrieval: A Health and Biomedical Perspective. Health Informatics. Springer, third edition, 2005.

• Clustering Techniques

Major Data Mining Tasks

Retrieval and reduction Classify/Mine Knowledge

Document Summarization

• Contextual abstraction of information from multiple texts, also known as a text reduction problem

• Information Extraction (IE)

– NER

• Most commonly used and effective technique

• In the biomedical context, entities include genes and protein interactions, diseases and treatments, drug names and dosages (U. Leser et al., 2005)

#U. Leser and J. Hakenberg. What makes a gene name? Named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4):357–369, 2005.

• Why NER?

– The field keeps growing and research never stops -> ever larger amounts of data -> new synonyms (A. Yeh et al., 2005)

– E.g. heart attack – myocardial infarction

– With so many synonyms, it becomes difficult to integrate knowledge from multiple sources (hence resources such as the UMLS Metathesaurus or the Gene Ontology)

A. Yeh, A. Morgan, M. Colosimo, and L. Hirschman. BioCreAtIvE task 1A: Gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1):S2, 2005.

• Extensive use of domain-specific abbreviations

– E.g. R.A. can stand for “right atrium”, “rheumatoid arthritis”, “renal artery”, “refractory anemia”, etc. (S. Pakhomov, 2002)

#S. Pakhomov. Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 160–167, 2002.
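The ambiguity of "R.A." can be illustrated with a small sketch that picks an expansion from the surrounding words. This is only a toy word-overlap heuristic, not the semi-supervised maximum entropy approach of Pakhomov (2002); the cue words and the function name `expand_ra` are assumptions made for illustration.

```python
from typing import Optional

# Toy sense inventory for the abbreviation "RA". The cue words are
# illustrative assumptions, not taken from any curated resource.
SENSES = {
    "right atrium": {"heart", "ventricle", "cardiac", "valve"},
    "rheumatoid arthritis": {"joint", "joints", "swelling", "autoimmune"},
    "renal artery": {"kidney", "stenosis", "renal", "blood"},
    "refractory anemia": {"marrow", "hemoglobin", "transfusion", "anemia"},
}


def expand_ra(sentence: str) -> Optional[str]:
    """Return the sense whose cue words overlap most with the sentence."""
    cleaned = sentence.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    # Score each candidate sense by the number of cue words in the context.
    scores = {sense: len(cues & tokens) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no contextual evidence at all, refuse to guess.
    return best if scores[best] > 0 else None


print(expand_ra("Patient with RA reports joint swelling in both hands."))
```

A real system would learn such context features from annotated or automatically generated training data instead of a hand-written cue list.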

Named Entity Recognition (NER)

1. Determine the entity substring and its boundaries

2. Assign the entity to a defined class

3. Entity mapping (aka entity normalization), i.e. selecting a preferred unique ID for the selected entity

• Although generally discussed as a single task, NER is a three-step process.
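The three steps (boundary detection, classification, mapping to a preferred ID) can be sketched with a tiny lexicon lookup. This is a minimal illustration under stated assumptions: the lexicon, the class labels, and the `CONCEPT:` identifiers are hypothetical placeholders, not real UMLS or Gene Ontology entries.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    text: str        # the matched substring
    label: str       # entity class
    concept_id: str  # preferred unique ID (placeholder values)


# Hypothetical lexicon: surface form -> (class, preferred ID).
# The IDs are made up; synonyms share one ID.
LEXICON = {
    "heart attack": ("DISEASE", "CONCEPT:0001"),
    "myocardial infarction": ("DISEASE", "CONCEPT:0001"),
    "aspirin": ("DRUG", "CONCEPT:0002"),
}


def recognize(text: str) -> List[Entity]:
    entities = []
    for surface, (label, concept_id) in LEXICON.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        # Step 1: locate the entity substring and its boundaries.
        for m in re.finditer(pattern, text, re.IGNORECASE):
            # Step 2: assign the class. Step 3: map to the preferred ID.
            entities.append(
                Entity(m.start(), m.end(), m.group(0), label, concept_id)
            )
    return sorted(entities)


for entity in recognize("Aspirin after a heart attack, also called myocardial infarction."):
    print(entity)
```

Both "heart attack" and "myocardial infarction" resolve to the same placeholder ID, which is the point of the normalization step.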

BioCreAtIvE ® (L. Hirschman et al., 2005)

• System performance of IE (NER based systems)

• Takes into account F-score, precision, recall

• Task-based evaluations

– i2b2: task for providing clinical data for research purposes. Current projects: autoimmune diseases, diabetes, obesity, etc. [driving biology projects, DBPs]

– BioNLP: conducts shared tasks globally, targeted towards the following:

• [GE] Genia Event Extraction for NFkB knowledge base construction

• [CG] Cancer Genetics

• [PC] Pathway Curation

• [GRO] Corpus Annotation with Gene Regulation Ontology

• [GRN] Gene Regulation Network in Bacteria

• [BB] Bacteria Biotopes (semantic annotation by an ontology)

• NER systems reached F-scores of 0.83 and 0.87 in the first (A. Yeh et al., 2005) and second (L. Smith et al., 2008) BioCreAtIvE gene mention tasks.

• 0.85 for the i2b2 concept extraction task (O. Uzuner et al., 2011)

• 0.73 for the JNLPBA bio-entity recognition task 2004 (J. Kim et al., 2004)

• 0.57 for the BioNLP 2013 shared task Bacteria Biotopes (IRISA) [Institute for research in computer science and random systems]
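The F-scores quoted above combine precision and recall. A minimal sketch of how they are typically computed for NER under exact-match scoring follows; the function name and the toy annotations are assumptions, and individual shared tasks may use their own matching rules.

```python
from typing import Set, Tuple

Mention = Tuple[int, int, str]  # (start, end, class)


def precision_recall_f1(gold: Set[Mention], predicted: Set[Mention]):
    """Exact-match scoring: a prediction counts only if span and class agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations (made up for illustration).
gold = {(0, 1, "DRUG"), (3, 5, "DISEASE"), (7, 9, "DISEASE")}
predicted = {(0, 1, "DRUG"), (3, 5, "DISEASE"), (7, 8, "DISEASE"), (10, 11, "GENE")}
print(precision_recall_f1(gold, predicted))
```

In the toy data the third prediction has a wrong boundary and therefore counts as both a false positive and a false negative, which is why boundary detection matters so much for the reported scores.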

• Other approaches

– Dictionary based

• Issues: spelling mistakes, morphological variants, homonymy (M. Krauthammer et al., 2004).

• Overcome by: string matching techniques, either exact or partial (Y. Tsuruoka et al., 2003), (J. Tsujii et al., 2003)

– Rule-based

• Define pattern rules (as in DNA sequence)

• E.g. EMPathIE and PASTA (K. Humphreys et al., 2000), (R. Gaizauskas et al., 2003)
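The dictionary-based approach with exact and partial string matching can be sketched as below. This is an illustrative example only: the term list is a made-up stand-in for a real dictionary, the similarity cutoff is an arbitrary assumption, and `difflib` from the Python standard library stands in for the matching techniques used in the cited work.

```python
import difflib
from typing import Optional

# Tiny illustrative dictionary; a real system would load thousands of terms.
TERM_DICTIONARY = ["interleukin", "insulin", "hemoglobin", "thrombin"]


def lookup(token: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the dictionary entry for a token, or None if nothing is close."""
    token = token.lower()
    # Exact match first: cheap and unambiguous.
    if token in TERM_DICTIONARY:
        return token
    # Partial match: tolerate spelling mistakes and spelling variants.
    matches = difflib.get_close_matches(token, TERM_DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for word in ["Insulin", "haemoglobin", "hemogloben", "aspirin"]:
    print(word, "->", lookup(word))
```

Approximate matching helps with misspellings and morphological variants, but it does nothing for homonymy: a string that matches the dictionary may still refer to something else in context.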

• Classification based

– Naïve Bayes (K. Takeuchi et al., 2005)

– SVMs (J. Kazama et al., 2002), (T. Mitsumori et al., 2005), (C. Nobata et al., 1999), (K. Yamamoto et al., 2003)

– BIO tagging scheme; individual tokens are tagged

• B- beginning of entity

• I- inside entity

• O- outside entity

• ISSUES: when boundaries overlap
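The BIO scheme can be made concrete with a short sketch that converts entity spans to per-token tags and back. The tokens, class labels, and function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (first token, last token exclusive, class)


def to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def from_bio(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into entity spans."""
    spans = []
    start = label = None
    # A sentinel "O" closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
spans = [(0, 2, "DNA"), (4, 6, "PROTEIN")]
tags = to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Because each token carries exactly one tag, two entities that overlap or nest cannot both be encoded; `to_bio` would simply let the later span overwrite the earlier one, which is the boundary-overlap issue noted above.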

• Question Answering:

• Unlike general Q&A,

– Domain-specific querying resulting in crisp and precise answers

• Different from other systems as

– Limited scope of questions

– Crisp knowledge

• Currently drawing the attention of researchers (H. Yu et al., 2005) (S. Athenikos et al., 2010)

Resources

• MEDLINE

– One of the most important resources in biomedical domain for mining

– Collection of bibliographic records on biomedicine from 1946 onwards

– Searchable through PubMed

– www.ncbi.nlm.nih.gov/pubmed

• Source: http://www.ncbi.nlm.nih.gov/pubmed/?term=lactobacilus
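For mining purposes, a PubMed search like the one above is usually issued programmatically. The sketch below only builds a query URL and sends no request; it assumes the NCBI E-utilities `esearch` endpoint, which the slides do not mention, and the function name and default `retmax` value are illustrative choices.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not referenced in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that would return PubMed IDs matching the term."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query, URL-encoded below
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


print(pubmed_search_url("lactobacillus"))
```

Fetching the URL and parsing the returned ID list is left out on purpose, since it depends on network access and on NCBI's usage policies.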

• OHSUMED

– 348,566 references (out of a total of over 7 million), covering all references from 270 medical journals over a five-year period (1987-1991)#

– More data from the TREC Genomics track, 1994-2003##

– Clinically oriented subset of MEDLINE, cross-referenced with PubMed

#TREC-9 filtering track collections. http://trec.nist.gov/data/t9_filtering.html

##TREC genomics track data. http://ir.ohsu.edu/genomics/data.html

• BioCreAtIvE ®

– L. Hirschman, M. Colosimo, A. Morgan, and A. Yeh. Overview of BioCreAtIvE task 1B: Normalized gene lists. BMC Bioinformatics, 6(Suppl 1):S11, 2005.

• MetaMap (MMTx) - NLM

– http://mmtx.nlm.nih.gov/

• NegEx, ConText – University of Pittsburgh – BLULab

– http://www.dbmi.pitt.edu/blulab/index.html

• cTAKES – Mayo Clinic

– https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads

Bibliography

• U. Leser and J. Hakenberg. What makes a gene name? Named entity recognition in the biomedical literature. Briefings in Bioinformatics, 6(4):357–369, 2005.

• L. Smith, L. Tanabe, R. Johnson nee Ando, C.-J. Kuo, I.-F. Chung, C.-N. Hsu, Y.-S. Lin, R. Klinger, C. Friedrich, K. Ganchev, M. Torii, H. Liu, B. Haddow, C. Struble, R. Povinelli, A. Vlachos, W. Baumgartner, L. Hunter, B. Carpenter, R. Tzong-Han Tsai, H.-J. Dai, F. Liu, Y. Chen, C. Sun, S. Katrenko, P. Adriaans, C. Blaschke, R. Torres, M. Neves, P. Nakov, A. Divoli, M. Maña-López, J. Mata, and W. Wilbur. Overview of BioCreAtIvE II: Gene mention recognition. Genome Biology, 9(Suppl 2):S2, 2008.

• O. Uzuner, B. R. South, S. Shen, and S. L. DuVall. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18(5):552–556, 2011.

• J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier. Introduction to the bio-entity recognition task at JNLPBA. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pages 70–75, 2004.

• M. Krauthammer and G. Nenadic. Term identification in the biomedical literature. Journal of Biomedical Informatics, 37(6):512–526, 2004.

• Y. Tsuruoka and J. Tsujii. Boosting precision and recall of dictionary-based protein name recognition. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 41–48, 2003.

• Y. Tsuruoka and J. Tsujii. Probabilistic term variant generator for biomedical terms. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 167–173, 2003.

• K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structures. In Pacific Symposium on Biocomputing, pages 502–513, 2000.

• R. Gaizauskas, G. Demetriou, P. J. Artymiuk, and P. Willett. Protein structures and information extraction from biological texts: The PASTA system. Bioinformatics, 19(1):135–143, 2003.

• J. Kazama, T. Makino, Y. Ohta, and J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain - Volume 3, pages 1–8, 2002.

• K. Yamamoto, T. Kudo, A. Konagaya, and Y. Matsumoto. Protein name tagging for biomedical annotation in text. In Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine - Volume 13, pages 65–72, 2003.

• H. Yu, C. Sable, and H. Zhu. Classifying medical questions based on an evidence taxonomy. In Proceedings of the AAAI 2005 Workshop on Question Answering in Restricted Domains, 2005.

Thank You
