Macquarie University Workshop on Text Mining and Health

Diego Mollá

Macquarie University, Sydney, Australia

http://comp.mq.edu.au/research/collaboration-workshops/2014-mq-clinical-nlp/

26 September 2014

description

Slides of the opening presentation at the Macquarie University workshop on Text Mining and Health, http://comp.mq.edu.au/research/collaboration-workshops/2014-mq-clinical-nlp/

Transcript of Macquarie University Workshop on Text Mining and Health

Page 1: Macquarie University Workshop on Text Mining and Health

Macquarie University Workshop on Text Mining and Health

Diego Mollá

Macquarie University, Sydney, Australia

http://comp.mq.edu.au/research/collaboration-workshops/2014-mq-clinical-nlp/

26 September 2014

Page 2: Macquarie University Workshop on Text Mining and Health

About the Workshop Text Mining for Evidence Based Medicine Our Research In Progress / Future Research

Contents

1 About the Workshop

2 Text Mining for Evidence Based Medicine
The Scenario

3 Our Research
A Corpus for EBM Summarisation
Single-document Query-based Summarisation
Evidence Grading
Clustering

4 In Progress / Future Research

Text Mining and Health 2014 Diego Molla 2/36

Page 3: Macquarie University Workshop on Text Mining and Health

Aims of the Workshop

Bring together

Medical researchers and practitioners

Researchers in text mining and related areas

Why?

Find ideas for collaboration

Page 4: Macquarie University Workshop on Text Mining and Health

Some Statistics

Registered: 50+

Presentations: 13 + 1

Institutions represented

1 Macquarie University

2 IBM Research

3 The University of Melbourne

4 Defence Science and Technology Organisation

5 The University of Queensland

6 RMIT University

7 Monash University

8 Royal Melbourne Hospital

9 Alfred Health

10 Queensland University of Technology

11 The Commonwealth Scientific and Industrial Research Organisation

12 Semantic Software Asia Pacific

13 The University of New South Wales

14 Bond University

Page 5: Macquarie University Workshop on Text Mining and Health

Program

Time Session

8:45 – 9:00 Registration

9:00 – 9:30 Diego Mollá: Introduction and research ideas (Text Mining for Evidence Based Medicine)

9:30 – 10:30 Session 1 (6 presentations)
Antonio Jimeno: Text analytics for Healthcare at IBM Research - Australia
Karin Verspoor: Syndromic Surveillance from Emergency Department triage notes
Tudor Groza: Phenotype concept recognition: State of the art and future directions
Simon Kocbek: Topic modeling of Emergency Department Triage notes for characterising pain-related chief complaints
Lawrence Cavedon: Text mining for lung cancer cases over large patient admission data
Reza Haffari: Intelligent Analysis of Health Record Data

10:30 – 10:45 Break

10:45 – 11:55 Session 2 (7 presentations)
Guido Zuccon: Towards Exploiting Inference from Semantic Annotations for Medical Information Retrieval
Laurianne Sitbon: Delivering Clinical Information Extraction Tools to Practitioners
Dung Xuan Thi Le: A Transformation of Free Text to Semantic Data for Analysis Purposes
Mark Johnson: Extracting and Exploiting Relational Information in Text Data Mining
Guy Tsafnat: Agent-based evidence gathering, synthesis and dissemination
Miew Keen Choong: Automatic clinical evidence discovery with citation networks
Adam Dunn: Automatic classification of published clinical articles using metadata instead of content

11:55 – 12:30 Discussion and closing

Page 6: Macquarie University Workshop on Text Mining and Health

Thanks to . . .

Department of Computing

Centre for Language Sciences (CLaS)

. . . you all!

Page 9: Macquarie University Workshop on Text Mining and Health

Evidence Based Medicine

http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/

Page 10: Macquarie University Workshop on Text Mining and Health

The Search Space is Huge

Page 11: Macquarie University Workshop on Text Mining and Health

Suggested Steps in EBM

http://hlwiki.slais.ubc.ca/index.php?title=Five_steps_of_EBM

Page 12: Macquarie University Workshop on Text Mining and Health

Where can Research in Text Processing Help?

Questions:
Help formulate answerable questions.
Question analysis and classification.

Search:
Retrieve and rank relevant literature.
Extract the evidence-based information.
Summarise the results.

Appraisal: Classify the evidence.

Page 17: Macquarie University Workshop on Text Mining and Health

Journal of Family Practice’s “Clinical Inquiries”

Page 18: Macquarie University Workshop on Text Mining and Health

Components of the Corpus

Question: Direct extract from the source.

Answer: Split from the source and manually checked.

Evidence: Extracted from the source.

Additional text: Manually extracted from the source and massaged.

References: PMID looked up in PubMed (automatic and manual procedure).
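
The components listed above could be held in a structure along the following lines. This is only a sketch: the class names (`Record`, `AnswerPart`), the field names, and the nesting of evidence, additional text and references under each answer part are assumptions made for illustration, not the corpus's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnswerPart:
    """One part of the answer with its supporting material (hypothetical layout)."""
    text: str                                                  # answer text, split from the source
    evidence: str = ""                                         # evidence grade as given in the source, e.g. "SOR B"
    additional_text: List[str] = field(default_factory=list)   # justification passages
    references: List[str] = field(default_factory=list)        # PubMed IDs (PMIDs)


@dataclass
class Record:
    """One clinical question together with its answer parts."""
    question: str
    answer_parts: List[AnswerPart] = field(default_factory=list)

    def all_pmids(self) -> List[str]:
        """Return the distinct PMIDs cited in the record, in first-seen order."""
        seen: List[str] = []
        for part in self.answer_parts:
            for pmid in part.references:
                if pmid not in seen:
                    seen.append(pmid)
        return seen
```

A record for "Which treatments work best for hemorrhoids?" would then carry one `AnswerPart` per recommendation, each with its own grade and list of PMIDs.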

Page 19: Macquarie University Workshop on Text Mining and Health

Corpus Statistics

Size

456 questions (“records”).

1,396 answer parts (“snips”).

3,036 answer justifications (“longs”).

3,705 references:

2,908 unique references.
2,657 XML abstracts from PubMed.

Page 20: Macquarie University Workshop on Text Mining and Health

Our Vision

Which treatments work best for hemorrhoids?

(SOR B) Excision is the most effective treatment for thrombosed external hemorrhoids

(SOR A) Hemorrhoidectomy is the best treatment for prolapsed internal hemorrhoids

(SOR A) Rubber band ligation produces the lowest level of recurrence among nonoperative techniques

Page 23: Macquarie University Workshop on Text Mining and Health

Single-document Query-based Summarisation

Input

Which treatments work best for hemorrhoids?

Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.

Output

A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).

Page 24: Macquarie University Workshop on Text Mining and Health

Extractive Summarisation by Sarker et al. (CBMS 2012)

Input

Which treatments work best for hemorrhoids?

Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.

Output

The aim was to test the efficacy of local application of nifedipine ointment in healing acute thrombosed external hemorrhoids.
Results obtained were as follows: complete relief of pain in 43 patients (86 percent) of the nifedipine-treated group as opposed to 24 patients (50 percent) of the control group after 7 days of therapy (P < 0.01); oral analgesics were used by 4 patients (8 percent) in the nifedipine-treated group as opposed to 26 patients (54.1 percent) of the control group after 7 days of therapy (P < 0.01); and resolution of acute thrombosed external hemorrhoids was achieved after 14 days of therapy in 46 patients (92 percent) of the nifedipine-treated group, as opposed to 22 patients (45.8 percent) of the control group (P < 0.01).
Our study clearly demonstrates that the use of topical nifedipine, which at present is for treatment of cardiovascular disorders, is a reliable new option in the conservative treatment of thrombosed external hemorrhoids.

Page 25: Macquarie University Workshop on Text Mining and Health

General Approach (Sarker et al., CBMS 2012)

In a Nutshell

1 Gather statistics from the best 3-sentence extracts.
Exhaustive search to find these best extracts.
Used ROUGE to automatically compare the extracts with the target output.

2 Build three classifiers, one per sentence in the final extract.
Classifier 1 based on statistics from best 1st sentence.
Classifier 2 based on statistics from best 2nd sentence.
Classifier 3 based on statistics from best 3rd sentence.
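
Step 1 can be illustrated with a small exhaustive search. The sketch below is not the authors' code: `unigram_f1` is a simplified unigram-overlap score standing in for ROUGE, and the function names and tokenisation are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate, target):
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1, not the official toolkit."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(target))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_extract(sentences, target, size=3):
    """Score every `size`-sentence combination (kept in document order).

    Returns (indices, score) of the first best-scoring combination,
    or ((), -1.0) when there are fewer than `size` sentences.
    """
    best_score, best_idx = -1.0, ()
    for idx in combinations(range(len(sentences)), size):
        score = unigram_f1(" ".join(sentences[i] for i in idx), target)
        if score > best_score:
            best_score, best_idx = score, idx
    return best_idx, best_score
```

The search is affordable because abstracts are short: a 10-sentence abstract has only 120 three-sentence combinations.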

Page 26: Macquarie University Workshop on Text Mining and Health

Results

System  F-Score  95% CI       Percentile (%)
L3      0.159    0.155–0.163  60.3
O3      0.161    0.158–0.165  77.5
R       0.158    0.154–0.161  50.3
O       0.159    0.155–0.164  60.3
PI      0.160    0.157–0.164  69.4
PD      0.166    0.162–0.170  97.3

L3 = Last three sentences. O3 = Last three PIBOSO outcome sentences. R = Random. O = All outcome sentences. PI = Sentence position independent. PD = Sentence position dependent (our proposal).

Page 28: Macquarie University Workshop on Text Mining and Health

The ALTA 2011 Shared Task

The ALTA Shared Tasks

Competitions where all participants are evaluated on the same data.

The ALTA 2011 shared task was based on evidence grading.

The Data

Clusters of abstracts.

The SOR grade of each cluster.

The SORT Taxonomy

A Consistent and good-quality patient-oriented evidence.

B Inconsistent or limited-quality patient-oriented evidence.

C Consensus, usual practice, opinion, disease-oriented evidence, or case series for studies of diagnosis, treatment, prevention, or screening.

Data Fragment

41711 B 10553790 15265350

53581 C 12804123 16026213 14627885

53583 B 15213586

52401 A 15329425 9058342 11279767
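
The fragment can be read with a few lines of Python. The layout assumed here (one cluster per line, whitespace-separated: cluster identifier, SOR grade, then one or more PMIDs) is inferred from the fragment alone, and the names `Cluster` and `parse_clusters` are hypothetical.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    cluster_id: str
    grade: str          # SOR grade: "A", "B" or "C"
    pmids: List[str]    # PubMed IDs of the abstracts in the cluster


def parse_clusters(text: str) -> List[Cluster]:
    """Parse lines of the assumed form '<cluster id> <grade> <pmid> [<pmid> ...]'."""
    clusters = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3 or fields[1] not in {"A", "B", "C"}:
            raise ValueError(f"unexpected line: {line!r}")
        clusters.append(Cluster(fields[0], fields[1], fields[2:]))
    return clusters


fragment = """\
41711 B 10553790 15265350
53581 C 12804123 16026213 14627885
53583 B 15213586
52401 A 15329425 9058342 11279767
"""
clusters = parse_clusters(fragment)
```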

Page 29: Macquarie University Workshop on Text Mining and Health

Cascaded Classification (Molla & Sarker, ALTA 2011)

Process: Cascaded SVMs

1 Default class: B.

2 SVMs with abstract n-grams to identify A and C.

3 SVMs with publication types to identify A and C.

4 SVMs with title n-grams to identify A and C.
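
The control flow of such a cascade might look like the sketch below. The slide does not say how the stages are combined, so the rule used here (the first stage that commits to A or C wins, otherwise the default B is kept) is an assumption. The three stage functions are toy keyword rules standing in for trained SVMs, and every name is hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

# A stage inspects one cluster (here a plain dict of text fields) and returns
# "A", "C", or None when it does not commit to either grade.
Stage = Callable[[Dict[str, str]], Optional[str]]


def cascade(cluster: Dict[str, str], stages: Sequence[Stage], default: str = "B") -> str:
    """Return the label of the first stage that commits, else the default class."""
    for stage in stages:
        label = stage(cluster)
        if label in ("A", "C"):
            return label
    return default


# Toy stand-ins for the three SVM stages (purely illustrative rules).
def abstract_stage(cluster):
    return "A" if "randomized" in cluster.get("abstract", "").lower() else None


def pubtype_stage(cluster):
    return "C" if cluster.get("pubtype") == "Case Reports" else None


def title_stage(cluster):
    return "C" if "consensus" in cluster.get("title", "").lower() else None


STAGES = [abstract_stage, pubtype_stage, title_stage]
```

Ordering the stages this way means later feature sets are consulted only when earlier ones abstain.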

Results

Method         Accuracy  CI
Majority (B)   48.63%    41.5 – 55.83
Cascaded SVMs  62.84%

http://corine13.c.o.pic.centerblog.net/h7f1xcsu.jpg

Page 31: Macquarie University Workshop on Text Mining and Health

Clustering for EBM Summarisation

Input

QUESTION: Which treatments work best for hemorrhoids?

DOCUMENTS: [11289288] [12972967] [1442682] [15486746] [16235372] [16252313] [17054255] [17380367]

clustering

=⇒

Output

1 [11289288] [12972967] [15486746]

2 [17054255] [17380367]

3 [1442682] [16252313] [16235372]

Page 32: Macquarie University Workshop on Text Mining and Health

Clustering Approach (Shash & Molla 2013)

K-means (non-overlapping clustering).

Unigram-based features.

Lowercased, stopwords removed, tf.idf of remaining words.
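
A rough, dependency-free sketch of this pipeline follows. It is not the published implementation: the stopword list is a tiny placeholder, the tf.idf weighting is the plain `tf * log(N/df)` variant, K-means uses Euclidean distance only, and seeding the centroids with the first `k` documents is a simplification chosen to keep the example deterministic.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "is", "with"}  # tiny illustrative list


def unigrams(text):
    """Lowercase, tokenise and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one dense tf.idf vector per document over the shared vocabulary."""
    bags = [Counter(unigrams(d)) for d in docs]
    vocab = sorted(set().union(*bags))
    df = Counter(w for bag in bags for w in bag)   # document frequency of each word
    n = len(docs)
    return [[bag[w] * math.log(n / df[w]) for w in vocab] for bag in bags]


def kmeans(vectors, k, iterations=20):
    """Plain K-means with Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        # Update step: each non-empty cluster's centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each document receives exactly one label, which matches the non-overlapping clustering named on the slide.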

Page 33: Macquarie University Workshop on Text Mining and Health

Results

Table 1: Average entropy for optimal K clusters.

Measure      Whole XML  Abstract only  UMLS concepts only  UMLS semantic types
Euclidean    0.260      0.264          0.274               0.310
Correlation  0.348      0.362          0.349               0.347
Cosine       0.249      0.266          0.277               0.298
Dice         0.332      0.328          0.324               0.334
Jaccard      0.320      0.330          0.317               0.327
Manhattan    0.288      0.299          0.305               0.296

Entropy of pure random clustering is $-\log_2(1/K) = 1.263$.
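
For readers unfamiliar with the measure, the sketch below computes clustering entropy under one common definition: the entropy of the reference classes inside each cluster, averaged over clusters and weighted by cluster size. The slides do not spell out the formula, so this definition and the function name are assumptions. Lower values indicate purer clusters, and a uniform spread over K classes gives the baseline log2(K) = −log2(1/K).

```python
import math
from collections import Counter


def clustering_entropy(cluster_labels, reference_labels):
    """Size-weighted average of the per-cluster entropy of reference classes (in bits)."""
    n = len(cluster_labels)
    by_cluster = {}
    for cluster, reference in zip(cluster_labels, reference_labels):
        by_cluster.setdefault(cluster, []).append(reference)
    total = 0.0
    for members in by_cluster.values():
        size = len(members)
        # Entropy of the reference-class distribution inside this cluster.
        h = -sum((m / size) * math.log2(m / size) for m in Counter(members).values())
        total += (size / n) * h
    return total
```

A clustering that reproduces the reference grouping exactly scores 0, while a single cluster mixing two equally sized reference groups scores 1 bit.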

Page 35: Macquarie University Workshop on Text Mining and Health

In Progress: A Proof-of-Concept System (Michael van Treeck, Masters of IT) I

Page 36: Macquarie University Workshop on Text Mining and Health

In Progress: A Proof-of-Concept System (Michael van Treeck, Masters of IT) II

Page 37: Macquarie University Workshop on Text Mining and Health

In Progress: Identifying Keywords of the Answer (Jiwei Guan, Masters of Research)

Keyword Extraction Techniques

tf.idf

Using Part of Speech

Using information from the answer

. . .

Keyphrase Extraction Techniques

C-Value, NC-Value

Part of Speech Patterns

. . .

Page 38: Macquarie University Workshop on Text Mining and Health

Future Research

Fine-tune search techniques

Incorporate question types

Label the clusters

Combine single summaries

Test with real people

Page 39: Macquarie University Workshop on Text Mining and Health

Thank You

Questions?

Further information about our research: http://web.science.mq.edu.au/~diego/medicalnlp/

Diego
