Knowledge-driven Implicit Information Extraction
1
Knowledge-driven Implicit Information Extraction
Sujan Perera
Dissertation Committee: Drs. Amit P. Sheth (advisor), Krishnaprasad Thirunarayan, Michael Raymer, Pablo N. Mendes (IBM Research)
Ph.D. Dissertation Defense
2
Information Extraction
• More than 70% of data in organizations exist in unstructured form [1]
• Extraction of structured information from unstructured data is a fundamental task
“All home medications although his insulin dose (nph 20 qPM) was halved (--> NPH 10 qPM) on the floor, and his sugars were running in the 150s-250s range.”
[Figure: knowledge-graph fragment around Insulin, with nodes Cisapride, Diabetes Mellitus, Hyperglycemia, Proinsulin, Porcine Insulin, and Insulin Glulisine, and edge labels "contradicting drug", "may_treat", and "is a"]
[1] https://en.wikipedia.org/wiki/Unstructured_data
3
Information Extraction
• Almost exclusively focused on explicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
4
Information Extraction
• Almost exclusively focused on explicit information
Named Entity Recognition, Relationship Extraction, Entity Linking
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
Annotations: Person, Person, C0018795, C0015672, C0008031
5
Information Extraction
• Misses the implicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
Explicit annotations: Person, Person, C0018795, C0015672, C0008031
Implicit annotations: No shortness of breath, edema
Named Entity Recognition, Relationship Extraction, Entity Linking, Implicit Information Extraction
6
Thesis Statement
Implicit factual information in unstructured text can be efficiently extracted by bridging syntactic and semantic gaps in natural language usage and augmenting information extraction techniques with relevant domain knowledge.
7
• Express sarcasm/sentiment
  • “I'm striving to be positive in what I say on Twitter. So I'll refrain from making a comment about the latest Michael Bay movie.”
• Provide descriptive information
  • “small fluid adjacent to the gallbladder with gallstones which may represent inflammation”
• Emphasize features of the entity
  • “Mason Evans 12 year long shoot won big in golden globe”
• Communicate the common understanding
  • “He is suffering from nausea and severe headaches. Dolasetron was prescribed.”
• Stylistic preferences
  • “Democratic candidate Bernie Sanders … The Vermont senator …”
Credit:http://bit.ly/2b9Bnjk
8
Significance
• Volume
  • 20% of movie references and 40% of book references in tweets are implicit
  • 35% of edema references and 40% of shortness of breath references in clinical narratives are implicit
• Value
[Figure: Explicit Information, Structured Information, and the downstream applications Computer Assisted Coding, 30-day Readmission Prediction, and Sentiment Analysis]
9
Significance
• Volume
  • 20% of movie references and 40% of book references in tweets are implicit
  • 35% of edema references and 40% of shortness of breath references in clinical narratives are implicit
• Value
  • Ignoring implicit information in text would adversely affect downstream applications
[Figure: Explicit Information and Implicit Information, Structured Information, and the downstream applications Computer Assisted Coding, 30-day Readmission Prediction, and Sentiment Analysis]
10
Role of Knowledge
“New Sandra Bullock astronaut lost in space movie looks absolutely terrifying”
Knowledge Bases: Sandra Bullock, Gravity
“The patient showed accumulation of fluid in his extremities, but respirations were unlabored and there were no use of accessory muscles.”
UMLS:
• Edema: Accumulation of an excessive amount of watery fluid in cells or intercellular tissues
• Shortness of breath: Labored or difficult breathing associated with a variety of disorders
WordNet
Image credits: http://bit.ly/2b5HPDQ and icon made by Freepik from www.flaticon.com
Credit: http://bit.ly/2bi34FG
Credit: http://bit.ly/1x3sack
Credit: http://bit.ly/2b9CejW
Credit: http://bit.ly/2aXM97v
11
Implicit Information Extraction
[Figure: pipeline stages Knowledge Acquisition, Knowledge Modeling, Detecting Implicit Information, Information Extraction]
12
Dissertation Focus
Implicit Information Extraction
• Entities
  • Organized Text: Clinical Narratives (Disorders, Symptoms)
  • Unorganized Text: Tweets (Movies, Books)
• Relationships
  • Clinical Narratives (Disorders and Symptoms)
14
Implicit Entities in Clinical Documents
Sentence | Entity
“small fluid adjacent to the gallbladder with gallstones which may represent inflammation.” | Cholecystitis
“His tip of the appendix is inflamed.” | Appendicitis
“The respirations were unlabored and there were no use of accessory muscles.” | Shortness of breath (NEG)
• One should know the physiological observations that characterize a particular entity
• Negations are embedded in the phrases indicating entities
  • “Patient denies shortness of breath”
  • “The respirations were unlabored”
15
Knowledge Acquisition
• Unified Medical Language System (UMLS) – integrates many health and biomedical vocabularies
  • Table columns used: (CUI, AUI, STR), (CUI, TUI), (CUI, STR, DEF, SAB)
• Linguistic Knowledge – WordNet
  • Synonyms/antonyms
  • Syntactic variations of the same term
Definitions for shortness of breath:
• A disorder characterized by an uncomfortable sensation of difficulty breathing
• Difficult or labored breathing
• Labored or difficult breathing associated with a variety of disorders, indicating inadequate ventilation or low blood oxygen or a subjective experience of breathing discomfort
16
Knowledge Modeling
• Each entity has multiple definitions
• Each definition is processed to create an entity indicator
• The representative power of each term (r1, r2, …) is calculated with a measure inspired by TF-IDF
• A collection of entity indicators constitutes the entity model
[Figure: definition1, definition2, definition3 are turned into Entity Indicator1, Entity Indicator2, Entity Indicator3, which together form the Entity Model]
Definition | Entity Indicator
A disorder characterized by an uncomfortable sensation of difficulty breathing | (uncomfortable, r1), (sensation, r2), (difficulty, r3), (breathing, r4)
Difficult or labored breathing | (difficult, r5), (labored, r6), (breathing, r4)
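A minimal sketch of how entity indicators could be built from definitions. The slide only says the representative power is "inspired by TF-IDF", so the exact weight below (term count within the entity's definitions times a log-scaled inverse entity frequency), the stop-word list, and the names `tokenize` and `build_entity_models` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; it drops generic words so the first definition yields
# the four terms shown on the slide.
STOP_WORDS = {"a", "an", "the", "of", "or", "by", "and", "with", "in",
              "disorder", "characterized"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_entity_models(definitions):
    """definitions: entity name -> list of definition strings.

    Returns entity name -> list of indicators, one per definition, where each
    indicator is a list of (term, representative_power) pairs.
    """
    n_entities = len(definitions)
    # Count in how many entities' definitions each term appears.
    entity_freq = Counter()
    for defs in definitions.values():
        terms = set()
        for d in defs:
            terms.update(tokenize(d))
        entity_freq.update(terms)

    models = {}
    for entity, defs in definitions.items():
        # Term frequency over all definitions of this entity.
        tf = Counter(t for d in defs for t in tokenize(d))
        indicators = []
        for d in defs:
            ordered_terms = dict.fromkeys(tokenize(d))  # keep order, drop repeats
            indicators.append(
                [(t, tf[t] * math.log(1 + n_entities / entity_freq[t]))
                 for t in ordered_terms]
            )
        models[entity] = indicators
    return models
```

With this weighting, a term that recurs across an entity's definitions (such as "breathing") receives a higher weight than one that appears once, and a term shared by many entities is discounted.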
17
Detecting Sentences with Implicit Entities
• Sentences that contain a representative term of an entity but not the entity name may contain an implicit mention of the entity.
“However, Mr. Smith is comfortably breathing in room air.”
Candidate sentence for shortness of breath
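The detection rule above can be sketched as a simple filter. This is an illustration under assumptions: the function name `is_candidate`, the substring test for the explicit entity name, and the example term set are hypothetical, not taken from the dissertation.

```python
import re


def is_candidate(sentence, entity_name, representative_terms):
    """Return True when the sentence has a representative term but no explicit entity name."""
    text = sentence.lower()
    if entity_name.lower() in text:
        return False  # explicit mention, so not a candidate for implicit extraction
    tokens = set(re.findall(r"[a-z]+", text))
    return bool(tokens & set(representative_terms))
```

For the slide's example, the sentence about breathing comfortably in room air shares the term "breathing" with the shortness-of-breath model and never names the entity, so it becomes a candidate.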
18
Information Extraction – Entity Linking
• The similarity between the entity model and the pruned candidate sentence is measured to annotate the sentence with a positive or negative label
• We developed a semantic similarity measure that accounts for synonyms and antonyms
[Figure: the Candidate Sentence is compared with Indicator1, Indicator2, and Indicator3 of the Entity Model, giving sim1, sim2, and sim3]
19
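Information Extraction – Entity Linking
[Figure: candidate-sentence terms ct1–ct4 are compared with entity-indicator terms et5–et7 using WordNet]
• Term-level similarity: if an antonym is found then −1, else the maximum similarity
• Sentence-level similarity: $\frac{\sum_{et} sim_{et} \cdot rp_{et}}{\sum_{et} rp_{et}}$, where $rp_{et}$ is the representative power of entity-indicator term $et$
• Similarity > t1: Positive Annotation
• Similarity < t2: Negative Annotation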
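A sketch of the weighted similarity and thresholding described on this slide. To stay self-contained it uses a tiny hand-written lexicon in place of WordNet. The lexicon entries, the threshold values, the function names, and the rule for combining several indicators (taking the similarity with the largest magnitude) are assumptions for illustration only.

```python
# Toy stand-in for WordNet lookups (hypothetical entries).
ANTONYMS = {frozenset(("labored", "unlabored")), frozenset(("difficult", "easy"))}
SIMILARITY = {frozenset(("breathing", "respirations")): 0.8}


def term_similarity(et, sentence_terms):
    """Return -1 if the sentence contains an antonym of et, else the maximum similarity."""
    if any(frozenset((et, ct)) in ANTONYMS for ct in sentence_terms):
        return -1.0
    best = 0.0
    for ct in sentence_terms:
        s = 1.0 if ct == et else SIMILARITY.get(frozenset((et, ct)), 0.0)
        best = max(best, s)
    return best


def indicator_similarity(indicator, sentence_terms):
    """Weighted average of term similarities, weighted by representative power."""
    total = sum(rp for _, rp in indicator)
    if total == 0:
        return 0.0
    return sum(term_similarity(et, sentence_terms) * rp for et, rp in indicator) / total


def annotate(entity_model, sentence_terms, t1=0.3, t2=-0.2):
    """Label a candidate sentence as 'positive', 'negative', or None (no annotation)."""
    sims = [indicator_similarity(ind, sentence_terms) for ind in entity_model]
    if not sims:
        return None
    strongest = max(sims, key=abs)  # assumption: the strongest signal decides
    if strongest > t1:
        return "positive"
    if strongest < t2:
        return "negative"
    return None
```

The antonym rule is what lets a phrase such as "respirations were unlabored" push the score below zero and yield a negated mention instead of a missed one.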
20
Evaluation
• Re-annotated the SemEval-2014 task 7 dataset for implicit entities
• Entities are selected considering the frequency of appearance and with expert feedback
• 857 sentences selected for 8 entities
• Annotated by three domain experts
• Annotation agreement 0.58
Entity | Positive Annotations | Negative Annotations
Shortness of Breath | 93 | 94
Edema | 115 | 35
Syncope | 96 | 92
Cholecystitis | 78 | 36
Gastrointestinal Gas | 18 | 14
Colitis | 12 | 11
Cellulitis | 8 | 2
Fasciitis | 7 | 3
21
Annotation Performance
Algorithm | Positive Precision | Positive Recall | Positive F1 | Negative Precision | Negative Recall | Negative F1
Our | 0.66 | 0.87 | 0.75 | 0.73 | 0.73 | 0.73
MCS | 0.50 | 0.93 | 0.65 | 0.31 | 0.76 | 0.44
SVM | 0.73 | 0.82 | 0.77 | 0.66 | 0.67 | 0.67
Adding similarity value as a feature for the supervised algorithm:
SVM+MCS | 0.73 | 0.82 | 0.77 | 0.66 | 0.66 | 0.66
SVM+Our | 0.77 | 0.85 | 0.81 | 0.72 | 0.75 | 0.73
• Baselines
  • MCS algorithm (Mihalcea 2006)
  • SVM (trained on n-grams)
• Our algorithm outperforms the selected baselines in the negative category.
• SVM is able to leverage the supervision to beat our algorithm in the positive category.
22
Similarity as a Feature to Supervised Algorithm
• Added similarity value of unsupervised algorithms as a feature to the SVM.
[Figure: two panels, Positive Annotations and Negative Annotations]
23
Annotation Performance – A Study with the Confidence
• Each annotation has a confidence score ranging from 1 to 5
• Low confidence reflects incomplete or ambiguous information
• Annotation performance increases as the confidence increases
• The negative class shows a significant increase
25
Tweets with Implicit Entities
• Use diverse characteristics of the entity
  – “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying”
  – “ISRO sends probe to Mars for less money than it takes Hollywood to send a woman to space.”
  – “oh yeah there is that new space movie coming out that looks terrifying i am going to go see it”
• Use time-sensitive phrases
  – Movies: Gravity, Furious 7, The Martian
  – Timeline: Fall 2013, April 2014, Fall 2015
  – Phrases: “space movie”, “fastest movie to earn $1 billion”, “Paul Walker's last movie”
Credit: http://bit.ly/2bkePJ6
26
Tweets with Implicit Entities
• Use diverse characteristics of the entity
  – “… Richard Linklater movie …”
  – “… Ellar Coltrane on his 12-year movie …”
  – “… 12-year long movie shoot …”
  – “… Mason Evan's childhood movie …”
• Use time-sensitive phrases
  – Movies: Gravity, Furious 7, The Martian
  – Timeline: Fall 2013, April 2014, Fall 2015
  – Phrases: “space movie”, “fastest movie to earn $1 billion”, “Paul Walker's last movie”
Credit: http://bit.ly/2bk8xdp
27
Knowledge Acquisition
• Acquiring factual knowledge
  • Source – DBpedia
  • Not all factual knowledge is important – a movie has ‘starring’ and ‘director’ as well as ‘billed’ and ‘license’
  • Rank the relationships based on joint probability with the entity type
  • Values of the top-k relationships and the value of rdfs:comment are obtained
• Acquiring contextual knowledge
  • Source – contemporary tweets
  • We collect 1000 tweets with explicit mentions of the entity
• Number of views for the entity’s Wikipedia page within the last t days
28
Knowledge Acquisition
[Figure: factual knowledge and contemporary tweets (cleaned, then split into n-grams) are matched against Wikipedia page titles and anchor texts to generate semantic cues]
• Need to extract meaningful phrases from acquired knowledge
• Meaningful phrases = Wikipedia titles + anchor texts
  • Matching n-grams are added to semantic cues
  • Non-matching n-grams are added to semantic cues after removing stop words
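The cue-generation rule on this slide can be sketched as follows. The n-gram length limit, the whitespace tokenizer, the small stop-word list, and the names `ngrams` and `semantic_cues` are assumptions; `meaningful_phrases` stands in for the set of Wikipedia titles and anchor texts.

```python
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "is", "that", "it"}  # illustrative only


def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of length 1..max_n as a list of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tokens[i:i + n]


def semantic_cues(text, meaningful_phrases, max_n=3):
    """Collect semantic cues from a piece of acquired knowledge (tweet text or a fact value)."""
    tokens = text.lower().split()
    cues = set()
    for gram in ngrams(tokens, max_n):
        phrase = " ".join(gram)
        if phrase in meaningful_phrases:
            cues.add(phrase)  # matching n-gram is kept as-is
        else:
            kept = [t for t in gram if t not in STOP_WORDS]
            if kept:
                cues.add(" ".join(kept))  # non-matching n-gram, stop words removed
    return cues
```

Matching against known phrases first keeps multi-word cues such as "lost in space" intact, where naive stop-word removal would split them.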
29
Knowledge Modeling – Entity Model Network
[Figure: entity model network with entity nodes Gravity, Interstellar, and The Martian linked to cue nodes Sandra Bullock, Alfonso Cuarón, Christopher Nolan, Matt Damon, Mars orbiter mission, Woman in space, and astronaut; legend: Entity, Factual Knowledge, Contextual Knowledge]
• A property graph reflecting the topical relationships between entities
• $specificity_{c_j} = \frac{N}{N_{c_j}}$, where $N$ is the total number of entities and $N_{c_j}$ is the number of entities adjacent to cue $c_j$
• frequency = total number of times the phrase appears in tweets
• number of Wikipedia views
30
Detecting Tweets with Implicit Entities
• Tweets are filtered with keywords – movie, film, book, novel
• A simple annotation technique, dictionary matching, is applied
• Tweets that are not annotated with an entity of the types we are looking for are considered to have implicit entity mentions
[Figure: Keywords and Entity Dictionary feed Annotating Tweets]
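A sketch of this filter, assuming a plain case-insensitive substring match against the entity dictionary. The function name `has_implicit_candidate` and the matching strategy are hypothetical; the keyword list comes from the slide.

```python
import re

KEYWORDS = {"movie", "film", "book", "novel"}


def has_implicit_candidate(tweet, entity_dictionary):
    """True when the tweet has a type keyword but no dictionary entity of that type."""
    text = tweet.lower()
    tokens = set(re.findall(r"[a-z]+", text))
    if not tokens & KEYWORDS:
        return False  # tweet is not about a movie or book at all
    # Dictionary matching: any explicit entity name rules the tweet out.
    return not any(name.lower() in text for name in entity_dictionary)
```

Tweets that pass this filter are only candidates: some of them mention no entity at all, which is what the NULL annotation in the evaluation dataset captures.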
31
Information Extraction – Entity Linking
• Two-step process
• Step 1: Candidate selection and filtering
  • Objective - prune the search space to reduce the number of entities from the EMN to be considered in the disambiguation step
• Step 2: Disambiguation
  • Objective - sort the selected candidate entities to place the implicitly mentioned entity in the top position
32
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: entity model network with entity nodes m1–m8 and cue nodes c1–c9; legend: Entity, Factual Knowledge, Contextual Knowledge]
33
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: the same network (entities m1–m8, cues c1–c9) with cues c2, c5, c7, and c8 matched from the tweet; legend: Entity, Factual Knowledge, Contextual Knowledge]
34
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: the matched cues c2, c5, c7, and c8 shown with the entities m1–m8 connected to them; legend: Entity, Factual Knowledge, Contextual Knowledge]
35
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
$score_{m_i} = \sum_{c_j \in \mathbb{C}} specificity(c_j) \cdot frequency(c_j, m_i)$, where $\mathbb{C}$ is the set of matching cues
[Figure: entities connected to the matched cues c2, c5, c7, and c8 are scored, and m2, m4, m6, m7, and m3 are selected as candidates; legend: Entity, Factual Knowledge, Contextual Knowledge]
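A sketch of the candidate score on this slide: each entity adjacent to a matched cue accumulates specificity times frequency. It assumes specificity is the ratio of the total number of entities to the number of entities adjacent to the cue. The data layout (`cue_edges` as a nested dict), the function name `select_candidates`, and the tie-breaking by name are illustrative choices; the default of 25 candidates follows the top-25 cut-off mentioned in the evaluation.

```python
from collections import defaultdict


def select_candidates(matched_cues, cue_edges, n_entities, top_k=25):
    """Rank entities by the summed specificity * frequency over matched cues.

    cue_edges: cue -> {entity: frequency of the cue for that entity}
    Returns a list of (entity, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for cue in matched_cues:
        adjacent = cue_edges.get(cue, {})
        if not adjacent:
            continue  # cue is not in the network
        specificity = n_entities / len(adjacent)  # rarer cues count more
        for entity, freq in adjacent.items():
            scores[entity] += specificity * freq
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

A cue attached to a single entity (for example a phrase seen only with one movie) dominates the score, while a generic cue shared by many entities contributes little to each.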
36
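Entity Linking - Disambiguation
• Formulated as a ranking problem
• SVMrank to rank candidates
  • Similarity between the candidate entity and the tweet
  • Temporal salience of the candidate entity: $\frac{temporal\ salience_{e_i}}{\sum_{e \in E_c} temporal\ salience_{e}}$, where $E_c$ is the selected candidate set
[Figure: candidates m2, m6, m4, m3, and m7, each described by features x1, x2, x3, …, xn, are ranked and the top-ranked candidate is the winner]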
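The temporal-salience feature can be sketched as a normalization over the selected candidates. The function name `normalized_salience` and the use of raw page-view counts as input are assumptions; the ranking model itself (SVMrank) is not reproduced here.

```python
def normalized_salience(page_views, candidates):
    """Divide each candidate's page views by the total over the candidate set."""
    total = sum(page_views.get(c, 0) for c in candidates)
    if total == 0:
        return {c: 0.0 for c in candidates}  # no view data for any candidate
    return {c: page_views.get(c, 0) / total for c in candidates}
```

Normalizing within the candidate set makes the feature comparable across tweets, since only the relative popularity of the competing candidates matters for ranking.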
37
Evaluation Dataset
Entity Type | Annotation | Tweets | Entities
Movie | Explicit | 391 | 107
Movie | Implicit | 207 | 54
Movie | NULL | 117 | 0
Book | Explicit | 200 | 24
Book | Implicit | 190 | 53
Book | NULL | 70 | 0
• Tweets were collected in August 2014 using keywords
• Tweets were manually annotated with the DBpedia URLs of entities
• Tweets annotated with NULL have neither an explicit nor an implicit mention of an entity
38
Entity Model Network Creation
• 15,000 tweets for movies and books in July 2014
• 617 movies and 102 books
• Recent 1000 tweets per entity to build its contextual knowledge
• May 2014 version of DBpedia used to extract factual knowledge
• Temporal salience is obtained for July 2014
[Figure: entity model network with entity nodes m1–m7 and cue nodes c1–c9; legend: Entity, Factual Knowledge, Contextual Knowledge]
39
Evaluation - Implicit Entity Linking
• How many tweets had the correct entity within the selected candidate set (top-25)?
• How many entities were correctly linked by our disambiguation approach?
Entity Type | Candidate Selection Recall | Disambiguation Accuracy
Movie | 90.33% | 60.97%
Book | 94.73% | 61.05%
• Importance of contextual knowledge
Step | Entity Type | Without Contextual Knowledge | With Contextual Knowledge
Candidate Selection Recall | Movie | 77.29% | 90.33%
Candidate Selection Recall | Book | 76.84% | 94.73%
Disambiguation Accuracy | Movie | 51.7% | 60.97%
Disambiguation Accuracy | Book | 50.0% | 61.05%
40
Qualitative Error Analysis
Error | Tweet | Entity
Lack of contextual knowledge | ‘That Movie Where Shailene Woodley Has Her First Nude Scene? The Trailer Is RIGHT HERE!: No one can say Shailene Woodley isn't brave!’ | White Bird in a Blizzard
Novel entities | ‘”hey, what's wrawng widdis goose?" RT @TIME: Mark Wahlberg could be starring in a movie about the BP oil spill http://ti.me/1oZh55V' | Deepwater Horizon
Cold start of entities | ‘Video: George R.R. Martin's Children's Book Gets Re-release http://bit.ly/1qNNH5r’ | The Ice Dragon
Multiple implicit entity mentions | ‘That moment when you realize that hazel grace and Augustus are brother and sister in one movie and in love battling cancer in another’ | Divergent, The Fault in Our Stars
42
Implicit Relationships in Clinical Narratives
[Figure: graph of diseases (atrial fibrillation, hypertension, diabetes), symptoms (chest pain, weight gain, headache), and medications (lisinopril, warfarin, insulin, atenolol), with edge labels is_treated_with and has_symptom]
43
Implicit Relationships in Clinical Narratives
• Implicit relationships:
  • Exist between symptoms, disorders, medications, and procedures
  • Can be established by leveraging domain knowledge
• The existing knowledge bases fall short in eliciting relationships
  • Data + Knowledge can help to elicit such implicit relationships efficiently
44
A Scenario
• Disorders: Atrial fibrillation, Hypertension, Diabetes
• Symptoms: Fatigue, Syncope, Weight loss, Chest pain, Discomfort in chest, Dizzy, Shortness of Breath, Nausea, Vomiting, Headache, Cough, Weight gain
45
A Scenario
Knowledge base
• Disorders: Atrial fibrillation, Hypertension, Diabetes
• Symptoms: Fatigue, Syncope, Weight loss, Chest pain, Discomfort in chest, Dizzy, Shortness of Breath, Nausea, Vomiting, Headache, Cough, Weight gain
Clinical document
• Observed Disorders: Atrial fibrillation, Hypertension, Diabetes
• Observed Symptoms: Chest pain, Weight gain, Discomfort in chest, Cough, Headache, Edema, Shortness of Breath
The knowledge base does not know about edema, so edema could be a symptom of any disorder in the document.
46
Knowledge Acquisition
• Hierarchical knowledge and non-hierarchical knowledge
Hierarchical Knowledge: retrieved from UMLS
MRHIER
CUI | AUI | PAUI | PTR
C0013404 | A0052186 | A0111363 | A0434168.A2367943. …
C0013604 | A0052723 | A0135504 | A0434168.A2367943
MRCONSO
CUI | AUI | SAB | STR
C0013404 | A0052186 | MSH | Shortness of breath
C0013604 | A0052723 | MSH | Edema
Non-hierarchical Knowledge: extracted from Web resources, plus feedback from a domain expert
• www.nlm.nih.gov
• www.en.wikipedia.org
• www.webmd.com
• www.mayoclinic.com
• www.clevelandclinic.org
• www.healthline.org
47
Knowledge Modeling
[Figure: classes of disorders (Hypertension, Diastolic Hypertension, Pulmonary Hypertension, Renal Hypertension, Episodic Pulmonary Hypertension, Solitary Pulmonary Hypertension) and classes of symptoms (Breathing Problems, Shortness of Breath, Asthma) organized with rdfs:subclassOf; instances of disorders (Hypertension) and instances of symptoms (Shortness of Breath) attached with rdf:type and related by is_symptom_of]
48
Detecting Unexplained Symptoms
• Clinical documents were semantically annotated for entities using cTAKES
• Known relationships are populated
• Unexplained symptoms were detected
[Figure: Modeled Knowledge]
Credit:http://bit.ly/2aMWVAd
49
Information Extraction – Unknown Relationships
• A naïve method would assume a relationship between the unexplained symptom and all disorders in the clinical narrative
• Can we leverage the knowledge we have about the symptom to find the most plausible disorders?
• Intuition: a symptom is most likely to be shared by similar disorders
50
Information Extraction – Unknown Relationships
1. All co-occurring disorders are candidates
[Figure: unexplained symptom S with co-occurring disorders D1–D5]
51
Information Extraction – Unknown Relationships
2. Find known disorders of the symptom
[Figure: symptom S with co-occurring disorders D1–D5 and its known disorders D6 and D7]
52
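Information Extraction – Unknown Relationships
3. Collect more knowledge about known relationships
[Figure: symptom S with co-occurring disorders D1–D5, known disorders D6 and D7, and the disorders collected around them: D7, D8, D2, D10, D11, D12, D4, D14]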
53
Information Extraction – Unknown Relationships
4. Compare co-occurring disorders with collected knowledge
[Figure: co-occurring disorders D1–D5 compared with the collected disorders D7, D8, D2, D10, D11, D12, D4, D14]
54
Information Extraction – Unknown Relationships
5. Eliminate non-matching candidate disorders
[Figure: symptom S with the remaining disorders D2 and D4]
We are left with the most plausible disorders for the unexplained symptom. If this scenario occurs frequently, it increases the confidence in this relationship.
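The five steps above can be sketched as set operations. This is a simplified illustration: the slides do not state how "more knowledge about known relationships" is collected, so the `similar_disorders` mapping (for example, hierarchical neighbours of each known disorder) is an assumption, as are the function names and the toy data.

```python
from collections import Counter


def plausible_disorders(symptom, cooccurring, known_disorders_of, similar_disorders):
    """Return the co-occurring disorders that most plausibly explain the symptom."""
    candidates = set(cooccurring)                     # step 1: all co-occurring disorders
    known = known_disorders_of.get(symptom, set())    # step 2: known disorders of the symptom
    collected = set()                                 # step 3: knowledge around known disorders
    for disorder in known:
        collected |= similar_disorders.get(disorder, set())
    return candidates & collected                     # steps 4-5: keep only matching candidates


def relationship_support(documents, symptom, known_disorders_of, similar_disorders):
    """Count how often each (symptom, disorder) suggestion recurs across documents."""
    support = Counter()
    for cooccurring in documents:
        for d in plausible_disorders(symptom, cooccurring, known_disorders_of, similar_disorders):
            support[(symptom, d)] += 1
    return support
```

Counting support across documents reflects the slide's remark that a frequently recurring suggestion deserves more confidence before it is put to a domain expert.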
55
Evaluation
• A corpus of 1,500 electronic medical records was used
• The records were annotated with cTAKES, and the most frequent entities were selected
• UMLS semantic types were used to categorize disorders and symptoms
• Initial knowledge base - 86 disorders, 42 symptoms, 255 disorder-symptom relationships
56
Evaluation – Relationship Prediction
• There were 29 distinct unexplained symptoms
• Precision of the questions generated
  • 1st iteration - 105 correct from 142 (73.94%)
  • 2nd iteration - 20 correct from 29 (68.96%)
  • 3rd iteration - 4 correct from 9 (44.44%)
Top 5 unexplained symptoms
Symptom | Number of unexplained instances
Edema | 910
Syncope | 336
Systolic Murmur | 168
Tachycardia | 143
Angina | 136
Top 5 co-occurring disorders with edema
Disorder | Number of co-occurrences
Hypertension | 647
Hyperlipidemia | 641
Claudication | 454
Coronary atherosclerosis | 395
Coronary artery disease | 242
57
Evaluation – Increment in Explainability
Knowledge base | Number of unexplained relationships | Increment in explainability
Initial knowledge base | 2251 | 0%
After 1st iteration | 878 | 60.99%
After 2nd iteration | 806 | 64.19%
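The slide does not define the increment in explainability. A reading that fits the table, offered here as an assumption, is the relative reduction in unexplained relationships with respect to the initial knowledge base, where $U_0$ is the initial count and $U_k$ the count after iteration $k$ (both symbols are introduced only for this illustration):

```latex
\text{increment}_k = \frac{U_0 - U_k}{U_0} \times 100\%
\qquad
\text{increment}_1 = \frac{2251 - 878}{2251} = \frac{1373}{2251} \approx 61.0\%
\qquad
\text{increment}_2 = \frac{2251 - 806}{2251} = \frac{1445}{2251} \approx 64.19\%
```

The first value matches the table's 60.99% if the percentage is truncated to two decimals instead of rounded.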
58
Summary
• Implicit information occurs frequently in text, and ignoring it would adversely affect downstream applications.
• Linguistic and world knowledge play an important role in decoding implicit information.
• This dissertation demonstrated the characteristics of implicit information and developed a solution to capture factual implicit constructs.
[Figure: framework stages Knowledge Acquisition, Knowledge Modeling, Detecting Implicit Information, and Information Extraction, annotated with the labels UMLS; Taxonomical, Definitional; Non-taxonomical, Associational; Representative terms; Domain Semantics; Semi-supervised, Supervised, Unsupervised]
59
Contributions
• Identified and demonstrated the value of implicit information.
• Studied the characteristics of the implicit information manifestation.
• Demonstrated the value of knowledge in extracting factual implicit information.
  - Linguistic
  - Domain
  - Contextual
• Developed a framework for factual implicit information extraction.
• Demonstrated the usage of the framework to solve three implicit information extraction problems.
60
Graduate [email protected]
Journal Publications:
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, Semantics Driven Approach for Knowledge Acquisition from EMRs, IEEE Journal of Biomedical and Health Informatics.
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu Chen and Amit Sheth, 'I just wanted to tell you that loperamide WILL WORK': A Web-Based Study of Extra-Medical Use of Loperamide.
Conference Publications:
• Sujan Perera, Pablo Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad Thirunarayan, Implicit Entity Linking in Tweets, ESWC 2016
• Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher Heid, Greg Mott, Implicit Entity Recognition in Clinical Documents, *SEM 2015
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in the Healthcare, BIBM 2012
• Menasha Thilakaratne, Ruvan Weerasinghe, Sujan Perera, Knowledge-driven Approach to Predict Personality Traits by Leveraging Social Media Data, WI 2016
Workshop and Posters:
• Sujan Perera, Amit Sheth, Krishnaprasad Thirunarayan, Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help, DARE 2013
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu Chen, Amit Sheth, A Web-Based Study of Self-Treatment of Opioid Withdrawal Symptoms with Loperamide, CPDD 2012
Internships:
• ezDI Summer 2012
• IBM Watson Summer 2014 and 2015
Awards and grants:
• George Thomas Graduate Fellowship
• NSF travel grants: BIBM and ICHI
PC Committee:
• DARE (2013), EKAW (2014, 2016), ISWC 2015, IJCAI 2016
External Reviewer:
• ISWC, ESWC, IJSWIS, IEEE Intelligent Systems, Applied Ontology, ODBASE
Proposal Contributions:
• eDrugTrends (NIH R01)
• Healthcare Outcome Prediction (NSF-SCH)
Mentoring:
• Adarsh Alex (MSc)
• Menasha Thilakaratne (BSc)
61
Thank You
Mentors Collaborators
62
Coffee Mates and Colleagues
Thank You
Funding
• ezDI
• George Thomas Fellowship
• NSF: CNS 1513721 Context-Aware Harassment Detection on Social Media