SEASR and UIMA National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Mike Haberman [email protected]



Transcript of SEASR and UIMA

Page 1: SEASR and UIMA


National Center for Supercomputing Applications University of Illinois at Urbana-Champaign

Mike Haberman [email protected]

Page 2: SEASR and UIMA


Unstructured Information Management Applications

Page 3: SEASR and UIMA



Page 4: SEASR and UIMA

UIMA + P.O.S. tagging

Four Analysis Engines analyze a document and record its POS information:

OpenNLP Tokenizer

OpenNLP PosTagger

OpenNLP SentenceDetector

POSWriter

Serialization of the UIMA CAS

Page 5: SEASR and UIMA

UIMA Structured data

•  POSWriter is a CAS Consumer

–  Extracted data from the CAS

–  Ready for import into SEASR

Page 6: SEASR and UIMA

UIMA + P.O.S. tagging: step 1

Page 7: SEASR and UIMA

UIMA + P.O.S. tagging: step 2

Page 8: SEASR and UIMA

UIMA + P.O.S. tagging: step 3

Page 9: SEASR and UIMA

UIMA + P.O.S. tagging: step 4

Page 10: SEASR and UIMA

UIMA Structured data

•  Two SEASR examples using UIMA POS data

–  Frequent patterns (rule associations) on nouns (fpgrowth)

–  Sentiment analysis on adjectives
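
As a rough illustration of how the POS data feeds both examples, the sketch below filters tagged tokens into nouns and adjectives. The record layout `(sentence_id, token, pos_tag)` and the sample rows are hypothetical stand-ins for POSWriter output, and the Penn Treebank tag prefixes (`NN`, `JJ`) are an assumption about the tag set in use.

```python
# Hypothetical (sentence_id, token, pos_tag) records standing in for POSWriter output.
records = [
    (1, "Tom", "NNP"),
    (1, "gave", "VBD"),
    (1, "the", "DT"),
    (1, "old", "JJ"),
    (1, "lady", "NN"),
    (1, "a", "DT"),
    (1, "bitter", "JJ"),
    (1, "answer", "NN"),
]


def select_by_tag_prefix(records, prefix):
    """Return lowercased tokens whose POS tag starts with the given prefix."""
    return [token.lower() for _, token, tag in records if tag.startswith(prefix)]


# Assumed Penn Treebank prefixes: NN* for nouns, JJ* for adjectives.
nouns = select_by_tag_prefix(records, "NN")
adjectives = select_by_tag_prefix(records, "JJ")

print(nouns)
print(adjectives)
```

The nouns would go to the frequent-pattern flow and the adjectives to the sentiment flow.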

Page 11: SEASR and UIMA

UIMA to SEASR: Experiment I

•  Finding patterns

Page 12: SEASR and UIMA

SEASR + UIMA: Frequent Patterns

Frequent Pattern Analysis on nouns

•  Goal:

–  Discover a cast of characters within the text

–  Discover nouns that frequently occur together

•  character relationships

Page 13: SEASR and UIMA

Frequent Patterns: nouns

•  Use of item sets in fpgrowth

•  What’s new:

–  handling sparse item sets

TransactionId  ItemA  ItemB  ItemC
1              0      1      1
2              1      1      1
3              1      0      1
4              1      0      0
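
To make the sparse representation concrete, here is a minimal sketch that converts the 0/1 rows from this slide into item sets and computes support. Only `ItemA` is legible in the slide's header; the names `ItemB` and `ItemC` are assumed, and the `support` helper is illustrative rather than part of fpgrowth itself.

```python
# Column names: only ItemA appears in the slide; ItemB and ItemC are assumed.
items = ["ItemA", "ItemB", "ItemC"]

# Dense 0/1 rows keyed by transaction id, as shown on the slide.
dense = {
    1: [0, 1, 1],
    2: [1, 1, 1],
    3: [1, 0, 1],
    4: [1, 0, 0],
}

# Sparse form: store only the items that are present in each transaction.
sparse = {
    tid: {item for item, flag in zip(items, row) if flag}
    for tid, row in dense.items()
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions.values() if itemset <= t)
    return hits / len(transactions)


print(sparse[4])
print(support({"ItemA"}, sparse))
print(support({"ItemA", "ItemC"}, sparse))
```

With thousands of distinct nouns, a dense row per sentence would be almost entirely zeros, which is why the sparse form matters here.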


Page 14: SEASR and UIMA

Frequent Patterns: nouns

•  What’s new:

–  handling sparse item sets






Page 15: SEASR and UIMA

Frequent Patterns: nouns

SEASR Flow (similar to fpgrowth demo)

•  Reads UIMA’s CAS consumer output

–  url of the UIMA data source

•  Enter number of sentences to group

•  Enter support: 10%

Item sets:

{word=tom}

{word=answer}

{word=tom}

{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,word=pair,word=pride,word=heart,word=style,word=service,word=pair,word=stove-lids,word=moment,word=furniture}

{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}

{word=aunt,word=polly,word=moment,word=laugh}

{word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=can't,word=dog,word=tricks,word=goodness,word=days,word=body,word=dander,word=minute,word=lick,word=duty,word=boy,word=lord,word=truth,word=goodness,word=spare,word=rod,word=child,word=good,word=book,word=sin,word=suffering,word=old,word=scratch,word=laws-a-me,word=sister,word=boy,word=thing,word=heart,word=conscience,word=heart,word=breaks,word=well-a-well,word=man,word=woman,word=days,word=trouble,word=scripture,word=hookey,word=evening,word=southwestern,word=afternoon,word=saturdays,word=boys,word=holiday,word=work,word=anything,word=duty,word=ruination,word=child}
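
The two flow inputs (sentences per group, support) can be sketched as follows. This is a brute-force counter over small item sets, not the fpgrowth algorithm the flow uses; the function names are hypothetical, and the input is a selection of the shorter `{word=...}` lines shown on this slide.

```python
from collections import Counter
from itertools import combinations

# A few of the shorter per-sentence item sets from the slide.
lines = [
    "{word=tom}",
    "{word=answer}",
    "{word=tom}",
    "{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}",
    "{word=aunt,word=polly,word=moment,word=laugh}",
]


def parse_item_set(line):
    """Parse '{word=a,word=b}' into the set {'a', 'b'}."""
    body = line.strip().strip("{}")
    return {field.split("=", 1)[1] for field in body.split(",") if "=" in field}


def group_sentences(item_sets, group_size):
    """Merge each run of group_size consecutive sentences into one transaction."""
    return [
        set().union(*item_sets[i:i + group_size])
        for i in range(0, len(item_sets), group_size)
    ]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of item sets up to max_size that meet min_support."""
    counts = Counter()
    for transaction in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(transaction), size):
                counts[combo] += 1
    total = len(transactions)
    return {c: n / total for c, n in counts.items() if n / total >= min_support}


sentences = [parse_item_set(line) for line in lines]
transactions = group_sentences(sentences, group_size=2)
frequent = frequent_itemsets(transactions, min_support=0.5)
print(frequent)
```

On a full text, a larger group size yields fewer, bigger transactions, so more noun pairs clear a given support threshold such as the 10% used in the demo.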

Page 16: SEASR and UIMA

Frequent Patterns: visualization

Analysis of Tom Sawyer: 10-paragraph window, support set to 10%

Page 17: SEASR and UIMA

Frequent Patterns: nouns

•  Recap: SEASR flow information

•  The repository location is:


•  Reads UIMA’s CAS consumer output

–  Select file/url of the UIMA data source


•  Similar to fpgrowth demo

Page 18: SEASR and UIMA

UIMA + SEASR: Frequent Patterns

•  Extensions

–  Analysis for separate chapters

•  Discover new relationships that occur over small windows

–  Adjectives, Adverbs

•  Common, repeating word usage, phrases

–  Entity Extraction: Dates, Locations, Geo

Page 19: SEASR and UIMA

UIMA to SEASR: Experiment II

•  Sentiment Analysis

Page 20: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Classifying text based on its sentiment

–  Determining the attitude of a speaker or a writer

–  Determining whether a review is positive/negative

Page 21: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Ask: What emotion is being conveyed within a body of text?

–  Look at only adjectives (UIMA POS)

•  Lots of issues, challenges, and “but …” caveats

Page 22: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Need to Answer:

–  What emotions to track?

–  How to measure/classify an adjective to one of the selected emotions?

–  How to visualize the results?

Page 23: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Which emotions:




•  Parrott’s classification (2001)

–  six core emotions

–  Love, Joy, Surprise, Anger, Sadness, Fear

Page 24: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

Page 25: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  How to classify adjectives:

–  Lots of metrics we could use …

•  Lists of adjectives already classified


–  Need a “nearness” metric for missing adjectives

–  How about the thesaurus game?

Page 26: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Using only a thesaurus, find a path between two words

–  no antonyms

–  no colloquialisms or slang

Page 27: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  How to get from delightful to rainy?

['delightful', 'fair', 'balmy', 'moist', 'rainy']

•  sexy to joyless?

['sexy', 'provocative', 'blue', 'joyless']

•  bitter to lovable?

['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']

Page 28: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Use this game as a metric for measuring a given adjective to one of the six emotions.

•  Assume the longer the path, the “farther away” the two words are.

•  Address some of the issues
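
The thesaurus game maps naturally onto a shortest-path search. The sketch below uses breadth-first search over a toy directed synonym graph; the `SYNONYMS` edges are made up for illustration (they are not real thesaurus data), and `shortest_path` is a hypothetical helper name.

```python
from collections import deque

# Toy directed synonym graph; edges are illustrative, not real thesaurus data.
SYNONYMS = {
    "delightful": ["fair", "pleasant"],
    "fair": ["balmy", "just"],
    "balmy": ["moist", "mild"],
    "moist": ["rainy", "damp"],
    "pleasant": ["mild"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: shortest list of words from start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], []):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


path = shortest_path(SYNONYMS, "delightful", "rainy")
print(path)
print(len(path) - 1)  # number of links, used as the "distance" between the words
```

Because the toy graph is directed, `shortest_path(SYNONYMS, "rainy", "delightful")` finds no path, which is one reason path symmetry is worth measuring.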

Page 29: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  SynNet: a traversable graph of synonyms (adjectives)

Page 30: SEASR and UIMA

SynNet: rainy to pleasant

Page 31: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  SynNet Metrics

•  Common nodes

•  Path length

•  Symmetry: a->b->c vs. c->b->a

•  Link strength:

•  tangy->sweet

•  sweet->lovable

•  Use of slang or informal usage
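
Two of these metrics, common nodes and symmetry, can be sketched on a small graph. The `GRAPH` edges and the function names below are hypothetical and chosen only for illustration; link strength and slang handling are not covered.

```python
# Toy directed synonym graph; edges are illustrative only.
GRAPH = {
    "rainy": ["moist", "wet"],
    "moist": ["balmy", "rainy"],
    "pleasant": ["balmy", "fair"],
    "balmy": ["moist", "pleasant"],
    "fair": ["pleasant"],
}


def reachable_within(graph, start, depth):
    """Map each word reachable from start in at most depth links to its distance."""
    distances = {start: 0}
    frontier = [start]
    for d in range(1, depth + 1):
        next_frontier = []
        for word in frontier:
            for neighbor in graph.get(word, []):
                if neighbor not in distances:
                    distances[neighbor] = d
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return distances


def common_nodes(graph, a, b, depth):
    """Words reachable from both a and b, with their depth from each side."""
    from_a = reachable_within(graph, a, depth)
    from_b = reachable_within(graph, b, depth)
    return {w: (from_a[w], from_b[w]) for w in from_a.keys() & from_b.keys()}


def is_symmetric(graph, path):
    """True if every link in path also exists in the reverse direction."""
    return all(a in graph.get(b, []) for a, b in zip(path, path[1:]))


shared = common_nodes(GRAPH, "rainy", "pleasant", depth=2)
print(shared)
print(is_symmetric(GRAPH, ["rainy", "moist", "balmy", "pleasant"]))
print(is_symmetric(GRAPH, ["rainy", "wet"]))
```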

Page 32: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Common Nodes

•  Depth of the common nodes

Page 33: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Symmetry of path in common nodes

Page 34: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Find the shortest path between adjective and each emotion:

•  ['delightful', 'beatific', 'joyful']

•  ['delightful', 'ineffable', 'unspeakable', 'fearful']

•  Pick the emotion with shortest path length

•  tie breaking procedures
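
A minimal sketch of this labeling step follows. The graph contains only the two paths shown on this slide. The target adjective chosen for each emotion (`EMOTION_WORDS`) and the tie-breaking rule (first emotion in listed order wins) are assumptions; the slides do not specify either.

```python
from collections import deque

# Assumed target adjective per emotion; the slides do not fix this mapping.
EMOTION_WORDS = {
    "love": "lovable",
    "joy": "joyful",
    "surprise": "surprising",
    "anger": "hateful",
    "sadness": "sorrowful",
    "fear": "fearful",
}

# Toy graph holding only the two example paths from the slide.
GRAPH = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
}


def path_length(graph, start, goal):
    """Number of links on the shortest path from start to goal, or None."""
    if start == goal:
        return 0
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        word, dist = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def classify(graph, adjective, emotion_words=EMOTION_WORDS):
    """Return (emotion, path length) for the nearest emotion, or None if unreachable."""
    best = None
    for emotion, target in emotion_words.items():
        length = path_length(graph, adjective, target)
        # Strict '<' keeps the earlier emotion on ties (assumed tie-break rule).
        if length is not None and (best is None or length < best[1]):
            best = (emotion, length)
    return best


print(classify(GRAPH, "delightful"))
```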

Page 35: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Not a perfect solution

–  still need context to get quality

•  Vain

–  ['vain', 'insignificant', 'contemptible', 'hateful']

–  ['vain', 'misleading', 'puzzling', 'surprising']

•  Animal

–  ['animal', 'sensual', 'pleasing', 'joyful']

–  ['animal', 'bestial', 'vile', 'hateful']

–  ['animal', 'gross', 'shocking', 'fearful']

–  ['animal', 'gross', 'grievous', 'sorrowful']

•  Negation

–  “My mother was not a hateful person.”

Page 36: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  A word about WordNet


•  English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)

Page 37: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Adjective islands

•  There is no path from delightful to happy

•  happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
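
Islands can be detected by grouping words into connected components. The sketch below treats every link as two-way and uses a toy graph: the `happy` neighbors are taken from the synset above, while the `delightful` edges and the function name `islands` are assumptions for illustration.

```python
# Toy graph: 'happy' links come from the synset on the slide; the rest is assumed.
GRAPH = {
    "happy": ["glad", "felicitous", "radiant"],
    "radiant": ["beaming"],
    "delightful": ["beatific"],
    "beatific": ["joyful"],
}


def islands(graph):
    """Group words into connected components, treating every link as two-way."""
    neighbors = {}
    for word, synonyms in graph.items():
        neighbors.setdefault(word, set())
        for synonym in synonyms:
            neighbors[word].add(synonym)
            neighbors.setdefault(synonym, set()).add(word)

    seen = set()
    components = []
    for word in neighbors:
        if word in seen:
            continue
        stack, component = [word], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        components.append(component)
    return components


components = islands(GRAPH)
happy_island = next(c for c in components if "happy" in c)
print(len(components))
print("delightful" in happy_island)
```

Two words in different components can never be connected by the thesaurus game, no matter how long the search runs.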

Page 38: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Process Overview

•  Extract the adjectives (UIMA POS analysis)

•  Read in adjectives (SEASR library)

•  Label each adjective (SynNet)

•  Summarize windows of adjectives

•  lots of experimentation here

•  Visualize the windows
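
One simple way to summarize windows is to count the emotion labels in each one. The sketch below uses fixed, non-overlapping windows and made-up labels; the slides note that this step is still under experimentation, so the window scheme and the `summarize_windows` name are assumptions.

```python
from collections import Counter


def summarize_windows(labels, window_size):
    """Count emotion labels in consecutive, non-overlapping windows."""
    return [
        Counter(labels[i:i + window_size])
        for i in range(0, len(labels), window_size)
    ]


# Hypothetical sequence of emotion labels, one per adjective in reading order.
labels = ["joy", "joy", "fear", "anger", "sadness", "sadness", "joy"]

summary = summarize_windows(labels, window_size=3)
dominant = [counts.most_common(1)[0][0] for counts in summary]

print(summary)
print(dominant)
```

The per-window counts form the sequential data that a visualization component could then plot along the length of the text.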

Page 39: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Visualization

•  New SEASR visualization component

•  Based on flare ActionScript Library


•  Still in development


Page 40: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

Page 41: SEASR and UIMA

UIMA + SEASR: Sentiment Analysis

•  Extensions

•  Adverbs, nouns, verbs

•  Analysis of metrics, etc

•  Goal and Relevancy

•  Two new components

•  SynNet

•  Flash-based visualization of sequential data