Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo...

65
Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel

Transcript of Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo...

Page 1: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Text Mining Webinar

The Textprocessing Extension

Rosaria Silipo and Kilian Thiel

Page 2: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

KNIME Text Mining Webinar 2

Agenda

• Text Mining Goals and Usage

• Enrichment & Preprocessing

• Data Types & Structures

• Visualization

• Topic Detection

• Sentiment Analysis

KNIME Copyright 2013

Page 3: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

KNIME Text Mining Webinar 3

Install TextProcessing Extension

KNIME:

www.knime.org

Install Textprocessing Extension

under KNIME Labs

KNIME Copyright 2013

Page 4: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Examples

Example Workflows available on the KNIME

public server.

KNIME Copyright 2013 KNIME Text Mining Webinar 4

Page 5: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Filtering,

Stemming, ...

5

Text Mining Workflow

Create

New

Document

Enrichment Transformation Preprocessing

Frequencies

Visualization Data Mining

Read or Create a

new Document

Add Tags (POS, Sentiment,

cities, domain

specific terms, ...)

Bow

TF abs, TF rel, ,

IDF, ...

Classification for

Topic Detection

Tag Cloud and Document

Properties Visualization

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 6: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

6

1 - Create a Document

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 7: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

7

Document

Encapsulates text, author,

title, source, category,

and type

New Data Types

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 8: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

From a Folder

8

The output is

a list of

Documents

?

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 9: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

From PUBMED

9

The output is a

list of Documents

KNIME Copyright 2013 KNIME Text Mining Webinar

Destination

Folder MUST be

EMPTY!

Page 10: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Strings to Document

10KNIME Copyright 2013 KNIME Text Mining Webinar

Page 11: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

From RSS Feeds

11

http://feeds.nytimes.com/nyt/rss/World Palladian Nodes

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 12: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

12

Reviews of Restaurants in Berlin from

TripAdvisor

Self-downloaded with RSS Feeder

The Data Set

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 13: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

13

2 - Enrichment

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 14: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

14

Document

Encapsulates text, author,

title, source, category,

and type

Term

Encapsulates a term

New Data Types

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 15: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

15

Enrichment (Tagging)

Enrichment nodes (mostly) change the

granularity of terms.

– Multiword detection, named entity recognition,

part of speech definition, …

– To each detected entity (term) a tag is added,

specifying its type.

– To avoid intersection of granularity the last

node dominates.

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 16: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

16

In case of intersections of granularity the last node overwrites.

Example: “The gene interleukin 6 interacts ....”

1.POS tagger: “The\DT gene\NN interleukin\NN 6\CD interacts \VBZ ”

2.NE tagger: “The\DT gene\NN interleukin 6\GENE interacts \VBZ ”

Tagger Conflict Resolution

POS

Tagger

NE

Tagger

Adds POS tags to

terms.

Adds NE tags to terms and overrides other

conflicting tags.

Overwrite!

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 17: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Tagging

17

No settingsLanguage +

POS Model

No settings

Abner Model

Named Entity

Dictionary Column

Tag Type

Tag Value

Named Entity

Tags can be set

as unmodifiable

Dictionary Column

Tag Type

Tag Value

Matching StrategyKNIME Copyright 2013 KNIME Text Mining Webinar

Page 18: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

18

Unmodifiable Named Entity Tags

Named Entities Tags attached

through enrichment nodes can

be set as unmodifiable

Unmodifiable Tags are not

affected by any preprocessing

nodes (stemming, filtering,

etc.)

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 19: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Workflow

Strings to Document Workflow + POS Tagger

19KNIME Copyright 2013 KNIME Text Mining Webinar

Page 20: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

20

3 - Transformation

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 21: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Data Types and Features

21

• Sentence

• Term

• Molecule Structure

• Document Vector

• String

• Document

• Tags

• Meta Info

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 22: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

22

Parsing / Tokenization

Parser nodes parse documents by applying

standard tokenization via OpenNLP

tokenizer.

Each token is a term consisting of a single

word.

Tags are applied to terms.

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 23: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

23

Parsing

Annotations label a

section part in the

document (like

“Abstract”, “Title”,

etc.)

Terms consist of

tokens and tags

Document

Sections:

Paragraphs:

Sentences:

Terms:

Token:

Tags:

se1 se2

p1 p2

s1 s2 s3

Abstract

Title

te1 te2 te3 te4 te5

to1 to2 tokens tokens tokens

ta1 tags tags tags tagsta2

to3

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 24: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

The Bag of Words

24KNIME Copyright 2013 KNIME Text Mining Webinar

List (Bag) of Terms (Words)

identified in Document

Each Term is

extracted with

Tags (NN, VB, …)

No Settings

required

Page 25: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Data and Sentence Extractor

26KNIME Copyright 2013 KNIME Text Mining Webinar

Page 26: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Conversions

27KNIME Copyright 2013 KNIME Text Mining Webinar

Molecular

Structure

Page 27: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Workflow

KNIME Copyright 2013 KNIME Text Mining Webinar 28

Page 28: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

29

4 - Preprocessing

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 29: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

30

Normal– Faster

– Preprocesses only terms of the term column

– Documents are not changed

Deep– Slower

– Preprocesses terms of the term column

– Terms in documents are changed as well

– Unchanged documents can be appended

Preprocessing Mode

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 30: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Filtering by Tags and Terms

KNIME Copyright 2013 KNIME Text Mining Webinar 31

biomedical named entity tags

Any tags

numbers, words <N chars,

modifiable terms

POS, chemical, and pharma tags

French Treebank (POS) tags

Stop word terms

RegEx terms

STTS tags

Remove Punctuation from document

Standard Named Entity Filter tags

Page 31: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Converting and Replacing

KNIME Copyright 2013 KNIME Text Mining Webinar 32

Groups rows by term value

Term case conversion

RegEx based term replacer

Term replacement with dictionary word

Introducing Hyphenation

(Liang’s algorithm)

Page 32: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Stemming

• Kuhlen Stemmer (English only)

• Porter (English only)

• Snowball (English, German, French, ...)

Stemmed term replaces original term!

KNIME Copyright 2013 KNIME Text Mining Webinar 33

convert

convertingconvert[]

Page 33: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Workflow

KNIME Copyright 2013 KNIME Text Mining Webinar 34

Page 34: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

35

5 - Frequencies

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 35: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Frequency Measures

• TF

• TF absolute = # occurr. of term t

• TF relative = # occurr. of term t/ # terms

• IDF = log(1 + # docs / # docs with term t)

• ICF = log(1 + # cat./ # cat. with term t)

• IDF * TF36KNIME Copyright 2013 KNIME Text Mining Webinar

Term Frequency

Inverse Document Frequency

Inverse Category Frequency

Page 36: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Frequency Filter

KNIME Copyright 2013 KNIME Text Mining Webinar 37

Frequency Column

Top K Rows

K

Min max

thresholds

Values between

min threshold and

max threshold

Page 37: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Workflow

KNIME Copyright 2013 KNIME Text Mining Webinar 40

Page 38: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

41

6 - Visualization

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 39: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Document Viewer

KNIME Copyright 2013 KNIME Text Mining Webinar 42

Double-click

opens the

documentSearch engines listed in KNIME->Textprecessing-

>Search Engine Preferences

Right-click word list

search engines

Page 40: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Document Viewer

KNIME Copyright 2013 KNIME Text Mining Webinar 43

Previous and

next document

Page 41: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Tag Cloud

KNIME Copyright 2013 KNIME Text Mining Webinar 44

Adj, verbs and nouns

as same word

Page 42: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Asian Restaurants

KNIME Copyright 2013 KNIME Text Mining Webinar 45

Page 43: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

German Food Restaurants

KNIME Copyright 2013 KNIME Text Mining Webinar 46

Page 44: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Fast Food Restaurants

KNIME Copyright 2013 KNIME Text Mining Webinar 47

Page 45: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Hiliting in Tag Clouds

KNIME Copyright 2013 KNIME Text Mining Webinar 48

vietnamis

Interactive

Table

Page 46: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Workflow

KNIME Copyright 2013 KNIME Text Mining Webinar 49

Page 47: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

50

7 - Topic Classification

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 48: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Document Vector

51KNIME Copyright 2013 KNIME Text Mining Webinar

Bitvector or

frequency

measureDocument Vector: Documents

represented in the terms space

Page 49: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Term Vector

52KNIME Copyright 2013 KNIME Text Mining Webinar

Term Vector: Terms represented

in the documents space

Bitvector or

frequency

measure

Page 50: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Topic Detection Goal

Possible Topics:

- Asian Restaurants

- German Food

- Fast Food

Pre-labeled data set available!

Target Topic is in Document Category.

KNIME Copyright 2013 KNIME Text Mining Webinar 53

Page 51: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Topic Detection

After the Document Vector Transformation,

topic detection becomes just another data

analytics problem.

KNIME Copyright 2013 KNIME Text Mining Webinar 54

80% for training set, 20% for test set.

Target = Category

Page 52: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Classification Sub-Workflow

KNIME Copyright 2013 KNIME Text Mining Webinar 55

Prepare input

data as word in

Document 1/0

Extract

Category

Training/

Testing

X-Validation

Wrong docs

Page 53: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Problem 1

Sometimes a pre-labeled data set is not

available.

1. Use a Clustering technique

2. Find a similar pre-labeled data sets that

you can adapt to the current problem

KNIME Copyright 2013 KNIME Text Mining Webinar 56

Page 54: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Problem 2

The vector generated by the Document

Vector node can be high dimensional.

To reduce the input space dimensionality you

can:

- Filter words by frequency

- Detect keywords and only use the most

important ones.

KNIME Copyright 2013 KNIME Text Mining Webinar 57

Page 55: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Keywords Extractor Nodes

KNIME Copyright 2013 KNIME Text Mining Webinar 58

From: "KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor" by Yukio Ohsawa.

From:"Keyword extraction from a single document using word co-occurrence statistical information" by Y.Matsuo and M. Ishizuka.

Max. # keywords

per doc

Page 56: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

59

8 - Sentiment Analysis

KNIME Copyright 2013 KNIME Text Mining Webinar

Page 57: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Sentiment Corpus

• MPQA Corpus

with negative

and positive

words

• Tag Words

according to

Corpus with a

Dictionary

Tagger Node

KNIME Copyright 2013 KNIME Text Mining Webinar 60

Page 58: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Dictionary Tagger Node

KNIME Copyright 2013 KNIME Text Mining Webinar 61

List of Words

from Corpus

Tag Value to attach to

Words in document

and in Corpus List

Tag Type to attach to

Words in document

and in Corpus List

Each Tag Type has a

list of possible Tag

Values

NE = Named Entities

Page 59: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Sentiment By Document

KNIME Copyright 2013 KNIME Text Mining Webinar 62

PERSON Tag for

POSITIVE Words

TIME Tag for

NEGATIVE Words

ATTITUDE = +/-1 for

POSITIVE or

NEGATIVE Words

MEAN(ATTITUDE)

on all words for

each document

Bin1 = negative documents

Bin2, Bin3 = neutral documents

Bin4 = positive documentsAdjust Bin boundaries in

Auto-Binner node for more

accurate results

Page 60: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Sentiment by Category

• Tag Cloud on all words in all documents for

a given category

• Words in tag clouds are colored by

sentiment

KNIME Copyright 2013 KNIME Text Mining Webinar 63

New column NE

with values:

PERSON, TIME,

neutral (when

missing value)

Color by

NE valueAsian

Fast

Food

German

Cuisine

Page 61: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Tag Cloud by Category

Asian Fast Food German

KNIME Copyright 2013 KNIME Text Mining Webinar 64

Page 62: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Sentiment by Word

• Find Most Positive/Most Negative

Document

• Build Tag Cloud with words colored by

sentiment

KNIME Copyright 2013 KNIME Text Mining Webinar 65

Sorting

Ascending

Sorting

Descending

Page 63: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Tag Cloud of Worst/Best Doc

Most Positive Most Negative

KNIME Copyright 2013 KNIME Text Mining Webinar 66

Page 64: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

Improvements

• Add polarity change for negations

• Add polarity changes for enhancements

• Improve positive/negative dictionary

• Improve bin distribution (skeweness

towards positive)

• Improve list of stop words

• Remove Names of Burgers?

KNIME Copyright 2013 KNIME Text Mining Webinar 67

Page 65: Text Mining Webinar - KNIME€¦ · Text Mining Webinar The Textprocessing Extension Rosaria Silipo and Kilian Thiel. KNIME Text Mining Webinar 2 Agenda ... Text Mining Workflow Create

68

Thank you

KNIME Copyright 2013 KNIME Text Mining Webinar

[email protected]