Knowledge-lean Text Mining
Samuel Rönnqvist · Lectio praecursoria
users.abo.fi/sronnqvi/pub/lectio.pdf

Transcript of slides (32 pages)

Page 1:

Samuel Rönnqvist

Knowledge-lean Text Mining

Lectio praecursoria

Page 2:

Motivation of Text Mining

“As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe [...]

When that time comes, a project, until then neglected because the need for it was not felt, will have to be undertaken.”

– Denis Diderot, 1755

Introduction | Methods | Applications | Conclusions

Page 3:

Motivation of Text Mining


Since the 1990s, the amount of readily available digital text has exploded.


Page 4:

Definition of Text Mining

Text mining is computational text analysis for understanding abundant text, focusing on what it tells about the world.


Page 5:

Foundations of Text Mining

Text mining applies language processing for knowledge discovery, primarily by means of machine learning.

[Figure: diagram relating linguistics, mathematics, computational linguistics, and machine learning (computer era) to NLP (language processing tools) and data mining (application-oriented knowledge discovery) in the internet era, which together with a domain of application converge on text mining.]


Page 6:

Problems of Text Mining

Language understanding is a hallmark of intelligence (cf. Turing)

→ Human language is challenging for Artificial Intelligence

→ Language processing typically requires substantial manual encoding of knowledge (by experts)

→ Knowledge resource development is a bottleneck for the application of text mining to novel tasks

How to democratize text mining?


Page 7:

Knowledge-lean Text Mining

Approaches to knowledge leanness:

1. Interactive visualization
2. Data-driven modeling


Page 8:

Approaches to Knowledge-lean Text Mining

1. Interactive visualization: joining computational and human processing

Models provide useful abstractions and structuring of vast data.

Interaction can enable close-knit collaborative loops betweencomputer and user for data exploration and model optimization.

It helps integrate the user's knowledge and intuition, without coding → knowledge leanness!

[Figure: Data → Model → User, joined by a visual interactive interface. The world yields observations (data); the model provides abstractions; the user gains understanding, alongside direct observations of the world.]


Page 9:

Approaches to Knowledge-lean Text Mining

2. Data-driven modeling: it’s all about co-occurrence

Using heuristics (and some machine learning) for discovering structure in data:

Relation modeling

– Entity co-occurrences → entity networks

Semantic modeling

– Word co-occurrence within documents → topic models

– Word co-occurrence within sentence-level contexts → word vectors

Semantic-predictive modeling

– Co-occurrence between data types (text and event data)→ data expansion for predictive modeling

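The first bullet above (entity co-occurrences → entity networks) can be sketched in a few lines. This is a minimal illustration with hypothetical data, not code from the thesis: documents are reduced to sets of mentioned entities, and each within-document pair contributes weight to an edge.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(docs_entities):
    """Count how often each pair of entities appears in the same document.

    docs_entities: list of entity sets, one set per document.
    Returns a Counter mapping sorted entity pairs to co-occurrence counts,
    i.e. a weighted edge list of the entity network.
    """
    edges = Counter()
    for entities in docs_entities:
        for pair in combinations(sorted(entities), 2):
            edges[pair] += 1
    return edges

# Hypothetical example documents, each reduced to its set of entities.
docs = [
    {"Fortis", "Belgium", "Reuters"},
    {"Fortis", "Belgium"},
    {"Fortis", "Dexia"},
]
net = cooccurrence_network(docs)
# The strongest edges link the entities that co-occur most often.
assert net[("Belgium", "Fortis")] == 2
```

Heavier edges in such a network indicate entities that the text repeatedly discusses together, without any manually encoded knowledge about them.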


Page 11:

Data-driven Semantic Modeling

Can we derive word meaning from word co-occurrence patterns in raw text?

“You shall know a word by the company it keeps”– J.R. Firth (1957)


Page 12:

Data-driven Semantic Modeling

Deriving word meaning from co-occurrences: for instance, by training a neural network on millions or billions of words of raw text to obtain word vectors.

[Figure: network with an input layer taking the target word wt, a projection layer, and an output layer predicting the context words wt-2, wt-1, wt+1, wt+2.]

Word2Vec by Mikolov et al. (2013)
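The training data behind such a network can be sketched as (target, context) pairs. This is a minimal illustration of pair generation with a context window of 2, as in the figure, not the Word2Vec implementation itself:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs: each word w_t is paired
    with its neighbours w_{t-2} .. w_{t+2} (clipped at sentence edges)."""
    pairs = []
    for t, target in enumerate(tokens):
        for c in range(max(0, t - window), min(len(tokens), t + window + 1)):
            if c != t:
                pairs.append((target, tokens[c]))
    return pairs

pairs = skipgram_pairs(["the", "talks", "have", "positive", "results"])
assert ("have", "talks") in pairs and ("have", "results") in pairs
```

Training the network to predict the context word from the target word over millions of such pairs forces the projection layer to place words with similar contexts close together; those projection weights are the word vectors.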


Page 14:

Data-driven Semantic Modeling

[Figure: the same network, with an example: a one-hot input vector (0 0 0 1 0) is projected to a dense word vector (0.1 -0.3 -0.1 0.9 0.6).]

Word2Vec by Mikolov et al. (2013)

Page 15:

Word vectors embed words into a continuous semantic space: distinct symbols → comparable coordinates

Data-driven Semantic Modeling

[Figure: 2-d projection of the semantic space where related words cluster: Monday/Tuesday, Paris/London, apple/fruit, crisis/turmoil, realize/realise.]

In reality: vector(“fruit”) = [-0.39712, 0.31127, 0.1399, -0.99456, -0.0081426, 0.51614, 0.043088, ...]

(e.g., 100 latent dimensions)
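Comparable coordinates make similarity a matter of geometry; a common choice is cosine similarity. A sketch with illustrative 3-dimensional vectors (real models use many more dimensions, and these values are invented for the example):

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Illustrative vectors only, not values from a trained model.
vec = {
    "apple":  [0.9, 0.8, 0.1],
    "fruit":  [0.8, 0.9, 0.2],
    "crisis": [-0.7, 0.1, 0.9],
}
# Related words point in similar directions, unrelated words do not.
assert cosine(vec["apple"], vec["fruit"]) > cosine(vec["apple"], vec["crisis"])
```

With distinct symbols there is no way to ask "how similar are apple and fruit?"; with coordinates, the question has a numeric answer.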

Page 16:

A vector model may describe the meaning of a million words.

Visualization may provide easier insight.

Visualizing Semantic Vector Models


Page 17:


Visualizing Semantic Vector Models

Example: Visualizing topic structures in a text collection



Page 19:

Semantic representations are useful as data abstractions in prediction tasks, too.

For instance, to detect financial crises based on news.

Predicting with Semantic Vector Models


Page 20:

Example: Bank distress levels in Europe 2007-2014, from 6.6M news articles

Knowledge-lean approach:

– 243 known events, heuristically expanded to 700k training instances

– Sentence vectors as representations

Predicting with Semantic Vector Models

[Figure: timeline of predicted bank distress levels in Europe, 2007-2014.]
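One plausible sketch of the kind of heuristic expansion described above (hypothetical helper and data; not necessarily the thesis's exact procedure): label a sentence as a positive training instance when it mentions a bank within some window of that bank's known distress event.

```python
from datetime import date

def expand_labels(sentences, events, window_days=30):
    """Heuristically expand a few known events into many training instances.

    sentences: list of (publication_date, text) tuples.
    events: dict mapping bank name -> known distress event date.
    A sentence is labelled 1 if it mentions a bank whose event falls
    within window_days of the publication date, else 0.
    """
    labelled = []
    for pub_date, text in sentences:
        label = 0
        for bank, event_date in events.items():
            if bank in text and abs((pub_date - event_date).days) <= window_days:
                label = 1
                break
        labelled.append((text, label))
    return labelled

# Hypothetical example data.
events = {"Fortis": date(2008, 9, 28)}
sentences = [
    (date(2008, 9, 27), "Fortis investors face a weekend of uncertainty."),
    (date(2007, 5, 1), "Fortis reports steady quarterly earnings."),
]
out = expand_labels(sentences, events)
assert [label for _, label in out] == [1, 0]
```

The labels are noisy, but applied over millions of articles such a heuristic can turn a few hundred known events into hundreds of thousands of weakly labelled instances.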

Page 21:

Two-step neural network training:

– Unsupervised learning of sentence vectors

– Supervised learning to predict events

Predicting with Semantic Vector Models


Page 22:

Text data enable detecting and describing events, e.g., as the crisis hits Belgium in September 2008:

Predicting with Semantic Vector Models

Saturday, 27 September 2008 (relevance 0.921, rank 2): "Fortis investors face a weekend of uncertainty after the banking and insurance group went out of its way on Friday to reassure them that it was solvent and in no danger of collapse following market talk the company could become another casualty of the credit crisis."

Saturday, 27 September 2008 (relevance 0.917, rank 3): "As of Saturday, financial authorities were contacting other institutions, a source familiar with the situation told Reuters, although no particular solution was preferred and nothing concrete was likely to emerge before Sunday."

Sunday, 28 September 2008 (relevance 0.758, rank 6): "BRUSSELS (Reuters) - Belgium's national pride and thousands of jobs are at stake as the Belgian and Dutch governments, central banks and regulators seek to secure the future of financial services group Fortis."

Monday, 29 September 2008 (relevance 0.889, rank 5): "Belgian, Dutch and Luxembourg governments rescued Fortis over the weekend to prevent a domino-like spread of failure by buying its shares for 11.2 billion euros."


Page 23:

Highly relevant descriptions are found.

However, in order to better summarize texts, structure beyond sentences should be analyzed.

→ Discourse parsing

More Prediction: Discourse Parsing


Page 24:


More Prediction: Discourse Parsing

Example: Implicit discourse relations

“But the market turmoil could be partially beneficial for some small businesses. In a sagging market, the Federal Reserve System might flood the market with funds, and that should bring interest rates down.”

Relation type? (Contrast, cause, conjunction, restatement, ...)


Page 25:


More Prediction: Discourse Parsing

Example: Implicit discourse relations

“But the market turmoil could be partially beneficial for some small businesses. [Since] In a sagging market, the Federal Reserve System might flood the market with funds, and that should bring interest rates down.”

Relation type: Contingency (cause)


Page 26:

More Prediction: Discourse Parsing

Model: (Sentence 1, Sentence 2) → discourse relation type?

Page 27:

More Prediction: Discourse Parsing

Deep neural network architectures for predicting implicit discourse relations using word vectors (embeddings)

1. Feed-forward network
2. Recurrent network with attention

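As an illustration of the first architecture only (random weights, toy dimensions; not the trained model from the thesis), a forward pass that averages each sentence's word vectors, concatenates the two sentence representations, and applies one hidden layer and a softmax over relation types:

```python
import math
import random

random.seed(0)
DIM, HIDDEN = 4, 8
RELATIONS = ["Contrast", "Contingency", "Conjunction", "Restatement"]

# Hypothetical tiny embedding table; a real system uses pretrained word vectors.
emb = {w: [random.uniform(-1, 1) for _ in range(DIM)]
       for w in "the market turmoil could bring interest rates down".split()}

def mean_vector(sentence):
    """Represent a sentence as the mean of its word vectors."""
    vecs = [emb[w] for w in sentence.split() if w in emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def forward(s1, s2, w1, w2):
    """Concatenate mean sentence vectors -> tanh hidden layer -> softmax."""
    x = mean_vector(s1) + mean_vector(s2)   # concatenation, length 2*DIM
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    logits = [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]
    z = [math.exp(l - max(logits)) for l in logits]
    return [zi / sum(z) for zi in z]        # probabilities over RELATIONS

# Random (untrained) weights, for shape illustration only.
w1 = [[random.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(HIDDEN)]
w2 = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(len(RELATIONS))]
probs = forward("the market turmoil", "bring interest rates down", w1, w2)
assert abs(sum(probs) - 1.0) < 1e-9 and len(probs) == len(RELATIONS)
```

Because the inputs are word vectors learned from raw text, no hand-built features or knowledge resources enter the pipeline; only the small set of labelled relation examples is needed for the supervised step.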


Page 29:

Because word vectors are learned from raw text, they are language independent.

Discourse parsing on Chinese:

More Prediction: Discourse Parsing

Relation: CONJUNCTION

会谈 就 一些 原则 和 具体 问题 进行 了 深入 讨论 , 达成 了 一些 谅解

双方 一致 认为 会谈 具有 积极 成果

In the talks, they discussed some principles and specific questions in depth, and reached some understandings

Both sides agree that the talks have positive results

(State-of-the-art performance predicting implicit relations)


Page 30:

More Prediction: Discourse Parsing


Complex models for complex problems become increasingly difficult to interpret.

Visualizing the network's attention helps highlight what it focuses on as it makes its decisions.

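A minimal sketch of such attention visualization (the scores here are hypothetical, not from the model): softmax-normalize per-token attention scores and mark the tokens weighted above the uniform baseline.

```python
import math

def attention_weights(scores):
    """Softmax-normalize raw attention scores into weights summing to 1."""
    z = [math.exp(s - max(scores)) for s in scores]
    return [zi / sum(z) for zi in z]

def render(tokens, scores):
    """Bracket tokens the model attends to most (weight above 1/n baseline)."""
    weights = attention_weights(scores)
    base = 1.0 / len(tokens)
    return " ".join(f"[{t}]" if w > base else t for t, w in zip(tokens, weights))

tokens = ["talks", "reached", "some", "understandings"]
scores = [0.5, 2.0, 0.1, 1.8]  # hypothetical attention scores
print(render(tokens, scores))  # → talks [reached] some [understandings]
```

Even this crude text heat map shows which input words drive a decision; richer interfaces shade each token by its weight.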

Page 31:

Modeling and Visualization


[Figure: the Data → Model → User diagram with its visual interactive interface, as on page 8.]

Model visualization may provide insight into the model, the data, and the underlying problem.

Visualization of model output, model-structured data and internal model parameters.


Page 32:

Knowledge-lean text mining:

– Seeks to avoid the bottleneck of knowledge encoding

– Supports easy exploration of new tasks, domains of application, and languages → thereby helping to democratize text mining

– Can be achieved through a combination of data-driven modeling and visualization (scalable, quantitative modeling + integration of versatile, qualitative human understanding)

– In particular, my thesis demonstrates knowledge-lean approaches for:
  – Introducing text mining to the domain of systemic financial risk
  – Open-ended exploration of topics in text
  – Multilingual parsing of discourse structure

Conclusions
