Samuel Rönnqvist
Knowledge-lean Text Mining
Lectio praecursoria
Motivation of Text Mining
“As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe [...]
When that time comes, a project, until then neglected because the need for it was not felt, will have to be undertaken.”
– Denis Diderot, 1755
Introduction | Methods | Applications | Conclusions
Since the 1990s, the amount of readily available digital text has exploded.
Definition of Text Mining
Text mining is computational text analysis for understanding abundant text, focusing on what it tells about the world.
Foundations of Text Mining
Text mining applies language processing for knowledge discovery, primarily by means of machine learning.
[Figure: text mining draws on linguistics, computational linguistics, machine learning, and mathematics; NLP contributes language processing tools and data mining contributes application-oriented knowledge discovery, applied within a domain of application, spanning the computer era to the internet era.]
Problems of Text Mining
Language understanding is a hallmark of intelligence (cf. Turing)
→ Human language is challenging for Artificial Intelligence
→ Language processing typically requires substantial manual encoding of knowledge (by experts)
→ Knowledge resource development is a bottleneck for the application of text mining to novel tasks
How to democratize text mining?
Knowledge-lean Text Mining
Approaches to knowledge leanness:
1. Interactive visualization
2. Data-driven modeling
Approaches to Knowledge-lean Text Mining
1. Interactive visualization: joining computational and human processing
Models provide useful abstractions and structuring of vast data.
Interaction can enable close-knit collaborative loops between computer and user for data exploration and model optimization.
It helps integrate the user's knowledge and intuition, without coding → knowledge leanness!
[Figure: the world yields observations; observations form data; the model provides abstractions of the data; the user gains understanding through a visual interactive interface connecting data, model, and user.]
Approaches to Knowledge-lean Text Mining
2. Data-driven modeling: it’s all about co-occurrence
Using heuristics (and some machine learning) for discovering structure in data:
Relation modeling
– Entity co-occurrences → entity networks
Semantic modeling
– Word co-occurrence within documents → topic models
– Word co-occurrence within sentence-level contexts → word vectors
Semantic-predictive modeling
– Co-occurrence between data types (text and event data) → data expansion for predictive modeling
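All of these rest on counting co-occurrences. A minimal sketch of window-based co-occurrence counting (the corpus and window size here are illustrative, not from the thesis):

```python
from collections import Counter

def cooccurrences(sentences, window=2):
    """Count how often word pairs occur within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for c in tokens[i + 1 : i + 1 + window]:
                counts[tuple(sorted((w, c)))] += 1
    return counts

corpus = [
    "the bank raised interest rates".split(),
    "the central bank cut rates".split(),
]
counts = cooccurrences(corpus)
print(counts[("bank", "the")])  # → 2
```

Such counts are the raw material that entity networks, topic models, and word vectors each aggregate in their own way.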
Data-driven Semantic Modeling
Can we derive word meaning from word co-occurrence patterns in raw text?
“You shall know a word by the company it keeps”– J.R. Firth (1957)
Data-driven Semantic Modeling
Deriving word meaning from co-occurrences: for instance, by training a neural network to obtain word vectors.
[Figure: skip-gram architecture. The input layer takes the target word w_t; a projection layer maps it to the output layer, which predicts the context words w_t-2, w_t-1, w_t+1, w_t+2. Millions or billions of words of raw text yield the word vectors, e.g., one-hot input [0 0 0 1 0] → word vector [0.1 -0.3 -0.1 0.9 0.6].]
Word2Vec by Mikolov et al. (2013)
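The training idea can be sketched in a few lines of NumPy: a toy skip-gram update with negative sampling. The vocabulary, dimensionality, learning rate, and number of steps here are illustrative, not Word2Vec's actual defaults:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["crisis", "turmoil", "bank", "fruit", "apple"]
V, D = len(vocab), 8
W_in = rng.normal(scale=0.1, size=(V, D))   # target-word vectors
W_out = rng.normal(scale=0.1, size=(V, D))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(target, context, negatives, lr=0.1):
    """One skip-gram update with negative sampling: push the target vector
    towards its observed context word and away from sampled negatives."""
    v = W_in[target].copy()
    grad_v = np.zeros_like(v)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ W_out[c]) - label
        grad_v += g * W_out[c]
        W_out[c] -= lr * g * v
    W_in[target] -= lr * grad_v

# "crisis" repeatedly observed near "turmoil", with "fruit" as a negative sample
for _ in range(200):
    sgns_step(vocab.index("crisis"), vocab.index("turmoil"),
              [vocab.index("fruit")])

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(W_in[0], W_out[1]) > cos(W_in[0], W_out[3]))  # → True
```

After training, "crisis" sits closer to its observed context "turmoil" than to the negative sample "fruit", which is exactly the similarity structure the vector space encodes.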
Word vectors embed words into a continuous semantic space: distinct symbols → comparable coordinates
Data-driven Semantic Modeling
[Figure: 2D projection of word vectors in which related words cluster together: Monday/Tuesday, Paris/London, apple/fruit, crisis/turmoil, realize/realise.]
In reality: vector(“fruit”) = [-0.39712, 0.31127, 0.1399, -0.99456, -0.0081426, 0.51614, 0.043088, ...]
(e.g., 100 latent dimensions)
A vector model may describe the meaning of a million words.
Visualization may provide easier insight.
Visualizing Semantic Vector Models
Example: Visualizing topic structures in a text collection
Semantic representations are useful as data abstractions in prediction tasks, too.
For instance, to detect financial crises based on news.
Predicting with Semantic Vector Models
Example: Bank distress levels in Europe 2007-2014, from 6.6M news articles
Knowledge-lean approach:
– 243 known events, heuristically expanded to 700k training instances
– Sentence vectors as representations
Predicting with Semantic Vector Models
[Figure: bank distress levels over time, 2007-2014.]
Two-step neural network training:
– Unsupervised learning of sentence vectors
– Supervised learning to predict events
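The two steps can be sketched with stand-ins: random vectors in place of the unsupervised sentence vectors, and a logistic-regression head as the supervised event predictor. The sizes and labels below are synthetic, not the thesis data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (stand-in): sentence vectors -- random here, learned unsupervised in practice
X = rng.normal(size=(700, 16))
w_true = rng.normal(size=16)
y = (X @ w_true > 0).astype(float)   # synthetic event labels

# Step 2: supervised logistic-regression head, trained by gradient descent
w = np.zeros(16)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

accuracy = ((X @ w > 0) == (y > 0.5)).mean()
print(accuracy > 0.9)  # → True
```

The design point is the split itself: the expensive representation learning needs no labels, so the scarce labeled events only have to train the small supervised head.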
Predicting with Semantic Vector Models
Text data enable detecting and describing events, e.g., as crisis hits Belgium in September 2008:
Predicting with Semantic Vector Models
Saturday, 27 September 2008 (relevance 0.921, rank 2): "Fortis investors face a weekend of uncertainty after the banking and insurance group went out of its way on Friday to reassure them that it was solvent and in no danger of collapse following market talk the company could become another casualty of the credit crisis."
Saturday, 27 September 2008 (relevance 0.917, rank 3): "As of Saturday, financial authorities were contacting other institutions, a source familiar with the situation told Reuters, although no particular solution was preferred and nothing concrete was likely to emerge before Sunday."
Sunday, 28 September 2008 (relevance 0.758, rank 6): "BRUSSELS (Reuters) - Belgium's national pride and thousands of jobs are at stake as the Belgian and Dutch governments, central banks and regulators seek to secure the future of financial services group Fortis."
Monday, 29 September 2008 (relevance 0.889, rank 5): "Belgian, Dutch and Luxembourg governments rescued Fortis over the weekend to prevent a domino-like spread of failure by buying its shares for 11.2 billion euros."
Highly relevant descriptions are found.
However, in order to better summarize texts, structure beyond sentences should be analyzed.
→ Discourse parsing
More Prediction: Discourse Parsing
Example: Implicit discourse relations
“But the market turmoil could be partially beneficial for some small businesses. In a sagging market, the Federal Reserve System might flood the market with funds, and that should bring interest rates down.”
Relation type? (Contrast, cause, conjunction, restatement, ...)
Example: Implicit discourse relations
“But the market turmoil could be partially beneficial for some small businesses. [Since] In a sagging market, the Federal Reserve System might flood the market with funds, and that should bring interest rates down.”
Relation type: Contingency (cause)
Model? (Sentence 1, Sentence 2) → Discourse relation type
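A minimal sketch of such a model, assuming a single linear layer with softmax over concatenated sentence vectors; the relation inventory, dimensionality, and untrained random weights are all illustrative:

```python
import numpy as np

RELATIONS = ["Contrast", "Contingency", "Conjunction", "Restatement"]
D = 10  # toy sentence-vector dimensionality

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(RELATIONS), 2 * D))
b = np.zeros(len(RELATIONS))

def relation_probs(s1_vec, s2_vec):
    """Concatenate the two argument vectors, score each relation type,
    and softmax the scores into a probability distribution."""
    x = np.concatenate([s1_vec, s2_vec])
    logits = W @ x + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = relation_probs(rng.normal(size=D), rng.normal(size=D))
print(round(float(probs.sum()), 6))  # → 1.0
```

The deep architectures that follow replace this single layer with feed-forward or recurrent encoders, but keep the same interface: two sentence representations in, one relation distribution out.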
More Prediction: Discourse Parsing
Deep neural network architectures for predicting implicit discourse relations using word vectors (embeddings)
1. Feed-forward network 2. Recurrent network with attention
Because word vectors are learned from raw text, they are language independent.
Discourse parsing on Chinese:
More Prediction: Discourse Parsing
Relation: CONJUNCTION
会谈 就 一些 原则 和 具体 问题 进行 了 深入 讨论 , 达成 了 一些 谅解
双方 一致 认为 会谈 具有 积极 成果
In the talks, they discussed some principles and specific questions in depth, and reached some understandings
Both sides agree that the talks have positive results
(State-of-the-art performance predicting implicit relations)
More Prediction: Discourse Parsing
Complex models for complex problems become increasingly difficult to interpret.
Visualizing the network’s attention helps highlight what it focuses on as it makes its decisions.
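A sketch of the idea, assuming dot-product attention and random vectors in place of learned ones; the softmax weights can be rendered directly over the tokens:

```python
import numpy as np

def attention_weights(word_vecs, query):
    """Dot-product attention: score each word vector against a query and
    softmax the scores into weights that sum to one."""
    scores = word_vecs @ query
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
tokens = ["the", "market", "turmoil", "could", "be", "beneficial"]
vecs = rng.normal(size=(len(tokens), 8))    # stand-ins for learned word vectors
weights = attention_weights(vecs, rng.normal(size=8))

# Crude text rendering of the attention over the input tokens
for tok, wgt in zip(tokens, weights):
    print(f"{tok:12s} {'#' * max(1, int(wgt * 40))}")
```

Because the weights form a distribution over the input words, they map naturally onto a heat map or bar overlay on the sentence itself.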
Modeling and Visualization
[Figure: the data-model-user loop through a visual interactive interface, revisited: observations from the world become data, the model provides abstractions, and the user gains understanding.]
Model visualization may provide insight into the model, data and underlying problem.
Visualization of model output, model-structured data and internal model parameters.
Knowledge-lean text mining:
– Seeks to avoid the bottleneck of knowledge encoding
– Supports easy exploration of new tasks, domains of application, and languages
→ Thereby helping to democratize text mining
– Can be achieved through a combination of data-driven modeling and visualization (scalable, quantitative modeling + integration of versatile, qualitative human understanding)
– In particular, my thesis demonstrates knowledge-lean approaches for:
– Introducing text mining to the domain of systemic financial risk
– Open-ended exploration of topics in text
– Multilingual parsing of discourse structure
Conclusions