Text Mining Meets Neural Nets: Mining the Biomedical Literature

Transcript of Text mining meets neural nets

Page 1: Text mining meets neural nets

Dan Sullivan, October 21, 2015

Portland, OR

Text Mining Meets Neural Nets: Mining the Biomedical Literature

Page 2: Text mining meets neural nets

*Overview

* Introduction to Natural Language Processing and Text Mining

* Linguistic and Statistical Approaches

*Critiquing Classifier Results

* A New Dawn: Deep Learning

* What’s Next

Page 3: Text mining meets neural nets

*My Background

* Enterprise Architect, Big Data and Analytics

* Former Research Scientist, bioinformatics institute

* Completing PhD in Computational Biology with focus on text mining

*Author

*Contact: [email protected] | @dsapptech | linkedin.com/in/dansullivanpdx

Page 4: Text mining meets neural nets

*Introduction to Natural Language Processing and Text Mining

Page 5: Text mining meets neural nets

*“Text is unstructured”

Page 6: Text mining meets neural nets

*Unstructured?

Page 7: Text mining meets neural nets

*Challenges in Text Analysis

* Manual procedures are time consuming and costly

* The volume of literature continues to grow

* Commonly used search techniques (keyword search, similarity search, metadata filtering, etc.) can still return volumes of literature that are difficult to analyze manually

* Some success with popular tools, but limitations remain

Page 8: Text mining meets neural nets

*Dominant Eras in NLP

* Linguistic (from 1960s): focus on syntax; transformational grammar; sentence parsing

* Statistical (from 1990s): focus on words, n-grams, etc.; statistics and probability; related work in Information Retrieval; topic modeling and classification

* Deep Learning (from ~2006): focus on multi-layered neural nets computing non-linear functions; light on theory, heavy on engineering; multiple NLP tasks

Page 9: Text mining meets neural nets

*Symbolic vs Sub-Symbolic


Page 10: Text mining meets neural nets

*Linguistic and Statistical Approaches

http://www.slideshare.net/DanSullivan10/text-mining-meets-neural-nets

Page 11: Text mining meets neural nets

*Linguistic Approaches

Page 12: Text mining meets neural nets

*Linguistic Approaches - Syntax

Image: http://www.nltk.org/book_1ed/ch08.html

Page 13: Text mining meets neural nets

*Linguistic Approaches - Semantics

Stephen H. Chen et al. Physiol. Genomics 2005;22:257-267

Page 14: Text mining meets neural nets

*Statistical Approaches

Page 15: Text mining meets neural nets

*Statistical Approach: Topic Models

* Technique for identifying the dominant themes in a document

* Does not require labeled training data

* Multiple algorithms: Probabilistic Latent Semantic Indexing (PLSI), Latent Dirichlet Allocation (LDA)

* Assumptions: a document is about a mixture of topics; the words used in the document are attributable to those topics

Image source: http://www.keepcalm-o-matic.co.uk/p/keep-calm-theres-no-training-today/

Page 16: Text mining meets neural nets

Example topics: Debt, Law, Graduation | Debt, EU, Greece, Euro | EU, Greece, Negotiations, Varoufakis

Source: http://www.nytimes.com/pages/business/index.html April 27, 2015

Page 17: Text mining meets neural nets

*Topic Modeling Techniques

* Topics are represented by words; documents are about a set of topics
  * Doc 1: 50% politics, 50% presidential
  * Doc 2: 25% CPU, 30% memory, 45% I/O
  * Doc 3: 30% cholesterol, 40% arteries, 30% heart

* Learning topics
  * Assign each word to a topic
  * For each word and topic, compute the probability of the topic given the document, P(topic|doc), and the probability of the word given the topic, P(word|topic)
  * Reassign the word to a new topic with probability P(topic|doc) * P(word|topic)
  * Reassignment is based on the probability that topic T generated the use of word W (see the Gensim sketch below)
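The reassignment loop above is essentially what off-the-shelf topic-modeling libraries do. Below is a minimal sketch using Gensim's LDA implementation (Gensim appears in the tools list later in the deck); the tiny corpus and parameter values are placeholders, not from the talk.

```python
from gensim import corpora, models

# Toy corpus: each document is a list of tokens
docs = [
    ["debt", "law", "graduation", "loans"],
    ["debt", "eu", "greece", "euro", "negotiations"],
    ["cholesterol", "arteries", "heart", "disease"],
]

dictionary = corpora.Dictionary(docs)                 # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]    # bag-of-words counts

# LDA assigns words to topics and iteratively reassigns them in proportion
# to P(topic|doc) * P(word|topic), as described above.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)

for topic_id, words in lda.print_topics(num_topics=2, num_words=4):
    print(topic_id, words)

# Topic mixture for a new document
new_doc = dictionary.doc2bow(["greece", "debt", "euro"])
print(lda[new_doc])   # e.g. [(0, 0.87), (1, 0.13)]
```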

Page 18: Text mining meets neural nets

Image Source: David Blei, “Probabilistic Topic Models” http://yosinski.com/mlss12/MLSS-2012-Blei-Probabilistic-Topic-Models/

Page 19: Text mining meets neural nets

*Training a Text Classifier

* 3 key components: data, representation scheme, algorithms

* Data
  * Positive examples – examples from a representative corpus
  * Negative examples – randomly selected from the same publications

* Representation
  * TF-IDF
  * Vector space representation
  * Cosine of vectors as a measure of similarity

* Algorithms – supervised learning: SVMs, Ridge Classifier, Perceptrons, kNN, SGD Classifier, Naïve Bayes, Random Forest, AdaBoost (a pipeline sketch follows below)
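A minimal sketch of that setup with scikit-learn: TF-IDF vectors feeding a linear SVM. The example sentences and labels are placeholders standing in for the positive/negative examples described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder training data: 1 = virulence-factor sentence, 0 = other
sentences = [
    "EspP influences the intestinal colonization of calves.",
    "Data were log-transformed to correct for heterogeneity of the variances.",
]
labels = [1, 0]

classifier = Pipeline([
    ("tfidf", TfidfVectorizer()),  # vector-space representation with TF-IDF weights
    ("svm", LinearSVC()),          # large-margin classifier; try RidgeClassifier, kNN, etc.
])
classifier.fit(sentences, labels)

print(classifier.predict(["The DsbLI system also comprises a functional redox pair"]))
```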

Page 20: Text mining meets neural nets

*Text Classification Process

Source: Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. http://www.nltk.org/book/

Page 21: Text mining meets neural nets

*Representation: TF-IDF

* Term Frequency (TF): tf(t, d) = number of occurrences of term t in document d

* Inverse Document Frequency (IDF): idf(t, D) = log(N / |{d in D : t in d}|), where D is the set of documents and N is the number of documents

* TF-IDF = tf(t, d) * idf(t, D)

* TF-IDF is:
  * large when the term is frequent in the document but appears in few documents overall
  * small when the term appears in many documents
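A direct transcription of those formulas in Python, as a sanity check; the toy documents are illustrative only.

```python
import math

def tf(term, doc):
    """tf(t, d): number of occurrences of term t in document d (a token list)."""
    return doc.count(term)

def idf(term, docs):
    """idf(t, D) = log(N / |{d in D : t in d}|)."""
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing) if n_containing else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

docs = [
    "the espb gene is a known virulence factor".split(),
    "the strain translocates reduced levels of espb".split(),
    "the data were log transformed".split(),
]

print(tf_idf("espb", docs[0], docs))  # larger: frequent here, appears in few documents
print(tf_idf("the", docs[0], docs))   # 0.0: appears in every document
```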

Page 22: Text mining meets neural nets

*Sparse Representations

One-Hot Representation ("The EspB gene is a known virulence ...")

            The  EspB  gene  is  a  known  virulence
The          1    0     0    0  0    0       0
EspB         0    1     0    0  0    0       0
gene         0    0     1    0  0    0       0
is           0    0     0    1  0    0       0
a            0    0     0    0  1    0       0
known        0    0     0    0  0    1       0
virulence    0    0     0    0  0    0       1

TF-IDF Representation

            translocates  reduced  levels  of      EspB    host    cell
Sentence 1  0.193         0.2828   0.078   0.0001  0.389   0.0144  0.011
Sentence 2  0             0.0091   0.0621  0       0       0       0
Sentence 3  0             0        0       0       0.028   0.0113  0
Sentence 4  0.021         0        0       0       0       0       0
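For comparison, the same two kinds of sparse vectors can be produced with scikit-learn; the sentences below are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = [
    "the espb gene is a known virulence factor",
    "espb translocates into the host cell",
]

counts = CountVectorizer().fit_transform(sentences)  # sparse term-count vectors
tfidf = TfidfVectorizer().fit_transform(sentences)   # sparse TF-IDF weighted vectors

print(counts.toarray())  # mostly zeros: one column per unique word in the corpus
print(tfidf.toarray())
```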

Page 23: Text mining meets neural nets

*Representation: Vector Space

* Bag-of-words model

* Ignores the structure (syntax) and meaning (semantics) of sentences

* The representation vector's length is the number of unique words in the corpus

* Stemming is used to remove morphological differences

* Each word is assigned an index i in the representation vector V

* The value V[i] is non-zero if the word appears in the sentence represented by the vector

* The non-zero value is a function of the frequency of the word in the sentence and the frequency of the term in the corpus

Page 24: Text mining meets neural nets

*Classification Algorithms

* A Support Vector Machine (SVM) is a large-margin classifier

* Commonly used in text classification

* Initial results are based on a life-sciences sentence classifier

Image source: http://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png

Page 25: Text mining meets neural nets

*Critiquing Classifier Results

Page 26: Text mining meets neural nets

Virulence Factor (VF) Misclassification Examples

Non-VF, predicted VF:

* "Collectively, these data suggest that EPEC 30-5-1(3) translocates reduced levels of EspB into the host cell."

* "Data were log-transformed to correct for heterogeneity of the variances where necessary."

* "Subsequently, the kanamycin resistance cassette from pVK4 was cloned into the PstI site of pMP3, and the resulting plasmid pMP4 was used to target a disruption in the cesF region of EHEC strain 85-170."

VF, predicted Non-VF:

* "Here, it is reported that the pO157-encoded Type V-secreted serine protease EspP influences the intestinal colonization of calves."

* "Here, we report that intragastric inoculation of a Shiga toxin 2 (Stx2)-producing E. coli O157:H7 clinical isolate into infant rabbits led to severe diarrhea and intestinal inflammation but no signs of HUS."

* "The DsbLI system also comprises a functional redox pair."

Page 27: Text mining meets neural nets

Preliminary Results - Training Error

Adding more examples is not likely to substantially improve results, as shown by the error curve.

[Figure: learning curve of training error and validation error (roughly 0.05-0.5) versus number of training examples (0-10,000).]

Page 28: Text mining meets neural nets

*Alternative Supervised Learning Algorithms

* 8 alternative algorithms evaluated

* Select the 10,000 most important features using chi-square (see the sketch below)
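A minimal sketch of that feature-selection step in scikit-learn; the classifier choice and the value of k are illustrative (the slide uses the 10,000 most important features).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("chi2", SelectKBest(chi2, k=10000)),  # keep the 10,000 highest-scoring features
    ("clf", MultinomialNB()),              # swap in any of the 8 alternative algorithms
])
# pipeline.fit(train_sentences, train_labels)
# pipeline.predict(test_sentences)
```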

Page 29: Text mining meets neural nets

*Improving Quality

* Increase the quantity of data (not always helpful; see the error curves)

* Improve the quality of data

* Utilize multiple supervised algorithms, ensemble and non-ensemble

* Use unlabeled data and semi-supervised techniques

* Feature selection

* Parameter tuning

* Feature engineering

* Given: high-quality data in sufficient quantity and state-of-the-art machine learning algorithms

* How to improve results: change the representation?

Page 30: Text mining meets neural nets

*Representation Schemes

*TF-IDF

* Loss of syntactic and semantic information

* No relation between a term's index and its meaning

* No support for disambiguation

* Feature engineering extends the vector representation or substitutes more general terms for specific ones – a crude way to capture semantic properties

*Ideal Representation

* Captures the semantic similarity of words

* Does not require feature engineering

* Minimal pre-processing, e.g. no mapping to ontologies

* Improves precision and recall

Page 31: Text mining meets neural nets

*A New Dawn: Deep Learning

Page 32: Text mining meets neural nets

*Word Embeddings

*Dense vector representation (n = 50 … 300 or more)

*Capture semantics – similar words close by cosine measure

*Captures language features: syntactic relations and semantic relations

Page 33: Text mining meets neural nets

*Dense Word Representation

[0.160610 -0.547976 -0.444522 -0.037896 0.044305 0.245423 -0.261498 0.000294 -0.275621 -0.021201 -0.432955 0.388905 0.106494 0.405797 -0.159357 -0.073897 0.177182 0.043535 0.600987 0.064762 -0.348964 0.189289 0.650318 0.112554 0.374456 -0.227780 0.208623 0.065362 0.235401 -0.118003 0.032858 -0.309767 0.024085 -0.055148 0.158807 0.171749 -0.153825 0.090301 0.033275 0.089936 0.187864 -0.044472 0.421533 0.209217 -0.142092 0.153070 -0.168291 -0.052823 -0.090984 0.018695 -0.265503 -0.055572 -0.212252 -0.326411 -0.083590 -0.009575 -0.125065 0.376738 0.059734 -0.005585 -0.085654 0.111499 -0.099688 0.147020 -0.419087 -0.042069 -0.241274 0.154339 -0.008625 -0.298928 0.060612 0.216670 -0.080013 -0.218985 -0.805539 0.298797 0.089364 0.071044 0.390878 0.167600 -0.101478 -0.017312 -0.260500 0.392749 0.184021 -0.258466 -0.222133 0.357018 -0.244508 0.221385 -0.012634 -0.073752 -0.409362 0.113296 0.048397 0.000424 0.146018 -0.060891 -0.139045 -0.180432 0.014984 0.023384 -0.032300 -0.161608 -0.188434 0.018036 0.023236 0.060335 -0.173066 0.053327 0.523037 -0.330135 -0.014888 -0.124564 0.046332 -0.124301 0.029865 0.144504 0.163142 -0.018653 -0.140519 0.060562 0.098858 -0.128970 0.762193 -0.230067 -0.226374 0.100086 0.367147 0.160035 0.148644 -0.087583 0.248333 -0.033163 -0.312134 0.162414 0.047267 0.383573 -0.271765 -0.019852 -0.033213 0.340789 0.151498 -0.195642 -0.105429 -0.172337 0.115681 0.033890 -0.026444 -0.048083 -0.039565 -0.159685 -0.211830 0.191293 0.049531 -0.008248 0.119094 0.091608 -0.077601 -0.050206 0.147080 -0.217278 -0.039298 -0.303386 0.543094 -0.198962 -0.122825 -0.135449 0.190148 0.262060 0.146498 -0.236863 0.140620 0.128250 -0.157921 -0.119241 0.059280 -0.003679 0.091986 0.105117 0.117597 -0.187521 -0.388895 0.166485 0.149918 0.066284 0.210502 0.484910 0.396106 -0.118060 -0.076609 -0.326138 -0.305618 -0.297695 -0.078404 -0.210814 0.423335 -0.377239 -0.323599 0.282586]

immune_system

Page 34: Text mining meets neural nets

*Learning Word Representation

* Linguistic terms with similar distributions have similar meanings

* Requires a large volume of data: billions of words in context, multiple passes over the data

* Algorithms: Word2Vec (CBOW and skip-gram), GloVe

T. Mikolov et al. "Efficient Estimation of Word Representations in Vector Space." 2013. http://arxiv.org/pdf/1301.3781.pdf
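A minimal sketch of training such embeddings with Gensim's word2vec module (parameter names follow the pre-4.0 Gensim API; the corpus and hyperparameters are placeholders).

```python
from gensim.models import Word2Vec

# sentences: an iterable of token lists, e.g. tokenized biomedical abstracts;
# useful vectors require very large corpora (billions of words in context).
sentences = [
    ["the", "espb", "gene", "is", "a", "known", "virulence", "factor"],
    ["shigella", "causes", "dysentery", "in", "humans"],
]

model = Word2Vec(
    sentences,
    size=200,     # dimensionality of the dense vectors (50-300 or more)
    window=5,     # context window size
    sg=1,         # 1 = skip-gram, 0 = CBOW
    min_count=1,
    iter=5,       # multiple passes over the data
)

print(model.most_similar("shigella", topn=5))
```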

Page 35: Text mining meets neural nets

*Skip-gram predicts the surrounding words

Image: https://drive.google.com/file/d/0B7XkCwpI5KDYRWRnd1RzWXQ2TWc

Page 36: Text mining meets neural nets

*CBOW predicts the current word

Image: https://drive.google.com/file/d/0B7XkCwpI5KDYRWRnd1RzWXQ2TWc

Page 37: Text mining meets neural nets

*Word Similarity - Malaria

Page 38: Text mining meets neural nets

*Word Similarity: Alanine (Amino Acid)

Page 39: Text mining meets neural nets

*Word Similarity: Leukocyte

Page 40: Text mining meets neural nets

*Word Similarity: Shigella

Page 41: Text mining meets neural nets

*Analogy I (correct)

Heart : Cardiovascular as Kidney : ?

Page 42: Text mining meets neural nets

*Analogy II (near miss)

Salmonella : Proteobacteria as Staphylococcus : ?

Page 43: Text mining meets neural nets

*Analogy III (miss)

Salmonella : Enterobacteriaceae as Staphylococcus : ? (expected: Staphylococcaceae)
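These analogy queries correspond to vector arithmetic over the embeddings. A minimal sketch with Gensim, assuming a trained `model` like the one sketched earlier and lower-cased tokens:

```python
# heart : cardiovascular :: kidney : ?
print(model.most_similar(positive=["cardiovascular", "kidney"],
                         negative=["heart"], topn=3))

# salmonella : enterobacteriaceae :: staphylococcus : ?
print(model.most_similar(positive=["enterobacteriaceae", "staphylococcus"],
                         negative=["salmonella"], topn=3))
# A "near miss" means the expected term ranks near, but not at, the top.
```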

Page 44: Text mining meets neural nets

*Quick Intro to Neural Networks

Page 45: Text mining meets neural nets

*Feed-forward neural network

Image: http://u.cs.biu.ac.il/~yogo/nnlp.pdf

Page 46: Text mining meets neural nets

*Calculating with Neural Nets

Image: https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions

Page 47: Text mining meets neural nets

*Key Characteristics

* Non-linear activation function: sigmoid, hyperbolic tangent (tanh), rectifier (ReLU)

* Word embeddings

* Window size

* Loss function: binary, multiclass, cross-entropy
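A minimal NumPy sketch of a one-hidden-layer forward pass using these pieces; the weights are random placeholders rather than trained values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.RandomState(0)
x = rng.randn(300)                       # e.g. concatenated word embeddings in a window
W1, b1 = 0.1 * rng.randn(100, 300), np.zeros(100)
W2, b2 = 0.1 * rng.randn(2, 100), np.zeros(2)

h = np.tanh(W1 @ x + b1)                 # hidden layer (sigmoid or relu also usable)
scores = W2 @ h + b2                     # output layer
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over 2 classes
print(probs)
```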

Page 48: Text mining meets neural nets

*Training a Neural Network – Stochastic Gradient Descent

Images: http://u.cs.biu.ac.il/~yogo/nnlp.pdf; http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/
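The training loop itself can be sketched in a few lines; `grad_loss` below is a hypothetical function returning a gradient for each parameter, not part of any particular library.

```python
import random

def sgd(params, data, grad_loss, learning_rate=0.01, epochs=10):
    """params: dict of name -> numpy array; data: list of training examples."""
    for _ in range(epochs):
        random.shuffle(data)
        for example in data:                    # one example (or mini-batch) at a time
            grads = grad_loss(params, example)  # gradients of the loss w.r.t. params
            for name in params:
                params[name] -= learning_rate * grads[name]
    return params
```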

Page 49: Text mining meets neural nets

*Convolutional Neural Network for Text

Image: https://aclweb.org/anthology/P/P14/P14-2105.xhtml

Page 50: Text mining meets neural nets

*Sentence Classification with Convolutional Networks
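A minimal sketch of a Kim (2014)-style convolutional sentence classifier using Keras (one of the tools listed later); vocabulary size, embedding dimension and filter settings are illustrative.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense

model = Sequential([
    Embedding(input_dim=20000, output_dim=200),              # word embedding lookup
    Conv1D(filters=100, kernel_size=3, activation="relu"),    # filters over word windows
    GlobalMaxPooling1D(),                                     # max-over-time pooling
    Dropout(0.5),
    Dense(1, activation="sigmoid"),                           # e.g. VF vs. non-VF sentence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=50, epochs=5)  # x_train: padded word-id sequences
```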

Page 51: Text mining meets neural nets

*What’s Next?

Page 52: Text mining meets neural nets

*Survey n-dimensional Word Embedding Space

Image: http://greg.org/archive/2010/07/05/the_planck_all-sky_survey.html

Page 53: Text mining meets neural nets

*Formalize a Mathematical Model of Semantics

Image: http://riotwire.com/column/immigrants-socialists-and-semantics-oh-my/

Page 54: Text mining meets neural nets

*Tools and References

Page 55: Text mining meets neural nets

*Word Embedding Tools

* Word2Vec – command-line tool

* Gensim – Python topic-modeling library with a word2vec module

* GloVe (Global Vectors for Word Representation) – command-line tool

Page 56: Text mining meets neural nets

*Deep Learning Tools

* Theano: Python CPU/GPU symbolic expression compiler

* Torch: Scientific framework for LuaJIT

* PyLearn2: Python deep learning platform

* Lasagne: lightweight framework built on Theano

* Keras: Python library for working with Theano

* DeepDist: Deep Learning on Spark

* Deeplearning4J: Java and Scala, integrated with Hadoop and Spark

Page 57: Text mining meets neural nets

*References

*Deep Learning Bibliography - http://memkite.com/deep-learning-bibliography/

* Deep Learning Reading List – http://deeplearning.net/reading-list/

*Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).

* Goldberg, Yoav. "A Primer on Neural Network Models for Natural Language Processing." http://u.cs.biu.ac.il/~yogo/nnlp.pdf

Page 58: Text mining meets neural nets

*Q & A