Natural Language Processing (NLP)
![Page 1: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/1.jpg)
Natural Language Processing
Yuriy Guts – Jul 09, 2016
![Page 2: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/2.jpg)
Who Is This Guy?
Data Science Team Lead
Sr. Data Scientist
Software Architect, R&D Engineer
I also teach Machine Learning:
![Page 3: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/3.jpg)
What is NLP?
Study of interaction between computers and human languages
NLP = Computer Science + AI + Computational Linguistics
![Page 4: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/4.jpg)
Common NLP Tasks
Easy:
• Chunking
• Part-of-Speech Tagging
• Named Entity Recognition
• Spam Detection
• Thesaurus
Medium:
• Syntactic Parsing
• Word Sense Disambiguation
• Sentiment Analysis
• Topic Modeling
• Information Retrieval
Hard:
• Machine Translation
• Text Generation
• Automatic Summarization
• Question Answering
• Conversational Interfaces
![Page 5: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/5.jpg)
Interdisciplinary Tasks: Speech-to-Text
![Page 6: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/6.jpg)
Interdisciplinary Tasks: Image Captioning
![Page 7: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/7.jpg)
What Makes NLP so Hard?
![Page 8: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/8.jpg)
Ambiguity
![Page 9: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/9.jpg)
Non-Standard Language
Also: neologisms, complex entity names, phrasal verbs/idioms
![Page 10: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/10.jpg)
More Complex Languages Than English
• German: Donaudampfschiffahrtsgesellschaftskapitän (5 “words”)
• Chinese: 50,000 different characters (2-3k to read a newspaper)
• Japanese: 3 writing systems
• Thai: Ambiguous word boundaries and sentence concepts
• Slavic: Different word forms depending on gender, case, tense
![Page 11: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/11.jpg)
Write Traditional “If-Then-Else” Rules?
BIG NOPE!
Leads to very large and complex codebases.
Still struggles to capture cases that are trivial for a human.
![Page 12: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/12.jpg)
Better Approach: Machine Learning
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Tom M. Mitchell
![Page 13: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/13.jpg)
Part 1: Essential Machine Learning Background for NLP
![Page 14: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/14.jpg)
Before We Begin: Disclaimer
• This will be a very quick description of ML. By no means exhaustive.
• Only the essential background for what we’ll have in Part 2.
• To fit everything into a small timeframe, I’ll simplify some aspects.
• I encourage you to read ML books or watch videos to dig deeper.
![Page 15: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/15.jpg)
Common ML Tasks
1. Supervised Learning
• Regression
• Classification (Binary or Multi-Class)
2. Unsupervised Learning
• Clustering
• Anomaly Detection
• Latent Variable Models (Dimensionality Reduction, EM, …)
![Page 16: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/16.jpg)
![Page 17: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/17.jpg)
![Page 18: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/18.jpg)
Regression
Predict a continuous dependent variable based on independent predictors
![Page 19: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/19.jpg)
![Page 20: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/20.jpg)
![Page 21: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/21.jpg)
![Page 22: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/22.jpg)
Linear Regression
![Page 23: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/23.jpg)
![Page 24: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/24.jpg)
After adding polynomial features
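A minimal sketch of the idea behind the last two slides, assuming NumPy is available. The data (a noiseless quadratic) and the helper names `polynomial_features` and `fit_linear` are made up for illustration and are not from the talk. The point it tries to show: the model stays linear in its weights, only the inputs are expanded.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack columns [1, x, x^2, ..., x^degree] for a 1-D input array."""
    return np.vander(x, degree + 1, increasing=True)

def fit_linear(features, y):
    """Ordinary least squares: weights minimizing ||features @ w - y||^2."""
    weights, *_ = np.linalg.lstsq(features, y, rcond=None)
    return weights

# Illustrative data (an assumption): y depends on x quadratically.
x = np.linspace(-3.0, 3.0, 25)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# A straight line (features [1, x]) cannot follow the curve.
line_features = polynomial_features(x, 1)
w_line = fit_linear(line_features, y)
err_line = np.mean((line_features @ w_line - y) ** 2)

# The same linear model on [1, x, x^2] can.
poly_features = polynomial_features(x, 2)
w_poly = fit_linear(poly_features, y)
err_poly = np.mean((poly_features @ w_poly - y) ** 2)

print("MSE, line:", err_line)
print("MSE, degree 2:", err_poly)
```

On real, noisy data a high degree would start to overfit, which is what the evaluation slides later address.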
![Page 25: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/25.jpg)
![Page 26: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/26.jpg)
Classification
Assign an observation to some category from a known discrete list of categories
![Page 27: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/27.jpg)
Logistic Regression
Class A
Class B
(Multi-class extension = Softmax Regression)
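A small standard-library sketch of the two functions named on this slide. The helper names and the example weights are assumptions for illustration; training (fitting the weights) is left out.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multi-class extension: maps a list of scores to a probability distribution."""
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_binary(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights: the two features cancel out, so the score is 0.
p = predict_binary([1.0, -1.0], 0.0, [2.0, 2.0])
probs = softmax([1.0, 2.0, 3.0])
```

With two classes and scores `[z, 0]`, softmax reduces to the logistic function of `z`, which is why softmax regression is described as the multi-class extension.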
![Page 28: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/28.jpg)
Neural Networks and Backpropagation Algorithm
![Page 29: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/29.jpg)
http://playground.tensorflow.org/
![Page 30: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/30.jpg)
Clustering
Group objects in such a way that objects in the same group are similar, and objects in different groups are not
![Page 31: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/31.jpg)
K-Means Clustering
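A compact sketch of K-means (Lloyd's algorithm) in plain Python. The six 2-D points, the fixed iteration count and the seeded initialization are illustrative assumptions; production code would normally use a library implementation with a convergence check.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Illustrative data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
```

The number of clusters `k` has to be chosen up front, and the result can depend on the initial centroids when the groups are less clearly separated than here.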
![Page 32: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/32.jpg)
Evaluation
How do we know if an ML model is good?
What do we do if something goes wrong?
![Page 33: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/33.jpg)
Underfitting & Overfitting
![Page 34: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/34.jpg)
Development & Troubleshooting
• Picking the right metric: MAE, RMSE, AUC, Cross-Entropy (Log-Loss)
• Training Set / Validation Set / Test Set split
• Picking hyperparameters against Validation Set
• Regularization to prevent overfitting
• Plotting learning curves to check for underfitting/overfitting
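A standard-library sketch of three of the metrics above and of a three-way data split. The function names, the default 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import math
import random

def mae(y_true, y_pred):
    """Mean absolute error (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error (regression); penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def log_loss(labels, probs, eps=1e-15):
    """Binary cross-entropy; labels are 0/1, probs are predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into three disjoint parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(10))
```

Hyperparameters are compared on `val` only; `test` is touched once at the end, so that the final number is not biased by the tuning.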
![Page 35: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/35.jpg)
Deep Learning
• Core idea: instead of hand-crafting complex features, use increased computing capacity and build a deep computation graph that will try to learn feature representations on its own. End-to-end learning rather than a cascade of apps.
• Works best with lots of homogeneous, spatially related features (image pixels, character sequences, audio signal measurements). Usually works poorly otherwise.
• State-of-the-art and/or superhuman performance on many tasks.
• Typically requires massive amounts of data and training resources.
• But: a very young field. Theories not strongly established, views change.
![Page 36: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/36.jpg)
Example: Convolutional Neural Network
![Page 37: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/37.jpg)
Part 2: NLP Challenges and Approaches
![Page 38: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/38.jpg)
“Classical” NLP Pipeline
• Tokenization: break text into sentences and words, lemmatize
• Morphology: part-of-speech (POS) tagging, stemming, NER
• Syntax: constituency/dependency parsing
• Semantics: coreference resolution, word sense disambiguation
• Discourse: task-dependent (sentiment, …)
![Page 39: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/39.jpg)
Often Relies on Language Banks
• WordNet (ontology, semantic similarity tree)
• Penn Treebank (POS, grammar rules)
• PropBank (semantic propositions)
• … Dozens of them!
![Page 40: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/40.jpg)
Tokenization & Stemming
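A rough sketch of what these two steps do, using only the Python standard library. The regular expression and the suffix list are deliberately crude assumptions for illustration; this is not the Porter algorithm or what NLTK implements.

```python
import re

# Words (optionally with an apostrophe part, e.g. "weren't") or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

# Toy suffix list, checked in this order (an assumption, not a real stemmer's rules).
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered

tokens = tokenize("The cats weren't chasing dogs.")
stems = [crude_stem(t) for t in tokens]
print(tokens)  # ['The', 'cats', "weren't", 'chasing', 'dogs', '.']
print(stems)   # ['the', 'cat', "weren't", 'chas', 'dog', '.']
```

Note that a stem such as `chas` need not be a dictionary word; mapping to a real base form (`chase`) is lemmatization, which needs a lexicon.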
![Page 41: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/41.jpg)
POS/NER Tagging
![Page 42: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/42.jpg)
Parsing (LPCFG)
![Page 43: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/43.jpg)
“Classical” Way: Training a NER Tagger
Task: Predict whether the word is a PERSON, LOCATION, DATE or OTHER.
Could be more than 3 NER tags (e.g. MUC-7 contains 7 tags).
Features:
1. Current word.
2. Previous, next word (context).
3. POS tags of current word and nearby words.
4. NER label for previous word.
5. Word substrings (e.g. ends in “burg”, contains “oxa” etc.)
6. Word shape (internal capitalization, numerals, dashes etc.).
7. …on and on and on…
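The hand-crafted features listed on this slide could be extracted roughly as follows. This is a sketch: the feature names, the `<START>`/`<END>` padding markers, the example sentence and its POS tags are all assumptions for illustration, and a real tagger would feed such dictionaries into a classifier.

```python
def word_shape(word):
    """Map characters to X (upper), x (lower), d (digit); keep anything else."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)

def token_features(words, pos_tags, index, previous_label):
    """Features for one token, mirroring items 1-6 of the list above."""
    word = words[index]
    has_prev = index > 0
    has_next = index + 1 < len(words)
    return {
        "word": word.lower(),                                          # 1. current word
        "prev_word": words[index - 1].lower() if has_prev else "<START>",  # 2. context
        "next_word": words[index + 1].lower() if has_next else "<END>",
        "pos": pos_tags[index],                                        # 3. POS tags
        "prev_pos": pos_tags[index - 1] if has_prev else "<START>",
        "next_pos": pos_tags[index + 1] if has_next else "<END>",
        "prev_label": previous_label,                                  # 4. previous NER label
        "suffix4": word[-4:].lower(),                                  # 5. word substring
        "shape": word_shape(word),                                     # 6. word shape
    }

# Made-up example sentence with Penn Treebank style tags.
words = ["Yuriy", "visited", "Hamburg", "in", "2016"]
pos_tags = ["NNP", "VBD", "NNP", "IN", "CD"]
features = token_features(words, pos_tags, 2, previous_label="OTHER")
```

Every new idea (gazetteers, prefixes, wider context) means another hand-written entry in this dictionary, which is the maintenance cost the next slides complain about.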
![Page 44: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/44.jpg)
Feature Representation: Bag of Words
A single word is a one-hot encoding vector with the size of the dictionary :(
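To make the size problem concrete, here is a small standard-library sketch. The two toy documents and the helper names are assumptions for illustration.

```python
from collections import Counter

def build_vocabulary(documents):
    """Assign each distinct token an index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}

def one_hot(token, vocabulary):
    """A vector as long as the dictionary with a single 1 at the token's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[token]] = 1
    return vector

def bag_of_words(document, vocabulary):
    """Sum of the one-hot vectors of the document's tokens (word order is lost)."""
    counts = Counter(document)
    ordered_tokens = sorted(vocabulary, key=vocabulary.get)
    return [counts.get(tok, 0) for tok in ordered_tokens]

documents = [["the", "cat", "sat"], ["the", "dog", "sat", "the"]]
vocabulary = build_vocabulary(documents)   # cat=0, dog=1, sat=2, the=3
dog_vector = one_hot("dog", vocabulary)
doc_vector = bag_of_words(documents[1], vocabulary)
```

With a realistic dictionary of tens of thousands of words, each of these vectors has tens of thousands of entries, almost all of them zero.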
![Page 45: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/45.jpg)
Problem
• Manually designed features are often over-specified and incomplete, and take a long time to design and validate.
• Often requires PhD-level knowledge of the domain.
• Researchers spend literally decades hand-crafting features.
• The bag-of-words model is very high-dimensional and sparse, and cannot capture semantics or morphology.
Maybe Deep Learning can help?
![Page 46: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/46.jpg)
Deep Learning for NLP
• Core enabling idea: represent words as dense vectors: [0 1 0 0 0 0 0 0 0] → [0.315 0.136 0.831]
• Try to capture semantic and morphological similarity so that the features for “similar” words are “similar” (e.g. closer in Euclidean space).
• Natural language is context dependent: use context for learning.
• Straightforward (but slow) way: build a co-occurrence matrix and SVD it.
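The co-occurrence-plus-SVD route from the last bullet can be sketched as below, assuming NumPy is available. The three-sentence corpus, the window size of 1 and the 2-dimensional output are illustrative assumptions; on a real vocabulary the dense matrix and the full SVD are exactly what makes this approach slow.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=1):
    """Count how often each pair of words appears within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sentence[j]]] += 1
    return vocab, counts

def svd_embeddings(counts, dim):
    """Keep the top `dim` singular directions as dense word vectors."""
    u, s, _ = np.linalg.svd(counts)
    return u[:, :dim] * s[:dim]

# Toy corpus (an assumption for illustration).
sentences = [
    ["i", "like", "deep", "learning"],
    ["i", "like", "nlp"],
    ["i", "enjoy", "flying"],
]
vocab, counts = cooccurrence_matrix(sentences)
vectors = svd_embeddings(counts, dim=2)  # one 2-D vector per vocabulary word
```

Each row of `vectors` is a dense representation of the word at the same position in `vocab`, replacing a one-hot vector of length `len(vocab)`.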
![Page 47: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/47.jpg)
Embedding Methods: Word2Vec
CBoW version: predict center word from context
Skip-gram version: predict context from center word
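The difference between the two variants is easiest to see in the training examples they generate. The sketch below only builds those examples; the helper names and the window size are assumptions, and the neural network that is trained on them is not shown.

```python
def context_window(tokens, index, window):
    """Words up to `window` positions to the left and right of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right

def cbow_examples(tokens, window=2):
    """(context words, center word): predict the center word from its context."""
    return [(context_window(tokens, i, window), tokens[i]) for i in range(len(tokens))]

def skipgram_examples(tokens, window=2):
    """(center word, context word): predict each context word from the center word."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window)
    ]

tokens = ["the", "quick", "brown", "fox"]
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)
print(cbow[1])       # (['the', 'brown'], 'quick')
print(skipgram[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

In practice one would use an existing implementation such as the one in Gensim rather than generate pairs by hand.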
![Page 48: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/48.jpg)
Benefits
• Learns features of each word on its own, given a text corpus.
• No heavy preprocessing is required, just a corpus.
• Word vectors can be used as features for lots of supervised learning applications: POS, NER, chunking, semantic role labeling. All with pretty much the same network architecture.
• Similarities and linear relationships between word vectors.
• A somewhat more recent representation: GloVe, though it requires more RAM.
![Page 49: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/49.jpg)
Linearities
![Page 50: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/50.jpg)
Training a NER Tagger: Deep Learning
Just replace this with NER tag (or POS tag, chunk end, etc.)
![Page 51: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/51.jpg)
Language Modeling
Assign high probabilities to well-formed sentences
(crucial for text generation, speech recognition, machine translation)
![Page 52: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/52.jpg)
“Classical” Way: N-Grams
Problem: doesn’t scale well to bigger N. N = 5 is pretty much the limit.
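A minimal count-based n-gram model, as a sketch. The three-sentence corpus, the `<s>`/`</s>` padding symbols and the unsmoothed maximum-likelihood estimate are assumptions for illustration; real systems add smoothing and back-off.

```python
from collections import Counter

def train_ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word contexts, padding sentence boundaries."""
    ngram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>"] * (n - 1) + sentence + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts

def probability(word, context, ngram_counts, context_counts):
    """Maximum-likelihood estimate of P(word | context); 0.0 for unseen contexts."""
    context = tuple(context)
    if context_counts[context] == 0:
        return 0.0
    return ngram_counts[context + (word,)] / context_counts[context]

corpus = [["i", "like", "nlp"], ["i", "like", "tea"], ["you", "like", "nlp"]]
ngrams, contexts = train_ngram_counts(corpus, n=2)
p_nlp = probability("nlp", ["like"], ngrams, contexts)   # 2 of 3 "like" contexts
p_unseen = probability("nlp", ["tea"], ngrams, contexts)  # never observed

# Why bigger N hurts: a vocabulary of V words has V**n possible n-grams.
possible_5grams = 10_000 ** 5
```

Most of those possible n-grams never occur in any corpus, so their counts are zero and the table grows faster than the data that could fill it.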
![Page 53: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/53.jpg)
Deep Learning Way: Recurrent NN (RNN)
Can use past information without restricting the size of the context.
But: in practice, can’t recall information that came in a long time ago.
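The recurrence behind a vanilla RNN can be sketched in a few lines, assuming NumPy is available. The sizes, the random weights and the random inputs are placeholders (assumptions); this shows only the forward pass, not training.

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

def rnn_forward(inputs, hidden_size, W_xh, W_hh, b_h):
    """Run the same step over a sequence of any length, carrying the hidden state."""
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Placeholder sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

inputs = [rng.normal(size=input_size) for _ in range(5)]
states = rnn_forward(inputs, hidden_size, W_xh, W_hh, b_h)
```

The hidden state `h` is the only channel to the past, and it is squashed and re-multiplied by `W_hh` at every step, which is one way to see why old information tends to fade.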
![Page 54: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/54.jpg)
Long Short-Term Memory Network (LSTM)
Contains gates that control forgetting, adding, updating and outputting information.
Surprisingly strong performance on language tasks compared to a vanilla RNN.
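A sketch of a single LSTM step, assuming NumPy is available and assuming the common formulation with forget, input and output gates (the slide does not say which variant it depicts). Packing all four weight blocks into one matrix `W` is a layout choice made here for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,);
    the four row blocks hold the forget, input, candidate and output parameters.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:hidden])                 # forget gate: what to drop from the cell
    i = sigmoid(z[hidden:2 * hidden])        # input gate: how much new content to add
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate values to add
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate: what to expose
    c = f * c_prev + i * g                   # updated cell state
    h = o * np.tanh(c)                       # updated hidden state
    return h, c

# With all-zero parameters every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
x = np.ones(3)
h_prev = np.zeros(2)
c_prev = np.array([2.0, -2.0])
W = np.zeros((8, 5))
b = np.zeros(8)
h, c = lstm_step(x, h_prev, c_prev, W, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than squashed at every step, a forget gate close to 1 lets information survive over many steps.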
![Page 55: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/55.jpg)
Tackling Hard Tasks
Deep Learning enables end-to-end learning for Machine Translation, Image Captioning, Text Generation, Summarization: NLP tasks which are inherently very hard!
RNN for Machine Translation
![Page 56: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/56.jpg)
Hottest Current Research
• Attention Networks
• Dynamic Memory Networks
(see ICML 2016 proceedings)
![Page 57: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/57.jpg)
Tools I Used
• NLTK (Python)
• Gensim (Python)
• Stanford CoreNLP (Java with bindings)
• Apache OpenNLP (Java with bindings)
Deep Learning Frameworks with GPU Support:
• Torch (Torch-RNN) (Lua)
• TensorFlow, Theano, Keras (Python)
![Page 58: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/58.jpg)
NLP Progress for Ukrainian
• Ukrainian lemma dictionary with POS tags: https://github.com/arysin/dict_uk
• Ukrainian lemmatizer plugin for ElasticSearch: https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
• lang-uk project (1M corpus, NER, tokenization, etc.): https://github.com/lang-uk
![Page 59: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/59.jpg)
Demo 1: Exploring Semantic Properties Of ASOIAF (“Game of Thrones”)
Demo 2: Topic Modeling for DOU.UA Comments
![Page 60: Natural Language Processing (NLP)](https://reader034.fdocuments.in/reader034/viewer/2022042619/5870639b1a28ab48378b4739/html5/thumbnails/60.jpg)
GitHub Repos with IPython Notebooks
• https://github.com/YuriyGuts/thrones2vec
• https://github.com/YuriyGuts/dou-topic-modeling