Deep Learning & NLP: Graphs to the Rescue!
![Page 1: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/1.jpg)
Deep Learning & NLP: Graphs to the Rescue! (or not yet…)
Roelof Pieters, KTH/CSC, Graph Technologies R&D
Stockholm, SICS, October 21 2014
Twitter: @graphific
www.csc.kth.se/~roelof/
![Page 2: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/2.jpg)
Definitions
Machine Learning
Improving some task T based on experience E with respect to performance measure P. - T. Mitchell (1997)
Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of similar tasks) more effectively the next time. - H. Simon (1983)
![Page 3: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/3.jpg)
Definitions
Representation learning
Attempts to automatically learn good features or representations
Deep learning
Attempts to learn multiple levels of representation of increasing complexity/abstraction
![Page 4: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/4.jpg)
Overview
1. From Machine Learning to Deep Learning
2. Natural Language Processing
3. Graph-Based Approaches to DL+NLP
![Page 5: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/5.jpg)
1. From Machine Learning to Deep Learning
![Page 9: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/9.jpg)
Perceptron
• Rosenblatt 1957
• Minsky & Papert 1969
The world believed Minsky & Papert…
![Page 10: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/10.jpg)
2nd gen Perceptron
• Quest to make it non-linear
• no result…
Until finally…
• Rumelhart, Hinton & Williams, 1986
• Multi-Layered Perceptrons (MLP) !!!
• Backpropagation (Bryson & Ho 1969) (Rumelhart, Hinton & Williams, 1986)
![Page 11: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/11.jpg)
• Forward Propagation:
• Sum inputs, produce activation, feed-forward
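A minimal sketch of the forward pass in plain Python: each unit sums its weighted inputs, applies an activation, and feeds the result to the next layer. The sigmoid activation, the 2-2-1 layout and the weight values are assumptions made for illustration; the slide does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_layer(inputs, weights, biases):
    """One layer: weighted sum of the inputs per unit, then the activation."""
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

def forward(inputs, layers):
    """Feed activations forward through a list of (weights, biases) layers."""
    activation = inputs
    for weights, biases in layers:
        activation = forward_layer(activation, weights, biases)
    return activation

# Hypothetical 2-2-1 network with hand-picked weights (illustration only)
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer: 2 units
    ([[1.0, -1.0]], [0.0]),                   # output layer: 1 unit
]
output = forward([1.0, 0.0], layers)
```

The output is a single activation between 0 and 1; how it is interpreted (class probability, score) depends on the task.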
![Page 12: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/12.jpg)
• Back Propagation of Error
• Calculate total error at the top
• Calculate contributions to error at each step going backwards
![Page 13: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/13.jpg)
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural network using the training pattern target in order to generate the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight-synapse, follow these steps:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Subtract a ratio (percentage) of the gradient from the weight.
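The two phases can be sketched for a tiny network with one hidden layer. This is an illustration under stated assumptions, not a reference implementation: sigmoid units, squared error, no biases, a single output unit, and a learning rate `lr` standing in for the "ratio" are all choices made here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One propagation plus weight update; weights are modified in place.

    w_hidden: one list of input weights per hidden unit
    w_out:    weights from the hidden units to the single output unit
    Returns the squared error measured before the update.
    """
    # Phase 1, step 1: forward propagation
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

    # Phase 1, step 2: backward propagation, deltas for output and hidden units
    delta_out = (out - target) * out * (1.0 - out)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]

    # Phase 2: gradient = output delta * input activation; subtract a ratio of it
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] -= lr * delta_hidden[j] * xi

    return 0.5 * (out - target) ** 2

# Hypothetical starting weights and a single training pattern
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]
errors = [train_step([1.0, 0.0], 1.0, w_hidden, w_out) for _ in range(50)]
```

Repeating the step on the same pattern should drive the error down, which is a quick sanity check that the deltas have the right sign.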
![Page 15: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/15.jpg)
Perceptron Network: SVM
• Vapnik et al. 1992; 1995.
• Cortes & Vapnik 1995
Kernel SVM
Source: Cortes & Vapnik 1995
![Page 19: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/19.jpg)
“2006”
• Faster machines (GPUs!)
• More data
• New methods for unsupervised pre-training
![Page 20: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/20.jpg)
“2006”
• New methods for unsupervised pre-training
• Stacked RBMs (Deep Belief Networks [DBNs])
• Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
• Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.
• (Stacked) Autoencoders
• Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19.
![Page 21: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/21.jpg)
Pretraining: Stacked RBMs
• Iterative pre-training construction of Deep Belief Network (DBN) (Hinton et al., 2006)
from: Larochelle et al. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation.
![Page 22: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/22.jpg)
Pretraining: Stacked Denoising Auto-encoder
• Stacking Auto-Encoders
from: Bengio ICML 2009
![Page 28: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/28.jpg)
Pretraining: Stacked Denoising Auto-encoder
• (Vincent et al, 2008)
• Good vs Corrupted context
[Figure, from Vincent et al 2010: raw input, corrupted input, hidden code (representation), reconstruction; loss: KL(reconstruction | raw input)]
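The pipeline in the figure (raw input, corrupted input, hidden code, reconstruction, loss against the raw input) can be sketched as a single untrained pass. Assumptions made here for illustration: masking noise as the corruption, sigmoid units, tied encoder/decoder weights, and a cross-entropy reconstruction loss for binary inputs in place of the KL term on the slide.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, drop_prob, rng):
    """Masking noise: set each input to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def encode(x, W, b):
    """Hidden code: one sigmoid unit per row of W."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def decode(h, W, c):
    """Reconstruction with tied weights (the transpose of W)."""
    n_inputs = len(W[0])
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i])
            for i in range(n_inputs)]

def cross_entropy(x, z, eps=1e-12):
    """Loss of reconstruction z measured against the raw, uncorrupted x."""
    return -sum(xi * math.log(zi + eps) + (1 - xi) * math.log(1 - zi + eps)
                for xi, zi in zip(x, z))

rng = random.Random(0)
raw = [1.0, 0.0, 1.0, 1.0]
W = [[rng.uniform(-0.5, 0.5) for _ in raw] for _ in range(2)]  # 2 hidden units
b, c = [0.0, 0.0], [0.0] * len(raw)

noisy = corrupt(raw, drop_prob=0.5, rng=rng)
code = encode(noisy, W, b)
reconstruction = decode(code, W, c)
loss = cross_entropy(raw, reconstruction)
```

The key point is that the encoder only ever sees `noisy`, while the loss compares the reconstruction with `raw`; training would adjust `W`, `b` and `c` to reduce that loss.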
![Page 29: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/29.jpg)
![Page 30: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/30.jpg)
Convolutional Neural Networks (CNNs)
• Fukushima 1980; LeCun et al. 1998; Behnke 2003; Simard et al. 2003…
• Hinton et al. 2006; Bengio et al. 2007; Ranzato et al. 2007
• Sparse connectivity
• Shared weights
• MaxPooling
(Figures from http://deeplearning.net/tutorial/lenet.html)
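The three ingredients listed above can be shown on a 1D signal in plain Python. This is a toy sketch: the signal and the filter values are made up, and real CNNs work on 2D images with learned filters.

```python
def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation, as in most CNN code).

    Sparse connectivity: each output sees only len(kernel) neighbouring inputs.
    Shared weights: the same kernel is reused at every position.
    """
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]

signal = [0, 1, 2, 3, 2, 1, 0]
edge_kernel = [-1, 0, 1]  # hypothetical hand-picked filter
feature_map = conv1d(signal, edge_kernel)  # [2, 2, 0, -2, -2]
pooled = max_pool1d(feature_map, 2)        # [2, 0]
```

Sharing one 3-weight kernel across all positions is what keeps the parameter count small compared with a fully connected layer over the same input.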
![Page 31: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/31.jpg)
Pretraining
• Why does Pretraining work so well? (Erhan et al. 2010)
• Better Generalisation
(without unsupervised pretraining vs. with unsupervised pretraining)
Figures from Erhan et al. 2010
![Page 32: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/32.jpg)
Pretraining
Figures from Erhan et al. 2010
![Page 33: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/33.jpg)
“I’ve worked all my life in Machine Learning, and I’ve never seen one algorithm knock over benchmarks like Deep Learning”
- Andrew Ng
![Page 34: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/34.jpg)
The (god)fathers of DL
![Page 42: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/42.jpg)
DL: (Every)where ?
• Language Modeling (2012, Mikolov et al)
• Image Recognition (Krizhevsky won 2012 ImageNet competition)
• Sentiment Classification (2011, Socher et al)
• Speech Recognition (2010, Dahl et al)
• MNIST hand-written digit recognition (Ciresan et al, 2010)
![Page 43: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/43.jpg)
![Page 44: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/44.jpg)
So: Why Deep?
Deep Architectures can be representationally efficient
• Fewer computational units for same function
Deep Representations might allow for a hierarchy of representation
• Allows non-local generalisation
• Comprehensibility
Multiple levels of latent variables allow combinatorial sharing of statistical strength
![Page 45: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/45.jpg)
So: Why Deep?
Generalizing better to new tasks & domains
Can learn good intermediate representations shared across tasks
Distributed representations
Unsupervised Learning
Multiple levels of representation
![Page 46: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/46.jpg)
Different Levels of Abstraction
• Hierarchical Learning
• Natural progression from low level to high level structure as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces
• A good lower level representation can be used for many distinct tasks
![Page 48: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/48.jpg)
Generalizable Learning
• Shared Low Level Representations
• Multi-Task Learning
• Unsupervised Training
• Partial Feature Sharing
• Mixed Mode Learning
• Composition of Functions
![Page 49: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/49.jpg)
No More Handcrafted Features !
![Page 50: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/50.jpg)
2. Natural Language Processing
![Page 51: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/51.jpg)
DL + NLP
• Language Modeling
• Bengio et al. (2000, 2003): via Neural network
• Mnih and Hinton (2007): via RBMs
• POS, Chunking, NER, SRL
• Collobert and Weston 2008
• Socher et al 2011; Socher 2014
![Page 52: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/52.jpg)
Language Modeling
• Word Embeddings (Bengio et al, 2001; Bengio et al, 2003) based on idea of distributed representations for symbols (Hinton 1986)
• Neural Word embeddings (Turian et al 2010; Collobert et al. 2011)
![Page 53: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/53.jpg)
Word Embeddings
• Collobert & Weston 2008; Collobert et al. 2011
• similar to word vector learning, but uses a Softmax/Maxent classifier instead of a single scalar score
Word embeddings from lookup table. From Collobert et al. 2011
![Page 54: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/54.jpg)
Word Embeddings
• Collobert & Weston 2008; Collobert et al. 2011
• similar to word vector learning, but uses a Softmax/Maxent classifier instead of a single scalar score
Figure from Socher et al. Tutorial ACL 2012.
![Page 55: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/55.jpg)
Figure from Socher et al. Tutorial ACL 2012.
![Page 56: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/56.jpg)
• window approach
• sentence approach
source: Collobert & Weston, Deep Learning for Natural Language Processing. NIPS 2009
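A rough sketch of the window approach: each word is mapped to a vector through a lookup table, and the vectors of the words in a fixed window around the target word are concatenated into one input vector for the network. The embedding size, the window width, the `<PAD>` token and the random initialisation are assumptions for illustration, not details taken from the slides.

```python
import random

def build_lookup_table(vocab, dim, seed=0):
    """Randomly initialised embedding per word; training would adjust these."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for word in sorted(vocab)}

def window_features(tokens, position, table, half_width=1, pad="<PAD>"):
    """Concatenate the embeddings of the words in a window around `position`."""
    padded = [pad] * half_width + tokens + [pad] * half_width
    window = padded[position:position + 2 * half_width + 1]
    features = []
    for word in window:
        features.extend(table[word])
    return features

tokens = ["the", "cat", "sat"]
table = build_lookup_table(set(tokens) | {"<PAD>"}, dim=4)
x = window_features(tokens, position=1, table=table)  # window: the, cat, sat
```

The sentence approach replaces the fixed window with a convolution over the whole sentence followed by max pooling, so that the input length no longer matters.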
![Page 57: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/57.jpg)
• Multi-task learning
source: Collobert & Weston, Deep Learning for Natural Language Processing. NIPS 2009
![Page 63: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/63.jpg)
General Deep Architecture for NLP
Basic features
Embeddings
Convolution
Max pooling
“Supervised” learning
source: Collobert & Weston, Deep Learning for Natural Language Processing. NIPS 2009
![Page 64: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/64.jpg)
Word Embeddings
• Unsupervised Word Representations (Turian et al 2010)
• evaluates Brown clusters, C&W (Collobert and Weston 2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words -> Brown clusters win out by a small margin on both NER and chunking.
• more info: http://metaoptimize.com/projects/wordreprs/
![Page 65: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/65.jpg)
t-SNE visualizations of word embeddings. Left: Number Region; Right: Jobs Region. From Turian et al. 2011
![Page 66: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/66.jpg)
http://metaoptimize.com/projects/wordreprs/
![Page 67: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/67.jpg)
Word Embeddings
• Collobert & Weston 2008; Collobert et al. 2011
• Propose a unified neural network architecture for many NLP tasks:
• part-of-speech tagging, chunking, named entity recognition, and semantic role labeling
• no hand-made input features
• learns internal representations on the basis of vast amounts of mostly unlabeled training data.
![Page 68: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/68.jpg)
Word Embeddings
• Recurrent Neural Network (Mikolov et al. 2010; Mikolov et al. 2013a)
W("woman") − W("man") ≃ W("aunt") − W("uncle")
W("woman") − W("man") ≃ W("queen") − W("king")
Figures from Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
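The vector-offset regularity can be illustrated with a toy example. The 3-d vectors below are hand-made so that one axis encodes gender; they are not real embeddings, and learned vectors only satisfy the relation approximately.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, constructed for illustration only
W = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "uncle": [0.6, 0.0, 0.5],
    "aunt":  [0.6, 1.0, 0.5],
}

def analogy(a, b, c):
    """Return the word closest to W[b] - W[a] + W[c], excluding the inputs."""
    target = add(sub(W[b], W[a]), W[c])
    candidates = [w for w in W if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(W[w], target))

answer = analogy("man", "woman", "king")  # "queen" for these toy vectors
```

Excluding the query words from the candidates is the usual convention in analogy tests, since the nearest neighbour of the target vector is otherwise often one of the inputs.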
![Page 69: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/69.jpg)
• Mikolov et al. 2013b
Figures from Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b). Efficient Estimation of Word Representations in Vector Space
![Page 70: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/70.jpg)
Word Embeddings
• Recursive (Tensor) Network (Socher et al. 2011; Socher 2014)
![Page 71: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/71.jpg)
Vector Space Model
![Page 72: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/72.jpg)
![Page 73: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/73.jpg)
![Page 74: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/74.jpg)
![Page 75: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/75.jpg)
![Page 76: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/76.jpg)
![Page 77: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/77.jpg)
![Page 78: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/78.jpg)
![Page 79: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/79.jpg)
3. Graph-Based Approaches to DL+NLP
• A) NLP “naturally encoded”
• B) Genetic Finite State Machine
• C) Neural net within Graph
![Page 80: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/80.jpg)
Graph-Based NLP
• Graphs have a “natural affinity” with NLP [ feel free to quote me on that ;) ]
• relation-oriented
• index-free adjacency
![Page 81: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/81.jpg)
What's in a Graph?
Figure from Buerli & Obispo (2012).
![Page 82: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/82.jpg)
What's in a Graph?
• Graph Databases: Neo4j, OrientDB, InfoGrid, Titan, FlockDB, ArangoDB, InfiniteGraph, AllegroGraph, DEX, GraphBase, and HyperGraphDB
• Distributed graph processing toolkits (based on MapReduce, HDFS, and custom BSP engines): Bagel, Hama, Giraph, PEGASUS, Faunus, Flink
• In-memory graph packages designed for massive shared-memory (NetworkX, Gephi, MTGL, Boost, uRika, and STINGER)
![Page 83: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/83.jpg)
A. NLP “naturally encoded”
Natural Affinity, Say what?
• e.g. graph-based opinion summarization (Ganesan et al. 2010; Ganesan 2013)
• Captures:
• Redundancies
• Gapped Subsequences
• Collapsible Structures
From Ganesan 2013
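To make the "redundancies" point concrete, here is a much simplified word graph in plain Python: nodes are words, and a directed edge counts how often one word directly follows another across sentences. This is only a sketch of the general idea; it is not the algorithm of Ganesan et al., and the example reviews are made up.

```python
from collections import defaultdict

def build_word_graph(sentences):
    """Directed graph: node = lowercased word, edge weight = adjacency count."""
    edges = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            edges[(left, right)] += 1
    return edges

# Hypothetical review snippets
reviews = [
    "the battery life is great",
    "battery life is very good",
    "the battery life is great overall",
]
graph = build_word_graph(reviews)

# Edges shared by several sentences expose redundant phrasing
redundant = {edge: n for edge, n in graph.items() if n > 1}
```

Heavily weighted paths such as battery -> life -> is are natural candidates for a summary phrase, which is the intuition behind using the graph for summarization.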
![Page 84: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/84.jpg)
Summarization Graph
From Ganesan 2013
![Page 85: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/85.jpg)
Natural Affinity?
• Demo time!
![Page 86: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/86.jpg)
B. Finite State Graph
• Bastani 2014a; 2014b; 2014c
• Probabilistic feature hierarchy
• Grammatical inference by genetic algorithms
more info: https://github.com/kbastani/graphify
Figure from Bastani 2014a
![Page 87: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/87.jpg)
Finite State Graph
• Bastani 2014
• training phase:
all figures from Bastani 2014b
![Page 91: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/91.jpg)
• sentiment analysis
• error: 0.3
Figure from Bastani 2014c
![Page 92: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/92.jpg)
Conceptual Hierarchical Graph
• Demo time!
![Page 93: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/93.jpg)
C. Factor Graph
• Factor graph in which the factors themselves contain a deep neural net.
• Factor graph:
• bipartite graph representing the factorization of a function (Kschischang et al. 2001; Frey 2002)
• can combine Bayesian networks (BNs) and Markov random fields (MRFs).
Figure from Frey 2002
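A small sketch of what "factorization of a function" means in practice: a global function over three binary variables is written as a product of two local factors, and each factor node is linked only to the variables in its scope. The factors `f_a` and `f_b` are hypothetical; in the "deep factor" setting each would be computed by a neural net.

```python
from itertools import product

# Hypothetical local factors over binary variables x1, x2, x3
def f_a(x1, x2):
    return 2.0 if x1 == x2 else 1.0

def f_b(x2, x3):
    return 3.0 if x2 != x3 else 1.0

# Bipartite structure: each factor node lists the variable nodes it touches
factors = [(f_a, ("x1", "x2")), (f_b, ("x2", "x3"))]
variables = ("x1", "x2", "x3")

def joint(assignment):
    """Unnormalised global function: the product of all local factors."""
    value = 1.0
    for factor, scope in factors:
        value *= factor(*(assignment[v] for v in scope))
    return value

def marginal(var):
    """Brute-force marginal of one variable (fine only for tiny graphs)."""
    totals = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        totals[assignment[var]] += joint(assignment)
    z = sum(totals.values())
    return {value: total / z for value, total in totals.items()}

p_x2 = marginal("x2")
```

Message-passing algorithms such as sum-product exploit the same structure to avoid the exhaustive enumeration used here.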
![Page 94: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/94.jpg)
Factor Graph
• Factor graph with “deep factors” (Mirowski & LeCun 2009)
• Dynamic Time Series modeling
![Page 95: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/95.jpg)
Energy-Based Graph
• LeCun et al. 1998, handwriting recognition system
• “Graph Transformer Networks”
• Instead of normalised HMM, energy based factor graph (without normalization)
• LeCun et al. 2006.
• Energy-Based Learning
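The "without normalization" point can be illustrated in a few lines: picking the best answer only needs the lowest energy, while turning energies into probabilities requires summing over all candidates. The candidate labels and energy values are made up, and the Gibbs form with `beta` is the standard conversion, used here as an assumption about how one would normalise if needed.

```python
import math

def energies_to_probs(energies, beta=1.0):
    """Gibbs distribution: P(y) proportional to exp(-beta * E(y))."""
    weights = {y: math.exp(-beta * e) for y, e in energies.items()}
    z = sum(weights.values())  # partition function, the costly part in general
    return {y: w / z for y, w in weights.items()}

# Hypothetical energies assigned to three candidate transcriptions
energies = {"cat": 0.5, "cot": 2.0, "cut": 3.5}

best = min(energies, key=energies.get)  # inference: no normalisation needed
probs = energies_to_probs(energies)     # only needed if probabilities are wanted
```

For three candidates the sum is trivial, but over all possible label sequences it can be intractable, which is one motivation for staying in the energy domain.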
![Page 99: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/99.jpg)
And finally…
What you’ve all been waiting for…
Which Net is currently the Biggest ?
the Deepest ?
The most Bad-ass ?
![Page 102: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/102.jpg)
Winners of: Large Scale Visual Recognition Challenge 2014 (ILSVRC2014), 19 September 2014
GoogLeNet
[Figure: network diagram; legend: Convolution, Pooling, Softmax, Other]
source: Szegedy et al. Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
![Page 103: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/103.jpg)
Inception
• Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules.
• Filters per module: 256, 480, 480, 512, 512, 512, 832, 832, 1024
• Can remove fully connected layers on top completely
• Number of parameters is reduced to 5 million
• Computational cost is increased by less than 2X compared to Krizhevsky’s network (<1.5Bn operations/evaluation).
source: Szegedy et al. Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
![Page 104: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/104.jpg)
71
Classification results on ImageNet 2012

| Team | Year | Place | Error (top-5) | Uses external data |
|---|---|---|---|---|
| SuperVision | 2012 | - | 16.4% | no |
| SuperVision | 2012 | 1st | 15.3% | ImageNet 22k |
| Clarifai | 2013 | - | 11.7% | no |
| Clarifai | 2013 | 1st | 11.2% | ImageNet 22k |
| MSRA | 2014 | 3rd | 7.35% | no |
| VGG | 2014 | 2nd | 7.32% | no |
| GoogLeNet | 2014 | 1st | 6.67% | no |
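
The top-5 error reported above counts an image as correct when the true label is among the model's five highest-scoring classes. A minimal sketch of that metric in plain Python follows; the function name and the toy scores are hypothetical and have nothing to do with the actual ILSVRC data.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k best scores.

    scores: list of per-class score lists, one list per example
    labels: list of true class indices, one per example
    """
    misses = 0
    for example_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(example_scores)),
                        key=lambda c: example_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 6 classes.
toy_scores = [
    [0.1, 0.5, 0.2, 0.05, 0.1, 0.05],   # true class 1 has the highest score
    [0.3, 0.2, 0.2, 0.15, 0.1, 0.05],   # true class 5 has the lowest score
    [0.0, 0.1, 0.2, 0.3, 0.25, 0.15],   # true class 4 has the second-highest score
]
toy_labels = [1, 5, 4]

top5 = top_k_error(toy_scores, toy_labels, k=5)
top1 = top_k_error(toy_scores, toy_labels, k=1)
print(top5, top1)
```

In the toy data the third image is a miss under top-1 but a hit under top-5, which is why top-5 error is always less than or equal to top-1 error.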
Final detection results

| Team | Year | Place | mAP | External data | Ensemble | Contextual model | Approach |
|---|---|---|---|---|---|---|---|
| UvA-Euvision | 2013 | 1st | 22.6% | none | ? | yes | Fisher vectors |
| Deep Insight | 2014 | 3rd | 40.5% | ILSVRC12 Classification + Localization | 3 models | yes | ConvNet |
| CUHK DeepID-Net | 2014 | 2nd | 40.7% | ILSVRC12 Classification + Localization | ? | no | ConvNet |
| GoogLeNet | 2014 | 1st | 43.9% | ILSVRC12 Classification | 6 models | no | ConvNet |
source: Szegedy et al. Going deeper with convolutions (GoogLeNet ), ILSVRC2014, 19 Sep 2014
![Page 105: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/105.jpg)
Wanna Play?
• cuda-convnet2 (Alex Krizhevsky, Toronto) (C++/CUDA, optimized for GTX 580) https://code.google.com/p/cuda-convnet2/
• Caffe (Berkeley) (Cuda/OpenCL, Theano, Python) http://caffe.berkeleyvision.org/
• OverFeat (NYU) http://cilvr.nyu.edu/doku.php?id=code:start
![Page 106: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/106.jpg)
Wanna Play?
• Theano - CPU/GPU symbolic expression compiler in Python (from LISA lab at University of Montreal). http://deeplearning.net/software/theano/
• Pylearn2 - a library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/
• Torch - provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/
• more info: http://deeplearning.net/software_links/
(slide partially stolen from: J. Sullivan, Convolutional Neural Networks & Computer Vision, Machine Learning meetup at Spotify, Stockholm, June 9 2014)
![Page 107: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/107.jpg)
Fin.
Questions / Discussion … ?
![Page 108: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/108.jpg)
Bibliography: Definitions
• Mitchell, T. M. (1997). Machine Learning (1st ed.). New York, NY, USA: McGraw-Hill, Inc.
• Simon, H.A. (1983). Why should machines learn? in: Machine Learning: An Artificial Intelligence Approach, (R. Michalski, J. Carbonell, T. Mitchell, eds) Tioga Press, 25-38.
![Page 109: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/109.jpg)
Bibliography: History
• Rosenblatt, Frank (1957), The Perceptron--a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory.
• Minsky & Papert (1969), Perceptrons: an introduction to computational geometry.
• Bryson, A.E.; W.F. Denham; S.E. Dreyfus (1963) Optimal programming problems with inequality constraints. I: Necessary conditions for extremal solutions. AIAA J. 1, 11 2544-2550.
• Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536.
• Boser, B. E., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press.
• Cortes, C. and Vapnik, V. (1995), Support-vector networks. Machine Learning, 20:273–297.
• Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. In Proceedings of the 24th International Conference on Machine Learning (pp. 473–480). New York, NY, USA: ACM.
• Vincent, P., Larochelle, H., & Lajoie, I. (2010), Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
![Page 110: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/110.jpg)
Bibliography: History - CNNs
• Fukushima, Kunihiko (1980). "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". Biological Cybernetics 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364.
• LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE 86 (11): 2278–2324.
• S. Behnke. Hierarchical Neural Networks for Image Interpretation, volume 2766 of Lecture Notes in Computer Science. Springer, 2003.
• Simard, Patrice, David Steinkraus, and John C. Platt. "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis." In ICDAR, vol. 3, pp. 958-962. 2003.
• Hinton, GE; Osindero, S; Teh, YW (Jul 2006). "A fast learning algorithm for deep belief nets". Neural Computation 18 (7): 1527–54.
• Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007). "Greedy Layer-Wise Training of Deep Networks". Advances in Neural Information Processing Systems: 153–160.
• Ranzato, MarcAurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann (2007). "Efficient Learning of Sparse Representations with an Energy-Based Model". Advances in Neural Information Processing Systems.
![Page 111: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/111.jpg)
Bibliography: DL
• Bengio, Y., Ducharme, R., & Vincent, P. (2001). A Neural Probabilistic Language Model. In T. K. Leen & T. G. Dietterich (Eds.), Advances in Neural Information Processing Systems 13 (NIPS’00). MIT Press.
• Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A Neural Probabilistic Language Model. The Journal of Machine Learning Research, 3, 1137–1155.
• Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19.
• Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society (Vol. 1, p. 12).
• Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504–507, 28 July 2006.
• Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
• Erhan, D., Bengio, Y., & Courville, A. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625–660.
![Page 112: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/112.jpg)
Bibliography: DL
• Vincent, P., Larochelle, H., Bengio, Y. and Manzagol, P. A. (2008) Extracting and composing robust features with denoising autoencoders. In ICML.
• Vincent, P., Larochelle, H., & Lajoie, I. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
• Bengio (2009)
• Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012) Imagenet classification with deep convolutional neural networks. In NIPS.
• Socher, Richard, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
• Dahl, G. E., Ranzato, M. A., Mohamed, A. and Hinton, G. E. (2010) Phone recognition with the mean-covariance restricted Boltzmann machine. In NIPS.
• Ciresan, D. C., Meier, U., Gambardella, L. M., & Schmidhuber, J. (2010). Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. CoRR.
• Szegedy et al. (2014) Going deeper with convolutions (GoogLeNet), ILSVRC2014, 19 Sep 2014
![Page 113: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/113.jpg)
Bibliography: NLP
• Turian, J., Ratinov, L., & Bengio, Y. (2010). Word Representations: A Simple and General Method for Semi-supervised Learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384–394). Stroudsburg, PA, USA: Association for Computational Linguistics.
• Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. Proceedings of the 25th International Conference ….
• Collobert, R., Weston, J., & Bottou, L. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493-2537.
• Collobert, R., & Weston, J. (2009). Deep Learning for Natural Language Processing. NIPS Tutorial.
• Mikolov, T., Yih, W., & Zweig, G. (2013a). Linguistic Regularities in Continuous Space Word Representations. HLT-NAACL, (June), 746–751.
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b). Efficient Estimation of Word Representations in Vector Space, 1–12. Computation and Language.
![Page 114: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/114.jpg)
Bibliography: NLP
• Bengio, Y. and Bengio, S. (2000) Modeling high-dimensional discrete data with multi-layer neural networks. In Proceedings of NIPS 12.
• Mnih, A. and Hinton, G. E. (2007) Three New Graphical Models for Statistical Language Modelling. International Conference on Machine Learning, Corvallis, Oregon.
• Socher, R., Bengio, Y., & Manning, C. (2012). Deep Learning for NLP (without Magic). Tutorial Abstracts of ACL 2012.
• Socher, R. (2014). Recursive Deep Learning for Natural Language Processing and Computer Vision. Dissertation.
![Page 115: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/115.jpg)
Bibliography: Graph-Based Approaches
• Frey, B. (2003). Extending factor graphs so as to unify directed and undirected graphical models. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 03), Morgan Kaufmann, CA, Acapulco, Mexico, 257–264.
• Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
• Mirowski, P., & LeCun, Y. (2009). Dynamic factor graphs for time series modeling. Machine Learning and Knowledge Discovery.
• LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE November 1998.
• LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. A., & Huang, F. J. (2006). A Tutorial on Energy-Based Learning, 1–59.
![Page 116: Deep Learning & NLP: Graphs to the Rescue!](https://reader034.fdocuments.in/reader034/viewer/2022052303/547e71145906b5b5718b468c/html5/thumbnails/116.jpg)
Bibliography: Graph-Based Approaches
• Buerli, M. (2012). The current state of graph databases. Department of Computer Science, Cal Poly San Luis Obispo.
• Ganesan, K., Zhai, C., & Han, J. (2010). Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), (August), 340–348.
• Ganesan, K. (2013). Opinion Driven Decision Support System. PhD Dissertation, University of Illinois.
• Bastani, K. (2014a). Hierarchical Pattern Recognition. Blog: Meaning Of, June 17, 2014.
• Bastani, K. (2014b). Using a Graph Database for Deep Learning Text Classification. Blog: Meaning Of, August 26, 2014.
• Bastani, K. (2014c). Deep Learning Sentiment Analysis for Movie Reviews using Neo4j. Blog: Meaning Of, September 15, 2014.