Lecture 6: Neural Networks (aritter.github.io/courses/5525_slides_v2/lec6-nn.pdf)
Lecture 6: Neural Networks
Alan Ritter (many slides from Greg Durrett)
This Lecture
‣ Feedforward neural networks + backpropagation
‣ Neural network basics
‣ Applications
‣ Neural network history
‣ Implementing neural networks (if time)
History: NN "dark ages"
‣ Convnets: applied to MNIST by LeCun in 1998
‣ LSTMs: Hochreiter and Schmidhuber (1997)
‣ Henderson (2003): neural shift-reduce parser, not SOTA
2008-2013: A glimmer of light…
‣ Collobert and Weston 2011: "NLP (almost) from scratch"
‣ Feedforward neural nets induce features for sequential CRFs ("neural CRF")
‣ The 2008 version was marred by bad experiments and claimed SOTA but wasn't; the 2011 version tied SOTA
‣ Socher 2011-2014: tree-structured RNNs working okay
‣ Krizhevsky et al. (2012): AlexNet for vision
2014: Stuff starts working
‣ Sutskever et al. + Bahdanau et al.: seq2seq for neural MT (LSTMs work for NLP?)
‣ Kim (2014) + Kalchbrenner et al. (2014): sentence classification / sentiment (convnets work for NLP?)
‣ Chen and Manning (2014): transition-based dependency parser (even feedforward networks work well for NLP?)
‣ 2015: explosion of neural nets for everything under the sun
Whydidn’ttheyworkbefore?
‣ Datasetstoosmall:forMT,notreallybe5erun?lyouhave1M+parallelsentences(andreallyneedalotmore)
‣Op,miza,onnotwellunderstood:goodini?aliza?on,per-featurescaling+momentum(Adagrad/Adadelta/Adam)workbestout-of-the-box
‣ Regulariza,on:dropoutispre5yhelpful
‣ Inputs:needwordrepresenta?onstohavetherightcon?nuousseman?cs
‣ Computersnotbigenough:can’trunforenoughitera?ons
Neural Net Basics
Neural Networks
‣ How can we do nonlinear classification? Kernels are too slow…
‣ Want to learn intermediate conjunctive features of the input
‣ Linear classification: argmax_y w^T f(x, y)
‣ Example: for "the movie was not all that good", a useful conjunctive feature is I[contains not & contains good]
Neural Networks: XOR
‣ Let's see how we can use neural nets to learn a simple nonlinear function
‣ Inputs: x1, x2 (generally x = (x1, …, xm))
‣ Output: y (generally y = (y1, …, yn)); here y = x1 XOR x2

x1  x2  |  y = x1 XOR x2
 0   0  |  0
 0   1  |  1
 1   0  |  1
 1   1  |  0
Neural Networks: XOR
‣ A linear function y = a1 x1 + a2 x2 cannot represent XOR; the closest it can get is "or"
‣ Add a nonlinear term: y = a1 x1 + a2 x2 + a3 tanh(x1 + x2) (tanh looks like the action potential in a neuron)
Neural Networks: XOR
‣ y = a1 x1 + a2 x2 can only act as "or"; with the nonlinear term, y = a1 x1 + a2 x2 + a3 tanh(x1 + x2), XOR becomes representable
‣ For example: y = -x1 - x2 + 2 tanh(x1 + x2)
(check: y(0,0) = 0, y(0,1) = y(1,0) = -1 + 2 tanh(1) ≈ 0.52, y(1,1) = -2 + 2 tanh(2) ≈ -0.07, matching XOR)
Neural Networks: XOR
‣ The same construction handles the sentiment example: let x1 = I[contains not] and x2 = I[contains good] for "the movie was not all that good"
y = -2x1 - x2 + 2 tanh(x1 + x2)
Neural Networks
Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
‣ Linear model: y = w · x + b
‣ Nonlinear transformation: y = g(w · x + b); with a whole layer of hidden units, y = g(Wx + b)
‣ In the figure: Wx warps space, + b shifts it, and g applies a pointwise nonlinear transformation
Neural Networks
Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
‣ Linear classifier vs. neural network: the neural network separates the classes … possible because we transformed the space!
Deep Neural Networks
Adopted from Chris Dyer
‣ First layer: y = g(Wx + b); the output of the first layer is the input to the second
‣ Second layer: z = g(Vy + c), i.e. z = g(Vg(Wx + b) + c)
‣ "Feedforward" computation (not recurrent)
‣ Check: what happens if there is no nonlinearity? Then z = V(Wx + b) + c = (VW)x + (Vb + c), which is no more powerful than a basic linear model
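To make the check concrete, here is a small sketch (pure Python, made-up numbers) showing that with no nonlinearity, two stacked layers z = V(Wx + b) + c collapse into the single linear map (VW)x + (Vb + c):

```python
# Sketch (not from the slides): with g = identity, two stacked layers
# collapse into one linear layer, so no nonlinearity means no extra power.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def vadd(a, b):
    return [p + q for p, q in zip(a, b)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W = [[1.0, 2.0], [3.0, 4.0]]; b = [0.5, -0.5]
V = [[2.0, 0.0], [1.0, 1.0]]; c = [1.0, 2.0]
x = [0.3, -0.7]

# Two "layers" with no nonlinearity: z = V(Wx + b) + c
z_two_layer = vadd(matvec(V, vadd(matvec(W, x), b)), c)

# Equivalent single linear layer: z = (VW)x + (Vb + c)
z_one_layer = vadd(matvec(matmul(V, W), x), vadd(matvec(V, b), c))

assert all(abs(p - q) < 1e-9 for p, q in zip(z_two_layer, z_one_layer))
```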
Deep Neural Networks
Taken from http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Feedforward Networks, Backpropagation
Logistic Regression with NNs
‣ Single scalar probability: P(y|x) = exp(w^T f(x, y)) / Σ_{y'} exp(w^T f(x, y'))
‣ Compute scores for all possible labels at once (returns a vector): P(y|x) = softmax([w^T f(x, y)]_{y ∈ Y})
‣ softmax exps and normalizes a given vector: softmax(p)_i = exp(p_i) / Σ_{i'} exp(p_{i'})
‣ Weight vector per class; W is [num classes x num feats]: P(y|x) = softmax(Wf(x))
‣ Now one hidden layer: P(y|x) = softmax(Wg(Vf(x)))
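A minimal sketch of the softmax above in plain Python (subtracting the max before exponentiating is a standard numerical-stability trick, not something from the slide):

```python
import math

def softmax(p):
    m = max(p)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(pi - m) for pi in p]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
assert abs(sum(probs) - 1.0) < 1e-12  # exps and normalizes: sums to 1
assert probs[2] > probs[1] > probs[0]  # order of scores is preserved
```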
Neural Networks for Classification
P(y|x) = softmax(Wg(Vf(x)))
‣ f(x): n features
‣ V: d x n matrix mapping to d hidden units
‣ g: nonlinearity (tanh, relu, …), giving z = g(Vf(x))
‣ W: num_classes x d matrix
‣ softmax over the num_classes scores gives the vector of probabilities P(y|x)
Training Neural Networks
‣ Maximize log likelihood of training data
‣ i*: index of the gold label
‣ e_i: 1 in the ith row, zero elsewhere; dotting by this selects the ith index
z = g(Vf(x)),  P(y|x) = softmax(Wz)
L(x, i*) = log P(y = i*|x) = log(softmax(Wz) · e_{i*})
L(x, i*) = Wz · e_{i*} - log Σ_j exp(Wz) · e_j
Computing Gradients
L(x, i*) = Wz · e_{i*} - log Σ_j exp(Wz) · e_j
‣ Gradient with respect to W (row i, column j):
∂L(x, i*)/∂W_ij = z_j - P(y = i|x) z_j   if i = i*
∂L(x, i*)/∂W_ij = -P(y = i|x) z_j        otherwise
‣ Looks like logistic regression with z as the features!
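The gradient formula can be sanity-checked numerically. This sketch uses made-up values for W, z, and the gold label, and compares the analytic ∂L/∂W_ij against central finite differences of the loss:

```python
import math

z = [0.5, -1.0, 2.0]       # hidden activations (assumed values)
W = [[0.1, 0.2, -0.3],
     [0.4, -0.5, 0.6]]     # 2 classes x 3 hidden units (assumed values)
gold = 0                   # i*, index of the gold label

def loss(W):
    # L(x, i*) = (Wz) . e_{i*} - log sum_j exp((Wz)_j)
    scores = [sum(W[i][j] * z[j] for j in range(len(z))) for i in range(len(W))]
    return scores[gold] - math.log(sum(math.exp(s) for s in scores))

scores = [sum(W[i][j] * z[j] for j in range(len(z))) for i in range(len(W))]
total = sum(math.exp(s) for s in scores)
P = [math.exp(s) / total for s in scores]

eps = 1e-6
for i in range(len(W)):
    for j in range(len(z)):
        # analytic: (1[i == i*] - P(y=i|x)) * z_j, covering both cases above
        analytic = ((1.0 if i == gold else 0.0) - P[i]) * z[j]
        W[i][j] += eps
        up = loss(W)
        W[i][j] -= 2 * eps
        down = loss(W)
        W[i][j] += eps  # restore
        numeric = (up - down) / (2 * eps)
        assert abs(analytic - numeric) < 1e-5
```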
Neural Networks for Classification
P(y|x) = softmax(Wg(Vf(x)))
‣ Pipeline: f(x) → V → g → z → W → softmax → P(y|x)
‣ ∂L/∂W is computed from the hidden activations z
Computing Gradients: Backpropagation
z = g(Vf(x)): activations at the hidden layer
‣ Gradient with respect to V: apply the chain rule
L(x, i*) = Wz · e_{i*} - log Σ_j exp(Wz) · e_j
err(root) = e_{i*} - P(y|x)   (dim = m, the number of classes)
∂L(x, i*)/∂z = err(z) = W^T err(root)   (dim = d)
∂L(x, i*)/∂V_ij = (∂L(x, i*)/∂z) (∂z/∂V_ij)   [some math…]
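The error-signal identity can be checked the same way. A sketch with assumed values, comparing err(z) = W^T err(root) against finite differences of the loss with respect to z:

```python
import math

W = [[0.1, 0.2, -0.3],
     [0.4, -0.5, 0.6]]    # 2 classes x 3 hidden units (assumed values)
z = [0.5, -1.0, 2.0]      # hidden activations (assumed values)
gold = 1                  # i*, index of the gold label

def loss(z):
    scores = [sum(W[i][j] * z[j] for j in range(len(z))) for i in range(len(W))]
    return scores[gold] - math.log(sum(math.exp(s) for s in scores))

scores = [sum(W[i][j] * z[j] for j in range(len(z))) for i in range(len(W))]
total = sum(math.exp(s) for s in scores)
# err(root) = e_{i*} - P(y|x)
err_root = [(1.0 if i == gold else 0.0) - math.exp(scores[i]) / total
            for i in range(len(W))]
# err(z) = W^T err(root)
err_z = [sum(W[i][j] * err_root[i] for i in range(len(W))) for j in range(len(z))]

eps = 1e-6
for j in range(len(z)):
    z[j] += eps
    up = loss(z)
    z[j] -= 2 * eps
    down = loss(z)
    z[j] += eps  # restore
    numeric = (up - down) / (2 * eps)
    assert abs(err_z[j] - numeric) < 1e-5
```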
Backpropagation: Picture
P(y|x) = softmax(Wg(Vf(x)))
‣ err(root) flows back through W, giving ∂L/∂W and err(z)
‣ Can forget everything after z, treat it as the output, and keep backpropping
Backpropagation: Takeaways
‣ Gradients of output weights W are easy to compute: it looks like logistic regression with the hidden layer z as the feature vector
‣ Can compute the derivative of the loss with respect to z to form an "error signal" for backpropagation
‣ Easy to update parameters based on the "error signal" from the next layer; keep pushing the error signal back as backpropagation
‣ Need to remember the values from the forward computation
Applications
NLP with Feedforward Networks
Botha et al. (2017)
‣ Part-of-speech tagging with FFNNs
‣ Example: "Fed raises interest rates in order to…"
‣ f(x) is built from word embeddings: emb(raises) for the previous word, emb(interest) for the current word, emb(rates) for the next word, plus other words, feats, etc.
‣ Word embeddings for each word form the input
‣ ~1000 features here: a smaller feature vector than in sparse models, but every feature fires on every example
‣ The weight matrix learns position-dependent processing of the words
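A sketch of how such an input vector might be assembled (the word list and two-dimensional embeddings here are invented toys, not Botha et al.'s actual features):

```python
# Hypothetical toy embeddings for the words in the example sentence.
emb = {
    "raises":   [0.1, 0.3],
    "interest": [0.7, -0.2],
    "rates":    [0.0, 0.5],
}

def window_features(prev_word, curr_word, next_word):
    # Concatenation (not averaging) keeps the three positions distinct,
    # so the weight matrix can learn position-dependent processing.
    return emb[prev_word] + emb[curr_word] + emb[next_word]

f_x = window_features("raises", "interest", "rates")
assert len(f_x) == 3 * 2  # three positions, each a 2-dim embedding
```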
NLP with Feedforward Networks
‣ Hidden layer mixes these different signals and learns feature conjunctions
Botha et al. (2017)
NLP with Feedforward Networks
‣ Multilingual tagging results: Botha et al. (2017)
‣ Gillick used LSTMs; this is smaller, faster, and better
Sentiment Analysis
‣ Deep Averaging Networks: feedforward neural network on the average of word embeddings from the input
Iyyer et al. (2015)
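A toy sketch of the averaging idea (the embeddings and hidden-layer weights are invented, not Iyyer et al.'s model): average the word vectors, then feed the average through a feedforward layer.

```python
import math

# Hypothetical toy embeddings, two dimensions per word.
emb = {
    "the": [0.1, 0.0], "movie": [0.2, 0.4],
    "was": [0.0, 0.1], "good": [0.9, -0.3],
}

def average_embedding(words):
    # Average the word vectors dimension by dimension.
    vecs = [emb[w] for w in words]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))]

avg = average_embedding("the movie was good".split())

# One hidden layer on top of the average: z = tanh(V * avg + b), toy weights.
V = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.1]
z = [math.tanh(sum(V[i][d] * avg[d] for d in range(2)) + b[i]) for i in range(2)]
```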
Sentiment Analysis
[Results table: bag-of-words baselines vs. tree RNNs / CNNs / LSTMs]
Wang and Manning (2012), Kim (2014), Iyyer et al. (2015)
Coreference Resolution
‣ Feedforward networks identify coreference arcs
‣ Example: "President Obama signed…" … "He later gave a speech…": does He refer to President Obama?
Clark and Manning (2015), Wiseman et al. (2015)
Implementation Details
Computation Graphs
‣ Computing gradients is hard!
‣ Automatic differentiation: instrument code to keep track of derivatives
y = x * x  →(codegen)→  (y, dy) = (x * x, 2 * x * dx)
‣ Computation is now something we need to reason about symbolically
‣ Use a library like PyTorch or TensorFlow. This class: PyTorch
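The "instrument code to keep track of derivatives" idea can be sketched in a few lines with forward-mode dual numbers, mirroring the (y, dy) = (x * x, 2 * x * dx) codegen example above:

```python
class Dual:
    """Carries a (value, derivative) pair through the computation."""
    def __init__(self, val, dval):
        self.val, self.dval = val, dval

    def __mul__(self, other):
        # product rule: d(u*v) = u*dv + du*v
        return Dual(self.val * other.val,
                    self.val * other.dval + self.dval * other.val)

x = Dual(3.0, 1.0)  # seed dx = 1 to get dy/dx
y = x * x           # (y, dy) = (x * x, 2 * x * dx)
assert y.val == 9.0
assert y.dval == 6.0  # dy/dx = 2x = 6 at x = 3
```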
Computation Graphs in PyTorch
P(y|x) = softmax(Wg(Vf(x)))
‣ Define the forward pass for P(y|x):

class FFNN(nn.Module):
    def __init__(self, inp, hid, out):
        super(FFNN, self).__init__()
        self.V = nn.Linear(inp, hid)
        self.g = nn.Tanh()
        self.W = nn.Linear(hid, out)
        self.softmax = nn.Softmax(dim=0)

    def forward(self, x):
        return self.softmax(self.W(self.g(self.V(x))))
Computation Graphs in PyTorch
P(y|x) = softmax(Wg(Vf(x)))

ffnn = FFNN(inp, hid, out)

def make_update(input, gold_label):
    ffnn.zero_grad()  # clear gradient variables
    probs = ffnn.forward(input)
    loss = torch.neg(torch.log(probs)).dot(gold_label)
    loss.backward()
    optimizer.step()

‣ gold_label is e_{i*}: a one-hot vector of the label (e.g., [0, 1, 0])
Training a Model
Define a computation graph
For each epoch:
    For each batch of data:
        Compute loss on batch
        Autograd to compute gradients and take step
Decode test set
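The loop above, sketched in pure Python for a toy linear-softmax model with hand-coded gradients and invented data (a real implementation would use PyTorch autograd and an optimizer):

```python
import math

# Toy, made-up dataset of (features, gold label) pairs; "batches" of size 1.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]  # 2 classes x 2 features
lr = 0.5

def probs_for(x):
    scores = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    total = sum(math.exp(s) for s in scores)
    return [math.exp(s) / total for s in scores]

for epoch in range(50):                  # for each epoch
    for x, gold in data:                 # for each batch of data
        P = probs_for(x)                 # compute loss (via probabilities)
        for i in range(len(W)):          # gradient step; autograd would
            for j in range(len(x)):      # compute these derivatives for us
                grad = ((1.0 if i == gold else 0.0) - P[i]) * x[j]
                W[i][j] += lr * grad     # ascend the log likelihood

# "decode the test set": here we just check the model fits the training data
predictions = [max(range(2), key=lambda i: probs_for(x)[i]) for x, _ in data]
assert predictions == [g for _, g in data]
```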
Batching
‣ Batching data gives speedups due to more efficient matrix operations
‣ Need to make the computation graph process a batch at the same time
‣ Batch sizes from 1-100 often work well

def make_update(input, gold_label):
    # input is [batch_size, num_feats]
    # gold_label is [batch_size, num_classes]
    ...
    probs = ffnn.forward(input)  # [batch_size, num_classes]
    loss = torch.sum(torch.neg(torch.log(probs)) * gold_label)  # sum over batch
    ...
Next Time
‣ More implementation details: practical training techniques
‣ Word representations / word vectors
‣ word2vec, GloVe