Texas A&M Institute of Data Science Tutorial Workshop Series
Introduction to Deep Learning
by Boris Hanin
June 12, 2020
Deep Learning Tutorial

Outline:
① What is a NN?
② How are NNs used?
③ What kinds of choices are made by engineers?
④ Main use cases and failure modes?

Supervised Learning:
① Data Acquisition: obtain a dataset D = {(x_i, f(x_i))} for an unknown target function f
② Model Selection: choose a model x ↦ N(x; θ), where θ = vector of parameters
   e.g. N(x; θ) = a_1 x_1 + ... + a_n x_n, θ = (a_i)  (linear regression)
③ Optimize: use D to obtain θ*; goal: f(x) ≈ N(x; θ*)
④ Testing: see how well θ* performs on unseen data
• Salient feature: NNs both interpolate and extrapolate.
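The four-step recipe above, with the linear-regression model N(x; θ) = a_1 x_1 + ... + a_n x_n, can be sketched in a few lines of NumPy (the coefficients and dataset here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# ① Data acquisition: D = {(x_i, f(x_i))} for an (unknown-to-us) target f
true_a = np.array([2.0, -1.0, 0.5])      # illustrative "true" coefficients
X = rng.normal(size=(100, 3))
y = X @ true_a                            # f(x_i)

# ② + ③ Model selection + optimization: for the linear model,
# least squares gives theta* directly
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)

# ④ Testing: check f(x) ≈ N(x; theta*) on unseen data
X_test = rng.normal(size=(10, 3))
test_error = np.max(np.abs(X_test @ theta_star - X_test @ true_a))
print(theta_star, test_error)
```

With noiseless data the recovered θ* matches the true coefficients to machine precision; real datasets would leave a residual.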
Neural Nets:
• Neural nets are built of "neurons."
• A neuron computes z(x; θ) = σ(b + w_1 x_1 + ... + w_n x_n), where
  x = (x_1, ..., x_n), θ = (b, w); b = bias, w = weights
• e.g. σ(t) = ReLU(t) = max{0, t}
• Def: A neural network is a collection of neurons and a wiring diagram
  ("architecture"); θ = (all b's, all W's)
[Diagram: inputs x_1, x_2 wired into a small network of neurons]
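A single neuron per the definition above, z(x; θ) = σ(b + w·x) with σ = ReLU, is a one-liner (variable names are mine):

```python
import numpy as np

def relu(t):
    # sigma(t) = ReLU(t) = max{0, t}
    return np.maximum(0.0, t)

def neuron(x, w, b):
    # z(x; theta) = sigma(b + w_1 x_1 + ... + w_n x_n), theta = (b, w)
    return relu(b + np.dot(w, x))

x = np.array([1.0, -2.0, 3.0])
w = np.array([0.5, 0.5, 0.5])
print(neuron(x, w, b=0.0))   # relu(0.5 - 1.0 + 1.5) = 1.0
print(neuron(x, w, b=-5.0))  # relu(1.0 - 5.0) = 0.0
```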
• In practice NNs have "layers":
  input → 1st layer → 2nd layer → ... → N(x; θ)   (L = # layers)
• Layers lead to hierarchical representations.
• Empirically: deeper is better.

Training recipe:
① Dataset: D = {(x_i, f(x_i))}
② Architecture: choose σ's, wiring diagram, depth, width
③ Randomly initialize θ = {W, b}; optimize ℓ by gradient descent:
   ℓ(θ) = Σ_i ℓ_i(θ),   ℓ_i(θ) = |f(x_i) − N(x_i; θ)|²
④ Testing: draw new (x, f(x)) and check whether N(x; θ*) ≈ f(x)
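The training recipe above can be run end-to-end on a toy problem. This sketch (my choices of target, width, and learning rate throughout) fits f(x) = |x| with a one-hidden-layer ReLU net, writing the gradients out by hand:

```python
import numpy as np

rng = np.random.default_rng(1)

# ① Dataset: D = {(x_i, f(x_i))} with target f(x) = |x|
x = np.linspace(-1.0, 1.0, 50)
y = np.abs(x)

# ② Architecture: one hidden ReLU layer of width 8
# ③ Random initialization of theta = {W, b, v}
W = rng.normal(size=8)
b = rng.normal(size=8)
v = 0.1 * rng.normal(size=8)

def loss():
    h = np.maximum(0.0, np.outer(x, W) + b)   # hidden activations
    return np.mean((h @ v - y) ** 2)

loss_before = loss()

# ③ Gradient descent on l(theta) = sum_i |f(x_i) - N(x_i; theta)|^2
lr = 0.05
for _ in range(2000):
    pre = np.outer(x, W) + b          # pre-activations, shape (50, 8)
    h = np.maximum(0.0, pre)
    err = h @ v - y                   # N(x_i; theta) - f(x_i)
    gv = h.T @ err / len(x)           # gradient w.r.t. output weights
    gh = np.outer(err, v) * (pre > 0) # backprop through the ReLU
    gW = x @ gh / len(x)
    gb = gh.sum(axis=0) / len(x)
    W -= lr * gW; b -= lr * gb; v -= lr * gv

loss_after = loss()
print(loss_before, loss_after)
```

Step ④ would repeat the final loss computation on freshly drawn points.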
Main Use Cases:
① NLP:
   • e.g. f("the cat is big") = "le chat est grand"
   • Google Translate, Siri, Chat Bots
② Computer Vision:
   • x = image; e.g. f(x) = 1 if x has a human, 0 otherwise
   • Self-Driving Cars, Facial Rec
③ Reinforcement Learning:
   • x = state of system (e.g. position of chess board)
   • f(x) = reward-maximizing action (e.g. best next move)
   • AlphaGo
   • Exploration by AI
Neural Net Optimization:
• GD: δW = −λ ∂ℓ/∂W
• λ = learning rate
• Compute ∂ℓ/∂W using "backpropagation"
• SGD: pick a batch B ⊂ D and descend on ℓ_B(θ) = Σ_{i∈B} ℓ_i(θ);
  smaller batches mean less computation
[Diagram: gradient descent steps on a loss surface]

Key: How to choose λ and |B|?
Intuition:
① small λ ↔ slow but accurate; large λ ↔ fast but noisy
② Why choose the same λ for all params? ℓ(θ) might be very sensitive to θ_1 but not to θ_2.
③ In practice: find λ by grid search on log-space
④ Why keep λ constant during training?
⑤ small |B| ↔ noisy but fast; large |B| ↔ accurate but slow
⑥ λ, |B| are inversely related (λ · |B| ≈ const)
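Points ③ and ⑤ above can be demonstrated together: run minibatch SGD on a simple least-squares problem and grid-search λ on a log scale. The problem sizes, batch size, and step counts are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(512, 4))
theta_true = np.array([1.0, -2.0, 0.5, 3.0])  # made-up target coefficients
y = X @ theta_true

def sgd(lam, batch=32, steps=300):
    theta = np.zeros(4)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch)     # draw minibatch B
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ theta - yb) / batch   # d l_B / d theta
        theta -= lam * grad                           # delta = -lambda * grad
    return np.mean((X @ theta - y) ** 2)              # full-dataset loss

# grid search over lambda in log-space
for lam in [1e-4, 1e-3, 1e-2, 1e-1]:
    print(lam, sgd(lam))
```

The printout shows the tradeoff: very small λ barely moves in 300 steps, while larger λ converges quickly on this well-conditioned problem.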
Architecture Selection:
• The best architecture is always data-dependent:
  - NLP ↦ Transformer-based (LSTM + Attention), Recurrent
  - CV ↦ Convolutional, Residual
• This still leaves many choices:
  - details of wiring (width, depth, ...)
  - choice of σ (e.g. ReLU)
  - how to initialize and how to optimize
• Empirical: deep is good, but often less stable:
  ∂ℓ/∂W is a product of Jacobians, one per layer, so |∂ℓ/∂W| → 0 or +∞
  as # layers grows ("exploding / vanishing gradients")
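The instability above is easy to see numerically: multiply together one random Jacobian per layer and watch the norm collapse or blow up with depth. The width, depth, and scale values are my choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_norm(depth, scale):
    # Product of `depth` random layer Jacobians (width-64 linear layers,
    # entries scaled so the per-layer gain is roughly `scale`).
    J = np.eye(64)
    for _ in range(depth):
        J = (scale * rng.normal(size=(64, 64)) / np.sqrt(64)) @ J
    return np.linalg.norm(J)

for depth in [1, 10, 50]:
    # scale < 1: vanishing gradients; scale > 1: exploding gradients
    print(depth, grad_norm(depth, scale=0.5), grad_norm(depth, scale=2.0))
```

At depth 50 the two columns differ by dozens of orders of magnitude, which is exactly why deep nets need careful initialization and architectural fixes like residual connections.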
Residual Networks:
• output = x + N_1(x): the input is added back to the block's output
  ("skip connection")
• Intuition: N_j = j-th order correction to x
[Diagram: residual block with x routed around N_1 and recombined by addition]
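A minimal residual block in NumPy, assuming (as is common, though not spelled out in the notes) a one-layer ReLU block and a small initialization so the block starts near the identity:

```python
import numpy as np

rng = np.random.default_rng(4)
W1 = 0.1 * rng.normal(size=(16, 16))   # small init: block ~ identity at start
b1 = np.zeros(16)

def residual_block(x):
    # output = x + N_1(x): the block learns a correction to the identity
    correction = np.maximum(0.0, W1 @ x + b1)   # N_1(x), one ReLU layer
    return x + correction                        # skip connection (addition)

x = rng.normal(size=16)
out = residual_block(x)
print(np.linalg.norm(out - x))   # the size of the learned correction
```

Stacking such blocks gives the "j-th order correction" picture: each block nudges the running representation rather than replacing it.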
Convolutional Nets:
• In a ConvNet every layer has channels, with x(i, j) spatial structure
• Inputs are n×n RGB images (channels R, G, B)
• Every neuron in a layer shares weights: all neurons in the 1st layer of a
  channel look for the same pattern
• Key: images are hierarchical
[Diagram: filters scanning across the R, G, B channels of an image]
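Weight sharing is the defining feature: in the sketch below (a plain valid-mode 2-D convolution on a single channel, with made-up sizes), 36 output neurons are all computed from the same 9 weights.

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.normal(size=(8, 8))    # one channel of an n x n image
k = rng.normal(size=(3, 3))      # a single 3x3 filter

def conv2d_valid(img, k):
    # Every output neuron applies the SAME 3x3 weights (weight sharing),
    # so each one "looks for the same pattern" at a different location.
    n = img.shape[0] - 2
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out

out = conv2d_valid(img, k)
print(out.shape)   # (6, 6): 36 neurons, only 9 shared weights
```

A real ConvNet layer would sum such convolutions over input channels and stack many filters to produce output channels.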
Challenges (e.g. pushing accuracy from 89.9% → 99.9%):
① New Use Cases:
   • PDE (fluids, physics, chemistry, ...)
   • Biology (genomics)
② Distribution Shift (the nature of the data changes):
   • sunny vs. cloudy
   • change in hardware
   • issue: NNs tend to be brittle
[Images: STOP sign and fire hydrant examples]