STEREO - Washington University in St. Louisayan/courses/cse559a/PDFs/lec...For a reference patch in...


CSE 559A: Computer Vision

Fall 2020: T-R: 11:30-12:50pm @ Zoom

Instructor: Ayan Chakrabarti ([email protected]).
Course Staff: Adith Boloor, Patrick Williams

Dec 10, 2020

http://www.cse.wustl.edu/~ayan/courses/cse559a/


ANNOUNCEMENTS

Last recitation, for PSET 5, this Friday!
Make sure you are all working on final projects. Leave yourself enough time to write the report.


OBJECT DETECTION

Newer methods also use a neural network to generate region proposals.

Efficient Implementations: bulk of the computation happens once on the entire image, and you crop a feature map for each region.

Even Faster Methods: Discretize image locations into a grid, and directly output up to a fixed number of bounding boxes for each grid block.
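As a rough illustration of the "compute once, crop per region" idea, the sketch below maps a box given in image pixels onto a shared feature map and slices out the matching features. The array shapes, the stride of 16, the example boxes, and the helper name `crop_region_features` are all assumptions made for illustration and are not taken from the slides. Real detectors typically also resample each crop to a fixed size before classifying it.

```python
import numpy as np

def crop_region_features(feature_map, box, stride):
    """Slice the part of a (H, W, C) feature map covered by an image-space box.

    box is (x0, y0, x1, y1) in image pixels; stride is the (assumed) total
    downsampling factor between the image and the feature map.
    """
    x0, y0, x1, y1 = box
    h, w = feature_map.shape[:2]
    # Round outward so the crop covers the whole box, then clip to the map.
    fx0 = max(int(np.floor(x0 / stride)), 0)
    fy0 = max(int(np.floor(y0 / stride)), 0)
    fx1 = min(int(np.ceil(x1 / stride)), w)
    fy1 = min(int(np.ceil(y1 / stride)), h)
    return feature_map[fy0:fy1, fx0:fx1, :]

# Stand-in for the output of a convolutional backbone run once on a 224x224 image.
rng = np.random.default_rng(0)
features = rng.standard_normal((14, 14, 8))  # 224 / 16 = 14

# Hypothetical region proposals, in image pixels.
proposals = [(0, 0, 64, 64), (100, 50, 200, 120)]
crops = [crop_region_features(features, b, stride=16) for b in proposals]
shapes = [c.shape for c in crops]
```

The expensive part (producing `features`) runs once; each additional region only costs a slice.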


TRANSFER LEARNING

Say you want to train a network to solve a problem.

The task is complex, so you need a large network.
But you don’t have enough training data to train such a network.

Pick a related task for which you do have a lot of training data.
ImageNet is a great database for this for a variety of semantic tasks.

Train a network (like VGG-16) to solve that task.
Then, choose the output of some intermediate layer of that network.
Use it as a feature vector, and learn a smaller network for your problem which goes from those features to the desired output.

VGG-16 does well on ImageNet classification and gives you a feature representation that is surprisingly useful for a broad range of tasks.

Remember computing an encoding $\tilde{x}$ from $x$. VGG-16’s pool5, fc1, fc2 features can be the $\tilde{x}$ for many tasks.

One can also initialize a network with the VGG-16 architecture to one trained with ImageNet, and then finetune by replacing the final layer with a classification layer for another task.

In general, it is an empirical question to determine when training on Task A will provide good features for Task B.
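The "frozen features plus a small task-specific model" recipe can be sketched as follows. This is a toy under loud assumptions: a fixed random layer (`W_frozen`) stands in for a network pre-trained on a large related task, the data is synthetic, and the small model is plain logistic regression trained by gradient descent. None of the names or sizes come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a network pre-trained on a large related task (e.g. VGG-16 on
# ImageNet). Here it is just a fixed random layer; this is an assumption made
# so the example runs without any pre-trained weights.
d_in, d_feat = 8, 16
W_frozen = rng.standard_normal((d_in, d_feat)) / np.sqrt(d_in)

def encode(x):
    """Frozen feature extractor: intermediate activations used as the encoding."""
    return np.maximum(x @ W_frozen, 0.0)

# Small labeled dataset for the new task (synthetic, for illustration only).
x = rng.standard_normal((200, d_in))
y = (x[:, 0] > 0).astype(float)

feats = encode(x)                                  # computed once, never updated
feats = np.hstack([feats, np.ones((len(x), 1))])   # append a bias feature

def logistic_loss(w):
    z = feats @ w
    # Mean of log(1 + exp(z)) - y * z, written in a numerically stable form.
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Learn only the small task-specific classifier on top of the fixed features.
w = np.zeros(feats.shape[1])
initial_loss = logistic_loss(w)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w -= 0.1 * feats.T @ (p - y) / len(y)
final_loss = logistic_loss(w)
```

Only `w` is updated; `W_frozen` is never touched. Finetuning would instead start from the pre-trained weights and update them as well, usually with a small learning rate.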


OTHER TASKS

FULLY-CONVOLUTIONAL NETWORKS

But what about downsampling?

Option 0: Just don’t use downsampling.

Bad, because down-sampling is a way to quickly increase the receptive field of your network.

Option 1: Just produce a label map at lower-resolution.

Option 2: If you downsample by $N$ (typically $N = 2^K$), feed every shifted version of your input (shifts of up to $(N - 1) \times (N - 1)$) through this FCN.

Bad because if you down-sample multiple times, you’re still re-computing activations prior to the last down-sampling.

Option 3: Dilated Convolutions
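To make the link between Option 2 and Option 3 concrete, here is a minimal 1-D sketch. It assumes a toy "network" that downsamples by $N = 2$ and then applies a single filter; the helper `correlate_valid` and the signal and kernel sizes are hypothetical. Running that network on every shift of the input and interleaving the outputs gives the same result as dropping the downsampling and dilating the kernel by $N$.

```python
import numpy as np

def correlate_valid(x, k, dilation=1):
    """1-D 'valid' cross-correlation with kernel taps spaced `dilation` apart."""
    span = dilation * (len(k) - 1)
    return np.array([
        sum(k[j] * x[i + dilation * j] for j in range(len(k)))
        for i in range(len(x) - span)
    ])

rng = np.random.default_rng(0)
x = rng.standard_normal(12)
k = rng.standard_normal(3)
N = 2  # downsampling factor

# Option 2: run the "downsample by N, then filter" network on every shift
# of the input and interleave the low-resolution outputs.
stitched = np.empty(len(x) - N * (len(k) - 1))
for s in range(N):
    stitched[s::N] = correlate_valid(x[s::N], k)

# Option 3: skip the downsampling and dilate the kernel by N instead.
dilated = correlate_valid(x, k, dilation=N)

same = np.allclose(stitched, dilated)
```

The dilated version produces the full-resolution output in one pass, and with several downsampling stages it avoids recomputing the earlier layers for each shift.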


DILATED CONVOLUTION

SEMANTIC SEGMENTATION

DEEP ARCHITECTURES

With BatchNorm

He et al., Identity Mappings in Deep Residual Networks. 2016.

MORE ABOUT ARCHITECTURES

TRANSPOSE CONVOLUTION

As the name suggests, it is the transpose of the operation of convolution with stride.
In fact, this represents the operation for back-propagating gradients through a convolution-with-stride layer.

Let's go back to our matrix-vector notation, and represent convolution with $A_k$ and downsampling with $A_s$.

What is the transpose of this operation $y = A_s A_k x$? Of $A_s A_k$?

$$(A_s A_k)^T = A_k^T A_s^T$$

What does $A_s^T$ represent?

Upsampling by filling in zeros. $A_k^T$ is still convolution (with a flipped kernel, but that doesn’t matter).

So a convolution-transpose layer effectively does up-sampling with zeros, and then a regular convolution.
But up-sampling with zeros often leads to artifacts. Newer architectures don’t use convolution transpose.
Instead, they do bilinear or nearest-neighbor interpolation on the feature maps to increase resolution, and then do a regular convolution.
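The identity can be checked numerically on a small 1-D case. The sketch below builds explicit matrices for a "valid" correlation ($A_k$) and a stride-2 downsampling ($A_s$), then compares the transpose of their product against "fill in zeros, then convolve". The sizes, the stride, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, stride = 8, 2
k = rng.standard_normal(3)

# A_k: 'valid' correlation with kernel k, as an (n - 2) x n matrix.
A_k = np.zeros((n - len(k) + 1, n))
for i in range(A_k.shape[0]):
    A_k[i, i:i + len(k)] = k

# A_s: keep every `stride`-th entry.
rows = np.arange(0, A_k.shape[0], stride)
A_s = np.zeros((len(rows), A_k.shape[0]))
A_s[np.arange(len(rows)), rows] = 1.0

g = rng.standard_normal(len(rows))  # e.g. a gradient arriving at the output y

# Transpose of the strided convolution, computed directly from the matrices.
via_matrices = (A_s @ A_k).T @ g

# Same thing as two steps: A_s^T fills in zeros, A_k^T is a 'full'
# convolution, i.e. a correlation with the flipped kernel.
upsampled = np.zeros(A_k.shape[0])
upsampled[::stride] = g
via_upsample = np.convolve(upsampled, k, mode="full")

match = np.allclose(via_matrices, via_upsample)
```

The zeros inserted by $A_s^T$ are what cause the uneven overlap of kernel taps in the output, which is one way to see where the artifacts mentioned above come from.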
