Deep TEN: Texture Encoding Network
IEEE 2016 Conference on Computer Vision and Pattern Recognition
Hang Zhang, Jia Xue, Kristin Dana
{zhang.hang, jia.xue}@rutgers.edu, [email protected]
Department of Electrical and Computer Engineering, Rutgers University
Overview:
Ø Encoding-Net (a new CNN architecture) with a novel Encoding Layer.
Ø State-of-the-art results on texture recognition (MINC-2500, FMD, GTOS, 4D-light datasets).
Ø Flexible deep learning framework (arbitrary image size and easy to transfer learned features).
Encoding Layer:
Ø Residual Encoder
• Given a set of visual descriptors $X = \{x_1, \ldots, x_N\}$ and a learned codebook $C = \{c_1, \ldots, c_K\}$.
• Each descriptor $x_i$ is assigned a weight $a_{ik}$ to each codeword $c_k$, with residual $r_{ik} = x_i - c_k$.
• The residual encoder aggregates the residuals with the assignment weights:

$$e_k = \sum_{i=1}^{N} e_{ik} = \sum_{i=1}^{N} a_{ik} r_{ik}$$
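The aggregation above can be sketched in a few lines of NumPy. This is a minimal illustration of the equations, not the paper's released implementation; the function name `residual_encode` and the use of a fixed soft-weighting factor `beta` are assumptions for the sketch.

```python
import numpy as np

def residual_encode(X, C, beta=1.0):
    """Aggregate residuals r_ik = x_i - c_k with soft-assignment weights a_ik.

    X: (N, D) array of visual descriptors
    C: (K, D) learned codebook
    Returns E: (K, D) encoding, e_k = sum_i a_ik * r_ik.
    """
    # Residuals for every descriptor/codeword pair: shape (N, K, D)
    R = X[:, None, :] - C[None, :, :]
    # Soft-assignment weights: softmax over codewords of -beta * ||r_ik||^2
    logits = -beta * np.sum(R**2, axis=-1)       # (N, K)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)
    # Aggregate: e_k = sum_i a_ik * r_ik  -> shape (K, D)
    return np.einsum('nk,nkd->kd', A, R)

# Example: 5 random descriptors, 3 codewords, 4-dimensional features
rng = np.random.default_rng(0)
E = residual_encode(rng.normal(size=(5, 4)), rng.normal(size=(3, 4)))
print(E.shape)  # (3, 4)
```

Note that the output is K×D regardless of N, which is what makes the layer act as an orderless pooling operation over the descriptors.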
Experiments:
Ø Datasets
• Material & texture datasets: MINC-2500, KTH, FMD, 4D-light, GTOS
• General recognition datasets: MIT-Indoor, Caltech-101
Ø Baselines
• FV-SIFT (128 Gaussian components, 32K ⇒ 512)
• FV-CNN (Cimpoi et al., VGG-VD & ResNet, 32 GMM)
[Figure: comparison of texture recognition pipelines]

| Method | Feature extraction | Dictionary | Encoding | Classifier | Training |
|---|---|---|---|---|---|
| BoWs | SIFT / filter bank responses | learned | histogram encoding | SVM | off-the-shelf |
| FV-CNN | pre-trained CNNs | learned | Fisher vector | SVM | off-the-shelf |
| Deep-TEN | convolutional layers | learned | residual encoding | FC layer | end-to-end |
[Figure: joint encoding of CIFAR and STL, with ConvLayers and Encoding Layers E1, E2]
Ø Classic Vision Approaches
• Flexible: allows arbitrary input image size.
• No domain-transfer problem (the features are generic).
• Dictionary encoding usually carries domain information.
Ø Deep Learning
• Preserves spatial information (but texture needs an orderless representation).
• Fixed input image size.
• Difficulties in domain transfer.
[Figure: Encoding-Layer. Input descriptors and the dictionary yield residuals, which are assigned weights and aggregated into the encoding.]
Ø Relation to Other Approaches
• Dictionary learning: K-means or K-SVD
• BoWs, VLAD, Fisher Vector & NetVLAD
• Global pooling: Avg-pool, SPP-Net, bilinear pooling
Ø Domain Transfer
• The residual encoding discards frequently appearing features, which are likely to be domain specific.
• For a visual feature $x_i$ that appears frequently in the data, it is likely close to a visual center $c_k$:
a) $e_k \approx 0$, since $r_{ik} = x_i - c_k \approx 0$
b) $e_j \approx 0$ for $j \neq k$, since $a_{ij} \approx 0$ (soft-assignment)
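This discarding property can be checked numerically: descriptors sitting exactly on a codeword contribute (near-)zero to every encoding vector. A small sketch, assuming a residual encoder with soft-assignment as in the equations above (the function name is illustrative):

```python
import numpy as np

def residual_encode(X, C, beta=10.0):
    # e_k = sum_i a_ik (x_i - c_k), with a_ik = softmax_k(-beta * ||x_i - c_k||^2)
    R = X[:, None, :] - C[None, :, :]          # residuals, shape (N, K, D)
    logits = -beta * np.sum(R**2, axis=-1)     # (N, K)
    logits -= logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)
    return np.einsum('nk,nkd->kd', A, R)

C = np.array([[0.0, 0.0], [5.0, 5.0]])         # two codewords
X = np.tile(C[0], (100, 1))                     # 100 descriptors exactly at c_0
E = residual_encode(X, C)
# a) e_0 ~ 0 because r_i0 = 0;  b) e_1 ~ 0 because a_i1 ~ 0 (soft assignment)
print(np.abs(E).max())  # essentially zero
```

Even though the feature appears 100 times, the whole encoding stays near zero, so a frequently occurring (likely domain-specific) feature is effectively discarded.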
Ø Comparison to State-of-the-Art
Ø Assignment Weights
• Soft-weighting:

$$a_{ik} = \frac{\exp(-\beta \|r_{ik}\|^2)}{\sum_{j=1}^{K} \exp(-\beta \|r_{ij}\|^2)}$$

• Learnable smoothing factor:

$$a_{ik} = \frac{\exp(-s_k \|r_{ik}\|^2)}{\sum_{j=1}^{K} \exp(-s_j \|r_{ij}\|^2)}$$
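The two schemes differ only in whether the smoothing factor is one fixed β or a learnable per-codeword s_k. A minimal NumPy sketch of the per-codeword form (names are illustrative, not taken from the released code):

```python
import numpy as np

def soft_assign(X, C, s):
    """a_ik = exp(-s_k ||r_ik||^2) / sum_j exp(-s_j ||r_ij||^2),
    where r_ik = x_i - c_k and s has shape (K,), one factor per codeword."""
    R = X[:, None, :] - C[None, :, :]               # residuals (N, K, D)
    logits = -s[None, :] * np.sum(R**2, axis=-1)    # each codeword uses its own s_k
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = soft_assign(rng.normal(size=(6, 4)), rng.normal(size=(3, 4)),
                s=np.array([0.5, 1.0, 2.0]))
print(A.sum(axis=1))  # each row sums to 1
```

In a network, `s` would be a trainable parameter updated by backpropagation along with the codewords, since the whole expression is differentiable.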
Ø Effect of Multi-size Training
• Ideally accepts arbitrary image sizes.
• Trained with pre-defined sizes iteratively, without modifying the solver.
• Single-size testing for simplicity.
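Because the encoding layer sums over descriptors, its output has a fixed size K×D no matter how many descriptors the convolutional layers emit; this is what makes multi-size training possible. A quick NumPy check (the encoder below is a sketch matching the earlier equations, with an illustrative function name):

```python
import numpy as np

def residual_encode(X, C, beta=1.0):
    # e_k = sum_i a_ik (x_i - c_k), soft-assignment over K codewords
    R = X[:, None, :] - C[None, :, :]
    logits = -beta * np.sum(R**2, axis=-1)
    logits -= logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)
    return np.einsum('nk,nkd->kd', A, R)

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 16))                              # K=8 codewords, D=16
small = residual_encode(rng.normal(size=(36, 16)), C)     # e.g. a 6x6 feature map
large = residual_encode(rng.normal(size=(144, 16)), C)    # e.g. a 12x12 feature map
print(small.shape, large.shape)  # (8, 16) (8, 16)
```

Different input sizes only change N, the number of descriptors, so the classifier on top always sees the same fixed-length representation.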
Ø Joint Encoding
• The dictionary encoding representation is likely to carry domain information.
• The convolutional features are likely to be generic.
• CIFAR-10: 36×36; STL-10: 96×96
• (Only shallow architectures considered.)
Code here!