A Framework of Extracting Multi-scale Features Using Multiple Convolutional Neural Networks

Kuan-Chuan Peng
Tsuhan Chen

Introduction

• Breakthrough progress in object classification.

O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.

Example labels: cat, dog, lion, tiger

Introduction

• Humans are interested in more than objects.
• For example, aesthetic quality.

N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.

How do machines describe images?

• Examples from a state-of-the-art algorithm:

A. Karpathy and F.-F. Li. Deep visual-semantic alignments for generating image descriptions. CVPR15.
http://cs.stanford.edu/people/karpathy/deepimagesent/

“man in black shirt is playing guitar.”

“woman is holding bunch of bananas.”

How do experts describe images?

• Examples by Pulitzer Prize winners:

http://www.pulitzer.org/archives/8417
http://www.pulitzer.org/archives/6451

“At bath times, Danielle appears serene. But no one knows what lies beyond those eyes.” (by Lane DeGregory)

“The surgery has dragged on for hours with little progress, and Mulliken, taking a breather next to an array of Sam's CAT scans, is feeling the frustration and exhaustion.” (by Tom Hallman Jr.)

• Images convey more than objects.

Beyond Objects

• Abstract attributes matter.
  – Attributes relating to or involving general ideas or qualities rather than specific people, objects, or actions. [Merriam-Webster dictionary]
• Bridge the gap between machines and humans:
  – Teach machines to solve abstract tasks (tasks involving abstract attributes).

http://www.merriam-webster.com/dictionary/abstract

Goal

• A general framework to achieve better performance in abstract tasks.
  – Multi-scale features by using convolutional neural networks (CNN).

Why CNN?

O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
L. Deng et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. ICASSP13.
A. Karpathy et al. Large-scale video classification with convolutional neural networks. CVPR14.

Applications shown: object classification, video classification, speech recognition

Existing Abstract Tasks

• More and more abstract tasks are being proposed.

Artistic Style & Artist Style Classification [F. S. Khan et al. MVA14.]

Architectural Style Classification [Z. Xu et al. ECCV14.]

Emotion Classification [J. Machajdik et al. ACMMM10.]
Labels: amusement, anger, awe, contentment, disgust, excitement, fear, sad

Aesthetic Classification [N. Murray et al. CVPR12.]
Labels: high aesthetic quality, low aesthetic quality

Fashion Style Classification [M. H. Kiapour et al. ECCV14.]
Example labels: Bohemian, Hipster

Memorability Prediction [P. Isola et al. CVPR11.]

Interestingness Prediction [M. Gygli et al. ICCV13.]

Inspiration

• It is tricky to describe abstract attributes as objects.
  – Not easy to “locate” abstract attributes.
• What if abstract attributes prevail everywhere?
  – Label-inheritable (LI) property.

Example label: contentment [J. Machajdik et al. ACMMM10.]

Label-Inheritable (LI) Property

Dataset            Painting-91 [1]              arcDataset [2]                      Caltech-101 [3]
Task               Artist style classification  Architectural style classification  Object classification
Label              Picasso                      Baroque Architecture                Faces
Label-inheritable  Yes                          Partial                             Mostly No

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
[3] F.-F. Li et al. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. CVPRW04.
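
One way to read the LI property in code: if a label holds for the whole image, it should also hold for sub-regions of that image. The sketch below splits an image into a grid of patches and lets every patch inherit the parent label. The function name, the grid scheme, and the nested-list image format are illustrative assumptions, not details given in the slides.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical format: rows of pixel values


def split_with_inherited_label(
    image: Image, label: str, grid: int
) -> List[Tuple[Image, str]]:
    """Split `image` into grid x grid patches; each patch inherits `label`."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by grid")
    ph, pw = height // grid, width // grid
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * ph:(r + 1) * ph]
            patch = [row[c * pw:(c + 1) * pw] for row in rows]
            patches.append((patch, label))  # the label is inherited unchanged
    return patches


# A 4x4 toy "painting" labeled with its artist.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_with_inherited_label(toy, "Picasso", grid=2)
print(len(patches))  # 4
print(patches[0])    # ([[0, 1], [4, 5]], 'Picasso')
```

For a label such as "Faces" the same split would presumably hand the label to patches that contain only background, which is one way to understand the "Mostly No" entry for Caltech-101.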

Multi-Scale CNN

• Assume the LI property holds for each image and its associated label.

A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
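
A minimal sketch of how multiple CNNs could yield one multi-scale feature, assuming scale 1 is the whole image, scale 2 is its four quadrants, and the per-scale outputs are simply concatenated. The slides do not say how the outputs are combined, so concatenation is an assumption, and `mean_feature` is a hypothetical stand-in for a trained CNN.

```python
from typing import Callable, List

Image = List[List[float]]
Extractor = Callable[[Image], List[float]]


def mean_feature(image: Image) -> List[float]:
    """Stand-in for a CNN: returns the mean pixel value as a 1-D feature."""
    values = [v for row in image for v in row]
    return [sum(values) / len(values)]


def quadrants(image: Image) -> List[Image]:
    """Split an image into its four quadrants (assumed scale-2 patches)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [row[c:c + w] for row in image[r:r + h]]
        for r in (0, h)
        for c in (0, w)
    ]


def multi_scale_feature(
    image: Image, scale1_net: Extractor, scale2_net: Extractor
) -> List[float]:
    """Concatenate the scale-1 feature with the scale-2 feature of each patch."""
    feature = list(scale1_net(image))
    for patch in quadrants(image):
        feature.extend(scale2_net(patch))
    return feature


toy = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [5.0, 5.0, 7.0, 7.0],
    [5.0, 5.0, 7.0, 7.0],
]
print(multi_scale_feature(toy, mean_feature, mean_feature))
# [4.0, 1.0, 3.0, 5.0, 7.0]
```

The same stand-in is passed for both scales only to keep the toy short; in the framework each scale would have its own separately trained network.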

AlexNet

• The number of nodes in the output layer is changed to the number of classes in each task.

A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
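
The output-layer change can be illustrated on a plain description of AlexNet's fully connected layers. This is a framework-free sketch: the layer names `fc6`, `fc7`, `fc8` follow a common naming convention and are an assumption here, and `resize_output_layer` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

Layer = Tuple[str, int]  # (layer name, number of output nodes)

# Fully connected layers of the original AlexNet (ImageNet has 1000 classes).
ALEXNET_FC: List[Layer] = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]


def resize_output_layer(layers: List[Layer], num_classes: int) -> List[Layer]:
    """Return a copy of `layers` whose last layer has `num_classes` nodes."""
    if num_classes < 2:
        raise ValueError("a classification task needs at least 2 classes")
    *hidden, (name, _) = layers
    return hidden + [(name, num_classes)]


# The architectural style task is evaluated with 10 and with 25 classes.
for num_classes in (10, 25):
    print(resize_output_layer(ALEXNET_FC, num_classes)[-1])
# ('fc8', 10)
# ('fc8', 25)
```

Only the last layer changes; the earlier layers keep their original sizes.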

Experimental Results

Classification accuracy (%). Caltech-101 object classification uses 15 / 30 training examples per class; architectural style classification uses 10 / 25 classes.

Method \ Task                Artist style   Artistic style   Caltech-101 object   Architectural style
Previous work (baseline)     53.10 [1]      62.20 [1]        83.80 / 86.50 [2]    69.17 / 46.21 [3]
Single-scale CNN (baseline)  55.15          67.37            83.45 / 88.19        70.64 / 54.84
2-scale CNN (ours)           58.11          69.67            80.19 / 87.58        74.82 / 58.89
3-scale CNN (ours)           57.91          70.96            N/A                  75.32 / 59.13
Label-inheritable            Yes            Yes              Mostly No            Partial

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
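
To make the pattern in the table easier to see, the short script below computes the change in accuracy from the single-scale CNN to the 2-scale CNN for each column. The numbers are copied from the table; the dictionary keys are shorthand chosen for this sketch.

```python
# Accuracy (%) copied from the results table.
single_scale = {
    "artist style": 55.15,
    "artistic style": 67.37,
    "Caltech-101 (15 per class)": 83.45,
    "Caltech-101 (30 per class)": 88.19,
    "architectural style (10 classes)": 70.64,
    "architectural style (25 classes)": 54.84,
}
two_scale = {
    "artist style": 58.11,
    "artistic style": 69.67,
    "Caltech-101 (15 per class)": 80.19,
    "Caltech-101 (30 per class)": 87.58,
    "architectural style (10 classes)": 74.82,
    "architectural style (25 classes)": 58.89,
}

# Gain of the 2-scale CNN over the single-scale baseline, in percentage points.
gains = {
    task: round(two_scale[task] - single_scale[task], 2)
    for task in single_scale
}
for task, gain in gains.items():
    print(f"{task}: {gain:+.2f}")
# artist style: +2.96
# artistic style: +2.30
# Caltech-101 (15 per class): -3.26
# Caltech-101 (30 per class): -0.61
# architectural style (10 classes): +4.18
# architectural style (25 classes): +4.05
```

The gains are positive where the label-inheritable row says "Yes" or "Partial" and negative for Caltech-101, where it says "Mostly No".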

Is it because of more training data?

• What if we train one CNN with images at different scales?

A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
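
The difference between the two setups can be sketched as a difference in how the training sets are assigned to networks. This assumes scale 1 is the whole image and scale 2 is its four quadrants with inherited labels; the labels `style_a` and `style_b` and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]
Sample = Tuple[Image, str]


def quadrants(image: Image) -> List[Image]:
    """Split an image into its four quadrants (assumed scale-2 patches)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [row[c:c + w] for row in image[r:r + h]]
        for r in (0, h)
        for c in (0, w)
    ]


def build_training_sets(dataset: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Return (scale-1 samples, scale-2 samples); patches inherit the label."""
    scale1 = list(dataset)
    scale2 = [(patch, label) for img, label in dataset for patch in quadrants(img)]
    return scale1, scale2


toy_a = [[0] * 4 for _ in range(4)]
toy_b = [[1] * 4 for _ in range(4)]
scale1, scale2 = build_training_sets([(toy_a, "style_a"), (toy_b, "style_b")])

# Multi-scale setup: one CNN per scale, each trained only on its own scale.
per_cnn_sets = {"cnn_scale1": scale1, "cnn_scale2": scale2}

# Control setup: a single CNN trained on the pooled samples from both scales.
pooled_set = {"single_cnn": scale1 + scale2}

print(len(scale1), len(scale2), len(pooled_set["single_cnn"]))  # 2 8 10
```

Both setups see the same 10 samples in total, so any accuracy difference between them cannot be attributed to the amount of training data alone.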

Additional Results

Classification accuracy (%). Caltech-101 object classification uses 15 / 30 training examples per class; architectural style classification uses 10 / 25 classes.

Method \ Task                Artist style   Artistic style   Caltech-101 object   Architectural style
Previous work (baseline)     53.10 [1]      62.20 [1]        83.80 / 86.50 [2]    69.17 / 46.21 [3]
Single-scale CNN (baseline)  55.15          67.37            83.45 / 88.19        70.64 / 54.84
2-scale CNN (ours)           58.11          69.67            80.19 / 87.58        74.82 / 58.89
1 CNN + 2-scale images       46.86          61.95            N/A                  67.93 / 49.06
Label-inheritable            Yes            Yes              Mostly No            Partial

[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.

Conclusion

• We proposed Multi-Scale Convolutional Neural Networks (MSCNN) based on the Label-Inheritable (LI) property.
  – Multi-scale features.

• MSCNN can outperform the previous state of the art on datasets where the LI property holds, even if only partially.

Towards Solving Abstract Tasks

• More CNN features to achieve better performance in abstract tasks.
  – Multi-scale features (ICME15).
  – Multi-depth features (ICIP15).
  – Multi-task features (submitted to ICCV15).

K.-C. Peng and T. Chen. A Framework of extracting multi-scale features using multiple convolutional neural networks. ICME15.
K.-C. Peng and T. Chen. Cross-layer features in convolutional neural networks for generic classification tasks. ICIP15.
K.-C. Peng and T. Chen. Toward correlating and solving abstract tasks using convolutional neural networks. Submitted to ICCV15.

Q & A