Image description through fusion based recurrent multi modal learning

IMAGE DESCRIPTION THROUGH FUSION BASED RECURRENT MULTI MODAL LEARNING

Ram Manohar Oruganti¹, Shagan Sah², Suhas Pillai³ and Raymond Ptucha¹

ABSTRACT

Index Terms

1. INTRODUCTION

Fig. 1.

2. BACKGROUND

2.1 Convolutional Neural Networks

2.2 Long Short-Term Memory Networks

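An LSTM processes an input sequence $\langle x_1, x_2, \ldots, x_{t-1}, x_t, \ldots, x_T \rangle$ one element $x_t$ at a time, carrying information from $x_{t-1}$ to $x_t$. It uses input, forget and output gates $i_t$, $f_t$, $o_t$, a candidate cell input $g_t$, a cell state $c_t$ and a hidden state $h_t$, parameterized by weight matrices $W$ and bias vectors $b$.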
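The LSTM notation in this subsection ($x_t$, the gates $i_t$, $f_t$, $o_t$, the candidate $g_t$, the states $c_t$ and $h_t$, weights $W$ and biases $b$) is that of a standard LSTM cell. As a minimal sketch, assuming the common formulation $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$ (likewise for $f_t$ and $o_t$), $g_t = \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g)$, $c_t = f_t \odot c_{t-1} + i_t \odot g_t$ and $h_t = o_t \odot \tanh(c_t)$, one time step can be written in plain Python as below. The exact variant the authors use is not stated here, and the names `lstm_step`, `run_lstm`, `affine_act` and the `(W_x, W_h, b)` parameter layout are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: gate name ("i", "f", "o", "g") -> (W_x, W_h, b)
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product W v for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function used by the i, f and o gates."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine_act(W_x: Matrix, W_h: Matrix, b: Vector, x: Vector, h: Vector,
               act: Callable[[float], float]) -> Vector:
    """Apply act element-wise to W_x x + W_h h + b."""
    return [act(u + v + c) for u, v, c in zip(matvec(W_x, x), matvec(W_h, h), b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Params) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell (assumed formulation)."""
    i_t = affine_act(*p["i"], x_t, h_prev, sigmoid)    # input gate
    f_t = affine_act(*p["f"], x_t, h_prev, sigmoid)    # forget gate
    o_t = affine_act(*p["o"], x_t, h_prev, sigmoid)    # output gate
    g_t = affine_act(*p["g"], x_t, h_prev, math.tanh)  # candidate cell input
    # c_t = f_t * c_{t-1} + i_t * g_t, element-wise
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # h_t = o_t * tanh(c_t), element-wise
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_lstm(xs: List[Vector], p: Params,
             hidden_size: int) -> Tuple[Vector, Vector]:
    """Feed <x_1, ..., x_T> through the cell from zero initial states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and $g_t = 0$, so $c_t$ and $h_t$ stay at zero. Plain lists keep the sketch dependency-free; a practical implementation would use a tensor library and compute the four gate pre-activations with a single stacked matrix multiply.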

3. PROPOSED LEARNING MODEL

3.1 FRMM model

Fig. 2.

3.2 FRMM variations

3.3 Image description through FRMMs

image stage, language stage, fusion stage

4. EXPERIMENTAL RESULTS

4.1 Datasets

4.2 Training details

Caffe

4.3 Results

Model   BLEU-1  BLEU-2  BLEU-3  BLEU-4
AFRMM   70.2    52.8    38.3    27.6

Table I.

CNN layer   BLEU-1  BLEU-2  BLEU-3  BLEU-4
AFRMM+fc8   70.2    52.8    38.3    27.6

Table II.

Model   BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR

40.4

Our model 70.2 52.8 27.6 22.5

Table III.

Model         BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR
Vinyals [13]  66.3    42.3    27.7    18.3

Table IV.
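Tables I to IV report B-1 to B-4, presumably BLEU-1 to BLEU-4 (the n-gram metric from the 40th ACL proceedings in the reference list), together with METEOR. As a rough sketch of how such numbers are computed, the function below implements corpus-level BLEU with clipped n-gram precisions, uniform weights and the brevity penalty. It is an illustration under simplifying assumptions (pre-tokenized text, no smoothing, closest reference length), not the evaluation code behind the reported scores, and `corpus_bleu` and `ngrams` are hypothetical names. The tables appear to report scores multiplied by 100, so a return value of 0.702 would correspond to 70.2.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[List[str]],
                references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU-max_n in [0, 1]: uniform weights, no smoothing."""
    clipped = [0] * max_n  # clipped n-gram matches for n = 1..max_n
    totals = [0] * max_n   # candidate n-gram counts for n = 1..max_n
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: closest to the candidate, ties -> shorter.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            clipped[n - 1] += sum(min(cnt, max_ref[g]) for g, cnt in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # some precision p_n is zero, so the geometric mean is zero
    log_mean = sum(math.log(clipped[k] / totals[k]) for k in range(max_n)) / max_n
    # Brevity penalty: 1 if the candidates are longer than the references
    bp = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)
    return bp * math.exp(log_mean)
```

For example, a degenerate candidate of seven copies of "the" scored against "the cat is on the mat" gets BLEU-1 = 2/7, because its count of "the" is clipped to the two occurrences in the reference. Published captioning scores also depend on the evaluation toolkit's tokenization and length conventions, so this sketch should not be expected to reproduce the tables exactly.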

5. CONCLUSION

6. REFERENCES

, et al., arXiv preprint arXiv:1409.0575.

26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012, December 3-6, 2012.

Proceedings of the IEEE.

27th Annual Conference on Neural Information Processing Systems, NIPS 2013.

Neural Computation.

ICASSP 2013.

Computer Vision and Pattern Recognition.

Computer Vision and Pattern Recognition.

, et al., Computer Vision and Pattern Recognition.

arXiv preprint arXiv:1505.00487.

, et al., Proceedings of the IEEE International Conference on Computer Vision.

, et al., arXiv preprint arXiv:1502.03044.

arXiv preprint arXiv:1411.4555.

21st Annual Conference on Neural Information Processing Systems, NIPS 2007.

Advances in Neural Information Processing Systems.

arXiv preprint arXiv:1410.4615.

Computer Vision and Pattern Recognition.

arXiv preprint arXiv:1412.4729.

arXiv preprint arXiv:1412.6632.

Transactions of the Association for Computational Linguistics.

, et al., Computer Vision ECCV 2014.

ICLR.

Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.

Proceedings of the Ninth Workshop on Statistical Machine Translation.

, et al., arXiv preprint arXiv:1411.4389.

, et al., Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

arXiv preprint arXiv:1410.1090.