Multimedia information management
Multimedia Information Management: Final Project
Multimodal Searching
Sara Egidi, Fabio Greco, Alessio Villardita
Roadmap
● System Architecture
● Dataset
● Feature extraction
● Feature quantization
● Indexing
● Searching
● Interface Implementation
● Results
Project objectives
Development of a search engine that supports textual, visual and multimodal searches.
Implementation steps:
● Extraction of global deep features
● Indexing of visual features
● Indexing of image metadata (tags)
● Combination of text and visual features at search time
Extensions:
● Visual features from multiple layers
● Display of classification results
Dataset
25,000 images from Flickr
● Raw EXIF: information on camera, settings, date, time and, in some cases, location.
● Annotations: very few (29), too few to be representative or useful for indexing.
● Tags: preprocessed by Flickr. Because tags are written by users, some entries are meaningless and therefore useless.
Deep feature extraction
Network: AlexNet
Global features:
● FC6: fully connected layer 6
● FC7: fully connected layer 7
● FC8: class labels
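FC8 produces one score per class, which is what the "display of classification results" extension needs. A minimal sketch of turning those scores into the top-k labels, assuming the raw FC8 outputs are unnormalized scores (in the standard AlexNet they feed a softmax over the 1,000 ImageNet classes) and that a list of class names in the same order is available. `top_k_classes` and its parameters are hypothetical, not the project's code:

```python
import math

def top_k_classes(fc8_scores, labels, k=5):
    """Return the k most probable (label, probability) pairs.

    fc8_scores: raw FC8 outputs, one per class (assumed unnormalized).
    labels: class names in the same order as the FC8 outputs (assumption).
    """
    if len(fc8_scores) != len(labels):
        raise ValueError("one label per FC8 output is required")
    m = max(fc8_scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in fc8_scores]
    total = sum(exps)
    probs = [e / total for e in exps]        # softmax: probabilities summing to 1
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Usage with made-up scores for three hypothetical classes
print(top_k_classes([2.0, 0.5, 1.0], ["dog", "cat", "car"], k=2))
```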
Overall System Architecture
Feature quantization
From deep features to alphanumeric strings:
● each component of the feature vector is associated with a unique alphanumeric keyword;
● to take the feature weight into account, each float component is multiplied by a quantization factor Q and rounded to an integer with Math.round; the component's keyword is then repeated that many times.
Example with Q = 30:
[ 0.20 0.005 0.12 0.29 ] → (× Q) [ 6 0.15 3.6 8.7 ] → (Math.round) [ 6 0 4 9 ] → [ A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4 ]
The second component rounds to 0, so the keyword A2 does not appear.
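A minimal Python sketch of the quantization step, reproducing the Q = 30 example. Python's built-in round() uses round-half-to-even, so the hypothetical helper `java_round` mimics Java's Math.round (round half up) with floor(x + 0.5). The function names, the `A` prefix and the choice to drop non-positive counts are assumptions for illustration:

```python
import math

def java_round(x):
    """Round half up, like Java's Math.round (Python's round() rounds half to even)."""
    return math.floor(x + 0.5)

def quantize(features, q):
    """Scale each float component by the quantization factor Q and round it to an integer."""
    return [java_round(v * q) for v in features]

def surrogate_text(features, q, prefix="A"):
    """Repeat keyword <prefix><i> once per unit of component i (numbered from 1).

    Components that round to 0 or below produce no keyword (assumption).
    """
    tokens = []
    for i, count in enumerate(quantize(features, q), start=1):
        tokens.extend([f"{prefix}{i}"] * max(count, 0))
    return " ".join(tokens)

example = [0.20, 0.005, 0.12, 0.29]
print(quantize(example, 30))        # → [6, 0, 4, 9]
print(surrogate_text(example, 30))  # → A1 A1 A1 A1 A1 A1 A3 A3 A3 A3 A4 A4 A4 A4 A4 A4 A4 A4 A4
```

A larger Q keeps more of each component's precision, at the cost of longer surrogate texts (more repeated keywords per image).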
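Indexing
Each image is stored as a document with the following fields:
● Id
● Tags
● Deep feature 6 (FC6)
● Deep feature 7 (FC7)
● Class label (layer 8)
Supported queries: text query, visual query, text + visual query.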
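Once features are turned into surrogate text, a visual query can be answered with ordinary term matching. The sketch below is a hypothetical in-memory stand-in for one indexed document (field names are illustrative, not the project's schema) plus a term-frequency dot product: for two surrogate texts, summing tf_query × tf_doc over shared keywords gives exactly the dot product of the two quantized vectors. Lucene's own scoring (BM25 by default in recent versions) also applies IDF and length normalization, so this only illustrates the idea:

```python
from collections import Counter

def make_document(image_id, tags, fc6_text, fc7_text, class_labels):
    """Hypothetical stand-in for one indexed image: each field holds term frequencies."""
    return {
        "id": image_id,
        "tags": Counter(tags),
        "fc6": Counter(fc6_text.split()),
        "fc7": Counter(fc7_text.split()),
        "class_label": Counter(class_labels),
    }

def tf_dot(query_terms, doc_terms):
    """Sum of tf_query * tf_doc over shared terms.

    For surrogate texts this equals the dot product of the quantized integer vectors.
    """
    return sum(tf * doc_terms.get(term, 0) for term, tf in query_terms.items())

doc = make_document("img1", ["dog", "park"], "A1 A1 A3", "A2 A4 A4", ["dog"])
query = Counter("A1 A3 A3".split())   # quantized vector [1, 0, 2]
print(tf_dot(query, doc["fc6"]))      # → 4, i.e. [1, 0, 2] · [2, 0, 1]
```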
Searching
5 different combinations:
● Text
● Text + uploaded image
● Text + indexed image
● Uploaded image
● Indexed image
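The five combinations differ mainly in where the query terms come from: text is matched against the tags, an indexed image reuses its stored surrogate text, and an uploaded image would first go through feature extraction and quantization (not shown). A minimal sketch of one way to combine them, assuming a weighted sum of a tag-match score and a visual score, with the text acting as a filter when present. `search`, `indexed_image_terms`, the weights and the index layout are hypothetical:

```python
from collections import Counter

def tf_dot(query_terms, doc_terms):
    """Term-frequency dot product (equals the quantized-vector dot product for surrogate texts)."""
    return sum(tf * doc_terms.get(t, 0) for t, tf in query_terms.items())

def indexed_image_terms(index, image_id, field="fc7"):
    """An already indexed image can reuse its stored surrogate text as the visual query."""
    for doc in index:
        if doc["id"] == image_id:
            return doc[field]
    raise KeyError(image_id)

def search(index, text=None, image_terms=None, field="fc7", w_text=1.0, w_visual=1.0, k=10):
    """Rank image ids for a text, visual, or text + visual query (hypothetical scoring)."""
    if not text and not image_terms:
        raise ValueError("need text, an image, or both")
    text_terms = Counter(text.lower().split()) if text else Counter()
    results = []
    for doc in index:
        text_score = tf_dot(text_terms, doc["tags"]) if text else 0
        visual_score = tf_dot(image_terms, doc[field]) if image_terms else 0
        if text and text_score == 0:
            continue  # with a text query, keep only images whose tags match
        results.append((w_text * text_score + w_visual * visual_score, doc["id"]))
    results.sort(key=lambda r: r[0], reverse=True)
    return [doc_id for _, doc_id in results[:k]]

# Tiny made-up index: tags plus FC7 surrogate-text term frequencies
index = [
    {"id": "a", "tags": Counter(["dog"]), "fc7": Counter({"A1": 3, "A2": 1})},
    {"id": "b", "tags": Counter(["cat"]), "fc7": Counter({"A1": 3})},
    {"id": "c", "tags": Counter(["dog", "park"]), "fc7": Counter({"A2": 2})},
]
q = indexed_image_terms(index, "b")             # "indexed image" query
print(search(index, text="dog", image_terms=q))  # → ['a', 'c']
```

Treating the text as a filter keeps multimodal results on topic (every hit carries the tag), while the visual score orders them; a pure weighted sum without the filter would instead let strong visual matches with no matching tag through.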
Interface Implementation
Results
● Without text (visual query)
● With text "dog" (multimodal query)
● Different layers
Source code: http://www.github.com/egidisa/MultiModalSearch