Information Extraction from Multimedia Content on the Social Web
![Page 1: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/1.jpg)
Information Extraction from Multimedia Content on the Social Web
Stefan Siersdorfer, L3S Research Centre, Hannover, Germany
![Page 2: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/2.jpg)
Meta Data and Visual Data on the Social Web
Meta Data:
• Tags
• Title Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links
Visual Data:
• Photos
• Videos
How to exploit combined information from visual data and meta data?
![Page 3: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/3.jpg)
Example 1: Photos in Flickr
![Page 4: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/4.jpg)
Example 2: Videos in YouTube
![Page 5: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/5.jpg)
Social Web Environments as Graph Structure
[Figure: example graph connecting User 1, User 2, User 3, Video 1, Video 2, Video 3, tag1, tag2, tag3, and Group 2]
Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups
Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resources: Ownership, Favorite Assignment, Rating
• User-Groups: Membership
• Resource-Resource: visual similarity, meta data similarity
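The node and edge types listed on this slide can be held in a small typed graph. The following is only a minimal sketch: the class name `SocialGraph`, the relation labels, and the resource-tag edge are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict


class SocialGraph:
    """Typed nodes plus labelled, directed, weighted edges."""

    def __init__(self):
        self.kind = {}                  # node name -> "user" | "resource" | "tag" | "group"
        self.edges = defaultdict(list)  # source -> [(relation, target, weight)]

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_edge(self, source, relation, target, weight=1.0):
        # Refuse edges that mention nodes which were never declared.
        for node in (source, target):
            if node not in self.kind:
                raise KeyError(f"unknown node: {node}")
        self.edges[source].append((relation, target, weight))

    def neighbors(self, source, relation=None):
        # All targets reachable from `source`, optionally filtered by relation label.
        return [target for rel, target, _ in self.edges.get(source, [])
                if relation is None or rel == relation]


graph = SocialGraph()
for name, kind in [("user1", "user"), ("user2", "user"), ("video1", "resource"),
                   ("video2", "resource"), ("tag1", "tag"), ("group2", "group")]:
    graph.add_node(name, kind)

graph.add_edge("user1", "contact", "user2")
graph.add_edge("user1", "owns", "video1")
graph.add_edge("user2", "favorite", "video1")
graph.add_edge("user2", "member_of", "group2")
graph.add_edge("video1", "tagged_with", "tag1")  # hypothetical relation label
graph.add_edge("video1", "visually_similar", "video2", weight=0.8)

print(graph.neighbors("user1"))
print(graph.neighbors("user2", "favorite"))
```

Keeping the relation label on each edge lets one structure serve both the metadata relationships and the similarity relationships.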
![Page 6: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/6.jpg)
User Feedback on the Social Web
• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content
How can we exploit the community feedback?
![Page 7: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/7.jpg)
Outline
• Part 1: Photos on the Social Web
  1.1) Photo Attractiveness
  1.2) Generating Photo Maps
  1.3) Sentiment in Photos
• Part 2: Videos on the Social Web
  Video Tagging
![Page 8: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/8.jpg)
Part 1: Photos on the Social Web
![Page 9: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/9.jpg)
1.1) Photo Attractiveness *
* Stefan Siersdorfer, Jose San Pedro. Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain
![Page 10: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/10.jpg)
Attractiveness of Images
Landscape Portrait Flower
Which factors influence the human perception of attractiveness?
![Page 11: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/11.jpg)
Attractiveness Visual Features
Human visual perception is mainly influenced by:
• Color distribution
• Coarseness
These are complex concepts:
• They convey multiple orthogonal aspects
• It is necessary to consider different low-level features
![Page 12: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/12.jpg)
Attractiveness Visual Features
Color Features:
• Brightness
• Contrast (Luminance, RGB)
• Colorfulness
• Naturalness
• Saturation (Mean, Variance): intensity of the colors; saturation is 0 for grey scale images
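To make a few of these color features concrete, here is a minimal sketch using only the Python standard library. Several choices are assumptions for illustration and are not taken from the slides: the Rec. 601 luma weights for brightness, the standard deviation of luminance as contrast, HSV saturation, and the function name `color_features`.

```python
import colorsys
from statistics import mean, pvariance


def color_features(pixels):
    """Compute simple color statistics for an image.

    pixels: iterable of (r, g, b) tuples with components in [0, 1].
    """
    luminances = []
    saturations = []
    for r, g, b in pixels:
        # Rec. 601 luma weights; an assumed definition of luminance.
        luminances.append(0.299 * r + 0.587 * g + 0.114 * b)
        # HSV saturation is 0 whenever r == g == b (grey scale pixels).
        _, s, _ = colorsys.rgb_to_hsv(r, g, b)
        saturations.append(s)
    return {
        "brightness": mean(luminances),
        "contrast": pvariance(luminances) ** 0.5,  # std. dev. of luminance
        "saturation_mean": mean(saturations),
        "saturation_variance": pvariance(saturations),
    }


grey_image = [(0.2, 0.2, 0.2), (0.8, 0.8, 0.8)]
red_image = [(1.0, 0.0, 0.0)]
print(color_features(grey_image))
print(color_features(red_image))
```

The grey image yields a mean saturation of 0, matching the remark on the slide, while the pure red pixel is fully saturated.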
![Page 13: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/13.jpg)
Visual Features
Coarseness:
• Resolution + Acutance
• Sharpness
• Critical importance for final appearance of photos [Savakis 2000]
![Page 14: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/14.jpg)
Textual Features
• We consider user generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information
![Page 15: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/15.jpg)
Attractiveness of Photos
Community-based models for classifying/ranking images according to their appeal. [WWW´09]
[Figure: inputs from the Flickr photo stream, namely content (visual features), metadata (textual features, e.g. the tags cat, fence, house), and community feedback (photo's interestingness: #views, #comments, #favorites, ...), feed a generator of classification & regression attractiveness models.]
![Page 16: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/16.jpg)
Classification & Regression Models
![Page 17: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/17.jpg)
Experiments
![Page 18: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/18.jpg)
1.2) Generating Photo Maps *
* Work and illustrations from: David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg. Mapping the World's Photos. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain
![Page 19: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/19.jpg)
Outline: Photo Maps
• Use geo-location, tags, and visual features of photos to
  - Identify popular locations and landmarks
  - Find out location of photos
  - Estimate representative images
![Page 20: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/20.jpg)
Spatial Clustering
Each data point corresponds to the (longitude, latitude) of an image
Mean shift clustering is applied to get a hierarchical structure
The most distinctive popular tags are used as labels (# photos with tag in cluster / # photos with tag in overall set)
Example labels: london, paris, eiffel, louvre, trafalgarsquare, tatemodern
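The clustering and labelling steps on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the cited authors' implementation: it uses a flat kernel, plain Euclidean distance on (longitude, latitude) degrees, and a single bandwidth rather than a hierarchy. The function names, the `min_count` popularity cut-off, and the sample photos are all hypothetical.

```python
from collections import Counter
from math import dist


def mean_shift(points, bandwidth, iterations=50):
    """Flat-kernel mean shift: move each mode to the mean of the points near it."""
    modes = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            near = [p for p in points if dist(m, p) <= bandwidth]
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    # Merge modes that ended up (almost) at the same place.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


def distinctive_tags(photo_tags, labels, cluster, min_count=2, top=3):
    """Rank tags by (# photos with tag in cluster) / (# photos with tag overall)."""
    overall = Counter(t for tags in photo_tags for t in set(tags))
    inside = Counter(t for tags, lab in zip(photo_tags, labels)
                     if lab == cluster for t in set(tags))
    # `min_count` keeps only "popular" tags; the cut-off value is an assumption.
    scores = {t: inside[t] / overall[t] for t in inside if inside[t] >= min_count}
    return sorted(scores, key=lambda t: (-scores[t], -inside[t], t))[:top]


# Hypothetical sample: three photos near London, three near Paris.
points = [(-0.13, 51.51), (-0.12, 51.50), (-0.10, 51.51),
          (2.29, 48.86), (2.34, 48.86), (2.35, 48.85)]
photo_tags = [["london", "trafalgarsquare"], ["london", "tatemodern"],
              ["london", "travel"], ["paris", "eiffel"],
              ["paris", "louvre"], ["paris", "travel"]]

centers, labels = mean_shift(points, bandwidth=0.5)
for cluster in range(len(centers)):
    print(cluster, distinctive_tags(photo_tags, labels, cluster))
```

Running mean shift again with a smaller bandwidth inside each cluster is one way to obtain the city-to-landmark hierarchy the slide mentions.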
![Page 21: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/21.jpg)
Estimating Location of Photos without tags
• Train SVMs on Clusters
  - Positive Examples: Photos in Clusters
  - Negative Examples: Photos outside the Cluster
• Feature Representation
  - Tags
  - Visual features (SIFT)
• Best Performance for Combination of Tags and SIFT features
![Page 22: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/22.jpg)
Finding Representative Images
Construct Weighted Graph:
- Weight based on visual similarity of images (using SIFT features)
- Use graph clustering (e.g. spectral clustering) to identify tightly connected components
- Choose image from this connected component
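A simplified sketch of this selection step follows. It deliberately swaps in a cruder technique than the one named on the slide: thresholded connected components stand in for spectral clustering, and the image with the highest weighted degree in the largest component is chosen. The similarity values, the threshold, and the function name are hypothetical.

```python
def representative_image(similarity, threshold):
    """Pick a representative image from a weighted similarity graph.

    similarity: dict mapping an (image_a, image_b) pair to a similarity weight.
    Edges below `threshold` are ignored (a stand-in for proper graph clustering).
    """
    adjacency = {}
    for (a, b), weight in similarity.items():
        adjacency.setdefault(a, {})
        adjacency.setdefault(b, {})
        if weight >= threshold:
            adjacency[a][b] = weight
            adjacency[b][a] = weight

    # Depth-first search to find the largest connected component.
    seen, best_component = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in adjacency[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        if len(component) > len(best_component):
            best_component = component

    # The image most similar to the rest of its component represents it.
    return max(sorted(best_component),
               key=lambda node: sum(adjacency[node].values()))


similarity = {
    ("img1", "img2"): 0.9, ("img2", "img3"): 0.8, ("img1", "img3"): 0.7,
    ("img3", "img4"): 0.1, ("img4", "img5"): 0.6,
}
print(representative_image(similarity, threshold=0.5))
```

In this toy graph img1, img2, and img3 form the tightly connected group.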
![Page 23: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/23.jpg)
Example 1: Europe
![Page 24: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/24.jpg)
Example 2: New York
![Page 25: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/25.jpg)
1.3) Sentiment in Photos *
* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng. Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy
![Page 26: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/26.jpg)
Sentiment Analysis of Images
Data: more than 500,000 Flickr Photos
Image Features
• Global Color Histogram: a color is present in the image
• Local Color Histogram: a color is present at a particular location
• SIFT Visual Terms: b/w patterns rotated and scaled
Image Sentiment
• SentiWordNet: provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for term "good"
• used for obtaining sentiment categories
• training set + ground truth for experiments
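As an illustration of how lexicon values could turn textual metadata into a sentiment category, here is a minimal sketch. Only the entry for "good" comes from the slide. The other lexicon entries, the function name, and the rule of comparing summed positive and negative scores are assumptions, not the procedure of the cited paper.

```python
# term -> (pos, neg, obj) scores in the style of SentiWordNet
LEXICON = {
    "good": (0.875, 0.0, 0.125),  # value quoted on the slide
    "gloomy": (0.0, 0.75, 0.25),  # hypothetical value
    "house": (0.0, 0.0, 1.0),     # hypothetical value
}


def sentiment_category(terms, lexicon=LEXICON):
    """Return "pos", "neg", or None for a list of metadata terms."""
    pos = sum(lexicon[t][0] for t in terms if t in lexicon)
    neg = sum(lexicon[t][1] for t in terms if t in lexicon)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return None  # no sentiment-bearing terms, or a tie


print(sentiment_category(["good", "house"]))
print(sentiment_category(["gloomy", "fence"]))
print(sentiment_category(["house"]))
```

Images that receive a category this way can serve as labelled training data, while images with no clear category are left out.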
![Page 27: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/27.jpg)
Which are the most discriminative visual terms?
• Use Mutual Information Measure to determine these features:
• Probabilities (estimated through counting in image corpus):
  - P(t): Probability that visual term t occurs in image
  - P(c): Probability that image has sentiment category c ("pos" or "neg")
  - P(t,c): Probability that image is in category c and has visual term t
• Intuition: "Terms that have high co-occurrence with a category are more characteristic for that category."
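The slide's own formula is not preserved in this transcript. The sketch below therefore assumes the standard mutual information between two binary events (term present or absent, category present or absent), estimated from corpus counts as described above. The function and parameter names are hypothetical.

```python
from math import log2


def mutual_information(n_tc, n_t, n_c, n):
    """Mutual information between a visual term t and a category c.

    n_tc: images that contain t and are in c
    n_t:  images that contain t
    n_c:  images that are in c
    n:    all images in the corpus
    """
    mi = 0.0
    for has_t in (True, False):
        for in_c in (True, False):
            # Count of images in this cell of the 2x2 contingency table.
            if has_t and in_c:
                joint = n_tc
            elif has_t:
                joint = n_t - n_tc
            elif in_c:
                joint = n_c - n_tc
            else:
                joint = n - n_t - n_c + n_tc
            p_joint = joint / n
            p_t = (n_t if has_t else n - n_t) / n
            p_c = (n_c if in_c else n - n_c) / n
            if p_joint > 0:  # cells with zero probability contribute nothing
                mi += p_joint * log2(p_joint / (p_t * p_c))
    return mi


print(mutual_information(n_tc=25, n_t=50, n_c=50, n=100))  # term independent of category
print(mutual_information(n_tc=50, n_t=50, n_c=50, n=100))  # term always co-occurs with category
```

Note that this symmetric form is also large when a term systematically avoids a category, so the sign of the association has to be checked separately when ranking terms for "pos" versus "neg".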
![Page 28: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/28.jpg)
Most Discriminative Features
Most discriminative visual features: extracted using the Mutual Information measure [ACM MM’11]
![Page 29: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/29.jpg)
Part 2: Videos on the Social Web *
* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson. Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson. Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009
![Page 30: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/30.jpg)
Near-duplicate Video Content
YouTube: most important video sharing environment [SIGCOM’07]: 85 M videos, 65 k videos/day, 100 M downloads per day
Traffic to/from YouTube = 10% / 20% of the Web total
Redundancy: 25% of the videos are near duplicates
Can we use redundancy to obtain richer video annotations? Automatic tagging
![Page 31: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/31.jpg)
Automatic Tagging
What is it good for?
• Additional information
• Better user experience
• Richer feature vectors for ...
  - Automatic data organization (classification and clustering)
  - Video Search
  - Knowledge Extraction (creating ontologies)
![Page 32: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/32.jpg)
Overlap Graph
[Figure: overlap graph connecting Video 1, Video 2, Video 3, Video 4, and Video 5]
![Page 33: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/33.jpg)
Neighbor-based Tagging (1): Idea
• Video 4 contains original tags A, B; tags F, E are obtained from neighbors
• Criteria for automatic tagging:
  - Prefer tags used by many neighbors
  - Prefer tags from neighbors with a strong link
[Figure: Video 1, Video 2, and Video 3 (tag sets ABC, AE, BEF) link to Video 4 (tag set ABFE, with F and E automatically generated)]
![Page 34: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/34.jpg)
Neighbor-based Tagging (2): Formal
Given: $G_O = (V_O, E_O)$, a directed overlap graph with weights

$$w(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_j|}$$

(the weights correspond to the overlap between the two videos).

Relevance of tag $t$ for video $v_i$:

$$rel(t, v_i) = \sum_{(v_j, v_i) \in E_O} I(t, v_j)\, w(v_j, v_i)$$

The sum runs over all neighbors $v_j$ of $v_i$, and $I(t, v_j)$ is the indicator function (1 if $v_j$ is tagged with $t$, 0 otherwise).
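The neighbor-based relevance can be sketched directly from these definitions. The tag sets below follow the example on the "Idea" slide, but the edge weights are hypothetical, as are the function names and the choice to represent a video as a set of frame ids.

```python
def overlap_weight(v_i, v_j):
    """w(v_i, v_j) = |v_i ∩ v_j| / |v_j|, with videos given as sets of frame ids."""
    return len(v_i & v_j) / len(v_j)


def tag_relevance(tag, target, edges, tags):
    """rel(t, v_i): sum of w(v_j, v_i) over neighbors v_j that carry the tag.

    edges: dict mapping (source, target) to the weight w(source, target)
    tags:  dict mapping a video name to its set of tags
    """
    return sum(weight for (source, dest), weight in edges.items()
               if dest == target and tag in tags[source])


tags = {
    "video1": {"A", "B", "C"},
    "video2": {"A", "E"},
    "video3": {"B", "E", "F"},
    "video4": {"A", "B"},
}
# Hypothetical overlap weights for the edges pointing at video4.
edges = {("video1", "video4"): 0.1,
         ("video2", "video4"): 0.5,
         ("video3", "video4"): 0.6}

for tag in ("C", "E", "F"):
    print(tag, tag_relevance(tag, "video4", edges, tags))
```

With these assumed weights, E scores highest because two neighbors use it, F benefits from a strong link, and C stays low because it comes from a single weakly linked neighbor. This matches the two criteria stated on the "Idea" slide.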
![Page 35: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/35.jpg)
Neighbor-based Tagging (3)
Apply additional smoothing for redundant regions:

$$rel(t, v) = \sum_{X \in \mathcal{P}(N(v))} \; \sum_{i=0}^{k(X)-1} \alpha^{i} \cdot \frac{\left| v \cap \bigcap_{x \in X} x \; - \bigcup_{u \in N(v) - X} u \right|}{|v|}$$

Here $\mathcal{P}(N(v))$ ranges over the subsets of neighbors of $v$, $k(X)$ is the number of neighbors in $X$ with tag $t$, $\alpha$ is the smoothing factor, and the numerator is the overlap region of $v$ covered by exactly the neighbors in $X$.
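The smoothed relevance can be evaluated by enumerating neighbor subsets. The sketch below assumes that each video is represented as a set of segment ids. The function name, the default `alpha`, and the sample data are hypothetical, and the exponential enumeration is only practical for a handful of neighbors.

```python
from itertools import combinations


def smoothed_relevance(tag, video, neighbors, tags, alpha=0.5):
    """Smoothed rel(t, v) over all non-empty subsets X of the neighbors of v.

    video:     set of segment ids of v
    neighbors: dict mapping a neighbor name to its set of segment ids
    tags:      dict mapping a neighbor name to its set of tags
    """
    names = list(neighbors)
    total = 0.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Region of v covered by exactly the neighbors in `subset`.
            region = set(video)
            for name in subset:
                region &= neighbors[name]
            for name in names:
                if name not in subset:
                    region -= neighbors[name]
            # k(X): neighbors in the subset that carry the tag.
            k = sum(1 for name in subset if tag in tags[name])
            # Each additional tagged neighbor counts alpha times less than the previous one.
            total += sum(alpha ** i for i in range(k)) * len(region) / len(video)
    return total


video = set(range(10))
neighbors = {"n1": set(range(0, 6)), "n2": set(range(4, 10))}
tags = {"n1": {"t"}, "n2": {"t"}}

print(smoothed_relevance("t", video, neighbors, tags, alpha=0.5))
print(smoothed_relevance("t", video, neighbors, tags, alpha=1.0))
```

In this example the two neighbors overlap on segments 4 and 5, so with `alpha=0.5` the doubly covered region contributes less than it would without smoothing, while `alpha=1.0` reproduces the plain sum of overlap weights.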
![Page 36: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/36.jpg)
TagRank
• Also takes transitive relationships into account
• PageRank-like weight propagation

$$rel(t, v_i) = TR(v_i, t) = \sum_{(v_j, v_i) \in E_O} TR(v_j, t)\, w(v_j, v_i)$$

or in matrix form as an eigenvector equation

$$TR(t) = \begin{pmatrix} w(v_1, v_1) & w(v_1, v_2) & \cdots & w(v_1, v_n) \\ w(v_2, v_1) & w(v_2, v_2) & \cdots & w(v_2, v_n) \\ \vdots & \vdots & \ddots & \vdots \\ w(v_n, v_1) & w(v_n, v_2) & \cdots & w(v_n, v_n) \end{pmatrix}^{T} \cdot \begin{pmatrix} TR(v_1, t) \\ TR(v_2, t) \\ \vdots \\ TR(v_n, t) \end{pmatrix}$$

with start vector

$$TR(t) = \left( I(t, v_1), \ldots, I(t, v_n) \right)^{T}$$
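A minimal sketch of the propagation as repeated application of the update rule, starting from the indicator vector. The slides do not specify normalization or a stopping rule, so the L1 normalization after each step and the small fixed iteration count are assumptions, as are the function name and the toy graph.

```python
def tag_rank(tag, videos, weights, tags, iterations=2):
    """Propagate tag relevance along weighted overlap edges.

    weights: dict mapping (source, target) to w(source, target); the diagonal
             entries w(v, v) appear in the matrix form, so they are included.
    tags:    dict mapping a video name to its set of tags
    """
    # Start vector: indicator of the tag on each video.
    scores = {v: 1.0 if tag in tags[v] else 0.0 for v in videos}
    for _ in range(iterations):
        updated = {v: 0.0 for v in videos}
        for (source, dest), weight in weights.items():
            updated[dest] += scores[source] * weight
        norm = sum(updated.values())
        if norm == 0:
            break
        # L1 normalization keeps the scores comparable across iterations (assumption).
        scores = {v: s / norm for v, s in updated.items()}
    return scores


videos = ["a", "b", "c"]
tags = {"a": {"t"}, "b": set(), "c": set()}
weights = {("a", "a"): 1.0, ("b", "b"): 1.0, ("c", "c"): 1.0,
           ("a", "b"): 0.5, ("b", "a"): 0.5,
           ("b", "c"): 0.5, ("c", "b"): 0.5}

print(tag_rank("t", videos, weights, tags))
```

Video c has no direct link to the tagged video a, yet it receives a positive score through b, which is the transitive effect the slide describes. Iterating to full convergence on a connected graph would make the result independent of the start vector, so the number of iterations matters in this sketch.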
![Page 37: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/37.jpg)
Applications of Extended Tag Representation
• Use relevance values rel(t, vi) for constructing enriched feature vectors for videos: combine original tags with new tags weighted by relevance values
• Automatic annotation: use thresholding to select the most relevant tags for a given video
  - Manual assessment of tags shows their relevance
• Data organization: clustering and classification experiments (ground truth: YouTube categories of videos)
  - Improved performance through enriched feature representation
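The two uses of the relevance values can be sketched as follows. Giving original tags a weight of 1.0, the threshold values, and the function names are assumptions made for illustration. The relevance scores reuse hypothetical numbers rather than measured results.

```python
def enriched_vector(original_tags, relevance, threshold=0.0):
    """Combine original tags (weight 1.0, an assumption) with relevance-weighted new tags."""
    vector = {tag: 1.0 for tag in original_tags}
    for tag, score in relevance.items():
        if tag not in vector and score >= threshold:
            vector[tag] = score
    return vector


def auto_annotate(relevance, threshold):
    """Select the tags whose relevance reaches the threshold, most relevant first."""
    chosen = [tag for tag, score in relevance.items() if score >= threshold]
    return sorted(chosen, key=lambda tag: (-relevance[tag], tag))


relevance = {"E": 1.1, "F": 0.6, "C": 0.1}  # hypothetical rel(t, v) values
print(enriched_vector({"A", "B"}, relevance, threshold=0.5))
print(auto_annotate(relevance, threshold=0.5))
```

The enriched dictionary can be fed to any classifier or clustering method that accepts sparse weighted term vectors.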
![Page 38: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/38.jpg)
Summary
• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)
• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)
• Visual information and annotations can be combined to obtain enhanced feature representations
• Visual information can help to establish links between resources such as videos (application: information propagation)
• Feature representations in combination with community feedback can be used for machine learning (application: classification, mapping).
![Page 39: Information Extraction from Multimedia Content on the Social Web](https://reader038.fdocuments.in/reader038/viewer/2022110211/5681334c550346895d9a5182/html5/thumbnails/39.jpg)
References
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson. Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011
Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng. Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson. Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009
Stefan Siersdorfer, Jose San Pedro. Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain
David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg. Mapping the World's Photos. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain