Video Fingerprinting and Applications: A Review

Jian Lu, Vobile, Inc. Media Forensics & Security Conference, EI’09, San Jose, CA


This presentation reviews the development of video fingerprinting technology over the past decade and its applications in content identification.

Transcript of Video Fingerprinting and Applications: A Review


Page 2: Video Fingerprinting and Applications: A Review

From Research to Applications

[Figure: timeline from research to applications, spanning 1999 to 2008]

Page 3: Video Fingerprinting and Applications: A Review

What’s Video Fingerprinting

•  A video fingerprint is a unique identifier extracted from video content
   –  Video fingerprints are often just strings of bits representing some “signatures” of the video content, and are usually not of fixed length.
   –  Video fingerprinting refers to the process of extracting fingerprints from the video content.
   –  Compared with watermarking, fingerprinting does not add to or alter video content.
   –  Also known as “robust hashing”, “perceptual hashing”, or “content-based copy detection (CBCD)” in the research literature.

Page 4: Video Fingerprinting and Applications: A Review

Human vs. Video Fingerprint

Human Fingerprint          Video Fingerprint
Uniquely identify human    Uniquely identify video
Physical form              Digital form
Pictorial                  Time-based binary

Page 5: Video Fingerprinting and Applications: A Review

Identification by Fingerprint

Video identification

Human identification

Page 6: Video Fingerprinting and Applications: A Review

Video Fingerprinting Algorithms

Page 7: Video Fingerprinting and Applications: A Review

Desired Properties

•  Robust
   –  Largely invariant for the same content under various types of processing, conversion, and manipulation.
•  Discriminating
   –  Distinctly different for different content.
•  Compact
   –  Low data rate
•  Low complexity
   –  Fast fingerprint generation and matching

Page 8: Video Fingerprinting and Applications: A Review

Types of Video Signatures

Signature type      Granularity
Spatial             Whole frame; blocks or other types of subdivision; points of interest
Temporal            Group of frames; down-sampled frames; key frames; every frame
Color               Bins of histograms
Transform-domain    3D transforms on GOP; frame transforms

Page 9: Video Fingerprinting and Applications: A Review

Variants of Spatial Signatures

•  Block-based
   –  Quantized mean block intensity
   –  Luminance block patterns ✪
      •  Ordinal ranking of average block intensity
   –  Differential luminance block patterns ✪
      •  Centroid of gradient orientations
      •  Dominant edge orientation
•  Points-of-interest
   –  Corner features (Harris points)
   –  Scale-space features
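The "ordinal ranking of average block intensity" variant can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the 2x2 grid, the list-of-rows frame layout, and the names `block_means` and `ordinal_signature` are all assumptions made for the example.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of luminance values (assumed layout)


def block_means(frame: Frame, rows: int, cols: int) -> List[float]:
    """Average luminance of each block in a rows x cols grid, in raster order."""
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            total = sum(frame[y][x] for y in range(y0, y1) for x in range(x0, x1))
            means.append(total / ((y1 - y0) * (x1 - x0)))
    return means


def ordinal_signature(frame: Frame, rows: int = 2, cols: int = 2) -> List[int]:
    """Rank of each block's mean luminance (0 = darkest); ties broken by block index."""
    means = block_means(frame, rows, cols)
    order = sorted(range(len(means)), key=lambda i: (means[i], i))
    ranks = [0] * len(means)
    for rank, block_index in enumerate(order):
        ranks[block_index] = rank
    return ranks


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [90, 90, 50, 50],
    [90, 90, 50, 50],
]
# A brightness/contrast change that preserves the ordering of block means.
brighter = [[min(255, 2 * v + 5) for v in row] for row in frame]

print(ordinal_signature(frame))     # [0, 3, 2, 1]
print(ordinal_signature(brighter))  # [0, 3, 2, 1]
```

Because only the rank order of the blocks is kept, the signature is unchanged by the brightness and contrast shift in the example, which is the kind of robustness the slides ask of a fingerprint.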

Page 10: Video Fingerprinting and Applications: A Review

An Example of Spatial Signature

Page 11: Video Fingerprinting and Applications: A Review

Variants of Temporal Signatures

•  Temporal luminance patterns
   –  Ordinal ranking of average frame or block intensity in a group of frames
•  Temporal differential luminance patterns ✪
   –  Sum of absolute pixel or block differences, quantized and thresholded
   –  Block motion vectors, as a histogram of quantized directions
•  Shot duration sequence
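A temporal differential luminance pattern of the "sum of absolute pixel difference, thresholded" kind might look like the sketch below. The one-bit-per-frame-pair output, the threshold value of 8.0, and the function names are assumptions for illustration only.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of luminance values (assumed layout)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Sum of absolute pixel differences between two frames, normalized by pixel count."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def temporal_signature(frames: List[Frame], threshold: float = 8.0) -> List[int]:
    """One bit per consecutive frame pair: 1 if the frame difference exceeds the threshold."""
    return [
        1 if mean_abs_diff(prev, cur) > threshold else 0
        for prev, cur in zip(frames, frames[1:])
    ]


def flat(level: int) -> Frame:
    """Helper for the example: a 2x2 frame of constant luminance."""
    return [[level] * 2 for _ in range(2)]


frames = [flat(10), flat(12), flat(60), flat(61)]
print(temporal_signature(frames))  # [0, 1, 0]
```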

Page 12: Video Fingerprinting and Applications: A Review

Color Signatures

•  Histogram-based
   –  Level-quantized histogram, e.g., (32, 16, 16) for Y, U, V, followed by magnitude quantization on each bin ✪
   –  Level-quantized histogram, followed by ordinal ranking of histogram bins by magnitude
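The first histogram variant can be sketched as follows, assuming 8-bit Y, U, V samples. The (32, 16, 16) level counts come from the slide; the choice of four magnitude levels and the name `quantized_histograms` are assumptions for the example.

```python
from typing import Iterable, List, Tuple

BINS = (32, 16, 16)  # histogram levels for Y, U, V, as in the slide's example


def quantized_histograms(
    pixels: Iterable[Tuple[int, int, int]], magnitude_levels: int = 4
) -> List[int]:
    """Per-channel histograms with each bin's share of pixels quantized to a few levels."""
    pixel_list = list(pixels)
    if not pixel_list:
        raise ValueError("at least one pixel is required")
    hist = [[0] * n for n in BINS]
    for yuv in pixel_list:
        for channel, value in enumerate(yuv):
            # Map an 8-bit sample (0..255) to its histogram level.
            hist[channel][value * BINS[channel] // 256] += 1
    signature = []
    for channel_hist in hist:
        for count in channel_hist:
            fraction = count / len(pixel_list)
            signature.append(min(magnitude_levels - 1, int(fraction * magnitude_levels)))
    return signature


pixels = [(0, 128, 128)] * 3 + [(255, 128, 128)]
signature = quantized_histograms(pixels)
print(len(signature))               # 64 values: 32 + 16 + 16
print(signature[0], signature[31])  # 3 1
```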

Page 13: Video Fingerprinting and Applications: A Review

Transform-Domain Signatures

•  Affine transformation resilient
   –  Polar Fourier transform
   –  Radon transform ✪
   –  Singular Value Decomposition
•  Energy compaction
   –  3D DCT
   –  3D Wavelet transform

Page 14: Video Fingerprinting and Applications: A Review

Which One to Use?

•  Spatial signatures, particularly block-based, are the overall category winner, and most widely used.

•  Temporal and color signatures are less robust, but can be used along with spatial signatures to enhance discriminability.

•  Transform-domain signatures are computationally expensive and not widely used in practice.

•  The weakness of block-based spatial signatures is their lack of resilience against excessive geometric distortion, e.g., rotation and cropping.

Page 15: Video Fingerprinting and Applications: A Review

Challenges of Geometric Distortions

Original

Rotation by 10 degrees Rotation + Cropping

Page 16: Video Fingerprinting and Applications: A Review

Fingerprinting performance

•  Video fingerprint using block-based spatial signatures
   –  Data size: a few hundred bits per frame, or <10 Kbps
   –  Speed: 1/10 of playback time (10x real time) or faster for standard-definition video.

Page 17: Video Fingerprinting and Applications: A Review

Fingerprint Matching and Search

Page 18: Video Fingerprinting and Applications: A Review

Similarity Measures

•  Distance-based ✪
   –  L1 (Manhattan) or L2 (Euclidean) distance
      •  For non-binary signatures
      •  Weights can be assigned when multiple signatures are used
   –  Hamming distance
      •  For binary signatures
•  Probability-based
   –  Probabilistic models for common distortion vectors
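The two distance-based measures can be illustrated with a short sketch. Binary signatures are represented here as Python integers and non-binary signatures as lists of numbers; both representations, the example weights, and the function names are assumptions.

```python
from typing import List, Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary signatures stored as integers."""
    return bin(a ^ b).count("1")


def weighted_l1(
    sigs_a: Sequence[List[float]],
    sigs_b: Sequence[List[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of L1 (Manhattan) distances over several non-binary signatures."""
    return sum(
        w * sum(abs(x - y) for x, y in zip(a, b))
        for w, a, b in zip(weights, sigs_a, sigs_b)
    )


print(hamming_distance(0b10110010, 0b10010011))  # 2

# Two signatures per video (for instance spatial and color), weighted 1.0 and 0.5.
video_a = [[1, 2, 3], [10, 10]]
video_b = [[1, 4, 3], [7, 10]]
print(weighted_l1(video_a, video_b, [1.0, 0.5]))  # 3.5
```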

Page 19: Video Fingerprinting and Applications: A Review

Complexity of Fingerprint Search

•  Exhaustive search has complexity linear in the database size, or O(K*N)
   –  N is the size of the reference fingerprint DB, in minutes or hours.
   –  K is the length of the query video.
   –  N can be further decomposed into M*L
      •  M is the number of reference video fingerprints in the DB
      •  L is the average length of the video fingerprints in the DB
•  The curse is on N or M, the DB size.
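The O(K*N) cost can be seen in a brute-force sketch that slides the query over every reference. Fingerprints are modeled here as lists of per-frame integers compared by Hamming distance; that representation and the names used are assumptions, not the presenter's system.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two per-frame signatures."""
    return bin(a ^ b).count("1")


def exhaustive_search(
    query: List[int], references: Dict[str, List[int]]
) -> Optional[Tuple[str, int, int]]:
    """Try every alignment of the query in every reference.

    Returns (reference name, frame offset, total distance) for the best alignment.
    Work is about K comparisons per alignment and about N = M*L alignments overall.
    """
    best = None
    k = len(query)
    for name, ref in references.items():
        for offset in range(len(ref) - k + 1):
            dist = sum(hamming(q, r) for q, r in zip(query, ref[offset:offset + k]))
            if best is None or dist < best[2]:
                best = (name, offset, dist)
    return best


references = {
    "clip_a": [0b0000, 0b1111, 0b1010, 0b0101, 0b0011],
    "clip_b": [0b1100, 0b0110, 0b1001, 0b1111],
}
query = [0b1010, 0b0111]  # noisy copy of clip_a starting at frame 2
print(exhaustive_search(query, references))  # ('clip_a', 2, 1)
```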

Page 20: Video Fingerprinting and Applications: A Review

Strategies for Fast Search

Strategies                               Fingerprint Search      Motion Vector Search
Reduce search space ✪                    LSH
Greedy search                            Sequential alignment    Hierarchical search
Early exit                               Hamming distance > T    SAD > T
Approximation in distance calculation    Frame down-sampling     Block down-sampling

Page 21: Video Fingerprinting and Applications: A Review

Locality Sensitive Hashing (LSH)

•  Consider the ε-NNS (approximate nearest-neighbor search) problem:
   –  For a query point q, find an approximate nearest neighbor p such that d(q,p) < (1+ε) d(q,P), where d(q,P) is the distance from q to its true nearest neighbor in the point set P.
   –  LSH guarantees that p can be found, with high probability, in O(N^{1/(1+ε)}) time.
•  Geometric reasoning:
   –  Close points in space are likely to be close after hashing (e.g., a projection onto a lower-dimensional space).
   –  By using multiple hash functions, the probability of close points falling close is increased.
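One common LSH family for binary fingerprints under Hamming distance is bit sampling, where each hash table is keyed on a random subset of bit positions. The sketch below uses that family as an assumption; the slides do not say which hash family is used, and the class name, parameters, and example fingerprints are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Set, Tuple


class BitSamplingLSH:
    """Several hash tables, each keyed on a random subset of fingerprint bits."""

    def __init__(self, num_bits: int, bits_per_key: int, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        self.positions = [
            rng.sample(range(num_bits), bits_per_key) for _ in range(num_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(num_tables)]

    @staticmethod
    def _key(fingerprint: int, positions: List[int]) -> Tuple[int, ...]:
        # Projection onto a lower-dimensional space: keep only the sampled bits.
        return tuple((fingerprint >> p) & 1 for p in positions)

    def add(self, item_id: str, fingerprint: int) -> None:
        for positions, table in zip(self.positions, self.tables):
            table[self._key(fingerprint, positions)].add(item_id)

    def candidates(self, fingerprint: int) -> Set[str]:
        """Items that share a bucket with the query in at least one table."""
        found: Set[str] = set()
        for positions, table in zip(self.positions, self.tables):
            found |= table.get(self._key(fingerprint, positions), set())
        return found


index = BitSamplingLSH(num_bits=32, bits_per_key=8, num_tables=4)
index.add("clip_a", 0xDEADBEEF)
index.add("clip_b", 0x12345678)

print("clip_a" in index.candidates(0xDEADBEEF))               # True: identical bits in every table
print("clip_a" in index.candidates(0xDEADBEEF ^ 0xFFFFFFFF))  # False: every sampled bit differs
```

A fingerprint that differs from a stored one in only a few bits collides with it in a given table whenever none of the flipped bits were sampled, so adding tables raises the chance that a near-duplicate shows up among the candidates. Only the candidates then need an exact distance check.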

Page 22: Video Fingerprinting and Applications: A Review

Other Approximation Techniques

•  Multi-resolution coarse-to-fine search
   –  Fine-level search can be terminated (early exit) if coarse-level search is far off.
   –  Rank candidates by coarse-level search scores and take only the top N candidates for fine-level search.
•  Adaptive hashing (“learning to hash”)
   –  Hashing is non-deterministic; the system is trained to adapt to the identification task and data.
   –  A substantial reduction in search space.
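The coarse-to-fine idea can be sketched with frame down-sampling as the coarse level. For brevity the sketch assumes the query and references are already aligned and of equal length; the `step`, `top_n`, and `coarse_limit` parameters and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two per-frame signatures."""
    return bin(a ^ b).count("1")


def sequence_distance(a: List[int], b: List[int]) -> int:
    """Total Hamming distance between two aligned fingerprint sequences."""
    return sum(hamming(x, y) for x, y in zip(a, b))


def coarse_to_fine(
    query: List[int],
    references: Dict[str, List[int]],
    step: int = 2,
    top_n: int = 2,
    coarse_limit: int = 4,
) -> Optional[str]:
    """Score every reference on down-sampled frames, then refine only the best few."""
    coarse_query = query[::step]
    scored = []
    for name, ref in references.items():
        coarse = sequence_distance(coarse_query, ref[::step])
        if coarse > coarse_limit:
            continue  # early exit: the coarse score is far off, skip the fine level
        scored.append((coarse, name))
    scored.sort()
    shortlist = [name for _, name in scored[:top_n]]
    fine = [(sequence_distance(query, references[name]), name) for name in shortlist]
    return min(fine)[1] if fine else None


references = {
    "clip_a": [0b0000, 0b0011, 0b1100, 0b1111],
    "clip_b": [0b0000, 0b1111, 0b1100, 0b0000],
    "clip_c": [0b1111, 0b1111, 0b0011, 0b0000],
}
query = [0b0001, 0b0011, 0b1100, 0b1110]  # noisy copy of clip_a
print(coarse_to_fine(query, references))  # clip_a
```

In the example, clip_c is rejected at the coarse level, clip_a and clip_b tie on the coarse score, and the fine-level comparison separates them.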

Page 23: Video Fingerprinting and Applications: A Review

Applications

Page 24: Video Fingerprinting and Applications: A Review

UGC & P2P – copyright concerns?

•  UGC traffic in 07/2007 (Source: comScore, November 30, 2007)
   –  70 million people viewed 2.5 billion videos on YouTube.com (39.4% of total UGC audience)
   –  38 million people viewed 360 million videos on MySpace.com (22.6% of total UGC audience)
•  P2P traffic in 2007 (Source: iPoque, November 28, 2007)
   –  Average 50-60% of total Internet traffic: 49% in Middle East; 83% in Eastern Europe.
   –  BitTorrent 66.7%, eDonkey 28.6% of total P2P traffic


Page 25: Video Fingerprinting and Applications: A Review

Video Content Registration

•  A reference video fingerprint database is pre-populated.
•  Two types of information are stored with video fingerprint data in the reference database
   –  Metadata, e.g., title, owner, release date, etc.
   –  Business rules, e.g., allow, filter, or advertise, possibly based on certain conditions
      •  MovieLabs’ Content Recognition Rules (CRR) is an industry standard interface for expressing and exchanging rules.
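A registered reference entry could be modeled roughly as below. This is a hypothetical data structure for illustration only and does not follow the MovieLabs CRR schema; the field names, the `min_match_seconds` condition, and the example values are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReferenceEntry:
    """One registered video: fingerprint data plus metadata and a business rule."""

    fingerprint: List[int]
    metadata: Dict[str, str]        # e.g., title, owner, release date
    rule: str = "allow"             # "allow", "filter", or "advertise"
    min_match_seconds: float = 0.0  # hypothetical condition attached to the rule


def resolve_action(entry: ReferenceEntry, matched_seconds: float) -> str:
    """Apply the entry's rule only when its condition is met; otherwise allow."""
    if matched_seconds >= entry.min_match_seconds:
        return entry.rule
    return "allow"


entry = ReferenceEntry(
    fingerprint=[0b1010, 0b0101],
    metadata={"title": "Example Title", "owner": "Example Studio"},
    rule="filter",
    min_match_seconds=30.0,
)
print(resolve_action(entry, matched_seconds=45.0))  # filter
print(resolve_action(entry, matched_seconds=10.0))  # allow
```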

Page 26: Video Fingerprinting and Applications: A Review

Video Content Filtering

Page 27: Video Fingerprinting and Applications: A Review

Video Content Tracking

Page 28: Video Fingerprinting and Applications: A Review

Example: Video Content Tracking

Page 29: Video Fingerprinting and Applications: A Review

Tracking Olympic Video Distribution

Page 30: Video Fingerprinting and Applications: A Review

Other Applications

•  Broadcast monitoring
   –  Audit TV program and commercial airings
•  Contextual ads (monetization)
   –  Pair ads with identified content, like Google AdSense
•  Video asset management
   –  Content-based IDs identify linkage between edits and sources
•  Content-based video search
   –  Query by video clip

Page 31: Video Fingerprinting and Applications: A Review

Summary

•  Research in video fingerprinting began a decade ago; it has since developed into a technology adopted by the industry.

•  Different types of signatures are used to form a video fingerprint, including spatial, temporal, color, and transform-domain signatures.

•  Spatial signatures are the overall winner judged by multiple criteria, and are widely adopted as primary signatures; temporal and color signatures can be used as secondary signatures to enhance discriminability.

•  Brute-force, exhaustive fingerprint search is an O(K*N) problem.

•  Fast approximate algorithms make fingerprint search tractable and scalable for practical applications.

•  Current applications focus on copyright enforcement; other applications being developed and tested include contextual advertising, asset management, and content-based video search.