Fingerprinting - Audio and Video Detection Systems
Transcript of Fingerprinting - Audio and Video Detection Systems
![Page 1: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/1.jpg)
1/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
FingerprintingAudio and Video Detection Systems
Speaker: Matteo Pozzetti
Supervisor: Laura Peer
01 April 2015
![Page 2: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/2.jpg)
2/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Overview
1 Introduction
2 Audio Fingerprinting
3 Video Fingerprinting
4 Conclusion
![Page 3: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/3.jpg)
3/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
![Page 4: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/4.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 5: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/5.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 6: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/6.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 7: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/7.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)
Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 8: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/8.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 9: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/9.jpg)
4/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting
Content based fingerprinting:
Algorithm that maps a multimedia object into a compactdigital summary (set of hashes)
Used to retrieve such an object in a database
Applications:
Search engines (Shazam)Copyright protection (YouTube)
Requirements
A fingerprint algorithm must be noise resistant andcomputationally efficient.
![Page 10: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/10.jpg)
5/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Table of Contents
1 Introduction
2 Audio FingerprintingIntroductionHashingSearching and ScoringResults
3 Video Fingerprinting
4 Conclusion
![Page 11: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/11.jpg)
6/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
![Page 12: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/12.jpg)
7/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
![Page 13: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/13.jpg)
9/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Process
![Page 14: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/14.jpg)
10/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Goals
The length of the recordedsample should not be morethan 15 seconds long
The sample can come fromany portion of the originalaudio track
Robust
![Page 15: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/15.jpg)
10/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Goals
The length of the recordedsample should not be morethan 15 seconds long
The sample can come fromany portion of the originalaudio track
Robust
0 2 4 6 8 10 12 14 16 18 20−1
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Time, s
Nor
mal
ized
am
plitu
de
The signal in the time domain
![Page 16: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/16.jpg)
10/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Goals
The length of the recordedsample should not be morethan 15 seconds long
The sample can come fromany portion of the originalaudio track
Robust
![Page 17: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/17.jpg)
11/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Hashing
![Page 18: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/18.jpg)
12/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Transformations to the frequency domainFourier transform
Fourier transform: maps the signal x(t) : R→ C into
X (f ) =
∫ ∞∞
x(t)e−j2πftdt
which decomposes the function into its frequencies.
![Page 19: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/19.jpg)
12/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Transformations to the frequency domainFourier transform
Fourier transform: maps the signal x(t) : R→ C into
X (f ) =
∫ ∞∞
x(t)e−j2πftdt
which decomposes the function into its frequencies.
![Page 20: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/20.jpg)
13/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Short-time Fourier Transform
STFT is a Fourier-related transform used to determine thesinusoidal frequency and phase content of local sections of a signalas it changes over time.
x(t)STFT7−−−→ X (τ, f ) =
∫ ∞−∞
x(t)w(t − τ)e−j2πftdt
![Page 21: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/21.jpg)
13/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Short-time Fourier Transform
STFT is a Fourier-related transform used to determine thesinusoidal frequency and phase content of local sections of a signalas it changes over time.
x(t)STFT7−−−→ X (τ, f ) =
∫ ∞−∞
x(t)w(t − τ)e−j2πftdt
![Page 22: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/22.jpg)
14/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Constellation and Hashing
![Page 23: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/23.jpg)
14/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Constellation and Hashing
![Page 24: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/24.jpg)
14/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Constellation and Hashing
![Page 25: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/25.jpg)
14/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Constellation and Hashing
![Page 26: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/26.jpg)
15/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Hashing
Each anchor point is sequentially paired with points within itstarget zone.
(f1, f2,∆t)→ hash
30 bits hash:
10 bits frequency anchor
10 bits frequency target
10 bits time difference
Each hash is associated with thetime offset:
hash:offset = (f1, f2,∆t) : t1
![Page 27: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/27.jpg)
15/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Hashing
Each anchor point is sequentially paired with points within itstarget zone.
(f1, f2,∆t)→ hash
30 bits hash:
10 bits frequency anchor
10 bits frequency target
10 bits time difference
Each hash is associated with thetime offset:
hash:offset = (f1, f2,∆t) : t1
![Page 28: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/28.jpg)
15/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Hashing
Each anchor point is sequentially paired with points within itstarget zone.
(f1, f2,∆t)→ hash
30 bits hash:
10 bits frequency anchor
10 bits frequency target
10 bits time difference
Each hash is associated with thetime offset:
hash:offset = (f1, f2,∆t) : t1
![Page 29: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/29.jpg)
16/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching and Scoring
![Page 30: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/30.jpg)
17/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching and Scoring
We take all matching hashes of a given song and we plot their timeoffset
DB song Query songh:t h:τ
![Page 31: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/31.jpg)
17/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching and Scoring
We take all matching hashes of a given song and we plot their timeoffset
DB song Query songh:t h:τ
![Page 32: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/32.jpg)
17/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching and Scoring
We take all matching hashes of a given song and we plot their timeoffset
DB song Query songh:t h:τ
![Page 33: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/33.jpg)
17/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching and Scoring
We take all matching hashes of a given song and we plot their timeoffset
DB song Query songh:t h:τ
![Page 34: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/34.jpg)
18/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Scoring
Matching features: tk = τk + constant
For each (tk , τk) in the scatter-plot, we calculate:
∆k = tk − τk
Then we plot a histogram of these ∆k
![Page 35: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/35.jpg)
18/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Scoring
Matching features: tk = τk + constant
For each (tk , τk) in the scatter-plot, we calculate:
∆k = tk − τk
Then we plot a histogram of these ∆k
![Page 36: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/36.jpg)
18/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Scoring
Matching features: tk = τk + constant
For each (tk , τk) in the scatter-plot, we calculate:
∆k = tk − τk
Then we plot a histogram of these ∆k
![Page 37: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/37.jpg)
19/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
ScoringTrack that does not match the sample
Figure : No diagonal in the scatter-plot.
![Page 38: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/38.jpg)
20/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
ScoringTrack that matches the sample
Figure : Diagonal in the scatter-plot: cluster in the histogram.
![Page 39: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/39.jpg)
21/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
![Page 40: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/40.jpg)
22/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
The algorithm performs well with significant levels of noise
For a database of 20 about thousand tracks the search time isin the order of 5− 500 milliseconds.
![Page 41: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/41.jpg)
22/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
The algorithm performs well with significant levels of noise
For a database of 20 about thousand tracks the search time isin the order of 5− 500 milliseconds.
![Page 42: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/42.jpg)
23/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Table of Contents
1 Introduction
2 Audio Fingerprinting
3 Video FingerprintingIntroductionPreprocessingDiscrete cosine transform3D-DCTFingerprinting using TIRIsSearchingResults
4 Conclusion
![Page 43: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/43.jpg)
24/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
![Page 44: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/44.jpg)
25/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
Copy detection system
The purpose of this system is todetect copyrights violations.
A fingerprint should be
Robust to content-preservingdistortions
Discriminant
Easy to compute
Compact
![Page 45: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/45.jpg)
25/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
Copy detection system
The purpose of this system is todetect copyrights violations.
A fingerprint should be
Robust to content-preservingdistortions
Discriminant
Easy to compute
Compact
![Page 46: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/46.jpg)
25/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
Copy detection system
The purpose of this system is todetect copyrights violations.
A fingerprint should be
Robust to content-preservingdistortions
Discriminant
Easy to compute
Compact
![Page 47: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/47.jpg)
25/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
Copy detection system
The purpose of this system is todetect copyrights violations.
A fingerprint should be
Robust to content-preservingdistortions
Discriminant
Easy to compute
Compact
![Page 48: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/48.jpg)
25/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Introduction
Copy detection system
The purpose of this system is todetect copyrights violations.
A fingerprint should be
Robust to content-preservingdistortions
Discriminant
Easy to compute
Compact
![Page 49: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/49.jpg)
26/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Image and Video Representations
RBG
Colors represented as acombination of red, greenand blue (3D-space)Each component is aninteger value in [0, 255]
YUV
Y is the luminancecomponent, U and V arethe chrominancecomponents.
![Page 50: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/50.jpg)
26/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Image and Video Representations
RBG
Colors represented as acombination of red, greenand blue (3D-space)
Each component is aninteger value in [0, 255]
YUV
Y is the luminancecomponent, U and V arethe chrominancecomponents.
![Page 51: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/51.jpg)
26/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Image and Video Representations
RBG
Colors represented as acombination of red, greenand blue (3D-space)Each component is aninteger value in [0, 255]
YUV
Y is the luminancecomponent, U and V arethe chrominancecomponents.
![Page 52: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/52.jpg)
26/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Image and Video Representations
RBG
Colors represented as acombination of red, greenand blue (3D-space)Each component is aninteger value in [0, 255]
YUV
Y is the luminancecomponent, U and V arethe chrominancecomponents.
![Page 53: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/53.jpg)
26/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Image and Video Representations
RBG
Colors represented as acombination of red, greenand blue (3D-space)Each component is aninteger value in [0, 255]
YUV
Y is the luminancecomponent, U and V arethe chrominancecomponents.
![Page 54: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/54.jpg)
27/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Preprocessing
![Page 55: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/55.jpg)
28/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Preprocessing
![Page 56: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/56.jpg)
29/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transform
![Page 57: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/57.jpg)
30/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transform
The DCT is similar to the discrete Fourier transform: ittransforms a signal or image from the spatial domain to thefrequency domain.
It is a widely used tool in image and video processing andcompression!
It is possible to extract some relevant information from thefrequency representation
![Page 58: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/58.jpg)
30/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transform
The DCT is similar to the discrete Fourier transform: ittransforms a signal or image from the spatial domain to thefrequency domain.
It is a widely used tool in image and video processing andcompression!
It is possible to extract some relevant information from thefrequency representation
![Page 59: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/59.jpg)
30/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transform
The DCT is similar to the discrete Fourier transform: ittransforms a signal or image from the spatial domain to thefrequency domain.
It is a widely used tool in image and video processing andcompression!
It is possible to extract some relevant information from thefrequency representation
![Page 60: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/60.jpg)
31/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transformFormula
The two-dimensional DCT of an M-by-N matrix A is defined asfollow:
Bpq = αpαq
M−1∑m=0
N−1∑n=0
Amn cosπ(2m + 1)p
2Mcos
π(2n + 1)q
2N
αp =
1√M, p = 0√
2M , 1 ≤ p ≤ M − 1
αq =
1√N, q = 0√
2N , 1 ≤ q ≤ M − 1
where 0 ≤ p ≤ M − 1 and 0 ≤ q ≤ N − 1 are the matrix indexes.
![Page 61: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/61.jpg)
32/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Discrete cosine transformExample
![Page 62: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/62.jpg)
34/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
3D-DCT
![Page 63: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/63.jpg)
35/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
3D Discrete cosine transform
3D-DCT is a DCT applied to a 3-dimensional matrix.
Time domain Frequency domain
![Page 64: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/64.jpg)
35/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
3D Discrete cosine transform
3D-DCT is a DCT applied to a 3-dimensional matrix.Time domain Frequency domain
![Page 65: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/65.jpg)
36/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting using 3D-DCT
Binary fingerprints are derived bythresholding the low-frequencycoefficients of the transform; thethreshold is the median value ofthe selected coefficients.
![Page 66: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/66.jpg)
36/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting using 3D-DCT
Binary fingerprints are derived bythresholding the low-frequencycoefficients of the transform; thethreshold is the median value ofthe selected coefficients.
![Page 67: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/67.jpg)
36/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting using 3D-DCT
Binary fingerprints are derived bythresholding the low-frequencycoefficients of the transform; thethreshold is the median value ofthe selected coefficients.
![Page 68: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/68.jpg)
37/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Fingerprinting using TIRIs
![Page 69: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/69.jpg)
38/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
TIRITemporally Informative Representative Image
A TIRI contains spatial and temporal information of a shortsegment of a video sequence.
l ′m,n =J∑
k=1
wk lm,n,k
where lm,n,k is the luminancevalue of the (m,n)th pixel in aset of J frames and wk is aweighting factor.
![Page 70: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/70.jpg)
38/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
TIRITemporally Informative Representative Image
A TIRI contains spatial and temporal information of a shortsegment of a video sequence.
l ′m,n =J∑
k=1
wk lm,n,k
where lm,n,k is the luminancevalue of the (m,n)th pixel in aset of J frames and wk is aweighting factor.
![Page 71: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/71.jpg)
39/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
TIRIWeighting factor
Experiments were made with different w : constant, linear,exponential.
Figure : (d) wk = 1 (costant), (e) wk = k (linear), (f) wk = 1.2k (exponential)
![Page 72: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/72.jpg)
40/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
TIRI-DCT
![Page 73: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/73.jpg)
41/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching
![Page 74: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/74.jpg)
42/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching requirements
After the fingerprint is extracted, the database is searched for theclosest fingerprint.
Nearest neighbor problem
Fingerprints of two different copies of the same video are similarbut not necessarily identical. We then search for the most similarfingerprint in the binary space.
Since the database is very large, the algorithm must be fast. A fastapproximate algorithm is preferred over an exact slow one.
Hamming distance
The process used to create binary fingerprints allow us to use theHamming distance as a metric of similarity.
![Page 75: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/75.jpg)
42/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching requirements
After the fingerprint is extracted, the database is searched for theclosest fingerprint.
Nearest neighbor problem
Fingerprints of two different copies of the same video are similarbut not necessarily identical. We then search for the most similarfingerprint in the binary space.
Since the database is very large, the algorithm must be fast. A fastapproximate algorithm is preferred over an exact slow one.
Hamming distance
The process used to create binary fingerprints allow us to use theHamming distance as a metric of similarity.
![Page 76: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/76.jpg)
42/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching requirements
After the fingerprint is extracted, the database is searched for theclosest fingerprint.
Nearest neighbor problem
Fingerprints of two different copies of the same video are similarbut not necessarily identical. We then search for the most similarfingerprint in the binary space.
Since the database is very large, the algorithm must be fast. A fastapproximate algorithm is preferred over an exact slow one.
Hamming distance
The process used to create binary fingerprints allow us to use theHamming distance as a metric of similarity.
![Page 77: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/77.jpg)
42/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Searching requirements
After the fingerprint is extracted, the database is searched for theclosest fingerprint.
Nearest neighbor problem
Fingerprints of two different copies of the same video are similarbut not necessarily identical. We then search for the most similarfingerprint in the binary space.
Since the database is very large, the algorithm must be fast. A fastapproximate algorithm is preferred over an exact slow one.
Hamming distance
The process used to create binary fingerprints allow us to use theHamming distance as a metric of similarity.
![Page 78: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/78.jpg)
43/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
0110001︸ ︷︷ ︸1st word
0010110︸ ︷︷ ︸2nd word
. . . 1110001︸ ︷︷ ︸(n − 1)th word
0010101︸ ︷︷ ︸nth word
m = word length
n =L
m
Position of a word in a fingerprint1 2 3 . . . n-1 n
11, 44,250
10, 14,99
21, 250
2 2, 865, 170,241
.... . .
...2m . . . 61, 26 4
![Page 79: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/79.jpg)
43/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
0110001︸ ︷︷ ︸1st word
0010110︸ ︷︷ ︸2nd word
. . . 1110001︸ ︷︷ ︸(n − 1)th word
0010101︸ ︷︷ ︸nth word
m = word length
n =L
m
Position of a word in a fingerprint1 2 3 . . . n-1 n
11, 44,250
10, 14,99
21, 250
2 2, 865, 170,241
.... . .
...2m . . . 61, 26 4
![Page 80: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/80.jpg)
44/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
To find a query fingerprint in a database:
The fingerprint is divided into n words (of m bits)
The query is compared with fingerprints that starts with thesame word.
If the Hamming distance is below a certain threshold, thematch is found.
Otherwise the procedure is repeated with the second word andso on.
Position of a word in a fingerprint1 2 3 . . . n-1 n
1 1, 44, 250 10, 14, 99 21, 2502 2, 86 5, 170, 241
.
.
.. . .
.
.
.2m . . . 61, 26 4
![Page 81: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/81.jpg)
44/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
To find a query fingerprint in a database:
The fingerprint is divided into n words (of m bits)
The query is compared with fingerprints that starts with thesame word.
If the Hamming distance is below a certain threshold, thematch is found.
Otherwise the procedure is repeated with the second word andso on.
Position of a word in a fingerprint1 2 3 . . . n-1 n
1 1, 44, 250 10, 14, 99 21, 2502 2, 86 5, 170, 241
.
.
.. . .
.
.
.2m . . . 61, 26 4
![Page 82: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/82.jpg)
44/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
To find a query fingerprint in a database:
The fingerprint is divided into n words (of m bits)
The query is compared with fingerprints that starts with thesame word.
If the Hamming distance is below a certain threshold, thematch is found.
Otherwise the procedure is repeated with the second word andso on.
Position of a word in a fingerprint1 2 3 . . . n-1 n
1 1, 44, 250 10, 14, 99 21, 2502 2, 86 5, 170, 241
.
.
.. . .
.
.
.2m . . . 61, 26 4
![Page 83: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/83.jpg)
44/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Inverted-File-Based Similarity search
To find a query fingerprint in a database:
The fingerprint is divided into n words (of m bits)
The query is compared with fingerprints that starts with thesame word.
If the Hamming distance is below a certain threshold, thematch is found.
Otherwise the procedure is repeated with the second word andso on.
Position of a word in a fingerprint1 2 3 . . . n-1 n
1 1, 44, 250 10, 14, 99 21, 2502 2, 86 5, 170, 241
.
.
.. . .
.
.
.2m . . . 61, 26 4
![Page 84: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/84.jpg)
45/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Cluster-based similarity search
The database is clustered into K non-overlapping groups; eachcluster has a centroid called cluster head.
The cluster head closest to the query fingerprint is found
0100001︸ ︷︷ ︸0
dist0 = 2
0010110︸ ︷︷ ︸0
dist0 = 3
1010111︸ ︷︷ ︸1
dist1 = 2
0000001︸ ︷︷ ︸0
dist0 = 1
1111101︸ ︷︷ ︸1
dist1 = 1
Cluster head = 00101
Every fingerprint belonging to this cluster are searched to finda match
If no match is found, then the cluster that is the secondclosest to the query is examined, then the third and so on
![Page 85: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/85.jpg)
45/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Cluster-based similarity search
The database is clustered into K non-overlapping groups; eachcluster has a centroid called cluster head.
The cluster head closest to the query fingerprint is found
0100001︸ ︷︷ ︸0
dist0 = 2
0010110︸ ︷︷ ︸0
dist0 = 3
1010111︸ ︷︷ ︸1
dist1 = 2
0000001︸ ︷︷ ︸0
dist0 = 1
1111101︸ ︷︷ ︸1
dist1 = 1
Cluster head = 00101
Every fingerprint belonging to this cluster are searched to finda match
If no match is found, then the cluster that is the secondclosest to the query is examined, then the third and so on
![Page 86: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/86.jpg)
45/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Cluster-based similarity search
The database is clustered into K non-overlapping groups; eachcluster has a centroid called cluster head.
The cluster head closest to the query fingerprint is found
0100001︸ ︷︷ ︸0
dist0 = 2
0010110︸ ︷︷ ︸0
dist0 = 3
1010111︸ ︷︷ ︸1
dist1 = 2
0000001︸ ︷︷ ︸0
dist0 = 1
1111101︸ ︷︷ ︸1
dist1 = 1
Cluster head = 00101
Every fingerprint belonging to this cluster are searched to finda match
If no match is found, then the cluster that is the secondclosest to the query is examined, then the third and so on
![Page 87: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/87.jpg)
45/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Cluster-based similarity search
The database is clustered into K non-overlapping groups; eachcluster has a centroid called cluster head.
The cluster head closest to the query fingerprint is found
0100001︸ ︷︷ ︸0
dist0 = 2
0010110︸ ︷︷ ︸0
dist0 = 3
1010111︸ ︷︷ ︸1
dist1 = 2
0000001︸ ︷︷ ︸0
dist0 = 1
1111101︸ ︷︷ ︸1
dist1 = 1
Cluster head = 00101
Every fingerprint belonging to this cluster are searched to finda match
If no match is found, then the cluster that is the secondclosest to the query is examined, then the third and so on
![Page 88: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/88.jpg)
46/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
![Page 89: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/89.jpg)
47/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
A database of 200 videos was created
Videos were attacked (distorted) to create query videos
Metrics
Recall It’s the true positive rate. It is a measure of robustness.
Recall =TP
TP + FN
Precision Percentage of correct hits within all detected copies.
Precision =TP
TP + FP
F-score It is the harmonic mean of precision and recall:
F = 2precision· recall
precision + recall
![Page 90: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/90.jpg)
47/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
A database of 200 videos was created
Videos were attacked (distorted) to create query videos
Metrics
Recall It’s the true positive rate. It is a measure of robustness.
Recall =TP
TP + FN
Precision Percentage of correct hits within all detected copies.
Precision =TP
TP + FP
F-score It is the harmonic mean of precision and recall:
F = 2precision· recall
precision + recall
![Page 91: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/91.jpg)
47/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
A database of 200 videos was created
Videos were attacked (distorted) to create query videos
Metrics
Recall It’s the true positive rate. It is a measure of robustness.
Recall =TP
TP + FN
Precision Percentage of correct hits within all detected copies.
Precision =TP
TP + FP
F-score It is the harmonic mean of precision and recall:
F = 2precision· recall
precision + recall
![Page 92: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/92.jpg)
47/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
A database of 200 videos was created
Videos were attacked (distorted) to create query videos
Metrics
Recall It’s the true positive rate. It is a measure of robustness.
Recall =TP
TP + FN
Precision Percentage of correct hits within all detected copies.
Precision =TP
TP + FP
F-score It is the harmonic mean of precision and recall:
F = 2precision· recall
precision + recall
![Page 93: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/93.jpg)
48/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
Attack F-score3D-DCT TIRI-DCT
Noise 1.00 0.99
Brightness 1.00 0.99
Contrast 1.00 0.99
Rotation 1.00 1.00
Time shift 0.91 0.98
Spatial shift 1.00 0.98
Frame loss 0.98 0.99
Average 0.99 0.99
Speed
TIRI-DCT is more than 3times faster than 3D-DCT
Approximate searchalgorithms are much fasterthan the exhaustive searchwhen the error rates are low.Cluster-based approach isfaster.
![Page 94: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/94.jpg)
48/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
Attack F-score3D-DCT TIRI-DCT
Noise 1.00 0.99
Brightness 1.00 0.99
Contrast 1.00 0.99
Rotation 1.00 1.00
Time shift 0.91 0.98
Spatial shift 1.00 0.98
Frame loss 0.98 0.99
Average 0.99 0.99
Speed
TIRI-DCT is more than 3times faster than 3D-DCT
Approximate searchalgorithms are much fasterthan the exhaustive searchwhen the error rates are low.Cluster-based approach isfaster.
![Page 95: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/95.jpg)
48/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
Attack F-score3D-DCT TIRI-DCT
Noise 1.00 0.99
Brightness 1.00 0.99
Contrast 1.00 0.99
Rotation 1.00 1.00
Time shift 0.91 0.98
Spatial shift 1.00 0.98
Frame loss 0.98 0.99
Average 0.99 0.99
Speed
TIRI-DCT is more than 3times faster than 3D-DCT
Approximate searchalgorithms are much fasterthan the exhaustive searchwhen the error rates are low.
Cluster-based approach isfaster.
![Page 96: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/96.jpg)
48/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Results
Attack F-score3D-DCT TIRI-DCT
Noise 1.00 0.99
Brightness 1.00 0.99
Contrast 1.00 0.99
Rotation 1.00 1.00
Time shift 0.91 0.98
Spatial shift 1.00 0.98
Frame loss 0.98 0.99
Average 0.99 0.99
Speed
TIRI-DCT is more than 3times faster than 3D-DCT
Approximate searchalgorithms are much fasterthan the exhaustive searchwhen the error rates are low.Cluster-based approach isfaster.
![Page 97: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/97.jpg)
49/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Table of Contents
1 Introduction
2 Audio Fingerprinting
3 Video Fingerprinting
4 Conclusion
![Page 98: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/98.jpg)
50/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Shazam
The algorithm is noise resistant and efficient. It can identify ashort segment of music out of a database of over a milliontracks. However the algorithm is not meant to be resistant togeneral attacks (e.g time compression, pitch alterations, . . . ).
Shazam was founded in 1999 and launched its firstidentification service in 2002
Shazam has now 400 million users in 200 countries and wasused to identify 15 billion songs
The value of the company is now half a billion dollars
![Page 99: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/99.jpg)
50/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Shazam
The algorithm is noise resistant and efficient. It can identify ashort segment of music out of a database of over a milliontracks. However the algorithm is not meant to be resistant togeneral attacks (e.g time compression, pitch alterations, . . . ).
Shazam was founded in 1999 and launched its firstidentification service in 2002
Shazam has now 400 million users in 200 countries and wasused to identify 15 billion songs
The value of the company is now half a billion dollars
![Page 100: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/100.jpg)
50/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Shazam
The algorithm is noise resistant and efficient. It can identify ashort segment of music out of a database of over a milliontracks. However the algorithm is not meant to be resistant togeneral attacks (e.g time compression, pitch alterations, . . . ).
Shazam was founded in 1999 and launched its firstidentification service in 2002
Shazam has now 400 million users in 200 countries and wasused to identify 15 billion songs
The value of the company is now half a billion dollars
![Page 101: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/101.jpg)
50/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Shazam
The algorithm is noise resistant and efficient. It can identify ashort segment of music out of a database of over a milliontracks. However the algorithm is not meant to be resistant togeneral attacks (e.g time compression, pitch alterations, . . . ).
Shazam was founded in 1999 and launched its firstidentification service in 2002
Shazam has now 400 million users in 200 countries and wasused to identify 15 billion songs
The value of the company is now half a billion dollars
![Page 102: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/102.jpg)
51/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Video fingerprinting
The proposed system is robust and it is able to find a match ina database of tens of millions of fingerprints in a few seconds
It yields a high average true positive rate of 97.6% and a lowaverage false positive rate of 1%.
![Page 103: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/103.jpg)
51/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Video fingerprinting
The proposed system is robust and it is able to find a match ina database of tens of millions of fingerprints in a few seconds
It yields a high average true positive rate of 97.6% and a lowaverage false positive rate of 1%.
![Page 104: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/104.jpg)
52/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Youtube testing
Results of Scott Smitelli tests:
Modification Performed Result
Song reversed Pass
Pitch was lowered 6% Pass
Pitch was lowered 5% Fail
Pitch was raised 5% Fail
Pitch was raised 6% Pass
Time was expanded 6% Pass
Time was expanded 5% Fail
Time was compressed 5% Fail
Time was compressed 6% Pass
44% noise, 56% song Fail
45% noise, 55% song Pass
Modification Performed Result
Volume was reduced 48 dB Fail
Volume was reduced 18 dB Fail
Volume was reduced 6 dB Fail
Volume was increased 6 dB Fail
Volume was increased 18 dB Fail
Volume was increased 48 dB Fail
Slow down 4% Pass
Slow down 3% Fail
Speed up 3% Fail
Speed up 4% Fail
Speed up 5% Pass
![Page 105: Fingerprinting - Audio and Video Detection Systems](https://reader033.fdocuments.in/reader033/viewer/2022050923/6276635e6d135549f251fb9b/html5/thumbnails/105.jpg)
53/53
Introduction Audio Fingerprinting Video Fingerprinting Conclusion
Questions?