Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf


Mozilla

Pyramid Vector Quantization

Mozilla2

What is Pyramid Vector Quantization?

A Vector Quantizer
    That has a simple algebraic structure
    To perform gain-shape quantization

Mozilla3

Motivation

Mozilla4

Why Vector Quantization?

3 classic advantages (Lookabaugh et al. 1989)
– Space filling advantage: VQ codepoints tile space more efficiently
    Example: 2-D squares vs. hexagons
    Maximum possible gain for large dimension: 1.53 dB
– Shape advantage: VQ can use more points where PDF is higher
    1.14 dB gain for 2-D Gaussian, 2.81 dB for high dimension
– Memory advantage: exploit statistical dependence between vector components
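The space-filling numbers can be checked from the normalized second moments of the quantization cells. The sketch below is an illustration rather than part of the talk: it uses the standard values 1/12 for a square (or cube) cell, 5/(36√3) for a regular hexagon, and 1/(2πe) as the limit for a sphere in high dimension. The function and constant names are made up for this example.

```python
import math

def granular_gain_db(normalized_second_moment):
    """Gain in dB relative to a cubic cell (normalized second moment 1/12)."""
    return 10.0 * math.log10((1.0 / 12.0) / normalized_second_moment)

# Normalized second moment of a regular hexagon (2-D hexagonal lattice cell).
G_HEXAGON = 5.0 / (36.0 * math.sqrt(3.0))

# Limit of the normalized second moment of an N-sphere as N grows.
G_SPHERE_LIMIT = 1.0 / (2.0 * math.pi * math.e)

hexagon_gain_db = granular_gain_db(G_HEXAGON)    # squares vs. hexagons in 2-D
max_gain_db = granular_gain_db(G_SPHERE_LIMIT)   # upper bound for large dimension
```

In 2-D the hexagonal tiling gains under 0.2 dB over squares. The limit for large dimension is 10·log10(πe/6), which is where the 1.53 dB figure comes from.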

Mozilla5

Why Vector Quantization?

3 classic advantages (Lookabaugh et al. 1989)
– Space filling advantage: VQ codepoints tile space more efficiently
    Example: 2-D squares vs. hexagons
    Maximum possible gain for large dimension: 1.53 dB
– Shape advantage: VQ can use more points where PDF is higher
    Can be mitigated with entropy coding
– Memory advantage: exploit statistical dependence between vector components
    Transform coefficients are not strongly correlated

Mozilla6

Why Vector Quantization?

Important: Space advantage applies even when values are totally uncorrelated
Another important advantage
– Can have codebooks with less than 1 bit per dimension

Mozilla7

Why Algebraic VQ?

Trained VQ impractical for high rates, large dimensions
– High dimension → large LUTs, lots of memory
    Exponential in bitrate
– No codebook structure → slow search
“Algebraic” VQ solves these problems
– Structured codebook: no LUTs, fast search
    Space-filling lattice for arbitrary dimension unknown: have to approximate
– PVQ: asymptotically optimal for Laplacian sources

Mozilla8

Why Gain-Shape Quantization?

Separate “gain” (energy) from “shape” (spectrum)
– Vector = Magnitude × Unit Vector (point on sphere)
Potential advantages
– Can give each piece different rate allocations
    Preserve energy (contrast) instead of low-passing
    Scalar can only add energy by coding ±1’s
– Implicit activity masking
    Can derive quantization resolution from the explicitly coded energy
– Better representation of coefficients

Mozilla9

How it Works (High-Level)

Mozilla10

Simple Case: PVQ without a Predictor

Scalar quantize gain
Place K unit pulses in N dimensions
– Up to N = 1024 dimensions for large blocks
– Only has N-1 degrees of freedom
Normalize to unit norm
K is derived implicitly from the gain
Can also code K and derive gain
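The unpredicted case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Daala implementation: the names `pvq_codebook_size`, `pvq_search`, and `normalize` are hypothetical, and the greedy pulse-by-pulse search is just one simple way to pick a codeword with K pulses. The codebook size uses the standard recurrence for integer vectors whose absolute values sum to K.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Number of integer vectors in n dimensions with sum(|y_i|) == k."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))

def pvq_search(x, k):
    """Greedy search: place k unit pulses one at a time, each time choosing
    the position that maximizes the squared correlation with x divided by
    the codeword energy. Returns the signed integer pulse vector."""
    mags = [abs(v) for v in x]
    y = [0] * len(x)
    xy = 0.0  # running dot product of mags and y
    yy = 0.0  # running squared norm of y
    for _ in range(k):
        best_i, best_score = 0, -1.0
        for i, m in enumerate(mags):
            # Adding a pulse at i raises xy by m and yy by 2*y[i] + 1.
            score = (xy + m) ** 2 / (yy + 2 * y[i] + 1)
            if score > best_score:
                best_i, best_score = i, score
        y[best_i] += 1
        xy += mags[best_i]
        yy += 2 * y[best_i] - 1
    # Restore the signs of the input.
    return [-p if v < 0 else p for p, v in zip(y, x)]

def normalize(y):
    """Scale the pulse vector to unit norm (the coded 'shape')."""
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]
```

For example, `pvq_search([-3.0, 1.0], 4)` places three pulses on the first coefficient and one on the second. The decoded vector is the quantized gain times `normalize(y)`.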

Mozilla11

Codebook for N=3 and different K

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor

Video provides us with useful predictors
We want to treat vectors in the direction of the prediction as “special”
– They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution: align the codebook axes with the prediction, and treat one dimension differently

Mozilla14

2-D Projection Example

Steps: Input
[Figure: input vector]

Mozilla15

2-D Projection Example

Steps: Input + Prediction
[Figure: input and prediction vectors]

Mozilla16

2-D Projection Example

Steps: Input + Prediction, Compute Householder Reflection
[Figure: input and prediction vectors]

Mozilla17

2-D Projection Example

Steps: Input + Prediction, Compute Householder Reflection, Apply Reflection
[Figure: input and prediction vectors]

Mozilla18

2-D Projection Example

Steps: Input + Prediction, Compute Householder Reflection, Apply Reflection, Compute & code angle
[Figure: input and prediction vectors, with the angle θ between them]

Mozilla19

2-D Projection Example

Steps: Input + Prediction, Compute Householder Reflection, Apply Reflection, Compute & code angle, Code other dimensions
[Figure: input and prediction vectors, with the angle θ between them]

Mozilla20

What does this accomplish?

Creates another “intuitive” parameter, θ
– “How much like the predictor are we?”
– θ = 0 → use predictor exactly
θ determines how many pulses go in the “prediction” direction
– K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
– Can repeat for more predictors
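The reflection step can be illustrated with a short Python sketch. It is an assumption-laden example, not the codec's code: the function name `householder_theta` is hypothetical, the reflection axis is taken to be the largest-magnitude component of the prediction, and both vectors are assumed nonzero. The reflection maps the prediction r onto a single axis, so the angle θ between input and prediction can be read from one coordinate of the reflected input.

```python
import math

def householder_theta(x, r):
    """Reflect x with the Householder reflection that maps r onto axis m.
    Returns (theta, z, m, s): the angle between x and r, the reflected
    input z, the chosen axis m, and the sign s of r[m]."""
    norm_r = math.sqrt(sum(v * v for v in r))
    # Axis with the largest prediction component (an illustrative choice).
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    # Reflection normal v = r + s*|r|*e_m; reflecting r gives -s*|r|*e_m.
    v = list(r)
    v[m] += s * norm_r
    vv = sum(a * a for a in v)
    vx = sum(a * b for a, b in zip(v, x))
    z = [xi - 2.0 * vx / vv * vi for xi, vi in zip(x, v)]
    norm_z = math.sqrt(sum(a * a for a in z))
    # The prediction now lies along -s*e_m, so cos(theta) = -s*z[m]/|z|.
    cos_theta = max(-1.0, min(1.0, -s * z[m] / norm_z))
    return math.acos(cos_theta), z, m, s
```

With θ coded separately, the components of `z` other than `z[m]` form the N-1 dimensional vector that remains to be coded.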

Mozilla21

Details

Mozilla22

Band Structure

DC coded separately with scalar quantization
AC coefficients grouped into bands
– Gain, theta, etc., signaled separately for each band
– Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

[Figure: band layouts for 4x4, 8x8, and 16x16 blocks]

Scan order is possibly over-fit

Mozilla24

To Predict or Not to Predict?

θ > π/2 → Prediction not helping
– Could code large θ’s, but doesn’t seem that useful
– Need to handle zero predictors anyway
Current approach: code a “noref” flag
– Currently jointly code up to 4 flags at once with fixed order-0 probability per band (5% of KF rate)
– Patches in review cut this down a lot
    Force noref=1 when predictor is zero in keyframes
    Separate probabilities for each block size
    Adapt the probabilities

Mozilla25

Quantization Matrix

Simple approach (what we’re doing now)
– Separate quantization resolution for each band
    Keep flat quantization within bands
Advanced approach?
– Scaling after normalization: complicated
    Unit pulses no longer “unit” (how to sum to K?)
    Householder reflection scrambles things further
– Better(?): Pre-scale vector by quantization factors
– Effects on energy preservation?

Mozilla26

Quantization Matrix Example

[Figures: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Mozilla27

Quantization Matrix Example

[Figures: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS

Mozilla28

Activity Masking

Goal: Use better resolution in flat areas
– Low contrast → low energy (gain)
– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
    Currently wrong/incomplete, working on updates

Mozilla29

Activity Masking

Step 1: Compand gain (g)
– Goal: Q ∝ g^(2α) (x264 uses α = 0.173, we start with 1/6)
– Quantize: ĝ = (Q_g ĥ)^β, encode ĥ

β = 1/(1 - 2α)

Qg = (Qβ)β

– Offset steps so at least one value of ĥ gives same gain as the prediction
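A minimal sketch of the companding in Step 1, assuming the quantizer expression reads ĝ = (Q_g ĥ)^β and treating Q_g as a given constant. All function names are hypothetical, and the offset that aligns one step with the prediction gain is left out. The point of the example is that uniform steps in ĥ give a gain step that grows like g^(2α).

```python
def beta_from_alpha(alpha):
    """Companding exponent: beta = 1/(1 - 2*alpha)."""
    return 1.0 / (1.0 - 2.0 * alpha)

def expand_gain(h, q_g, beta):
    """Map a companded gain index h back to a gain: g = (q_g*h)**beta."""
    return (q_g * h) ** beta

def compand_gain(g, q_g, beta):
    """Inverse of expand_gain: h = g**(1/beta) / q_g."""
    return g ** (1.0 / beta) / q_g

def quantize_gain(g, q_g, beta):
    """Round in the companded domain; return (index, reconstructed gain)."""
    h_hat = round(compand_gain(g, q_g, beta))
    return h_hat, expand_gain(h_hat, q_g, beta)

def local_step(h, q_g, beta):
    """Gain difference between neighbouring indices h and h + 1."""
    return expand_gain(h + 1, q_g, beta) - expand_gain(h, q_g, beta)
```

With α = 1/6 this gives β = 1.5. The ratio `local_step(h, q_g, beta) / expand_gain(h, q_g, beta) ** (2 * alpha)` is nearly constant for large h, which is the Q ∝ g^(2α) behaviour the slide asks for.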

Mozilla30

Activity Masking cont’d

Step 2: Choose θ resolution
– Polar coordinates: ĝ √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = dĝ/dĥ
    √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = √(2 - 2cos(θ - ϑ)) ≈ arcdistance(θ, ϑ) ≈ θ - ϑ
– At least for small θ - ϑ
– Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
    Make sure Q_θ evenly divides π/2
    When ĝ is small, force Q_θ = π/2
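One way to turn Step 2 into code is sketched below. The ideal resolution β/ĥ comes from the slide, but how it is snapped to a divisor of π/2 is not stated, so rounding the number of steps to the nearest integer (with a floor of one step) is an assumption made for illustration. The function names are hypothetical.

```python
import math

def theta_resolution(h, beta):
    """Return (steps, q_theta) with q_theta = (pi/2)/steps close to beta/h.
    A non-positive or very small h falls back to a single step of pi/2."""
    if h <= 0:
        return 1, math.pi / 2
    ideal = beta / h
    steps = max(1, round((math.pi / 2) / ideal))
    return steps, (math.pi / 2) / steps

def quantize_theta(theta, q_theta):
    """Return (index, reconstructed angle) for a uniform step q_theta."""
    index = round(theta / q_theta)
    return index, index * q_theta
```

For ĥ = 12 and β = 1.5 the ideal step is 0.125 rad, and the snapped step is (π/2)/13, so θ indices 0 through 13 cover the range from 0 to π/2.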

Mozilla31

Activity Masking cont’d

Step 3: Choose K
– D = (g - ĝ)^2 + g ĝ (D_θ + sin θ sin ϑ D_pvq)
    D_θ = 2 - 2cos(θ - ϑ) = distortion due to θ quantization
    D_pvq = distortion due to PVQ on last N - 1 dimensions
– Distortion due to scalar quantizing gain: (dĝ/dĥ)^2/12
– High-rate distortion due to PVQ: (N - 1)^2/(24K^2)
    Derived experimentally, far too high at low rate
    (N - 2) DOF → should be (N - 2) times gain distortion
– Assume g = ĝ, θ = ϑ, solve for K:
    K = (ĥ/β) sin ϑ (N - 1)/√(2(N - 2)) ≈ (ĥ/β) sin ϑ √(N/2)
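The formula for K follows from setting the PVQ term, ĝ^2 sin^2 ϑ (N - 1)^2/(24K^2), equal to (N - 2) times the gain distortion (dĝ/dĥ)^2/12, then using ĝ/(dĝ/dĥ) = ĥ/β from Step 2. The Python sketch below evaluates the exact and approximate forms. The function names are hypothetical, K is left real-valued because the rounding rule is not given here, and N must be greater than 2.

```python
import math

def pulses_exact(h, beta, theta, n):
    """K = (h/beta) * sin(theta) * (n - 1) / sqrt(2*(n - 2)), for n > 2."""
    return (h / beta) * math.sin(theta) * (n - 1) / math.sqrt(2.0 * (n - 2))

def pulses_approx(h, beta, theta, n):
    """Large-n approximation: K ~ (h/beta) * sin(theta) * sqrt(n/2)."""
    return (h / beta) * math.sin(theta) * math.sqrt(n / 2.0)
```

The two forms differ by the factor (N - 1)/√(N(N - 2)), which is about 1.002 at N = 16 and closer to 1 for larger bands.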

Mozilla32

Loss Robustness

K ≈ (ĥ/β) sin ϑ √(N/2)
– ĥ is offset by the companded reference gain, so can be wrong if there are losses
– But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on ĥ
– sin ϑ ≈ ϑ → ĥ sin ϑ ≈ ĥ ϑ = ĥ Q_θ (ϑ/Q_θ) = (ϑ/Q_θ) β
– (ϑ/Q_θ) is the index encoded in the bitstream
    Since Q_θ not exact, can’t cap ϑ ≤ π/2 in bitstream
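Substituting ĥ sin ϑ ≈ (ϑ/Q_θ) β into K ≈ (ĥ/β) sin ϑ √(N/2) cancels β and leaves K ≈ (ϑ/Q_θ) √(N/2), which depends only on the coded θ index and the band size. The sketch below compares the two forms. The function names are hypothetical, and Q_θ is taken to be exactly β/ĥ, which the slide notes does not hold exactly in practice.

```python
import math

def pulses_from_gain(h, beta, theta, n):
    """Gain-dependent form: K ~ (h/beta) * sin(theta) * sqrt(n/2)."""
    return (h / beta) * math.sin(theta) * math.sqrt(n / 2.0)

def pulses_from_theta_index(theta_index, n):
    """Loss-robust form: K ~ (theta/Q_theta) * sqrt(n/2), where
    theta_index = theta/Q_theta is the value coded in the bitstream."""
    return theta_index * math.sqrt(n / 2.0)
```

For small angles the two agree closely, and a decoder using the second form never needs ĥ, so an error in the reference gain cannot change the number of pulses it reads.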

Mozilla33

Inter-band Masking

ĝ is per-band, but traditional activity masking is per-block
– Could just sum ĝ over all bands
– Actual model is that energy in one band masks energy in another
    Lower bands appear to mask higher, but not other way around
Still very early, not much is tuned
– ρ = (ĝ_h^2/(ĝ_h^2 + η ĝ_l^2))^α
    η controls amount of masking
    ρĥ then used to derive Q_θ and K instead of ĥ
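The masking factor is simple to evaluate. In the sketch below, ĝ_h is read as the gain of the higher band being coded and ĝ_l as the gain of the lower band that masks it. That reading, the function name, and the handling of the all-zero case are assumptions for illustration.

```python
def masking_factor(g_high, g_low, eta, alpha):
    """rho = (g_high^2 / (g_high^2 + eta*g_low^2))^alpha.
    Returns 1.0 (no masking) when both energies are zero."""
    denom = g_high ** 2 + eta * g_low ** 2
    if denom == 0:
        return 1.0
    return (g_high ** 2 / denom) ** alpha
```

With η = 0 the factor is 1 and nothing changes. As the lower band's energy grows, ρ drops below 1, so ρĥ is smaller than ĥ and the derived Q_θ becomes coarser and K smaller.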

Mozilla34

Calibration

Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: Q = Q_0 L^(1/β - 1)
    L is the maximum luma value
– Just an approximation, seems to work okay
– AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues

Better entropy coding
– Everything order-0
– Take advantage of correlation in gain/θ/noref/etc.
Better RDO
– Currently iterating over small range of gains, θs
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation, tuning, etc.

Page 2: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla2

What is Pyramid Vector Quantization

A Vector Quantizer That has a simple algebraic structure To perform gain-shape quantization

Mozilla3

Motivation

Mozilla4

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

114 dB gain for 2-D Gaussian 281 for high dimension

ndash Memory advantage exploit statistical dependence between vector components

Mozilla5

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

Can be mitigated with entropy coding

ndash Memory advantage exploit statistical dependence between vector components

Transform coefficients are not strongly correlated

Mozilla6

Why Vector Quantization Important Space advantage applies even when

values are totally uncorrelated Another important advantage

ndash Can have codebooks with less than 1 bit per dimension

Mozilla7

Why Algebraic VQ Trained VQ impractical for high rates large

dimensionsndash High dimension rarr large LUTs lots of memory

Exponential in bitrate

ndash No codebook structure rarr slow search

ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search

Space-filling lattice for arbitrary dimension unknown have to approximate

ndash PVQ asymptotically optimal for Laplacian sources

Mozilla8

Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)

ndash Vector = Magnitude times Unit Vector (point on sphere)

Potential advantagesndash Can give each piece different rate allocations

Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos

ndash Implicit activity masking Can derive quantization resolution from the explicitly

coded energy

ndash Better representation of coefficients

Mozilla9

How it Works (High-Level)

Mozilla10

Simple Case PVQ without a Predictor

Scalar quantize gain Place K unit pulses in N dimensions

ndash Up to N = 1024 dimensions for large blocks

ndash Only has N-1 degrees of freedom

Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain

Mozilla11

Codebook for N=3 anddifferent K

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the

prediction as ldquospecialrdquondash They are much more likely

Subtracting and coding the residual would lose energy preservation

Solution align the codebook axes with the prediction and treat one dimension differently

Mozilla14

2-D Projection Example

Input

Input

Mozilla15

2-D Projection Example

Prediction

Input

Input + Prediction

Mozilla16

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection

Mozilla17

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection

Mozilla18

2-D Projection Example

θ

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle

Mozilla19

2-D Projection Example Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle Code other

dimensions

Prediction

Input

θ

Mozilla20

What does this accomplish Creates another ldquointuitiverdquo parameter θ

ndash ldquoHow much like the predictor are werdquo

ndash θ = 0 rarr use predictor exactly

θ determines how many pulses go in the ldquopredictionrdquo direction

ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down

Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)

ndash Can repeat for more predictors

Mozilla21

Details

Mozilla22

Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands

ndash Gain theta etc signaled separately for each band

ndash Layout ad-hoc for now

Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

Scan order is possibly over-fit

4x48x8

16x16

Mozilla24

To Predict or Not to Predict θ gt π2 rarr Prediction not helping

ndash Could code large θrsquos but doesnrsquot seem that useful

ndash Need to handle zero predictors anyway

Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with

fixed order-0 probability per band (5 of KF rate)

ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities

Mozilla25

Quantization Matrix Simple approach (what wersquore doing now)

ndash Separate quantization resolution for each band Keep flat quantization within bands

Advanced approachndash Scaling after normalization complicated

Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further

ndash Better() Pre-scale vector by quantization factors

ndash Effects on energy preservation

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 3: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla3

Motivation

Mozilla4

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

114 dB gain for 2-D Gaussian 281 for high dimension

ndash Memory advantage exploit statistical dependence between vector components

Mozilla5

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

Can be mitigated with entropy coding

ndash Memory advantage exploit statistical dependence between vector components

Transform coefficients are not strongly correlated

Mozilla6

Why Vector Quantization Important Space advantage applies even when

values are totally uncorrelated Another important advantage

ndash Can have codebooks with less than 1 bit per dimension

Mozilla7

Why Algebraic VQ Trained VQ impractical for high rates large

dimensionsndash High dimension rarr large LUTs lots of memory

Exponential in bitrate

ndash No codebook structure rarr slow search

ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search

Space-filling lattice for arbitrary dimension unknown have to approximate

ndash PVQ asymptotically optimal for Laplacian sources

Mozilla8

Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)

ndash Vector = Magnitude times Unit Vector (point on sphere)

Potential advantagesndash Can give each piece different rate allocations

Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos

ndash Implicit activity masking Can derive quantization resolution from the explicitly

coded energy

ndash Better representation of coefficients

Mozilla9

How it Works (High-Level)

Mozilla10

Simple Case PVQ without a Predictor

Scalar quantize gain Place K unit pulses in N dimensions

ndash Up to N = 1024 dimensions for large blocks

ndash Only has N-1 degrees of freedom

Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain

Mozilla11

Codebook for N=3 anddifferent K

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the

prediction as ldquospecialrdquondash They are much more likely

Subtracting and coding the residual would lose energy preservation

Solution align the codebook axes with the prediction and treat one dimension differently

Mozilla14

2-D Projection Example

Input

Input

Mozilla15

2-D Projection Example

Prediction

Input

Input + Prediction

Mozilla16

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection

Mozilla17

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection

Mozilla18

2-D Projection Example

θ

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle

Mozilla19

2-D Projection Example Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle Code other

dimensions

Prediction

Input

θ

Mozilla20

What does this accomplish Creates another ldquointuitiverdquo parameter θ

ndash ldquoHow much like the predictor are werdquo

ndash θ = 0 rarr use predictor exactly

θ determines how many pulses go in the ldquopredictionrdquo direction

ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down

Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)

ndash Can repeat for more predictors

Mozilla21

Details

Mozilla22

Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands

ndash Gain theta etc signaled separately for each band

ndash Layout ad-hoc for now

Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

Scan order is possibly over-fit

4x48x8

16x16

Mozilla24

To Predict or Not to Predict θ gt π2 rarr Prediction not helping

ndash Could code large θrsquos but doesnrsquot seem that useful

ndash Need to handle zero predictors anyway

Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with

fixed order-0 probability per band (5 of KF rate)

ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities

Mozilla25

Quantization Matrix Simple approach (what wersquore doing now)

ndash Separate quantization resolution for each band Keep flat quantization within bands

Advanced approachndash Scaling after normalization complicated

Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further

ndash Better() Pre-scale vector by quantization factors

ndash Effects on energy preservation

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 4: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla4

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

114 dB gain for 2-D Gaussian 281 for high dimension

ndash Memory advantage exploit statistical dependence between vector components

Mozilla5

Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)

ndash Space filling advantage VQ codepoints tile space more efficiently

Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB

ndash Shape advantage VQ can use more points where PDF is higher

Can be mitigated with entropy coding

ndash Memory advantage exploit statistical dependence between vector components

Transform coefficients are not strongly correlated

Mozilla6

Why Vector Quantization Important Space advantage applies even when

values are totally uncorrelated Another important advantage

ndash Can have codebooks with less than 1 bit per dimension

Mozilla7

Why Algebraic VQ Trained VQ impractical for high rates large

dimensionsndash High dimension rarr large LUTs lots of memory

Exponential in bitrate

ndash No codebook structure rarr slow search

ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search

Space-filling lattice for arbitrary dimension unknown have to approximate

ndash PVQ asymptotically optimal for Laplacian sources

Mozilla8

Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)

ndash Vector = Magnitude times Unit Vector (point on sphere)

Potential advantagesndash Can give each piece different rate allocations

Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos

ndash Implicit activity masking Can derive quantization resolution from the explicitly

coded energy

ndash Better representation of coefficients

Mozilla9

How it Works (High-Level)

Mozilla10

Simple Case PVQ without a Predictor

Scalar quantize gain Place K unit pulses in N dimensions

ndash Up to N = 1024 dimensions for large blocks

ndash Only has N-1 degrees of freedom

Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain

Mozilla11

Codebook for N=3 anddifferent K

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the

prediction as ldquospecialrdquondash They are much more likely

Subtracting and coding the residual would lose energy preservation

Solution align the codebook axes with the prediction and treat one dimension differently

Mozilla14

2-D Projection Example

Input

Input

Mozilla15

2-D Projection Example

Prediction

Input

Input + Prediction

Mozilla16

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection

Mozilla17

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection

Mozilla18

2-D Projection Example

θ

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle

Mozilla19

2-D Projection Example Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle Code other

dimensions

Prediction

Input

θ

Mozilla20

What does this accomplish Creates another ldquointuitiverdquo parameter θ

ndash ldquoHow much like the predictor are werdquo

ndash θ = 0 rarr use predictor exactly

θ determines how many pulses go in the ldquopredictionrdquo direction

ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down

Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)

ndash Can repeat for more predictors

Mozilla21

Details

Mozilla22

Band Structure

DC coded separately with scalar quantization

AC coefficients grouped into bands

– Gain, theta, etc. signaled separately for each band

– Layout ad-hoc for now

Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

Scan order is possibly over-fit

[Figure panels: 4x4, 8x8, 16x16]

Mozilla24

To Predict or Not to Predict

θ > π/2 → Prediction not helping

– Could code large θ’s, but doesn’t seem that useful

– Need to handle zero predictors anyway

Current approach: code a “noref” flag

– Currently jointly code up to 4 flags at once with fixed order-0 probability per band (5% of KF rate)

– Patches in review cut this down a lot

  Force noref=1 when predictor is zero in keyframes

  Separate probabilities for each block size

  Adapt the probabilities
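
The decision rule on this slide can be illustrated with a small helper. This is a sketch under assumptions, not the encoder's actual logic: `use_noref` is a hypothetical name, and a real encoder would more likely choose the flag by rate-distortion cost than by the angle test alone.

```python
def use_noref(x, r):
    """Return True when the prediction r should be ignored for input x."""
    if not any(r):
        return True  # zero predictor: there is no direction to align with
    dot = sum(a * b for a, b in zip(x, r))
    # A negative dot product means the angle theta exceeds pi/2.
    return dot < 0.0

flags = [
    use_noref([1.0, 2.0], [0.0, 0.0]),   # zero predictor
    use_noref([1.0, 0.0], [1.0, 1.0]),   # theta = pi/4
    use_noref([1.0, 0.0], [-1.0, 0.5]),  # theta > pi/2
]
```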

Mozilla25

Quantization Matrix

Simple approach (what we’re doing now)

– Separate quantization resolution for each band

  Keep flat quantization within bands

Advanced approach

– Scaling after normalization complicated

  Unit pulses no longer “unit” (how to sum to K?)

  Householder reflection scrambles things further

– Better(?): Pre-scale vector by quantization factors

– Effects on energy preservation

Mozilla26

Quantization Matrix Example

[Figure panels: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Mozilla27

Quantization Matrix Example

[Figure panels: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS

Mozilla28

Activity Masking

Goal: Use better resolution in flat areas

– Low contrast → low energy (gain)

– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx

  Currently wrong/incomplete, working on updates

Mozilla29

Activity Masking

Step 1: Compand gain (g)

– Goal: Q ∝ g^(2α) (x264 uses α = 0.173, we start with 1/6)

– Quantize ĝ = (Q_g ĥ)^β, encode ĥ

β = 1/(1 - 2α)

Qg = (Qβ)β

– Offset steps so at least one value of ĥ gives same gain as the prediction
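
The companding step can be sketched as below, reading the slide's expression as ĝ = (Q_g ĥ)^β. The function names are hypothetical, `q_g` is taken as a given input rather than derived from Q, and the offset that aligns one step with the predicted gain is deliberately left out.

```python
def beta_from_alpha(alpha):
    """Companding exponent: beta = 1 / (1 - 2 * alpha)."""
    return 1.0 / (1.0 - 2.0 * alpha)

def expand_gain(h, q_g, beta):
    """Map a companded index h back to a linear gain: g = (q_g * h) ** beta."""
    return (q_g * h) ** beta

def compand_gain(g, q_g, beta):
    """Inverse of expand_gain: h = g ** (1 / beta) / q_g."""
    return g ** (1.0 / beta) / q_g

def quantize_gain(g, q_g, beta):
    """Round in the companded domain; return (index, reconstructed gain)."""
    h = max(0, round(compand_gain(g, q_g, beta)))
    return h, expand_gain(h, q_g, beta)

beta = beta_from_alpha(1.0 / 6.0)  # alpha = 1/6 gives beta = 1.5
steps = [expand_gain(h + 1, 1.0, beta) - expand_gain(h, 1.0, beta)
         for h in range(1, 6)]     # spacing between neighbouring gains
index, g_hat = quantize_gain(8.3, 1.0, beta)
```

The spacing in `steps` grows with the index, so high-energy (high-contrast) bands get a coarser gain resolution than flat ones.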

Mozilla30

Activity Masking cont’d

Step 2: Choose θ resolution

– Polar coordinates: ĝ √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = dĝ/dĥ

  √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = √(2 - 2cos(θ - ϑ)) ≈ arcdistance(θ, ϑ) ≈ θ - ϑ

– At least for small θ - ϑ

– Q_θ = (dĝ/dĥ)/ĝ = β/ĥ

  Make sure Q_θ evenly divides π/2

  When ĝ is small, force Q_θ = π/2
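
One way to turn Q_θ = β/ĥ into a usable step size is sketched below. The names are hypothetical, and rounding the number of steps to the nearest integer is an assumption about how "evenly divides π/2" is enforced.

```python
import math

def theta_resolution(h, beta):
    """Return (steps, q_theta) with q_theta = (pi/2) / steps close to beta / h."""
    if h <= 0:
        return 1, math.pi / 2  # no gain resolution: one coarse step
    target = beta / h
    # At least one step, which also forces q_theta = pi/2 for small gains.
    steps = max(1, round((math.pi / 2) / target))
    return steps, (math.pi / 2) / steps

def quantize_theta(theta, q_theta):
    """Index of the nearest quantized angle."""
    return round(theta / q_theta)

steps, q_theta = theta_resolution(6, 1.5)     # target step 0.25 rad
coarse_steps, coarse_q = theta_resolution(1, 1.5)
idx = quantize_theta(0.5, q_theta)
```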

Mozilla31

Activity Masking cont’d

Step 3: Choose K

– D = (g - ĝ)^2 + g ĝ (D_θ + sin θ sin ϑ D_pvq)

  D_θ = 2 - 2cos(θ - ϑ) = distortion due to θ quant.

  D_pvq = distortion due to PVQ on last N - 1 dimensions

– Distortion due to scalar quantizing gain: (dĝ/dĥ)^2/12

– High-rate distortion due to PVQ: (N - 1)^2/(24K^2)

  Derived experimentally, far too high at low rate

  (N - 2) DOF → should be (N - 2) times gain distortion

– Assume g = ĝ, θ = ϑ, solve for K

  K = (ĥ/β) sin ϑ (N - 1)/√(2(N - 2)) ≈ (ĥ/β) sin ϑ √(N/2)
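
The closed form for K follows from setting the PVQ term ĝ^2 sin^2 ϑ (N - 1)^2/(24K^2) equal to (N - 2) times the gain distortion (dĝ/dĥ)^2/12, with dĝ/dĥ = βĝ/ĥ. A short numeric sketch, with hypothetical function names and K left unrounded, compares the exact expression against the √(N/2) approximation:

```python
import math

def pvq_pulses_exact(h, beta, theta_q, n):
    """K = (h / beta) * sin(theta_q) * (n - 1) / sqrt(2 * (n - 2)), for n > 2."""
    if n <= 2:
        raise ValueError("formula needs n > 2")
    return (h / beta) * math.sin(theta_q) * (n - 1) / math.sqrt(2.0 * (n - 2))

def pvq_pulses_approx(h, beta, theta_q, n):
    """Large-n approximation: K ~ (h / beta) * sin(theta_q) * sqrt(n / 2)."""
    return (h / beta) * math.sin(theta_q) * math.sqrt(n / 2.0)

k_exact = pvq_pulses_exact(6, 1.5, math.pi / 2, 16)
k_approx = pvq_pulses_approx(6, 1.5, math.pi / 2, 16)
```

For a 16-dimensional band the two values differ by well under one pulse, which is why the simpler form is used on the following slide.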

Mozilla32

Loss Robustness

K ≈ (ĥ/β) sin ϑ √(N/2)

– ĥ is offset by the companded reference gain, so can be wrong if there are losses

– But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream

Remove dependence on ĥ

– sin ϑ ≈ ϑ → ĥ sin ϑ ≈ ĥ ϑ = ĥ Q_θ (ϑ/Q_θ) = (ϑ/Q_θ) β

– (ϑ/Q_θ) is the index encoded in the bitstream

  Since Q_θ not exact, can’t cap ϑ ≤ π/2 in bitstream
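
Substituting ĥ sin ϑ ≈ (ϑ/Q_θ) β into K ≈ (ĥ/β) sin ϑ √(N/2) cancels β and leaves K ≈ (ϑ/Q_θ) √(N/2), which depends only on the coded angle index. The sketch below checks that numerically; the function names are hypothetical, Q_θ is taken as exactly β/ĥ (ignoring the adjustment that makes it divide π/2), and K is left unrounded.

```python
import math

def pulses_from_gain(h, beta, theta_q, n):
    """Gain-dependent form: K ~ (h / beta) * sin(theta_q) * sqrt(n / 2)."""
    return (h / beta) * math.sin(theta_q) * math.sqrt(n / 2.0)

def pulses_from_theta_index(theta_index, n):
    """Loss-robust form: K ~ theta_index * sqrt(n / 2), with no gain term."""
    return theta_index * math.sqrt(n / 2.0)

h, beta, n = 6, 1.5, 16
q_theta = beta / h   # 0.25 rad per step
theta_index = 2      # coded angle is 0.5 rad
k_gain = pulses_from_gain(h, beta, theta_index * q_theta, n)
k_index = pulses_from_theta_index(theta_index, n)
```

The two estimates agree to within the small-angle error of sin ϑ ≈ ϑ, and the index-based one stays correct even if the reference gain was lost.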

Mozilla33

Inter-band Masking

ĝ is per-band, but traditional activity masking is per-block

– Could just sum ĝ over all bands

– Actual model is that energy in one band masks energy in another

  Lower bands appear to mask higher, but not other way around

Still very early, not much is tuned

– ρ = (ĝ_h^2/(ĝ_h^2 + η ĝ_l^2))^α

  η controls amount of masking

  ρĥ then used to derive Q_θ and K instead of ĥ
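
The masking factor can be written as a small helper. This is a sketch with hypothetical names; reading the subscripts h and l as the higher band being coded and the lower band doing the masking is an assumption based on the bullet about lower bands masking higher ones.

```python
def interband_masking_factor(g_high, g_low, eta, alpha):
    """rho = (g_high^2 / (g_high^2 + eta * g_low^2)) ** alpha."""
    e_high = g_high * g_high
    denom = e_high + eta * g_low * g_low
    if denom == 0.0:
        return 1.0  # no energy anywhere: leave the resolution unchanged
    return (e_high / denom) ** alpha

no_masking = interband_masking_factor(4.0, 0.0, 1.0, 1.0 / 6.0)
equal_bands = interband_masking_factor(1.0, 1.0, 1.0, 0.5)
strong_low = interband_masking_factor(1.0, 10.0, 1.0, 1.0 / 6.0)
```

Since ρ never exceeds 1, scaling ĥ by ρ can only coarsen Q_θ and lower K, so a strong lower band reduces the bits spent on the band above it.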

Mozilla34

Calibration

Activity masking always increases rate

– Scale base quantizer in each band to reduce rate: Q = Q_0 L^(1/β - 1)

  L is the maximum luma value

– Just an approximation, seems to work okay

– AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues

Better entropy coding

– Everything order-0

– Take advantage of correlation in gain/θ/noref/etc.

Better RDO

– Currently iterating over small range of gains, θ’s

– Rate estimates very approximate

Reducing overhead of loss-robust case

Noise injection/folding

Bit-exact implementation, tuning, etc.

Page 8: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla8

Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)

ndash Vector = Magnitude times Unit Vector (point on sphere)

Potential advantagesndash Can give each piece different rate allocations

Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos

ndash Implicit activity masking Can derive quantization resolution from the explicitly

coded energy

ndash Better representation of coefficients

Mozilla9

How it Works (High-Level)

Mozilla10

Simple Case PVQ without a Predictor

Scalar quantize gain Place K unit pulses in N dimensions

ndash Up to N = 1024 dimensions for large blocks

ndash Only has N-1 degrees of freedom

Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain

Mozilla11

Codebook for N=3 anddifferent K

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the

prediction as ldquospecialrdquondash They are much more likely

Subtracting and coding the residual would lose energy preservation

Solution align the codebook axes with the prediction and treat one dimension differently

Mozilla14

2-D Projection Example

Input

Input

Mozilla15

2-D Projection Example

Prediction

Input

Input + Prediction

Mozilla16

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection

Mozilla17

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection

Mozilla18

2-D Projection Example

θ

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle

Mozilla19

2-D Projection Example Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle Code other

dimensions

Prediction

Input

θ

Mozilla20

What does this accomplish?

Creates another “intuitive” parameter, θ
– “How much like the predictor are we?”
– θ = 0 → use predictor exactly
θ determines how many pulses go in the “prediction” direction
– K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
– Can repeat for more predictors
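On the decoding side, the gain, the angle, and the pulse vector for the remaining N-1 dimensions are enough to rebuild the block. The sketch below assumes the same (hypothetical) Householder construction as an encoder would use, with the prediction aligned to the axis of its largest component; `pvq_reconstruct` is an illustrative name, and the pulse vector `y` is assumed to be non-zero.

```python
import numpy as np

def pvq_reconstruct(gain, theta, y, r):
    """Rebuild a vector from gain, angle, pulses y (N-1 dims), and prediction r."""
    r = np.asarray(r, dtype=float)
    y = np.asarray(y, dtype=float)
    m = int(np.argmax(np.abs(r)))     # same axis choice as the encoder (assumed)
    s = 1.0 if r[m] >= 0 else -1.0
    v = r / np.linalg.norm(r)
    v[m] += s
    u = y / np.linalg.norm(y)         # unit-norm shape of the other dimensions
    # cos(theta) of the energy goes along the prediction, sin(theta) elsewhere.
    z = np.insert(gain * np.sin(theta) * u, m, -s * gain * np.cos(theta))
    # A Householder reflection is its own inverse.
    return z - 2.0 * v * (v @ z) / (v @ v)
```

With θ = 0 the output is the prediction rescaled to the decoded gain, and for any θ the output norm equals the gain, so energy is preserved regardless of how coarsely the shape is coded.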

Mozilla21

Details

Mozilla22

Band Structure

DC coded separately with scalar quantization
AC coefficients grouped into bands
– Gain, theta, etc., signaled separately for each band
– Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

Scan order is possibly over-fit

[Figure panels: 4x4, 8x8, 16x16]

Mozilla24

To Predict or Not to Predict

θ > π/2 → Prediction not helping
– Could code large θ’s, but doesn’t seem that useful
– Need to handle zero predictors anyway
Current approach: code a “noref” flag
– Currently jointly code up to 4 flags at once, with fixed order-0 probability per band (5% of KF rate)
– Patches in review cut this down a lot
  • Force noref=1 when predictor is zero in keyframes
  • Separate probabilities for each block size
  • Adapt the probabilities

Mozilla25

Quantization Matrix

Simple approach (what we’re doing now)
– Separate quantization resolution for each band
  • Keep flat quantization within bands
Advanced approach
– Scaling after normalization complicated
  • Unit pulses no longer “unit” (how to sum to K?)
  • Householder reflection scrambles things further
– Better(?): Pre-scale vector by quantization factors
– Effects on energy preservation

Mozilla26

Quantization Matrix Example

[Figure panels: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Mozilla27

Quantization Matrix Example

[Figure panels: Flat Quantizer (base Q=35); Adjusted Per-Band (base Q=23)]

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking

Goal: Use better resolution in flat areas
– Low contrast → low energy (gain)
– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
  • Currently wrong/incomplete, working on updates

Mozilla29

Activity Masking

Step 1: Compand gain ($g$)
– Goal: $Q \propto g^{2\alpha}$ (x264 uses $\alpha = 0.173$, we start with $1/6$)
– Quantize $\hat{g} = (Q_g \hat{h})^{\beta}$, encode $\hat{h}$
  • $\beta = 1/(1 - 2\alpha)$
  • $Q_g = (Q/\beta)^{\beta}$
– Offset steps so at least one value of $\hat{h}$ gives same gain as the prediction
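A minimal sketch of the companded gain quantizer, assuming α = 1/6 (so β = 1.5) and taking the companded step size as a plain input. The function names and the parameter `q_g` are hypothetical, and the offset that lines the steps up with the predicted gain is left out.

```python
ALPHA = 1.0 / 6.0
BETA = 1.0 / (1.0 - 2.0 * ALPHA)   # companding exponent, 1.5 for alpha = 1/6

def quantize_gain(g, q_g):
    """Return the integer index h_hat for gain g with companded step q_g."""
    # Invert g_hat = (q_g * h_hat) ** BETA, then round to the nearest index.
    return int(round(g ** (1.0 / BETA) / q_g))

def dequantize_gain(h_hat, q_g):
    """Reconstruct the gain from its index: g_hat = (q_g * h_hat) ** BETA."""
    return (q_g * h_hat) ** BETA
```

Because the index is uniform in the companded domain, the spacing between reconstructed gains grows with the gain itself: larger (busier) gains are quantized more coarsely, which is the intended masking behaviour.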

Mozilla30

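Activity Masking (cont'd)

Step 2: Choose θ resolution
– Polar coordinates: $\hat{g}\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat{g}/d\hat{h}$
  • $\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \mathrm{arcdistance}(\theta, \vartheta) \approx \theta - \vartheta$
  • At least for small $\theta - \vartheta$
– $Q_\theta = (d\hat{g}/d\hat{h})/\hat{g} = \beta/\hat{h}$
  • Make sure $Q_\theta$ evenly divides $\pi/2$
  • When $\hat{g}$ is small, force $Q_\theta = \pi/2$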
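The angle resolution rule can be sketched as below. The snapping rule (round to the nearest whole number of steps in [0, π/2]) and the function names are assumptions made for illustration; the slides only state that the step should evenly divide π/2 and fall back to π/2 for small gains.

```python
import math

def theta_resolution(h_hat, beta=1.5):
    """Return (number of steps in [0, pi/2], step size) for gain index h_hat."""
    if h_hat <= 0:
        return 1, math.pi / 2          # small gain: a single step of pi/2
    target = beta / h_hat              # Q_theta = beta / h_hat
    # Snap so that the step evenly divides pi/2 (nearest step count, assumed).
    steps = max(1, round((math.pi / 2) / target))
    return steps, (math.pi / 2) / steps

def quantize_theta(theta, h_hat, beta=1.5):
    """Return (index, reconstructed angle) for an angle in [0, pi/2]."""
    steps, q_theta = theta_resolution(h_hat, beta)
    index = min(steps, max(0, round(theta / q_theta)))
    return index, index * q_theta
```

With a gain index of 6 and β = 1.5 the target step is 0.25 rad, which snaps to 6 steps of π/12; an angle of 0.5 rad then quantizes to index 2, that is π/6.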

Mozilla31

Activity Masking (cont'd)

Step 3: Choose K
– $D = (g - \hat{g})^2 + g\hat{g}\,(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq})$
  • $D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quant
  • $D_{pvq}$ = distortion due to PVQ on last N – 1 dimensions
– Distortion due to scalar quantizing gain: $(d\hat{g}/d\hat{h})^2/12$
– High-rate distortion due to PVQ: $(N - 1)^2/(24K^2)$
  • Derived experimentally, far too high at low rate
  • (N – 2) DOF → should be (N – 2) times gain distortion
– Assume $g = \hat{g}$, $\theta = \vartheta$, solve for K:
  • $K = (\hat{h}/\beta)\,\sin\vartheta\,(N - 1)/\sqrt{2(N - 2)} \approx (\hat{h}/\beta)\,\sin\vartheta\,\sqrt{N/2}$
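The closed form for K follows from setting the PVQ term of D equal to (N – 2) times the gain distortion. A sketch of the algebra, under the slide's assumptions g = ĝ and θ = ϑ, and using the Step 2 relation that dĝ/dĥ divided by ĝ equals β/ĥ:

```latex
\begin{align*}
\hat{g}^2 \sin^2\vartheta \,\frac{(N-1)^2}{24K^2}
  &= (N-2)\,\frac{(d\hat{g}/d\hat{h})^2}{12} \\
K^2 &= \frac{\hat{g}^2 \sin^2\vartheta\,(N-1)^2}{2\,(N-2)\,(d\hat{g}/d\hat{h})^2} \\
K &= \frac{\hat{g}}{d\hat{g}/d\hat{h}}\,\sin\vartheta\,\frac{N-1}{\sqrt{2(N-2)}}
   = \frac{\hat{h}}{\beta}\,\sin\vartheta\,\frac{N-1}{\sqrt{2(N-2)}}
   \approx \frac{\hat{h}}{\beta}\,\sin\vartheta\,\sqrt{\frac{N}{2}}
\end{align*}
```

The last approximation replaces (N – 1)/√(N – 2) with √N, which is accurate for the large N typical of these bands.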

Mozilla32

Loss Robustness

$K \approx (\hat{h}/\beta)\,\sin\vartheta\,\sqrt{N/2}$
– $\hat{h}$ is offset by the companded reference gain, so can be wrong if there are losses
– But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on $\hat{h}$
– $\sin\vartheta \approx \vartheta \rightarrow \hat{h}\sin\vartheta \approx \hat{h}\vartheta = \hat{h}\,Q_\theta\,(\vartheta/Q_\theta) = (\vartheta/Q_\theta)\,\beta$
– $(\vartheta/Q_\theta)$ is the index encoded in the bitstream
  • Since $Q_\theta$ not exact, can’t cap $\vartheta \le \pi/2$ in bitstream
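The two ways of deriving K can be compared side by side. In this sketch the function names are hypothetical and rounding to the nearest integer is an assumption; the point is only that the second form needs nothing but the θ index that is already in the bitstream.

```python
import math

def pulses_with_gain(h_hat, theta_hat, n, beta=1.5):
    """K ~= (h_hat / beta) * sin(theta_hat) * sqrt(N / 2); depends on h_hat."""
    return round((h_hat / beta) * math.sin(theta_hat) * math.sqrt(n / 2))

def pulses_from_theta_index(theta_index, n):
    """K ~= (theta_hat / Q_theta) * sqrt(N / 2); uses only the coded index."""
    return round(theta_index * math.sqrt(n / 2))
```

For N = 32, a gain index of 6 with a quantized angle of 0.5 rad gives K = 8 from the first form, and a θ index of 2 gives K = 8 from the second. The second form stays decodable even if the reference gain was corrupted by a loss.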

Mozilla33

Inter-band Masking

$\hat{g}$ is per-band, but traditional activity masking is per-block
– Could just sum $\hat{g}$ over all bands
– Actual model is that energy in one band masks energy in another
  • Lower bands appear to mask higher, but not other way around
Still very early, not much is tuned
– $\rho = \left(\hat{g}_h^2 / (\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^{\alpha}$
  • η controls amount of masking
  • $\rho\hat{h}$ then used to derive $Q_\theta$ and K instead of $\hat{h}$
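The masking factor is a one-line computation. In the sketch below the function name is hypothetical, and reading the subscripts h and l as the higher (masked) band and the lower (masking) band is an assumption based on the bullet about lower bands masking higher ones. At least one of the inputs is assumed to be non-zero.

```python
def masking_factor(g_high, g_low, eta, alpha=1.0 / 6.0):
    """rho = (g_high^2 / (g_high^2 + eta * g_low^2)) ** alpha."""
    energy_high = g_high * g_high
    return (energy_high / (energy_high + eta * g_low * g_low)) ** alpha
```

With no energy in the lower band, or with η = 0, the factor is exactly 1 and the band is coded as if there were no inter-band masking; more energy in the lower band pushes the factor below 1, which shrinks the effective gain index used to derive the θ resolution and K.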

Mozilla34

Calibration

Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: $Q = Q_0 L^{(1/\beta - 1)}$
  • L is the maximum luma value
– Just an approximation, seems to work okay
– AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues

Better entropy coding
– Everything order-0
– Take advantage of correlation in gain/θ/noref/etc.
Better RDO
– Currently iterating over small range of gains, θs
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation, tuning, etc.

Page 12: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla12

PVQ vs Scalar Quantization

Mozilla13

PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the

prediction as ldquospecialrdquondash They are much more likely

Subtracting and coding the residual would lose energy preservation

Solution align the codebook axes with the prediction and treat one dimension differently

Mozilla14

2-D Projection Example

Input

Input

Mozilla15

2-D Projection Example

Prediction

Input

Input + Prediction

Mozilla16

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection

Mozilla17

2-D Projection Example

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection

Mozilla18

2-D Projection Example

θ

Prediction

Input

Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle

Mozilla19

2-D Projection Example Input + Prediction Compute Householder

Reflection Apply Reflection Compute amp

code angle Code other

dimensions

Prediction

Input

θ

Mozilla20

What does this accomplish Creates another ldquointuitiverdquo parameter θ

ndash ldquoHow much like the predictor are werdquo

ndash θ = 0 rarr use predictor exactly

θ determines how many pulses go in the ldquopredictionrdquo direction

ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down

Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)

ndash Can repeat for more predictors

Mozilla21

Details

Mozilla22

Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands

ndash Gain theta etc signaled separately for each band

ndash Layout ad-hoc for now

Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure

Scan order is possibly over-fit

4x48x8

16x16

Mozilla24

To Predict or Not to Predict θ gt π2 rarr Prediction not helping

ndash Could code large θrsquos but doesnrsquot seem that useful

ndash Need to handle zero predictors anyway

Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with

fixed order-0 probability per band (5 of KF rate)

ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities

Mozilla25

Quantization Matrix Simple approach (what wersquore doing now)

ndash Separate quantization resolution for each band Keep flat quantization within bands

Advanced approachndash Scaling after normalization complicated

Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further

ndash Better() Pre-scale vector by quantization factors

ndash Effects on energy preservation

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc


Mozilla13

PVQ with a Predictor

Video provides us with useful predictors
We want to treat vectors in the direction of the prediction as “special”
– They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution: align the codebook axes with the prediction, and treat one dimension differently

Mozilla14

2-D Projection Example (one figure built up over slides 14–19; labels: Input, Prediction, θ)

– Input
– Input + Prediction
– Compute Householder Reflection
– Apply Reflection
– Compute & code angle
– Code other dimensions
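The projection steps on these slides can be sketched in Python. This is a minimal illustration, not Daala's implementation: the function names are hypothetical, the reflection axis is assumed to be the largest-magnitude component of the prediction, and the prediction is assumed to be nonzero.

```python
import math

def householder_reflect(x, r):
    """Reflect x with the Householder reflection that maps the prediction r
    onto a single coordinate axis. Returns (z, m, s): the reflected vector,
    the chosen axis, and the sign used to build the reflection."""
    m = max(range(len(r)), key=lambda i: abs(r[i]))  # assumed axis choice
    s = 1.0 if r[m] >= 0 else -1.0
    norm_r = math.sqrt(sum(ri * ri for ri in r))
    v = list(r)
    v[m] += s * norm_r                    # v = r + s*||r||*e_m
    vv = sum(vi * vi for vi in v)
    vx = sum(vi * xi for vi, xi in zip(v, x))
    z = [xi - 2.0 * vx / vv * vi for xi, vi in zip(x, v)]
    return z, m, s

def prediction_angle(x, r):
    """Return (theta, rest): the angle between x and r, and the remaining
    N-1 reflected coordinates that would still have to be coded."""
    z, m, s = householder_reflect(x, r)
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    # The reflection sends r to -s*||r||*e_m, so cos(theta) = -s*z[m]/||z||.
    cos_theta = max(-1.0, min(1.0, -s * z[m] / norm_z))
    rest = [zi for i, zi in enumerate(z) if i != m]
    return math.acos(cos_theta), rest
```

When the input equals the prediction the angle is 0 and the remaining coordinates vanish; when the input is orthogonal to the prediction the angle is π/2. The reflection is orthogonal, so the norm of the input is unchanged.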

Mozilla20

What does this accomplish?

Creates another “intuitive” parameter, θ
– “How much like the predictor are we?”
– θ = 0 → use predictor exactly
θ determines how many pulses go in the “prediction” direction
– K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
– Can repeat for more predictors

Mozilla21

Details

Mozilla22

Band Structure

DC coded separately with scalar quantization
AC coefficients grouped into bands
– Gain, theta, etc. signaled separately for each band
– Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure (figure: band layouts for 4x4, 8x8, and 16x16 blocks)

Scan order is possibly over-fit

Mozilla24

To Predict or Not to Predict

θ > π/2 → Prediction not helping
– Could code large θ’s, but doesn’t seem that useful
– Need to handle zero predictors anyway
Current approach: code a “noref” flag
– Currently jointly code up to 4 flags at once, with fixed order-0 probability per band (5% of KF rate)
– Patches in review cut this down a lot
  – Force noref=1 when predictor is zero in keyframes
  – Separate probabilities for each block size
  – Adapt the probabilities

Mozilla25

Quantization Matrix

Simple approach (what we’re doing now)
– Separate quantization resolution for each band
  – Keep flat quantization within bands
Advanced approach
– Scaling after normalization: complicated
  – Unit pulses no longer “unit” (how to sum to K?)
  – Householder reflection scrambles things further
– Better(?): Pre-scale vector by quantization factors
– Effects on energy preservation

Mozilla26

Quantization Matrix Example (image comparison, shown on slides 26–27)

Flat Quantizer (base Q=35) vs. Adjusted Per-Band (base Q=23)

Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS

Mozilla28

Activity Masking

Goal: Use better resolution in flat areas
– Low contrast → low energy (gain)
– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
  – Currently wrong/incomplete, working on updates

Mozilla29

Activity Masking

Step 1: Compand gain (g)
– Goal: Q ∝ g^(2α) (x264 uses α = 0.173, we start with 1/6)
– Quantize ĝ = (Q_g ĥ)^β, encode ĥ
  – β = 1/(1 − 2α)
  – Q_g = (Q/β)^β
– Offset steps so at least one value of ĥ gives same gain as the prediction
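The companding step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, `q_g` is treated as a free parameter, plain rounding is used in the companded domain, and the step offset relative to the prediction mentioned in the last bullet is left out. With the reconstruction ĝ = (q_g·ĥ)^β, the spacing between neighbouring gain levels grows approximately as β·q_g·ĝ^(2α), which is the behaviour the goal Q ∝ g^(2α) asks for.

```python
import math

def beta_from_alpha(alpha):
    """Companding exponent beta = 1/(1 - 2*alpha)."""
    return 1.0 / (1.0 - 2.0 * alpha)

def quantize_gain(g, q_g, beta):
    """Quantize g in the companded domain; return (h, reconstructed gain)."""
    h = round(g ** (1.0 / beta) / q_g)
    return h, (q_g * h) ** beta

def level_spacing(h, q_g, beta):
    """Gap between the reconstructed gains for indices h and h + 1."""
    return (q_g * (h + 1)) ** beta - (q_g * h) ** beta

alpha = 1.0 / 6.0
beta = beta_from_alpha(alpha)
for h in (100, 400):
    g_hat = (1.0 * h) ** beta
    # spacing / g_hat**(2*alpha) stays close to beta * q_g (here q_g = 1)
    print(h, level_spacing(h, 1.0, beta) / g_hat ** (2 * alpha))
```

Larger gains therefore get coarser absolute steps, while flat (low-gain) areas keep finer ones.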

Mozilla30

Activity Masking cont’d

Step 2: Choose θ resolution
– Polar coordinates:
  ĝ √((cos θ – cos ϑ)² + (sin θ – sin ϑ)²) = dĝ/dĥ
  √((cos θ – cos ϑ)² + (sin θ – sin ϑ)²) = √(2 – 2cos(θ – ϑ)) ≈ arcdistance(θ, ϑ) ≈ θ – ϑ
  – At least for small θ – ϑ
– Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
  – Make sure Q_θ evenly divides π/2
  – When ĝ is small, force Q_θ = π/2
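A small Python sketch of how the θ resolution might be chosen from the quantized companded gain. The function names and the rounding rule are assumptions made for illustration: the slide only states that the step should be close to β/ĥ, should divide π/2 evenly, and should fall back to π/2 for small gains.

```python
import math

def theta_resolution(h, beta):
    """Return (steps, q_theta) with q_theta = (pi/2)/steps close to beta/h."""
    if h <= 0:
        return 1, math.pi / 2             # no gain information: coarsest step
    target = beta / h                     # ideal step from the slide
    steps = max(1, round((math.pi / 2) / target))
    return steps, (math.pi / 2) / steps

def quantize_theta(theta, q_theta):
    """Return (index, reconstructed angle) for a uniform quantizer."""
    index = round(theta / q_theta)
    return index, index * q_theta
```

For example, `theta_resolution(10, 1.5)` splits the range [0, π/2] into 10 steps, while `theta_resolution(1, 1.5)` collapses to a single step of π/2.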

Mozilla31

Activity Masking cont’d

Step 3: Choose K
– D = (g – ĝ)² + gĝ(D_θ + sin θ sin ϑ D_pvq)
  – D_θ = 2 – 2cos(θ – ϑ) = distortion due to θ quant
  – D_pvq = distortion due to PVQ on last N – 1 dimensions
– Distortion due to scalar quantizing gain: (dĝ/dĥ)²/12
– High-rate distortion due to PVQ: (N – 1)²/(24K²)
  – Derived experimentally, far too high at low rate
  – (N – 2) DOF → should be (N – 2) times gain distortion
– Assume g = ĝ, θ = ϑ, solve for K:
  K = (ĥ/β) sin ϑ (N – 1)/√(2(N – 2)) ≈ (ĥ/β) sin ϑ √(N/2)
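One way to read the last bullet, written out as a worked equation. This is a sketch under the slide's own assumptions (g = ĝ, θ = ϑ) plus two readings that are assumptions here: that "(N – 2) times gain distortion" means equating the PVQ term of D with N – 2 times the gain distortion, and that dĝ/dĥ = βĝ/ĥ, which follows from the Step 2 relation (dĝ/dĥ)/ĝ = β/ĥ.

```latex
% PVQ term of D (with g = \hat g, \theta = \vartheta) set equal to
% (N-2) times the scalar gain distortion (d\hat g / d\hat h)^2 / 12
\hat g^{2}\sin^{2}\vartheta \cdot \frac{(N-1)^{2}}{24K^{2}}
  = (N-2)\cdot\frac{1}{12}\left(\frac{d\hat g}{d\hat h}\right)^{2}
  = (N-2)\cdot\frac{\beta^{2}\hat g^{2}}{12\,\hat h^{2}}
\;\Longrightarrow\;
K^{2} = \frac{\hat h^{2}\sin^{2}\vartheta\,(N-1)^{2}}{2\,\beta^{2}\,(N-2)}
\;\Longrightarrow\;
K = \frac{\hat h}{\beta}\,\sin\vartheta\,\frac{N-1}{\sqrt{2(N-2)}}
  \approx \frac{\hat h}{\beta}\,\sin\vartheta\,\sqrt{N/2}
```

The final approximation uses (N – 1)/√(2(N – 2)) ≈ √(N/2), which holds for large N.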

Mozilla32

Loss Robustness

K ≈ (ĥ/β) sin ϑ √(N/2)
– ĥ is offset by the companded reference gain, so can be wrong if there are losses
– But if K is wrong, we’ll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on ĥ
– sin ϑ ≈ ϑ → ĥ sin ϑ ≈ ĥϑ = ĥ Q_θ (ϑ/Q_θ) = (ϑ/Q_θ) β
– (ϑ/Q_θ) is the index encoded in the bitstream
Since Q_θ not exact, can’t cap ϑ ≤ π/2 in bitstream
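Substituting ĥ sin ϑ ≈ (ϑ/Q_θ)·β into the formula for K gives K ≈ (ϑ/Q_θ)·√(N/2), which depends only on the coded angle index. The Python sketch below compares the two forms; the function names are hypothetical, and the conversion of K to an integer pulse count is left out because the slides do not specify it.

```python
import math

def pulses_from_gain(h, beta, theta, n):
    """K = (h/beta) * sin(theta) * (N-1)/sqrt(2(N-2)); needs the gain index h."""
    return (h / beta) * math.sin(theta) * (n - 1) / math.sqrt(2.0 * (n - 2))

def pulses_from_index(theta_index, n):
    """K ~ (theta/Q_theta) * sqrt(N/2); needs only the coded theta index."""
    return theta_index * math.sqrt(n / 2.0)

h, beta, n = 20, 1.5, 16
q_theta = beta / h                        # ideal theta step from Step 2
index = 2                                 # coded theta index
print(pulses_from_gain(h, beta, index * q_theta, n))
print(pulses_from_index(index, n))
```

For small angles the two values agree closely, and the second form stays decodable even when the reference gain has been corrupted by losses.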

Mozilla33

Inter-band Masking

ĝ is per-band, but traditional activity masking is per-block
– Could just sum ĝ over all bands
– Actual model is that energy in one band masks energy in another
  – Lower bands appear to mask higher, but not other way around
  – Still very early, not much is tuned
– ρ = (ĝ_h² / (ĝ_h² + η ĝ_l²))^α
  – η controls amount of masking
  – ρĥ then used to derive Q_θ and K instead of ĥ
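The masking factor can be written as a one-line Python function. This is an illustrative sketch: the function name is hypothetical, and reading the subscripts h and l as the higher band being coded and the lower band that masks it is an assumption based on the bullet about lower bands masking higher ones.

```python
def masking_factor(g_h, g_l, alpha, eta):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2)) ** alpha, for g_h > 0."""
    e_h = g_h * g_h
    return (e_h / (e_h + eta * g_l * g_l)) ** alpha

# eta = 0 disables inter-band masking (rho = 1); a strong lower band
# pushes rho below 1, which shrinks rho*h and so coarsens Q_theta and K.
print(masking_factor(1.0, 0.0, 1.0 / 6.0, 1.0))
print(masking_factor(1.0, 4.0, 1.0 / 6.0, 1.0))
```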

Mozilla34

Calibration

Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: Q = Q_0 L^(1/β – 1)
  – L is the maximum luma value
– Just an approximation, seems to work okay
– AM currently disabled for chroma
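The calibration formula as a small Python sketch. The function name is hypothetical, and the default maximum luma value of 255 assumes 8-bit video, which the slide does not state.

```python
def calibrated_quantizer(q0, beta, max_luma=255.0):
    """Q = Q0 * L**(1/beta - 1); equals Q0 when beta = 1 (no activity masking)."""
    return q0 * max_luma ** (1.0 / beta - 1.0)

print(calibrated_quantizer(23.0, 1.0))    # beta = 1: quantizer unchanged
print(calibrated_quantizer(23.0, 1.5))    # beta = 1.5: smaller base quantizer
```

Since 1/β – 1 = –2α, this scales the base quantizer by L^(–2α), so that a step of Q·g^(2α) equals Q_0 at the maximum gain level g = L.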

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues

Better entropy coding
– Everything order-0
– Take advantage of correlation in gain/θ/noref/etc.
Better RDO
– Currently iterating over small range of gains, θs
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation, tuning, etc.

Page 18: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla18

2-D Projection Example
[Figure: 2-D diagram with labels θ, Prediction, Input. Steps shown: Input + Prediction; Compute Householder Reflection; Apply Reflection; Compute & code angle]

Mozilla19

2-D Projection Example
[Figure: 2-D diagram with labels θ, Prediction, Input. Steps shown: Input + Prediction; Compute Householder Reflection; Apply Reflection; Compute & code angle; Code other dimensions]

Mozilla20

What does this accomplish?
Creates another “intuitive” parameter, θ
– “How much like the predictor are we?”
– θ = 0 → use predictor exactly
θ determines how many pulses go in the “prediction” direction
– K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
– Can repeat for more predictors
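As a rough illustration of the steps named on the projection slides (compute a Householder reflection from the prediction, apply it to the input, read off the angle θ), here is a minimal Python sketch. The function name, the choice of the largest-magnitude predictor component as the target axis, and the sign convention are assumptions made for the example, not details taken from the slides.

```python
import math

def householder_angle(x, r):
    """Reflect input x so that predictor r lies along one axis; return (theta, z, m).

    theta is the angle between x and r, z is the reflected input, and m is
    the index of the axis the predictor was mapped onto (assumed choice).
    Both x and r must be non-zero.
    """
    norm_r = math.sqrt(sum(c * c for c in r))
    # Assumed choice: align the predictor with its largest-magnitude axis.
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    # Householder vector v = r + s*||r||*e_m. Reflecting r across the
    # hyperplane normal to v maps r onto -s*||r||*e_m.
    v = list(r)
    v[m] += s * norm_r
    vv = sum(c * c for c in v)
    vx = sum(a * b for a, b in zip(v, x))
    z = [xi - 2.0 * vi * vx / vv for xi, vi in zip(x, v)]
    norm_z = math.sqrt(sum(c * c for c in z))
    # The reflection preserves angles, so theta can be read from axis m alone.
    cos_theta = -s * z[m] / norm_z
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return theta, z, m
```

After the reflection, the component of z on axis m carries the "how much like the predictor" information, and the other N-1 components are what would be passed on for coding the remaining dimensions.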

Mozilla21

Details

Mozilla22

Band Structure
DC coded separately with scalar quantization
AC coefficients grouped into bands
– Gain, theta, etc. signaled separately for each band
– Layout ad hoc for now
Scan order in each band optimized for decreasing average variance

Mozilla23

Band Structure
Scan order is possibly over-fit
[Figure: panels labeled 4x4, 8x8, 16x16]

Mozilla24

To Predict or Not to Predict
θ > π/2 → prediction not helping
– Could code large θ's, but doesn't seem that useful
– Need to handle zero predictors anyway
Current approach: code a “noref” flag
– Currently jointly code up to 4 flags at once with fixed order-0 probability per band (5% of KF rate)
– Patches in review cut this down a lot
  Force noref=1 when predictor is zero in keyframes
  Separate probabilities for each block size
  Adapt the probabilities
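To make "jointly code up to 4 flags at once" concrete, here is a small Python sketch that packs up to four noref flags into one symbol in the range 0..15, which an entropy coder could then code with a single probability table. The bit ordering and the function names are assumptions for the example, and the entropy coder itself is not shown.

```python
def pack_noref_flags(flags):
    """Pack 1 to 4 boolean noref flags into one integer symbol (bit i = flag i)."""
    if not 1 <= len(flags) <= 4:
        raise ValueError("expected between 1 and 4 flags")
    symbol = 0
    for i, flag in enumerate(flags):
        symbol |= (1 if flag else 0) << i
    return symbol

def unpack_noref_flags(symbol, count):
    """Recover `count` boolean flags from a packed symbol."""
    return [bool((symbol >> i) & 1) for i in range(count)]
```

With four flags there are 16 possible symbols, so a fixed order-0 model needs one probability per symbol per band.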

Mozilla25

Quantization Matrix
Simple approach (what we're doing now)
– Separate quantization resolution for each band
  Keep flat quantization within bands
Advanced approach
– Scaling after normalization complicated
  Unit pulses no longer “unit” (how to sum to K?)
  Householder reflection scrambles things further
– Better(?): Pre-scale vector by quantization factors
– Effects on energy preservation

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking
Goal: Use better resolution in flat areas
– Low contrast → low energy (gain)
– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
  Currently wrong/incomplete, working on updates

Mozilla29

Activity Masking
Step 1: Compand gain (g)
– Goal: Q ∝ g^(2α) (x264 uses α = 0.173, we start with 1/6)
– Quantize: ĝ = (Q_g ĥ)^β, encode ĥ
  β = 1/(1 - 2α)
  Q_g = (Q/β)^β
– Offset steps so at least one value of ĥ gives same gain as the prediction
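The companding step can be checked numerically. The sketch below takes the expansion ĝ = (Q_g ĥ)^β with β = 1/(1 - 2α) and α = 1/6, and confirms that the spacing between adjacent reconstruction levels (dĝ/dĥ) grows in proportion to ĝ^(2α), which is the stated goal. The function names and the finite-difference step are assumptions for illustration only.

```python
ALPHA = 1.0 / 6.0                      # starting value quoted on the slide
BETA = 1.0 / (1.0 - 2.0 * ALPHA)       # companding exponent, 1.5 for alpha = 1/6

def expand_gain(h, q_g):
    """Map a companded gain h back to the linear gain: g = (q_g * h) ** BETA."""
    return (q_g * h) ** BETA

def step_size(h, q_g, eps=1e-6):
    """Central-difference estimate of dg/dh, the local gain step size."""
    return (expand_gain(h + eps, q_g) - expand_gain(h - eps, q_g)) / (2.0 * eps)

def step_ratio(h, q_g):
    """Step size divided by g ** (2 * ALPHA); constant if Q is proportional to g^(2*alpha)."""
    return step_size(h, q_g) / expand_gain(h, q_g) ** (2.0 * ALPHA)
```

For any fixed q_g, step_ratio returns the same value at every h, so uniform steps in ĥ give coarser steps in ĝ as the gain grows.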

Mozilla30

Activity Masking (cont'd)
Step 2: Choose θ resolution
– Polar coordinates: ĝ √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = dĝ/dĥ
  √((cos θ - cos ϑ)^2 + (sin θ - sin ϑ)^2) = √(2 - 2cos(θ - ϑ)) ≈ arcdistance(θ, ϑ) ≈ θ - ϑ
– At least for small θ - ϑ
– Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
  Make sure Q_θ evenly divides π/2
  When ĝ is small, force Q_θ = π/2
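A possible way to turn Q_θ = β/ĥ into a step that evenly divides π/2 is sketched below. Rounding to the nearest whole number of steps, and using a threshold on ĥ to decide when the gain counts as "small", are both assumptions of this example; the slides do not say how either is done.

```python
import math

def theta_resolution(h, beta=1.5, min_h=1.0):
    """Return (q_theta, steps) with q_theta * steps == pi / 2.

    h is the quantized companded gain. min_h is a hypothetical threshold
    below which the angle resolution is forced to pi / 2 (a single step).
    """
    if h <= min_h:
        return math.pi / 2.0, 1
    ideal = beta / h                              # Q_theta = beta / h
    steps = max(1, round((math.pi / 2.0) / ideal))  # assumed rounding rule
    return (math.pi / 2.0) / steps, steps
```

For example, with h = 10 and beta = 1.5 the ideal step is 0.15 rad, which is adjusted to π/20 so that ten steps cover the range from 0 to π/2.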

Mozilla31

Activity Masking (cont'd)
Step 3: Choose K
– D = (g - ĝ)^2 + gĝ(D_θ + sin θ sin ϑ D_pvq)
  D_θ = 2 - 2cos(θ - ϑ) = distortion due to θ quant.
  D_pvq = distortion due to PVQ on last N - 1 dimensions
– Distortion due to scalar quantizing gain: (dĝ/dĥ)^2/12
– High-rate distortion due to PVQ: (N - 1)^2/(24K^2)
  Derived experimentally, far too high at low rate
  (N - 2) DOF → should be (N - 2) times gain distortion
– Assume g = ĝ, θ = ϑ, solve for K:
  K = (ĥ/β) sin ϑ (N - 1)/√(2(N - 2)) ≈ (ĥ/β) sin ϑ √(N/2)
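The K formula follows from setting the PVQ distortion term, ĝ^2 sin^2 ϑ (N - 1)^2/(24K^2), equal to (N - 2) times the gain distortion (dĝ/dĥ)^2/12, with dĝ/dĥ = βĝ/ĥ. The Python sketch below evaluates both the exact and the approximate expression and checks that balance numerically. Function names are illustrative, and K is left as a real number; how it is rounded to an integer pulse count is not covered here.

```python
import math

def pulses_exact(h, theta_hat, n, beta=1.5):
    """K = (h / beta) * sin(theta_hat) * (n - 1) / sqrt(2 * (n - 2))."""
    return (h / beta) * math.sin(theta_hat) * (n - 1) / math.sqrt(2.0 * (n - 2))

def pulses_approx(h, theta_hat, n, beta=1.5):
    """Large-n approximation: K ~ (h / beta) * sin(theta_hat) * sqrt(n / 2)."""
    return (h / beta) * math.sin(theta_hat) * math.sqrt(n / 2.0)

def gain_distortion(g_hat, h, beta=1.5):
    """(dg/dh)^2 / 12, using dg/dh = beta * g_hat / h."""
    return (beta * g_hat / h) ** 2 / 12.0

def pvq_distortion(g_hat, theta_hat, n, k):
    """g_hat^2 * sin^2(theta_hat) * (n - 1)^2 / (24 * k^2)."""
    return (g_hat * math.sin(theta_hat) * (n - 1)) ** 2 / (24.0 * k * k)
```

For N = 16 the two expressions for K differ by well under 1%, and the gap shrinks as N grows.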

Mozilla32

Loss Robustness
K ≈ (ĥ/β) sin ϑ √(N/2)
– ĥ is offset by the companded reference gain, so can be wrong if there are losses
– But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on ĥ
– sin ϑ ≈ ϑ → ĥ sin ϑ ≈ ĥ ϑ = ĥ Q_θ (ϑ/Q_θ) = (ϑ/Q_θ) β
– (ϑ/Q_θ) is the index encoded in the bitstream
  Since Q_θ not exact, can't cap ϑ ≤ π/2 in bitstream
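Substituting ĥ sin ϑ ≈ (ϑ/Q_θ) β into K ≈ (ĥ/β) sin ϑ √(N/2) gives K ≈ (ϑ/Q_θ) √(N/2), which depends only on the coded angle index and the band size. The sketch below compares the two for a small angle; the function names are illustrative, and Q_θ is taken as exactly β/ĥ here even though the slide notes it is not exact in practice.

```python
import math

def pulses_from_gain(h, theta_hat, n, beta=1.5):
    """Gain-dependent form: K ~ (h / beta) * sin(theta_hat) * sqrt(n / 2)."""
    return (h / beta) * math.sin(theta_hat) * math.sqrt(n / 2.0)

def pulses_from_theta_index(theta_index, n):
    """Loss-robust form: K ~ theta_index * sqrt(n / 2), theta_index = theta_hat / Q_theta."""
    return theta_index * math.sqrt(n / 2.0)
```

With h = 12, beta = 1.5 (so Q_θ = 0.125) and index 2 (ϑ = 0.25), the two forms agree to about 1%; the difference comes only from the sin ϑ ≈ ϑ approximation.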

Mozilla33

Inter-band Masking
ĝ is per-band, but traditional activity masking is per-block
– Could just sum ĝ over all bands
– Actual model is that energy in one band masks energy in another
  Lower bands appear to mask higher, but not other way around
  Still very early, not much is tuned
– ρ = (ĝ_h^2/(ĝ_h^2 + η ĝ_l^2))^α, η controls amount of masking
  ρĥ then used to derive Q_θ and K instead of ĥ
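The masking factor ρ = (ĝ_h^2/(ĝ_h^2 + η ĝ_l^2))^α can be written as a short function. In this sketch the subscripts h and l are read as the higher band being coded and a lower band that masks it, which is an interpretation of the slide rather than something it states; the default α = 1/6 and the zero-energy guard are also assumptions.

```python
def masking_factor(g_h, g_l, eta, alpha=1.0 / 6.0):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2)) ** alpha.

    g_h: gain of the band being coded, g_l: gain of the (assumed) masking
    band, eta: masking strength. Returns 1.0 when there is no energy at all.
    """
    e_h = g_h * g_h
    denom = e_h + eta * g_l * g_l
    if denom == 0.0:
        return 1.0  # guard added for the example: nothing to mask
    return (e_h / denom) ** alpha
```

With eta = 0 the factor is 1 and ĥ is used unchanged; larger eta or a stronger masking band pushes ρ below 1, so ρĥ yields a coarser Q_θ and a smaller K.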

Mozilla34

Calibration
Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: Q = Q_0 L^(1/β - 1)
  L is the maximum luma value
– Just an approximation, seems to work okay
– AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues
Better entropy coding
– Everything order-0
– Take advantage of correlation in gain/θ/noref/etc.
Better RDO
– Currently iterating over small range of gains, θ's
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation, tuning, etc.

Mozilla24

To Predict or Not to Predict

θ > π/2 → prediction not helping

– Could code large θ's, but doesn't seem that useful

– Need to handle zero predictors anyway

Current approach: code a "noref" flag

– Currently jointly code up to 4 flags at once with fixed order-0 probability per band (5% of KF rate)

– Patches in review cut this down a lot: force noref=1 when predictor is zero in keyframes; separate probabilities for each block size; adapt the probabilities

Mozilla25

Quantization Matrix

Simple approach (what we're doing now)

– Separate quantization resolution for each band; keep flat quantization within bands

Advanced approach

– Scaling after normalization is complicated: unit pulses are no longer "unit" (how to sum to K?), and Householder reflection scrambles things further

– Better(?): pre-scale vector by quantization factors

– Effects on energy preservation?

Mozilla26

Quantization Matrix Example

[Figure: Flat Quantizer (base Q=35) vs. Adjusted Per-Band (base Q=23)]

Mozilla27

Quantization Matrix Example

[Figure: Flat Quantizer (base Q=35) vs. Adjusted Per-Band (base Q=23)]

Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS

Mozilla28

Activity Masking

Goal: use better resolution in flat areas

– Low contrast → low energy (gain)

– Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx

Currently wrong/incomplete; working on updates

Mozilla29

Activity Masking

Step 1: Compand gain (g)

– Goal: Q ∝ g^(2α) (x264 uses α = 0.173; we start with 1/6)

– Quantize ĝ = (Q_g ĥ)^β, encode ĥ

β = 1/(1 − 2α)

Q_g = (Q/β)^β

– Offset steps so at least one value of ĥ gives same gain as the prediction
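The companding relation can be checked numerically. The sketch below takes the slide's form ĝ = (Q_g ĥ)^β at face value and confirms that the resulting step size dĝ/dĥ grows as ĝ^(2α). The function names and the value of Q_g are illustrative assumptions, not taken from the Daala source.

```python
import math

def beta_from_alpha(alpha: float) -> float:
    """Companding exponent beta = 1 / (1 - 2*alpha), as given on the slide."""
    return 1.0 / (1.0 - 2.0 * alpha)

def decode_gain(h: float, q_g: float, beta: float) -> float:
    """Reconstructed gain g_hat = (Q_g * h)^beta for a companded value h."""
    return (q_g * h) ** beta

def gain_step(h: float, q_g: float, beta: float) -> float:
    """Analytic derivative d(g_hat)/dh = beta * Q_g^beta * h^(beta - 1)."""
    return beta * q_g ** beta * h ** (beta - 1.0)

alpha = 1.0 / 6.0             # starting value quoted on the slide
beta = beta_from_alpha(alpha)
q_g = 2.0                     # hypothetical gain resolution, for illustration only

# If the step size is proportional to g_hat^(2*alpha), this ratio is constant.
ratios = []
for h in (1.0, 2.0, 5.0, 10.0):
    g_hat = decode_gain(h, q_g, beta)
    ratios.append(gain_step(h, q_g, beta) / g_hat ** (2.0 * alpha))
```

Under this form the ratio is the constant β·Q_g for every ĥ, which is the stated goal Q ∝ g^(2α).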

Mozilla30

Activity Masking (cont'd)

Step 2: Choose θ resolution

– Polar coordinates: ĝ √((cos θ − cos ϑ)² + (sin θ − sin ϑ)²) = dĝ/dĥ

√((cos θ − cos ϑ)² + (sin θ − sin ϑ)²) = √(2 − 2cos(θ − ϑ)) ≈ arcdistance(θ, ϑ) = |θ − ϑ|

– At least for small |θ − ϑ|

– Q_θ = (dĝ/dĥ)/ĝ = β/ĥ

Make sure Q_θ evenly divides π/2

When ĝ is small, force Q_θ = π/2
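A small sketch of this step, under stated assumptions: the chord length between two unit vectors is compared with the arc length, and the target resolution β/ĥ is snapped to a value that evenly divides π/2. The rounding rule (nearest integer number of steps, at least one) is an assumption made for illustration; the slides do not say how the codec rounds.

```python
import math

def chord(theta: float, vartheta: float) -> float:
    """Euclidean distance between unit vectors at angles theta and vartheta."""
    return math.hypot(math.cos(theta) - math.cos(vartheta),
                      math.sin(theta) - math.sin(vartheta))

def theta_resolution(h: float, beta: float) -> tuple[int, float]:
    """Return (steps, q_theta) with q_theta = (pi/2) / steps close to beta / h.

    Assumed rule: round to the nearest whole number of steps and never use
    fewer than one, so a small gain falls back to q_theta = pi/2.
    """
    target = beta / h
    steps = max(1, round((math.pi / 2) / target))
    return steps, (math.pi / 2) / steps

# Chord and arc agree closely when the angles are near each other.
small_gap = chord(0.50, 0.45)

# A moderate companded gain gives several steps; a tiny one gives a single step.
steps_mid, q_mid = theta_resolution(6.0, 1.5)
steps_low, q_low = theta_resolution(0.2, 1.5)
```

With ĥ = 6 and β = 1.5 the target resolution is 0.25 rad, which this rule snaps to six steps of π/12.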

Mozilla31

Activity Masking (cont'd)

Step 3: Choose K

– D = (g − ĝ)² + gĝ(D_θ + sin θ sin ϑ D_pvq)

D_θ = 2 − 2cos(θ − ϑ) = distortion due to θ quantization

D_pvq = distortion due to PVQ on last N − 1 dimensions

– Distortion due to scalar quantizing gain: (dĝ/dĥ)²/12

– High-rate distortion due to PVQ: (N − 1)²/(24K²)

Derived experimentally; far too high at low rate

(N − 2) DOF → should be (N − 2) times gain distortion

– Assume g = ĝ, θ = ϑ; solve for K:

K = (ĥ/β) sin ϑ (N − 1)/√(2(N − 2)) ≈ (ĥ/β) sin ϑ √(N/2)
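The formula for K follows from equating the PVQ term, sin²ϑ (N − 1)²/(24K²), with (N − 2) times the normalized gain distortion, (β/ĥ)²/12, where dĝ/dĥ = βĝ/ĥ comes from Step 2. The sketch below evaluates both the exact and the approximate expression; the function names are illustrative, and K is left as a real number because the slides do not specify how it is rounded.

```python
import math

def pulses_exact(h: float, beta: float, vartheta: float, n: int) -> float:
    """K = (h/beta) * sin(vartheta) * (N - 1) / sqrt(2 * (N - 2)); needs N > 2."""
    return (h / beta) * math.sin(vartheta) * (n - 1) / math.sqrt(2 * (n - 2))

def pulses_approx(h: float, beta: float, vartheta: float, n: int) -> float:
    """Large-N approximation K ~ (h/beta) * sin(vartheta) * sqrt(N / 2)."""
    return (h / beta) * math.sin(vartheta) * math.sqrt(n / 2)

# Example values chosen only for illustration.
h, beta, vartheta, n = 6.0, 1.5, math.pi / 2, 16
k_exact = pulses_exact(h, beta, vartheta, n)
k_approx = pulses_approx(h, beta, vartheta, n)

# Both sides of the balance condition, with the common factor g^2 divided out.
pvq_term = math.sin(vartheta) ** 2 * (n - 1) ** 2 / (24 * k_exact ** 2)
gain_term = (n - 2) * (beta / h) ** 2 / 12
```

For N = 16 the two expressions differ by well under one percent, which is why the simpler √(N/2) form is used afterwards.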

Mozilla32

Loss Robustness

K ≈ (ĥ/β) sin ϑ √(N/2)

– ĥ is offset by the companded reference gain, so it can be wrong if there are losses

– But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream

Remove dependence on ĥ

– sin ϑ ≈ ϑ → ĥ sin ϑ ≈ ĥϑ = ĥ Q_θ (ϑ/Q_θ) = (ϑ/Q_θ) β

– (ϑ/Q_θ) is the index encoded in the bitstream

Since Q_θ is not exact, can't cap ϑ ≤ π/2 in bitstream
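Substituting ĥ sin ϑ ≈ (ϑ/Q_θ) β into the expression for K cancels β and leaves K ≈ (ϑ/Q_θ) √(N/2), so the pulse count depends only on the coded angle index and the band size. The sketch below compares this with the gain-dependent formula. It assumes Q_θ is exactly β/ĥ, which the slide notes is not true in practice, and the function names are illustrative.

```python
import math

def pulses_from_index(theta_index: int, n: int) -> float:
    """Loss-robust estimate K ~ (vartheta / Q_theta) * sqrt(N / 2).

    theta_index is the integer vartheta / Q_theta read from the bitstream,
    so no decoded gain is needed.
    """
    return theta_index * math.sqrt(n / 2)

def pulses_from_gain(h: float, beta: float, vartheta: float, n: int) -> float:
    """Gain-dependent estimate K ~ (h/beta) * sin(vartheta) * sqrt(N / 2)."""
    return (h / beta) * math.sin(vartheta) * math.sqrt(n / 2)

# Illustrative values: with h = 6 and beta = 1.5, Q_theta = beta / h = 0.25.
h, beta, n = 6.0, 1.5, 16
q_theta = beta / h
theta_index = 2
vartheta = theta_index * q_theta

k_robust = pulses_from_index(theta_index, n)
k_gain = pulses_from_gain(h, beta, vartheta, n)
```

The two estimates differ by the factor ϑ/sin ϑ, which is close to 1 for small angles and grows toward π/2 as ϑ approaches π/2.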

Mozilla33

Inter-band Masking

ĝ is per-band, but traditional activity masking is per-block

– Could just sum ĝ over all bands

– Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher, but not the other way around

Still very early; not much is tuned

– ρ = (ĝ_h² / (ĝ_h² + η ĝ_l²))^α

η controls amount of masking

ρĥ then used to derive Q_θ and K instead of ĥ
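A minimal sketch of the masking factor, reading ĝ_h as the gain of the higher band being coded and ĝ_l as the gain of the lower band that masks it. That reading, the function name, and the handling of the all-zero case are assumptions made for illustration.

```python
def masking_factor(g_high: float, g_low: float, eta: float, alpha: float) -> float:
    """rho = (g_high^2 / (g_high^2 + eta * g_low^2))^alpha.

    Returns 1.0 (no masking) when both energies are zero; that fallback is an
    assumption, since the slide does not cover this case.
    """
    denom = g_high ** 2 + eta * g_low ** 2
    if denom == 0.0:
        return 1.0
    return (g_high ** 2 / denom) ** alpha

alpha = 1.0 / 6.0

# eta = 0 switches masking off, so rho is exactly 1.
rho_off = masking_factor(2.0, 3.0, 0.0, alpha)

# More energy in the lower band lowers rho, which shrinks rho * h.
rho_mild = masking_factor(1.0, 1.0, 1.0, alpha)
rho_strong = masking_factor(1.0, 10.0, 1.0, alpha)
```

Since ρ ≤ 1, the scaled value ρĥ is never larger than ĥ, so stronger masking from the lower band gives a coarser Q_θ and a smaller K.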

Mozilla34

Calibration

Activity masking always increases rate

– Scale base quantizer in each band to reduce rate: Q = Q_0 L^(1/β − 1)

L is the maximum luma value

– Just an approximation; seems to work okay

– AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues

Better entropy coding

– Everything order-0

– Take advantage of correlation in gain/θ/noref/etc.

Better RDO

– Currently iterating over small range of gains, θ's

– Rate estimates very approximate

Reducing overhead of loss-robust case

Noise injection/folding

Bit-exact implementation, tuning, etc.

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 25: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla25

Quantization Matrix Simple approach (what wersquore doing now)

ndash Separate quantization resolution for each band Keep flat quantization within bands

Advanced approachndash Scaling after normalization complicated

Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further

ndash Better() Pre-scale vector by quantization factors

ndash Effects on energy preservation

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 26: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla26

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 27: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla27

Quantization Matrix Example

Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)

Metrics +15 PSNR +12 SSIM -18 PSNR-HVS

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 28: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla28

Activity Masking Goal Use better resolution in flat areas

ndash Low contrast rarr low energy (gain)

ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx

Currently wrongincomplete working on updates

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 29: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla29

Activity Masking Step 1 Compand gain (g)

ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)

ndash Quantize ĝ = (Qgĥ)β encode ĥ

β = 1(1-2α)

Qg = (Qβ)β

ndash Offset steps so at least one value of ĥ gives same gain as the prediction

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 30: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla30

Activity Masking cotd Step 2 Choose θ resolution

ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)

asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ

ndash Qθ = (dĝdĥ)ĝ = βĥ

Make sure Qθ evenly divides π2

When ĝ is small force Qθ = π2

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 31: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla31

Activity Masking cotd Step 3 Choose K

ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D

pvq)

Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant

Dpvq

= distortion due to PVQ on last N ndash 1 dimensions

ndash Distortion due to scalar quantizing gain (dĝdĥ)212

ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion

ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc

  • Slide 1
  • Slide 2
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
Page 32: Pyramid Vector Quantization - Xiph.orgtterribe/daala/pvq201404.pdf · 2 Mozilla What is Pyramid Vector Quantization? A Vector Quantizer That has a simple algebraic structure To perform

Mozilla32

Loss Robustness K asymp (ĥβ) sin ϑ radicN2

ndash ĥ is offset by the companded reference gain so can be wrong if there are losses

ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream

Remove dependence on ĥ

ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ

θ) = (ϑQ

θ)β

ndash (ϑQθ) is the index encoded in the bitstream

Since Qθ not exact canrsquot cap ϑ le π2 in bitstream

Mozilla33

Inter-band Masking ĝ is per-band but traditional activity masking is

per-blockndash Could just sum ĝ over all bands

ndash Actual model is that energy in one band masks energy in another

Lower bands appear to mask higher but not other way around

Still very early not much is tuned

ndash ρ = (ĝh

2(ĝh

2 + ηĝl

2))α η controls amount of masking

ρĥ then used to derive Qθ and K instead of ĥ

Mozilla34

Calibration Activity masking always increases rate

ndash Scale base quantizer in each band to reduce rate Q = Q

0L(1β ndash 1)

L is the maximum luma value

ndash Just an approximation seems to work okay

ndash AM currently disabled for chroma

Mozilla35

Activity Masking Example

No activity masking (base Q=23)

Mozilla36

Activity Masking Example

Activity masking (base Q=23)

Mozilla37

Open Issues Better entropy coding

ndash Everything order-0

ndash Take advantage of correlation in gainθnorefetc

Better RDOndash Currently iterating over small range of gains θs

ndash Rate estimates very approximate

Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
