
Source: web.cse.ohio-state.edu/~barker.348/cvl/barker_20150220_final.pdf

Story Compression: Aggregating News Feeds
Joseph W. Barker
Advisor: James W. Davis
Ohio State University

What is Story Compression?
• News broadcasts from multiple sources tend to cover same stories
• Stories have content overlap
  – General content covered by multiple sources
  – Specific content covered by one source
• Information gathering
  – Waste time if view all broadcasts (general content → redundancy)
  – Miss information if only view one broadcast (specific content)
• Answer: Story Compression
  – Detect general vs. specific content and create single story from all broadcasts with no redundancy

Overview
• Divide story into content segments (i.e., single idea)
  – Video shot (continuous scene) detection
• Compare segments
  – Speech/text contains most of the informational content
  – Word similarity → Segment Similarity
• Detect specific vs. general segments

Word Similarity

• Focus on concepts rather than specific word matching

• Graph-based hierarchy of word-concept relationships

– E.g., WordNet

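• Malik et al. 2007

– $\mathrm{sim}(w_1, w_2) = \dfrac{2 \cdot \mathrm{dist}(\mathrm{root}, \mathrm{LCS}(w_1, w_2))}{\mathrm{dist}(\mathrm{root}, w_1) + \mathrm{dist}(\mathrm{root}, w_2)}$

• Li et al. 2003

– $\mathrm{sim}(w_1, w_2) = e^{-\alpha\,\mathrm{dist}(w_1, w_2)} \tanh\big(\beta\,\mathrm{dist}(\mathrm{root}, \mathrm{LCS}(w_1, w_2))\big)$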

[Figure: example word-concept hierarchy with nodes Object, Mammal, Feline, Canine, Cat, Poodle]
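The two word-similarity measures can be tried out on a toy hierarchy. The sketch below is only illustrative: the parent links between the six concepts are an assumption (the poster's figure names the nodes but the edges did not survive extraction), dist is taken to be the number of edges on the path, LCS is the lowest common ancestor of the two words, and the alpha and beta defaults are arbitrary values chosen for the example, not values taken from the cited papers.

```python
import math

# Hypothetical toy hierarchy (child -> parent); the edges are an assumption.
PARENT = {
    "mammal": "object",
    "feline": "mammal",
    "canine": "mammal",
    "cat": "feline",
    "poodle": "canine",
}
ROOT = "object"


def path_to_root(word):
    """Return the list of nodes from `word` up to and including the root."""
    path = [word]
    while path[-1] != ROOT:
        path.append(PARENT[path[-1]])
    return path


def depth(word):
    """dist(root, word): number of edges between the root and `word`."""
    return len(path_to_root(word)) - 1


def lcs(w1, w2):
    """Deepest node that is an ancestor of (or equal to) both words."""
    ancestors = set(path_to_root(w1))
    for node in path_to_root(w2):  # walks upward, so the first hit is the deepest
        if node in ancestors:
            return node
    raise ValueError("words do not share a root")


def dist(w1, w2):
    """Number of edges on the path between w1 and w2 through their LCS."""
    return depth(w1) + depth(w2) - 2 * depth(lcs(w1, w2))


def sim_depth_ratio(w1, w2):
    """First formula: 2*dist(root, LCS) / (dist(root, w1) + dist(root, w2))."""
    denom = depth(w1) + depth(w2)
    if denom == 0:  # both words are the root itself
        return 1.0
    return 2.0 * depth(lcs(w1, w2)) / denom


def sim_exp_tanh(w1, w2, alpha=0.2, beta=0.45):
    """Second formula: exp(-alpha*dist(w1, w2)) * tanh(beta*dist(root, LCS))."""
    return math.exp(-alpha * dist(w1, w2)) * math.tanh(beta * depth(lcs(w1, w2)))


if __name__ == "__main__":
    for a, b in [("cat", "feline"), ("cat", "poodle"), ("cat", "cat")]:
        print(a, b, sim_depth_ratio(a, b), sim_exp_tanh(a, b))
```

With these assumed edges, "cat" and "poodle" meet at "mammal", so both measures score them lower than "cat" and "feline", which is the concept-level behaviour the bullets describe.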

Segment Similarity
• Sentence similarity?
  – Segments range from sub-sentence to multiple sentences
  – Also, sentence boundaries (when multiple) poorly defined
  – Sentence similarity emphasizes grammar/word order; won’t work
• If ordering is problematic, use unordered groups instead
• Solution: Graph collapsing
  – Group of nodes collapsed to single node by summing edge weights
  – Inspired by spectral clustering and notion of random walk on graphs
  – Random walk between groups equivalent to random walk between collapsed nodes

[Figure: Word Similarity graph and Segment Similarity graph]
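A minimal sketch of graph collapsing as the bullets describe it: each segment is an unordered group of word nodes, and the weight between two collapsed nodes is the sum of all word-to-word edge weights between the groups. The function name, the example matrix, and the choice to include within-group weights on the diagonal are assumptions made for illustration; the poster does not specify any normalization, so none is applied here.

```python
def collapse(weights, groups):
    """Collapse groups of nodes into single nodes by summing edge weights.

    weights: square matrix (list of lists) of node-to-node similarity.
    groups:  list of lists of node indices, one list per collapsed node.
    Returns a len(groups) x len(groups) matrix whose (a, b) entry is the
    sum of weights between every node in group a and every node in group b.
    """
    return [
        [sum(weights[i][j] for i in ga for j in gb) for gb in groups]
        for ga in groups
    ]


# Made-up word similarities for four words; words 0-1 form one segment
# and words 2-3 form another.
WORD_SIM = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
SEGMENTS = [[0, 1], [2, 3]]

SEGMENT_SIM = collapse(WORD_SIM, SEGMENTS)

if __name__ == "__main__":
    for row in SEGMENT_SIM:
        print(row)
```

Because only sums over group members are used, word order inside a segment has no effect on the result, which is the property motivating unordered groups.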

Most Unique Segments
• Manual segmentation employed
• Specific content
• Uniqueness → overall dissimilarity
• Perfect dissimilarity → similarity matrix rows/columns zero except for diagonal
• Thus, sum of row/column should approach zero for most dissimilar segments
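The row-sum criterion can be sketched as follows. This assumes the diagonal (self-similarity) is excluded from the sum, which is one reading of "zero except for diagonal"; the matrix values and function names are made up for illustration.

```python
def uniqueness_scores(sim):
    """Off-diagonal row sums of a segment similarity matrix.

    A score near zero means the segment is dissimilar to every other
    segment, i.e. it carries content no other segment covers.
    """
    return [
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(sim)
    ]


def rank_most_unique(sim):
    """Segment indices sorted from most unique (lowest score) to least."""
    scores = uniqueness_scores(sim)
    return sorted(range(len(scores)), key=scores.__getitem__)


# Made-up similarities for three segments: 0 and 1 overlap, 2 stands alone.
EXAMPLE_SIM = [
    [1.0, 0.7, 0.1],
    [0.7, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

if __name__ == "__main__":
    print(uniqueness_scores(EXAMPLE_SIM))
    print(rank_most_unique(EXAMPLE_SIM))
```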

Most Related Segments
• General content
• Related → group self-similar
• Perfect self-similarity → similarity matrix elements for group all one
• Thus, sum of elements should approach $n^2$ ($n$ = number in group)
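A short sketch of the group-sum criterion, assuming similarities lie in [0, 1] with ones on the diagonal. For a pair of segments n = 2, so the sum over the 2×2 sub-matrix is bounded by 4. The matrix and function names below are illustrative assumptions.

```python
from itertools import combinations


def group_sum(sim, group):
    """Sum of all similarity-matrix elements restricted to `group`."""
    return sum(sim[i][j] for i in group for j in group)


def rank_most_related_pairs(sim):
    """All segment pairs, sorted from most related (sum nearest 4) to least."""
    pairs = combinations(range(len(sim)), 2)
    return sorted(pairs, key=lambda pair: group_sum(sim, pair), reverse=True)


# Made-up similarities for three segments.
EXAMPLE_SIM = [
    [1.0, 0.7, 0.1],
    [0.7, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

if __name__ == "__main__":
    for pair in rank_most_related_pairs(EXAMPLE_SIM):
        print(pair, group_sum(EXAMPLE_SIM, pair))
```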

[Figure: Segment Pair Similarity (higher is better). x-axis: Segment pairs (sorted); y-axis: Similarity]
[Figure: Segment Uniqueness (lower better). x-axis: Segments (sorted); y-axis: Uniqueness]
[Figure: example similarity matrices labeled Perfect dissimilarity, Somewhat dissimilar, Perfect similarity, Somewhat similar]

Automatic Segment Detection
• How to decide boundaries between segments?
  – No sentence boundaries, so text not strong indicator
• Shot detection: Detect visual change from one scene to another
• Common techniques:
  – Temporal extent
    • Consecutive: compare sequential pairs of frames
    • Key frame: compare to “key” frame of previous segment
  – Distance measures
    • Pixel-based: Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC)
    • Color-based (histograms): χ², Bhattacharyya
    • Texture-based: Scale Invariant Feature Transform (SIFT)
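The two temporal extents can be contrasted with a small sketch. Several things here are assumptions: frames are flat lists of grayscale values, the threshold is applied to a raw distance rather than to a normalized similarity, and "key frame" is read as the first frame of the most recently detected shot. Only the two simplest pixel-based measures are shown; any other distance function with the same signature could be passed in.

```python
def sad(a, b):
    """Sum of Absolute Differences between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of Squared Differences between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_shots(frames, distance, threshold, mode="consecutive"):
    """Return the indices of frames that start a new shot.

    mode="consecutive": compare each frame with the frame just before it.
    mode="key":         compare each frame with the first frame of the
                        current shot (an assumed reading of "key frame").
    """
    boundaries = []
    key = frames[0]
    for i in range(1, len(frames)):
        reference = frames[i - 1] if mode == "consecutive" else key
        if distance(reference, frames[i]) > threshold:
            boundaries.append(i)
            key = frames[i]  # the new shot gets a new key frame
    return boundaries


# Made-up 4-pixel frames: a hard cut at index 3, and a slow fade.
CUT = [[10] * 4, [11, 10, 10, 10], [12, 11, 10, 10], [200] * 4, [201, 200, 199, 200]]
FADE = [[0] * 4, [20] * 4, [40] * 4, [60] * 4]

if __name__ == "__main__":
    print(detect_shots(CUT, sad, 50, "consecutive"))
    print(detect_shots(FADE, sad, 100, "consecutive"))
    print(detect_shots(FADE, sad, 100, "key"))
```

On the fade example each consecutive step stays under the threshold while the drift from the key frame does not, so the two temporal extents disagree. Each method looks at a single temporal extent and can miss what the other sees.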

Towards Improving Segment Detection
• Common methods give mediocre performance
• May be due to only examining single temporal extent
• Possible solution: Use graph collapsing to examine all temporal extents simultaneously
• Sum of blocks on diagonal approaches $n^2$ if members in segment
• Sum of block anti-diagonal approaches zero if corner is segment boundary
• Current problem: Scale of valleys (boundaries) varies quadratically with segment size, simple peak finding not good enough
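One possible reading of the two block sums, sketched on a frame-to-frame similarity matrix. The interpretation is an assumption: the "diagonal block" is taken to be the square sub-matrix covering a candidate segment, and the "anti-diagonal" block is taken to be the off-diagonal block that pairs the frames just before a candidate boundary with the frames just after it. The function names, the window parameter, and the idealized 0/1 matrix are all hypothetical.

```python
def block_sum(sim, rows, cols):
    """Sum of similarity-matrix elements over the given rows and columns."""
    return sum(sim[i][j] for i in rows for j in cols)


def diagonal_block_sum(sim, start, end):
    """Sum over the square block for frames start..end-1.

    Approaches n**2 (n = end - start) when all frames belong to one segment.
    """
    frames = range(start, end)
    return block_sum(sim, frames, frames)


def cross_block_sum(sim, t, window):
    """Sum over the block pairing frames before t with frames from t onward.

    Approaches zero when t is a segment boundary (assumed interpretation).
    """
    before = range(max(0, t - window), t)
    after = range(t, min(len(sim), t + window))
    return block_sum(sim, before, after)


# Idealized similarity for six frames forming two shots: frames 0-2 and 3-5.
IDEAL_SIM = [[1 if (i < 3) == (j < 3) else 0 for j in range(6)] for i in range(6)]

if __name__ == "__main__":
    print(diagonal_block_sum(IDEAL_SIM, 0, 3))
    print([cross_block_sum(IDEAL_SIM, t, 2) for t in range(1, 6)])
```

Both sums grow with the number of frames in the block, which is consistent with the poster's remark that the scale of the valleys depends on segment size.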

[Figure: Shot Detection: Key Frame (First). x-axis: Normalized threshold (1 = perfect match); y-axis: F score; curves: SAD, SSD, NCC, SIFT-MR, BATTA-H16, CHI2-H16]

[Figure: Shot Detection: Consecutive. x-axis: Normalized threshold (1 = perfect match); y-axis: F score; curves: SAD, SSD, NCC, SIFT-MR, BATTA-H16, CHI2-H16]

Method      F      TP     FP     FN
SAD         0.747  0.596  0.081  0.322
SSD         0.746  0.595  0.044  0.362
NCC         0.770  0.626  0.009  0.365
BATTA-H16   0.779  0.638  0.125  0.237
CHI2-H16    0.210  0.117  0.005  0.878
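As a consistency check on the table, the F column can be recomputed from the TP, FP and FN columns using the standard F1 definition, F = 2·TP / (2·TP + FP + FN). That the poster used exactly this definition is an assumption, but the recomputed values agree with the reported ones to within the rounding of the three-decimal inputs.

```python
# Rows copied from the table: (method, reported F, TP, FP, FN).
ROWS = [
    ("SAD", 0.747, 0.596, 0.081, 0.322),
    ("SSD", 0.746, 0.595, 0.044, 0.362),
    ("NCC", 0.770, 0.626, 0.009, 0.365),
    ("BATTA-H16", 0.779, 0.638, 0.125, 0.237),
    ("CHI2-H16", 0.210, 0.117, 0.005, 0.878),
]


def f_score(tp, fp, fn):
    """Standard F1 score: harmonic mean of precision and recall."""
    return 2.0 * tp / (2.0 * tp + fp + fn)


if __name__ == "__main__":
    for method, reported, tp, fp, fn in ROWS:
        recomputed = f_score(tp, fp, fn)
        print(f"{method:10s} reported={reported:.3f} recomputed={recomputed:.4f}")
```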

[Figure: x-axis: Frame; y-axis: Anti-diagonal Sum]

Conclusion and Future Work
• Graph collapsing can be used to derive group similarity from similarity of group members
• Additionally, can be used to evaluate uniqueness of objects, relatedness of groups
  – Tested with text, working on video
• Future work
  – Finalize graph collapsing video segmentation
  – Expand word similarity to include multiple languages
  – Investigate sub-image feature extraction/matching
  – Examine other sources (e.g., YouTube)

Example segment pairs (two sources each):
#1) ABC, NBC: “…declaring a public health emergency….” / “…declaring a public health emergency….”
#2) NBC, CBS: “…after the virus killed….” / “…sadly had claimed 18 lives….”
#3) ABC, NBC: “…declaring a public health emergency….” / “…to repeat, declared a public health emergency….”
#4) ABC, CBS: “…they’ve set up a special tent….” / “…a tent has been setup….”

Example single-source segments:
#1) ABC: “In Boston today, the mayor sounded the alarm”
#2) ABC: “…moved onto the upper respiratory, which is a lot of coughing…”
#3) ABC: “…stay home when you are sick…”
#4) ABC: “…I’ve never been hit by a Mack truck…”
#5) CBS: “…is on the panel that decides what goes in the vaccine…”
#6) CBS: “…after confirmed cases of flu reach 700…”

[Figure labels: Consecutive Shot Detection Across All Stories; Shot Detection on story FLU]
[Figure labels: Video similarity; Sum of diagonal blocks; Frame; Block Start; Block End; ABC; CBS; NBC]