Generating a time shrunk lecture video by event detection
Presented by: Mona Ragheb
Yara Ali
Supervised by: Dr. Aliaa Youssif
Agenda
1. Introduction
2. Generating lecture video using virtual camerawork
3. Event detection steps
4. Evaluation
5. Results
6. Conclusion
7. References
Introduction
E-learning has become a popular method in higher education.
However, video recording by a cameraman and video editing take a long time and cost a great deal.
To solve this problem, a system has been developed to generate a dynamic lecture video using virtual camerawork from the high-resolution images recorded by an HDV (high-definition video) camcorder.
How does the system work?
The system generates a lecture video by:
1. Using virtual camerawork based on the shooting techniques of broadcast cameramen
2. Cropping the high-resolution image to track a region of interest (ROI) such as the instructor
3. Generating a time shrunk video using event detection
Camera motion analysis is used to detect scene changes.
Shooting techniques
People invariably make the same sets of mistakes when they first start shooting video:
1. Trees or telephone poles sticking out of the back of someone's head
2. Interview subjects who are just darkened blurs because there was bright light in the background
3. Boring shots of buildings with no action
Event Detection
Two kinds of events were detected:
1. Speech periods
2. Chalkboard writing periods
Kinds of event detection
1. Speech period:
Detected by voice activity detection with the LPC cepstrum; each frame is classified as speech or non-speech using the Mahalanobis distance.
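A minimal sketch of this classification step: compute the Mahalanobis distance from a frame's feature vector to a speech model and a non-speech model, and pick the nearer class. The feature values and class models below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance between a feature vector and a class model."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify_frame(x, speech_model, nonspeech_model):
    """Label a frame 'speech' or 'non-speech' by the nearer class model."""
    ds = mahalanobis(x, *speech_model)
    dn = mahalanobis(x, *nonspeech_model)
    return "speech" if ds < dn else "non-speech"

# Toy models: (mean, inverse covariance) per class, as would be
# fit on labelled training frames of cepstral features
speech = (np.array([1.0, 1.0]), np.eye(2))
nonspeech = (np.array([0.0, 0.0]), np.eye(2))
print(classify_frame(np.array([0.9, 1.1]), speech, nonspeech))  # → speech
```

In practice the feature vectors would be LPC cepstral coefficients per audio frame, and the two models would be estimated from labelled training data.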
LPC
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model.
It is one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides extremely accurate estimates of speech parameters.
A spectral envelope is a curve in the frequency-amplitude plane, derived from a Fourier magnitude spectrum. It describes one point in time (one analysis window, to be precise).
How does LPC work?
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz.
The process of removing the formants is called inverse filtering, and the signal remaining after subtraction of the filtered modeled signal is called the residue.
To synthesize speech, the decoder uses the buzz parameters and the residue to create a source signal, uses the formants to create a filter (which represents the vocal tract), and runs the source through the filter, producing speech.
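The analysis step can be sketched with the standard textbook autocorrelation method and Levinson-Durbin recursion (a generic formulation, not necessarily the paper's exact implementation):

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate LPC coefficients A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    via the autocorrelation method and the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([x[: n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Sanity check: a pure sinusoid obeys x[t] = 2*cos(w)*x[t-1] - x[t-2],
# so a 2nd-order model should recover a ≈ [1, -2*cos(w), 1]
w = 1.0
x = np.cos(w * np.arange(1000))
print(lpc_coefficients(x, 2))
```

The residue is then obtained by running the signal through the inverse filter A(z), exactly the "inverse filtering" described above.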
Formants
Kinds of event detection (Cont.)
2. Chalkboard writing period:
Detected by using a graph-cuts technique to segment a precise region of interest such as the instructor.
By deleting content-free periods, i.e., periods without speech or writing events, and fast-forwarding the writing periods, our method can generate a time shrunk lecture video automatically.
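As a sketch, this time-shrinking step amounts to building an edit list from the labelled periods: keep speech at normal speed, fast-forward writing, and drop everything else. The labels and speed-up factor below are illustrative assumptions.

```python
def shrink_plan(events, speedup=4.0):
    """Build an edit list from labelled periods.
    Each event is (start_sec, end_sec, label); returns (start, end, playback_rate)."""
    plan = []
    for start, end, label in events:
        if label == "speech":
            plan.append((start, end, 1.0))       # keep at normal speed
        elif label == "writing":
            plan.append((start, end, speedup))   # fast-forward
        # content-free periods are simply omitted
    return plan

events = [(0, 60, "speech"), (60, 90, "writing"),
          (90, 120, "none"), (120, 180, "speech")]
plan = shrink_plan(events)
output_len = sum((e - s) / r for s, e, r in plan)
print(output_len)  # 127.5 seconds instead of 180
```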
Generating lecture video using virtual camerawork
An HDV camcorder is located at the back of the classroom to videotape images at high resolution (1,400 × 810 pixels), covering the whole area of the chalkboard so that students can read the handwritten characters on it.
Problem: it is impossible to display the high-resolution image on the small screen of a typical notebook PC.
Generating lecture video using virtual camerawork (Cont.)
Solution?
1. The system detects a moving object by temporal differencing.
2. The timing for virtual camerawork is detected using bilateral filtering and zero crossing.
Bilateral Filter
The bilateral filter was introduced by Tomasi et al. as a non-iterative means of smoothing images while retaining edge detail.
It involves a weighted convolution in which the weight for each pixel depends not only on its distance from the center pixel but also on its relative intensity.
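A brute-force sketch of that weighting scheme, combining a spatial Gaussian with an intensity (range) Gaussian (parameter values are illustrative; production implementations such as OpenCV's `cv2.bilateralFilter` are far more optimized):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.2):
    """Bilateral filter for a grayscale image: each output pixel is a
    weighted average of its neighbours, weighted by both spatial
    distance and intensity difference from the centre pixel."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    padded = np.pad(img, radius, mode="edge").astype(float)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range weight: penalize neighbours with different intensity
            rng = np.exp(-(patch - img[y, x])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```

On a sharp step edge, pixels across the edge differ in intensity by far more than `sigma_r`, so they get negligible weight and the edge survives, while pixels within a flat region are averaged normally.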
Bilateral Filter (Cont.)
(a) and (b) show the potential of bilateral filtering for the removal of texture. The picture "simplification" illustrated in figure 2 (b) can be useful for data reduction without loss of overall shape features, in applications such as image transmission, picture editing and manipulation, and image description for retrieval.
Generating lecture video using virtual camerawork (Cont.)
If the ROI has a large movement, that period of the video is classified as panning; if the ROI has no motion but there is voice activity, the period is classified as zooming.
Panning is used to show motion and speed. It is a technique that requires practice, since it has to be done in one smooth, continuous motion.
Event detection steps
1. Voice activity detection
2. Chalkboard writing detection
   1. Object detection and segmentation
   2. Generation of current chalkboard image
   3. Chalkboard writing detection
3. Generating a time shrunk video
1- Voice activity detection
1- Voice activity detection (Cont.)
Whenever you take a finite Fourier transform, you are implicitly applying it to an infinitely repeating signal. So, for instance, if the start and end of your finite sample do not match, that will look just like a discontinuity in the signal and show up as lots of high-frequency noise in the Fourier transform, which you don't really want.
1- Voice activity detection (Cont.)
If your sample happens to be a beautiful sinusoid but an integer number of periods doesn't fit exactly into the finite sample, your FT will show appreciable energy in all sorts of places nowhere near the real frequency. You don't want any of that.
Windowing the data makes sure that the ends match up while keeping everything reasonably smooth; this greatly reduces the "spectral leakage" described above.
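The effect above is easy to demonstrate: transform a sinusoid whose frequency falls between FFT bins, with and without a Hann window, and compare the energy far from the true frequency.

```python
import numpy as np

n = 1024
t = np.arange(n)
# A sinusoid at 10.5 cycles per window: its frequency falls between FFT bins
x = np.sin(2 * np.pi * 10.5 * t / n)

spec_rect = np.abs(np.fft.rfft(x))                 # no window (rectangular)
spec_hann = np.abs(np.fft.rfft(x * np.hanning(n))) # Hann window

# Energy far from the true frequency (bins >= 100) is spectral leakage;
# the window suppresses it by orders of magnitude
leak_rect = spec_rect[100:].sum()
leak_hann = spec_hann[100:].sum()
print(leak_hann < leak_rect)  # True
```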
2- Chalkboard writing detection
2-1- Object detection and segmentation
Extracting a precise object region is needed for detecting periods of writing characters on the chalkboard.
Temporal differencing (object detection) is robust to lighting change.
However, temporal differencing cannot extract all foreground pixels of moving objects, so another technique (graph cuts) is used to support it.
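A minimal sketch of temporal differencing (frame values and threshold are illustrative): pixels whose intensity changes between consecutive frames are flagged as moving-object candidates.

```python
import numpy as np

def moving_mask(prev_frame, cur_frame, threshold=25):
    """Temporal differencing: mark pixels whose intensity changed
    between consecutive frames as foreground candidates."""
    diff = np.abs(cur_frame.astype(int) - prev_frame.astype(int))
    return diff > threshold

# Toy frames: a bright 2x2 "object" moves one pixel to the right
prev_f = np.zeros((5, 5), dtype=np.uint8); prev_f[1:3, 1:3] = 200
cur_f = np.zeros((5, 5), dtype=np.uint8); cur_f[1:3, 2:4] = 200
mask = moving_mask(prev_f, cur_f)
```

Note that the overlap column (where the object covers the same pixels in both frames) is not flagged, so the interior of a slowly moving object is missed; this is exactly why a segmentation technique such as graph cuts is needed on top.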
2- Chalkboard writing detection
2-1- Object detection and segmentation
2- Chalkboard writing detection
2-1- Object detection and segmentation (Cont.)
Image Segmentation
Image segmentation is an important problem in computer vision and medical image analysis.
The objective of image segmentation is to provide a visually meaningful partition of the image domain. Although it is usually easy for a human to separate the background and the different objects in a given image, it is a difficult task for a computer.
2- Chalkboard writing detection
2-2- Generation of current chalkboard image
2- Chalkboard writing detection
2-3- Chalkboard writing detection
2-3- Chalkboard writing detection (Cont.)
3- Generating a time shrunk video
Evaluation
We videotaped 3 lectures (each 90 min long) with an HDV camcorder.
In this evaluation, we use "recall" and "precision" to determine the effectiveness of the detection results.
Precision vs. Recall
• Precision (also called positive predictive value) is the fraction of retrieved frames that are correct.
• Recall (also known as sensitivity) is the fraction of correct frames that are retrieved.
In this figure the relevant items are to the left of the straight line, while the retrieved items are within the oval. The red regions represent errors. On the left these are the relevant items not retrieved (false negatives), while on the right they are the retrieved items that are not relevant (false positives).
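The two measures can be computed directly from sets of frame indices (the frame ranges below are made-up numbers for illustration, not the paper's evaluation data):

```python
def precision_recall(detected, relevant):
    """Frame-level precision and recall from sets of frame indices."""
    tp = len(detected & relevant)  # true positives: correct retrieved frames
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

detected = set(range(0, 80))    # frames the system flagged as an event
relevant = set(range(20, 120))  # frames a human labelled as an event
p, r = precision_recall(detected, relevant)
print(p, r)  # 0.75 0.6
```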
Results
There are 200 false positive frames in a 90 min video because of the students’ voices.
Conclusion
This paper presents a novel approach for generating a time shrunk lecture video using event detection.
Our method detects speech periods by voice activity detection and chalkboard writing periods by a combination of object detection and segmentation techniques.
By deleting the content-free periods and fast-forwarding the chalkboard writing periods, our method can generate a time shrunk lecture video automatically.
The resulting video is about 20%–30% shorter than the original video in time. This is almost the same as the result of manual editing by a human operator.
References
1. http://www.vision.cs.chubu.ac.jp/04/pdf/e-learning08.pdf
2. http://research.cs.tamu.edu/prism/lectures/sp/l9.pdf
3. http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Linear_predictive_coding.html
4. http://www.ee.columbia.edu/~dpwe/e4896/lectures/E4896-L06.pdf
ANY QUESTIONS?
THANK YOU!