Step 1. Semi-supervised


Page 1: Step 1. Semi-supervised

Step 1. Semi-supervised

• Given a region, where a primitive event happens

• Given the beginning and end time of each instance of the primitive event

Page 2: Step 1. Semi-supervised

• Two resolutions:
  – Far field (encoded using trajectory)
  – Near and middle field (represented using appearance)

Page 3: Step 1. Semi-supervised

Near/Mid Field

In this region, two types of primitive events are shown as exemplars: a vehicle driving and a human walking.

Page 4: Step 1. Semi-supervised
Page 5: Step 1. Semi-supervised

Far Field

Page 6: Step 1. Semi-supervised

Basic event in far field

Page 7: Step 1. Semi-supervised

Learn motion from exemplar clips (A toy example)

Challenges:
• Variation in object appearance (e.g., humans in different clothes)
• Variation in background clutter and lighting changes
• Small perturbations in the object's position and/or scale
• Small differences in motion speed

[Figure: motion clips to learn from. Panel labels: lighting; clothing, position, scale; background; fast; unstable camera]

Page 8: Step 1. Semi-supervised

Learning motion templates (A toy example)

[Figure: spatial-temporal volume with axes x, y, t]

Local maximum: Perturbation over a small spatial-temporal neighborhood. [dx, dy, dt, dScale]

Motion 1 (faster)

Motion 3 (slower)

A motion template is learned key-frame by key-frame. At each key-frame, a static active basis template is learned by pooling over the images of the “same” gesture from multiple videos. Since the motion varies in position, speed, and scale, the correspondence is found by taking a local maximum over a small spatial-temporal neighborhood.
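The local-maximum step described above can be illustrated with a minimal sketch. It is not the authors' implementation: `score` is a hypothetical stand-in for the response of a key-frame template at a given position, time, and scale, the perturbation grids are arbitrary, and treating dScale as a multiplicative factor is an assumption.

```python
from itertools import product

def local_max(score, x, y, t, scale,
              dxs=(-1, 0, 1), dys=(-1, 0, 1),
              dts=(-1, 0, 1), dscales=(0.9, 1.0, 1.1)):
    """Search a small [dx, dy, dt, dScale] neighborhood around (x, y, t, scale).

    `score` is a hypothetical callable returning the template response;
    the grids and the multiplicative scale perturbation are assumptions.
    Returns (best_score, (dx, dy, dt, dscale)).
    """
    best = None
    for dx, dy, dt, ds in product(dxs, dys, dts, dscales):
        s = score(x + dx, y + dy, t + dt, scale * ds)
        if best is None or s > best[0]:
            best = (s, (dx, dy, dt, ds))
    return best

# Toy response that peaks at x=11, y=20, t=6, scale=1.1.
def toy_score(x, y, t, scale):
    return -((x - 11) ** 2 + (y - 20) ** 2 + (t - 6) ** 2 + (scale - 1.1) ** 2)

best_score, best_shift = local_max(toy_score, x=10, y=20, t=5, scale=1.0)
print(best_shift)
```

In this toy setting the search shifts the nominal location by one pixel in x, one frame in t, and one scale step, which is the kind of small correction the pooling step relies on to line up the "same" gesture across videos.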

[Figure: key frames of the two motions. Labels: frame 5, frame 10, frame 15; frame 1, frame 7, frame 13, frame 19]

Frame 5 of motion 1 corresponds to frame 7 of motion 3, since motion 3 is slower.

[Figure: pooling over the “same” gesture (label: frame 1)]

Page 9: Step 1. Semi-supervised

Results (A toy example)

Page 10: Step 1. Semi-supervised

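• An animation is planned for this slide: a common spatial-temporal trajectory at the top, with animated examples of the corresponding different instances below.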

Page 11: Step 1. Semi-supervised

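• Zhenyu, at the beginning we wanted to take the object every 5 (or more) frames and align the samples in the spatial domain using EM active basis. Now I think we should align them in the spatial-temporal domain. The simplest case: if one instance takes 10 seconds and another takes 6 seconds, then to draw 2 samples, the first should be sampled every 5 seconds and the second every 3 seconds.

• A slightly more complex case: when completing a turn, one car enters the turn fast and exits slowly, while another enters slowly and exits fast.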
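The two sampling cases above can be sketched as follows. The first function reproduces the duration-proportional rule from the slide (10 s and 6 s instances, 2 samples each). The second handles the turning-car case by sampling at equal fractions of the distance travelled; this arc-length rule is an assumption chosen for illustration, not something the slides specify, and the function names and toy tracks are hypothetical.

```python
import math

def uniform_sample_times(duration, n_samples):
    """Sample times spaced evenly over the instance's own duration."""
    return [duration * k / n_samples for k in range(1, n_samples + 1)]

def arc_length_sample_times(times, points, n_samples):
    """Times at which the object has covered k/n of its total path length.

    `times` and `points` (a list of (x, y)) describe one tracked instance.
    Sampling by travelled distance is an illustrative assumption.
    """
    cum = [0.0]  # cumulative path length at each tracked point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    out, j = [], 0
    for k in range(1, n_samples + 1):
        target = cum[-1] * k / n_samples
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        frac = 0.0 if seg == 0 else (target - cum[j]) / seg
        out.append(times[j] + frac * (times[j + 1] - times[j]))
    return out

print(uniform_sample_times(10, 2), uniform_sample_times(6, 2))

# Toy tracks: car A enters fast and exits slowly, car B does the opposite.
t = [0, 1, 2, 3, 4]
car_a = [(0, 0), (4, 0), (8, 0), (9, 0), (10, 0)]
car_b = [(0, 0), (1, 0), (2, 0), (6, 0), (10, 0)]
print(arc_length_sample_times(t, car_a, 2), arc_length_sample_times(t, car_b, 2))
```

With uniform sampling both cars would be sampled at the same clock times even though they are at different points of the turn. Under the distance-based rule the halfway sample falls early for car A and late for car B, so the samples correspond to the same stage of the maneuver.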

Page 12: Step 1. Semi-supervised

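• Another issue: since we are doing this by coding, the representative intermediate frames should be chosen where the change is most pronounced. This is similar to a paper by Chellappa, which divides a trajectory into many segments that can each be described by a linear equation. Our representative elements should be the breakpoints between these linear segments. Learning this selection could yield some very interesting results. It reminds me of the CMU work in the paper PPT I sent you, where actions are split into blocks manually, based on human priors. Our approach might make this process automatic (covering both complex changes in shape and intermediate changes in the trajectory).

• This brings out the difference in implementation between the two methods. For the method that uses active basis as a contour, it may be more effective to use the correspondence between the Active Basis contours of different objects at corresponding frames, rather than the inter-frame correspondence of the same object. For the trajectory-based method, what has to be matched across different objects is the trajectory, so in practice it needs the inter-frame correspondence of the same object.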
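As an illustration of picking key frames at the breakpoints of a piecewise-linear trajectory, here is a minimal sketch. It uses the Ramer-Douglas-Peucker simplification as a stand-in; it is not the method of the Chellappa paper mentioned above, and the tolerance `tol` and the toy track are assumptions.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def breakpoints(points, tol):
    """Indices where the trajectory departs from a straight line by more than tol.

    Ramer-Douglas-Peucker, used here as an illustrative stand-in.
    The first and last indices are always included.
    """
    def split(lo, hi):
        worst, worst_d = None, tol
        for i in range(lo + 1, hi):
            d = point_line_distance(points[i], points[lo], points[hi])
            if d > worst_d:
                worst, worst_d = i, d
        if worst is None:
            return []  # segment lo..hi is linear within tolerance
        return split(lo, worst) + [worst] + split(worst, hi)
    return [0] + split(0, len(points) - 1) + [len(points) - 1]

# Toy track: straight along x, then a sharp turn and straight along y.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(breakpoints(track, tol=0.1))
```

The returned indices could serve as candidate key frames for the trajectory-based encoding: on the toy track only the start, the corner, and the end are kept.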

Page 13: Step 1. Semi-supervised

Step 2. Unsupervised